# Transfer Files
The cluster provides different storage areas for different purposes:
| Storage Type | Path | Quota | Backed up | Purpose/Notes |
|---|---|---|---|---|
| Home Directory (flash) | `/home/<username>` | 200 GB | Backed up with snapshots | Use for important files and software |
| Pool (hard disk) | `/home/<username>/orcd/pool` | 1 TB | Disaster recovery backup | Storing larger datasets |
| Scratch (flash) | `/home/<username>/orcd/scratch` | 1 TB | Not backed up | Scratch space for I/O-heavy jobs |
Our node also has a total of 28 TB of NVMe SSD mounted at `/scratch`, which is not backed up. You can use it for high-speed temporary storage. This directory is node-local and is not accessible from the login node.

To transfer files to or from `/scratch`, you will likely need to first stage them in your personal storage (pool at `/home/<username>/orcd/pool`, or scratch at `/home/<username>/orcd/scratch`), then move them while on the compute node.
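The two-step staging described above can be sketched as follows. The username, directory names, and paths are illustrative; substitute your own:

```shell
# Step 1 -- from your local machine: upload into pool storage,
# which is visible from the login node.
rsync -avz --progress ~/bigdata/ dvdai@orcd-login001.mit.edu:~/orcd/pool/bigdata/

# Step 2 -- from a shell on the compute node (e.g. inside an
# interactive job): copy from pool into the node-local NVMe scratch.
rsync -a ~/orcd/pool/bigdata/ /scratch/bigdata/

# When the job finishes, copy results back the same way in reverse.
```

Remember that anything left in `/scratch` is not backed up, so copy results you care about back to pool or home before your job ends.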
## Uploading Files to the Cluster
The easiest way to transfer files is with `rsync` over SSH. From your local machine, run:
```shell
# Basic syntax
rsync -avz <local_path> <mit_username>@orcd-login001.mit.edu:<remote_path>

# Upload a single file
rsync -avz ~/Documents/data.csv dvdai@orcd-login001.mit.edu:~/

# Upload an entire directory
rsync -avz ~/Documents/project/ dvdai@orcd-login001.mit.edu:~/project/

# Upload with progress bar
rsync -avz --progress ~/largefile.zip dvdai@orcd-login001.mit.edu:~/orcd/scratch/
```
Useful `rsync` flags:

- `-a`: archive mode (preserves permissions, timestamps)
- `-v`: verbose (shows files being transferred)
- `-z`: compression (faster for text files)
- `--progress`: shows transfer progress
## Downloading Files from the Cluster
From your local machine, use `rsync` to download files:
```shell
# Download a file to current local directory
rsync -avz dvdai@orcd-login001.mit.edu:~/results.csv ./

# Download a directory
rsync -avz dvdai@orcd-login001.mit.edu:~/project/ ~/Downloads/project/
```
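For large downloads over an unreliable connection, `-P` (shorthand for `--partial --progress`) keeps partially transferred files, so rerunning the same command resumes the transfer instead of restarting it. The filename here is illustrative:

```shell
# Keep partial files and show progress; rerun the same command to resume.
rsync -avzP dvdai@orcd-login001.mit.edu:~/orcd/pool/large_dataset.tar ./
```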
## Downloading from the Internet
On the cluster, you can also download files directly from the internet using `wget` or `curl`:
```shell
# Direct download with wget
wget https://example.com/dataset.zip

# Download with custom filename
wget -O mydata.zip https://example.com/dataset.zip

# Download to a specific directory
cd ~/orcd/scratch
wget https://example.com/largefile.tar.gz
```
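The `curl` equivalents of the `wget` commands above use the same example URLs; note that `curl` needs `-L` to follow redirects, which `wget` does by default:

```shell
# Download, keeping the remote filename (like plain wget)
curl -LO https://example.com/dataset.zip

# Download with a custom filename (like wget -O)
curl -L -o mydata.zip https://example.com/dataset.zip
```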
For Google Drive files, you can use gdown:
```shell
pip install gdown
# Quote the URL so the shell does not interpret the "?" in it.
gdown "https://drive.google.com/uc?id=FILE_ID"
```
For Kaggle datasets, use the Kaggle CLI:
```shell
pip install kaggle
kaggle datasets download -d dataset-name
```
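The Kaggle CLI delivers the dataset as a zip archive named after the dataset slug; assuming that name, unpack it with:

```shell
# Archive name is illustrative; unzip into a directory of the same name.
unzip dataset-name.zip -d dataset-name/
```

Recent versions of the CLI also accept an `--unzip` flag on `kaggle datasets download` to extract automatically.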
Back to the Getting Started Guide.