# Transfer Files
The cluster provides different storage areas for different purposes:
| Storage Type | Path | Quota | Backed up | Purpose/Notes |
|---|---|---|---|---|
| Home Directory (flash) | `/home/<username>` | 200 GB | Backed up with snapshots | Use for important files and software |
| Pool (hard disk) | `/home/<username>/orcd/pool` | 1 TB | Disaster recovery backup | Storing larger datasets |
| Scratch (flash) | `/home/<username>/orcd/scratch` | 1 TB | Not backed up | Scratch space for I/O-heavy jobs |
Our node also has a total of 28 TB of NVMe SSD mounted at `/scratch`, which is not backed up. You can use it for high-speed temporary storage. This directory is node-local and is not accessible from the login node.

To transfer files to or from `/scratch`, you will likely need to first stage them in your personal storage (pool at `/home/<username>/orcd/pool`, or scratch at `/home/<username>/orcd/scratch`), then move them while on the compute node.
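The two-step staging described above can be sketched as follows. The username, directory names, and paths are illustrative; substitute your own:

```shell
# Step 1 -- from your local machine: upload into pool storage,
# which is visible from the login node.
rsync -avz --progress ~/bigdata/ dvdai@orcd-login001.mit.edu:~/orcd/pool/bigdata/

# Step 2 -- from a shell on the compute node (e.g. inside an
# interactive job): copy from pool into the node-local NVMe scratch.
rsync -a ~/orcd/pool/bigdata/ /scratch/bigdata/

# When the job finishes, copy results back the same way in reverse.
```

Remember that anything left in `/scratch` is not backed up, so copy results you care about back to pool or home before your job ends.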
## Uploading Files to the Cluster
The easiest way to transfer files is with `rsync` over SSH. From your local machine, run:
```shell
# Basic syntax
rsync -avz <local_path> <mit_username>@orcd-login001.mit.edu:<remote_path>

# Upload a single file
rsync -avz ~/Documents/data.csv dvdai@orcd-login001.mit.edu:~/

# Upload an entire directory
rsync -avz ~/Documents/project/ dvdai@orcd-login001.mit.edu:~/project/

# Upload with progress bar
rsync -avz --progress ~/largefile.zip dvdai@orcd-login001.mit.edu:~/orcd/scratch/
```
Useful `rsync` flags:

- `-a`: archive mode (preserves permissions, timestamps)
- `-v`: verbose (shows files being transferred)
- `-z`: compression (faster for text files)
- `--progress`: shows transfer progress
## Downloading Files from the Cluster
From your local machine, use `rsync` to download files:
```shell
# Download a file to current local directory
rsync -avz dvdai@orcd-login001.mit.edu:~/results.csv ./

# Download a directory
rsync -avz dvdai@orcd-login001.mit.edu:~/project/ ~/Downloads/project/
```
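For large downloads over an unreliable connection, `-P` (shorthand for `--partial --progress`) keeps partially transferred files, so rerunning the same command resumes the transfer instead of restarting it. The filename here is illustrative:

```shell
# Keep partial files and show progress; rerun the same command to resume.
rsync -avzP dvdai@orcd-login001.mit.edu:~/orcd/pool/large_dataset.tar ./
```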
## Downloading from the Internet
On the cluster, you can also download files directly from the internet using `wget` or `curl`:
```shell
# Direct download with wget
wget https://example.com/dataset.zip

# Download with custom filename
wget -O mydata.zip https://example.com/dataset.zip

# Download to a specific directory
cd ~/orcd/scratch
wget https://example.com/largefile.tar.gz
```
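The `curl` equivalents of the `wget` commands above use the same example URLs; note that `curl` needs `-L` to follow redirects, which `wget` does by default:

```shell
# Download, keeping the remote filename (like plain wget)
curl -LO https://example.com/dataset.zip

# Download with a custom filename (like wget -O)
curl -L -o mydata.zip https://example.com/dataset.zip
```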
For Google Drive files, you can use gdown:
```shell
pip install gdown
# Quote the URL so the shell does not interpret the "?" in it.
gdown "https://drive.google.com/uc?id=FILE_ID"
```
For Kaggle datasets, use the Kaggle CLI:
```shell
pip install kaggle
kaggle datasets download -d dataset-name
```
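The Kaggle CLI delivers the dataset as a zip archive named after the dataset slug; assuming that name, unpack it with:

```shell
# Archive name is illustrative; unzip into a directory of the same name.
unzip dataset-name.zip -d dataset-name/
```

Recent versions of the CLI also accept an `--unzip` flag on `kaggle datasets download` to extract automatically.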
Back to the Getting Started Guide.