Transfer Files

The MIB server provides two storage areas, one private to each user and one shared by everyone:

Storage          Location Path       Capacity                    Purpose
Home Directory   /home/<username>    50 GB per user              Scripts, code, small files
Scratch Storage  /scratch            28 TB, shared by all users  Datasets, checkpoints, outputs
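To see how close a directory is to the 50 GB home allowance in the table, a small check like the one below can help. This is a minimal sketch: `check_usage` is a hypothetical helper, not a command installed on the server, and it assumes the limit is counted in GiB (1 GiB = 1,048,576 KiB) as measured by `du`; the server's actual quota accounting may differ.

```shell
# Hypothetical helper (not installed on the server): report a directory's size
# against a limit in GB. Assumes GB means GiB, matching what `du` measures.
# Usage: check_usage [DIR] [LIMIT_GB]   (defaults: $HOME and 50)
check_usage() {
  local dir limit_gb used_kb used_gb
  dir="${1:-$HOME}"
  limit_gb="${2:-50}"
  # `du -sk` prints "<size in KiB><TAB><path>"; keep only the number.
  used_kb=$(du -sk "$dir" | cut -f1)
  # Integer division rounds down to whole GiB.
  used_gb=$((used_kb / 1024 / 1024))
  echo "$dir: ${used_gb} GB used of ${limit_gb} GB"
  # Exit status is 0 while usage is below the limit, 1 otherwise.
  [ "$used_gb" -lt "$limit_gb" ]
}
```

For example, running `check_usage` with no arguments checks your home directory, while `df -h /scratch` shows how much of the shared scratch space is still free.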

The /scratch directory is also symlinked into your home directory for convenience, so you can reach it at ~/scratch. To transfer files to or from this folder, you will likely need to go through your personal storage first (pool at /home/<username>/orcd/pool, or scratch at /home/<username>/orcd/scratch) and then move the files while logged in to the compute node.
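The second step of that workflow, moving data from personal storage into shared scratch on the server, can be wrapped in a small function. This is a sketch only: `move_to_scratch` is a hypothetical helper, and the paths in the example comment are placeholders.

```shell
# Hypothetical helper (not installed on the server): after uploading into
# personal storage, run this on the server to move the data into /scratch.
# Usage: move_to_scratch SRC DEST_DIR
move_to_scratch() {
  local src dest_dir
  src="$1"
  dest_dir="$2"
  if [ ! -e "$src" ]; then
    echo "move_to_scratch: no such file or directory: $src" >&2
    return 1
  fi
  # Create the destination folder (and any parents) if it does not exist yet.
  mkdir -p "$dest_dir" || return 1
  # mv also works across filesystems: it copies the data, then deletes the source.
  mv "$src" "$dest_dir"/
}

# Example (placeholder paths):
#   move_to_scratch ~/orcd/scratch/my_dataset /scratch/my_project
```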

Uploading Files to the Cluster

The easiest way to transfer files is with rsync over SSH. From your local machine, run:

# Basic syntax (the examples below use the username dvdai; replace it with your own)
rsync -avz <local_path> <mit_username>@mib.media.mit.edu:<remote_path>

# Upload a single file
rsync -avz ~/Documents/data.csv dvdai@mib.media.mit.edu:~/

# Upload an entire directory
rsync -avz ~/Documents/project/ dvdai@mib.media.mit.edu:~/project/

# Upload with progress bar
rsync -avz --progress ~/largefile.zip dvdai@mib.media.mit.edu:~/
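The directory example above puts a trailing slash on the source (`~/Documents/project/`), which tells rsync to copy the directory's contents rather than the directory itself. The short local demo below needs rsync but no server (the temporary paths are illustrative); the same rule applies when the destination is `<mit_username>@mib.media.mit.edu:<remote_path>`.

```shell
# Local demo of rsync's trailing-slash rule; everything happens in a temp dir.
demo=$(mktemp -d)
mkdir -p "$demo/project"
echo "print('hi')" > "$demo/project/train.py"

# Trailing slash on the source: copy the CONTENTS of project/ into dest_a/.
rsync -a "$demo/project/" "$demo/dest_a/"

# No trailing slash: copy the project directory itself into dest_b/.
rsync -a "$demo/project" "$demo/dest_b/"

# Resulting layout:
#   dest_a/train.py
#   dest_b/project/train.py
find "$demo/dest_a" "$demo/dest_b" -type f
```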

Useful rsync flags:

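  • -a: Archive mode (copies directories recursively and preserves permissions, timestamps, and symbolic links)

  • -v: Verbose (shows files being transferred)

  • -z: Compression (speeds up compressible data such as text; little benefit for already-compressed files like .zip or .tar.gz)

  • --progress: Shows transfer progress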
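These flags are often combined with two more standard rsync options: `-n` (`--dry-run`), which lists what would be transferred without copying anything, and `-P`, shorthand for `--partial --progress`, which keeps partially transferred files so that rerunning the same command does not have to start large files from scratch. A minimal sketch of a wrapper; the name `mib_sync` and the `DRY_RUN` variable are hypothetical.

```shell
# Hypothetical wrapper (not a server command) bundling the flags above.
# Usage: mib_sync SRC DEST   (either side may be user@mib.media.mit.edu:path)
# Set DRY_RUN=1 to preview the transfer without copying anything.
mib_sync() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # -n / --dry-run: list the files rsync would send, but change nothing.
    rsync -avzP --dry-run "$@"
  else
    # -a archive, -v verbose, -z compress,
    # -P = --partial --progress (keep partial files so a rerun can reuse them).
    rsync -avzP "$@"
  fi
}

# Example:
#   DRY_RUN=1 mib_sync ~/Documents/project/ dvdai@mib.media.mit.edu:~/project/
#   mib_sync ~/Documents/project/ dvdai@mib.media.mit.edu:~/project/
```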

Downloading Files from the Cluster

From your local machine, use rsync to download files:

# Download a file to the current local directory
rsync -avz dvdai@mib.media.mit.edu:~/results.csv ./

# Download a directory
rsync -avz dvdai@mib.media.mit.edu:/scratch/project/ ~/Downloads/project/
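To download only some files from a results directory, for example just the `.csv` files, rsync's include/exclude filters can be combined as below. A minimal sketch: `fetch_matching` is a hypothetical helper name, while the filter rules and `-m` (`--prune-empty-dirs`) are standard rsync options; the source may be remote or local.

```shell
# Hypothetical helper: copy only files whose names match PATTERN, keeping the
# directory layout. SRC may be remote, e.g. dvdai@mib.media.mit.edu:/scratch/project/
# Usage: fetch_matching PATTERN SRC DEST
fetch_matching() {
  # Filter rules are tried in order and the first match wins:
  #   --include='*/'   descend into every directory,
  #   --include="$1"   keep files matching the pattern,
  #   --exclude='*'    skip everything else.
  # -m (--prune-empty-dirs) drops directories that end up with no matching files.
  rsync -avzm --include='*/' --include="$1" --exclude='*' "$2" "$3"
}

# Example: fetch every CSV under /scratch/project/ into ~/Downloads/project/
#   fetch_matching '*.csv' dvdai@mib.media.mit.edu:/scratch/project/ ~/Downloads/project/
```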

Downloading from the Internet

On the cluster, you can also download files directly from the internet using wget or curl:

# Direct download with wget
wget https://example.com/dataset.zip

# Download with a custom filename
wget -O mydata.zip https://example.com/dataset.zip

# Download into a specific directory
cd /scratch/datasets
wget https://example.com/largefile.tar.gz
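The examples above use wget; a rough curl equivalent is sketched below as a hypothetical `fetch` function. It relies only on standard curl options: `-f` (fail on HTTP errors instead of saving the error page), `-L` (follow redirects), `-o FILE` (choose the output name) and `-O` (keep the remote file name). The example.com URLs remain placeholders.

```shell
# Hypothetical helper: curl counterpart of the wget examples above.
# Usage: fetch URL [OUTFILE]
fetch() {
  if [ -n "${2:-}" ]; then
    # Like `wget -O OUTFILE URL`: save under a custom name.
    curl -fL -o "$2" "$1"
  else
    # Like `wget URL`: save under the remote file name in the current directory.
    curl -fL -O "$1"
  fi
}

# Examples:
#   fetch https://example.com/dataset.zip
#   fetch https://example.com/dataset.zip mydata.zip
#   cd /scratch/datasets && fetch https://example.com/largefile.tar.gz
```

If a large download is interrupted, `wget -c URL` or `curl -fL -C - -O URL` resumes it instead of starting over.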

For Google Drive files, you can use gdown:

pip install gdown
gdown "https://drive.google.com/uc?id=FILE_ID"
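Google Drive sharing links usually have the form `https://drive.google.com/file/d/FILE_ID/view?usp=sharing`, while the gdown command above uses the `uc?id=FILE_ID` form. A small sketch for converting one into the other; `drive_uc_url` is a hypothetical helper, and it assumes the link contains a `/file/d/FILE_ID/` segment.

```shell
# Hypothetical helper: turn a Drive sharing link of the form
#   https://drive.google.com/file/d/FILE_ID/view?usp=sharing
# into the https://drive.google.com/uc?id=FILE_ID form used with gdown.
drive_uc_url() {
  local id
  # Capture the path segment that follows /file/d/ (up to the next / or ?).
  id=$(printf '%s\n' "$1" | sed -n 's#.*/file/d/\([^/?]*\).*#\1#p')
  if [ -z "$id" ]; then
    echo "drive_uc_url: no /file/d/FILE_ID/ segment in: $1" >&2
    return 1
  fi
  printf 'https://drive.google.com/uc?id=%s\n' "$id"
}

# Example:
#   gdown "$(drive_uc_url 'https://drive.google.com/file/d/FILE_ID/view?usp=sharing')"
```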

For Kaggle datasets, use the Kaggle CLI:

pip install kaggle
kaggle datasets download -d owner/dataset-name
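The Kaggle CLI authenticates with an API token file, `kaggle.json`, which it looks for in `~/.kaggle/` (or in the directory named by `KAGGLE_CONFIG_DIR`) and which should be readable only by you. A minimal pre-flight check is sketched below; `kaggle_setup_check` is a hypothetical helper, and the token itself is created from your Kaggle account settings.

```shell
# Hypothetical helper: confirm the Kaggle API token is in place before downloading.
kaggle_setup_check() {
  local cfg
  cfg="${KAGGLE_CONFIG_DIR:-$HOME/.kaggle}/kaggle.json"
  if [ ! -f "$cfg" ]; then
    echo "kaggle_setup_check: $cfg not found; create an API token in your Kaggle account settings and save it there" >&2
    return 1
  fi
  # The CLI warns when the token is readable by other users; restrict it to the owner.
  chmod 600 "$cfg"
}

# Example: download and unzip into scratch (owner/dataset-name is a placeholder).
#   kaggle_setup_check && kaggle datasets download -d owner/dataset-name -p /scratch/datasets --unzip
```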

Back to the Getting Started Guide.