Mounting Remote Data for GPU Training
This guide explains how to access data from Computer A (data server) on Computer B (GPU machine) for machine learning training workflows.
Overview
When training machine learning models, you often need:
- A computer with GPU capabilities for training (Computer B)
- Access to training data stored on another machine (Computer A)
This guide shows how to connect the two machines over SSH and mount the remote data with SSHFS, so the GPU machine can read the data without copying everything locally.
Prerequisites
- SSH access to both computers
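- sudo privileges on Computer B (to install SSHFS and edit /etc/fstab), and on Computer A if you need to change its SSH server configuration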
- Basic knowledge of terminal commands
Step 1: Generate SSH Key on GPU Computer (if needed)
If you don't already have an SSH key on your GPU computer (Computer B):
# On Computer B (GPU)
ssh-keygen -t rsa -b 4096
Press Enter to accept the default file location, then enter a passphrase if you want one.
Step 2: Copy SSH Public Key to Data Computer
# On Computer B (GPU)
# View your public key
cat ~/.ssh/id_rsa.pub
# Copy the output to clipboard
Now transfer this key to Computer A (data server):
# Option 1: Using ssh-copy-id (easiest)
ssh-copy-id username@computerA
# Option 2: Manual setup
# First, SSH into Computer A
ssh username@computerA
# Then on Computer A, create .ssh directory if it doesn't exist
mkdir -p ~/.ssh
chmod 700 ~/.ssh
# Add your public key to authorized_keys
echo "ssh-rsa AAAA...your key here..." >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# Exit back to Computer B
exit
Step 3: Test the SSH Connection
Ensure you can connect without a password:
# On Computer B (GPU)
ssh username@computerA
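If the key is set up correctly, you get a shell on Computer A without a password prompt. If you set a passphrase in Step 1, you may still be asked for it unless the key is loaded in ssh-agent.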
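For unattended training jobs it can help to confirm from a script that login works with no prompt at all. The following is a minimal sketch, not part of the original setup: the helper names `build_ssh_check_command` and `can_login_without_password` are illustrative, and `username@computerA` is the placeholder used throughout this guide. It relies on OpenSSH's `BatchMode=yes` option, which makes `ssh` fail instead of asking for a password or passphrase.

```python
import subprocess


def build_ssh_check_command(target, timeout_s=10):
    """Return an ssh command that fails instead of prompting for input."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                # never prompt; fail if the key is not accepted
        "-o", f"ConnectTimeout={timeout_s}",  # give up quickly if the host is unreachable
        target,
        "true",                               # run a no-op command on the remote side
    ]


def can_login_without_password(target, timeout_s=10):
    """Return True if a non-interactive SSH login to `target` succeeds."""
    try:
        result = subprocess.run(
            build_ssh_check_command(target, timeout_s),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 5,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # ssh is not installed, or the connection hung
        return False
    return result.returncode == 0
```

Calling `can_login_without_password("username@computerA")` on Computer B should return True once Steps 1 to 3 are complete. A passphrase-protected key must be loaded into `ssh-agent` first, otherwise the check reports failure.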
Step 4: Mount Remote Data using SSHFS
Install SSHFS on your GPU computer:
# On Computer B (GPU)
# For Ubuntu/Debian
sudo apt-get update
sudo apt-get install sshfs
# For CentOS/RHEL/Fedora (on CentOS/RHEL the package may require the EPEL repository)
sudo dnf install fuse-sshfs
Create a mount point and mount the remote directory:
# On Computer B (GPU)
# Create mount directory
mkdir -p ~/data_mount
# Mount the remote directory
sshfs username@computerA:/path/to/data ~/data_mount
# Verify the mount worked
ls ~/data_mount
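If the mount is missing, ~/data_mount is just an empty local directory, and a training script may silently see no data. A small guard at the top of the script can catch this. The sketch below uses only the standard library; the function name `require_mounted` is an illustrative choice, not something the guide's setup provides.

```python
import os


def require_mounted(mount_point):
    """Return the absolute mount path, or raise if nothing is mounted there."""
    # Python does not expand "~" on its own, so do it explicitly
    path = os.path.abspath(os.path.expanduser(mount_point))
    if not os.path.ismount(path):
        raise RuntimeError(f"{path} is not a mount point; run sshfs first")
    return path
```

A training script could call `require_mounted("~/data_mount")` before building its dataset and stop early with a clear message if the SSHFS mount has dropped.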
Step 5: Using the Mounted Data for Training
Now you can access the data in your training scripts as if it were local:
# Example PyTorch script
import os
import torch
from torch.utils.data import Dataset, DataLoader
# Point to your mounted data directory
# Python does not expand "~" in paths, so expand it explicitly
data_dir = os.path.expanduser("~/data_mount/dataset")
# Your training code...
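Listing many files over SSHFS can be slow, because every directory read goes over the network. One possible approach is to walk the dataset once and keep the file list on local disk. The sketch below assumes an image dataset; the function name `build_file_index`, the default extensions, and the JSON index file are all illustrative choices.

```python
import json
import os


def build_file_index(data_dir, index_path, extensions=(".jpg", ".png")):
    """Walk data_dir once and cache the sorted list of matching files as JSON.

    `extensions` must be a tuple of lowercase suffixes.
    """
    # Reuse the cached index if one exists (delete it when the dataset changes)
    if os.path.exists(index_path):
        with open(index_path) as f:
            return json.load(f)

    root = os.path.expanduser(data_dir)
    files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(extensions):
                # Store paths relative to the root so the index
                # still works if the mount point moves
                full_path = os.path.join(dirpath, name)
                files.append(os.path.relpath(full_path, root))
    files.sort()

    with open(index_path, "w") as f:
        json.dump(files, f)
    return files
```

A Dataset class could then load this list in its constructor and join each entry with the data directory when an item is requested, so the remote directory tree is only walked on the first run.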
Additional Options
Automating Mount on Startup
To automatically mount the remote directory when your GPU computer starts:
Edit your fstab file:
sudo nano /etc/fstab
Add this line (all on one line):
username@computerA:/path/to/data /home/username/data_mount fuse.sshfs defaults,_netdev,user,idmap=user,follow_symlinks,identityfile=/home/username/.ssh/id_rsa,allow_other,reconnect 0 0
Save and exit, then run sudo mount -a to check the new entry for errors before you reboot.
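Because the fstab entry is one long line with several fields, it is easy to mistype. As a sketch, the line can be assembled from its parts and checked for stray whitespace before it is pasted into /etc/fstab. The function name `build_fstab_entry` is hypothetical; the option list is copied from the entry above.

```python
def build_fstab_entry(remote, mount_point, identity_file):
    """Assemble the fuse.sshfs fstab line used in this guide."""
    for field in (remote, mount_point, identity_file):
        # fstab fields are whitespace-separated, so raw spaces would break the entry
        if any(ch.isspace() for ch in field):
            raise ValueError(f"field contains whitespace: {field!r}")

    options = [
        "defaults", "_netdev", "user", "idmap=user", "follow_symlinks",
        f"identityfile={identity_file}", "allow_other", "reconnect",
    ]
    return f"{remote} {mount_point} fuse.sshfs {','.join(options)} 0 0"


print(build_fstab_entry(
    "username@computerA:/path/to/data",
    "/home/username/data_mount",
    "/home/username/.ssh/id_rsa",
))
```

The script only prints the line; editing /etc/fstab remains a manual step.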
Unmounting
To unmount the remote directory:
# On Computer B (GPU)
fusermount -u ~/data_mount
# On systems that ship FUSE 3, the command may be named fusermount3
Performance Considerations
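For better performance with large datasets, try these SSHFS options. The names below match SSHFS 2.x; on SSHFS 3.x, big_writes is no longer needed and may be rejected, and cache=yes is spelled dir_cache=yes, so check man sshfs for your installed version: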
sshfs username@computerA:/path/to/data ~/data_mount -o Compression=no,big_writes,cache=yes,kernel_cache
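Whether a given set of options helps depends on the network and the files, so it is worth measuring. The sketch below times a sequential read of one file and reports throughput; `measure_read_throughput` is an illustrative helper, not part of SSHFS.

```python
import time


def measure_read_throughput(path, chunk_size=1024 * 1024):
    """Read a file sequentially and return (bytes_read, megabytes_per_second)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very small files
    rate = (total / 1e6) / elapsed if elapsed > 0 else float("inf")
    return total, rate
```

Run it on a large file under ~/data_mount after remounting with different options. Use a different file for each run, since a repeated read may be served from the local cache rather than the network.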
If you experience frequent disconnections, add reconnect options:
sshfs username@computerA:/path/to/data ~/data_mount -o reconnect,ServerAliveInterval=15,ServerAliveCountMax=3
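Even with the reconnect option, a read issued while the connection is down can fail with an I/O error. A data-loading function can tolerate this by retrying after a short pause. This is a sketch under that assumption; the name `read_with_retry` and the default retry count and delay are arbitrary choices.

```python
import time


def read_with_retry(path, retries=3, delay_s=5.0):
    """Read a file, retrying on OSError to ride out a brief SSHFS reconnect."""
    for attempt in range(retries + 1):
        try:
            with open(path, "rb") as f:
                return f.read()
        except FileNotFoundError:
            # A missing file will not fix itself; fail immediately
            raise
        except OSError:
            if attempt == retries:
                raise  # out of retries; surface the original error
            time.sleep(delay_s)
```

With the defaults, a read waits up to roughly 15 seconds in total before giving up, which lets a long training run continue through a short network interruption.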
Alternative: NFS for Better Performance
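For production setups with large datasets, consider NFS instead of SSHFS. SSHFS encrypts every transfer and runs in user space, which limits throughput; NFS avoids that overhead but needs a trusted network and server-side configuration on Computer A.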
Troubleshooting
Connection Issues
- Verify SSH keys are correctly set up
- Check firewall settings on both computers
- Ensure the SSH service is running:
# On Computer A (the service is named ssh on Debian/Ubuntu, sshd on CentOS/RHEL/Fedora)
sudo systemctl status sshd
Permission Problems
- Check file permissions on the data directory
- Verify your user has read access to the data files
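Permission checks can be scripted by trying to open each file through the mount. The sketch below reads one byte from every file under a directory and collects the failures; `find_unreadable_files` is an illustrative name. It opens files rather than inspecting permission bits, since the bits shown through the mount may not reflect what the remote user is allowed to do.

```python
import os


def find_unreadable_files(data_dir, limit=20):
    """Return up to `limit` (path, reason) pairs for entries that cannot be read."""
    root = os.path.expanduser(data_dir)
    problems = []

    def record_walk_error(exc):
        # Called by os.walk for directories that cannot be listed
        problems.append((exc.filename, exc.strerror))

    for dirpath, _dirnames, filenames in os.walk(root, onerror=record_walk_error):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    f.read(1)  # force an actual read from the remote side
            except OSError as exc:
                problems.append((path, exc.strerror))
            if len(problems) >= limit:
                return problems[:limit]
    return problems
```

An empty result means every file could be opened. Because each file is touched once, this is best run on a subdirectory rather than a very large dataset.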
Mount Errors
- Make sure FUSE is installed and configured properly
- Check if the mount point directory exists and is empty
Security Considerations
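- Use key-based authentication only: once key login works, set PasswordAuthentication no in /etc/ssh/sshd_config on Computer A and restart the SSH service
- Consider restricting SSH access to Computer A by IP address, for example with firewall rules
- A non-standard SSH port reduces noise from automated scans, but it does not replace the measures above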
For any issues or questions, please contact your system administrator.