4/10/2025

Mounting Remote Data for GPU Training

 

This guide explains how to access data from Computer A (data server) on Computer B (GPU machine) for machine learning training workflows.

Overview

When training machine learning models, you often need:

  1. A computer with GPU capabilities for training (Computer B)
  2. Access to training data stored on another machine (Computer A)

This tutorial shows how to connect the two machines over SSH and mount the data directory with SSHFS, so the GPU machine can read the data without copying everything locally.

Prerequisites

  • SSH access to both computers
  • Admin/sudo privileges on both machines
  • Basic knowledge of terminal commands

Step 1: Generate SSH Key on GPU Computer (if needed)

If you don't already have an SSH key on your GPU computer (Computer B):

# On Computer B (GPU)
ssh-keygen -t rsa -b 4096

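Press Enter to accept the default location. A passphrase is optional, but note that unattended mounts (such as the fstab entry later in this guide) need either a key without a passphrase or an ssh-agent that already holds the key.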

Step 2: Copy SSH Public Key to Data Computer

# On Computer B (GPU)
# View your public key
cat ~/.ssh/id_rsa.pub

# Copy the output to your clipboard (needed only for Option 2 below)

Now transfer this key to Computer A (data server):

# Option 1: Using ssh-copy-id (easiest)
ssh-copy-id username@computerA

# Option 2: Manual setup
# First, SSH into Computer A
ssh username@computerA

# Then on Computer A, create .ssh directory if it doesn't exist
mkdir -p ~/.ssh
chmod 700 ~/.ssh

# Add your public key to authorized_keys
echo "ssh-rsa AAAA...your key here..." >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

# Exit back to Computer B
exit
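
With its default StrictModes setting, sshd refuses a key when ~/.ssh or authorized_keys is writable by other users, so it can help to verify the modes set above. The following is a minimal Python sketch of that check, meant to be run on Computer A. The function names are hypothetical helpers, not part of any SSH tooling, and the expected modes are simply the ones used in this step.

```python
import os
import stat


def mode_of(path):
    """Return the permission bits of path, for example 0o700."""
    return stat.S_IMODE(os.stat(path).st_mode)


def check_ssh_permissions(ssh_dir):
    """Return a list of problems found; an empty list means the modes match."""
    expected = {
        ssh_dir: 0o700,                                   # chmod 700 ~/.ssh
        os.path.join(ssh_dir, "authorized_keys"): 0o600,  # chmod 600 authorized_keys
    }
    problems = []
    for path, want in expected.items():
        if not os.path.exists(path):
            problems.append(f"{path} is missing")
        elif mode_of(path) != want:
            problems.append(
                f"{path} has mode {oct(mode_of(path))}, expected {oct(want)}"
            )
    return problems
```

Calling check_ssh_permissions(os.path.expanduser("~/.ssh")) returns an empty list when both modes match.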

Step 3: Test the SSH Connection

Ensure you can connect without a password:

# On Computer B (GPU)
ssh username@computerA

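If the key is set up correctly, you get a shell on Computer A without being asked for the account password. If you protected the key with a passphrase, you may still be prompted for that.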
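
For use in scripts, the same test can be run non-interactively. The sketch below assumes the OpenSSH client is installed on Computer B and uses its BatchMode option, which makes ssh fail instead of prompting. The function names and the timeout values are illustrative choices, not part of the original guide.

```python
import subprocess


def ssh_check_command(target):
    """Build an ssh command that fails instead of prompting for input."""
    return [
        "ssh",
        "-o", "BatchMode=yes",       # never ask for a password or passphrase
        "-o", "ConnectTimeout=10",   # give up quickly if the host is unreachable
        target,
        "true",                      # remote no-op; exit status 0 on success
    ]


def key_login_works(target):
    """Return True if key-based login to target succeeds without prompts."""
    try:
        result = subprocess.run(
            ssh_check_command(target), capture_output=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        # ssh is not installed, or the connection hung
        return False
    return result.returncode == 0
```

key_login_works("username@computerA") also returns False when the host key has not been accepted yet, or when the key needs a passphrase and no agent is running, because BatchMode suppresses those prompts too.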

Step 4: Mount Remote Data using SSHFS

Install SSHFS on your GPU computer:

# On Computer B (GPU)
# For Ubuntu/Debian
sudo apt-get update
sudo apt-get install sshfs

# For Fedora (on CentOS/RHEL, enable the EPEL repository first)
sudo dnf install fuse-sshfs

Create a mount point and mount the remote directory:

# On Computer B (GPU)
# Create mount directory
mkdir -p ~/data_mount

# Mount the remote directory
sshfs username@computerA:/path/to/data ~/data_mount

# Verify the mount worked
ls ~/data_mount
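
If the mount is missing, ~/data_mount is just an empty local directory, and a training job may start with no data instead of failing. A minimal Python sketch of a guard that a script could call first is shown here; require_mount is a hypothetical helper name.

```python
import os


def require_mount(path):
    """Return the expanded path, or raise if nothing is mounted there."""
    full = os.path.abspath(os.path.expanduser(path))
    if not os.path.ismount(full):
        raise RuntimeError(f"{full} is not a mount point; run sshfs first")
    return full
```

A script would call require_mount("~/data_mount") before building its dataset, so a dropped mount surfaces as a clear error.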

Step 5: Using the Mounted Data for Training

Now you can access the data in your training scripts as if it were local:

# Example PyTorch script
import os

import torch
from torch.utils.data import Dataset, DataLoader

# Point to your mounted data directory.
# Python does not expand "~" on its own, so expand it explicitly.
data_dir = os.path.expanduser("~/data_mount/dataset")

# Your training code...
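
Every directory listing on an SSHFS mount goes over the network, so it can be worth walking the dataset once and reusing the resulting file list. The following is a rough sketch using only the standard library. The function name and the file extensions are assumptions chosen for illustration, so adjust them to your dataset.

```python
import os


def index_files(data_dir, extensions=(".jpg", ".png", ".npy")):
    """Walk data_dir once and return a sorted list of matching file paths."""
    root = os.path.expanduser(data_dir)
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Case-insensitive match against the allowed extensions
            if name.lower().endswith(extensions):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

The returned list can be stored in a Dataset object at construction time, so that each item lookup opens one file by path and no directory is listed again during training.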

Additional Options

Automating Mount on Startup

To automatically mount the remote directory when your GPU computer starts:

  1. Edit your fstab file:

    sudo nano /etc/fstab
    
  2. Add this line (all on one line):

    username@computerA:/path/to/data /home/username/data_mount fuse.sshfs defaults,_netdev,user,idmap=user,follow_symlinks,identityfile=/home/username/.ssh/id_rsa,allow_other,reconnect 0 0
    
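  3. Save and exit, then test the entry without rebooting by running sudo mount -a. Mounts started from fstab at boot run as root, so the key must be usable without a passphrase prompt and Computer A's host key must already be in root's known_hosts file (connecting once with sudo ssh username@computerA adds it).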
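
The fstab entry above has six whitespace-separated fields: the remote source, the local mount point, the filesystem type, a comma-separated option list, and the dump and fsck-pass numbers. The sketch below assembles the same line from its parts, which may help when adapting it to other users or paths. sshfs_fstab_line is a hypothetical helper, and the option list is copied from the entry above.

```python
def sshfs_fstab_line(remote, mount_point, identity_file, extra_options=()):
    """Assemble the six whitespace-separated fstab fields for an SSHFS mount."""
    options = [
        "defaults", "_netdev", "user", "idmap=user", "follow_symlinks",
        f"identityfile={identity_file}", "allow_other", "reconnect",
        *extra_options,
    ]
    fields = [remote, mount_point, "fuse.sshfs", ",".join(options), "0", "0"]
    for field in fields:
        # fstab splits on whitespace, so a raw space would shift the columns
        if any(ch.isspace() for ch in field):
            raise ValueError(f"whitespace in fstab field: {field!r}")
    return " ".join(fields)


print(sshfs_fstab_line(
    "username@computerA:/path/to/data",
    "/home/username/data_mount",
    "/home/username/.ssh/id_rsa",
))
```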

Unmounting

To unmount the remote directory:

# On Computer B (GPU)
fusermount -u ~/data_mount
# On systems that ship FUSE 3, the command may be named fusermount3 instead

Performance Considerations

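  • For better performance with large datasets, try these SSHFS options. Option names differ between SSHFS versions (big_writes, for example, belongs to SSHFS 2.x and is unnecessary on 3.x), so check man sshfs if an option is rejected. Caching options such as kernel_cache are best suited to data that does not change on Computer A while it is mounted: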

    sshfs username@computerA:/path/to/data ~/data_mount -o Compression=no,big_writes,cache=yes,kernel_cache
    
  • If you experience frequent disconnections, add reconnect options:

    sshfs username@computerA:/path/to/data ~/data_mount -o reconnect,ServerAliveInterval=15,ServerAliveCountMax=3
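
To judge whether a set of mount options helps, it is useful to measure read throughput before and after changing them. Here is a minimal Python sketch that times one sequential read of a file. The function name and the 1 MiB chunk size are arbitrary choices for illustration.

```python
import time


def read_throughput_mb_s(path, chunk_size=1024 * 1024):
    """Read path once from start to end and return the throughput in MiB/s."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero duration on very small files
    return total / (1024 * 1024) / max(elapsed, 1e-9)
```

Run it on a large file under ~/data_mount after remounting with each option set. A second read of the same file may be served from a local cache, so compare first reads or use different files.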
    

Alternative: NFS for Better Performance

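For production setups with large datasets, consider using NFS instead of SSHFS. SSHFS encrypts every transfer and runs in user space through FUSE, which costs CPU time and throughput. NFS avoids that overhead but is not encrypted by default, so it is best suited to a trusted network.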

Troubleshooting

Connection Issues

  • Verify SSH keys are correctly set up
  • Check firewall settings on both computers
  • Ensure the SSH service is running on Computer A: sudo systemctl status sshd (the service is named ssh on Debian/Ubuntu)

Permission Problems

  • Check file permissions on the data directory
  • Verify your user has read access to the data files
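
One way to find permission problems before a long training run is to try opening every file once. The sketch below does that from Computer B. unreadable_files is a hypothetical helper, and it reads a single byte per file, because an actual read is a more direct test than inspecting permission bits through the mount.

```python
import os


def unreadable_files(data_dir, limit=20):
    """Return up to `limit` file paths under data_dir that cannot be read."""
    root = os.path.expanduser(data_dir)
    bad = []
    # Note: os.walk silently skips directories it cannot list
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    f.read(1)
            except OSError:
                # Covers permission errors, broken symlinks, and I/O errors
                bad.append(path)
                if len(bad) >= limit:
                    return bad
    return bad
```

An empty list from unreadable_files("~/data_mount") means every file that was found could be opened by the current user.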

Mount Errors

  • Make sure FUSE is installed and configured properly
  • Check if the mount point directory exists and is empty
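
The mount point checks above can be scripted. What follows is a minimal Python sketch, with mount_point_problem as a hypothetical helper name, that reports the first problem it finds or None when the directory looks ready for sshfs.

```python
import os


def mount_point_problem(path):
    """Return a description of the first problem found, or None if there is none."""
    full = os.path.expanduser(path)
    if not os.path.isdir(full):
        return f"{full} does not exist or is not a directory"
    if os.path.ismount(full):
        return f"{full} already has something mounted on it"
    if os.listdir(full):
        return f"{full} is not empty"
    return None
```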

Security Considerations

  • Use key-based authentication only (disable password login by setting PasswordAuthentication no in /etc/ssh/sshd_config on Computer A)
  • Consider restricting SSH access by IP address
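  • Optionally use a non-standard SSH port to cut down on automated login attempts; treat this as a supplement to key-based authentication, not a replacement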

For any issues or questions, please contact your system administrator.
