4/17/2025

.screenrc setup for better usability

 

.screenrc

Open ~/.screenrc in your editor of choice (code, vim, or nano) and add the following:

# Current settings for scrolling and encoding
termcapinfo xterm* ti@:te@
defutf8 on
term screen-256color
defscrollback 10000
encoding utf8

# Add these lines for better keyboard handling
bindkey -k ku stuff \033[A
bindkey -k kd stuff \033[B
bindkey -k kl stuff \033[D
bindkey -k kr stuff \033[C

# Allow alternate screen
altscreen on

# Set terminal to xterm-256color for better compatibility
terminfo xterm-256color hs@:cs=\E[%i%p1%d;%p2%dr:im=\E[4h:ei=\E[4l

# Make bash history work properly
shell -$SHELL

#study.marearts.com



Detach and reattach your screen session (Ctrl-a d, then screen -r), or recreate it, so the new settings take effect.

Thank you.

4/10/2025

Mounting Remote Data for GPU Training

 


This guide explains how to access data from Computer A (data server) on Computer B (GPU machine) for machine learning training workflows.

Overview

When training machine learning models, you often need:

  1. A computer with GPU capabilities for training (Computer B)
  2. Access to training data stored on another machine (Computer A)

This tutorial will show you how to securely connect these computers using SSH, allowing the GPU machine to access data without copying everything locally.

Prerequisites

  • SSH access to both computers
  • Admin/sudo privileges on both machines
  • Basic knowledge of terminal commands

Step 1: Generate SSH Key on GPU Computer (if needed)

If you don't already have an SSH key on your GPU computer (Computer B):

# On Computer B (GPU)
ssh-keygen -t rsa -b 4096

Press Enter to accept default locations and add a passphrase if desired.

Step 2: Copy SSH Public Key to Data Computer

# On Computer B (GPU)
# View your public key
cat ~/.ssh/id_rsa.pub

# Copy the output to clipboard

Now transfer this key to Computer A (data server):

# Option 1: Using ssh-copy-id (easiest)
ssh-copy-id username@computerA

# Option 2: Manual setup
# First, SSH into Computer A
ssh username@computerA

# Then on Computer A, create .ssh directory if it doesn't exist
mkdir -p ~/.ssh
chmod 700 ~/.ssh

# Add your public key to authorized_keys
echo "ssh-rsa AAAA...your key here..." >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

# Exit back to Computer B
exit

Step 3: Test the SSH Connection

Ensure you can connect without a password:

# On Computer B (GPU)
ssh username@computerA

If successful, you should connect without entering a password.

Step 4: Mount Remote Data using SSHFS

Install SSHFS on your GPU computer:

# On Computer B (GPU)
# For Ubuntu/Debian
sudo apt-get update
sudo apt-get install sshfs

# For CentOS/RHEL/Fedora
sudo dnf install fuse-sshfs

Create a mount point and mount the remote directory:

# On Computer B (GPU)
# Create mount directory
mkdir -p ~/data_mount

# Mount the remote directory
sshfs username@computerA:/path/to/data ~/data_mount

# Verify the mount worked
ls ~/data_mount

Step 5: Using the Mounted Data for Training

Now you can access the data in your training scripts as if it were local:

# Example PyTorch script
import os
import torch
from torch.utils.data import Dataset, DataLoader

# Point to your mounted data directory
# (expand ~ explicitly; Python does not expand it automatically)
data_dir = os.path.expanduser("~/data_mount/dataset")

# Your training code...
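
As a minimal sketch of what that can look like, the Dataset below streams samples straight from the mount. The one-.pt-file-per-sample layout (and the dataset folder name) are assumptions for illustration, not a requirement:

import os
import torch
from torch.utils.data import Dataset, DataLoader

class MountedTensorDataset(Dataset):
    """Loads one .pt tensor file per sample from the mounted directory."""
    def __init__(self, root):
        self.root = os.path.expanduser(root)  # hypothetical layout: flat dir of .pt files
        self.files = sorted(f for f in os.listdir(self.root)
                            if f.endswith(".pt"))

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        return torch.load(os.path.join(self.root, self.files[idx]))

dataset = MountedTensorDataset("~/data_mount/dataset")
# A few workers let the DataLoader prefetch and hide network-mount latency
loader = DataLoader(dataset, batch_size=32, num_workers=4)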

Additional Options

Automating Mount on Startup

To automatically mount the remote directory when your GPU computer starts:

  1. Edit your fstab file:

    sudo nano /etc/fstab
    
  2. Add this line (all on one line):

    username@computerA:/path/to/data /home/username/data_mount fuse.sshfs defaults,_netdev,user,idmap=user,follow_symlinks,identityfile=/home/username/.ssh/id_rsa,allow_other,reconnect 0 0
    
  3. Save and exit. (The allow_other option above requires user_allow_other to be enabled in /etc/fuse.conf.)

Unmounting

To unmount the remote directory:

# On Computer B (GPU)
fusermount -u ~/data_mount

Performance Considerations

  • For better performance with large datasets, try these SSHFS options:

    sshfs username@computerA:/path/to/data ~/data_mount -o Compression=no,big_writes,cache=yes,kernel_cache
    
  • If you experience frequent disconnections, add reconnect options:

    sshfs username@computerA:/path/to/data ~/data_mount -o reconnect,ServerAliveInterval=15,ServerAliveCountMax=3
    

Alternative: NFS for Better Performance

For production setups with large datasets, consider using NFS instead of SSHFS for better performance.

Troubleshooting

Connection Issues

  • Verify SSH keys are correctly set up
  • Check firewall settings on both computers
  • Ensure the SSH service is running: sudo systemctl status sshd

Permission Problems

  • Check file permissions on the data directory
  • Verify your user has read access to the data files

Mount Errors

  • Make sure FUSE is installed and configured properly
  • Check if the mount point directory exists and is empty

Security Considerations

  • Use key-based authentication only (disable password login)
  • Consider restricting SSH access by IP address
  • Use a non-standard SSH port for additional security

For any issues or questions, please contact your system administrator.

4/02/2025

Explanation of Data Normalization and Min/Max Calculation

Let me explain how a specific normalized feature value is calculated using one concrete example.

Let's take the feature "GroupSize" which has:

  • Min value: -0.045121
  • Max value: 103.032967

These values are post-normalization, but we can work backwards to understand how they were calculated.

The Normalization Formula

The normalization you're using is standardization (z-score normalization):

normalized_features = (features - mean) / std

Where:

  • features are the original, raw values
  • mean is the average of all values for that feature in the training set
  • std is the standard deviation of all values for that feature in the training set
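
In code, these statistics are computed once per feature over the training set and then reused unchanged for validation and test data. Here is a minimal numpy sketch (the sample matrix is made up for illustration):

import numpy as np

# Hypothetical training matrix: 5 samples x 2 features
train = np.array([[ 0.0, 10.0],
                  [ 1.0, 12.0],
                  [ 1.0, 11.0],
                  [32.0,  9.0],
                  [64.0, 13.0]])

mean = train.mean(axis=0)   # per-feature mean
std = train.std(axis=0)     # per-feature (population) standard deviation

normalized_features = (train - mean) / std

# Important: normalize validation/test data with the *training* mean and std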

Working Through An Example

Let's say we have these raw values for GroupSize in the training set:

  • Raw values: [0, 0, 0, 0, 0, 1, 1, 1, 32, 64]

First, we calculate the mean:

  • Mean = (0+0+0+0+0+1+1+1+32+64)/10 = 9.9

Then we calculate the standard deviation:

  • Each deviation: [-9.9, -9.9, -9.9, -9.9, -9.9, -8.9, -8.9, -8.9, 22.1, 54.1]
  • Squared deviations: [98.01, 98.01, 98.01, 98.01, 98.01, 79.21, 79.21, 79.21, 488.41, 2926.81]
  • Average squared deviation: 4142.9/10 = 414.29
  • Standard deviation = √414.29 ≈ 20.35

Now, we can normalize each value:

  • For 0: (0 - 9.9) / 20.35 = -0.486
  • For 1: (1 - 9.9) / 20.35 = -0.437
  • For 32: (32 - 9.9) / 20.35 = 1.086
  • For 64: (64 - 9.9) / 20.35 = 2.658
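
You can verify the arithmetic above with a few lines of Python. Note that statistics.pstdev computes the population standard deviation (dividing by n), which matches the divide-by-10 step above:

import statistics

raw = [0, 0, 0, 0, 0, 1, 1, 1, 32, 64]

mean = sum(raw) / len(raw)        # 9.9
std = statistics.pstdev(raw)      # ~20.35

print([round((x - mean) / std, 3) for x in sorted(set(raw))])
# [-0.486, -0.437, 1.086, 2.658]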

Explaining the Min/Max Values

Going back to your data:

  • The min value for GroupSize (-0.045121) is what the smallest raw value in your dataset becomes after normalization
  • The max value (103.032967) is what the largest raw value becomes after normalization

For GroupSize, this extreme range suggests:

  1. Your raw data has a wide range of values
  2. The high maximum suggests outliers that are far from the mean, creating a highly skewed distribution
  3. The standard deviation is relatively small compared to the maximum value

Concrete Calculation

If we assume the mean of raw GroupSize is μ and the standard deviation is σ, then:

  • Minimum normalized value: (min_raw - μ) / σ = -0.045121
  • Maximum normalized value: (max_raw - μ) / σ = 103.032967

This tells us that your maximum raw value is over 103 standard deviations away from the mean, which is extremely far! This confirms that your raw data has a heavily skewed distribution with significant outliers.
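
To make the inversion concrete, here is a small sketch that recovers the raw extremes from the normalized ones. The mean and standard deviation below are hypothetical placeholders, not your actual training statistics:

# Hypothetical statistics -- replace with the mean/std from your training set
mu, sigma = 0.05, 1.1

min_norm, max_norm = -0.045121, 103.032967

# Invert normalized = (raw - mu) / sigma  =>  raw = mu + sigma * normalized
min_raw = mu + sigma * min_norm   # ~0.0004 with these assumed values
max_raw = mu + sigma * max_norm   # ~113.39 with these assumed values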

The fact that most normalized values for GroupSize are close to the minimum (-0.045121) suggests that the most common value is slightly below the mean, while a few extreme outliers are pulling the mean upward.

This type of skewed distribution is exactly why techniques like masking and autoencoder approaches are beneficial - they can help the model learn robust representations even with such extreme distributions.