4/17/2025

.screenrc setup for better usage

 

.screenrc

Open ~/.screenrc in your editor of choice (for example code ~/.screenrc, or use vim or nano) and add the following:

# Current settings for scrolling and encoding
termcapinfo xterm* ti@:te@
defutf8 on
term screen-256color
defscrollback 10000
encoding utf8

# Add these lines for better keyboard handling
bindkey -k ku stuff \033[A
bindkey -k kd stuff \033[B
bindkey -k kl stuff \033[D
bindkey -k kr stuff \033[C

# Allow alternate screen
altscreen on

# Set terminal to xterm-256color for better compatibility
terminfo xterm-256color hs@:cs=\E[%i%p1%d;%p2%dr:im=\E[4h:ei=\E[4l

# Make bash history work properly
shell -$SHELL

#study.marearts.com



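The file is read when a session starts, so detach from the current session and start a new one for the changes to take effect.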

Thank you.

4/10/2025

Mounting Remote Data for GPU Training

 

This guide explains how to access data from Computer A (data server) on Computer B (GPU machine) for machine learning training workflows.

Overview

When training machine learning models, you often need:

  1. A computer with GPU capabilities for training (Computer B)
  2. Access to training data stored on another machine (Computer A)

This tutorial will show you how to securely connect these computers using SSH, allowing the GPU machine to access data without copying everything locally.

Prerequisites

  • SSH access to both computers
  • Admin/sudo privileges on both machines
  • Basic knowledge of terminal commands

Step 1: Generate SSH Key on GPU Computer (if needed)

If you don't already have an SSH key on your GPU computer (Computer B):

# On Computer B (GPU)
ssh-keygen -t rsa -b 4096

Press Enter to accept default locations and add a passphrase if desired.

Step 2: Copy SSH Public Key to Data Computer

# On Computer B (GPU)
# View your public key
cat ~/.ssh/id_rsa.pub

# Copy the output to clipboard

Now transfer this key to Computer A (data server):

# Option 1: Using ssh-copy-id (easiest)
ssh-copy-id username@computerA

# Option 2: Manual setup
# First, SSH into Computer A
ssh username@computerA

# Then on Computer A, create .ssh directory if it doesn't exist
mkdir -p ~/.ssh
chmod 700 ~/.ssh

# Add your public key to authorized_keys
echo "ssh-rsa AAAA...your key here..." >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

# Exit back to Computer B
exit
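If key-based login still prompts for a password after the manual setup, wrong permissions on ~/.ssh or authorized_keys are a common cause. The following is a minimal sketch, not part of the original setup, that checks for the two modes set above (700 and 600); the function name check_ssh_permissions is hypothetical.

```python
import os
import stat


def check_ssh_permissions(ssh_dir):
    """Return a list of problems found in ssh_dir and its authorized_keys file."""
    problems = []
    # Modes taken from the chmod commands in the manual setup above.
    expected = {
        ssh_dir: 0o700,
        os.path.join(ssh_dir, "authorized_keys"): 0o600,
    }
    for path, want in expected.items():
        if not os.path.exists(path):
            problems.append(f"{path}: missing")
            continue
        mode = stat.S_IMODE(os.stat(path).st_mode)
        if mode != want:
            problems.append(f"{path}: mode {mode:o}, expected {want:o}")
    return problems


if __name__ == "__main__":
    # Run this on Computer A as the user you log in as.
    for problem in check_ssh_permissions(os.path.expanduser("~/.ssh")):
        print(problem)
```

An empty result means both paths exist with the expected modes.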

Step 3: Test the SSH Connection

Ensure you can connect without a password:

# On Computer B (GPU)
ssh username@computerA

If successful, you should connect without entering a password.

Step 4: Mount Remote Data using SSHFS

Install SSHFS on your GPU computer:

# On Computer B (GPU)
# For Ubuntu/Debian
sudo apt-get update
sudo apt-get install sshfs

# For CentOS/RHEL/Fedora
sudo dnf install fuse-sshfs

Create a mount point and mount the remote directory:

# On Computer B (GPU)
# Create mount directory
mkdir -p ~/data_mount

# Mount the remote directory
sshfs username@computerA:/path/to/data ~/data_mount

# Verify the mount worked
ls ~/data_mount

Step 5: Using the Mounted Data for Training

Now you can access the data in your training scripts as if it were local:

# Example PyTorch script
import os

import torch
from torch.utils.data import Dataset, DataLoader

# Point to your mounted data directory.
# Python does not expand "~" in paths, so expand it explicitly.
data_dir = os.path.expanduser("~/data_mount/dataset")

# Your training code...
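If the SSHFS mount has dropped, the mount directory still exists but is empty, and a training script may fail with a confusing "no files found" error. A small guard, sketched here under the assumption that the mount point is ~/data_mount as above (the helper name require_mount is hypothetical), can fail early instead:

```python
import os


def require_mount(path):
    """Expand "~" in path and raise if it is not an active mount point."""
    expanded = os.path.expanduser(path)
    if not os.path.ismount(expanded):
        raise RuntimeError(f"{expanded} is not mounted; run sshfs first")
    return expanded
```

A training script could then start with data_dir = os.path.join(require_mount("~/data_mount"), "dataset").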

Additional Options

Automating Mount on Startup

To automatically mount the remote directory when your GPU computer starts:

  1. Edit your fstab file:

    sudo nano /etc/fstab
    
  2. Add this line (all on one line):

    username@computerA:/path/to/data /home/username/data_mount fuse.sshfs defaults,_netdev,user,idmap=user,follow_symlinks,identityfile=/home/username/.ssh/id_rsa,allow_other,reconnect 0 0
    
  3. Save and exit, then run sudo mount -a to test the entry without rebooting

Unmounting

To unmount the remote directory:

# On Computer B (GPU)
fusermount -u ~/data_mount

Performance Considerations

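  • For better performance with large datasets, try these SSHFS options (option names differ between SSHFS versions; newer 3.x releases drop big_writes and use dir_cache=yes in place of cache=yes, so check man sshfs on your system):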

    sshfs username@computerA:/path/to/data ~/data_mount -o Compression=no,big_writes,cache=yes,kernel_cache
    
  • If you experience frequent disconnections, add reconnect options:

    sshfs username@computerA:/path/to/data ~/data_mount -o reconnect,ServerAliveInterval=15,ServerAliveCountMax=3
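When the mount is started from a training launcher rather than by hand, it can help to build the command from a list of options so that they are easy to toggle. This is only a sketch: build_sshfs_command is a hypothetical helper, and the remote, mount point, and options are the placeholder values used in this guide.

```python
import shlex


def build_sshfs_command(remote, mount_point, options):
    """Return the sshfs argument list for the given remote and -o options."""
    cmd = ["sshfs", remote, mount_point]
    if options:
        cmd += ["-o", ",".join(options)]
    return cmd


cmd = build_sshfs_command(
    "username@computerA:/path/to/data",
    "/home/username/data_mount",
    ["reconnect", "ServerAliveInterval=15", "ServerAliveCountMax=3"],
)
# Print the command for inspection; pass cmd to subprocess.run to execute it.
print(shlex.join(cmd))
```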
    

Alternative: NFS for Better Performance

For production setups with large datasets, consider using NFS instead of SSHFS for better performance.

Troubleshooting

Connection Issues

  • Verify SSH keys are correctly set up
  • Check firewall settings on both computers
  • Ensure the SSH service is running: sudo systemctl status sshd (the service is named ssh on Debian/Ubuntu)

Permission Problems

  • Check file permissions on the data directory
  • Verify your user has read access to the data files

Mount Errors

  • Make sure FUSE is installed and configured properly
  • Check if the mount point directory exists and is empty

Security Considerations

  • Use key-based authentication only (disable password login)
  • Consider restricting SSH access by IP address
  • Use a non-standard SSH port for additional security

For any issues or questions, please contact your system administrator.

4/02/2025

Explanation for data Normalization and Min/Max calculation.

Let me explain how a specific normalized feature value is calculated using one concrete example.

Let's take the feature "GroupSize" which has:

  • Min value: -0.045121
  • Max value: 103.032967

These values are post-normalization, but we can work backwards to understand how they were calculated.

The Normalization Formula

The normalization function you're using is:

normalized_features = (features - mean) / std

Where:

  • features are the original, raw values
  • mean is the average of all values for that feature in the training set
  • std is the standard deviation of all values for that feature in the training set

Working Through An Example

Let's say we have these raw values for GroupSize in the training set:

  • Raw values: [0, 0, 0, 0, 0, 1, 1, 1, 32, 64]

First, we calculate the mean:

  • Mean = (0+0+0+0+0+1+1+1+32+64)/10 = 9.9

Then we calculate the standard deviation:

  • Each deviation: [-9.9, -9.9, -9.9, -9.9, -9.9, -8.9, -8.9, -8.9, 22.1, 54.1]
  • Squared deviations: [98.01, 98.01, 98.01, 98.01, 98.01, 79.21, 79.21, 79.21, 488.41, 2926.81]
  • Average squared deviation: 4142.9/10 = 414.29
  • Standard deviation = √414.29 ≈ 20.35

Now, we can normalize each value:

  • For 0: (0 - 9.9) / 20.35 ≈ -0.486
  • For 1: (1 - 9.9) / 20.35 ≈ -0.437
  • For 32: (32 - 9.9) / 20.35 ≈ 1.086
  • For 64: (64 - 9.9) / 20.35 ≈ 2.658
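The same calculation can be reproduced in a few lines of Python. This sketch uses the population standard deviation (dividing by N, matching the division by 10 above); libraries that default to the sample standard deviation (dividing by N - 1) will give slightly different numbers. The function name zscore is illustrative.

```python
import statistics

raw = [0, 0, 0, 0, 0, 1, 1, 1, 32, 64]


def zscore(values):
    """Return (mean, std, normalized values) using the population std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return mean, std, [(v - mean) / std for v in values]


mean, std, normalized = zscore(raw)
print(f"mean={mean:.2f} std={std:.2f}")
print([round(z, 3) for z in normalized])
```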

Explaining the Min/Max Values

Going back to your data:

  • The min value for GroupSize (-0.045121) represents the original minimum value in your dataset after normalization
  • The max value (103.032967) represents the original maximum value after normalization

For GroupSize, this extreme range suggests:

  1. Your raw data has a wide range of values
  2. The high maximum suggests outliers that are far from the mean, creating a highly skewed distribution
  3. The standard deviation is relatively small compared to the maximum value

Concrete Calculation

If we assume the mean of raw GroupSize is μ and standard deviation is σ, then:

  • Minimum normalized value: (min_raw - μ) / σ = -0.045121
  • Maximum normalized value: (max_raw - μ) / σ = 103.032967
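These are two equations in two unknowns, so if the raw minimum and maximum are known, the mean and standard deviation can be recovered. The sketch below assumes a purely hypothetical raw range of 0 to 1024 for illustration; only the two normalized endpoints come from the data above.

```python
def recover_mean_std(min_raw, max_raw, min_norm, max_norm):
    """Solve (x - mean) / std = z at both endpoints for mean and std."""
    std = (max_raw - min_raw) / (max_norm - min_norm)
    mean = min_raw - min_norm * std
    return mean, std


# Hypothetical raw range (0 to 1024); normalized endpoints as quoted above.
mean, std = recover_mean_std(0.0, 1024.0, -0.045121, 103.032967)
print(f"mean={mean:.4f} std={std:.4f}")
```

Substituting your real raw minimum and maximum gives the statistics that were used during normalization.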

This tells us that your maximum raw value is over 103 standard deviations away from the mean, which is extremely far! This confirms that your raw data has a heavily skewed distribution with significant outliers.

The fact that most normalized values for GroupSize are close to the minimum (-0.045121) suggests that the most common value is slightly below the mean, while a few extreme outliers are pulling the mean upward.

This type of skewed distribution is exactly why techniques like masking and autoencoder approaches are beneficial - they can help the model learn robust representations even with such extreme distributions.

3/24/2025

Setup Online annotation with LabelMe

Complete Tutorial: LabelMe with Login System and Dataset Management

This guide provides step-by-step instructions for setting up LabelMe with a custom login system and proper dataset management. We'll cover the entire workflow: login → annotation → save → logout.

System Overview

We'll build a system with the following components:

  1. LabelMe running in a Docker container with a configured dataset folder
  2. PHP-based authentication system
  3. Web server (Apache) to host the login portal
  4. Dataset management structure

Prerequisites

  • A Linux server (Ubuntu 20.04 LTS or newer recommended)
  • Root or sudo access to the server
  • Docker and Docker Compose installed
  • Apache web server with PHP support

Part 1: Server Preparation

Step 1: Update System and Install Required Packages

# Update package lists
sudo apt update
sudo apt upgrade -y

# Install necessary packages
sudo apt install -y docker.io docker-compose apache2 php libapache2-mod-php php-json
sudo systemctl enable docker
sudo systemctl start docker

# Add your user to docker group to avoid using sudo with docker commands
sudo usermod -aG docker $USER
# Log out and log back in for this to take effect

Step 2: Create Project Directory Structure

# Create main project directory
mkdir -p ~/labelme-project
cd ~/labelme-project

# Create directories for different components
mkdir -p docker-labelme
mkdir -p web-portal
mkdir -p datasets/{project1,project2}
mkdir -p annotations

# Add some sample images to project1 (optional)
# You can replace this with your own dataset copying commands
mkdir -p datasets/project1/images
# Copy some sample images if you have them
# cp /path/to/your/images/*.jpg datasets/project1/images/

Part 2: Set Up LabelMe Docker Container

Step 1: Create Docker Compose Configuration

Create a file docker-labelme/docker-compose.yml:

cd ~/labelme-project/docker-labelme
nano docker-compose.yml

Add the following content:

version: '3'
services:
  labelme:
    image: wkentaro/labelme
    container_name: labelme-server
    ports:
      - "8080:8080"
    volumes:
      - ../datasets:/data
      - ../annotations/.labelmerc:/home/developer/.labelmerc
    environment:
      - LABELME_SERVER=1
      - LABELME_PORT=8080
      - LABELME_HOST=0.0.0.0
    command: labelme --server --port 8080 --host 0.0.0.0 /data
    restart: unless-stopped

Step 2: Create LabelMe Configuration File

This step ensures annotations are saved in the proper format and location:

cd ~/labelme-project
nano annotations/.labelmerc

Add the following content:

{
  "auto_save": true,
  "display_label_popup": true,
  "store_data": true,
  "keep_prev": false,
  "flags": null,
  "flags_2": null,
  "flags_3": null,
  "label_flags": null,
  "labels": ["person", "car", "bicycle", "dog", "cat", "tree", "building"],
  "file_search": true,
  "show_label_text": true
}

Customize the labels list according to your annotation needs.

Step 3: Start LabelMe Container

cd ~/labelme-project/docker-labelme
docker-compose up -d

Verify it's running:

docker ps

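You should see the labelme-server container running and listening on port 8080. If it exits immediately, inspect docker logs labelme-server: upstream labelme is primarily a desktop (Qt) application, so confirm with labelme --help inside your image that the --server, --port, and --host flags used above are supported.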

Part 3: Set Up Web Portal with Login System

Step 1: Create the Login Page

cd ~/labelme-project/web-portal
nano index.php

Add the following content:

<?php
// Display errors during development (remove in production)
ini_set('display_errors', 1);
error_reporting(E_ALL);

// Start session
session_start();

// Check if there's an error message
$error_message = isset($_SESSION['error_message']) ? $_SESSION['error_message'] : '';
// Clear error message after displaying it
unset($_SESSION['error_message']);
?>

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>LabelMe Login</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            background-color: #f4f4f4;
            margin: 0;
            padding: 0;
            display: flex;
            justify-content: center;
            align-items: center;
            height: 100vh;
        }
        .login-container {
            background-color: white;
            padding: 30px;
            border-radius: 8px;
            box-shadow: 0 2px 10px rgba(0, 0, 0, 0.1);
            width: 350px;
        }
        h2 {
            text-align: center;
            color: #333;
            margin-bottom: 20px;
        }
        input[type="text"],
        input[type="password"] {
            width: 100%;
            padding: 12px;
            margin: 8px 0;
            display: inline-block;
            border: 1px solid #ccc;
            box-sizing: border-box;
            border-radius: 4px;
        }
        button {
            background-color: #4CAF50;
            color: white;
            padding: 14px 20px;
            margin: 10px 0;
            border: none;
            cursor: pointer;
            width: 100%;
            border-radius: 4px;
            font-size: 16px;
        }
        button:hover {
            opacity: 0.8;
        }
        .error-message {
            color: #f44336;
            text-align: center;
            margin-top: 10px;
        }
        .logo {
            text-align: center;
            margin-bottom: 20px;
        }
    </style>
</head>
<body>
    <div class="login-container">
        <div class="logo">
            <h2>LabelMe Annotation</h2>
        </div>
        <form id="loginForm" action="auth.php" method="post">
            <div>
                <label for="username"><b>Username</b></label>
                <input type="text" id="username" placeholder="Enter Username" name="username" required>
            </div>
            <div>
                <label for="password"><b>Password</b></label>
                <input type="password" id="password" placeholder="Enter Password" name="password" required>
            </div>
            <div>
                <label for="project"><b>Select Project</b></label>
                <select id="project" name="project" style="width: 100%; padding: 12px; margin: 8px 0; display: inline-block; border: 1px solid #ccc; box-sizing: border-box; border-radius: 4px;">
                    <option value="project1">Project 1</option>
                    <option value="project2">Project 2</option>
                </select>
            </div>
            <button type="submit">Login</button>
            <?php if (!empty($error_message)): ?>
                <div class="error-message"><?php echo htmlspecialchars($error_message); ?></div>
            <?php endif; ?>
        </form>
    </div>
</body>
</html>

Step 2: Create the Authentication Script

cd ~/labelme-project/web-portal
nano auth.php

Add the following content:

<?php
// Start session management
session_start();

// Display errors during development (remove in production)
ini_set('display_errors', 1);
error_reporting(E_ALL);

// Configuration - Store these securely in production
$users = [
    'admin' => [
        'password' => password_hash('admin123', PASSWORD_DEFAULT), // Use hashed passwords
        'role' => 'admin'
    ],
    'user1' => [
        'password' => password_hash('user123', PASSWORD_DEFAULT),
        'role' => 'annotator'
    ],
    'user2' => [
        'password' => password_hash('user456', PASSWORD_DEFAULT),
        'role' => 'annotator'
    ]
];

// Base path to the LabelMe application
$labelme_base_url = 'http://localhost:8080'; // Change this to your LabelMe server address

// Handle login form submission
if ($_SERVER['REQUEST_METHOD'] === 'POST') {
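    $username = isset($_POST['username']) ? $_POST['username'] : '';
    $password = isset($_POST['password']) ? $_POST['password'] : '';
    // Accept only known project names, since this value is later used to build a URL
    $allowed_projects = ['project1', 'project2'];
    $project = isset($_POST['project']) ? $_POST['project'] : 'project1';
    if (!in_array($project, $allowed_projects, true)) {
        $project = 'project1';
    }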
    
    // Validate credentials
    if (isset($users[$username]) && password_verify($password, $users[$username]['password'])) {
        // Regenerate the session ID on login to prevent session fixation
        session_regenerate_id(true);

        // Set session variables
        $_SESSION['logged_in'] = true;
        $_SESSION['username'] = $username;
        $_SESSION['role'] = $users[$username]['role'];
        $_SESSION['project'] = $project;
        $_SESSION['last_activity'] = time();
        
        // Redirect to LabelMe
        header("Location: labelme.php");
        exit;
    } else {
        // Failed login
        $_SESSION['error_message'] = "Invalid username or password";
        header("Location: index.php");
        exit;
    }
}

// For logout
if (isset($_GET['logout'])) {
    // Log this logout
    $log_file = 'user_activity.log';
    $log_message = date('Y-m-d H:i:s') . " - User: " . ($_SESSION['username'] ?? 'unknown') . 
                " - Action: Logged out\n";
    file_put_contents($log_file, $log_message, FILE_APPEND);
    
    // Clear session data
    session_unset();
    session_destroy();
    
    // Redirect to login page
    header("Location: index.php");
    exit;
}
?>

Step 3: Create the LabelMe Proxy Page

cd ~/labelme-project/web-portal
nano labelme.php

Add the following content:

<?php
// Start session management
session_start();

// Display errors during development (remove in production)
ini_set('display_errors', 1);
error_reporting(E_ALL);

// Check if user is logged in
if (!isset($_SESSION['logged_in']) || $_SESSION['logged_in'] !== true) {
    // Not logged in, redirect to login page
    header("Location: index.php");
    exit;
}

// Security: Check for session timeout (30 minutes)
$timeout = 30 * 60; // 30 minutes in seconds
if (isset($_SESSION['last_activity']) && (time() - $_SESSION['last_activity'] > $timeout)) {
    // Session has expired
    session_unset();
    session_destroy();
    header("Location: index.php?timeout=1");
    exit;
}

// Update last activity time
$_SESSION['last_activity'] = time();

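// Configuration
$labelme_base_url = 'http://localhost:8080'; // Change this to your LabelMe server address
$allowed_projects = ['project1', 'project2'];
$log_file = 'user_activity.log';

// Handle project switching before any HTML is sent: header() cannot redirect
// once output has started. This branch exits, so the project-switching block
// at the end of this file is never reached.
if ($_SERVER['REQUEST_METHOD'] === 'POST' && isset($_POST['project'])) {
    if (in_array($_POST['project'], $allowed_projects, true)) {
        $_SESSION['project'] = $_POST['project'];
        $log_message = date('Y-m-d H:i:s') . " - User: " . $_SESSION['username'] .
                       " - Action: Switched to project " . $_POST['project'] . "\n";
        file_put_contents($log_file, $log_message, FILE_APPEND);
    }
    header("Location: labelme.php");
    exit;
}

$project = $_SESSION['project'] ?? 'project1';
$labelme_url = $labelme_base_url . '/' . $project;

// Log user activity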
$log_message = date('Y-m-d H:i:s') . " - User: " . $_SESSION['username'] . 
               " - Role: " . $_SESSION['role'] . 
               " - Project: " . $project . 
               " - Action: Accessed LabelMe\n";
file_put_contents($log_file, $log_message, FILE_APPEND);
?>

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>LabelMe Annotation Tool</title>
    <style>
        body, html {
            margin: 0;
            padding: 0;
            height: 100%;
            overflow: hidden;
        }
        .header {
            background-color: #333;
            color: white;
            padding: 10px;
            display: flex;
            justify-content: space-between;
            align-items: center;
        }
        .user-info {
            font-size: 14px;
        }
        .logout-btn {
            background-color: #f44336;
            color: white;
            border: none;
            padding: 5px 10px;
            cursor: pointer;
            border-radius: 3px;
            text-decoration: none;
            margin-left: 10px;
        }
        .logout-btn:hover {
            background-color: #d32f2f;
        }
        .project-selector {
            margin-left: 20px;
        }
        iframe {
            width: 100%;
            height: calc(100% - 50px);
            border: none;
        }
    </style>
</head>
<body>
    <div class="header">
        <div>
            <h3 style="margin:0;">LabelMe Annotation Tool</h3>
            <span>Project: <strong><?php echo htmlspecialchars($project); ?></strong></span>
        </div>
        <div class="user-info">
            Logged in as: <strong><?php echo htmlspecialchars($_SESSION['username']); ?></strong> 
            (<?php echo htmlspecialchars($_SESSION['role']); ?>)
            
            <form method="post" action="" style="display:inline-block">
                <select name="project" class="project-selector" onchange="this.form.submit()">
                    <option value="project1" <?php echo $project == 'project1' ? 'selected' : ''; ?>>Project 1</option>
                    <option value="project2" <?php echo $project == 'project2' ? 'selected' : ''; ?>>Project 2</option>
                </select>
            </form>
            
            <a href="auth.php?logout=1" class="logout-btn">Logout</a>
        </div>
    </div>
    
    <iframe src="<?php echo htmlspecialchars($labelme_url); ?>" allow="fullscreen"></iframe>
</body>
</html>

<?php
// Handle project switching
if ($_SERVER['REQUEST_METHOD'] === 'POST' && isset($_POST['project'])) {
    $newProject = $_POST['project'];
    $_SESSION['project'] = $newProject;
    
    // Log project switch
    $log_message = date('Y-m-d H:i:s') . " - User: " . $_SESSION['username'] . 
                  " - Action: Switched to project " . $newProject . "\n";
    file_put_contents($log_file, $log_message, FILE_APPEND);
    
    // Redirect to refresh the page with new project
    header("Location: labelme.php");
    exit;
}
?>
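The entries appended to user_activity.log by auth.php and labelme.php share a simple "timestamp - Key: value - ..." layout, so they are easy to summarize later. The following is a rough sketch of such a summary; the function names and the sample lines are made up for illustration, and it assumes usernames never contain " - ".

```python
from collections import Counter


def parse_log_line(line):
    """Split one 'timestamp - Key: value - ...' line into a dict."""
    timestamp, *fields = line.strip().split(" - ")
    entry = {"timestamp": timestamp}
    for field in fields:
        key, _, value = field.partition(": ")
        entry[key] = value
    return entry


def entries_per_user(lines):
    """Count log entries per user, skipping blank lines."""
    return Counter(
        parse_log_line(line).get("User", "unknown")
        for line in lines
        if line.strip()
    )


# Made-up sample lines in the format written by the PHP scripts.
sample = [
    "2025-03-24 10:00:00 - User: admin - Role: admin - Project: project1 - Action: Accessed LabelMe",
    "2025-03-24 10:05:00 - User: admin - Action: Switched to project project2",
    "2025-03-24 10:30:00 - User: user1 - Action: Logged out",
]
print(entries_per_user(sample))
```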

Step 4: Setup Apache Virtual Host

sudo nano /etc/apache2/sites-available/labelme-portal.conf

Add the following configuration:

<VirtualHost *:80>
    # Change this to your domain or IP
    ServerName labelme.yourdomain.com
    # Update the two paths below with your actual path.
    # Apache does not allow comments on the same line as a directive.
    DocumentRoot /home/username/labelme-project/web-portal

    <Directory /home/username/labelme-project/web-portal>
        Options Indexes FollowSymLinks
        AllowOverride All
        Require all granted
    </Directory>
    
    ErrorLog ${APACHE_LOG_DIR}/labelme-error.log
    CustomLog ${APACHE_LOG_DIR}/labelme-access.log combined
</VirtualHost>

Update the paths to match your actual user and directory structure.

Step 5: Enable the Site and Restart Apache

sudo a2ensite labelme-portal.conf
sudo systemctl restart apache2

Step 6: Set Proper Permissions

# Set appropriate permissions for the web files
cd ~/labelme-project
sudo chown -R www-data:www-data web-portal
sudo chmod -R 755 web-portal

# Ensure the annotation directory is writable
sudo chown -R www-data:www-data annotations
sudo chmod -R 777 annotations

# Ensure datasets are accessible
sudo chmod -R 755 datasets

Part 4: Dataset Management

Step 1: Organize Your Datasets

Structure your dataset directories as follows:

datasets/
├── project1/
│   ├── images/
│   │   ├── image1.jpg
│   │   ├── image2.jpg
│   │   └── ...
│   └── annotations/  # LabelMe will save annotations here
├── project2/
│   ├── images/
│   │   ├── image1.jpg
│   │   └── ...
│   └── annotations/
└── ...
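With this layout, annotation progress per project can be estimated by comparing the number of images with the number of annotation files. The sketch below assumes annotations are stored as .json files in each project's annotations/ folder, as in the tree above; the function name and the list of image suffixes are assumptions.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def annotation_progress(datasets_root):
    """Return {project: (image count, annotation .json count)}."""
    progress = {}
    for project in sorted(Path(datasets_root).iterdir()):
        if not project.is_dir():
            continue
        images = [
            p for p in (project / "images").glob("*")
            if p.suffix.lower() in IMAGE_SUFFIXES
        ]
        annotations = list((project / "annotations").glob("*.json"))
        progress[project.name] = (len(images), len(annotations))
    return progress
```

For example, annotation_progress(Path.home() / "labelme-project" / "datasets") would report the counts for project1 and project2.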

Step 2: Add Scripts for Managing Datasets (Optional)

Create a script to add new projects:

cd ~/labelme-project
nano add-project.sh

Add the following content:

#!/bin/bash
# Script to add a new project to the LabelMe setup

# Check if a project name was provided
if [ -z "$1" ]; then
    echo "Usage: $0 <project_name>"
    exit 1
fi

PROJECT_NAME="$1"
PROJECT_DIR="$HOME/labelme-project/datasets/$PROJECT_NAME"

# Create project directory structure
mkdir -p "$PROJECT_DIR/images"
mkdir -p "$PROJECT_DIR/annotations"

# Set permissions
chmod -R 755 "$PROJECT_DIR"

# Update the web portal to include the new project
# (This is a simplified approach - you'll need to manually edit index.php and labelme.php)
echo "Project directory created at: $PROJECT_DIR"
echo "Now copy your images to: $PROJECT_DIR/images/"
echo "Remember to manually update index.php and labelme.php to include the new project"

Make the script executable:

chmod +x add-project.sh

Part 5: Testing the Complete System

Step 1: Access the Web Portal

Open your browser and navigate to:

  • http://your-server-ip/ or http://labelme.yourdomain.com/

Step 2: Login and Test the Workflow

  1. Log in with the credentials (e.g., username: admin, password: admin123)
  2. Select a project from the dropdown
  3. After login, you should see the LabelMe interface embedded in the page
  4. Test annotating an image:
    • Click on an image
    • Draw polygons/shapes around objects
    • Enter labels for the objects
    • Annotations are auto-saved to the corresponding project folder
  5. Try switching projects using the dropdown in the header
  6. Log out and verify you're redirected to the login page

Part 6: Security Enhancements (for Production)

Enable HTTPS

sudo apt install certbot python3-certbot-apache
sudo certbot --apache -d labelme.yourdomain.com

Improve Password Security

Edit the auth.php file to use a database instead of hardcoded users.

Troubleshooting

LabelMe Not Loading

If LabelMe doesn't load in the iframe:

  1. Check if LabelMe is running: docker ps
  2. Make sure port 8080 is accessible
  3. Check the Docker container logs: docker logs labelme-server

Permission Issues

If you encounter permission issues with annotations:

sudo chmod -R 777 ~/labelme-project/annotations
sudo chown -R www-data:www-data ~/labelme-project/datasets

Annotations Not Saving

If annotations aren't saving properly:

  1. Check the .labelmerc configuration file
  2. Verify the permissions on the annotations directory
  3. Check for error messages in the Apache log configured for this site: sudo tail -f /var/log/apache2/labelme-error.log

Conclusion

You now have a complete LabelMe annotation system with:

  • Secure login/authentication
  • Project selection
  • Dataset organization
  • User activity logging
  • Session management

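This setup allows your team to collaborate on annotation projects while controlling who can reach the portal and which project they work on. Note that the iframe loads LabelMe directly from port 8080, so the login page protects only the portal itself; to enforce the login, serve LabelMe through a reverse proxy that checks the session instead of exposing the port directly.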

3/18/2025

Download YouTube Video Python code

import argparse
import os
import platform
from typing import Optional

import yt_dlp


def format_size(num_bytes):
    """Convert bytes to human readable format"""
    for unit in ['B', 'KB', 'MB', 'GB']:
        if num_bytes < 1024:
            return f"{num_bytes:.2f} {unit}"
        num_bytes /= 1024
    return f"{num_bytes:.2f} TB"


def get_browser_cookie_path():
    """Get the default browser to read cookies from, based on the operating system"""
    system = platform.system()
    if system == "Windows":
        return "chrome"
    elif system == "Darwin":  # macOS
        return "safari"
    else:  # Linux and others
        return "chrome"


def show_progress(d):
    """Progress hook: print download progress on a single line."""
    if d['status'] == 'downloading':
        # These keys are not always present, so fall back to placeholders
        percent = d.get('_percent_str', '?')
        total = d.get('_total_bytes_str', 'unknown')
        print(f"\rDownloading: {percent} of {total}", end="")


def show_postprocessing(d):
    """Postprocessor hook: announce when a post-processing step starts."""
    if d['status'] == 'started':
        print("\nMerging video and audio...")


def download_video(url: str, output_path: Optional[str] = None, use_cookies: bool = True, browser: Optional[str] = None) -> str:
    """
    Download a YouTube video in the best quality using yt-dlp.

    Args:
        url (str): The URL of the YouTube video
        output_path (str, optional): Directory to save the video
        use_cookies (bool): Whether to use browser cookies for authentication
        browser (str, optional): Browser to extract cookies from (chrome, firefox, safari, etc.)

    Returns:
        str: Path of the downloaded file, or "" if the download failed
    """
    try:
        if not output_path:
            output_path = os.getcwd()
        os.makedirs(output_path, exist_ok=True)

        # Configure yt-dlp options for best quality
        ydl_opts = {
            'format': 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best',  # Best video + audio quality
            'outtmpl': os.path.join(output_path, '%(title)s.%(ext)s'),
            'merge_output_format': 'mp4',  # Merge to MP4
            'progress_hooks': [show_progress],
            'postprocessor_hooks': [show_postprocessing],
            'quiet': False,
            'no_warnings': False,
            # Additional options for best quality
            'format_sort': ['res:2160', 'res:1440', 'res:1080', 'res:720'],
            'video_multistreams': True,
            'audio_multistreams': True,
            'prefer_free_formats': True,
            'postprocessors': [{
                'key': 'FFmpegVideoConvertor',
                'preferedformat': 'mp4',
            }],
        }

        # Add cookie authentication if enabled
        if use_cookies:
            if not browser:
                browser = get_browser_cookie_path()
            ydl_opts['cookiesfrombrowser'] = (browser,)
            print(f"Using cookies from {browser} for authentication...")

        print("Fetching video information...")

        # Create yt-dlp object and download the video
        with yt_dlp.YoutubeDL(ydl_opts) as ydl:
            # Get video info first
            info = ydl.extract_info(url, download=False)
            video_title = info.get('title', 'video')
            duration = info.get('duration')
            formats = info.get('formats', [])

            # Find best quality format; height and filesize may be None
            best_video = max(
                (f for f in formats if f.get('vcodec') != 'none'),
                key=lambda f: (
                    f.get('height') or 0,
                    f.get('filesize') or 0
                ),
                default=None
            )

            # Print video details
            print("\nVideo details:")
            print(f"Title: {video_title}")
            if duration:
                minutes, seconds = divmod(int(duration), 60)
                print(f"Duration: {minutes}:{seconds:02d}")
            if best_video:
                print(f"Best quality available: {best_video.get('height', 'N/A')}p")
                if best_video.get('filesize'):
                    print(f"Approximate size: {format_size(best_video['filesize'])}")

            print("\nStarting download in best quality...")

            # Download the video
            ydl.download([url])

            # Get the output filename (yt-dlp sanitizes the title; the result is converted to MP4)
            output_file = os.path.splitext(ydl.prepare_filename(info))[0] + ".mp4"
            print("\nDownload completed successfully!")
            print(f"Saved to: {output_file}")
            return output_file

    except Exception as e:
        print(f"\nError: {str(e)}")
        print("\nTroubleshooting steps:")
        print("1. Check if the video URL is correct")
        print("2. Check your internet connection")
        print("3. Make sure yt-dlp is up to date: pip install -U yt-dlp")
        print("4. Install or update ffmpeg (required for best quality):")
        print(" - On macOS: brew install ffmpeg")
        print(" - On Ubuntu/Debian: sudo apt-get install ffmpeg")
        print(" - On Windows: download from https://ffmpeg.org/download.html")
        print("5. For private videos, make sure:")
        print(" - You're logged into YouTube in your browser")
        print(" - You have access to the private video")
        print(" - The selected browser contains your YouTube login cookies")
        return ""


def main():
    """
    Main function to handle user input for video download.
    """
    print("YouTube Video Downloader (Best Quality)")
    print("-------------------------------------")
    print("This will download videos in the highest available quality")
    print("Note: Higher quality downloads may take longer and use more disk space")

    # Parse command line arguments
    parser = argparse.ArgumentParser(description='Download YouTube videos in best quality')
    parser.add_argument('--url', '-u', help='YouTube video URL to download')
    parser.add_argument('--output', '-o', help='Output directory')
    parser.add_argument('--no-cookies', action='store_true', help='Disable browser cookie authentication')
    parser.add_argument('--browser', '-b', choices=['chrome', 'firefox', 'safari', 'edge', 'opera'],
                        help='Browser to extract cookies from')
    args = parser.parse_args()

    if args.url:
        # Run in command line mode
        download_video(args.url,
                       output_path=args.output,
                       use_cookies=not args.no_cookies,
                       browser=args.browser)
        return

    # Run in interactive mode
    while True:
        url = input("\nEnter the YouTube video URL (or 'q' to quit): ").strip()
        if url.lower() == 'q':
            print("Goodbye!")
            break
        if not url:
            print("Please enter a valid URL")
            continue

        use_cookies = True
        browser_choice = None
        auth_choice = input("Do you need to access a private video? (y/n): ").strip().lower()
        if auth_choice == 'y':
            print("\nSelect your browser for authentication:")
            print("1. Chrome (default)")
            print("2. Firefox")
            print("3. Safari")
            print("4. Edge")
            print("5. Opera")
            print("6. None (no authentication)")
            browser_num = input("Enter your choice (1-6): ").strip()
            if browser_num == '6':
                use_cookies = False
            else:
                browsers = {
                    '1': 'chrome',
                    '2': 'firefox',
                    '3': 'safari',
                    '4': 'edge',
                    '5': 'opera'
                }
                browser_choice = browsers.get(browser_num, 'chrome')

        output_dir = input("Enter output directory (press Enter for current directory): ").strip()
        if not output_dir:
            output_dir = None

        download_video(url, output_path=output_dir, use_cookies=use_cookies, browser=browser_choice)

        choice = input("\nWould you like to download another video? (y/n): ").strip().lower()
        if choice != 'y':
            print("Goodbye! -marearts.com-")
            break


if __name__ == "__main__":
    main()
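The "best quality" lookup in download_video ranks the available formats by height and then by file size. The sketch below isolates that step so it can be tried without a network connection; the sample format dictionaries are invented for illustration and contain only the keys the ranking reads.

```python
def pick_best_video(formats):
    """Return the video format with the greatest (height, filesize), or None."""
    candidates = [f for f in formats if f.get("vcodec") != "none"]
    return max(
        candidates,
        # Missing or None values are treated as 0 so the comparison never fails.
        key=lambda f: (f.get("height") or 0, f.get("filesize") or 0),
        default=None,
    )


# Invented sample entries: one audio-only stream and two video streams.
sample_formats = [
    {"format_id": "audio", "vcodec": "none", "height": None, "filesize": 3_000_000},
    {"format_id": "720p", "vcodec": "avc1", "height": 720, "filesize": None},
    {"format_id": "1080p", "vcodec": "avc1", "height": 1080, "filesize": 50_000_000},
]
print(pick_best_video(sample_formats)["format_id"])
```

Audio-only entries are skipped because their vcodec is "none", so the 1080p entry is selected here.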


Thank you.


What is LightGBM?

 LightGBM (Light Gradient Boosting Machine) is a gradient boosting framework developed by Microsoft that uses tree-based learning algorithms. It's designed to be efficient, fast, and capable of handling large-scale data with high dimensionality.

Key features of LightGBM that make it powerful:

  1. Leaf-wise Tree Growth: Unlike algorithms that grow trees level by level, LightGBM grows trees leaf-wise, always splitting the leaf that yields the largest reduction in loss. For the same number of leaves this usually reaches a lower loss than level-wise growth, but it can produce deep, unbalanced trees that overfit small datasets, so num_leaves and max_depth should be limited.

  2. Gradient-based One-Side Sampling (GOSS): This technique keeps all instances with large gradients (those that are still poorly fit) and randomly samples from the instances with small gradients, up-weighting the sampled ones to compensate. This lets LightGBM spend computation on the more informative examples with little loss of accuracy.

  3. Exclusive Feature Bundling (EFB): For sparse datasets, many features are mutually exclusive (never take non-zero values simultaneously). LightGBM bundles these features together, treating them as a single feature. This reduces memory usage and speeds up training.

  4. Gradient Boosting Framework: Like other boosting algorithms, LightGBM builds trees sequentially, with each new tree correcting the errors of the existing ensemble.
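To make the GOSS idea in item 2 concrete, here is a simplified sketch of the sampling step in plain Python. It is not LightGBM's implementation: the function name goss_sample and its defaults are chosen for illustration. It keeps the top fraction of instances by absolute gradient, samples a fraction of the rest, and up-weights the sampled ones by (1 - top_rate) / other_rate.

```python
import random


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return {index: weight} for the instances kept by a GOSS-style sample."""
    n = len(gradients)
    # Rank instances from largest to smallest absolute gradient.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top_k = round(n * top_rate)
    rand_k = round(n * other_rate)
    top, rest = order[:top_k], order[top_k:]
    rng = random.Random(seed)
    sampled = rng.sample(rest, min(rand_k, len(rest)))
    # Up-weight the sampled small-gradient instances so that they stand in
    # for the whole small-gradient set.
    amplify = (1.0 - top_rate) / other_rate
    weights = {i: 1.0 for i in top}
    weights.update({i: amplify for i in sampled})
    return weights
```

With 100 instances and the defaults above, 30 instances are kept: the 20 with the largest gradients at weight 1 and 10 randomly chosen others at weight 8.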

LightGBM is particularly well-suited for your solver selection task because:

  • It handles categorical features natively
  • It works well with the moderate dataset size you have
  • It can create complex decision boundaries needed for multi-class classification
  • It's faster than traditional gradient boosting frameworks, allowing you to train with more boosting rounds

When properly tuned, LightGBM can often achieve better performance than neural networks for tabular data, especially with the right hyperparameters and sufficient boosting rounds.