12/31/2024
Person Detection dataset
hipBLAS, cuBLAS algorithm
Insight about "fsdp_config.activation_checkpointing" option
The `fsdp_config.activation_checkpointing` option does come with a computational overhead, since it recomputes activations during the backward pass, but it's generally the most efficient option for large models like LLaMA 70B, for several reasons:
1. Alternative memory saving options usually have bigger throughput impacts:
- `activation_cpu_offload=true`: Much slower due to CPU-GPU data transfer
- `fp32_cpu_offload=true`: Also involves slow CPU-GPU transfers
- Reducing batch size: Directly reduces throughput
- Reducing model size: Changes model behavior
2. Activation checkpointing's overhead is typically around 20-30% compute time, but:
- Only affects backward pass
- Modern GPUs often have compute headroom
- Memory savings (50-70%) usually outweigh the computation cost
3. The recompute strategy is smart:
- Only recomputes necessary activations
- Can utilize GPU compute while other operations are happening
- Modern implementations are optimized to minimize recomputation
So while it's not zero-cost, it's usually the minimum-impact option that allows large models to fit in memory while maintaining reasonable throughput.
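To make the trade-off concrete, here is a small back-of-the-envelope sketch. The baseline numbers (10 s per step, 100 GB of activations) and the function name `checkpointing_tradeoff` are made up for illustration; only the 20-30% overhead and 50-70% saving ranges come from the text above.

```python
def checkpointing_tradeoff(step_seconds, activation_gb, overhead, memory_saving):
    """Return (new step time, new activation memory) for the given fractions."""
    new_step = step_seconds * (1.0 + overhead)
    new_memory = activation_gb * (1.0 - memory_saving)
    return new_step, new_memory


# Hypothetical baseline: 10 s per step and 100 GB of activations.
for overhead, saving in [(0.20, 0.50), (0.30, 0.70)]:
    t, m = checkpointing_tradeoff(10.0, 100.0, overhead, saving)
    print(f"overhead {overhead:.0%}, saving {saving:.0%}: "
          f"{t:.1f} s/step, {m:.0f} GB activations")
```

Real numbers depend on the model, the sequence length, and which layers are checkpointed, so treat this as arithmetic only, not a benchmark.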
12/30/2024
What is "GPU_MAX_HW_QUEUES"?
GPU_MAX_HW_QUEUES controls the maximum number of hardware queues that can be used for GPU operations in a ROCm/HIP environment (AMD GPUs).
Specifically:
- Each hardware queue can handle GPU operations (kernels) independently
- Default value depends on the ROCm release (recent HIP documentation lists 4), so check the documentation for your version
- In your script, it's set to 4:
export GPU_MAX_HW_QUEUES=4
This setting affects:
- Parallel kernel execution
- Memory transfers
- Overall GPU utilization
- Resource allocation
Lower values (like 4) can:
- Reduce overhead
- Improve memory usage
- Lead to more predictable performance
- Be better for large models like LLaMA 70B
Higher values can:
- Allow more parallel operations
- Potentially improve throughput for smaller models
- Use more GPU resources
In your case, setting GPU_MAX_HW_QUEUES=4 is a conservative choice that helps manage memory and scheduling overhead when training the large LLaMA 70B model.
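If the training job is launched from Python rather than a shell script, the same setting can be applied through the process environment. This is a minimal sketch: the helper name `set_hw_queues` is hypothetical, and it assumes the variable is read when the HIP runtime initializes, so it should run before importing any GPU library.

```python
import os


def set_hw_queues(value=4):
    """Set GPU_MAX_HW_QUEUES for this process unless it is already exported."""
    # setdefault keeps a value exported by the launching shell script.
    os.environ.setdefault("GPU_MAX_HW_QUEUES", str(value))
    return os.environ["GPU_MAX_HW_QUEUES"]
```

Call `set_hw_queues(4)` at the very top of the entry script. Using `setdefault` means an `export GPU_MAX_HW_QUEUES=...` in the shell still wins.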
12/28/2024
Fashion AI datasets, ordered by year
2024-2023:
- DeepFashion2 (2023 Update)
  - 491K images, 801K clothing items
  - 13 clothing categories
  - Paired cross-pose images
  - High resolution (1024x768)
  - Style, occlusion, landmarks annotations
- FashionAI (2023)
  - 180K+ images
  - Hierarchical attribute system
  - Focus on e-commerce applications
  - Multi-label classification
  - Fine-grained attribute annotations
- ACGPN Dataset (2023)
  - 40K high-resolution images
  - Detailed semantic parsing maps
  - Virtual try-on ready
  - Human pose annotations included
2022-2021:
- VITON-HD (2022)
  - 13,679 front-view pairs
  - High resolution (1024x768)
  - Clean background images
  - Precise segmentation masks
- LIP Dataset (2022 Version)
  - 50K images
  - 19 semantic parts
  - Instance-level human parsing
  - Multiple viewpoints
- Fashion-MNIST+ (2021)
  - Enhanced version of Fashion-MNIST
  - 70K images
  - Additional attribute annotations
  - Higher resolution than original
2020-2019:
- DeepFashion2 (Original 2019)
  - 191K images
  - 13 clothing categories
  - Commercial-consumer image pairs
  - Landmark detection
- FashionGen (2019)
  - 325K images
  - Multi-modal fashion dataset
  - Text descriptions included
  - Attribute annotations
2018-2017:
- ModaNet (2018)
  - 55K street-style images
  - 13 clothing categories
  - Pixel-level segmentation
  - Built on Paperdoll dataset
- DeepFashion (2017)
  - 800K images
  - 50 clothing categories
  - Multiple tasks (category/attribute prediction)
  - Landmark detection
2016-2015:
- Clothing Co-Parsing (CCP)
  - 2,098 images
  - 59 clothing categories
  - Pixel-level annotations
  - Early benchmark dataset
- Fashion10000 (2015)
  - 32K images
  - Basic attribute labels
  - Focus on style classification
Key Trends Over Time:
- Resolution: Steady increase from 224x224 to 1024x768+
- Dataset Size: Growing from thousands to hundreds of thousands
- Annotation Quality: Moving from basic labels to multi-task annotations
- Real-world Applicability: More focus on practical use cases
- Diversity: Including more poses, styles, and demographics
- Task Coverage: From simple classification to complex parsing/virtual try-on
Fashion AI dataset.
ModaNet (2018) was groundbreaking but there have been several more recent datasets and models for fashion segmentation and analysis. Here are some notable recent ones:
DeepFashion2 (2023 Update)
- 491K images with 801K clothing items
- 13 clothes categories (similar to ModaNet)
- More detailed annotations including style, occlusion, zoom-in
- Higher quality annotations and more diverse images
- Link: https://github.com/switchablenorms/DeepFashion2
VITON-HD (2022)
- High resolution virtual try-on dataset
- 13,679 front-view woman/clothing image pairs
- High quality segmentation masks
- Particularly good for virtual try-on applications
FashionAI Dataset (2023)
- From Alibaba
- Over 180K images
- Focus on attribute recognition
- Detailed hierarchical attribute annotations
- More modern fashion styles and better image quality
LIP (Look Into Person) Dataset (2022 version)
- 50,000 images with pixel-level annotations
- 19 semantic human part labels
- Multiple viewpoints and poses
- Human parsing focused but includes detailed clothing segmentation
ACGPN Dataset (2023)
- 40,000 high-resolution person images
- Detailed parsing maps
- Semantic segmentation for clothes
- Focuses on both parsing and virtual try-on
Key Improvements in Recent Datasets:
- Higher resolution images
- Better annotation quality
- More diverse poses and viewpoints
- More modern fashion styles
- Better handling of occlusion and layering
- More detailed attribute annotations
- Multi-task annotations (segmentation + attributes + landmarks)
For your specific use case, I would recommend:
- DeepFashion2 as your primary dataset - it's the most comprehensive and recent
- Augment with ACGPN if you need higher resolution images
- Consider FashionAI if you need very detailed attribute recognition
"fashion attribute recognition" or "clothing parsing"
The goal is an AI model for clothing analysis and attribute extraction from person images. In computer vision this is often called "fashion attribute recognition" or "clothing parsing".
For this task, you'll want to consider several components:
- Person/Clothing Segmentation
  - First, you'll need to segment the different clothing items
  - Datasets like DeepFashion2 or ModaNet provide the annotated masks needed to train this stage
  - You can use Mask R-CNN or a similar instance segmentation model as a base
- Attribute Recognition
  - For each segmented clothing item, you'll need to recognize:
    - Category (top, pants, hat, etc.)
    - Color
    - Material
    - Pattern
    - Style/type
    - Specific attributes (collar type, sleeve length, etc.)
Available Datasets:
- DeepFashion Dataset
  - Over 800,000 images
  - 50 clothing categories
  - Multiple attributes per item
  - Includes landmarks and segmentation
  - Good for both segmentation and attribute recognition
- ModaNet
  - About 55,000 fully annotated images
  - 13 clothing categories
  - Instance segmentation masks
  - Strong street-style focus
- Fashion-MNIST
  - Simpler dataset, good for initial testing
  - 70,000 grayscale images
  - 10 clothing categories
  - Limited attributes
- Clothing Co-Parsing (CCP) Dataset
  - 2,098 fashion images
  - 59 clothing categories
  - Pixel-level annotations
  - Good for fine-grained parsing
Recommended Approach:
- Model Architecture:
  - Use a two-stage approach:
    - First stage: Mask R-CNN or YOLOv8 for segmentation
    - Second stage: ResNet or EfficientNet backbone with attribute-specific heads
- Training Strategy:
  - Pre-train on large datasets like DeepFashion
  - Fine-tune on your specific use case
  - Use multi-task learning for different attributes
- Implementation Frameworks:
  - PyTorch or TensorFlow
  - Consider using MMFashion (open-source fashion analysis toolbox)
  - HuggingFace Transformers for recent vision models
12/25/2024
Installing cuDNN on Ubuntu 22.04
Step 1: Download cuDNN
- Go to https://developer.nvidia.com/cudnn
- Sign in to your NVIDIA Developer account (or create one if needed)
- Navigate to Downloads
- Find and download cuDNN v9.6.0 for Ubuntu 22.04 (.deb package)
Step 2: Install cuDNN
Run these commands in order:
# Install the downloaded package
sudo dpkg -i cudnn-local-repo-ubuntu2204-9.6.0_1.0-1_amd64.deb
# Copy the keyring
sudo cp /var/cudnn-local-repo-ubuntu2204-9.6.0/cudnn-*-keyring.gpg /usr/share/keyrings/
# Update package list
sudo apt-get update
# Install cuDNN
sudo apt-get -y install cudnn
# Install CUDA 12 specific package
sudo apt-get -y install cudnn-cuda-12
Step 3: Verify Installation
# Check if cuDNN is installed correctly
find /usr -name "libcudnn.so*"
Note: Direct download links won't work - you must download through NVIDIA's website after logging in.
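The Step 3 check can also be scripted from Python, which is handy inside a setup or CI script. This is only a sketch: the helper names `find_cudnn` and `cudnn_visible_to_loader` are hypothetical, and the loader check assumes the library cache has been refreshed after installation.

```python
import ctypes.util
import glob
import os


def find_cudnn(search_root="/usr"):
    """Return libcudnn shared-library paths found anywhere under search_root."""
    pattern = os.path.join(search_root, "**", "libcudnn.so*")
    return sorted(glob.glob(pattern, recursive=True))


def cudnn_visible_to_loader():
    """True if the dynamic loader can resolve a library named 'cudnn'."""
    return ctypes.util.find_library("cudnn") is not None
```

Calling `find_cudnn()` with no arguments mirrors the `find /usr -name "libcudnn.so*"` command. An empty list means the package did not install where expected.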
12/11/2024
Pedestrian and human attribute dataset.
For Pedestrian Detection:
- CityPersons - High-quality pedestrian detection dataset with diverse urban scenes from multiple European cities
- Caltech Pedestrian Dataset - Contains approximately 250,000 frames with 350,000 bounding boxes and 2,300 unique pedestrians
- INRIA Person Dataset - Includes full-body pedestrians in various poses and backgrounds
- MOT (Multiple Object Tracking) Dataset - Contains pedestrians in crowded scenes
For Human Attribute Analysis:
- RAP (Richly Annotated Pedestrian) Dataset - Over 40 attributes including clothing types, colors, and accessories
- PETA Dataset - Large-scale surveillance person attribute dataset with 19,000 images
- Market-1501 Attribute Dataset - Contains 27 attributes for clothing and personal items
- DeepFashion Dataset - Focuses on clothing items with detailed annotations
Some considerations when choosing a dataset:
- Make sure to check the license terms for each dataset
- Consider the image quality and diversity needed for your specific use case
- Check if the annotations match your requirements (bounding boxes, attributes, etc.)
- Verify that the dataset size is sufficient for your model training needs
11/19/2024
Print detailed model structure
Refer to code
..
..
This is the Llama 3.1 8B model structure.
11/17/2024
Hook Llama 3.1 8B layers and print their dimensions
Refer to code
.
..
Thank you.
11/03/2024
Automatic Number Plate Recognition (ANPR), SDK source code
# install
pip install marearts-anpr
# code
..
# Request a license here: https://study.marearts.com/p/anpr-lpr-solution.html
# Live test is here: https://live.marearts.com
10/30/2024
Brief explanation of "Audio → Spectrogram → Mel-spectrogram → MFCC"
Audio → Spectrogram → Mel-spectrogram → MFCC
- Spectrogram
  - Raw time-frequency representation
  - Shows energy at each frequency over time
  - Doesn't account for human perception
- Mel-spectrogram
  - Spectrogram mapped to mel scale
  - Mimics human frequency perception
  - Still maintains all frequency band information
- MFCC
  - Derived FROM the mel-spectrogram
  - Additional step: DCT (Discrete Cosine Transform) is applied
  - Keeps only lower coefficients (dimensionality reduction)
  - Decorrelates features
.
- Audio → Spectrogram
  - Start with raw audio waveform
  - Apply pre-emphasis to boost higher frequencies
  - Frame the signal into short segments (typically 20-40ms with overlap)
  - Apply window function (usually Hamming) to reduce edge effects
  - Perform FFT on each frame
  - Calculate power spectrum (|FFT|²)
- Spectrogram → Mel-spectrogram
  - Create mel filter banks (triangular overlapping windows)
  - Convert frequencies to mel scale using formula: mel = 2595 * log10(1 + f/700)
  - Apply mel filter banks to power spectrum
  - Sum up the energy in each mel band
- Mel-spectrogram → MFCC
  - Take logarithm of mel filter bank energies (to match human perception)
  - Apply Discrete Cosine Transform (DCT)
  - Keep first N coefficients (typically 13, or 39 once delta and delta-delta features are appended)
  - Optionally:
    - Calculate delta (velocity) features
    - Calculate delta-delta (acceleration) features
    - Apply cepstral mean normalization (CMN)
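The last two stages above can be sketched in plain Python. This is a minimal illustration, not a production implementation. It assumes the per-band mel energies for one frame are already computed. All function names are hypothetical, and the DCT-II here is unnormalized, whereas audio libraries usually apply orthonormal scaling.

```python
import math


def hz_to_mel(f_hz):
    """mel = 2595 * log10(1 + f/700), the formula quoted above."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_filters, f_min, f_max):
    """Edge frequencies in Hz for n_filters triangular filters.

    The n_filters + 2 edges are evenly spaced on the mel scale, so the
    filters get wider in Hz as frequency rises.
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]


def dct2(values, n_keep):
    """Unnormalized DCT-II, returning only the first n_keep coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n_keep)
    ]


def mfcc_from_mel_energies(mel_energies, n_keep=13, floor=1e-10):
    """Log-compress one frame of mel band energies, then apply the DCT."""
    # The floor avoids log(0) for silent bands; its value is an assumption.
    log_energies = [math.log(max(e, floor)) for e in mel_energies]
    return dct2(log_energies, n_keep)
```

A flat spectrum is a quick sanity check: if every mel band has the same energy, only coefficient 0 is non-zero, which shows how the DCT packs the overall level into the first coefficient and the spectral shape into the rest.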
..
10/26/2024
Download YouTube video in best quality
code..
..
That's it, but install this first:
pip install yt-dlp
Thank you!!!