MareArts Computer Vision Study.: 2024.12

12/31/2024

Persion Detection dataset

# The Ultimate Guide to Person Detection Datasets (2024 Edition)

Are you working on a computer vision project involving person detection? Choosing the right dataset can make or break your model's performance. In this comprehensive guide, we'll explore the best person detection datasets available in 2024, from industry standards to exciting new releases.

## Table of Contents

- [Industry Standard Datasets](#industry-standard-datasets)

- [Specialized Datasets](#specialized-datasets)

- [New Datasets for 2024](#new-datasets-for-2024)

- [How to Choose the Right Dataset](#how-to-choose-the-right-dataset)

## Industry Standard Datasets

### COCO (Common Objects in Context)

**The Gold Standard for Computer Vision**

- **Size**: 200,000+ images with 250,000+ person instances

- **What Makes It Special**:

- Diverse scenarios and lighting conditions

- High-quality annotations including segmentation masks

- Regular updates and strong community support

- **Best For**: General-purpose detection and benchmarking

### CrowdHuman

**Your Go-To for Crowded Scenes**

- **Size**: 15,000 images containing 470,000 person instances

- **Standout Features**:

- Average of 22.6 people per image

- Multiple annotation types (full body, visible body, head)

- Real-world crowd scenarios

- **Best For**: Surveillance systems and crowd monitoring

### MOT20

**Perfect for Video Applications**

- **Size**: 2.2M+ annotated boxes across video sequences

- **Key Strengths**:

- Temporal information

- Challenging crowd scenarios

- Moving camera situations

- **Best For**: Multi-object tracking and surveillance

## Specialized Datasets

### CityPersons

**Urban Environment Specialist**

- **Size**: 35,000 person instances

- **Resolution**: Crisp 2048x1024 images

- **Perfect For**:

- Autonomous driving

- Urban surveillance

- Street-level analysis

### SCUT-HEAD

**Head Detection Expert**

- **Size**: 4,500 images with 111,000 head annotations

- **Unique Features**:

- Specialized for head detection

- Various viewing angles

- Crowd density information

- **Best For**: Head counting and crowd analysis

## New Datasets for 2024

### HumanFlow

**Revolutionary Crowd Analysis**

- **Focus**: Dense crowd movement patterns

- **Size**: 50,000+ tracked trajectories

- **Unique Offering**: Group behavior analysis and flow patterns

### NightPersons

**Low-Light Detection Champion**

- **Specialty**: Night-time and low-light scenarios

- **Size**: 25,000 annotated instances

- **Extra Value**: Multi-spectrum data including thermal imaging

### MultiViewPeople

**Multi-Camera Innovation**

- **Size**: 1M+ synchronized frames

- **Highlight Features**:

- Multiple synchronized camera views

- Indoor and outdoor scenarios

- Activity labels

## How to Choose the Right Dataset

### 1. Consider Your Application

- **General Detection**: Start with COCO

- **Crowd Analysis**: CrowdHuman is your friend

- **Urban/Traffic**: CityPersons won't disappoint

- **Night Operations**: NightPersons is essential

- **Multi-Camera Setup**: MultiViewPeople has you covered

### 2. Check Your Resources

- **Storage Capacity**: Larger datasets need more space

- **Computing Power**: Consider your training infrastructure

- **Time Constraints**: Smaller datasets might be sufficient for prototyping

### 3. Evaluate Data Quality

- Look for consistent annotations

- Check update frequency

- Consider community support and available tools

### 4. Think About Your Environment

- Indoor vs. outdoor requirements

- Lighting conditions

- Camera angles and positions

- Scene complexity

## Conclusion

The perfect dataset for your person detection project depends on your specific needs. While COCO remains the industry standard, specialized datasets like CrowdHuman or the new NightPersons might better suit your particular use case. Don't be afraid to combine multiple datasets for better results!

### Pro Tips

1. Start with a smaller subset for initial testing

2. Consider data augmentation to enhance diversity

3. Check licensing terms before using in commercial projects

4. Look for datasets with similar conditions to your deployment environment

Need help getting started? Drop a comment below, and I'll be happy to help you choose the right dataset for your project!

---

*Last updated: December 2024*

hipblas, cublas algorithm

The HipBLASLt tuning process and algorithm selection is based on these factors in your data:

```
dev_cap,m,n,k,trans_a,trans_b,type_a,type_b,type_d,bias_type,lda,ldb,ldd,epi,comp,scale,ws_min,ws_max,algo_id,aidx
```

Key parameters:
1. Matrix Dimensions:
- `m,n,k`: Matrix dimensions for GEMM operations
- Example: `904,8192,2048,8192` = matrix sizes

2. Data Types:
- `type_a,type_b`: Input types (float8e4m3, bfloat16)
- `type_d`: Output type (bfloat16)
- `comp`: Computation type (f32)

3. Memory Layout:
- `trans_a,trans_b`: Matrix transposition (T=transposed, N=not)
- `lda,ldb,ldd`: Leading dimensions

4. Algorithm Selection:
- `algo_id`: Specific algorithm identifier
- `aidx`: Algorithm variant index
- workspace limits: `ws_min,ws_max`

The tuning process (`TE_HIPBLASLT_TUNING_RUN_COUNT=30` and `TE_HIPBLASLT_TUNING_ALGO_COUNT=100`) tests different combinations and selects the best based on:
1. Performance (speed)
2. Numerical stability
3. Memory usage
4. Hardware compatibility (dev_cap=904)

This tuning happens in the Tensor Engine (TE) library during the GEMM operations.

Insight about "fsdp_config.activation_checkpointing" option

The `fsdp_config.activation_checkpointing` does come with a computational overhead since it recomputes activations during the backward pass, but it's generally the most efficient option for large models like LLaMA 70B for several reasons:

1. Alternative memory saving options usually have bigger throughput impacts:

- `activation_cpu_offload=true`: Much slower due to CPU-GPU data transfer

- `fp32_cpu_offload=true`: Also involves slow CPU-GPU transfers

- Reducing batch size: Directly reduces throughput

- Reducing model size: Changes model behavior

2. Activation checkpointing's overhead is typically around 20-30% compute time, but:

- Only affects backward pass

- Modern GPUs often have compute headroom

- Memory savings (50-70%) usually outweigh the computation cost

3. The recompute strategy is smart:

- Only recomputes necessary activations

- Can utilize GPU compute while other operations are happening

- Modern implementations are optimized to minimize recomputation

So while it's not zero-cost, it's usually the minimum-impact option that allows large models to fit in memory while maintaining reasonable throughput.

12/30/2024

What is "GPU_MAX_HW_QUEUES"?

GPU_MAX_HW_QUEUES controls the maximum number of hardware queues that can be used for GPU operations in ROCm/HIP environment (AMD GPUs).

Specifically:

Each hardware queue can handle GPU operations (kernels) independently
Default value is typically 8
In your script, it's set to 4: export GPU_MAX_HW_QUEUES=4

This setting affects:

Parallel kernel execution
Memory transfers
Overall GPU utilization
Resource allocation

Lower values (like 4) can:

Reduce overhead
Improve memory usage
Lead to more predictable performance
Be better for large models like LLaMA 70B

Higher values can:

Allow more parallel operations
Potentially improve throughput for smaller models
Use more GPU resources

In your case, setting GPU_MAX_HW_QUEUES=4 is a conservative choice that helps manage memory and scheduling overhead when training the large LLaMA 70B model.

12/28/2024

fashion ai dataset, ordered by years

2024-2023:

DeepFashion2 (2023 Update)

491K images, 801K clothing items
13 clothing categories
Paired cross-pose images
High resolution (1024x768)
Style, occlusion, landmarks annotations

FashionAI (2023)

180K+ images
Hierarchical attribute system
Focus on e-commerce applications
Multi-label classification
Fine-grained attribute annotations

ACGPN Dataset (2023)

40K high-resolution images
Detailed semantic parsing maps
Virtual try-on ready
Human pose annotations included

2022-2021:

VITON-HD (2022)

13,679 front-view pairs
High resolution (1024x768)
Clean background images
Precise segmentation masks

LIP Dataset (2022 Version)

50K images
19 semantic parts
Instance-level human parsing
Multiple viewpoints

Fashion-MNIST+ (2021)

Enhanced version of Fashion-MNIST
70K images
Additional attribute annotations
Higher resolution than original

2020-2019:

DeepFashion2 (Original 2019)

191K images
13 clothing categories
Commercial-consumer image pairs
Landmark detection

FashionGen (2019)

325K images
Multi-modal fashion dataset
Text descriptions included
Attribute annotations

2018-2017:

ModaNet (2018)

55K street-style images
13 clothing categories
Pixel-level segmentation
Built on Paperdoll dataset

DeepFashion (2017)

800K images
50 clothing categories
Multiple tasks (category/attribute prediction)
Landmark detection

2016-2015:

Clothing Co-Parsing (CCP)

2,098 images
59 clothing categories
Pixel-level annotations
Early benchmark dataset

Fashion10000 (2015)

32K images
Basic attribute labels
Focus on style classification

Key Trends Over Time:

Resolution: Steady increase from 224x224 to 1024x768+
Dataset Size: Growing from thousands to hundreds of thousands
Annotation Quality: Moving from basic labels to multi-task annotations
Real-world Applicability: More focus on practical use cases
Diversity: Including more poses, styles, and demographics
Task Coverage: From simple classification to complex parsing/virtual try-on

Fashion ai dataset.

ModaNet (2018) was groundbreaking but there have been several more recent datasets and models for fashion segmentation and analysis. Here are some notable recent ones:

DeepFashion2 (2023 Update)

491K images with 801K clothing items
13 clothes categories (similar to ModaNet)
More detailed annotations including style, occlusion, zoom-in
Higher quality annotations and more diverse images
Link: https://github.com/switchablenorms/DeepFashion2

VITON-HD (2022)

High resolution virtual try-on dataset
13,679 front-view woman/clothing image pairs
High quality segmentation masks
Particularly good for virtual try-on applications

FashionAI Dataset (2023)

From Alibaba
Over 180K images
Focus on attribute recognition
Detailed hierarchical attribute annotations
More modern fashion styles and better image quality

LIP (Look Into Person) Dataset (2022 version)

50,000 images with pixel-level annotations
19 semantic human part labels
Multiple viewpoints and poses
Human parsing focused but includes detailed clothing segmentation

ACGPN Dataset (2023)

40,000 high-resolution person images
Detailed parsing maps
Semantic segmentation for clothes
Focuses on both parsing and virtual try-on

Key Improvements in Recent Datasets:

Higher resolution images
Better annotation quality
More diverse poses and viewpoints
More modern fashion styles
Better handling of occlusion and layering
More detailed attribute annotations
Multi-task annotations (segmentation + attributes + landmarks)

For your specific use case, I would recommend:

DeepFashion2 as your primary dataset - it's the most comprehensive and recent
Augment with ACGPN if you need higher resolution images
Consider FashionAI if you need very detailed attribute recognition

"fashion attribute recognition" or "clothing parsing"

AI model for clothing analysis and attribute extraction from person images. This is often called "fashion attribute recognition" or "clothing parsing" in computer vision.

For this task, you'll want to consider several components:

Person/Clothing Segmentation

First, you'll need to segment different clothing items
Models like DeepFashion2 or ModaNet provide good architectures for this
You can use Mask R-CNN or similar instance segmentation models as a base

Attribute Recognition For each segmented clothing item, you'll need to recognize:

Category (top, pants, hat, etc.)
Color
Material
Pattern
Style/type
Specific attributes (collar type, sleeve length, etc.)

Available Datasets:

DeepFashion Dataset

Over 800,000 images
50 clothing categories
Multiple attributes per item
Includes landmarks and segmentation
Good for both segmentation and attribute recognition

ModaNet

About 55,000 fully annotated images
13 clothing categories
Instance segmentation masks
Strong street-style focus

Fashion-MNIST

Simpler dataset, good for initial testing
70,000 grayscale images
10 clothing categories
Limited attributes

Clothing Co-Parsing (CCP) Dataset

2,098 fashion images
59 clothing categories
Pixel-level annotations
Good for fine-grained parsing

Recommended Approach:

Model Architecture:

Use a two-stage approach: a. First stage: Mask R-CNN or YOLOv8 for segmentation b. Second stage: ResNet or EfficientNet backbone with attribute-specific heads

Training Strategy:

Pre-train on large datasets like DeepFashion
Fine-tune on your specific use case
Use multi-task learning for different attributes

Implementation Frameworks:

PyTorch or TensorFlow
Consider using MMFashion (open-source fashion analysis toolbox)
HuggingFace Transformers for recent vision models

12/25/2024

Installing cuDNN on Ubuntu 22.04

Step 1: Download cuDNN

Go to https://developer.nvidia.com/cudnn
Sign in to your NVIDIA Developer account (or create one if needed)
Navigate to Downloads
Find and download cuDNN v9.6.0 for Ubuntu 22.04 (.deb package)

Step 2: Install cuDNN

Run these commands in order:

# Install the downloaded package
sudo dpkg -i cudnn-local-repo-ubuntu2204-9.6.0_1.0-1_amd64.deb

# Copy the keyring
sudo cp /var/cudnn-local-repo-ubuntu2204-9.6.0/cudnn-*-keyring.gpg /usr/share/keyrings/

# Update package list
sudo apt-get update

# Install cuDNN
sudo apt-get -y install cudnn

# Install CUDA 12 specific package
sudo apt-get -y install cudnn-cuda-12

Step 3: Verify Installation

# Check if cuDNN is installed correctly
find /usr -name "libcudnn.so*"

Note: Direct download links won't work - you must download through NVIDIA's website after logging in.

12/11/2024

Pedestrian and human attribute dataset.

For Pedestrian Detection:

CityPersons - High-quality pedestrian detection dataset with diverse urban scenes from multiple European cities
Caltech Pedestrian Dataset - Contains approximately 250,000 frames with 350,000 bounding boxes and 2,300 unique pedestrians
INRIA Person Dataset - Includes full-body pedestrians in various poses and backgrounds
MOT (Multiple Object Tracking) Dataset - Contains pedestrians in crowded scenes

For Human Attribute Analysis:

RAP (Richly Annotated Pedestrian) Dataset - Over 40 attributes including clothing types, colors, and accessories
PETA Dataset - Large-scale surveillance person attribute dataset with 19,000 images
Market-1501 Attribute Dataset - Contains 27 attributes for clothing and personal items
DeepFashion Dataset - Focuses on clothing items with detailed annotations

Some considerations when choosing a dataset:

Make sure to check the license terms for each dataset
Consider the image quality and diversity needed for your specific use case
Check if the annotations match your requirements (bounding boxes, attributes, etc.)
Verify that the dataset size is sufficient for your model training needs

Pages