Thank you.
Computer Vision & Machine Learning Research Laboratory
LightGBM (Light Gradient Boosting Machine) is a gradient boosting framework developed by Microsoft that uses tree-based learning algorithms. It's designed to be efficient, fast, and capable of handling large-scale data with high dimensionality.
Key features of LightGBM that make it powerful:
Leaf-wise Tree Growth: Unlike traditional algorithms that grow trees level-wise, LightGBM grows trees leaf-wise, focusing on the leaf that will bring the maximum reduction in loss. This creates more complex trees but uses fewer splits, resulting in higher accuracy with the same number of leaves.
Gradient-based One-Side Sampling (GOSS): This technique retains instances with large gradients (those that need more training) and randomly samples instances with small gradients. This allows LightGBM to focus computational resources on the more informative examples without losing accuracy.
Exclusive Feature Bundling (EFB): For sparse datasets, many features are mutually exclusive (never take non-zero values simultaneously). LightGBM bundles these features together, treating them as a single feature. This reduces memory usage and speeds up training.
Gradient Boosting Framework: Like other boosting algorithms, LightGBM builds trees sequentially, with each new tree correcting the errors of the existing ensemble.
LightGBM is particularly well-suited to tabular problems such as your solver selection task.
When properly tuned, LightGBM can often achieve better performance than neural networks for tabular data, especially with the right hyperparameters and sufficient boosting rounds.
.
# Find the largest directories in your home
du -h --max-depth=1 ~ | sort -rh | head -20
# Find the largest files
find ~ -type f -exec du -h {} + | sort -rh | head -20
# Alternatively for a cleaner view of largest files
find ~ -type f -size +100M -exec ls -lh {} \; | sort -k 5 -rh | head -20
..
Thank you
checkgpu.py
Thank you!
python code:
The output looks like:
Starting camera detection...
Checking camera indices 0-9...
----------------------------------------
✓ Camera 0 is ONLINE:
Resolution: 640x480
FPS: 30.0
Backend: V4L2
Frame shape: (480, 640, 3)
Format: YUYV
Stability test: 5/5 frames captured successfully
[ WARN:0@0.913] global cap_v4l.cpp:999 open VIDEOIO(V4L2:/dev/video1): can't open camera by index
[ERROR:0@0.972] global obsensor_uvc_stream_channel.cpp:158 getStreamChannelGroup Camera index out of range
✗ Camera 1: Not available
✓ Camera 2 is ONLINE:
Resolution: 640x480
FPS: 30.0
Backend: V4L2
Frame shape: (480, 640, 3)
Format: YUYV
Stability test: 5/5 frames captured successfully
[ WARN:0@1.818] global cap_v4l.cpp:999 open VIDEOIO(V4L2:/dev/video3): can't open camera by index
[ERROR:0@1.820] global obsensor_uvc_stream_channel.cpp:158 getStreamChannelGroup Camera index out of range
✗ Camera 3: Not available
[ WARN:0@1.820] global cap_v4l.cpp:999 open VIDEOIO(V4L2:/dev/video4): can't open camera by index
[ERROR:0@1.822] global obsensor_uvc_stream_channel.cpp:158 getStreamChannelGroup Camera index out of range
✗ Camera 4: Not available
[ WARN:0@1.822] global cap_v4l.cpp:999 open VIDEOIO(V4L2:/dev/video5): can't open camera by index
[ERROR:0@1.823] global obsensor_uvc_stream_channel.cpp:158 getStreamChannelGroup Camera index out of range
✗ Camera 5: Not available
[ WARN:0@1.824] global cap_v4l.cpp:999 open VIDEOIO(V4L2:/dev/video6): can't open camera by index
[ERROR:0@1.825] global obsensor_uvc_stream_channel.cpp:158 getStreamChannelGroup Camera index out of range
✗ Camera 6: Not available
[ WARN:0@1.825] global cap_v4l.cpp:999 open VIDEOIO(V4L2:/dev/video7): can't open camera by index
[ERROR:0@1.828] global obsensor_uvc_stream_channel.cpp:158 getStreamChannelGroup Camera index out of range
✗ Camera 7: Not available
[ WARN:0@1.828] global cap_v4l.cpp:999 open VIDEOIO(V4L2:/dev/video8): can't open camera by index
[ERROR:0@1.830] global obsensor_uvc_stream_channel.cpp:158 getStreamChannelGroup Camera index out of range
✗ Camera 8: Not available
[ WARN:0@1.830] global cap_v4l.cpp:999 open VIDEOIO(V4L2:/dev/video9): can't open camera by index
[ERROR:0@1.831] global obsensor_uvc_stream_channel.cpp:158 getStreamChannelGroup Camera index out of range
✗ Camera 9: Not available
----------------------------------------
Summary:
Working camera indices: [0, 2]
----------------------------------------
Camera check complete!
This shows which camera indices are online.
Thank you!
I'll create a simple example of a tiny neural network to demonstrate fp8 vs fp32 memory usage. Let's make a small model with these layers:
1. Input: 784 features (like MNIST image 28x28)
2. Hidden layer 1: 512 neurons
3. Hidden layer 2: 256 neurons
4. Output: 10 neurons (for 10 digit classes)
Let's calculate the memory needed for weights:
1. First Layer Weights:
```
784 × 512 = 401,408 weights
+ 512 biases
= 401,920 parameters
```
2. Second Layer Weights:
```
512 × 256 = 131,072 weights
+ 256 biases
= 131,328 parameters
```
3. Output Layer Weights:
```
256 × 10 = 2,560 weights
+ 10 biases
= 2,570 parameters
```
Total Parameters: 535,818
Memory Usage:
```
FP32: 535,818 × 4 bytes = 2,143,272 bytes ≈ 2.14 MB
FP8: 535,818 × 1 byte = 535,818 bytes ≈ 0.54 MB
```
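The arithmetic above can be checked in a few lines (layer sizes taken from the text):

```python
# Verify the parameter counts and memory figures above.
layers = [(784, 512), (512, 256), (256, 10)]  # (inputs, outputs) per layer
params = sum(i * o + o for i, o in layers)    # weights + biases per layer
print(params)       # 535818 parameters
print(params * 4)   # fp32 bytes: 2143272
print(params * 1)   # fp8 bytes:  535818
```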
Let's demonstrate this with some actual matrix multiplication:
```
# One batch of inference
Input size: 32 images (batch) × 784 features = 25,088 numbers
First-layer multiplication: (32 × 784) × (784 × 512) → (32 × 512)
```
During computation:
1. With fp32:
```
Weights in memory: 401,920 × 4 = 1,607,680 bytes
Input in memory: 25,088 × 4 = 100,352 bytes
Output in memory: 16,384 × 4 = 65,536 bytes
Total: ≈ 1.77 MB
```
2. With fp8:
```
Weights in memory: 401,920 × 1 = 401,920 bytes
Input in memory: 25,088 × 1 = 25,088 bytes
Output in memory: 16,384 × 1 = 16,384 bytes
Total: ≈ 0.44 MB
```
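These working-memory totals can be reproduced directly:

```python
# Reproduce the fp32 vs fp8 working-memory totals above
# (batch of 32, first layer 784 -> 512).
weights = 401_920       # first-layer weights + biases
inputs = 32 * 784       # batch activations in
outputs = 32 * 512      # batch activations out
for name, bytes_per in [("fp32", 4), ("fp8", 1)]:
    total = (weights + inputs + outputs) * bytes_per
    print(name, total, f"{total / 1e6:.2f} MB")
# fp32 1773568 1.77 MB
# fp8  443392  0.44 MB
```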
During actual computation:
```
1. Load a tile/block of the weight matrix (let's say 128×128)
fp8: 128×128 = 16,384 bytes
2. Convert this block to fp32: 16,384 × 4 = 65,536 bytes
3. Perform multiplication in fp32
4. Convert result back to fp8
5. Move to next block
```
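The tiled scheme above can be simulated with NumPy, using int8 plus a scale factor as a stand-in for real fp8 (an assumed quantization scheme; for simplicity this sketch keeps activations and outputs in fp32 rather than converting results back):

```python
# Sketch: store weights at 1 byte each, dequantize one 128x128 tile
# at a time, and do the matmul in fp32. int8 + scale stands in for fp8.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((784, 512)).astype(np.float32)
x = rng.standard_normal((32, 784)).astype(np.float32)

# Quantize weights to 1 byte each with a single per-tensor scale.
scale = np.abs(W).max() / 127.0
W_q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)

out = np.zeros((32, 512), dtype=np.float32)
TILE = 128
for i in range(0, 784, TILE):
    for j in range(0, 512, TILE):
        # Dequantize just this tile to fp32, multiply, accumulate.
        block = W_q[i:i + TILE, j:j + TILE].astype(np.float32) * scale
        out[:, j:j + TILE] += x[:, i:i + TILE] @ block

ref = x @ W
print("quantized weight bytes:", W_q.nbytes)   # 401408 (1 byte per weight)
print("max abs error vs fp32:", np.abs(out - ref).max())
```

Only one `TILE × TILE` block ever exists in fp32 at a time, which is the point: storage stays at 1 byte per weight while compute still happens at full precision.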
This shows that even though we compute in fp32, keeping the model in fp8:
1. Uses 1/4 the memory for storage
2. Only needs small blocks in fp32 temporarily
3. Can process larger batches or models within the same memory budget