5 different cases of FSDP and TP usage.
10/11/2024
How do FSDP and TP work in distributed training?
10/10/2024
FSDP and TP explained for a 2-layer model
FSDP and TP are complementary parallelism techniques:
- FSDP (Fully Sharded Data Parallel):
  - Shards model parameters across GPUs
  - Each GPU holds a portion (shard) of each layer's parameters
  - During the forward/backward pass, it all-gathers a layer's parameters just before they are needed and frees them afterwards; gradients are reduce-scattered back to the shards
  - Reduces memory usage per GPU, allowing larger models
- TP (Tensor Parallelism):
  - Splits the weight tensors inside a layer across GPUs
  - Each GPU computes a portion of a layer's operations
  - Useful for very large layers that don't fit on a single GPU
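To make the difference concrete, here is a toy single-process sketch in plain Python. It is only an illustration: there are no real GPUs or collectives involved, the "ranks" are just list entries, and all helper names (`fsdp_shard`, `all_gather`, `tp_split_rows`) are hypothetical. It assumes the parameter count divides evenly by the number of ranks.

```python
# Toy simulation: "ranks" are list entries, not real GPUs or processes.

def matvec(weight, x):
    """Compute y = W x for a weight given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]

def fsdp_shard(flat_params, world_size):
    """Split a flat parameter list into equal contiguous shards, one per rank."""
    assert len(flat_params) % world_size == 0  # assumption: no padding needed
    size = len(flat_params) // world_size
    return [flat_params[r * size:(r + 1) * size] for r in range(world_size)]

def all_gather(shards):
    """Stand-in for an all-gather: rebuild the full parameter list from shards."""
    return [p for shard in shards for p in shard]

def tp_split_rows(weight, world_size):
    """Give each rank a contiguous block of output rows of the layer."""
    assert len(weight) % world_size == 0
    rows = len(weight) // world_size
    return [weight[r * rows:(r + 1) * rows] for r in range(world_size)]

# One layer with 4 outputs and 2 inputs.
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
x = [1.0, -1.0]
reference = matvec(W, x)

# FSDP: each rank stores only a shard, gathers the full layer, then computes.
flat = [w for row in W for w in row]
shards = fsdp_shard(flat, world_size=2)
gathered = all_gather(shards)
W_full = [gathered[i:i + 2] for i in range(0, len(gathered), 2)]
fsdp_out = matvec(W_full, x)

# TP: each rank keeps its slice and computes only its part of the output.
tp_parts = [matvec(block, x) for block in tp_split_rows(W, world_size=2)]
tp_out = [y for part in tp_parts for y in part]
```

In the FSDP path every rank ends up computing with the whole layer (after gathering), while in the TP path no rank ever needs the whole layer; both give the same result as the unsplit layer.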
When combined:
- FSDP handles overall model distribution
- TP handles distribution of large individual layers
- This allows for even larger models and better GPU utilization
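One common way to combine the two is to arrange the GPUs in a 2D grid, with TP running along one axis and FSDP along the other. The sketch below assumes a row-major grid of 4 ranks with a TP size of 2; this arrangement and the helper names (`mesh_coords`, `groups`) are my assumptions for illustration, not something this post specifies.

```python
def mesh_coords(rank, tp_size):
    """Return (fsdp_index, tp_index) for a rank in a row-major 2D grid."""
    return rank // tp_size, rank % tp_size

def groups(world_size, tp_size):
    """List which ranks communicate together along each grid axis."""
    assert world_size % tp_size == 0
    fsdp_size = world_size // tp_size
    # Ranks in the same row split one layer's tensors between them (TP).
    tp_groups = [[d * tp_size + t for t in range(tp_size)]
                 for d in range(fsdp_size)]
    # Ranks in the same column shard and gather parameters together (FSDP).
    fsdp_groups = [[d * tp_size + t for d in range(fsdp_size)]
                   for t in range(tp_size)]
    return tp_groups, fsdp_groups

tp_groups, fsdp_groups = groups(world_size=4, tp_size=2)
coords = [mesh_coords(r, tp_size=2) for r in range(4)]
```

With these settings, ranks 0 and 1 form one TP group and ranks 2 and 3 the other, while FSDP pairs rank 0 with 2 and rank 1 with 3.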
Textual Representation:

  GPU 1       GPU 2       GPU 3       GPU 4
+--------+  +--------+  +--------+  +--------+
| L1 P1  |  | L1 P2  |  | L2 P1  |  | L2 P2  |
|  TP1   |  |  TP2   |  |  TP1   |  |  TP2   |
+--------+  +--------+  +--------+  +--------+
    |           |           |           |
    +-----------+           +-----------+
       Layer 1                 Layer 2

L1, L2: Layers 1 and 2
P1, P2: Parameter shards (FSDP)
TP1, TP2: Tensor Parallel splits
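The same layout can be written down as a small lookup table. This sketch only mirrors the figure (2 layers, 2 GPUs per layer); `diagram_layout` is a hypothetical helper, not part of any library.

```python
def diagram_layout(num_layers=2, gpus_per_layer=2):
    """Map each GPU to its (layer, FSDP shard, TP split) labels as in the figure."""
    table = {}
    for gpu in range(num_layers * gpus_per_layer):
        layer = gpu // gpus_per_layer + 1   # GPUs 1-2 hold L1, GPUs 3-4 hold L2
        part = gpu % gpus_per_layer + 1     # first or second piece of that layer
        table[f"GPU {gpu + 1}"] = (f"L{layer}", f"P{part}", f"TP{part}")
    return table

layout = diagram_layout()
for name, labels in layout.items():
    print(name, *labels)
```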