GPU_MAX_HW_QUEUES
controls the maximum number of hardware queues that the ROCm/HIP runtime can use for GPU operations on AMD GPUs.
Specifically:
- Each hardware queue can execute GPU operations (kernels) independently
- The default value is typically 8
- In your script, it's set to 4:
export GPU_MAX_HW_QUEUES=4
This setting affects:
- Parallel kernel execution
- Memory transfers
- Overall GPU utilization
- Resource allocation
Lower values (like 4) can:
- Reduce scheduling overhead
- Lower the memory overhead of queue resources
- Lead to more predictable performance
- Be better for large models like LLaMA 70B
Higher values can:
- Allow more parallel operations
- Potentially improve throughput for smaller models
- Use more GPU resources
In your case, setting GPU_MAX_HW_QUEUES=4 is a conservative choice that helps manage memory and scheduling overhead when training the large LLaMA 70B model.
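As a minimal sketch, a launch script for such a run might export the variable before invoking the trainer, since the HIP runtime reads it at process startup. The training command shown in the comment is a hypothetical placeholder; substitute your own launcher and script name.

```shell
#!/usr/bin/env bash
# Cap the number of HIP hardware queues before launching training.
# Lower values reduce scheduling overhead for large models.
export GPU_MAX_HW_QUEUES=4

# Confirm the setting is visible to child processes.
echo "GPU_MAX_HW_QUEUES=${GPU_MAX_HW_QUEUES}"

# Illustrative launch command (placeholder, adjust to your setup):
# torchrun --nproc_per_node=8 train_llama70b.py
```

Because environment variables are inherited by child processes, every process the script launches afterward sees the same queue limit.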