12/30/2024

What is "GPU_MAX_HW_QUEUES"?

GPU_MAX_HW_QUEUES is an environment variable that sets the maximum number of hardware queues the HIP runtime allocates per device for GPU operations in a ROCm/HIP environment (AMD GPUs).

Specifically:

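  1. Each hardware queue can handle GPU operations (kernels) independently; when an application creates more HIP streams than there are hardware queues, the streams share the available queues
  2. The default value listed in current HIP documentation is 4; check the documentation for your ROCm version
  3. In your script, it's set explicitly to 4: export GPU_MAX_HW_QUEUES=4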
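
As a minimal sketch of how the setting is usually applied: the variable is exported in the shell before the job is launched, so the training process inherits it. The fallback value of 4 mirrors the script above, and the launch command in the final comment is a hypothetical placeholder, not a real script name from your setup.

```shell
# Keep a value already set by the caller; otherwise fall back to 4,
# the value used in the script discussed above.
export GPU_MAX_HW_QUEUES="${GPU_MAX_HW_QUEUES:-4}"

# Confirm what child processes (the training job) will inherit.
echo "GPU_MAX_HW_QUEUES=${GPU_MAX_HW_QUEUES}"

# Launch the job afterwards, for example (hypothetical script name):
# python train.py
```

Using the `${VAR:-default}` form lets you override the value from the command line for a single run without editing the script.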

This setting affects:

  • Parallel kernel execution
  • Memory transfers
  • Overall GPU utilization
  • Resource allocation

Lower values (like 4) can:

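  • Reduce queue scheduling overhead
  • Reduce the memory and other resources set aside for queues
  • Lead to more predictable performance, since fewer queues compete for the GPU at once
  • Be better for large models like LLaMA 70B, where individual kernels are large enough that extra concurrent queues add little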

Higher values can:

  • Allow more parallel operations
  • Potentially improve throughput for smaller models
  • Use more GPU resources
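
Since the trade-off between lower and higher values depends on the workload, one way to choose is to time a short run at several settings. The sketch below assumes you have some short benchmark command to pass in; the function name `sweep_hw_queues` and the candidate values 1, 2, 4, 8 are illustrative choices, and `true` stands in for a real benchmark.

```shell
# Run the given command once per candidate queue count and report
# wall-clock time in seconds. `env` sets the variable only for that run.
sweep_hw_queues() {
  for q in 1 2 4 8; do
    start=$(date +%s)
    env GPU_MAX_HW_QUEUES="$q" "$@" >/dev/null
    end=$(date +%s)
    echo "GPU_MAX_HW_QUEUES=$q elapsed_s=$((end - start))"
  done
}

# Dry run with a no-op command; replace `true` with a short
# training or inference run of your own.
sweep_hw_queues true
```

Timing with one-second resolution is only meaningful for runs that last at least a few minutes; for shorter runs, use the throughput numbers your training framework reports instead.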

In your case, setting GPU_MAX_HW_QUEUES=4 is a conservative choice that helps manage memory and scheduling overhead when training the large LLaMA 70B model.
