12/31/2024

hipblas, cublas algorithm



hipBLASLt tuning and algorithm selection are keyed on the following fields, which appear as columns in your data:

```
dev_cap,m,n,k,trans_a,trans_b,type_a,type_b,type_d,bias_type,lda,ldb,ldd,epi,comp,scale,ws_min,ws_max,algo_id,aidx
```

Key parameters:
1. Matrix Dimensions:
- `m,n,k`: Matrix dimensions for GEMM operations
- Example: in a row beginning `904,8192,2048,8192`, the first value is `dev_cap` (904) and the next three are `m=8192`, `n=2048`, `k=8192`

2. Data Types:
- `type_a,type_b`: Input types (float8e4m3, bfloat16)
- `type_d`: Output type (bfloat16)
- `comp`: Computation type (f32)

3. Memory Layout:
- `trans_a,trans_b`: Matrix transposition (T=transposed, N=not)
- `lda,ldb,ldd`: Leading dimensions

4. Algorithm Selection:
- `algo_id`: Specific algorithm identifier
- `aidx`: Algorithm variant index
- `ws_min,ws_max`: Workspace size limits
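
To make the column grouping above concrete, here is a minimal Python sketch that parses one record and splits it into the four groups. The sample row is hypothetical: only its first four values (`904,8192,2048,8192`) come from the example above, and everything after them is invented for illustration. The helper names `parse_rows` and `group_fields` are also made up for this sketch.

```python
import csv
import io

HEADER = ("dev_cap,m,n,k,trans_a,trans_b,type_a,type_b,type_d,bias_type,"
          "lda,ldb,ldd,epi,comp,scale,ws_min,ws_max,algo_id,aidx")

# Hypothetical row: only 904,8192,2048,8192 are taken from the text;
# all remaining values are placeholders invented for this example.
SAMPLE_ROW = ("904,8192,2048,8192,T,N,float8e4m3,float8e4m3,bfloat16,-,"
              "8192,8192,8192,-,f32,1,0,33554432,12345,0")

# Columns assumed to hold integers; the rest are kept as strings.
INT_FIELDS = {"dev_cap", "m", "n", "k", "lda", "ldb", "ldd",
              "ws_min", "ws_max", "algo_id", "aidx"}


def parse_rows(text):
    """Parse CSV text (header + rows) into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {key: (int(val) if key in INT_FIELDS else val) for key, val in raw.items()}
        for raw in reader
    ]


def group_fields(row):
    """Split one record into the four groups described in the list above."""
    pick = lambda *keys: {key: row[key] for key in keys}
    return {
        "dimensions": pick("m", "n", "k"),
        "types": pick("type_a", "type_b", "type_d", "comp"),
        "layout": pick("trans_a", "trans_b", "lda", "ldb", "ldd"),
        "algorithm": pick("algo_id", "aidx", "ws_min", "ws_max"),
    }


record = parse_rows(HEADER + "\n" + SAMPLE_ROW + "\n")[0]
grouped = group_fields(record)
print(record["dev_cap"], grouped["dimensions"])
```

Reading the row through the header this way avoids the easy mistake of treating the leading `dev_cap` value as a matrix dimension.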

The tuning process (configured here with `TE_HIPBLASLT_TUNING_RUN_COUNT=30` and `TE_HIPBLASLT_TUNING_ALGO_COUNT=100`) tests different algorithm candidates and selects the best based on:
1. Performance (speed)
2. Numerical stability
3. Memory usage
4. Hardware compatibility (dev_cap=904)
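
The speed criterion can be sketched as a plain benchmark loop. This is not the library's code: it assumes the run count is the number of timed runs per candidate and the algorithm count is the maximum number of candidates tried. The candidates are ordinary Python callables standing in for GEMM algorithms, `pick_fastest` is a hypothetical name, and the other three criteria are not modeled.

```python
import time


def pick_fastest(candidates, run_count=30, algo_count=100, timer=time.perf_counter):
    """Return (algo_id, average_seconds) of the fastest candidate.

    candidates: list of (algo_id, callable) pairs (stand-ins for GEMM algos).
    run_count:  timed runs per candidate (assumed meaning of the run count).
    algo_count: maximum number of candidates tried (assumed meaning).
    """
    best_id, best_avg = None, float("inf")
    for algo_id, run in candidates[:algo_count]:
        run()  # warm-up run, excluded from the timing
        start = timer()
        for _ in range(run_count):
            run()
        avg = (timer() - start) / run_count
        if avg < best_avg:
            best_id, best_avg = algo_id, avg
    return best_id, best_avg


# Usage with two toy workloads of different cost.
toy_candidates = [
    (0, lambda: sum(range(100_000))),
    (1, lambda: sum(range(1_000))),
]
winner, avg_seconds = pick_fastest(toy_candidates, run_count=5)
print(f"selected algo {winner}, avg {avg_seconds:.6f}s")
```

The warm-up run keeps one-time setup cost out of the measurement, and averaging over several runs reduces the effect of timing noise on the choice.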

This tuning happens in the Transformer Engine (TE) library during GEMM operations.
