1/17/2025

hipBLASLt type definitions explained


1. About Output Types (D):

No, the output D is not limited to fp32/int32. Looking at the table, D can be:

- fp32

- fp16

- bf16

- fp8

- bf8

- int8


2. Input/Output Patterns:

When A is fp16, you have two options:

```

Option 1:

A: fp16 → B: fp16 → C: fp16 → D: fp16 → Compute: fp32


Option 2:

A: fp16 → B: fp16 → C: fp16 → D: fp32 → Compute: fp32

```


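The compute/scale type is always fp32 (or int32 for integer inputs), so it is at least as precise as the inputs and outputs. Accumulating in this wider type maintains accuracy during the calculation, even when A, B, C, and D are stored in lower precision.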
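To see why the accumulator is kept in fp32 even when the inputs are fp16, here is a minimal Python sketch. It emulates fp16 and fp32 rounding with the standard `struct` module. The helper names (`to_fp16`, `to_fp32`, `dot`) are illustrative and not part of hipBLASLt, and a real GPU kernel may add the products in a different order.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def dot(a, b, round_acc):
    """Dot product where the running sum is rounded by round_acc each step."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = round_acc(acc + x * y)
    return acc

# fp16 inputs: 4096 ones in each vector, so the exact answer is 4096.
a = [to_fp16(1.0)] * 4096
b = [to_fp16(1.0)] * 4096

fp16_acc = dot(a, b, to_fp16)  # accumulator stored in fp16
fp32_acc = dot(a, b, to_fp32)  # accumulator stored in fp32

print("fp16 accumulate:", fp16_acc)
print("fp32 accumulate:", fp32_acc)
```

With an fp16 accumulator the sum stops growing at 2048, because 2049 is not representable in fp16 and rounds back down. With an fp32 accumulator the sum reaches 4096, which can still be written to an fp16 D without loss.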


3. Key Patterns in the Table:

- Inputs A and B must always match in type

- C typically matches A and B, except with fp8/bf8 inputs

- When using fp8/bf8 inputs, C and D can be higher precision (fp32, fp16, or bf16)

- The compute precision is always fp32 for floating point types

- For integer operations (int8), the compute precision is int32
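The patterns above can be written down as a small checker. This is only a sketch of the rules as stated in this post, not the full hipBLASLt support table. `check_combo` and the short type names are hypothetical, and treating int32 as a valid D type is an assumption taken from the original question.

```python
FLOAT8 = {"fp8", "bf8"}
WIDER_FLOATS = {"fp32", "fp16", "bf16"}
# Output types listed in section 1, plus int32 (assumed from the question).
D_TYPES = {"fp32", "fp16", "bf16", "fp8", "bf8", "int8", "int32"}

def check_combo(a, b, c, d, compute):
    """Return a list of rule violations; an empty list means the combo passes."""
    problems = []
    if a != b:
        problems.append("A and B must have the same type")
    # C matches A/B, except that fp8/bf8 inputs may use a wider float for C.
    if not (c == a or (a in FLOAT8 and c in WIDER_FLOATS)):
        problems.append("C must match A/B (or be fp32/fp16/bf16 for fp8/bf8 inputs)")
    if d not in D_TYPES:
        problems.append("D is not a known output type")
    # int8 inputs accumulate in int32, everything else in fp32.
    expected = "int32" if a == "int8" else "fp32"
    if compute != expected:
        problems.append(f"compute type should be {expected}")
    return problems

print(check_combo("fp16", "fp16", "fp16", "fp32", "fp32"))  # passes
print(check_combo("fp8", "fp8", "fp16", "fp16", "fp32"))    # passes
print(check_combo("int8", "int8", "int8", "int8", "fp32"))  # wrong compute type
```

A combination that passes this check still has to appear in the actual support table for your GPU and library version.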


4. Why Different Combinations?

- Performance: Lower-precision inputs (fp16, fp8) generally compute faster and move less data

- Accuracy: Higher precision (fp32) = better accuracy but slower

- Memory Usage: An fp16 element takes 2 bytes and an fp8 element 1 byte, versus 4 bytes for fp32

- Mixed Precision: Use lower precision for inputs but higher precision for output to balance speed and accuracy
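To put rough numbers on the memory side of this trade-off, the sketch below adds up the storage for A, B, C, and D in a single GEMM. `gemm_bytes` is a hypothetical helper, and it assumes C and D are separate buffers with no padding, which may not match how a real application allocates them.

```python
BYTES_PER_ELEMENT = {
    "fp32": 4, "int32": 4,
    "fp16": 2, "bf16": 2,
    "fp8": 1, "bf8": 1, "int8": 1,
}

def gemm_bytes(m, n, k, a, b, c, d):
    """Bytes to store A (m x k), B (k x n), and C and D (both m x n)."""
    return (m * k * BYTES_PER_ELEMENT[a]
            + k * n * BYTES_PER_ELEMENT[b]
            + m * n * BYTES_PER_ELEMENT[c]
            + m * n * BYTES_PER_ELEMENT[d])

MIB = 1024 * 1024
combos = {
    "all fp32": ("fp32", "fp32", "fp32", "fp32"),
    "fp16 in, fp32 out": ("fp16", "fp16", "fp16", "fp32"),
    "all fp8": ("fp8", "fp8", "fp8", "fp8"),
}
for name, types in combos.items():
    size = gemm_bytes(4096, 4096, 4096, *types)
    print(f"{name}: {size // MIB} MiB")
```

For a 4096 x 4096 x 4096 problem this works out to 256 MiB for all fp32, 160 MiB for fp16 inputs with an fp32 output, and 64 MiB for all fp8.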


Example Use Cases:

```

High Accuracy Needs:

A(fp32) → B(fp32) → C(fp32) → D(fp32) → Compute(fp32)


Balanced Performance:

A(fp16) → B(fp16) → C(fp16) → D(fp32) → Compute(fp32)


Maximum Performance:

A(fp8) → B(fp8) → C(fp8) → D(fp8) → Compute(fp32)

```
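When these combinations are passed to the library, each short name has to become an enum value. The lookup below is a sketch of that mapping for the three use cases above. The enum names are my reading of the HIP and hipBLAS headers and should be treated as assumptions to verify against your installed ROCm version. In particular, which fp8/bf8 enum applies (the `_FNUZ` variants shown here or the non-FNUZ ones) depends on the GPU architecture. `PRESETS` and `describe` are hypothetical names.

```python
# Short names from this post -> hipDataType enum names (assumed, verify locally).
HIP_DATA_TYPE = {
    "fp32": "HIP_R_32F",
    "fp16": "HIP_R_16F",
    "bf16": "HIP_R_16BF",
    "fp8": "HIP_R_8F_E4M3_FNUZ",   # architecture dependent
    "bf8": "HIP_R_8F_E5M2_FNUZ",   # architecture dependent
    "int8": "HIP_R_8I",
    "int32": "HIP_R_32I",
}

# Compute types used in this post -> hipblasComputeType_t names (assumed).
COMPUTE_TYPE = {
    "fp32": "HIPBLAS_COMPUTE_32F",
    "int32": "HIPBLAS_COMPUTE_32I",
}

# (A, B, C, D, compute) for each use case listed above.
PRESETS = {
    "high_accuracy": ("fp32", "fp32", "fp32", "fp32", "fp32"),
    "balanced": ("fp16", "fp16", "fp16", "fp32", "fp32"),
    "max_performance": ("fp8", "fp8", "fp8", "fp8", "fp32"),
}

def describe(preset):
    """Translate a preset into the enum names for each matrix and the compute type."""
    a, b, c, d, compute = PRESETS[preset]
    return {
        "A": HIP_DATA_TYPE[a],
        "B": HIP_DATA_TYPE[b],
        "C": HIP_DATA_TYPE[c],
        "D": HIP_DATA_TYPE[d],
        "compute": COMPUTE_TYPE[compute],
    }

print(describe("balanced"))
```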

