Add the maximum possible remainder (TILE-1) before dividing!
Simple Example with 10:
Let's use TILE=10 for easier understanding:
Without ceiling (wrong):
11 / 10 = 1 (need 2!) ❌
19 / 10 = 1 (need 2!) ❌
With ceiling technique:
(11 + 9) / 10 = 20 / 10 = 2 ✓
(19 + 9) / 10 = 28 / 10 = 2 ✓
(20 + 9) / 10 = 29 / 10 = 2 ✓ (still 2, correct!)
(21 + 9) / 10 = 30 / 10 = 3 ✓
Why Add (TILE-1)?
Think of it like this:
If remainder = 0 (perfectly divisible):
20 / 10 = 2.0
(20 + 9) / 10 = 29 / 10 = 2.9 → 2 (same!)
If remainder > 0 (needs extra block):
21 / 10 = 2.1 (remainder 1)
(21 + 9) / 10 = 30 / 10 = 3.0 → 3 (pushed to next!)
The Magic:
Adding (TILE-1):
- Remainder 0: Adds 0.9999... → stays same integer
- Remainder ≥1: Adds enough to reach next integer
Visual Pattern:
Value | +9 | /10 | Result
------|----|----|-------
10 | 19 | 1.9| 1 ✓
11 | 20 | 2.0| 2 ✓ (jumps up!)
19 | 28 | 2.8| 2 ✓
20 | 29 | 2.9| 2 ✓
21 | 30 | 3.0| 3 ✓ (jumps up!)
Formula Summary:
// Ceiling division formula:
ceil(A/B) = (A + B - 1) / B
// For our GEMM tiles:
num_blocks = (matrix_size + tile_size - 1) / tile_size
It's simple: "Add almost one tile, then divide" - this guarantees rounding up!
No comments:
Post a Comment