8/22/2024

ROCm HIP asynchronous operation sample code

 





HIP (Heterogeneous-Compute Interface for Portability) provides similar functionality to CUDA streams for asynchronous execution. The concepts and usage are very similar, making it easier to port CUDA code to HIP. Here's an overview of HIP's equivalent features for asynchronous execution:

1. HIP Streams:
In HIP, streams are represented by the `hipStream_t` type, which is analogous to CUDA's `cudaStream_t`.

2. Creating and Destroying Streams:
```cpp
hipStream_t stream;
hipError_t hipStreamCreate(hipStream_t* stream);
hipError_t hipStreamDestroy(hipStream_t stream);
```

3. Asynchronous Memory Operations:
```cpp
hipError_t hipMemcpyAsync(void* dst, const void* src, size_t count, hipMemcpyKind kind, hipStream_t stream);
hipError_t hipMemsetAsync(void* dst, int value, size_t count, hipStream_t stream);
```

4. Launching Kernels on Streams:
```cpp
hipLaunchKernelGGL(kernel, dim3(gridSize), dim3(blockSize), 0, stream, /* kernel arguments */);
```

5. Stream Synchronization:
```cpp
hipError_t hipStreamSynchronize(hipStream_t stream);
hipError_t hipDeviceSynchronize();
```

6. Stream Query:
```cpp
hipError_t hipStreamQuery(hipStream_t stream);
```

7. Stream Callbacks:
```cpp
hipError_t hipStreamAddCallback(hipStream_t stream, hipStreamCallback_t callback, void* userData, unsigned int flags);
```

8. Stream Priorities:
```cpp
hipError_t hipStreamCreateWithPriority(hipStream_t* stream, unsigned int flags, int priority);
```

Here's a simple example demonstrating asynchronous execution with HIP streams:

```cpp
#include <hip/hip_runtime.h>
#include <math.h>    // fabs
#include <stdio.h>   // printf, fprintf
#include <stdlib.h>  // malloc, free, rand, exit

#define N 1000000
#define STREAMS 4

// Element-wise vector addition: c[idx] = a[idx] + b[idx]
__global__ void vectorAdd(float* a, float* b, float* c, int numElements) {
    int idx = blockDim.x * blockIdx.x + threadIdx.x;
    if (idx < numElements) {
        c[idx] = a[idx] + b[idx];
    }
}

int main() {
    float *h_a, *h_b, *h_c;
    float *d_a, *d_b, *d_c;
    size_t size = N * sizeof(float);

    // Allocate host memory (pageable)
    h_a = (float*)malloc(size);
    h_b = (float*)malloc(size);
    h_c = (float*)malloc(size);

    // Initialize host arrays
    for (int i = 0; i < N; i++) {
        h_a[i] = rand() / (float)RAND_MAX;
        h_b[i] = rand() / (float)RAND_MAX;
    }

    // Allocate device memory
    hipMalloc(&d_a, size);
    hipMalloc(&d_b, size);
    hipMalloc(&d_c, size);

    // Create streams
    hipStream_t streams[STREAMS];
    for (int i = 0; i < STREAMS; i++) {
        hipStreamCreate(&streams[i]);
    }

    // Launch kernel on multiple streams
    // (N is an exact multiple of STREAMS here, so no elements are left over)
    int streamSize = N / STREAMS;
    dim3 blockSize(256);
    dim3 gridSize((streamSize + blockSize.x - 1) / blockSize.x);

    // Each stream copies its chunk in, runs the kernel on it, and copies the result out
    for (int i = 0; i < STREAMS; i++) {
        int offset = i * streamSize;
        hipMemcpyAsync(&d_a[offset], &h_a[offset], streamSize * sizeof(float), hipMemcpyHostToDevice, streams[i]);
        hipMemcpyAsync(&d_b[offset], &h_b[offset], streamSize * sizeof(float), hipMemcpyHostToDevice, streams[i]);
        hipLaunchKernelGGL(vectorAdd, gridSize, blockSize, 0, streams[i], &d_a[offset], &d_b[offset], &d_c[offset], streamSize);
        hipMemcpyAsync(&h_c[offset], &d_c[offset], streamSize * sizeof(float), hipMemcpyDeviceToHost, streams[i]);
    }

    // Synchronize all streams
    for (int i = 0; i < STREAMS; i++) {
        hipStreamSynchronize(streams[i]);
    }

    // Verify results
    for (int i = 0; i < N; i++) {
        if (fabs(h_c[i] - (h_a[i] + h_b[i])) > 1e-5) {
            fprintf(stderr, "Result verification failed at element %d!\n", i);
            exit(1);
        }
    }

    printf("Test PASSED\n");

    // Clean up
    for (int i = 0; i < STREAMS; i++) {
        hipStreamDestroy(streams[i]);
    }
    hipFree(d_a);
    hipFree(d_b);
    hipFree(d_c);
    free(h_a);
    free(h_b);
    free(h_c);

    return 0;
}
```

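This example demonstrates how to use multiple streams to overlap computation and data transfer, following the same pattern as the equivalent CUDA streams code. Note that the host buffers here come from `malloc`; for the copies to actually overlap with kernel execution, the host memory generally needs to be pinned (for example, allocated with `hipHostMalloc`). The key points are: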

1. Creating multiple streams
2. Using `hipMemcpyAsync` for asynchronous data transfer
3. Launching kernels on specific streams
4. Synchronizing streams after all operations are queued

By using streams, you can potentially improve performance by overlapping operations and utilizing the GPU more efficiently.
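The sample assumes `N` is an exact multiple of `STREAMS`; with integer division, any leftover elements would be skipped. As a rough illustration of the per-stream arithmetic (offset, element count, grid size), here is a small plain-Python sketch, not HIP code. The function name `partition` and the ceiling-division split are assumptions made for this illustration and are not part of the sample above.

```python
def partition(n, streams, block_size=256):
    """Split n elements into per-stream (offset, count, grid_size) chunks.

    Hypothetical helper: uses ceiling division so that the last chunk
    absorbs the remainder when n is not a multiple of streams.
    """
    base = -(-n // streams)  # ceiling division: elements per stream
    chunks = []
    for i in range(streams):
        offset = i * base
        count = max(0, min(base, n - offset))  # last chunk may be shorter
        grid = -(-count // block_size)         # blocks needed for this chunk
        chunks.append((offset, count, grid))
    return chunks


# Same numbers as the HIP sample: N = 1000000, STREAMS = 4, 256 threads per block
for offset, count, grid in partition(1_000_000, 4):
    print(offset, count, grid)
```

With the sample's values every stream gets 250000 elements and 977 blocks. With a size such as 1000003, the last chunk would simply be one element shorter than the others.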

hipcc install on CUDA system, version 2

Please follow this process.

1. First, add the ROCm repository (if you haven't already):
wget -qO - https://repo.radeon.com/rocm/rocm.gpg.key | sudo apt-key add -
echo 'deb [arch=amd64] https://repo.radeon.com/rocm/apt/debian/ ubuntu main' | sudo tee /etc/apt/sources.list.d/rocm.list

2. Update your package list:
sudo apt update

3. Install only the HIP compiler and development tools:
sudo apt install hip-base hip-doc
This should install the basic HIP tools without the full runtime that caused issues before.

4. After installation, add the HIP binaries to your PATH. Add this line to your ~/.bashrc file:
export PATH=$PATH:/opt/rocm/bin

5. Then, apply the changes:
source ~/.bashrc

6. Verify the installation:
hipcc --version

Install HIP (ROCm) compiler on CUDA system.


Try this process.


1. First, add the ROCm repository to your system. For Ubuntu, you can use these commands:

wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | sudo apt-key add -
echo 'deb [arch=amd64] https://repo.radeon.com/rocm/apt/debian/ ubuntu main' | sudo tee /etc/apt/sources.list.d/rocm.list


2. Update your package list:

sudo apt update


3. Install the HIP runtime and compiler for CUDA:

sudo apt install hip-runtime-nvidia hip-dev


4. Set up environment variables. Add these lines to your `~/.bashrc` file:

export HIP_PLATFORM=nvidia
export PATH=$PATH:/opt/rocm/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/rocm/lib

Then run `source ~/.bashrc` to apply the changes.


5. Verify the installation:

hipconfig --full


6. Now try compiling your code again:

hipcc vector_add.cpp -o vector_add


8/04/2024

Object detection data coordinate conversion, (polygon, center) -> (left, top, right, bottom)

 

I recently downloaded some object detection datasets from Roboflow.

But the coordinate formats are inconsistent: some annotations are polygons, and some are center x, y plus width, height.
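As a minimal sketch of how the two formats can be told apart, the number of values after the label is enough in most cases. The function name `classify_annotation` and the sample lines are hypothetical, and requiring at least three points for a polygon is an assumption of this sketch.

```python
def classify_annotation(line):
    """Guess the annotation type from the number of values after the label."""
    values = [float(v) for v in line.split()[1:]]
    if len(values) == 4:
        return "center_box"  # center_x, center_y, width, height
    if len(values) >= 6 and len(values) % 2 == 0:
        return "polygon"     # x1 y1 x2 y2 ... (assumed: at least three points)
    return "unknown"


# Hypothetical annotation lines, all values normalized to [0, 1]
print(classify_annotation("0 0.5 0.5 0.2 0.4"))
print(classify_annotation("1 0.1 0.1 0.3 0.1 0.3 0.4 0.1 0.4"))
```

One limitation of counting values: a polygon with exactly two points would also have four values and would be read as a center box.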

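My favourite coordinate format is left, top, right, bottom.

So this code figures out the coordinate type of each annotation and converts it to a normalized [left, top, right, bottom] box. Note that the script writes the values in the order label, left, right, top, bottom, always writes label 0, and overwrites the original .txt file.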
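The conversion itself is simple arithmetic, sketched here on a single box and a single polygon. The helper names `center_to_ltrb` and `polygon_to_ltrb` are hypothetical, and this sketch returns values in the order left, top, right, bottom. Normalized inputs stay normalized.

```python
def center_to_ltrb(cx, cy, w, h):
    """Convert (center_x, center_y, width, height) to (left, top, right, bottom)."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def polygon_to_ltrb(points):
    """Return the axis-aligned bounding box of a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


# A box centered at (0.5, 0.5), 0.25 wide and 0.5 tall
print(center_to_ltrb(0.5, 0.5, 0.25, 0.5))
# (0.375, 0.25, 0.625, 0.75)

# A triangle given as three (x, y) points
print(polygon_to_ltrb([(0.25, 0.125), (0.75, 0.25), (0.5, 0.875)]))
# (0.25, 0.125, 0.75, 0.875)
```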

Thank you.



.

import cv2
import numpy as np
import os
from tqdm import tqdm

def convert_and_write_coordinates(txt_path, rect_coords, poly_coords, w, h):
    """Converts coordinates to normalized bounding boxes and writes them back.

    Each output line has the form: label left right top bottom (the label is always written as 0).
    """
    with open(txt_path, 'w') as file:
        # Convert (center_x, center_y, width, height) rectangles to box edges in pixels
        for coord_set in rect_coords:
            center_x, center_y, width, height = coord_set
            left = (center_x - width / 2) * w
            top = (center_y - height / 2) * h
            right = (center_x + width / 2) * w
            bottom = (center_y + height / 2) * h

            # Normalize coordinates
            left_norm = left / w
            top_norm = top / h
            right_norm = right / w
            bottom_norm = bottom / h

            # Write the rectangle in normalized format (order: left, right, top, bottom)
            file.write(f"0 {left_norm} {right_norm} {top_norm} {bottom_norm}\n")

        # Convert polygons to their bounding boxes
        for points in poly_coords:
            # Convert polygon points to a bounding box in pixels
            x_coords = [x * w for x, y in points]
            y_coords = [y * h for x, y in points]
            left = min(x_coords)
            top = min(y_coords)
            right = max(x_coords)
            bottom = max(y_coords)

            # Normalize coordinates
            left_norm = left / w
            top_norm = top / h
            right_norm = right / w
            bottom_norm = bottom / h

            # Write the polygon's bounding box in normalized format (order: left, right, top, bottom)
            file.write(f"0 {left_norm} {right_norm} {top_norm} {bottom_norm}\n")

def draw_annotations(image_path, txt_path):
    """Reads the annotations for one image and rewrites them as bounding boxes (nothing is drawn)."""
    # Load the image
    image = cv2.imread(image_path)
    if image is None:
        print(f"Failed to load image: {image_path}")
        return False

    # Get image dimensions
    h, w, _ = image.shape

    # Read the annotation data from the corresponding text file
    rect_coords = []  # To store rectangle coordinates
    poly_coords = []  # To store polygon coordinates
    with open(txt_path, 'r') as file:
        lines = file.readlines()
        for line in lines:
            parts = line.strip().split()
            if not parts:  # Skip empty lines
                continue
            label = int(parts[0])  # Extract the label index (not used when writing)
            coords = [float(p) for p in parts[1:]]

            if len(coords) == 4:  # Assuming it's (center_x, center_y, width, height)
                rect_coords.append((coords[0], coords[1], coords[2], coords[3]))
            elif len(coords) % 2 == 0:  # Assuming pairs of coordinates for a polygon
                points = [(coords[i], coords[i + 1]) for i in range(0, len(coords), 2)]
                poly_coords.append(points)

    # Write the new coordinates to the text file
    convert_and_write_coordinates(txt_path, rect_coords, poly_coords, w, h)
    return True

def process_images_in_folder(folder_path):
    image_files = [f for f in os.listdir(folder_path) if f.lower().endswith(('.jpg', '.png'))]
    for filename in tqdm(image_files, desc="Processing images"):
        image_path = os.path.join(folder_path, filename)
        txt_path = os.path.join(folder_path, os.path.splitext(filename)[0] + '.txt')

        if os.path.isfile(txt_path):
            draw_annotations(image_path, txt_path)
        else:
            print(f"No corresponding text file found for image: {filename}")

def main():
    # Specify the directory containing the images and text files
    folder_path = './val_indonesia_roboflow_2024_08_03/'

    # Process the images in the folder
    process_images_in_folder(folder_path)

if __name__ == "__main__":
    main()

..



bonus code 

Display image + annotation rect box

.

import cv2
import os
from tqdm import tqdm

def display_image_with_annotations(image_path, txt_path):
    """Display the image with rectangle annotations."""
    # Load the image
    image = cv2.imread(image_path)
    if image is None:
        print(f"Failed to load image: {image_path}")
        return False

    # Get image dimensions
    h, w, _ = image.shape

    # Read the annotation data from the text file
    with open(txt_path, 'r') as file:
        lines = file.readlines()
        for line in lines:
            parts = line.strip().split()
            if len(parts) == 5:  # Check for valid rectangle data
                # Extract label and normalized coordinates (file order: left, right, top, bottom)
                label = int(parts[0])  # The label is currently unused; the converter always writes 0
                left_norm, right_norm, top_norm, bottom_norm = map(float, parts[1:])

                # Convert normalized coordinates to absolute pixel coordinates
                left = int(left_norm * w)
                right = int(right_norm * w)
                top = int(top_norm * h)
                bottom = int(bottom_norm * h)

                # Draw rectangle on the image
                cv2.rectangle(image, (left, top), (right, bottom), (255, 0, 0), 2)

    # Display the image with annotations
    cv2.imshow('Annotated Image', image)
    key = cv2.waitKey(0)
    cv2.destroyAllWindows()

    # Return True if 'q' was pressed, otherwise False
    return key == ord('q')

def process_images_in_folder(folder_path):
    image_files = [f for f in os.listdir(folder_path) if f.lower().endswith(('.jpg', '.png'))]
    for filename in tqdm(image_files, desc="Processing images"):
        image_path = os.path.join(folder_path, filename)
        txt_path = os.path.join(folder_path, os.path.splitext(filename)[0] + '.txt')

        if os.path.isfile(txt_path):
            if display_image_with_annotations(image_path, txt_path):
                print("Exiting image display loop.")
                break
        else:
            print(f"No corresponding text file found for image: {filename}")

def main():
    # Specify the directory containing the images and text files
    folder_path = './train_indonesia_roboflow_2024_08_03/'

    # Process the images in the folder
    process_images_in_folder(folder_path)

if __name__ == "__main__":
    main()

..



marearts

🙇🏻‍♂️