MareArts Computer Vision Study.: 2024.08

8/22/2024

ROCm HIP asynchronous operation sample code

HIP (Heterogeneous-Compute Interface for Portability) provides similar functionality to CUDA streams for asynchronous execution. The concepts and usage are very similar, making it easier to port CUDA code to HIP. Here's an overview of HIP's equivalent features for asynchronous execution:

1. HIP Streams:
   In HIP, streams are represented by the `hipStream_t` type, which is analogous to CUDA's `cudaStream_t`.

2. Creating and Destroying Streams:
   ```cpp
   hipStream_t stream;
   hipError_t hipStreamCreate(hipStream_t* stream);
   hipError_t hipStreamDestroy(hipStream_t stream);
   ```

3. Asynchronous Memory Operations:
   ```cpp
   hipError_t hipMemcpyAsync(void* dst, const void* src, size_t count, hipMemcpyKind kind, hipStream_t stream);
   hipError_t hipMemsetAsync(void* dst, int value, size_t count, hipStream_t stream);
   ```

4. Launching Kernels on Streams:
   ```cpp
   hipLaunchKernelGGL(kernel, dim3(gridSize), dim3(blockSize), 0, stream, /* kernel arguments */);
   ```

5. Stream Synchronization:
   ```cpp
   hipError_t hipStreamSynchronize(hipStream_t stream);
   hipError_t hipDeviceSynchronize();
   ```

6. Stream Query:
   ```cpp
   hipError_t hipStreamQuery(hipStream_t stream);
   ```

7. Stream Callbacks:
   ```cpp
   hipError_t hipStreamAddCallback(hipStream_t stream, hipStreamCallback_t callback, void* userData, unsigned int flags);
   ```

8. Stream Priorities:
   ```cpp
   hipError_t hipStreamCreateWithPriority(hipStream_t* stream, unsigned int flags, int priority);
   ```

Here's a simple example demonstrating asynchronous execution with HIP streams:

```cpp
#include <hip/hip_runtime.h>
#include <math.h>    // fabs
#include <stdio.h>   // printf, fprintf
#include <stdlib.h>  // malloc, free, rand, exit

#define N 1000000
#define STREAMS 4

__global__ void vectorAdd(float* a, float* b, float* c, int numElements) {
    int idx = blockDim.x * blockIdx.x + threadIdx.x;
    if (idx < numElements) {
        c[idx] = a[idx] + b[idx];
    }
}

int main() {
    float *h_a, *h_b, *h_c;
    float *d_a, *d_b, *d_c;
    size_t size = N * sizeof(float);

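    // Allocate host memory (pageable; pinned memory from hipHostMalloc is
    // generally needed for the async copies to actually overlap with kernels)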
    h_a = (float*)malloc(size);
    h_b = (float*)malloc(size);
    h_c = (float*)malloc(size);

    // Initialize host arrays
    for (int i = 0; i < N; i++) {
        h_a[i] = rand() / (float)RAND_MAX;
        h_b[i] = rand() / (float)RAND_MAX;
    }

    // Allocate device memory
    hipMalloc(&d_a, size);
    hipMalloc(&d_b, size);
    hipMalloc(&d_c, size);

    // Create streams
    hipStream_t streams[STREAMS];
    for (int i = 0; i < STREAMS; i++) {
        hipStreamCreate(&streams[i]);
    }

    // Launch kernel on multiple streams
    int streamSize = N / STREAMS;
    dim3 blockSize(256);
    dim3 gridSize((streamSize + blockSize.x - 1) / blockSize.x);

    for (int i = 0; i < STREAMS; i++) {
        int offset = i * streamSize;
        hipMemcpyAsync(&d_a[offset], &h_a[offset], streamSize * sizeof(float), hipMemcpyHostToDevice, streams[i]);
        hipMemcpyAsync(&d_b[offset], &h_b[offset], streamSize * sizeof(float), hipMemcpyHostToDevice, streams[i]);
        hipLaunchKernelGGL(vectorAdd, gridSize, blockSize, 0, streams[i], &d_a[offset], &d_b[offset], &d_c[offset], streamSize);
        hipMemcpyAsync(&h_c[offset], &d_c[offset], streamSize * sizeof(float), hipMemcpyDeviceToHost, streams[i]);
    }

    // Synchronize all streams
    for (int i = 0; i < STREAMS; i++) {
        hipStreamSynchronize(streams[i]);
    }

    // Verify results
    for (int i = 0; i < N; i++) {
        if (fabs(h_c[i] - (h_a[i] + h_b[i])) > 1e-5) {
            fprintf(stderr, "Result verification failed at element %d!\n", i);
            exit(1);
        }
    }

    printf("Test PASSED\n");

    // Clean up
    for (int i = 0; i < STREAMS; i++) {
        hipStreamDestroy(streams[i]);
    }
    hipFree(d_a);
    hipFree(d_b);
    hipFree(d_c);
    free(h_a);
    free(h_b);
    free(h_c);

    return 0;
}
```

This example shows how to use multiple streams to overlap computation and data transfer, in the same way as with CUDA streams. The key points are:

1. Creating multiple streams
2. Using `hipMemcpyAsync` for asynchronous data transfer
3. Launching kernels on specific streams
4. Synchronizing streams after all operations are queued

By using streams, you can potentially improve performance by overlapping operations and utilizing the GPU more efficiently.
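
The example above splits the data evenly because `N` is a multiple of `STREAMS`. As a rough sketch of the same arithmetic for the general case, here is the offset, element count and grid size per stream written in Python (used only to illustrate the index math, not HIP itself). `plan_chunks` is a hypothetical helper name, and spreading the remainder over the first streams is just one possible choice.

```python
def plan_chunks(n, streams, block_size=256):
    """Return a list of (offset, count, grid_size), one entry per stream."""
    base, rem = divmod(n, streams)
    plan = []
    offset = 0
    for i in range(streams):
        # The first `rem` streams take one extra element each.
        count = base + (1 if i < rem else 0)
        # Ceiling division, same as (count + blockSize.x - 1) / blockSize.x in the HIP code.
        grid = (count + block_size - 1) // block_size
        plan.append((offset, count, grid))
        offset += count
    return plan


if __name__ == "__main__":
    for offset, count, grid in plan_chunks(1_000_000, 4):
        print(offset, count, grid)
```

In the HIP code, each tuple would map to the `offset`, the copy size (`count * sizeof(float)`) and the `gridSize` used for that stream.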

hipcc install on CUDA system, version 2

Please follow this process.

1. First, add the ROCm repository (if you haven't already):

   wget -qO - https://repo.radeon.com/rocm/rocm.gpg.key | sudo apt-key add -

   echo 'deb [arch=amd64] https://repo.radeon.com/rocm/apt/debian/ ubuntu main' | sudo tee /etc/apt/sources.list.d/rocm.list

2. Update your package list:

   sudo apt update

3. Install only the HIP compiler and development tools:

   sudo apt install hip-base hip-doc

   This should install the basic HIP tools without the full runtime that caused issues before.

4. After installation, add the HIP binaries to your PATH. Add this line to your ~/.bashrc file:

   export PATH=$PATH:/opt/rocm/bin

5. Then, apply the changes:

   source ~/.bashrc

6. Verify the installation:

   hipcc --version

Install the HIP (ROCm) compiler on a CUDA system.

Try this process.

1. First, add the ROCm repository to your system. For Ubuntu, you can use these commands:

   wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | sudo apt-key add -

   echo 'deb [arch=amd64] https://repo.radeon.com/rocm/apt/debian/ ubuntu main' | sudo tee /etc/apt/sources.list.d/rocm.list

2. Update your package list:

   sudo apt update

3. Install the HIP runtime and compiler for CUDA:

   sudo apt install hip-runtime-nvidia hip-dev

4. Set up environment variables. Add these lines to your `~/.bashrc` file:

   export HIP_PLATFORM=nvidia

   export PATH=$PATH:/opt/rocm/bin

   export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/rocm/lib

   Then run `source ~/.bashrc` to apply the changes.

5. Verify the installation:

   hipconfig --full

6. Now try compiling your code again:

   hipcc vector_add.cpp -o vector_add
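
If the compile step still fails, it can help to confirm what the shell actually sees. The following is a minimal sketch in Python, assuming the setup described above; `check_hip_setup` is a hypothetical helper (not part of ROCm) that only looks for the `hipcc` and `hipconfig` executables on `PATH` and reads `HIP_PLATFORM`.

```python
import os
import shutil


def check_hip_setup(env=None):
    """Report where hipcc/hipconfig are found and what HIP_PLATFORM is set to."""
    env = os.environ if env is None else env
    search_path = env.get("PATH")
    return {
        "hipcc": shutil.which("hipcc", path=search_path),
        "hipconfig": shutil.which("hipconfig", path=search_path),
        "HIP_PLATFORM": env.get("HIP_PLATFORM"),
    }


if __name__ == "__main__":
    for name, value in check_hip_setup().items():
        print(f"{name}: {value if value is not None else 'NOT FOUND / NOT SET'}")
```

A `None` for `hipcc` usually means `/opt/rocm/bin` is not on `PATH` in the current shell, for example because `~/.bashrc` has not been sourced yet.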

8/04/2024

Object detection data coordinate conversion: (polygon, center) -> (left, top, right, bottom)

I recently downloaded some object detection datasets from Roboflow.

But the coordinates are inconsistent: some annotations are polygons, and some are center x, y with width and height.

My favourite format is left, top, right, bottom.

So this code figures out the coordinate type of each annotation and converts it to a [left, top, right, bottom] box.

Thank you.

import cv2
import numpy as np
import os
from tqdm import tqdm

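def convert_and_write_coordinates(txt_path, rect_coords, poly_coords, w, h):
    """Convert annotations to normalized corner boxes and overwrite txt_path.

    Note the field order actually written on each line: "0 left right top bottom".
    The class label is always written as 0. The display script further down
    reads the values back in this same order.
    """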
    with open(txt_path, 'w') as file:
        # Write rectangle coordinates in normalized (label, left, top, right, bottom) format
        for coord_set in rect_coords:
            center_x, center_y, width, height = coord_set
            left = (center_x - width / 2) * w
            top = (center_y - height / 2) * h
            right = (center_x + width / 2) * w
            bottom = (center_y + height / 2) * h

            # Normalize coordinates
            left_norm = left / w
            top_norm = top / h
            right_norm = right / w
            bottom_norm = bottom / h

            # Write the rectangle coordinates in normalized format
            file.write(f"0 {left_norm} {right_norm} {top_norm} {bottom_norm}\n")

        # Write polygon coordinates in normalized (label, left, top, right, bottom) format
        for points in poly_coords:
            # Convert polygon points to bounding box
            x_coords = [x * w for x, y in points]
            y_coords = [y * h for x, y in points]
            left = min(x_coords)
            top = min(y_coords)
            right = max(x_coords)
            bottom = max(y_coords)

            # Normalize coordinates
            left_norm = left / w
            top_norm = top / h
            right_norm = right / w
            bottom_norm = bottom / h

            # Write the polygon coordinates in normalized bounding box format
            file.write(f"0 {left_norm} {right_norm} {top_norm} {bottom_norm}\n")

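def draw_annotations(image_path, txt_path):
    """Read one annotation file, convert it, and overwrite it in place.

    Despite the name, nothing is drawn; the image is loaded only for its size.
    Run this once per folder: converted files also have four values per line,
    so a second run would treat them as center-format boxes.
    """
    # Load the image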
    image = cv2.imread(image_path)
    if image is None:
        print(f"Failed to load image: {image_path}")
        return False

    # Get image dimensions
    h, w, _ = image.shape

    # Read the annotation data from the corresponding text file
    rect_coords = []  # To store rectangle coordinates
    poly_coords = []  # To store polygon coordinates
    with open(txt_path, 'r') as file:
        lines = file.readlines()
        for line in lines:
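            parts = line.strip().split()
            if not parts:  # Skip blank lines instead of failing on parts[0]
                continue
            label = int(parts[0])  # Parsed but not written back; output lines always use label 0
            coords = [float(p) for p in parts[1:]]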

            if len(coords) == 4:  # Assuming it's (center_x, center_y, width, height)
                rect_coords.append((coords[0], coords[1], coords[2], coords[3]))
            elif len(coords) % 2 == 0:  # Assuming pairs of coordinates for a polygon
                points = [(coords[i], coords[i + 1]) for i in range(0, len(coords), 2)]
                poly_coords.append(points)

    # Write the new coordinates to the text file
    convert_and_write_coordinates(txt_path, rect_coords, poly_coords, w, h)
    return True

def process_images_in_folder(folder_path):
    image_files = [f for f in os.listdir(folder_path) if f.lower().endswith(('.jpg', '.png'))]
    for filename in tqdm(image_files, desc="Processing images"):
        image_path = os.path.join(folder_path, filename)
        txt_path = os.path.join(folder_path, os.path.splitext(filename)[0] + '.txt')

        if os.path.isfile(txt_path):
            draw_annotations(image_path, txt_path)
        else:
            print(f"No corresponding text file found for image: {filename}")

def main():
    # Specify the directory containing the images and text files
    folder_path = './val_indonesia_roboflow_2024_08_03/'

    # Process the images in the folder
    process_images_in_folder(folder_path)

if __name__ == "__main__":
    main()
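
The two conversions inside the script can also be isolated as small pure functions, which makes them easy to check on known values. This is only a sketch under the same assumptions as the script (normalized coordinates, four values meaning center x, center y, width, height, and any longer even count meaning polygon points). The function names are hypothetical. Unlike the script, which writes label 0 and the order left, right, top, bottom, this version keeps the label and returns (left, top, right, bottom).

```python
def center_to_corners(cx, cy, bw, bh):
    """(center_x, center_y, width, height) -> (left, top, right, bottom)."""
    half_w, half_h = bw / 2.0, bh / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def polygon_to_corners(points):
    """List of (x, y) points -> axis-aligned bounding box (left, top, right, bottom)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def parse_line(line):
    """Return (label, box) for one annotation line, or None if it is not usable."""
    parts = line.split()
    if len(parts) < 5:
        return None
    label = int(parts[0])
    coords = [float(p) for p in parts[1:]]
    if len(coords) == 4:  # Assumed to be center format, as in the script
        return label, center_to_corners(*coords)
    if len(coords) % 2 == 0:  # Assumed to be x, y pairs of a polygon
        return label, polygon_to_corners(list(zip(coords[0::2], coords[1::2])))
    return None


if __name__ == "__main__":
    print(parse_line("2 0.5 0.5 0.25 0.5"))
    print(parse_line("1 0.25 0.5 0.75 0.125 0.5 0.875"))
```

Because no scaling by image width and height is needed when the input and output are both normalized, these functions do not need the image at all.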

Bonus code

Display the image with annotation rectangles

import cv2
import os
from tqdm import tqdm

def display_image_with_annotations(image_path, txt_path):
    """Display the image with rectangle annotations."""
    # Load the image
    image = cv2.imread(image_path)
    if image is None:
        print(f"Failed to load image: {image_path}")
        return False

    # Get image dimensions
    h, w, _ = image.shape

    # Read the annotation data from the text file
    with open(txt_path, 'r') as file:
        lines = file.readlines()
        for line in lines:
            parts = line.strip().split()
            if len(parts) == 5:  # Check for valid rectangle data
                # Extract label and normalized coordinates.
                # Field order matches the conversion script: left, right, top, bottom.
                label = int(parts[0])  # Unused here; the conversion script writes 0 for every box
                left_norm, right_norm, top_norm, bottom_norm = map(float, parts[1:])

                # Convert normalized coordinates to absolute pixel coordinates
                left = int(left_norm * w)
                right = int(right_norm * w)
                top = int(top_norm * h)
                bottom = int(bottom_norm * h)

                # Draw rectangle on the image
                cv2.rectangle(image, (left, top), (right, bottom), (255, 0, 0), 2)

    # Display the image with annotations
    cv2.imshow('Annotated Image', image)
    key = cv2.waitKey(0)
    cv2.destroyAllWindows()

    # Return True if 'q' was pressed, otherwise False
    return key == ord('q')

def process_images_in_folder(folder_path):
    image_files = [f for f in os.listdir(folder_path) if f.lower().endswith(('.jpg', '.png'))]
    for filename in tqdm(image_files, desc="Processing images"):
        image_path = os.path.join(folder_path, filename)
        txt_path = os.path.join(folder_path, os.path.splitext(filename)[0] + '.txt')

        if os.path.isfile(txt_path):
            if display_image_with_annotations(image_path, txt_path):
                print("Exiting image display loop.")
                break
        else:
            print(f"No corresponding text file found for image: {filename}")

def main():
    # Specify the directory containing the images and text files
    folder_path = './train_indonesia_roboflow_2024_08_03/'

    # Process the images in the folder
    process_images_in_folder(folder_path)

if __name__ == "__main__":
    main()
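
The normalized-to-pixel step in the display script can be sketched on its own as follows. `to_pixel_box` is a hypothetical helper; it takes the values in left, top, right, bottom order (so they must be reordered after reading a line from the file), rounds instead of truncating as the script's `int()` does, and clamps to the image bounds, which the script does not do.

```python
def to_pixel_box(left_n, top_n, right_n, bottom_n, width, height):
    """Normalized (left, top, right, bottom) -> integer pixel box inside the image."""

    def clamp(value, low, high):
        return max(low, min(high, value))

    left = clamp(int(round(left_n * width)), 0, width - 1)
    top = clamp(int(round(top_n * height)), 0, height - 1)
    right = clamp(int(round(right_n * width)), 0, width - 1)
    bottom = clamp(int(round(bottom_n * height)), 0, height - 1)
    return left, top, right, bottom


if __name__ == "__main__":
    # A box from the center-format example, on a 640x480 image.
    print(to_pixel_box(0.375, 0.25, 0.625, 0.75, 640, 480))
```

The returned tuple can be passed to `cv2.rectangle` as `(left, top)` and `(right, bottom)`.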

marearts

🙇🏻‍♂️
