8/22/2025

ONNX Runtime with ROCm (AMD GPU) Setup Guide

Installation

Prerequisites

  • ROCm installed (6.0+ recommended)
  • Python 3.8-3.10
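Before installing, both prerequisites can be checked quickly from Python (a minimal sketch; it assumes rocm-smi is already on your PATH):

import subprocess
import sys

# The onnxruntime-rocm wheels target Python 3.8-3.10
print("Python:", sys.version.split()[0])

# rocm-smi prints driver and GPU status when ROCm is installed
print(subprocess.run(["rocm-smi"], capture_output=True, text=True).stdout)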

Install ONNX Runtime with ROCm Support

# 1. Remove existing ONNX Runtime (if any)
pip uninstall -y onnxruntime onnxruntime-gpu

# 2. Install from AMD ROCm repository
# For ROCm 6.4
pip install onnxruntime-rocm -f https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4/

# For ROCm 6.2
pip install onnxruntime-rocm -f https://repo.radeon.com/rocm/manylinux/rocm-rel-6.2/

# For ROCm 6.0
pip install onnxruntime-rocm -f https://repo.radeon.com/rocm/manylinux/rocm-rel-6.0/

Verify Installation

import onnxruntime as ort

# Check available providers
print("Available providers:", ort.get_available_providers())

# Should show: ['MIGraphXExecutionProvider', 'ROCMExecutionProvider', 'CPUExecutionProvider']
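As a complementary check, onnxruntime.get_device() reports which device the installed build targets; on a GPU-enabled (ROCm) build this typically reports 'GPU':

import onnxruntime as ort

print("Version:", ort.__version__)  # should match the wheel you installed
print("Device:", ort.get_device())  # 'GPU' on GPU-enabled builds, 'CPU' otherwise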

Simple Usage Example

import onnxruntime as ort
import numpy as np

# Load ONNX model with ROCm
providers = ['ROCMExecutionProvider', 'CPUExecutionProvider']
session = ort.InferenceSession("model.onnx", providers=providers)

# Check which provider is being used
print(f"Using: {session.get_providers()[0]}")

# Prepare input (example: batch_size=1, 3 channels, 640x640 image)
input_data = np.random.randn(1, 3, 640, 640).astype(np.float32)

# Run inference
input_name = session.get_inputs()[0].name
output = session.run(None, {input_name: input_data})

print(f"Output shape: {output[0].shape}")

Advanced: Using MIGraphX (AMD Optimized)

# MIGraphX is AMD's optimized graph execution provider
# It can be faster than ROCMExecutionProvider for some models

providers = [
    'MIGraphXExecutionProvider',  # Often fastest on AMD
    'ROCMExecutionProvider',       # Standard ROCm
    'CPUExecutionProvider'         # Fallback
]

session = ort.InferenceSession("model.onnx", providers=providers)
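MIGraphX also accepts provider options. The option names below (device_id, migraphx_fp16_enable) are assumptions based on recent ONNX Runtime releases; verify them against your installed version:

import onnxruntime as ort

# migraphx_fp16_enable asks MIGraphX to run the graph in FP16 internally
providers = [
    ("MIGraphXExecutionProvider", {"device_id": 0, "migraphx_fp16_enable": 1}),
    "ROCMExecutionProvider",
    "CPUExecutionProvider",
]
session = ort.InferenceSession("model.onnx", providers=providers)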

Complete Example: Image Detection

import onnxruntime as ort
import numpy as np
import cv2

def load_model(model_path, use_gpu=True):
    """Load ONNX model with ROCm support"""
    if use_gpu:
        providers = ['MIGraphXExecutionProvider', 'ROCMExecutionProvider', 'CPUExecutionProvider']
    else:
        providers = ['CPUExecutionProvider']
    
    session = ort.InferenceSession(model_path, providers=providers)
    print(f"Model loaded with: {session.get_providers()[0]}")
    return session

def preprocess_image(image_path, size=640):
    """Preprocess image for inference"""
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    resized = cv2.resize(image, (size, size))
    rgb = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB)
    normalized = rgb.astype(np.float32) / 255.0
    transposed = normalized.transpose(2, 0, 1)  # HWC to CHW
    batched = np.expand_dims(transposed, axis=0)  # Add batch dimension
    return batched, image

def run_inference(session, input_data):
    """Run model inference"""
    input_name = session.get_inputs()[0].name
    outputs = session.run(None, {input_name: input_data})
    return outputs

# Usage
model = load_model("rtdetr_fp32.onnx", use_gpu=True)
input_data, original_image = preprocess_image("test.jpg")
outputs = run_inference(model, input_data)

print(f"Detection output shape: {outputs[0].shape}")

Troubleshooting

1. ROCMExecutionProvider not available

# Check ROCm installation
import subprocess
try:
    result = subprocess.run(['rocm-smi'], capture_output=True, text=True)
    print(result.stdout)
except FileNotFoundError:
    print("rocm-smi not found - is ROCm installed and on PATH?")

2. Fallback to CPU

If ONNX Runtime falls back to CPU despite ROCm being installed (a programmatic check is sketched after this list):

  • Check ROCm version compatibility
  • Verify GPU is visible: rocm-smi
  • Set environment variable: export HIP_VISIBLE_DEVICES=0
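This check confirms which provider the session actually bound to ("model.onnx" is a placeholder path):

import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",  # placeholder path
    providers=["ROCMExecutionProvider", "CPUExecutionProvider"],
)

# Providers that failed to initialize are silently dropped from this list
if "ROCMExecutionProvider" not in session.get_providers():
    print("Fell back to CPU - check ROCm and onnxruntime-rocm version compatibility")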

3. Performance Tips

  • Use MIGraphXExecutionProvider for best performance on AMD GPUs
  • FP16 models can be faster but may have slight accuracy loss
  • Batch processing improves throughput (see the batching sketch below)
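As a sketch of the batching tip, several preprocessed images can be stacked into a single call. This assumes the model was exported with a dynamic batch dimension and reuses the session from earlier:

import numpy as np

# Stack 8 preprocessed CHW images into a single (8, 3, 640, 640) batch
images = [np.random.randn(3, 640, 640).astype(np.float32) for _ in range(8)]
batch = np.stack(images, axis=0)

input_name = session.get_inputs()[0].name
outputs = session.run(None, {input_name: batch})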

Environment Variables

# Select specific GPU
export HIP_VISIBLE_DEVICES=0

# Enable verbose logging
export ORT_ROCM_VERBOSE_LEVEL=1

# Set memory limit (in MB)
export ORT_ROCM_MEM_LIMIT=4096
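These can also be set from Python, as long as it happens before the session is created (a minimal sketch; per-session memory limits can alternatively be set through the gpu_mem_limit provider option):

import os

# Must be set before the ROCm runtime initializes (i.e. before session creation)
os.environ["HIP_VISIBLE_DEVICES"] = "0"

import onnxruntime as ort
session = ort.InferenceSession("model.onnx", providers=["ROCMExecutionProvider"])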

Performance Comparison

Provider                    Relative Speed   Use Case
MIGraphXExecutionProvider   Fastest          Production, optimized models
ROCMExecutionProvider       Fast             General purpose
CPUExecutionProvider        Slowest          Fallback, debugging

Notes

  • ONNX Runtime ROCm version should match your ROCm installation
  • Not all ONNX operators are supported on ROCm - unsupported ops fall back to CPU
  • For best performance, export models with static shapes (see the export sketch below)
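For the static-shapes note: when exporting from PyTorch, omitting the dynamic_axes argument to torch.onnx.export bakes every dimension in as static. A minimal sketch with a stand-in model (TinyNet is hypothetical; substitute your own module):

import torch

class TinyNet(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.relu(x)

# No dynamic_axes argument: every input dimension is fixed (static shapes)
model = TinyNet().eval()
dummy = torch.randn(1, 3, 640, 640)
torch.onnx.export(model, dummy, "model_static.onnx", opset_version=17)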
