8/22/2025

ONNX Runtime with ROCm (AMD GPU) Setup Guide

Installation

Prerequisites

  • ROCm installed (6.0+ recommended)
  • Python 3.8-3.10
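Before installing, both prerequisites can be checked quickly from Python (a minimal sketch; it assumes rocm-smi is already on your PATH):

import subprocess
import sys

# The onnxruntime-rocm wheels target Python 3.8-3.10
print("Python:", sys.version.split()[0])

# rocm-smi prints driver and GPU status when ROCm is installed
print(subprocess.run(["rocm-smi"], capture_output=True, text=True).stdout)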

Install ONNX Runtime with ROCm Support

# 1. Remove existing ONNX Runtime (if any)
pip uninstall -y onnxruntime onnxruntime-gpu

# 2. Install from AMD ROCm repository
# For ROCm 6.4
pip install onnxruntime-rocm -f https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4/

# For ROCm 6.2
pip install onnxruntime-rocm -f https://repo.radeon.com/rocm/manylinux/rocm-rel-6.2/

# For ROCm 6.0
pip install onnxruntime-rocm -f https://repo.radeon.com/rocm/manylinux/rocm-rel-6.0/

Verify Installation

import onnxruntime as ort

# Check available providers
print("Available providers:", ort.get_available_providers())

# Should show: ['MIGraphXExecutionProvider', 'ROCMExecutionProvider', 'CPUExecutionProvider']
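As a complementary check, onnxruntime.get_device() reports which device the installed build targets; on a GPU-enabled (ROCm) build this typically reports 'GPU':

import onnxruntime as ort

print("Version:", ort.__version__)  # should match the wheel you installed
print("Device:", ort.get_device())  # 'GPU' on GPU-enabled builds, 'CPU' otherwise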

Simple Usage Example

import onnxruntime as ort
import numpy as np

# Load ONNX model with ROCm
providers = ['ROCMExecutionProvider', 'CPUExecutionProvider']
session = ort.InferenceSession("model.onnx", providers=providers)

# Check which provider is being used
print(f"Using: {session.get_providers()[0]}")

# Prepare input (example: batch_size=1, 3 channels, 640x640 image)
input_data = np.random.randn(1, 3, 640, 640).astype(np.float32)

# Run inference
input_name = session.get_inputs()[0].name
output = session.run(None, {input_name: input_data})

print(f"Output shape: {output[0].shape}")

Advanced: Using MIGraphX (AMD Optimized)

# MIGraphX is AMD's optimized graph execution provider
# It can be faster than ROCMExecutionProvider for some models

providers = [
    'MIGraphXExecutionProvider',  # Often fastest on AMD
    'ROCMExecutionProvider',       # Standard ROCm
    'CPUExecutionProvider'         # Fallback
]

session = ort.InferenceSession("model.onnx", providers=providers)
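MIGraphX also accepts provider options. The option names below (device_id, migraphx_fp16_enable) are assumptions based on recent ONNX Runtime releases; verify them against your installed version:

import onnxruntime as ort

# migraphx_fp16_enable asks MIGraphX to run the graph in FP16 internally
providers = [
    ("MIGraphXExecutionProvider", {"device_id": 0, "migraphx_fp16_enable": 1}),
    "ROCMExecutionProvider",
    "CPUExecutionProvider",
]
session = ort.InferenceSession("model.onnx", providers=providers)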

Complete Example: Image Detection

import onnxruntime as ort
import numpy as np
import cv2

def load_model(model_path, use_gpu=True):
    """Load ONNX model with ROCm support"""
    if use_gpu:
        providers = ['MIGraphXExecutionProvider', 'ROCMExecutionProvider', 'CPUExecutionProvider']
    else:
        providers = ['CPUExecutionProvider']
    
    session = ort.InferenceSession(model_path, providers=providers)
    print(f"Model loaded with: {session.get_providers()[0]}")
    return session

def preprocess_image(image_path, size=640):
    """Preprocess image for inference"""
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    resized = cv2.resize(image, (size, size))
    rgb = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB)
    normalized = rgb.astype(np.float32) / 255.0
    transposed = normalized.transpose(2, 0, 1)  # HWC to CHW
    batched = np.expand_dims(transposed, axis=0)  # Add batch dimension
    return batched, image

def run_inference(session, input_data):
    """Run model inference"""
    input_name = session.get_inputs()[0].name
    outputs = session.run(None, {input_name: input_data})
    return outputs

# Usage
model = load_model("rtdetr_fp32.onnx", use_gpu=True)
input_data, original_image = preprocess_image("test.jpg")
outputs = run_inference(model, input_data)

print(f"Detection output shape: {outputs[0].shape}")

Troubleshooting

1. ROCMExecutionProvider not available

# Check ROCm installation
import subprocess
try:
    result = subprocess.run(['rocm-smi'], capture_output=True, text=True)
    print(result.stdout)
except FileNotFoundError:
    print("rocm-smi not found - is ROCm installed and on PATH?")

2. Fallback to CPU

If ONNX Runtime falls back to CPU despite ROCm being installed (a programmatic check is sketched after this list):

  • Check ROCm version compatibility
  • Verify GPU is visible: rocm-smi
  • Set environment variable: export HIP_VISIBLE_DEVICES=0
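This check confirms which provider the session actually bound to ("model.onnx" is a placeholder path):

import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",  # placeholder path
    providers=["ROCMExecutionProvider", "CPUExecutionProvider"],
)

# Providers that failed to initialize are silently dropped from this list
if "ROCMExecutionProvider" not in session.get_providers():
    print("Fell back to CPU - check ROCm and onnxruntime-rocm version compatibility")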

3. Performance Tips

  • Use MIGraphXExecutionProvider for best performance on AMD GPUs
  • FP16 models can be faster but may have slight accuracy loss
  • Batch processing improves throughput (see the batching sketch below)
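As a sketch of the batching tip, several preprocessed images can be stacked into a single call. This assumes the model was exported with a dynamic batch dimension and reuses the session from earlier:

import numpy as np

# Stack 8 preprocessed CHW images into a single (8, 3, 640, 640) batch
images = [np.random.randn(3, 640, 640).astype(np.float32) for _ in range(8)]
batch = np.stack(images, axis=0)

input_name = session.get_inputs()[0].name
outputs = session.run(None, {input_name: batch})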

Environment Variables

# Select specific GPU
export HIP_VISIBLE_DEVICES=0

# Enable verbose logging
export ORT_ROCM_VERBOSE_LEVEL=1

# Set memory limit (in MB)
export ORT_ROCM_MEM_LIMIT=4096
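These can also be set from Python, as long as it happens before the session is created (a minimal sketch; per-session memory limits can alternatively be set through the gpu_mem_limit provider option):

import os

# Must be set before the ROCm runtime initializes (i.e. before session creation)
os.environ["HIP_VISIBLE_DEVICES"] = "0"

import onnxruntime as ort
session = ort.InferenceSession("model.onnx", providers=["ROCMExecutionProvider"])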

Performance Comparison

Provider                    Relative Speed   Use Case
MIGraphXExecutionProvider   Fastest          Production, optimized models
ROCMExecutionProvider       Fast             General purpose
CPUExecutionProvider        Slowest          Fallback, debugging

Notes

  • ONNX Runtime ROCm version should match your ROCm installation
  • Not all ONNX operators are supported on ROCm - unsupported ops fall back to CPU
  • For best performance, export models with static shapes (see the export sketch below)
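For the static-shapes note: when exporting from PyTorch, omitting the dynamic_axes argument to torch.onnx.export bakes every dimension in as static. A minimal sketch with a stand-in model (TinyNet is hypothetical; substitute your own module):

import torch

class TinyNet(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.relu(x)

# No dynamic_axes argument: every input dimension is fixed (static shapes)
model = TinyNet().eval()
dummy = torch.randn(1, 3, 640, 640)
torch.onnx.export(model, dummy, "model_static.onnx", opset_version=17)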
