ONNX Runtime with ROCm (AMD GPU) Setup Guide
Installation
Prerequisites
- ROCm installed (6.0+ recommended)
- Python 3.8-3.10 (a quick check for both prerequisites is sketched below)
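A quick sanity check for both prerequisites from Python; a minimal sketch that assumes rocm-smi is on your PATH:
import sys
import subprocess
print(sys.version)            # should report Python 3.8-3.10
subprocess.run(['rocm-smi'])  # should list your AMD GPU(s)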
Install ONNX Runtime with ROCm Support
# 1. Remove existing ONNX Runtime (if any)
pip uninstall -y onnxruntime onnxruntime-gpu
# 2. Install from AMD ROCm repository
# For ROCm 6.4
pip install onnxruntime-rocm -f https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4/
# For ROCm 6.2
pip install onnxruntime-rocm -f https://repo.radeon.com/rocm/manylinux/rocm-rel-6.2/
# For ROCm 6.0
pip install onnxruntime-rocm -f https://repo.radeon.com/rocm/manylinux/rocm-rel-6.0/
Verify Installation
import onnxruntime as ort
# Check available providers
print("Available providers:", ort.get_available_providers())
# Should show: ['MIGraphXExecutionProvider', 'ROCMExecutionProvider', 'CPUExecutionProvider']
Simple Usage Example
import onnxruntime as ort
import numpy as np
# Load ONNX model with ROCm
providers = ['ROCMExecutionProvider', 'CPUExecutionProvider']
session = ort.InferenceSession("model.onnx", providers=providers)
# Check which provider is being used
print(f"Using: {session.get_providers()[0]}")
# Prepare input (example: batch_size=1, 3 channels, 640x640 image)
input_data = np.random.randn(1, 3, 640, 640).astype(np.float32)
# Run inference
input_name = session.get_inputs()[0].name
output = session.run(None, {input_name: input_data})
print(f"Output shape: {output[0].shape}")
Advanced: Using MIGraphX (AMD Optimized)
# MIGraphX is AMD's optimized graph execution provider
# It can be faster than ROCMExecutionProvider for some models
providers = [
    'MIGraphXExecutionProvider',  # Fastest on AMD
    'ROCMExecutionProvider',      # Standard ROCm
    'CPUExecutionProvider',       # Fallback
]
session = ort.InferenceSession("model.onnx", providers=providers)
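MIGraphX also exposes its own provider options, including an FP16 mode that quantizes the graph inside MIGraphX. The option names here (device_id, migraphx_fp16_enable) come from the ONNX Runtime MIGraphX EP documentation; verify them against your installed version before relying on this sketch:
providers = [
    ('MIGraphXExecutionProvider', {
        'device_id': 0,
        'migraphx_fp16_enable': True,  # FP16 compilation; slight accuracy loss possible
    }),
    'ROCMExecutionProvider',
    'CPUExecutionProvider',
]
session = ort.InferenceSession("model.onnx", providers=providers)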
Complete Example: Image Detection
import onnxruntime as ort
import numpy as np
import cv2
def load_model(model_path, use_gpu=True):
    """Load ONNX model with ROCm support"""
    if use_gpu:
        providers = ['MIGraphXExecutionProvider', 'ROCMExecutionProvider', 'CPUExecutionProvider']
    else:
        providers = ['CPUExecutionProvider']
    session = ort.InferenceSession(model_path, providers=providers)
    print(f"Model loaded with: {session.get_providers()[0]}")
    return session
def preprocess_image(image_path, size=640):
    """Preprocess image for inference"""
    image = cv2.imread(image_path)
    if image is None:  # cv2.imread returns None instead of raising on a bad path
        raise FileNotFoundError(f"Could not read image: {image_path}")
    resized = cv2.resize(image, (size, size))
    rgb = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB)
    normalized = rgb.astype(np.float32) / 255.0
    transposed = normalized.transpose(2, 0, 1)    # HWC to CHW
    batched = np.expand_dims(transposed, axis=0)  # Add batch dimension
    return batched, image
def run_inference(session, input_data):
    """Run model inference"""
    input_name = session.get_inputs()[0].name
    outputs = session.run(None, {input_name: input_data})
    return outputs
# Usage
model = load_model("rtdetr_fp32.onnx", use_gpu=True)
input_data, original_image = preprocess_image("test.jpg")
outputs = run_inference(model, input_data)
print(f"Detection output shape: {outputs[0].shape}")
Troubleshooting
1. ROCMExecutionProvider not available
# Check ROCm installation
import subprocess
result = subprocess.run(['rocm-smi'], capture_output=True, text=True)
print(result.stdout)
2. Fallback to CPU
If ONNX Runtime falls back to CPU despite having ROCm installed (a programmatic check is sketched after this list):
- Check ROCm version compatibility
- Verify GPU is visible:
rocm-smi
- Set environment variable:
export HIP_VISIBLE_DEVICES=0
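You can also detect the fallback programmatically: get_providers() returns the providers actually attached to the session, in priority order. A minimal sketch:
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",
    providers=['ROCMExecutionProvider', 'CPUExecutionProvider'],
)
# If the ROCm EP failed to initialize, only the CPU EP remains attached
if session.get_providers()[0] != 'ROCMExecutionProvider':
    print("Warning: fell back to CPU:", session.get_providers())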
3. Performance Tips
- Use MIGraphXExecutionProvider for best performance on AMD GPUs
- FP16 models can be faster but may have slight accuracy loss
- Batch processing improves throughput (see the batching sketch below)
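A minimal batching sketch, reusing preprocess_image() and run_inference() from the complete example above. It assumes the model was exported with a dynamic batch dimension; the file names are placeholders:
paths = ["img1.jpg", "img2.jpg", "img3.jpg"]  # placeholder file names
# preprocess_image returns shape (1, 3, 640, 640); strip the batch dim and restack
frames = [preprocess_image(p)[0][0] for p in paths]
batch = np.stack(frames)                      # shape: (3, 3, 640, 640)
outputs = run_inference(model, batch)
print(f"Batched output shape: {outputs[0].shape}")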
Environment Variables
# Select specific GPU
export HIP_VISIBLE_DEVICES=0
# Enable verbose logging
export ORT_ROCM_VERBOSE_LEVEL=1
# Set memory limit (in MB)
export ORT_ROCM_MEM_LIMIT=4096
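These can also be set from Python, as long as it happens before onnxruntime initializes the device; a minimal sketch:
import os

# Must be set before onnxruntime touches the GPU, so set it before the import
os.environ['HIP_VISIBLE_DEVICES'] = '0'

import onnxruntime as ort
session = ort.InferenceSession("model.onnx", providers=['ROCMExecutionProvider'])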
Performance Comparison
| Provider | Relative Speed | Use Case |
|---|---|---|
| MIGraphXExecutionProvider | Fastest | Production, optimized models |
| ROCMExecutionProvider | Fast | General purpose |
| CPUExecutionProvider | Slowest | Fallback, debugging |
Notes
- ONNX Runtime ROCm version should match your ROCm installation
- Not all ONNX operators are supported on ROCm - unsupported ops fall back to CPU
- For best performance, export models with static shapes (an export sketch follows below)
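On the last point: when exporting from PyTorch, simply omitting dynamic_axes in torch.onnx.export fixes every dimension to the dummy input's shape. A minimal sketch with a stand-in model:
import torch

model = torch.nn.Conv2d(3, 16, kernel_size=3)  # stand-in for your real model
model.eval()
dummy = torch.randn(1, 3, 640, 640)            # this shape becomes static
torch.onnx.export(
    model, dummy, "model_static.onnx",
    input_names=['input'], output_names=['output'],
    opset_version=17,
    # no dynamic_axes -> all dimensions are fixed in the exported graph
)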