GPU Acceleration

High-performance GPU computing features in XDL.

Overview

XDL’s AMP (Accelerated Math Processing) module provides GPU acceleration across 11 backends:

Apple Platforms (macOS/iOS)

Metal - Native Apple GPU compute (default on macOS)
MPS - Metal Performance Shaders (optimized operations)
CoreML - Apple Neural Engine acceleration

NVIDIA Platforms

CUDA - NVIDIA GPUs (best performance on NVIDIA hardware)
cuDNN - Deep learning acceleration

AMD Platforms

ROCm - AMD GPUs (optimized for AMD hardware)

Windows

DirectML - ML acceleration on DirectX
DirectX 12 - GPU compute via DirectML delegation

Cross-Platform

Vulkan - Modern cross-platform GPU compute
OpenCL - Universal GPU fallback (AMD, Intel, NVIDIA)
CPU (SIMD) - Fallback for systems without GPU support

Key Features

Automatic Acceleration

GPU acceleration is transparent. Existing code needs no changes; supported operations are dispatched to the GPU automatically:

; These operations are automatically GPU-accelerated
a = findgen(10000000)
b = findgen(10000000)
c = a + b           ; GPU vector addition
d = sin(a)          ; GPU trigonometric functions
e = a * b + c       ; GPU compound expressions

Performance Gains

Typical speedups on GPU vs CPU:

Operation	Array Size	CPU Time	GPU Time	Speedup
Vector Add	10M	45ms	2ms	22.5x
Sin	10M	120ms	5ms	24x
Matrix Multiply	4096x4096	850ms	12ms	70.8x
FFT	1M	200ms	8ms	25x
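
The Speedup column is CPU time divided by GPU time. As a quick sanity check, the sketch below recomputes it from the timings in the table using awk. The `speedups` function name is only illustrative and is not part of XDL.

```shell
# Recompute the Speedup column from the table above.
# Input lines: operation, CPU time (ms), GPU time (ms), tab-separated.
speedups() {
    awk -F'\t' '{ printf "%s\t%.1fx\n", $1, $2 / $3 }'
}

# Timings copied from the table (milliseconds)
printf '%s\t%s\t%s\n' \
    "Vector Add" 45 2 \
    "Sin" 120 5 \
    "Matrix Multiply" 850 12 \
    "FFT" 200 8 | speedups
```

For Matrix Multiply this gives 850 / 12, or roughly 70.8x. The same function can be fed timings measured on your own hardware.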

Documentation

GPU Compute Implementation - Technical overview
Performance Impact Analysis - Benchmarks
AMP Multi-Backend - Backend configuration
GPU Demo Guide - Examples and tutorials

Supported Operations

Vector Operations

Addition, subtraction, multiplication, division
Element-wise operations
Vector reductions (sum, min, max, mean)

Mathematical Functions

Trigonometric: sin, cos, tan, asin, acos, atan
Exponential: exp, log, log10, sqrt, pow
Hyperbolic: sinh, cosh, tanh

Matrix Operations

Matrix multiplication
Matrix transpose
Matrix inversion
Eigenvalue decomposition

Advanced Operations

FFT (Fast Fourier Transform)
Convolution
Correlation
Image processing

Backend Selection

The GPU backend is selected automatically based on the available hardware:

# Check available GPU backends
xdl --features

# Force specific backend
XDL_GPU_BACKEND=metal xdl script.xdl    # macOS
XDL_GPU_BACKEND=cuda xdl script.xdl     # NVIDIA
XDL_GPU_BACKEND=rocm xdl script.xdl     # AMD
XDL_GPU_BACKEND=vulkan xdl script.xdl   # Cross-platform
XDL_GPU_BACKEND=opencl xdl script.xdl   # Universal
XDL_GPU_BACKEND=directml xdl script.xdl # Windows
XDL_GPU_BACKEND=cpu xdl script.xdl      # CPU fallback
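
When forcing a backend from a script, it can help to try several in order of preference. The following is a minimal sketch, not an XDL feature: `run_with_fallback` and `XDL_BIN` are hypothetical names, and it assumes `xdl` exits with a non-zero status when the forced backend is unavailable.

```shell
# Hypothetical helper: try backends in order of preference, ending with cpu.
# Assumption: `xdl` exits non-zero if the forced backend cannot be used.
XDL_BIN="${XDL_BIN:-xdl}"   # override to point at a specific binary

run_with_fallback() {
    xdl_script="$1"
    shift
    # Remaining arguments are the preferred backends; cpu is always tried last
    for xdl_backend in "$@" cpu; do
        if XDL_GPU_BACKEND="$xdl_backend" "$XDL_BIN" "$xdl_script"; then
            echo "ran with backend: $xdl_backend" >&2
            return 0
        fi
    done
    echo "all backends failed for $xdl_script" >&2
    return 1
}

# Usage: prefer CUDA, then Vulkan, then the CPU fallback
# run_with_fallback script.xdl cuda vulkan
```

The backend names are the same values accepted by XDL_GPU_BACKEND above. Automatic selection already covers the common case, so a wrapper like this is mainly useful for benchmarking or CI.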

Build with Specific Features

# OpenCL support
cargo build --features opencl

# CUDA support
cargo build --features cuda

# DirectML support (Windows)
cargo build --features directml

# All backends
cargo build --features all-backends

Profiling

Enable GPU profiling:

# Enable profiling
XDL_GPU_PROFILE=1 xdl script.xdl

# Detailed profiling
XDL_GPU_PROFILE=verbose xdl script.xdl
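
To keep profiling results for later comparison, the run can be wrapped so its output lands in a log file. This is a sketch under assumptions: `profile_run` and `XDL_BIN` are hypothetical names, and the profiler is assumed to write its report to stdout or stderr.

```shell
# Hypothetical helper: run a script with GPU profiling and save the output.
# Assumption: the profiling report is written to stdout or stderr.
XDL_BIN="${XDL_BIN:-xdl}"

profile_run() {
    xdl_script="$1"
    xdl_mode="${2:-1}"                        # 1 = basic, verbose = detailed
    xdl_log="${xdl_script%.xdl}.profile.log"  # e.g. script.profile.log
    xdl_status=0
    XDL_GPU_PROFILE="$xdl_mode" "$XDL_BIN" "$xdl_script" > "$xdl_log" 2>&1 \
        || xdl_status=$?
    echo "$xdl_log"                           # print where the log was written
    return "$xdl_status"
}

# Usage:
# profile_run script.xdl           # basic profiling
# profile_run script.xdl verbose   # detailed profiling
```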

Memory Management

XDL automatically manages GPU memory:

Automatic transfer - Data moved to/from GPU as needed
Memory pooling - Efficient reuse of GPU memory
Spill to CPU - Datasets too large for GPU memory are handled gracefully by spilling to CPU memory

Limitations

Current limitations:

Maximum array size: 2 GB per array
Some operations fall back to the CPU
Multi-GPU support is in development
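
To see what the per-array size limit means in element counts, here is a rough calculation. It assumes the limit is 2 GiB (2 * 1024^3 bytes), that single-precision elements take 4 bytes and double-precision elements 8 bytes, and that an array exactly at the limit is still accepted. The helper names are illustrative only.

```shell
# Assumptions: the limit is 2 GiB; 4 bytes per single-precision element,
# 8 bytes per double-precision element; the boundary is treated as inclusive.
XDL_MAX_ARRAY_BYTES=$((2 * 1024 * 1024 * 1024))

max_elements() {
    # $1 = bytes per element
    echo $((XDL_MAX_ARRAY_BYTES / $1))
}

fits_on_gpu() {
    # $1 = element count, $2 = bytes per element; succeeds if within the limit
    [ $(($1 * $2)) -le "$XDL_MAX_ARRAY_BYTES" ]
}

max_elements 4    # prints 536870912 (single precision)
max_elements 8    # prints 268435456 (double precision)
```

Under these assumptions, a 10M-element single-precision array like the ones in the examples above occupies about 40 MB, well inside the limit.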

Next Steps

Quick Start - Get started with GPU
Performance Guide - Optimization tips
Technical Details - Implementation details
GPU Demo Guide - Examples and tutorials