GPU Acceleration

High-performance GPU computing features in XDL.

Overview

XDL’s AMP (Accelerated Math Processing) module provides GPU acceleration with 11 backend support:

Apple Platforms (macOS/iOS)

  • Metal - Native Apple GPU compute (default on macOS)
  • MPS - Metal Performance Shaders (optimized operations)
  • CoreML - Apple Neural Engine acceleration

NVIDIA Platforms

  • CUDA - NVIDIA GPUs (best performance on NVIDIA hardware)
  • cuDNN - Deep learning acceleration

AMD Platforms

  • ROCm - AMD GPUs (optimized for AMD hardware)

Windows

  • DirectML - ML acceleration on DirectX
  • DirectX 12 - GPU compute via DirectML delegation

Cross-Platform

  • Vulkan - Modern cross-platform GPU compute
  • OpenCL - Universal GPU fallback (AMD, Intel, NVIDIA)
  • CPU (SIMD) - Fallback for systems without GPU support

Key Features

Automatic Acceleration

GPU acceleration is transparent - existing code runs faster without changes:

; These operations are automatically GPU-accelerated
a = findgen(10000000)
b = findgen(10000000)
c = a + b           ; GPU vector addition
d = sin(a)          ; GPU trigonometric functions
e = a * b + c       ; GPU complex expressions

Performance Gains

Typical speedups on GPU vs CPU:

Operation Array Size CPU Time GPU Time Speedup
Vector Add 10M 45ms 2ms 22.5x
Sin 10M 120ms 5ms 24x
Matrix Multiply 4096x4096 850ms 12ms 70x
FFT 1M 200ms 8ms 25x

Documentation

Supported Operations

Vector Operations

  • Addition, subtraction, multiplication, division
  • Element-wise operations
  • Vector reductions (sum, min, max, mean)

Mathematical Functions

  • Trigonometric: sin, cos, tan, asin, acos, atan
  • Exponential: exp, log, log10, sqrt, pow
  • Hyperbolic: sinh, cosh, tanh

Matrix Operations

  • Matrix multiplication
  • Matrix transpose
  • Matrix inversion
  • Eigenvalue decomposition

Advanced Operations

  • FFT (Fast Fourier Transform)
  • Convolution
  • Correlation
  • Image processing

Backend Selection

GPU backend is selected automatically based on available hardware:

# Check available GPU backends
xdl --features

# Force specific backend
XDL_GPU_BACKEND=metal xdl script.xdl    # macOS
XDL_GPU_BACKEND=cuda xdl script.xdl     # NVIDIA
XDL_GPU_BACKEND=rocm xdl script.xdl     # AMD
XDL_GPU_BACKEND=vulkan xdl script.xdl   # Cross-platform
XDL_GPU_BACKEND=opencl xdl script.xdl   # Universal
XDL_GPU_BACKEND=directml xdl script.xdl # Windows
XDL_GPU_BACKEND=cpu xdl script.xdl      # CPU fallback

Build with Specific Features

# OpenCL support
cargo build --features opencl

# CUDA support
cargo build --features cuda

# DirectML support (Windows)
cargo build --features directml

# All backends
cargo build --features all-backends

Profiling

Enable GPU profiling:

# Enable profiling
XDL_GPU_PROFILE=1 xdl script.xdl

# Detailed profiling
XDL_GPU_PROFILE=verbose xdl script.xdl

Memory Management

XDL automatically manages GPU memory:

  • Automatic transfer - Data moved to/from GPU as needed
  • Memory pooling - Efficient reuse of GPU memory
  • Spill to CPU - Graceful handling of large datasets

Limitations

Current limitations:

  • Maximum array size: 2GB per array
  • Some operations fall back to CPU
  • Multi-GPU support in development

Next Steps