Hardware Acceleration Guide

Orbit-Compute: Heterogeneous Computing for Maximum Performance

Status: Production Ready
Applies to: orbit-compute v1.0+

Overview

Orbit-RS includes orbit-compute, a heterogeneous compute acceleration framework that automatically detects the available hardware (SIMD-capable CPUs, GPUs, and specialized AI accelerators) and uses it to accelerate database workloads. This guide explains which operations benefit from acceleration and how to configure clients and queries to use or disable it.

Accelerated Workload Types

1. Vector Operations (`SIMDBatch` + `GPUCompute`)

Workload Classification: High Priority for GPU Acceleration

Operations:

Vector similarity search (<->, <=>, <#> operators)
Vector aggregations (SUM, AVG on vector columns)
Matrix multiplications for embeddings
Vector normalization and distance calculations
Batch vector operations on large datasets

Hardware Acceleration:

GPU: 8-15x speedup for large vector datasets (>1K vectors)
CPU SIMD: 3-5x speedup with AVX-512/NEON
Neural Engine: 10-50x speedup for ML inference on vectors

Example Queries:

-- Vector similarity search (GPU accelerated)
SELECT content, embedding <-> '[0.1, 0.2, 0.3]' AS distance 
FROM documents 
ORDER BY distance LIMIT 10;

-- Vector aggregations (SIMD accelerated)  
SELECT AVG(embedding) FROM documents WHERE category = 'science';

-- Batch vector operations (GPU preferred)
SELECT id, VECTOR_NORM(embedding) FROM documents;
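For reference, the three similarity operators can be written as plain scalar loops. This is the kind of computation the CPU Scalar path performs and that SIMD and GPU kernels parallelize. The sketch assumes orbit follows pgvector's operator semantics (`<->` Euclidean distance, `<=>` cosine distance, `<#>` negative inner product); the function names are illustrative and are not part of the orbit API.

```rust
/// `<->`: Euclidean (L2) distance (assumed pgvector semantics).
pub fn l2_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

/// `<#>`: negative inner product, negated so that smaller means more similar.
pub fn neg_inner_product(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    -a.iter().zip(b).map(|(x, y)| x * y).sum::<f32>()
}

/// `<=>`: cosine distance, 1 - cos(theta). Undefined (NaN) for zero vectors.
pub fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot = -neg_inner_product(a, b);
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (norm_a * norm_b)
}

/// Brute-force equivalent of `ORDER BY embedding <-> query LIMIT k`.
pub fn top_k_l2(rows: &[Vec<f32>], query: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = rows
        .iter()
        .enumerate()
        .map(|(i, v)| (i, l2_distance(v, query)))
        .collect();
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(k);
    scored
}
```

The brute-force scan touches every row; a vector index avoids this full scan, which is why indexing vector columns is recommended under Query Optimization.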

2. Matrix Operations (`SIMDBatch::MatrixOps`)

Workload Classification: High Priority for GPU/Neural Engine

Operations:

JOIN operations on large tables with numeric keys
Matrix-based analytical queries
Linear algebra operations in SQL functions
Tensor operations for ML workloads

Hardware Acceleration:

GPU: 8-15x speedup for matrix operations
Neural Engine: 10-50x speedup for ML-specific matrix ops
CPU SIMD: 3-8x speedup with optimized BLAS

Example Queries:

-- Large JOIN operations (GPU accelerated)
SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id 
WHERE o.order_date > '2024-01-01';

-- Matrix-based analytics (Neural Engine preferred)
SELECT ML_MATRIX_MULTIPLY(features, weights) FROM ml_models;

3. Aggregation Operations (`SIMDBatch::Reduction`)

Workload Classification: Medium Priority for SIMD/GPU

Operations:

COUNT, SUM, AVG, MIN, MAX on large datasets
GROUP BY operations with numeric aggregations
Window functions over large partitions
Statistical functions (STDDEV, VARIANCE)

Hardware Acceleration:

CPU SIMD: 3-8x speedup for numeric aggregations
GPU: 5-12x speedup for massive datasets (>1M rows)
Specialized reductions: Custom kernels for common patterns

Example Queries:

-- Aggregations on large datasets (SIMD/GPU accelerated)
SELECT category, COUNT(*), AVG(price), MAX(price) 
FROM products GROUP BY category;

-- Window functions (SIMD preferred)
SELECT *, AVG(salary) OVER (PARTITION BY department) FROM employees;
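The reduction idea behind `SIMDBatch::Reduction` can be illustrated with a lane-wise aggregate: the input is processed in fixed-width chunks, each lane keeps its own partial SUM/MIN/MAX, and the lanes are combined at the end. This is a portable sketch; the lane width of 8 and the names are assumptions, and fixed-width loops like this are a pattern compilers can often auto-vectorize. The actual orbit-compute kernels are not shown here.

```rust
/// Independent accumulators per pass (8 x f64 = one 512-bit register).
const LANES: usize = 8;

/// One-pass aggregates, analogous to COUNT/SUM/MIN/MAX in SQL.
#[derive(Debug, PartialEq)]
pub struct Aggregates {
    pub count: usize,
    pub sum: f64,
    pub min: f64,
    pub max: f64,
}

impl Aggregates {
    /// AVG = SUM / COUNT; None for empty input (SQL returns NULL).
    pub fn avg(&self) -> Option<f64> {
        if self.count == 0 { None } else { Some(self.sum / self.count as f64) }
    }
}

/// Lane-wise reduction: lane i accumulates element i of every full chunk.
pub fn reduce(values: &[f64]) -> Aggregates {
    let mut sums = [0.0f64; LANES];
    let mut mins = [f64::INFINITY; LANES];
    let mut maxs = [f64::NEG_INFINITY; LANES];

    let chunks = values.chunks_exact(LANES);
    let tail = chunks.remainder();
    for chunk in chunks {
        for lane in 0..LANES {
            sums[lane] += chunk[lane];
            mins[lane] = mins[lane].min(chunk[lane]);
            maxs[lane] = maxs[lane].max(chunk[lane]);
        }
    }

    // Horizontal combine across lanes, then fold in the leftover tail.
    let mut sum: f64 = sums.iter().sum();
    let mut min = mins.iter().copied().fold(f64::INFINITY, f64::min);
    let mut max = maxs.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    for &v in tail {
        sum += v;
        min = min.min(v);
        max = max.max(v);
    }
    Aggregates { count: values.len(), sum, min, max }
}
```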

4. Time Series Operations (`GPUCompute::MemoryBound`)

Workload Classification: Medium Priority for GPU

Operations:

Time series aggregations and rollups
Moving averages and trend analysis
Seasonal decomposition
Time-based JOINs and correlations

Hardware Acceleration:

GPU: 5-12x speedup for time series analytics
CPU SIMD: 2-5x speedup for sequential processing
Storage: fast NVMe storage for time series data

Example Queries:

-- Time series aggregations (GPU preferred)
SELECT date_trunc('hour', timestamp), AVG(value), COUNT(*)
FROM sensor_data WHERE timestamp > NOW() - INTERVAL '7 days'
GROUP BY 1 ORDER BY 1;

-- Moving averages (SIMD accelerated)
SELECT timestamp, AVG(value) OVER (
    ORDER BY timestamp ROWS BETWEEN 10 PRECEDING AND CURRENT ROW
) FROM metrics;
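The frame `ROWS BETWEEN 10 PRECEDING AND CURRENT ROW` covers up to 11 rows: the current row plus the 10 before it, with fewer rows at the start of the series. The sketch below computes the same moving average with a running sum, so each row costs O(1) instead of re-summing the frame. The function name is hypothetical, and the input is assumed to be already sorted by timestamp.

```rust
/// Moving average over `ROWS BETWEEN preceding PRECEDING AND CURRENT ROW`.
/// `values` must already be in the window's ORDER BY order.
pub fn moving_avg(values: &[f64], preceding: usize) -> Vec<f64> {
    let mut out = Vec::with_capacity(values.len());
    let mut running_sum = 0.0;
    for (i, &v) in values.iter().enumerate() {
        running_sum += v;
        // Subtract the value that just slid out of the frame.
        if i > preceding {
            running_sum -= values[i - preceding - 1];
        }
        // Early rows have a shorter frame than preceding + 1.
        let frame_len = (i + 1).min(preceding + 1);
        out.push(running_sum / frame_len as f64);
    }
    out
}
```

A running sum can accumulate floating-point error over very long series; periodically re-summing the frame is one way to bound it.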

5. Graph Operations (`NeuralInference`)

Workload Classification: High Priority for Neural Engine

Operations:

Graph traversals and path finding
Community detection algorithms
Graph neural network inference
Knowledge graph reasoning

Hardware Acceleration:

Neural Engine: 10-50x speedup for graph ML
GPU: 4-10x speedup for parallel graph algorithms
Specialized DSPs: well suited to graph signal processing

Example Queries:

-- Graph traversals (Neural Engine preferred)
GRAPH MATCH (a)-[:CONNECTS*1..3]-(b) 
WHERE a.type = 'user' AND b.type = 'product'
RETURN a.id, b.id, path_length;

-- Graph analytics (GPU accelerated)
SELECT community_detection(graph_data) FROM social_network;
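The pattern `(a)-[:CONNECTS*1..3]-(b)` matches nodes joined by one to three `CONNECTS` edges in either direction. The breadth-first sketch below illustrates that bounded traversal on a plain edge list. It is a simplification under stated assumptions: it reports each reachable node once with its shortest hop count, whereas a Cypher-style variable-length match returns every matching path. The function name and edge representation are hypothetical.

```rust
use std::collections::{HashMap, VecDeque};

/// Nodes reachable from `start` in `min_hops..=max_hops` undirected edges,
/// mapped to the fewest hops needed to reach them.
pub fn reachable_within(
    edges: &[(u32, u32)],
    start: u32,
    min_hops: usize,
    max_hops: usize,
) -> HashMap<u32, usize> {
    // Undirected adjacency list: the pattern has no arrow direction.
    let mut adj: HashMap<u32, Vec<u32>> = HashMap::new();
    for &(a, b) in edges {
        adj.entry(a).or_default().push(b);
        adj.entry(b).or_default().push(a);
    }

    // Standard BFS, stopping expansion once a node is max_hops away.
    let mut dist: HashMap<u32, usize> = HashMap::from([(start, 0)]);
    let mut queue = VecDeque::from([start]);
    while let Some(node) = queue.pop_front() {
        let d = dist[&node];
        if d == max_hops {
            continue;
        }
        for &next in adj.get(&node).map(Vec::as_slice).unwrap_or(&[]) {
            if !dist.contains_key(&next) {
                dist.insert(next, d + 1);
                queue.push_back(next);
            }
        }
    }
    dist.into_iter().filter(|&(_, d)| d >= min_hops).collect()
}
```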

6. Full-Text Search (`GPUCompute::ComputeBound`)

Workload Classification: Medium Priority for GPU

Operations:

Text similarity and fuzzy matching
Regular expression operations on large text
Natural language processing functions
Search result ranking and scoring

Hardware Acceleration:

GPU: 2-8x speedup for parallel text processing
Neural Engine: 10-50x speedup for semantic search
CPU SIMD: 2-4x speedup for string operations

Example Queries:

-- Text similarity (GPU/Neural Engine preferred)
SELECT id, content, SIMILARITY(content, 'search query') AS score
FROM documents WHERE SIMILARITY(content, 'search query') > 0.7 ORDER BY score DESC;

-- Regex operations (GPU accelerated for large datasets)
SELECT * FROM logs WHERE content ~ 'ERROR.*[0-9]{4}-[0-9]{2}-[0-9]{2}';

Hardware Selection Strategy

The orbit-compute scheduler assigns each workload to the first available compute unit in the following priority order:

Decision Matrix

Workload Type	Primary Target	Fallback 1	Fallback 2	Fallback 3
Vector Ops	GPU (Metal/CUDA)	Neural Engine	CPU SIMD	CPU Scalar
Matrix Ops	Neural Engine	GPU	CPU SIMD	CPU Scalar
Aggregations	CPU SIMD	GPU	Specialized CPU	CPU Scalar
Time Series	GPU	CPU SIMD	Specialized Storage	CPU Scalar
Graph ML	Neural Engine	GPU	CPU Parallel	CPU Scalar
Text Search	Neural Engine	GPU	CPU SIMD	CPU Scalar
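The fallback columns can be read as an ordered preference list per workload: the scheduler takes the first unit present on the host. The sketch encodes the table that way. The type names are hypothetical, and the "Specialized CPU" and "Specialized Storage" entries are omitted because they are not modeled here as compute units.

```rust
/// Compute units named in the decision matrix.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Unit {
    Gpu,
    NeuralEngine,
    CpuSimd,
    CpuParallel,
    CpuScalar,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Workload {
    VectorOps,
    MatrixOps,
    Aggregations,
    TimeSeries,
    GraphMl,
    TextSearch,
}

/// Priority order per workload, mirroring the rows of the matrix.
pub fn priority(w: Workload) -> &'static [Unit] {
    use Unit::*;
    match w {
        Workload::VectorOps => &[Gpu, NeuralEngine, CpuSimd, CpuScalar],
        Workload::MatrixOps => &[NeuralEngine, Gpu, CpuSimd, CpuScalar],
        Workload::Aggregations => &[CpuSimd, Gpu, CpuScalar],
        Workload::TimeSeries => &[Gpu, CpuSimd, CpuScalar],
        Workload::GraphMl => &[NeuralEngine, Gpu, CpuParallel, CpuScalar],
        Workload::TextSearch => &[NeuralEngine, Gpu, CpuSimd, CpuScalar],
    }
}

/// First unit in priority order that is available; CPU Scalar always is.
pub fn select(w: Workload, available: &[Unit]) -> Unit {
    priority(w)
        .iter()
        .copied()
        .find(|u| *u == Unit::CpuScalar || available.contains(u))
        .unwrap_or(Unit::CpuScalar)
}
```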

Performance Thresholds

Workloads are automatically routed based on data size:

Small (< 1K rows): CPU SIMD preferred
Medium (1K-100K rows): GPU considered
Large (100K-1M rows): GPU preferred
Extra Large (> 1M rows): GPU strongly preferred; falls back to CPU when no GPU is available
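The size thresholds amount to a simple classification on row count. In this sketch the tier names are hypothetical, exact boundary values (for example, exactly 100,000 rows counting as Large) are an assumption, and "GPU considered" for Medium is treated as CPU-first; a real scheduler would presumably also weigh data-transfer cost.

```rust
/// Data-size tiers from the routing thresholds above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SizeTier {
    Small,      // < 1K rows: CPU SIMD preferred
    Medium,     // 1K-100K rows: GPU considered
    Large,      // 100K-1M rows: GPU preferred
    ExtraLarge, // > 1M rows: GPU strongly preferred, CPU fallback
}

pub fn tier(rows: u64) -> SizeTier {
    match rows {
        0..=999 => SizeTier::Small,
        1_000..=99_999 => SizeTier::Medium,
        100_000..=1_000_000 => SizeTier::Large,
        _ => SizeTier::ExtraLarge,
    }
}

/// Whether to try the GPU first; without a GPU, always start on the CPU.
pub fn try_gpu_first(rows: u64, gpu_available: bool) -> bool {
    gpu_available && matches!(tier(rows), SizeTier::Large | SizeTier::ExtraLarge)
}
```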

Client Configuration Options

1. OrbitClient Configuration

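use orbit_client::OrbitClient;
use orbit_compute::ComputeUnit; // import path assumed; use the module that exports ComputeUnit

// Each builder below must run inside an async fn that returns a Result (for `.await?`).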

// Enable all acceleration features (default)
let client = OrbitClient::builder()
    .with_compute_acceleration(true)           // Enable GPU/Neural acceleration
    .with_simd_acceleration(true)              // Enable CPU SIMD optimization
    .with_adaptive_scheduling(true)            // Enable smart workload routing
    .with_fallback_enabled(true)               // Enable graceful degradation
    .build()
    .await?;

// CPU-only mode (disable GPU/Neural acceleration)
let cpu_client = OrbitClient::builder()
    .with_compute_acceleration(false)          // Disable GPU/Neural acceleration
    .with_simd_acceleration(true)              // Keep CPU SIMD enabled
    .with_max_compute_threads(8)               // Limit CPU threads
    .build()
    .await?;

// Performance-focused configuration
let performance_client = OrbitClient::builder()
    .with_compute_acceleration(true)
    .with_preferred_compute_unit(ComputeUnit::GPU { device_id: 0 })
    .with_compute_timeout_ms(10000)            // 10 second timeout
    .with_memory_optimization(true)            // Enable unified memory
    .build()
    .await?;

2. Connection String Configuration

You can control acceleration through connection parameters:

# Enable all acceleration (default)
postgres://user:pass@localhost:5432/db?compute_acceleration=true&simd_acceleration=true

# CPU-only mode
postgres://user:pass@localhost:5432/db?compute_acceleration=false&simd_acceleration=true

# Specific GPU device
postgres://user:pass@localhost:5432/db?preferred_gpu=0&gpu_memory_limit=4GB

# Performance tuning
postgres://user:pass@localhost:5432/db?compute_timeout=5000&adaptive_scheduling=true
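To make the parameter names concrete, the following sketch extracts them from a connection string using only the standard library. The struct and function names are hypothetical, sizes such as `4GB` are assumed to use binary units (GiB), and values are not URL-decoded. It only illustrates which keys the examples above use, not how orbit itself parses them.

```rust
use std::collections::HashMap;

/// Acceleration settings found in a connection string (None = not given).
#[derive(Debug, Default, PartialEq)]
pub struct AccelParams {
    pub compute_acceleration: Option<bool>,
    pub simd_acceleration: Option<bool>,
    pub adaptive_scheduling: Option<bool>,
    pub preferred_gpu: Option<u32>,
    pub gpu_memory_limit_bytes: Option<u64>,
    pub compute_timeout_ms: Option<u64>,
}

/// Parse sizes such as "4GB" or "512MB" (binary units assumed).
pub fn parse_size(s: &str) -> Option<u64> {
    let upper = s.trim().to_ascii_uppercase();
    let (num, mult) = if let Some(n) = upper.strip_suffix("GB") {
        (n, 1u64 << 30)
    } else if let Some(n) = upper.strip_suffix("MB") {
        (n, 1u64 << 20)
    } else {
        (upper.as_str(), 1)
    };
    num.trim().parse::<u64>().ok()?.checked_mul(mult)
}

pub fn parse_conn_params(url: &str) -> AccelParams {
    // Everything after '?' is a list of key=value pairs joined by '&'.
    let query = url.split_once('?').map(|(_, q)| q).unwrap_or("");
    let kv: HashMap<&str, &str> = query
        .split('&')
        .filter_map(|pair| pair.split_once('='))
        .collect();
    let flag = |k: &str| kv.get(k).and_then(|v| v.parse::<bool>().ok());
    AccelParams {
        compute_acceleration: flag("compute_acceleration"),
        simd_acceleration: flag("simd_acceleration"),
        adaptive_scheduling: flag("adaptive_scheduling"),
        preferred_gpu: kv.get("preferred_gpu").and_then(|v| v.parse().ok()),
        gpu_memory_limit_bytes: kv.get("gpu_memory_limit").and_then(|v| parse_size(v)),
        compute_timeout_ms: kv.get("compute_timeout").and_then(|v| v.parse().ok()),
    }
}
```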

3. Environment Variables

# Global acceleration settings
export ORBIT_COMPUTE_ACCELERATION=true        # Enable GPU/Neural acceleration
export ORBIT_SIMD_ACCELERATION=true           # Enable CPU SIMD
export ORBIT_ADAPTIVE_SCHEDULING=true         # Enable smart scheduling
export ORBIT_COMPUTE_TIMEOUT=10000            # Timeout in milliseconds

# Hardware preferences
export ORBIT_PREFERRED_GPU=0                  # Select specific GPU
export ORBIT_GPU_MEMORY_LIMIT=8GB            # Limit GPU memory usage
export ORBIT_CPU_THREADS=16                  # Maximum CPU threads
export ORBIT_ENABLE_NEURAL_ENGINE=true       # Enable Neural Engine (Apple/Qualcomm)

# Performance tuning
export ORBIT_UNIFIED_MEMORY=true             # Use unified memory (Apple Silicon)
export ORBIT_MEMORY_ALIGNMENT=64             # SIMD alignment (bytes)
export ORBIT_WORKLOAD_PROFILING=true         # Enable performance learning
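These variables can be read into a typed settings struct. The sketch takes a lookup closure so it can be tested without mutating the process environment. The struct, the accepted truthy spellings, and the defaults used when a variable is unset (acceleration on, a 10000 ms timeout, 64-byte alignment) are assumptions drawn from the examples in this guide rather than documented orbit behavior.

```rust
use std::env;

/// A subset of the ORBIT_* variables above, parsed into typed values.
#[derive(Debug, PartialEq)]
pub struct EnvSettings {
    pub compute_acceleration: bool,
    pub simd_acceleration: bool,
    pub compute_timeout_ms: u64,
    pub cpu_threads: Option<usize>,
    pub memory_alignment: usize,
}

/// Build settings from any key lookup; defaults are assumptions.
pub fn settings_from(lookup: impl Fn(&str) -> Option<String>) -> EnvSettings {
    // Accept a few common truthy spellings; any other value counts as false.
    let flag = |key: &str, default: bool| {
        lookup(key)
            .map(|v| matches!(v.trim().to_ascii_lowercase().as_str(), "true" | "1" | "yes"))
            .unwrap_or(default)
    };
    EnvSettings {
        compute_acceleration: flag("ORBIT_COMPUTE_ACCELERATION", true),
        simd_acceleration: flag("ORBIT_SIMD_ACCELERATION", true),
        compute_timeout_ms: lookup("ORBIT_COMPUTE_TIMEOUT")
            .and_then(|v| v.parse().ok())
            .unwrap_or(10_000),
        cpu_threads: lookup("ORBIT_CPU_THREADS").and_then(|v| v.parse().ok()),
        // Alignment must be a power of two; otherwise fall back to 64 bytes.
        memory_alignment: lookup("ORBIT_MEMORY_ALIGNMENT")
            .and_then(|v| v.parse::<usize>().ok())
            .filter(|a| a.is_power_of_two())
            .unwrap_or(64),
    }
}

/// Read from the real process environment.
pub fn settings_from_env() -> EnvSettings {
    settings_from(|key| env::var(key).ok())
}
```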

4. Per-Query Configuration

You can control acceleration on a per-query basis with optimizer-hint comments (`/*+ ... */`) placed before the statement:

-- Force GPU acceleration
/*+ GPU_COMPUTE */ 
SELECT embedding <-> '[0.1, 0.2, 0.3]' AS distance FROM documents;

-- Force CPU-only execution
/*+ CPU_ONLY */
SELECT COUNT(*) FROM large_table GROUP BY category;

-- Specify compute preferences
/*+ PREFERRED_COMPUTE=NEURAL_ENGINE */
SELECT ML_INFERENCE(model, features) FROM data;

-- Disable acceleration for debugging
/*+ NO_ACCELERATION */
SELECT * FROM debug_table WHERE complex_condition = true;

-- Set resource limits
/*+ MAX_MEMORY=2GB, TIMEOUT=5000 */
SELECT * FROM huge_table JOIN another_huge_table ON huge_table.id = another_huge_table.id;
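The hint syntax above is regular enough to parse with the standard library. This sketch recognizes only the hint names shown in this guide, assumes the hint comment is the first thing in the query text, assumes no spaces around `=`, and treats `TIMEOUT` as milliseconds. The struct and function are hypothetical; orbit's own parser may behave differently.

```rust
/// Hints recognized in a leading `/*+ ... */` comment.
#[derive(Debug, Default, PartialEq)]
pub struct QueryHints {
    pub gpu_compute: bool,
    pub cpu_only: bool,
    pub no_acceleration: bool,
    pub preferred_compute: Option<String>,
    pub max_memory: Option<String>,
    pub timeout_ms: Option<u64>,
}

/// Extract hints from the query's leading `/*+ ... */` block, if any.
pub fn parse_hints(sql: &str) -> QueryHints {
    let mut hints = QueryHints::default();
    let body = match sql.trim_start().strip_prefix("/*+") {
        Some(rest) => match rest.split_once("*/") {
            Some((body, _)) => body,
            None => return hints, // unterminated comment: ignore it
        },
        None => return hints, // no hint comment
    };
    // Hints are separated by commas and/or whitespace.
    for item in body.split(|c: char| c == ',' || c.is_whitespace()) {
        let item = item.trim();
        if item.is_empty() {
            continue;
        }
        match item.split_once('=') {
            Some(("PREFERRED_COMPUTE", v)) => hints.preferred_compute = Some(v.to_string()),
            Some(("MAX_MEMORY", v)) => hints.max_memory = Some(v.to_string()),
            Some(("TIMEOUT", v)) => hints.timeout_ms = v.parse().ok(),
            None => match item {
                "GPU_COMPUTE" => hints.gpu_compute = true,
                "CPU_ONLY" => hints.cpu_only = true,
                "NO_ACCELERATION" => hints.no_acceleration = true,
                _ => {} // unknown flag hints are ignored
            },
            _ => {} // unknown key=value hints are ignored
        }
    }
    hints
}
```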

5. Programmatic Configuration

use orbit_client::query::{QueryBuilder, ComputeHint};

// Query with compute hints
let query = QueryBuilder::new("SELECT * FROM vectors")
    .with_compute_hint(ComputeHint::PreferGPU)
    .with_memory_limit_gb(4.0)
    .with_timeout_ms(10000)
    .build();

let results = client.execute_query(query).await?;

// Disable acceleration for specific query
let cpu_query = QueryBuilder::new("SELECT COUNT(*) FROM small_table")
    .with_compute_hint(ComputeHint::CPUOnly)
    .build();

Monitoring Acceleration Usage

1. Query Performance Metrics

-- Check query execution statistics
SELECT 
    query_hash,
    compute_unit_used,
    execution_time_ms,
    acceleration_speedup,
    fallback_occurred
FROM orbit_query_stats 
WHERE timestamp > NOW() - INTERVAL '1 hour';

-- Monitor compute unit utilization
SELECT 
    compute_unit,
    utilization_percent,
    active_queries,
    queue_depth
FROM orbit_compute_status;

2. Client Status API

// Get acceleration status
let status = client.get_compute_status().await?;
println!("Available GPUs: {}", status.available_gpus);
println!("Neural Engine: {}", status.neural_engine_available);
println!("SIMD Support: {}", status.simd_capabilities);

// Get performance statistics
let stats = client.get_performance_stats().await?;
println!("GPU queries: {} ({:.1}x avg speedup)", 
         stats.gpu_query_count, stats.gpu_avg_speedup);

3. Kubernetes Monitoring

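# Check per-container CPU and memory usage (kubectl top does not report GPU utilization)
kubectl top pod -l app=orbit-compute --containers

# Check GPU utilization on NVIDIA nodes (requires nvidia-smi in the container)
kubectl exec deploy/orbit-server -- nvidia-smi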

# Monitor acceleration metrics
kubectl logs -f deployment/orbit-server | grep "ACCELERATION"

# Check compute resource allocation
kubectl describe pod -l app=orbit-server | grep -A 10 "Requests\|Limits"

Performance Tuning Guidelines

1. Data Size Thresholds

Enable GPU acceleration for datasets > 100K rows
Use SIMD acceleration for all numeric operations
Use the Neural Engine for ML inference and graph operations
Use CPU execution for small datasets (< 1K rows), where data-transfer overhead outweighs accelerator gains

2. Memory Considerations

Unified Memory (Apple Silicon): Optimal for GPU-CPU data sharing
GPU Memory Limit: Set to 70-80% of available GPU memory
Memory Alignment: Use 64-byte alignment for optimal SIMD performance
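The two numeric guidelines translate into simple arithmetic and a type-level alignment guarantee. The sketch computes a GPU memory budget clamped to the recommended 70-80% range and allocates an `f32` buffer whose blocks are 64-byte aligned via `#[repr(align(64))]`. The names are hypothetical, and orbit's own allocator is not shown.

```rust
/// GPU memory budget: `percent` of device memory, clamped to 70-80%.
pub fn gpu_memory_limit(device_bytes: u64, percent: u64) -> u64 {
    let p = percent.clamp(70, 80);
    // u128 intermediate avoids overflow for very large devices.
    (device_bytes as u128 * p as u128 / 100) as u64
}

/// One 64-byte block: 16 f32 values, i.e. one AVX-512 register
/// (and one cache line on x86).
#[derive(Clone, Copy)]
#[repr(C, align(64))]
pub struct Aligned64(pub [f32; 16]);

/// Zeroed storage for at least `n` f32 values; every block starts on a
/// 64-byte boundary because Vec allocates with the element's alignment.
pub fn aligned_f32_buffer(n: usize) -> Vec<Aligned64> {
    let blocks = (n + 15) / 16; // round up to whole blocks
    vec![Aligned64([0.0; 16]); blocks]
}
```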

3. Query Optimization

Vector Operations: Ensure indexes on vector columns
Batch Operations: Group similar operations together
Avoid Frequent Fallbacks: Profile queries to understand acceleration patterns

4. Hardware-Specific Tuning

Apple Silicon (M1/M2/M3/M4)

export ORBIT_UNIFIED_MEMORY=true
export ORBIT_ENABLE_NEURAL_ENGINE=true
export ORBIT_METAL_OPTIMIZATION=true

NVIDIA GPUs

export ORBIT_CUDA_OPTIMIZATION=true
export ORBIT_TENSOR_CORES=true
export ORBIT_GPU_MEMORY_POOL=true

Intel/AMD CPUs

export ORBIT_AVX512_OPTIMIZATION=true
export ORBIT_NUMA_AWARENESS=true
export ORBIT_HYPER_THREADING=true
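Before applying the CPU-specific variables above, it can help to confirm what the host actually supports. Rust's standard library provides runtime detection through `is_x86_feature_detected!` and `std::arch::is_aarch64_feature_detected!`; the sketch uses them and maps the result to the matching setting from this guide. The function names are hypothetical, and whether orbit performs this check itself is not stated here.

```rust
/// SIMD instruction sets detected on this CPU at runtime.
pub fn simd_features() -> Vec<&'static str> {
    #[allow(unused_mut)] // no pushes on architectures other than x86/aarch64
    let mut found = Vec::new();
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            found.push("avx2");
        }
        if is_x86_feature_detected!("avx512f") {
            found.push("avx512f");
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            found.push("neon");
        }
    }
    found
}

/// Map detected features to the matching tuning variable from this guide.
pub fn suggested_flags(features: &[&str]) -> Vec<&'static str> {
    let mut flags = Vec::new();
    if features.contains(&"avx512f") {
        flags.push("ORBIT_AVX512_OPTIMIZATION=true");
    }
    flags
}
```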

Troubleshooting

Common Issues

GPU Not Detected

# Check GPU availability
orbit-compute --check-gpu
   
# Verify drivers
nvidia-smi  # NVIDIA
system_profiler SPDisplaysDataType  # macOS

Acceleration Not Working

-- Check if acceleration is enabled
SHOW orbit_compute_acceleration;
   
-- Verify query uses acceleration
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM vectors;

Performance Regression

# Enable profiling (shell)
export ORBIT_WORKLOAD_PROFILING=true

-- Log query plans (run in your SQL session)
SET orbit_log_query_plans = 'on';

Debug Configuration

# Enable detailed logging
export RUST_LOG=orbit_compute=debug,orbit_scheduler=debug

# Disable acceleration for debugging
export ORBIT_COMPUTE_ACCELERATION=false

# Force CPU execution
export ORBIT_FORCE_CPU_FALLBACK=true

Best Practices

1. Development Environment

Start with acceleration enabled but with conservative timeouts
Use profiling to understand query patterns
Test both accelerated and CPU-only execution paths

2. Production Deployment

Enable all acceleration features by default
Set appropriate GPU memory limits (70-80% of available)
Monitor fallback rates and performance metrics
Use Kubernetes resource limits to prevent resource contention

3. Query Development

Use query hints for fine-tuning critical queries
Batch similar operations to maximize GPU utilization
Profile vector operations to ensure index usage
Test large dataset performance with GPU acceleration

4. Monitoring and Alerting

Alert on high fallback rates (>10%)
Monitor GPU memory usage and temperature
Track query performance trends over time
Set alerts for compute unit failures
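The fallback-rate alert can be computed from the `fallback_occurred` column of `orbit_query_stats`. The sketch assumes the rows for the alerting window have already been fetched and reduced to two fields; the struct, the function names, and any unit labels passed in are hypothetical.

```rust
use std::collections::HashMap;

/// One `orbit_query_stats` row, reduced to the fields this check uses.
pub struct QueryStat {
    pub compute_unit_used: String,
    pub fallback_occurred: bool,
}

/// Fraction of queries that fell back; 0.0 for an empty window.
pub fn fallback_rate(stats: &[QueryStat]) -> f64 {
    if stats.is_empty() {
        return 0.0;
    }
    let fallbacks = stats.iter().filter(|s| s.fallback_occurred).count();
    fallbacks as f64 / stats.len() as f64
}

/// Alert when the rate exceeds `threshold` (this guide suggests 0.10).
pub fn should_alert(stats: &[QueryStat], threshold: f64) -> bool {
    fallback_rate(stats) > threshold
}

/// Query count per compute unit, to see which unit absorbs the fallbacks.
pub fn queries_per_unit(stats: &[QueryStat]) -> HashMap<&str, usize> {
    let mut counts = HashMap::new();
    for s in stats {
        *counts.entry(s.compute_unit_used.as_str()).or_insert(0) += 1;
    }
    counts
}
```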

Conclusion

The orbit-compute acceleration framework provides transparent performance improvements for database workloads while maintaining compatibility and reliability through graceful degradation. By understanding which workload types benefit from acceleration and configuring clients and queries accordingly, you can achieve speedups of roughly 2-50x for compute-intensive database operations, depending on the workload and hardware (see the per-workload figures above).

For workloads involving large datasets, vector operations, matrix computations, or AI/ML inference, GPU and Neural Engine acceleration can provide substantial performance benefits. The system automatically handles hardware detection, workload scheduling, and fallback to deliver the best available performance across diverse deployment environments.

See Also:

README-K8S-DEPLOYMENT.md - Kubernetes deployment with GPU support
RFC Heterogeneous Compute - Technical architecture details
Kubernetes Deployment Sizing Guide - Hardware sizing recommendations