Persistence Complete Documentation
Comprehensive Provider-Based Persistence System for Orbit-RS
Table of Contents
- Overview
- Architecture
- Storage Backends
- Provider-Based System
- Configuration
- Performance Tuning
- Monitoring and Metrics
- Deployment Scenarios
- Migration Between Providers
- Security Considerations
- Best Practices
Overview
The Orbit server persistence system provides a comprehensive provider-based architecture that supports multiple storage backends. This allows you to choose the most appropriate storage solution for your deployment scenario, from simple in-memory storage to cloud-scale object storage systems.
Key Features
- Multiple Storage Backends: Memory, COW B+Tree, LSM-Tree, RocksDB, S3, Azure, GCP, etcd, Redis, Kubernetes, MinIO, Flash-optimized, and Composite providers
- Provider-Based Architecture: Pluggable storage abstraction with unified interface
- Production Ready: Battle-tested implementations with comprehensive error handling
- Kubernetes Integration: Native support for Kubernetes deployments
- Performance Optimized: Tuned for different workload characteristics
- Comprehensive Monitoring: Built-in metrics and health checks
- Protocol-Specific Persistence: All 7 protocols (PostgreSQL, MySQL, CQL, Redis, Cypher, AQL, GraphRAG) use RocksDB for durable storage
Architecture
Core Components
The persistence system is built around several key traits:
PersistenceProvider
Base trait for all providers with common functionality:
- Lifecycle Management:
initialize(),shutdown() - Health Monitoring:
health_check(),metrics() - Transaction Support:
begin_transaction(),commit_transaction(),rollback_transaction()
AddressableDirectoryProvider
Specialized trait for storing addressable (actor) lease information:
- Store, retrieve, update, and remove leases
- List leases by node or globally
- Cleanup expired leases
- Bulk operations for efficiency
ClusterNodeProvider
Specialized trait for storing cluster node information:
- Store, retrieve, update, and remove node information
- List active and all nodes
- Cleanup expired nodes
- Node lease renewal
Provider Registry
The PersistenceProviderRegistry manages multiple providers and provides:
- Registration of providers with names
- Default provider management
- Provider lookup by name
- Support for different providers for addressable vs. cluster data
Architecture Diagram
┌─────────────────────────────────────────────────────────────┐
│ Application Layer │
└─────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────--─────┐
│ Persistence Provider Registry │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Addressable │ │ Cluster │ │ Default │ │
│ │ Provider │ │ Provider │ │ Provider │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────┘
│
┌──────────────────┼──────────────────┐
│ │ │
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Memory │ │ RocksDB │ │ S3 │
│ Provider │ │ Provider │ │ Provider │
└──────────────┘ └──────────────┘ └──────────────┘
│ │ │
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ In-Memory │ │ RocksDB │ │ AWS S3 │
│ Storage │ │ Storage │ │ Storage │
└──────────────┘ └──────────────┘ └──────────────┘
Storage Backends
Local Storage Backends
1. Memory Provider (MemoryConfig)
Best for: Development, testing, high-performance scenarios with limited data
Features:
- Ultra-fast in-memory storage using concurrent data structures (
DashMap) - Optional disk backup with compression support (Gzip, LZ4, Zstd)
- Automatic periodic backup and restore on startup
- Memory usage limits with health monitoring
- Full metrics and transaction support
Configuration:
[providers.memory]
type = "Memory"
max_entries = 100000
disk_backup = { path = "/data/orbit-backup.json", sync_interval = 300, compression = "Gzip" }
Performance:
- Write latency: <1μs
- Read latency: <1μs
- Memory usage: Low
- Durability: None (unless disk backup enabled)
2. COW B+Tree Provider (CowBTreeConfig)
Best for: Read-heavy workloads, snapshot requirements, version control needs
Features:
- Copy-on-Write B+ Tree with WAL
- Point-in-time snapshots
- Range queries
- Memory efficiency through shared pages
- Crash recovery with WAL replay
Configuration:
[providers.cow_btree]
type = "CowBTree"
data_dir = "/var/lib/orbit"
max_keys_per_node = 64
wal_buffer_size = 1048576
enable_snapshots = true
snapshot_interval = 1000
Performance:
- Write latency: 5-20μs
- Read latency: 1-3μs
- Memory usage: Low (shared pages)
- Recovery time: 2-5 seconds
3. LSM-Tree Provider (LsmTreeConfig)
Best for: Write-heavy workloads, high-throughput ingestion
Features:
- Write-optimized storage with background compaction
- Crash recovery with Write-Ahead Logging
- Configurable bloom filters and memtable sizes
- Integration with Orbit-RS’s existing LSM implementation
Configuration:
[providers.lsm_tree]
type = "LsmTree"
data_dir = "/var/lib/orbit"
memtable_size_mb = 64
max_levels = 7
level_size_multiplier = 10
compaction_threshold = 4
enable_bloom_filters = true
bloom_filter_bits_per_key = 10
enable_compression = true
Performance:
- Write latency: 10-50μs
- Read latency: 2-5μs
- Memory usage: Medium
- Recovery time: 5-15 seconds
4. RocksDB Provider (RocksDbConfig)
Best for: Production deployments, ACID guarantees needed, high reliability requirements
Features:
- Production-grade key-value store
- Full ACID guarantees
- High reliability
- Rich feature set
- Proven in production
Configuration:
[providers.rocksdb]
type = "RocksDB"
data_dir = "/var/lib/orbit"
compression = true
block_cache_mb = 256
write_buffer_mb = 64
max_background_jobs = 4
Performance:
- Write latency: 10-50μs
- Read latency: 2-10μs
- Memory usage: High
- Recovery time: 3-10 seconds
Protocol-Specific Persistence
All Orbit-RS protocols use RocksDB for durable storage, with each protocol having its own isolated storage directory:
Protocol Storage Locations
- PostgreSQL (Port 5432):
data/postgresql/rocksdb/ - MySQL (Port 3306):
data/mysql/rocksdb/ - CQL/Cassandra (Port 9042):
data/cql/rocksdb/ - Redis (Port 6379):
data/redis/rocksdb/ - Cypher/Neo4j (Port 7687):
data/cypher/rocksdb/ - AQL/ArangoDB (Port 8529):
data/aql/rocksdb/ - GraphRAG (via RESP/PostgreSQL/Cypher/AQL):
data/graphrag/rocksdb/
Features
- Isolated Storage: Each protocol maintains its own RocksDB instance
- Automatic Persistence: All writes are automatically persisted to disk
- Crash Recovery: Data is automatically loaded on server restart
- ACID Guarantees: Full transactional guarantees with RocksDB
- Column Families: Optimized storage with protocol-specific column families
Storage Structure
data/
├── postgresql/rocksdb/ # PostgreSQL tables, schemas, indexes
├── mysql/rocksdb/ # MySQL tables and data
├── cql/rocksdb/ # CQL wide-column data
├── redis/rocksdb/ # Redis key-value data with TTL
├── cypher/rocksdb/ # Graph nodes and relationships
├── aql/rocksdb/ # Documents, collections, edges, graphs
└── graphrag/rocksdb/ # GraphRAG entities, relationships, embeddings
Configuration
Protocol persistence is automatically configured when starting the server. The data directory can be specified via:
orbit-server --data-dir /var/lib/orbit
Or in the configuration file:
[server]
data_dir = "/var/lib/orbit"
For more details, see Protocol Persistence Status.
Cloud Storage Backends
5. S3-Compatible Providers (S3Config)
Best for: AWS environments, MinIO deployments, high durability requirements
Features:
- Compatible with AWS S3, MinIO, and other S3-compatible services
- Configurable endpoints and regions
- SSL/TLS support
- Retry mechanisms and timeout controls
- Automatic health checks
- Object versioning and metadata
Configuration:
[providers.s3]
type = "S3"
endpoint = "https://s3.amazonaws.com"
region = "us-west-2"
bucket = "orbit-data"
access_key_id = "YOUR_ACCESS_KEY"
secret_access_key = "YOUR_SECRET_KEY"
prefix = "orbit"
enable_ssl = true
connection_timeout = 30
retry_count = 3
6. Azure Blob Storage (AzureConfig)
Best for: Microsoft Azure environments
Features:
- Native Azure Blob Storage integration
- Container-based organization
- Configurable endpoints and retry policies
- Built-in health monitoring
Configuration:
[providers.azure]
type = "Azure"
account_name = "orbitdata"
account_key = "YOUR_ACCOUNT_KEY"
container_name = "orbit"
prefix = "orbit"
connection_timeout = 30
retry_count = 3
7. Google Cloud Storage (GoogleCloudConfig)
Best for: Google Cloud Platform environments
Features:
- Native GCS integration
- Service account authentication
- Project-based organization
- Configurable retry policies
Configuration:
[providers.gcp]
type = "GoogleCloud"
project_id = "your-project-id"
bucket_name = "orbit-data"
credentials_path = "/path/to/service-account.json"
prefix = "orbit"
connection_timeout = 30
retry_count = 3
Distributed Storage Backends
8. etcd Provider (EtcdConfig)
Best for: Kubernetes environments, distributed systems requiring consistency
Features:
- Distributed key-value storage
- Strong consistency guarantees
- Native lease support with TTL
- Watch capabilities for real-time updates
- TLS/mTLS authentication support
Configuration:
[providers.etcd]
type = "Etcd"
endpoints = ["http://localhost:2379", "http://localhost:22379", "http://localhost:32379"]
prefix = "orbit"
lease_ttl = 300
username = "orbit"
password = "secret"
ca_cert = "/path/to/ca.crt"
client_cert = "/path/to/client.crt"
client_key = "/path/to/client.key"
9. Redis Provider (RedisConfig)
Best for: High-performance caching, pub/sub scenarios
Features:
- In-memory data structure store
- Cluster mode support
- Connection pooling
- Multiple database support
- TTL-based expiration
Configuration:
[providers.redis]
type = "Redis"
url = "redis://localhost:6379"
cluster_mode = false
database = 0
password = "secret"
prefix = "orbit"
pool_size = 10
retry_count = 3
10. Kubernetes Provider (KubernetesConfig)
Best for: Cloud-native Kubernetes deployments
Features:
- ConfigMap and Secret storage
- Namespace isolation
- Native Kubernetes integration
- RBAC support
Configuration:
[providers.kubernetes]
type = "Kubernetes"
namespace = "orbit"
config_map_name = "orbit-data"
secret_name = "orbit-secrets"
in_cluster = true
Specialized Backends
11. MinIO Provider (MinIOConfig)
Best for: Self-hosted object storage, hybrid cloud scenarios
Features:
- S3-compatible self-hosted storage
- High performance and scalability
- Multi-tenant support
- Distributed deployment support
12. Flash-Optimized Provider (FlashConfig)
Best for: High-performance scenarios with NVMe storage, multipathing requirements
Features:
- Optimized for flash storage devices
- Multipathing support for redundancy
- Configurable I/O depth and block sizes
- Compression support
- Direct I/O operations
Configuration:
[providers.flash]
type = "Flash"
data_dir = "/nvme/orbit"
enable_multipathing = true
io_depth = 32
block_size = 4096
cache_size = 1073741824 # 1GB
compression = "Lz4"
paths = ["/nvme0/orbit", "/nvme1/orbit", "/nvme2/orbit"]
13. Composite Provider (CompositeConfig)
Best for: High availability scenarios requiring failover
Features:
- Primary/backup provider configuration
- Automatic health monitoring
- Configurable failover thresholds
- Data synchronization between providers
- Transparent failover
Configuration:
[providers.composite]
type = "Composite"
sync_interval = 60
health_check_interval = 30
failover_threshold = 3
[providers.composite.primary]
type = "Memory"
max_entries = 50000
[providers.composite.backup]
type = "S3"
endpoint = "http://localhost:9000"
region = "us-east-1"
bucket = "orbit-backup"
access_key_id = "minioadmin"
secret_access_key = "minioadmin"
enable_ssl = false
Provider-Based System
Storage Trait (TableStorage)
The storage abstraction layer provides a unified interface for all storage operations:
pub trait TableStorage: Send + Sync {
// Schema management
async fn create_table(&self, schema: &TableSchema) -> Result<()>;
async fn get_table_schema(&self, table_name: &str) -> Result<Option<TableSchema>>;
// Data operations
async fn insert(&self, table: &str, row: &Row) -> Result<()>;
async fn select(&self, query: &SelectQuery) -> Result<Vec<Row>>;
async fn update(&self, table: &str, updates: &[Update], filter: &Filter) -> Result<u64>;
async fn delete(&self, table: &str, filter: &Filter) -> Result<u64>;
// Transaction support
async fn begin_transaction(&self) -> Result<Transaction>;
async fn commit_transaction(&self, tx: Transaction) -> Result<()>;
async fn rollback_transaction(&self, tx: Transaction) -> Result<()>;
// Maintenance
async fn checkpoint(&self) -> Result<()>;
async fn compact(&self) -> Result<()>;
async fn metrics(&self) -> Result<StorageMetrics>;
}
Factory Pattern
StorageBackendFactory provides a unified way to create storage backend instances:
use orbit_protocols::postgres_wire::storage::{StorageBackendConfig, StorageBackendFactory};
// Create with default LSM storage
let executor = SqlExecutor::new().await?;
// Create with custom storage configuration
let config = StorageBackendConfig::Memory;
let executor = SqlExecutor::new_with_storage_config(config).await?;
// Use a specific storage backend
let storage = StorageBackendFactory::create_backend(&config).await?;
let executor = SqlExecutor::with_storage(storage);
Configuration
Configuration Methods
1. Programmatic Configuration
use orbit_server::persistence::*;
let config = PersistenceProviderConfig::builder()
.with_memory("memory", MemoryConfig::default(), false)
.with_s3("s3", s3_config, true) // Default provider
.build()?;
2. Configuration Files
TOML Format:
[defaults]
addressable = "s3"
cluster = "etcd"
[providers.memory]
type = "Memory"
max_entries = 100000
[providers.s3]
type = "S3"
endpoint = "https://s3.amazonaws.com"
region = "us-west-2"
bucket = "orbit-data"
JSON Format:
{
"defaults": {
"addressable": "s3",
"cluster": "etcd"
},
"providers": {
"memory": {
"type": "Memory",
"max_entries": 100000
},
"s3": {
"type": "S3",
"endpoint": "https://s3.amazonaws.com",
"region": "us-west-2",
"bucket": "orbit-data"
}
}
}
3. Environment Variables
# S3 configuration from environment
ORBIT_S3_ENDPOINT=https://s3.amazonaws.com
ORBIT_S3_REGION=us-west-2
ORBIT_S3_BUCKET=orbit-data
ORBIT_S3_ACCESS_KEY_ID=your_access_key
ORBIT_S3_SECRET_ACCESS_KEY=your_secret_key
Backend Selection
Set the persistence backend using one of these methods:
- Environment variable:
ORBIT_PERSISTENCE_BACKEND - In TOML config:
[server] persistence_backend = "lsm_tree"
Supported values:
memory,cow_btree,lsm_tree,rocksdb,s3,azure,gcp,etcd,redis,kubernetes,minio,flash,composite
Performance Tuning
Backend Comparison
| Feature | COW B+ Tree | LSM-Tree | RocksDB | Memory |
|---|---|---|---|---|
| Write Latency | ~41μs | ~38μs | ~53μs | <1μs |
| Read Latency | <1μs | ~0.3μs | ~19μs | <1μs |
| Memory Usage | Low | Medium | High | Low |
| Disk Usage | Low | Medium (write amp) | High (write amp) | None |
| CPU Usage | Low | Medium (compaction) | High | Low |
| Crash Recovery | WAL Replay | WAL + SSTable | Built-in | None |
| Production Ready | ⚠️ Beta | ⚠️ Beta | ✅ Stable | ✅ Stable |
Tuning Guidelines
COW B+ Tree Tuning
# For read-heavy workloads
max_keys_per_node = 256 # Larger nodes, fewer levels
wal_buffer_size = 512000 # Smaller buffer, frequent flushes
enable_snapshots = true
snapshot_interval = 500 # More frequent snapshots
# For memory-constrained environments
max_keys_per_node = 32 # Smaller nodes
wal_buffer_size = 65536 # Small buffer
enable_snapshots = false # Disable snapshots
LSM-Tree Tuning
# For write-heavy workloads
memtable_size_mb = 128 # Large memtables
max_levels = 8 # More levels
level_size_multiplier = 8 # Smaller multiplier
compaction_threshold = 10 # Less frequent compaction
# For read-heavy workloads
memtable_size_mb = 32 # Smaller memtables
enable_bloom_filters = true # Enable bloom filters
bloom_filter_bits_per_key = 15 # More accurate filters
block_cache_mb = 512 # Large block cache
RocksDB Tuning
# For production workloads
write_buffer_size = 268435456 # 256MB write buffers
block_cache_size = 1073741824 # 1GB block cache for reads
max_background_jobs = 8 # More compaction threads
compression = true # Enable compression
Monitoring and Metrics
Key Metrics
All providers expose standard metrics:
- Operation Counts: read_operations, write_operations, delete_operations
- Latency: read_latency_avg, write_latency_avg, delete_latency_avg
- Error Tracking: error_count
- Cache Statistics: cache_hits, cache_misses (where applicable)
Health Checks
Health checks provide three states:
- Healthy: Provider is operating normally
- Degraded: Provider is operational but with reduced performance
- Unhealthy: Provider is not operational
Alerting Thresholds
# Recommended alert thresholds
write_latency_p95: 100ms # 95th percentile write latency
read_latency_p95: 10ms # 95th percentile read latency
error_rate: 0.1% # Error rate threshold
memory_usage: 80% # Memory usage threshold
disk_usage: 85% # Disk usage threshold
compaction_backlog: 10 # LSM compaction queue size
Deployment Scenarios
Scenario 1: Development Environment
Configuration: Memory provider with disk backup
- Fast development cycles
- Data persistence across restarts
- No external dependencies
Scenario 2: Production on AWS
Configuration: S3 for persistence, ElastiCache Redis for caching
- High durability and availability
- Scalable storage
- Cross-region replication
Scenario 3: Kubernetes Deployment
Configuration: etcd for cluster state, ConfigMaps for configuration
- Cloud-native integration
- Automatic service discovery
- RBAC integration
Scenario 4: Hybrid Cloud
Configuration: Composite provider with on-premises primary and cloud backup
- Data sovereignty requirements
- Disaster recovery
- Cost optimization
Scenario 5: High-Performance Computing
Configuration: Flash-optimized storage with multipathing
- Ultra-low latency
- High throughput
- Hardware optimization
Migration Between Providers
The system includes migration utilities for moving data between providers:
use orbit_server::persistence::migration::*;
let migration = DataMigration::new()
.from_provider(old_provider)
.to_provider(new_provider)
.with_batch_size(1000)
.with_validation(true);
let result = migration.execute().await?;
println!("Migrated {} leases and {} nodes", result.leases_migrated, result.nodes_migrated);
Migration Process
# Export from COW B+ Tree
orbit-cli export --backend=cow --data-dir=/var/lib/orbit --output=export.json
# Import to LSM-Tree
orbit-cli import --backend=lsm --data-dir=/var/lib/orbit-new --input=export.json
# Validate migration
orbit-cli validate --backend=lsm --data-dir=/var/lib/orbit-new
Security Considerations
Authentication
- Cloud Providers: Use native authentication (IAM roles, service accounts)
- etcd: Support for username/password and mTLS
- Redis: Password authentication and SSL/TLS
- Kubernetes: RBAC and service account tokens
Encryption
- At Rest: Leverage provider-native encryption (S3 SSE, Azure encryption, etc.)
- In Transit: TLS/SSL support for all network communications
- Key Management: Integration with cloud KMS services
Access Control
- Least Privilege: Configure minimal required permissions
- Network Security: Use VPCs, security groups, network policies
- Audit Logging: All operations are logged with appropriate detail
Best Practices
Development
- Always use transactions for multi-operation consistency
- Implement proper error handling for all persistence operations
- Monitor metrics in development to catch performance regressions
- Test with realistic data sizes to validate performance assumptions
- Use builder pattern for configuration to ensure type safety
Production
- Enable monitoring for all key metrics
- Set up automated backups of WAL and snapshots
- Test disaster recovery procedures regularly
- Use appropriate backend for your workload characteristics
- Configure resource limits to prevent resource exhaustion
- Implement graceful degradation for persistence failures
DevOps
- Automate deployment with configuration validation
- Use infrastructure as code for reproducible environments
- Implement blue-green deployments for zero-downtime updates
- Set up comprehensive monitoring and alerting
- Document runbooks for common operational procedures
- Regular capacity planning reviews based on growth trends
Historical Issues and Fixes
RocksDB Persistence Creation Issues (Resolved)
Root Causes Identified
- TieredTableStorage was Completely In-Memory
- Problem: The
TieredTableStorageimplementation usedHybridStorageManagerwhich stored all data in-memory usingRowBasedStore(aVec<Row>). There was no disk persistence mechanism. - Fix: Integrated RocksDB into
TieredTableStorageto provide durable storage.
- Problem: The
- No Protocol-Specific Persistence Folders
- Problem: Only generic directories were created. Each protocol should have its own persistence folder.
- Fix: Created protocol-specific subdirectories (
data/postgresql/rocksdb/,data/mysql/rocksdb/, etc.).
- Hot/Warm/Cold Directories Were Not Used
- Problem: The
data/hot,data/warm, anddata/colddirectories were created but never written to. - Fix: RocksDB now handles persistence directly, and tiered storage uses RocksDB as the backing store.
- Problem: The
- RocksDB Was Initialized But Not Used by TieredTableStorage
- Problem: A global
RocksDbTableStoragewas initialized butTieredTableStoragedidn’t use it. - Fix: Modified
TieredTableStorageto use RocksDB for schema and data persistence.
- Problem: A global
- Redis Path Inconsistency
- Problem: Redis data files were being created at both root level and rocksdb level.
- Fix: Standardized Redis to use
data/redis/rocksdb/consistently.
Persistence Verification
All persistence implementations have been verified to:
- ✅ Create data directories safely using
create_dir_all(doesn’t delete existing files) - ✅ Use RocksDB
create_if_missing(true)andcreate_missing_column_families(true) - ✅ Load existing data from RocksDB on server restart
- ✅ Preserve data across server restarts
- ✅ Use protocol-specific data directories for isolation
Verification Tests: See orbit/server/tests/persistence_verification.rs for comprehensive tests.
Conclusion
The provider-based persistence architecture transforms Orbit server from a simple in-memory system to a production-ready platform that can adapt to any deployment scenario. Whether you’re running a single development instance or a global distributed system, there’s a provider configuration that meets your needs.
The system’s flexibility, combined with comprehensive monitoring, security features, and migration capabilities, ensures that your Orbit deployment can evolve with your requirements without architectural changes.
Status: Production Ready