MCP Server for Orbit-RS: LLM Integration Plan

Overview

The Model Context Protocol (MCP) server for Orbit-RS will provide Large Language Models (LLMs) with intelligent access to the underlying distributed actor system and data through natural language queries. This creates a powerful bridge between natural language processing and the Orbit distributed data platform.

Architecture

    LLM Client (Claude, GPT, etc.)
                ↓
           MCP Protocol
                ↓ 
┌─────────────────────────────────────┐
│           MCP Server                │
│  ┌───────────────────────────────┐  │
│  │     Natural Language          │  │
│  │     Query Processor           │  │  
│  │  ┌─────────────────────────┐  │  │
│  │  │   Intent Classification │  │  │
│  │  │   Entity Recognition    │  │  │
│  │  │   Query Understanding   │  │  │
│  │  └─────────────────────────┘  │  │  
│  └───────────────────────────────┘  │
│  ┌───────────────────────────────┐  │
│  │      SQL Generation           │  │
│  │  ┌─────────────────────────┐  │  │
│  │  │   Schema Analysis       │  │  │
│  │  │   Query Construction    │  │  │
│  │  │   Optimization          │  │  │
│  │  └─────────────────────────┘  │  │
│  └───────────────────────────────┘  │
│  ┌───────────────────────────────┐  │
│  │      Result Processing        │  │
│  │  ┌─────────────────────────┐  │  │
│  │  │   Data Formatting       │  │  │
│  │  │   Summarization         │  │  │
│  │  │   Visualization Hints   │  │  │
│  │  └─────────────────────────┘  │  │
│  └───────────────────────────────┘  │
└─────────────────────────────────────┘
               ↓
        Orbit PostgreSQL 
         Wire Protocol
               ↓
┌─────────────────────────────────────┐
│         Orbit-RS Cluster            │
│  ┌───────────────────────────────┐  │
│  │      SQL Query Engine         │  │
│  │   (Phase 8 Implementation)    │  │
│  └───────────────────────────────┘  │
│  ┌───────────────────────────────┐  │
│  │      Actor System             │  │
│  │   - Table Actors              │  │
│  │   - Collection Actors         │  │
│  │   - Transaction Coordinators  │  │
│  └───────────────────────────────┘  │
└─────────────────────────────────────┘

Core Components

1. MCP Protocol Handler

Responsibility: Handle the Model Context Protocol communication with LLMs

Key Features:

  • Protocol-compliant message handling
  • Authentication and authorization
  • Session management
  • Error handling and recovery

Implementation:

pub struct McpServer {
    pub config: McpConfig,
    pub orbit_client: OrbitClient,
    pub query_processor: NlpQueryProcessor,
    pub sql_generator: SqlGenerator,
}

pub trait McpProtocolHandler {
    async fn handle_request(&self, request: McpRequest) -> Result<McpResponse, McpError>;
    async fn handle_tool_call(&self, tool_call: ToolCall) -> Result<ToolResult, McpError>;
    async fn handle_resource_request(&self, resource: ResourceRequest) -> Result<ResourceResponse, McpError>;
}

2. Natural Language Query Processor

Responsibility: Parse and understand natural language queries

Key Features:

  • Intent classification (SELECT, INSERT, UPDATE, DELETE, ANALYZE, SUMMARIZE)
  • Entity recognition (table names, column names, values)
  • Query context understanding
  • Ambiguity resolution

Implementation:

pub struct NlpQueryProcessor {
    pub schema_analyzer: SchemaAnalyzer,
    pub intent_classifier: IntentClassifier,
    pub entity_extractor: EntityExtractor,
}

pub struct QueryIntent {
    pub operation: SqlOperation,
    pub entities: Vec<RecognizedEntity>,
    pub conditions: Vec<QueryCondition>,
    pub projections: Vec<ProjectionColumn>,
    pub confidence: f64,
}

pub enum SqlOperation {
    Select { 
        aggregation: Option<AggregationType>,
        limit: Option<u64>,
        ordering: Option<OrderingSpec>,
    },
    Insert {
        mode: InsertMode, // Single, Batch, FromSelect
    },
    Update {
        conditional: bool,
    },
    Delete {
        conditional: bool,
    },
    Analyze {
        analysis_type: AnalysisType, // Summary, Distribution, Trends
    },
}

3. SQL Generation Engine

Responsibility: Convert natural language intents to SQL queries

Key Features:

  • Schema-aware query generation
  • Type-safe parameter binding
  • Query optimization hints
  • Vector similarity integration
  • Complex query construction

Implementation:

pub struct SqlGenerator {
    pub schema_cache: SchemaCache,
    pub query_builder: QueryBuilder,
    pub optimizer: QueryOptimizer,
}

pub struct GeneratedQuery {
    pub sql: String,
    pub parameters: Vec<SqlValue>,
    pub query_type: QueryType,
    pub estimated_complexity: QueryComplexity,
    pub optimization_hints: Vec<OptimizationHint>,
}

pub trait QueryBuilder {
    fn build_select(&self, intent: &SelectIntent) -> Result<GeneratedQuery, SqlError>;
    fn build_insert(&self, intent: &InsertIntent) -> Result<GeneratedQuery, SqlError>;
    fn build_update(&self, intent: &UpdateIntent) -> Result<GeneratedQuery, SqlError>;
    fn build_delete(&self, intent: &DeleteIntent) -> Result<GeneratedQuery, SqlError>;
    fn build_analytical(&self, intent: &AnalyticalIntent) -> Result<GeneratedQuery, SqlError>;
}

4. Schema Analysis and Caching

Responsibility: Maintain up-to-date schema information for intelligent query generation

Key Features:

  • Real-time schema discovery
  • Column statistics and metadata
  • Relationship inference
  • Performance characteristics
  • Data distribution analysis

Implementation:

pub struct SchemaAnalyzer {
    pub orbit_client: OrbitClient,
    pub schema_cache: Arc<RwLock<SchemaCache>>,
    pub statistics_collector: StatisticsCollector,
}

pub struct SchemaCache {
    pub tables: HashMap<String, TableSchema>,
    pub indexes: HashMap<String, IndexInfo>,
    pub relationships: Vec<TableRelationship>,
    pub statistics: HashMap<String, ColumnStatistics>,
    pub last_updated: SystemTime,
}

pub struct TableSchema {
    pub name: String,
    pub columns: Vec<ColumnInfo>,
    pub constraints: Vec<ConstraintInfo>,
    pub indexes: Vec<IndexInfo>,
    pub row_estimate: Option<u64>,
    pub data_size_estimate: Option<u64>,
}

5. Result Processing and Formatting

Responsibility: Process query results for optimal LLM consumption

Key Features:

  • Intelligent data summarization
  • Format optimization for LLM context
  • Large result set handling
  • Visualization hint generation
  • Statistical summary generation

Implementation:

pub struct ResultProcessor {
    pub formatter: ResultFormatter,
    pub summarizer: DataSummarizer,
    pub visualizer: VisualizationHintGenerator,
}

pub struct ProcessedResult {
    pub summary: String,
    pub data_preview: Vec<Row>,
    pub statistics: ResultStatistics,
    pub visualization_hints: Vec<VisualizationHint>,
    pub full_result_available: bool,
    pub continuation_token: Option<String>,
}

pub trait DataSummarizer {
    fn summarize_numeric_data(&self, column: &str, values: &[f64]) -> NumericSummary;
    fn summarize_text_data(&self, column: &str, values: &[String]) -> TextSummary;
    fn summarize_temporal_data(&self, column: &str, values: &[DateTime]) -> TemporalSummary;
    fn generate_insights(&self, result: &QueryResult) -> Vec<DataInsight>;
}

MCP Tool Definitions

Core Data Tools

  1. query_data: Execute natural language queries against Orbit data
  2. describe_schema: Get schema information for tables and relationships
  3. analyze_data: Perform statistical analysis and data profiling
  4. search_similar: Vector similarity search using pgvector operations
  5. summarize_table: Generate comprehensive table summaries

Administrative Tools

  1. list_tables: Get available tables and basic metadata
  2. check_performance: Query performance analysis and optimization
  3. get_statistics: Database and table-level statistics
  4. validate_query: Validate SQL queries without execution

Advanced Analytics Tools

  1. trend_analysis: Time-series trend analysis
  2. correlation_analysis: Column correlation analysis
  3. distribution_analysis: Data distribution profiling
  4. outlier_detection: Statistical outlier identification

Natural Language Processing Capabilities

Intent Classification Examples

Data Retrieval:

  • “Show me all users from California”
  • “What are the top 10 products by revenue?”
  • “Find documents similar to this embedding vector”
  • “Get the average order value by month”

Data Analysis:

  • “Analyze the distribution of customer ages”
  • “What’s the correlation between price and sales volume?”
  • “Show me trends in user sign-ups over the past year”
  • “Identify outliers in the transaction amounts”

Data Modification:

  • “Add a new user with email john@example.com
  • “Update all products in category ‘electronics’ to have free shipping”
  • “Delete old log entries from before last month”

Entity Recognition

  • Table Names: “users”, “products”, “orders”, “documents”
  • Column Names: “email”, “price”, “created_at”, “embedding”
  • Values: Strings, numbers, dates, vectors
  • Operators: “greater than”, “similar to”, “between”, “in”
  • Aggregations: “average”, “sum”, “count”, “maximum”

Integration with Orbit-RS

Connection Management

pub struct OrbitMcpClient {
    pub postgres_client: Arc<PostgresClient>,
    pub actor_client: Arc<ActorClient>,
    pub transaction_manager: Arc<TransactionManager>,
}

impl OrbitMcpClient {
    pub async fn execute_query(&self, query: &GeneratedQuery) -> Result<QueryResult, OrbitError> {
        match query.query_type {
            QueryType::Read => self.postgres_client.execute_select(query).await,
            QueryType::Write => {
                let tx = self.transaction_manager.begin_transaction().await?;
                let result = self.postgres_client.execute_write(query).await?;
                tx.commit().await?;
                Ok(result)
            },
            QueryType::Analysis => self.execute_analytical_query(query).await,
        }
    }
}

Actor System Integration

pub struct TableActor {
    pub table_name: String,
    pub schema: TableSchema,
    pub data_store: Arc<dyn DataStorage>,
}

#[async_trait]
impl ActorWithStringKey for TableActor {
    async fn handle_query(&self, query: SqlQuery) -> Result<QueryResult, ActorError> {
        match query.operation {
            SqlOperation::Select(select) => self.execute_select(select).await,
            SqlOperation::Insert(insert) => self.execute_insert(insert).await,
            SqlOperation::Update(update) => self.execute_update(update).await,
            SqlOperation::Delete(delete) => self.execute_delete(delete).await,
        }
    }
}

Implementation Plan

Phase 1: MCP Foundation (Week 1)

  • Set up MCP server infrastructure
  • Implement basic protocol handlers
  • Create Orbit client integration
  • Basic tool definitions

Phase 2: Natural Language Processing (Week 2)

  • Intent classification system
  • Entity recognition for SQL elements
  • Schema analysis and caching
  • Basic SQL generation

Phase 3: Advanced Features (Week 3)

  • Vector similarity search integration
  • Advanced analytics tools
  • Result processing and summarization
  • Performance optimization

Phase 4: Integration and Testing (Week 4)

  • End-to-end integration testing
  • LLM client testing (Claude, GPT-4)
  • Performance benchmarking
  • Documentation and examples

Example Usage Scenarios

Data Exploration

LLM: "I need to understand the customer data structure"
MCP: Executes describe_schema("customers") and provides detailed schema information

LLM: "Show me a sample of customer data from the last month"
MCP: Generates and executes SELECT query with date filtering and LIMIT

LLM: "What's the distribution of customer ages?"
MCP: Executes analyze_data with distribution analysis on age column
LLM: "Find documents similar to 'machine learning algorithms'"
MCP: Converts text to embedding vector and executes similarity search
MCP: Returns top results with distance scores and content summaries

Analytical Queries

LLM: "Analyze sales trends over the past year by product category"
MCP: Generates complex analytical query with time-series grouping
MCP: Returns processed results with statistical insights and visualization hints

Success Criteria

  1. Protocol Compliance: Full MCP protocol compatibility with major LLM clients
  2. Natural Language Understanding: High accuracy (>85%) intent classification and entity recognition
  3. SQL Generation Quality: Generated queries are syntactically correct and semantically meaningful
  4. Performance: Sub-second response times for most queries
  5. Integration: Seamless integration with Orbit-RS Phase 8 SQL engine
  6. Usability: LLMs can effectively query and analyze Orbit data through natural language

This MCP server will position Orbit-RS as a leading platform for AI-powered data analysis and make distributed data accessible through natural language interfaces.