GraphRAG (Graph-enhanced Retrieval-Augmented Generation) - Complete Documentation
GraphRAG (Graph-enhanced Retrieval-Augmented Generation) - Complete Documentation
Status: ✅ Production Ready
Table of Contents
- Executive Summary
- Overview
- Architecture
- RESP Protocol Commands
- PostgreSQL Integration
- Usage Examples
- Performance Optimizations
- Implementation Phases
- Contributing
Executive Summary
GraphRAG (Graph-enhanced Retrieval-Augmented Generation) provides comprehensive knowledge graph construction and querying capabilities in Orbit-RS, leveraging the existing distributed actor system, graph database (GRAPH.), vector store (VECTOR.), and search engine (FT.*) infrastructure.
Key Features
✅ Knowledge Graph Construction
- Entity extraction from documents
- Relationship identification
- Multi-extractor support
- Entity deduplication
✅ RAG Query Processing
- Graph traversal with LLM generation
- Multi-hop reasoning
- Hybrid search (graph + vector + text)
- Citation support
✅ Multiple Interfaces
- RESP protocol commands (GRAPHRAG.*)
- PostgreSQL SQL functions
- AQL function calls (GRAPHRAG_*)
- Cypher/Bolt stored procedures (orbit.graphrag.*)
- REST API support
✅ Advanced Analytics Functions (NEW - November 2025)
- Entity similarity search (embedding and text-based)
- Semantic search with RAG integration
- Entity listing with filtering
- Trend analysis over time windows
- Community detection using connected components
Overview
GraphRAG combines graph databases, vector stores, and LLM generation to provide powerful knowledge graph construction and querying capabilities. It extends Orbit-RS’s existing infrastructure with specialized actors and commands.
Core Principles
- Leverage Existing Infrastructure: Build upon existing actors and protocols
- Maintain Redis Compatibility: Extend RESP protocol with GraphRAG.* commands
- Distributed by Design: Support scaling across multiple nodes
- Multi-Modal Integration: Combine graph, vector, and text search seamlessly
- LLM-Ready: Design for integration with various LLM providers
Architecture
GraphRAG Actor Ecosystem
┌─────────────────────────────────────────────────────────────────┐
│ GraphRAG Orchestrator │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Knowledge Graph │ │ RAG Pipeline │ │ Query Processor │ │
│ │ Constructor │ │ Manager │ │ & Router │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
┌───────────────┼────---───────────┐
│ │ │
┌─────────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ GraphActor │ │ VectorActor │ │ SearchActor │
│ (Enhanced) │ │ (Enhanced) │ │ (Enhanced) │
│ │ │ │ │ │
│ • Cypher Queries │ │ • Embeddings │ │ • Text Search │
│ • Entity Relations │ │ • Similarity │ │ • Indexing │
│ • Multi-hop Paths │ │ • KNN Search │ │ • Ranking │
│ • Graph Algorithms │ │ • Metadata │ │ • Filtering │
│ • Vector Properties │ │ • Clustering │ │ • Analysis │
└─────────────────────┘ └─────────────────┘ └─────────────────┘
New Components
1. GraphRAGActor
Purpose: Orchestrates knowledge graph construction, entity extraction, and RAG operations.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct GraphRAGActor {
pub kg_name: String,
pub config: GraphRAGConfig,
pub stats: GraphRAGStats,
pub entity_extractors: Vec<EntityExtractor>,
pub embedding_models: Vec<EmbeddingModel>,
pub llm_providers: Vec<LLMProvider>,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct GraphRAGConfig {
pub max_entities_per_document: usize,
pub max_relationships_per_document: usize,
pub embedding_dimension: usize,
pub similarity_threshold: f32,
pub max_hops: u32,
pub rag_context_size: usize,
pub enable_entity_deduplication: bool,
pub enable_relationship_inference: bool,
}
Key Methods:
process_document()- Extract entities/relationships and build knowledge graphquery_rag()- Perform retrieval-augmented generation queriestraverse_graph()- Multi-hop reasoning across relationshipssemantic_search()- Vector similarity search across graph entitieshybrid_search()- Combine graph traversal, vector search, and text search
2. EntityExtractionActor
Purpose: NLP pipeline for identifying entities and relationships from text.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct EntityExtractionActor {
pub extractors: Vec<ExtractorConfig>,
pub confidence_threshold: f32,
pub deduplication_strategy: DeduplicationStrategy,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum ExtractorConfig {
NamedEntityRecognition { model_path: String },
RelationshipExtraction { model_path: String },
CustomPattern { regex_patterns: Vec<String> },
LLMBased { provider: LLMProvider, prompt_template: String },
}
3. Enhanced GraphNode with Vector Embeddings
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct EnhancedGraphNode {
pub id: String,
pub labels: Vec<String>,
pub properties: HashMap<String, serde_json::Value>,
// GraphRAG Extensions
pub embeddings: HashMap<String, Vec<f32>>, // Multiple embeddings per model
pub entity_type: Option<EntityType>,
pub confidence_score: f32,
pub source_documents: Vec<String>,
pub last_updated: i64,
}
RESP Protocol Commands
GraphRAG commands are available under the GRAPHRAG.* namespace via the RESP protocol.
GRAPHRAG.BUILD
Build a knowledge graph from a document by extracting entities and relationships.
Syntax:
GRAPHRAG.BUILD <kg_name> <document_id> <text> [metadata key value ...]
Parameters:
kg_name- Knowledge graph identifierdocument_id- Unique document identifiertext- Document text contentmetadata- Optional key-value pairs for document metadata
Response: Array with processing statistics
entities_extracted- Number of entities extractedrelationships_extracted- Number of relationships extractedextractors_used- Number of extractors appliedprocessing_time_ms- Processing time in milliseconds
Example:
GRAPHRAG.BUILD company_kg doc1 "Apple Inc. is a technology company founded by Steve Jobs. Microsoft Corporation is another major technology company." company "Apple Inc." industry "Technology"
GRAPHRAG.QUERY
Execute a RAG query that combines graph traversal with LLM generation.
Syntax:
GRAPHRAG.QUERY <kg_name> <query_text> [MAX_HOPS <hops>] [CONTEXT_SIZE <size>] [LLM_PROVIDER <provider>] [INCLUDE_EXPLANATION]
Parameters:
kg_name- Knowledge graph identifierquery_text- Natural language queryMAX_HOPS- Maximum hops for graph traversal (optional)CONTEXT_SIZE- Maximum context size for LLM (optional)LLM_PROVIDER- LLM provider to use (optional)INCLUDE_EXPLANATION- Include reasoning paths in response (optional flag)
Response: Array with query results
response- Generated LLM responseconfidence- Confidence score (0.0-1.0)processing_time_ms- Total processing timeentities_involved- Array of entities involved in reasoningreasoning_paths- Array of reasoning paths (if INCLUDE_EXPLANATION specified)citations- Array of source citations
Example:
GRAPHRAG.QUERY company_kg "What is the relationship between Apple and technology?" MAX_HOPS 3 INCLUDE_EXPLANATION
GRAPHRAG.EXTRACT
Extract entities and relationships from text without building the knowledge graph.
Syntax:
GRAPHRAG.EXTRACT <kg_name> <document_id> <text> [EXTRACTORS extractor1 extractor2 ...]
Example:
GRAPHRAG.EXTRACT company_kg doc2 "Tesla Inc. was founded by Elon Musk in 2003." EXTRACTORS regex_ner keyword_extraction
GRAPHRAG.REASON
Find reasoning paths between two specific entities in the knowledge graph.
Syntax:
GRAPHRAG.REASON <kg_name> <from_entity> <to_entity> [MAX_HOPS <hops>] [INCLUDE_EXPLANATION]
Example:
GRAPHRAG.REASON company_kg "Apple Inc." "Steve Jobs" MAX_HOPS 2 INCLUDE_EXPLANATION
GRAPHRAG.STATS
Get statistics about a knowledge graph.
Syntax:
GRAPHRAG.STATS <kg_name>
Response: Array with statistics
total_entities- Total number of entitiestotal_relationships- Total number of relationshipsentity_types- Breakdown by entity typerelationship_types- Breakdown by relationship typelast_updated- Timestamp of last update
GRAPHRAG.ENTITIES
List entities in a knowledge graph.
Syntax:
GRAPHRAG.ENTITIES <kg_name> [TYPE <type>] [LIMIT <count>] [OFFSET <offset>]
GRAPHRAG.SIMILAR
Find similar entities using vector embeddings.
Syntax:
GRAPHRAG.SIMILAR <kg_name> <entity_id> [LIMIT <count>] [THRESHOLD <score>]
PostgreSQL Integration
GraphRAG functionality is available through PostgreSQL-compatible SQL function calls.
Setup
use orbit_server::protocols::PostgresServer;
use orbit_protocols::postgres_wire::QueryEngine;
// Create query engine with GraphRAG support
let query_engine = QueryEngine::new_with_vector_support(orbit_client);
let server = PostgresServer::new("127.0.0.1:5433", query_engine);
SQL Functions
1. GRAPHRAG_BUILD
Processes a document and builds knowledge graph entities and relationships.
Syntax:
SELECT * FROM GRAPHRAG_BUILD(
kg_name TEXT, -- Knowledge graph name/identifier
document_id TEXT, -- Unique document identifier
document_text TEXT, -- Document content to process
metadata JSON -- Optional: document metadata as JSON
);
Example:
SELECT * FROM GRAPHRAG_BUILD(
'research_papers',
'paper_001',
'Machine learning is a subset of artificial intelligence...',
'{"author": "John Doe", "year": 2023}'::json
);
2. GRAPHRAG_QUERY
Performs Retrieval-Augmented Generation queries against the knowledge graph.
Syntax:
SELECT * FROM GRAPHRAG_QUERY(
kg_name TEXT, -- Knowledge graph name
query_text TEXT, -- Query/question to answer
max_hops INTEGER, -- Optional: max relationship hops (default: 3)
context_size INTEGER, -- Optional: context window size (default: 2048)
llm_provider TEXT, -- Optional: LLM provider ('ollama', 'openai', etc.)
include_explanation BOOLEAN -- Optional: include reasoning explanation
);
Example:
SELECT * FROM GRAPHRAG_QUERY(
'research_papers',
'What are the main applications of machine learning?',
3,
2048,
'ollama',
true
);
3. GRAPHRAG_EXTRACT
Extracts entities and relationships from text without building the full knowledge graph.
Syntax:
SELECT * FROM GRAPHRAG_EXTRACT(
kg_name TEXT, -- Knowledge graph context
document_id TEXT, -- Document identifier
text TEXT -- Text to extract from
);
4. GRAPHRAG_REASON
Finds reasoning paths between two entities in the knowledge graph.
Syntax:
SELECT * FROM GRAPHRAG_REASON(
kg_name TEXT, -- Knowledge graph name
from_entity TEXT, -- Starting entity
to_entity TEXT, -- Target entity
max_hops INTEGER, -- Optional: max hops (default: 3)
include_explanation BOOLEAN -- Optional: include explanations
);
5. GRAPHRAG_STATS
Returns statistics about a knowledge graph.
Syntax:
SELECT * FROM GRAPHRAG_STATS(kg_name TEXT);
6. GRAPHRAG_ENTITIES
Lists entities in a knowledge graph.
Syntax:
SELECT * FROM GRAPHRAG_ENTITIES(
kg_name TEXT, -- Knowledge graph name
entity_type TEXT, -- Optional: filter by type
limit INTEGER, -- Optional: limit results
offset INTEGER -- Optional: offset for pagination
);
7. GRAPHRAG_SIMILAR
Finds similar entities using vector embeddings.
Syntax:
SELECT * FROM GRAPHRAG_SIMILAR(
kg_name TEXT, -- Knowledge graph name
entity_id TEXT, -- Entity to find similar entities for
limit INTEGER, -- Optional: limit results (default: 10)
threshold FLOAT -- Optional: similarity threshold (default: 0.7)
);
Usage Examples
Building a Knowledge Graph
Using RESP:
GRAPHRAG.BUILD company_kg doc1 "Apple Inc. is a technology company founded by Steve Jobs."
Using PostgreSQL:
SELECT * FROM GRAPHRAG_BUILD('company_kg', 'doc1', 'Apple Inc. is a technology company...');
Querying with RAG
Using RESP:
GRAPHRAG.QUERY company_kg "What is the relationship between Apple and technology?" MAX_HOPS 3
Using PostgreSQL:
SELECT * FROM GRAPHRAG_QUERY('company_kg', 'What is the relationship between Apple and technology?', 3);
Entity Analysis
Using RESP:
GRAPHRAG.ENTITIES company_kg TYPE "Organization"
GRAPHRAG.SIMILAR company_kg "Apple Inc." LIMIT 5
Using PostgreSQL:
SELECT * FROM GRAPHRAG_ENTITIES('company_kg', 'Organization');
SELECT * FROM GRAPHRAG_SIMILAR('company_kg', 'Apple Inc.', 5);
Performance Optimizations
1. Caching Strategy
- Entity extraction results cached
- Embedding vectors cached
- Graph traversal paths cached
- LLM responses cached (with TTL)
2. Batch Processing
- Batch entity extraction
- Batch embedding generation
- Batch graph updates
3. Distributed Processing
- Distributed entity extraction
- Distributed graph traversal
- Load balancing across nodes
Implementation Phases
Phase 1: Foundation (Current Sprint)
- GraphRAGActor implementation
- Basic entity extraction
- Simple graph construction
Phase 2: Core GraphRAG Features
- RAG query processing
- Multi-hop reasoning
- Vector-enhanced nodes
Phase 3: Advanced Features
- LLM provider integration
- Hybrid search
- Advanced reasoning
Phase 4: Production Readiness
- Performance optimization
- Comprehensive testing
- Documentation completion
Contributing
Contributions to GraphRAG are welcome! Areas needing help:
- Entity Extraction: Improve NER and relationship extraction
- LLM Integration: Add support for more LLM providers
- Performance: Optimize graph traversal and vector search
- Testing: Expand test coverage
References
- GraphRAG Complete Documentation
- Orbit-RS Architecture
- Vector Store Documentation
- Graph Database Documentation
Maintainer: Orbit-RS Development Team