Graph RAG

Combine knowledge graphs with vector search for enhanced retrieval and reasoning.

Overview

Graph RAG combines traditional vector search with knowledge graph traversal to capture relationships and enable multi-hop reasoning.

Why Graph RAG?

Traditional RAG limitations:

  • No relationship awareness
  • Struggles with multi-hop questions such as "Who knows X who also knows Y?"
  • Misses implicit connections

Graph RAG advantages:

  • Captures entity relationships
  • Enables multi-hop queries
  • Better for complex domains (legal, scientific, organizational)

Basic Implementation

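from neo4j import GraphDatabase
from openai import OpenAI

class GraphRAG:
    # extract_entities, extract_relationships, vector_search and
    # format_graph_results are application-specific helpers (not shown here).

    def __init__(self, neo4j_uri, neo4j_user, neo4j_password):
        self.graph = GraphDatabase.driver(neo4j_uri, auth=(neo4j_user, neo4j_password))
        self.client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def embed(self, text):
        # Example embedding model; use the same model that built your index
        response = self.client.embeddings.create(model="text-embedding-3-small", input=text)
        return response.data[0].embedding

    def add_document(self, text, metadata):
        # metadata is unused in this minimal version
        # Extract entities and relationships
        entities = self.extract_entities(text)
        relationships = self.extract_relationships(text)

        # Store in graph
        with self.graph.session() as session:
            for entity in entities:
                # MERGE on name only, so an entity the LLM labels with a
                # different type later is updated rather than duplicated
                session.run(
                    "MERGE (e:Entity {name: $name}) SET e.type = $type",
                    name=entity["name"], type=entity["type"],
                )

            for rel in relationships:
                # Expects rel = {"source": ..., "target": ..., "rel_type": ...}
                session.run(
                    """
                    MATCH (a:Entity {name: $source})
                    MATCH (b:Entity {name: $target})
                    MERGE (a)-[r:RELATES_TO {type: $rel_type}]->(b)
                    """,
                    source=rel["source"], target=rel["target"], rel_type=rel["rel_type"],
                )

    def query(self, question):
        # 1. Vector search for relevant entity names
        query_embedding = self.embed(question)
        relevant_entities = self.vector_search(query_embedding)

        # 2. Graph traversal (1-2 hops) from relevant entities
        with self.graph.session() as session:
            result = session.run(
                """
                MATCH (e:Entity)
                WHERE e.name IN $entities
                MATCH path = (e)-[*1..2]-(related)
                RETURN path
                """,
                entities=relevant_entities,
            )
            # Results must be consumed before the session closes
            graph_context = self.format_graph_results(result)

        # 3. Generate answer with graph context
        response = self.client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": "Answer using this context:\n" + graph_context},
                {"role": "user", "content": question},
            ],
        )
        return response.choices[0].message.content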
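The class leaves vector_search and format_graph_results undefined. The block below is a minimal sketch of both as standalone functions, assuming a hypothetical in-memory dict of entity embeddings (a real system would query a vector store or a Neo4j vector index) and Neo4j records whose path relationships carry the `type` property written by add_document. Adapting them into methods is left to you.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def vector_search(query_embedding, entity_embeddings, top_k=3):
    """Return the names of the top_k entities most similar to the query.

    entity_embeddings: hypothetical dict {entity name: embedding}.
    """
    ranked = sorted(
        entity_embeddings,
        key=lambda name: cosine(query_embedding, entity_embeddings[name]),
        reverse=True,
    )
    return ranked[:top_k]

def format_graph_results(result):
    """Flatten Neo4j paths into 'A -[TYPE]-> B' lines for the LLM prompt.

    Expects records with a 'path' key, as returned by the traversal query.
    Uses the relationship's 'type' property, falling back to its label.
    """
    lines = set()  # overlapping paths repeat edges; keep each edge once
    for record in result:
        for rel in record["path"].relationships:
            label = rel.get("type", rel.type)
            lines.add(f'{rel.start_node["name"]} -[{label}]-> {rel.end_node["name"]}')
    return "\n".join(sorted(lines))
```

Sorting the lines gives a stable prompt for identical graph results, which makes LLM outputs easier to compare across runs.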

Entity Extraction

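import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_entities(text):
    """Use an LLM to extract entities as a list of {"name", "type"} dicts"""
    prompt = f"""
    Extract entities from this text. Return only JSON in this shape:
    {{"entities": [{{"name": "...", "type": "person|organization|location|concept"}}]}}

    Text: {text}
    """

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # reduces variation in the output format
    )

    return json.loads(response.choices[0].message.content)["entities"]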
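Calling json.loads directly on the reply breaks when the model wraps its JSON in a Markdown code fence or returns types outside the requested set. One way to harden this step is a small parser like the sketch below. It is a hypothetical helper, not part of any library: it strips an optional fence, drops malformed entries, and normalizes names so the MERGE in add_document sees one canonical spelling.

```python
import json
import re

ALLOWED_TYPES = {"person", "organization", "location", "concept"}
# Matches an optional three-backtick fence, with or without a "json" tag
FENCE_RE = re.compile(r"^`{3}(?:json)?\s*(.*?)\s*`{3}$", re.DOTALL)

def parse_entities(raw):
    """Parse an LLM reply into a clean list of {"name", "type"} dicts."""
    text = raw.strip()
    match = FENCE_RE.match(text)
    if match:
        text = match.group(1)
    data = json.loads(text)

    entities = []
    for item in data.get("entities", []):
        # Collapse internal whitespace and lowercase the type
        name = " ".join(str(item.get("name", "")).split())
        etype = str(item.get("type", "")).lower()
        if name and etype in ALLOWED_TYPES:
            entities.append({"name": name, "type": etype})
    return entities
```

In extract_entities, the last line would then become `return parse_entities(response.choices[0].message.content)`.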

Hybrid Search

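Combine vector search with graph expansion: take the top-k vector hits, then add the entities connected to each hit within two hops.

def hybrid_graph_search(query, vector_db, driver, k=5):
    # vector_db: any vector store client with search(query, k) returning hits
    #   that carry an .id matching an `id` property on :Entity nodes (assumed)
    # driver: a neo4j Driver, e.g. GraphDatabase.driver(uri, auth=...)

    # 1. Vector search: top-k semantically similar hits
    vector_results = vector_db.search(query, k=k)

    # 2. Graph expansion: keep the hits, then add nodes within 2 hops
    combined = list(vector_results)
    seen = set()
    for result in vector_results:
        records, _, _ = driver.execute_query(
            "MATCH (e:Entity {id: $id})-[*1..2]-(related) RETURN DISTINCT related",
            id=result.id,
        )
        for record in records:
            node = record["related"]
            if node.element_id not in seen:  # de-duplicate across hits
                seen.add(node.element_id)
                combined.append(node)

    return combined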
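To see the retrieve-then-expand logic without a database, here is a pure-Python sketch under assumed toy inputs: `doc_vectors` maps node ids to embeddings and `graph` is an adjacency dict (make it symmetric for undirected traversal). Each hit gets its own breadth-first search bounded by `max_hops`, mirroring the `[*1..2]` pattern above.

```python
import math
from collections import deque

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_search(query_vec, doc_vectors, graph, k=2, max_hops=2):
    """Vector top-k, then add every node within max_hops of each hit.

    Returns the hits first (best first), then newly reached nodes.
    """
    hits = sorted(doc_vectors, key=lambda n: cosine(query_vec, doc_vectors[n]), reverse=True)[:k]
    results, emitted = list(hits), set(hits)

    for start in hits:
        # Per-start visited set: a node reached late from one hit can still
        # be expanded when it is closer to another hit
        visited = {start}
        frontier = deque([(start, 0)])
        while frontier:
            node, depth = frontier.popleft()
            if depth == max_hops:
                continue
            for nbr in sorted(graph.get(node, ())):
                if nbr in visited:
                    continue
                visited.add(nbr)
                frontier.append((nbr, depth + 1))
                if nbr not in emitted:
                    emitted.add(nbr)
                    results.append(nbr)
    return results
```

With `graph = {"d1": {"e1"}, "e1": {"d1", "e2"}, "e2": {"e1", "e3"}, "e3": {"e2"}}` and a query closest to `d1`, `k=1` returns `["d1", "e1", "e2"]`; `e3` is three hops away and is left out.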

Use Cases

  • Research: "Find papers citing X that are also cited by Y"
  • Legal: "Find cases involving Company A that reference Statute B"
  • Corporate: "Who worked with Person X on Project Y?"
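Each of these is a short path pattern. The research example, for instance, is the two-hop path Y -> P -> X. The sketch below evaluates it over hypothetical in-memory citation data. The comment shows how the same pattern might read in Cypher, assuming a schema of `:Paper` nodes joined by `:CITES` relationships.

```python
# Cypher equivalent under the assumed schema:
#   MATCH (y:Paper {id: $y})-[:CITES]->(p:Paper)-[:CITES]->(x:Paper {id: $x})
#   RETURN p

def citing_x_cited_by_y(cites, x, y):
    """Papers that cite x and are themselves cited by y.

    cites: {paper_id: set of paper_ids it cites} (toy data).
    """
    # Hop 1: papers y cites; hop 2: keep those that cite x
    return sorted(p for p in cites.get(y, set()) if x in cites.get(p, set()))
```

For example, with `cites = {"Y": {"P1", "P2", "P3"}, "P1": {"X"}, "P2": {"Z"}, "P3": {"X", "Z"}}`, the call `citing_x_cited_by_y(cites, "X", "Y")` returns `["P1", "P3"]`.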

When to Avoid Graph RAG

From production experience: "Graph databases are useful when you need complex traversals, but most use cases only require 2-3 left joins in SQL rather than complex graph operations. From a skills perspective, it's easier to hire people who know SQL well than to find graph database experts."

Consider sticking to SQL/Vector DB if:

  • Your relationships are simple (e.g., Author -> Document, Document -> Category)
  • You only need 1-2 hops
  • You don't have a dedicated graph expert on the team
  • You can achieve the same result with metadata filtering or SQL joins
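As an illustration of the simple case, the Author -> Document -> Category relationships above need only two SQL joins. The sketch below uses Python's built-in sqlite3 with a hypothetical three-table schema and toy rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE documents (
    id INTEGER PRIMARY KEY,
    title TEXT,
    author_id INTEGER REFERENCES authors(id),
    category_id INTEGER REFERENCES categories(id)
);
INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO categories VALUES (1, 'legal'), (2, 'science');
INSERT INTO documents VALUES
    (1, 'Contract notes', 1, 1),
    (2, 'Compiler paper', 2, 2),
    (3, 'Statute summary', 2, 1);
""")

def documents_by_author_in_category(author, category):
    """Author -> Document -> Category, expressed as two joins."""
    rows = conn.execute(
        """
        SELECT d.title
        FROM documents d
        JOIN authors a ON a.id = d.author_id
        JOIN categories c ON c.id = d.category_id
        WHERE a.name = ? AND c.name = ?
        ORDER BY d.title
        """,
        (author, category),
    ).fetchall()
    return [title for (title,) in rows]
```

Here `documents_by_author_in_category("Grace", "legal")` returns `["Statute summary"]`, with no graph database involved.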

Common Questions

"Is Graph RAG production ready?"

It depends on your definition of production.

  • Technology: Neo4j and other graph databases are mature.
  • Complexity: The pipeline to extract entities and maintain the graph is fragile and expensive.
  • Recommendation: Only use it if vector search + metadata filtering fundamentally fails to answer your core questions.

"Can I use Postgres instead?"

Yes, and you probably should.

  • Use pgvector for similarity search
  • Use standard SQL JOINs for relationships
  • This covers most "graph" use cases without the operational overhead of a dedicated graph DB.
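A minimal sketch of combining the two in one Postgres query, assuming the pgvector extension is installed and hypothetical tables `documents(id, title, author_id, embedding vector)` and `authors(id, name)`. It only builds the SQL and parameters, in the `%s` placeholder style used by psycopg. `<=>` is pgvector's cosine-distance operator, so smaller values mean more similar.

```python
def build_pgvector_query(query_embedding, author_name, k=5):
    """Return (sql, params): similarity search restricted by a JOIN."""
    # pgvector accepts vectors as text literals like '[1.0,2.0,3.0]'
    vector_literal = "[" + ",".join(str(float(x)) for x in query_embedding) + "]"
    sql = """
        SELECT d.id, d.title, d.embedding <=> %s::vector AS distance
        FROM documents d
        JOIN authors a ON a.id = d.author_id
        WHERE a.name = %s
        ORDER BY distance
        LIMIT %s
    """
    return sql, (vector_literal, author_name, k)
```

With a psycopg connection, the result would be run as `cur.execute(sql, params)`. The relationship filter (`JOIN authors ... WHERE a.name = %s`) and the similarity ranking live in the same statement.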

Next Steps