Graph RAG Implementation Guide

Overview

Graph RAG combines traditional vector search with knowledge graph traversal to capture relationships and enable multi-hop reasoning.

Why Graph RAG?

Traditional RAG limitations:

No relationship awareness
Struggles with multi-hop questions such as "Who knows X and also knows Y?"
Misses implicit connections

Graph RAG advantages:

Captures entity relationships
Enables multi-hop queries
Better for complex domains (legal, scientific, organizational)
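
To make the multi-hop point concrete, here is a minimal in-memory sketch. Plain Python dicts stand in for a graph database, and the function names and sample "knows" data are hypothetical. It answers "Who knows X and also knows Y?" by intersecting two neighbour sets, then shows a breadth-first reach of up to 2 hops, the same shape as the `[*1..2]` traversals used later in this guide.

```python
from collections import defaultdict

def build_graph(edges):
    """Hypothetical undirected KNOWS graph stored as adjacency sets."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def who_knows_both(graph, x, y):
    """People directly connected to both x and y: two 1-hop lookups plus an intersection."""
    return (graph[x] & graph[y]) - {x, y}

def within_hops(graph, start, max_hops=2):
    """Everyone reachable from start in 1..max_hops steps (breadth-first, level by level)."""
    seen = {start}
    frontier = {start}
    for _ in range(max_hops):
        frontier = {n for node in frontier for n in graph[node]} - seen
        seen |= frontier
    return seen - {start}

graph = build_graph([("alice", "bob"), ("bob", "carol"), ("alice", "dave"), ("dave", "carol")])
print(sorted(who_knows_both(graph, "alice", "carol")))  # ['bob', 'dave']
print(sorted(within_hops(graph, "alice")))              # ['bob', 'carol', 'dave']
```

A similarity search over text chunks has no notion of these edges, so it can only answer such a question if a single chunk happens to state both facts.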

Basic Implementation

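import json

from neo4j import GraphDatabase
from openai import OpenAI

class GraphRAG:
    # extract_relationships, vector_search and format_graph_results are helpers
    # you supply; extract_entities is defined under Entity Extraction below.
    def __init__(self, neo4j_uri, neo4j_user, neo4j_password):
        self.graph = GraphDatabase.driver(neo4j_uri, auth=(neo4j_user, neo4j_password))
        self.client = OpenAI()  # reads OPENAI_API_KEY from the environment
    
    def embed(self, text):
        # Embedding model name is an assumption; any embedding model works
        response = self.client.embeddings.create(model="text-embedding-3-small", input=text)
        return response.data[0].embedding
    
    def add_document(self, text, metadata):
        # Extract entities and relationships
        # (metadata is not stored in this minimal version)
        entities = extract_entities(text)
        relationships = self.extract_relationships(text)  # [{"source", "target", "rel_type"}, ...]
        
        # Store in graph. MERGE on name only, so the same entity extracted
        # with a different type does not create a duplicate node.
        with self.graph.session() as session:
            for entity in entities:
                session.run(
                    "MERGE (e:Entity {name: $name}) SET e.type = $type",
                    name=entity['name'], type=entity['type']
                )
            
            for rel in relationships:
                # The extracted relationship type is stored as a property,
                # which keeps the Cypher statement static.
                session.run(
                    """
                    MATCH (a:Entity {name: $source})
                    MATCH (b:Entity {name: $target})
                    MERGE (a)-[r:RELATES_TO {type: $rel_type}]->(b)
                    """,
                    source=rel['source'], target=rel['target'], rel_type=rel['rel_type']
                )
    
    def query(self, question):
        # 1. Vector search for relevant entities (returns entity names)
        query_embedding = self.embed(question)
        relevant_entities = self.vector_search(query_embedding)
        
        # 2. Graph traversal: paths of 1-2 hops from the relevant entities
        with self.graph.session() as session:
            result = session.run(
                """
                MATCH (e:Entity)
                WHERE e.name IN $entities
                MATCH path = (e)-[*1..2]-(related)
                RETURN path
                """,
                entities=relevant_entities
            )
            # Read the result before the session closes
            graph_context = self.format_graph_results(result)
        
        # 3. Generate answer with graph context
        response = self.client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": "Answer using this knowledge-graph context:\n" + graph_context},
                {"role": "user", "content": question},
            ],
        )
        return response.choices[0].message.content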
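
The class leaves `vector_search` and `format_graph_results` to be implemented. One possible shape is sketched below under stated assumptions: entity embeddings are held in a hypothetical in-memory dict `{name: vector}` (a production system would use a vector index instead), and each traversal record has a `path` whose relationships expose `start_node`, `end_node` and a `type` property, as Neo4j driver `Path` objects do. Both are written as standalone functions; to use them as methods, add `self`.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def vector_search(query_embedding, entity_embeddings, k=5):
    """Return the names of the k entities most similar to the query (brute force)."""
    ranked = sorted(
        entity_embeddings.items(),
        key=lambda item: cosine(query_embedding, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]

def format_graph_results(records):
    """Turn path records into de-duplicated 'A -[type]-> B' lines for the LLM prompt."""
    lines = []
    seen = set()
    for record in records:
        for rel in record["path"].relationships:
            line = f'{rel.start_node.get("name")} -[{rel.get("type")}]-> {rel.end_node.get("name")}'
            if line not in seen:  # overlapping paths repeat the same edge
                seen.add(line)
                lines.append(line)
    return "\n".join(lines)
```

Inside `query`, the first step would then read something like `relevant_entities = vector_search(query_embedding, self.entity_embeddings)`, where `self.entity_embeddings` is a hypothetical dict filled in `add_document`. Brute-force ranking is fine for a prototype but scales linearly with the number of entities.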

Entity Extraction

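import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_entities(text):
    """Use an LLM to extract entities as a list of {"name", "type"} dicts."""
    prompt = f"""
    Extract entities from this text. Return only JSON, with no extra prose:
    {{"entities": [{{"name": "...", "type": "person|organization|location|concept"}}]}}
    
    Text: {text}
    """
    
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # more repeatable extraction
    )
    
    # Raises json.JSONDecodeError if the model wraps the JSON in prose
    return json.loads(response.choices[0].message.content)['entities']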
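
The class also calls an `extract_relationships` step that this guide does not show. It would follow the same prompt-and-parse pattern, and the fragile part is the parsing: LLM output can be malformed or can name entities that were never extracted, in which case the `MATCH ... MERGE` in `add_document` matches nothing and silently writes no edge. The sketch below validates a raw reply; the `parse_relationships` name and the `{"relationships": [{"source", "target", "rel_type"}]}` reply shape are assumptions.

```python
import json

def parse_relationships(raw_response, entity_names):
    """Parse an LLM's JSON reply into relationships whose endpoints are known entities.

    Assumes a hypothetical reply shape:
    {"relationships": [{"source": "...", "target": "...", "rel_type": "..."}]}
    """
    try:
        data = json.loads(raw_response)
    except json.JSONDecodeError:
        return []  # malformed reply: skip it rather than crash the ingestion run
    if not isinstance(data, dict):
        return []

    known = set(entity_names)
    valid = []
    for rel in data.get("relationships", []):
        if not isinstance(rel, dict):
            continue
        source, target, rel_type = rel.get("source"), rel.get("target"), rel.get("rel_type")
        # Drop incomplete entries and ones pointing at entities that were never stored
        if (isinstance(source, str) and isinstance(target, str)
                and source in known and target in known and rel_type):
            valid.append({"source": source, "target": target, "rel_type": rel_type})
    return valid
```

Logging the dropped entries, rather than discarding them silently as this sketch does, makes it easier to see how often extraction drifts.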

Hybrid Search

Combine vector and graph:

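def hybrid_graph_search(query, vector_db, driver, k=5):
    # vector_db: any store with search(query, k) returning hits that carry an
    # .id matching the id property on graph nodes; driver: a neo4j Driver
    
    # Vector search
    vector_results = vector_db.search(query, k=k)
    
    # Graph expansion
    expanded_results = []
    for result in vector_results:
        # Find nodes within 2 hops; add a label and an index on id
        # in practice so this lookup is not a full scan
        records, _, _ = driver.execute_query(
            "MATCH (e {id: $id})-[*1..2]-(related) RETURN DISTINCT related",
            id=result.id
        )
        expanded_results.extend(record["related"] for record in records)
    
    # Keep the original hits alongside their graph neighbours
    return list(vector_results) + expanded_results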
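
The expansion step can be tried without any database. A minimal sketch, assuming a hypothetical in-memory setup: `vector_hits` is an already-ranked list of IDs from the vector store and `graph` maps each ID to its neighbours. Seeds keep their rank, neighbours are appended in breadth-first order up to `max_hops`, and each ID appears only once even when two seeds share neighbours.

```python
from collections import deque

def hybrid_expand(vector_hits, graph, max_hops=2, limit=None):
    """Vector hits first (in rank order), then graph neighbours up to max_hops, de-duplicated."""
    results = list(dict.fromkeys(vector_hits))  # keep rank order, drop repeated hits
    seen = set(results)
    # Multi-source BFS: all seeds start at depth 0, so nearer nodes are added first
    queue = deque((hit, 0) for hit in results)
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                results.append(neighbour)
                queue.append((neighbour, depth + 1))
    return results[:limit] if limit is not None else results

graph = {"doc1": ["e1"], "e1": ["e2", "doc1"], "e2": ["e3"], "doc2": ["e1"]}
print(hybrid_expand(["doc1", "doc2"], graph))  # ['doc1', 'doc2', 'e1', 'e2']
```

Putting vector hits first and capping the total with `limit` keeps the most directly relevant context at the top of the prompt when the graph fans out quickly.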

Use Cases

Research: "Find papers citing X that are also cited by Y"
Legal: "Find cases involving Company A that reference Statute B"
Corporate: "Who worked with Person X on Project Y?"

When to Avoid Graph RAG

From production experience: "Graph databases are useful when you need complex traversals, but most use cases only require 2-3 left joins in SQL rather than complex graph operations. From a skills perspective, it's easier to hire people who know SQL well than to find graph database experts."

Consider sticking to SQL/Vector DB if:

Your relationships are simple (e.g., Author -> Document, Document -> Category)
You only need 1-2 hops
You don't have a dedicated graph expert on the team
You can achieve the same result with metadata filtering or SQL joins

Common Questions

"Is Graph RAG production ready?"

It depends on your definition of production.

Technology: Neo4j and other graph databases are mature.
Complexity: The pipeline to extract entities and maintain the graph is fragile and expensive.
Recommendation: Only use it if vector search + metadata filtering fundamentally fails to answer your core questions.

"Can I use Postgres instead?"

Yes, and you probably should.

Use pgvector for similarity search
Use standard SQL JOINs for relationships
This covers 95% of "graph" use cases without the operational overhead of a dedicated graph DB.
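
To show what "standard SQL JOINs" means for one of the use cases above ("Who worked with Person X on Project Y?"), here is a self-contained sketch. It uses Python's built-in `sqlite3` as a stand-in for Postgres so it runs anywhere; the `people`/`assignments` schema and the sample rows are hypothetical. The SQL itself is standard and should carry over to Postgres with only the driver's parameter placeholder changed (for example `%s` in psycopg).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE assignments (person_id INTEGER REFERENCES people(id), project TEXT NOT NULL);
    INSERT INTO people (id, name) VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO assignments (person_id, project) VALUES
        (1, 'Apollo'), (2, 'Apollo'), (3, 'Apollo'), (3, 'Zephyr'), (2, 'Zephyr');
""")

def coworkers(conn, person, project):
    """People who worked on `project` alongside `person`: a self-join on assignments."""
    rows = conn.execute(
        """
        SELECT DISTINCT other.name
        FROM people AS me
        JOIN assignments AS mine   ON mine.person_id = me.id
        JOIN assignments AS theirs ON theirs.project = mine.project
                                  AND theirs.person_id <> mine.person_id
        JOIN people AS other       ON other.id = theirs.person_id
        WHERE me.name = ? AND mine.project = ?
        ORDER BY other.name
        """,
        (person, project),
    ).fetchall()
    return [name for (name,) in rows]

print(coworkers(conn, "Alice", "Apollo"))  # ['Bob', 'Carol']
print(coworkers(conn, "Carol", "Zephyr"))  # ['Bob']
```

That is a 2-hop question (person to project to person) answered with three joins and no graph engine, which is the trade-off the quote above describes.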

Next Steps

Retrieval Fundamentals - Traditional RAG
Multi-modal RAG - Images and graphs