Graph RAG
Combine knowledge graphs with vector search for enhanced retrieval and reasoning.
Overview
Graph RAG combines traditional vector search with knowledge graph traversal to capture relationships and enable multi-hop reasoning.
Why Graph RAG?
Traditional RAG limitations:
- No relationship awareness
- Can't answer "Who knows X who also knows Y?"
- Misses implicit connections
Graph RAG advantages:
- Captures entity relationships
- Enables multi-hop queries
- Better for complex domains (legal, scientific, organizational)
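To make the multi-hop idea concrete, here is a minimal in-memory sketch of the "Who knows X who also knows Y?" question. The KNOWS adjacency map and the names in it are made up for illustration; a real system would run the equivalent traversal inside the graph database.

```python
# Hypothetical toy graph: person -> set of people they know (undirected)
KNOWS = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol"},
}

def knows_both(graph, x, y):
    """Return everyone connected to both x and y (a two-hop question)."""
    shared = graph.get(x, set()) & graph.get(y, set())
    return sorted(shared - {x, y})

print(knows_both(KNOWS, "alice", "dave"))  # ['bob', 'carol']
```

Embedding similarity alone does not expose this intersection, since it scores each chunk independently of the others.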
Basic Implementation
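from neo4j import GraphDatabase
from openai import OpenAI

class GraphRAG:
    def __init__(self, neo4j_uri, neo4j_user, neo4j_password):
        self.graph = GraphDatabase.driver(neo4j_uri, auth=(neo4j_user, neo4j_password))
        self.client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def embed(self, text):
        # Example model name; use the same model that built your entity index
        response = self.client.embeddings.create(model="text-embedding-3-small", input=text)
        return response.data[0].embedding

    def add_document(self, text, metadata):
        # Extract entities and relationships with module-level helpers:
        # extract_entities is shown under Entity Extraction; extract_relationships
        # is not shown and is expected to return dicts with keys from, to, rel_type
        entities = extract_entities(text)
        relationships = extract_relationships(text)
        # Store in graph
        with self.graph.session() as session:
            for entity in entities:
                session.run(
                    "MERGE (e:Entity {name: $name, type: $type})",
                    name=entity['name'], type=entity['type']
                )
            for rel in relationships:
                session.run(
                    """
                    MATCH (a:Entity {name: $from})
                    MATCH (b:Entity {name: $to})
                    MERGE (a)-[r:RELATES_TO {type: $rel_type}]->(b)
                    """,
                    **rel
                )

    def query(self, question):
        # 1. Vector search for relevant entities
        # (vector_search is not shown; it should return a list of entity names)
        query_embedding = self.embed(question)
        relevant_entities = self.vector_search(query_embedding)
        # 2. Graph traversal from relevant entities
        with self.graph.session() as session:
            result = session.run(
                """
                MATCH (e:Entity)
                WHERE e.name IN $entities
                MATCH path = (e)-[*1..2]-(related)
                RETURN path
                """,
                entities=relevant_entities
            )
            # Read the result while the session is open (format_graph_results is not shown)
            graph_context = self.format_graph_results(result)
        # 3. Generate answer with graph context
        response = self.client.chat.completions.create(
            model="gpt-4",
            messages=[{
                "role": "user",
                "content": f"Context:\n{graph_context}\n\nQuestion: {question}",
            }]
        )
        return response.choices[0].message.content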
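The class above calls two helpers it does not define, vector_search and format_graph_results. The sketch below shows one possible shape for each, under explicit assumptions: entity embeddings live in a plain in-memory dict (a real deployment would use a vector index), and graph results have already been reduced to (source, relation, target) triples. The names top_k_entities and format_triples are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_entities(query_embedding, entity_embeddings, k=3):
    """Return the names of the k entities most similar to the query."""
    ranked = sorted(
        entity_embeddings,
        key=lambda name: cosine(query_embedding, entity_embeddings[name]),
        reverse=True,
    )
    return ranked[:k]

def format_triples(triples):
    """Render de-duplicated (source, relation, target) triples as prompt lines."""
    seen = []
    for triple in triples:
        if triple not in seen:
            seen.append(triple)
    return "\n".join(f"{s} -[{r}]-> {t}" for s, r, t in seen)

# Toy data, invented for the example
embeddings = {"Acme": [1.0, 0.0], "Bob": [0.0, 1.0], "Carol": [0.7, 0.7]}
print(top_k_entities([1.0, 0.1], embeddings, k=2))  # ['Acme', 'Carol']
print(format_triples([("Bob", "WORKS_AT", "Acme"), ("Carol", "KNOWS", "Bob")]))
```

Rendering paths as short text triples keeps the prompt compact, which matters because a two-hop expansion can return far more paths than the model needs.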
Entity Extraction
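import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_entities(text):
    """Use an LLM to extract entities."""
    prompt = f"""
    Extract entities from this text. Return only JSON in this shape:
    {{"entities": [{{"name": "...", "type": "person|organization|location|concept"}}]}}
    Text: {text}
    """
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}]
    )
    # json.loads raises ValueError if the model returns anything other than JSON
    return json.loads(response.choices[0].message.content)['entities']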
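LLM output is not guaranteed to be clean JSON: models sometimes wrap it in a Markdown code fence or return an unexpected type. A defensive parser, sketched below, is one way to keep bad extractions out of the graph. The function name parse_entities and the exact validation rules are assumptions, not part of the pipeline described here.

```python
import json

ALLOWED_TYPES = {"person", "organization", "location", "concept"}
FENCE = "`" * 3  # Markdown code-fence marker

def parse_entities(raw):
    """Parse a model reply into a list of valid entity dicts."""
    text = raw.strip()
    # Strip a surrounding Markdown code fence if the model added one
    if text.startswith(FENCE):
        text = text.strip("`").strip()
        if text.startswith("json"):
            text = text[len("json"):]
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        return []
    items = payload.get("entities") if isinstance(payload, dict) else None
    if not isinstance(items, list):
        return []
    entities = []
    for item in items:
        if not isinstance(item, dict):
            continue
        name, etype = item.get("name"), item.get("type")
        # Keep only named entities whose type is in the allowed set
        if isinstance(name, str) and name.strip() and etype in ALLOWED_TYPES:
            entities.append({"name": name.strip(), "type": etype})
    return entities
```

Returning an empty list on malformed output lets the ingestion loop skip a document instead of crashing; whether to log or retry in that case is a design choice left open here.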
Hybrid Search
Combine vector and graph:
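def hybrid_graph_search(query, vector_db, driver, k=5):
    # Vector search: vector_db is any store whose search() returns hits with an .id
    vector_results = vector_db.search(query, k=k)
    # Graph expansion
    expanded_results = []
    with driver.session() as session:
        for result in vector_results:
            # Find entities within two hops; assumes graph nodes carry the
            # same id as the matching vector store record
            connected = session.run(
                "MATCH (e:Entity {id: $id})-[*1..2]-(related) RETURN DISTINCT related",
                id=result.id
            )
            expanded_results.extend(record["related"] for record in connected)
    return expanded_results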
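Graph expansion can return many neighbours, and not all of them deserve the same weight as a direct vector hit. One simple option, sketched here, is to discount neighbours by hop distance and keep the best score per item. The function name rank_hybrid, the decay factor, and the input shapes are assumptions for illustration.

```python
def rank_hybrid(vector_hits, graph_neighbors, decay=0.5, k=5):
    """Merge vector hits with graph neighbours, discounting by hop count.

    vector_hits: list of (item_id, similarity) pairs
    graph_neighbors: list of (seed_id, neighbor_id, hops) tuples
    """
    seed_scores = dict(vector_hits)
    scores = dict(vector_hits)
    for seed_id, neighbor_id, hops in graph_neighbors:
        if seed_id not in seed_scores:
            continue  # ignore neighbours of items that were not retrieved
        inherited = seed_scores[seed_id] * decay ** hops
        # Keep the highest score seen for each item
        scores[neighbor_id] = max(scores.get(neighbor_id, 0.0), inherited)
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

hits = [("doc1", 0.9), ("doc2", 0.6)]
neighbors = [("doc1", "doc3", 1), ("doc1", "doc4", 2), ("doc2", "doc3", 1)]
print([item for item, _ in rank_hybrid(hits, neighbors)])
# ['doc1', 'doc2', 'doc3', 'doc4']
```

With decay=0.5, a one-hop neighbour inherits half of its seed's score and a two-hop neighbour a quarter, so distant nodes only surface when their seed was a strong match.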
Use Cases
- Research: "Find papers citing X that are also cited by Y"
- Legal: "Find cases involving Company A that reference Statute B"
- Corporate: "Who worked with Person X on Project Y?"
When to Avoid Graph RAG
From Production: "Graph databases are useful when you need complex traversals, but most use cases only require 2-3 left joins in SQL rather than complex graph operations. From a skills perspective, it's easier to hire people who know SQL well than to find graph database experts."
Consider sticking to SQL/Vector DB if:
- Your relationships are simple (e.g., Author -> Document, Document -> Category)
- You only need 1-2 hops
- You don't have a dedicated graph expert on the team
- You can achieve the same result with metadata filtering or SQL joins
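As a concrete illustration of the point about joins, the question "Who worked with Person X on Project Y?" needs only a self-join on an assignment table. The schema, table name, and sample rows below are invented for the example; SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE assignments (person TEXT, project TEXT);
    INSERT INTO assignments VALUES
        ('alice', 'apollo'), ('bob', 'apollo'),
        ('carol', 'apollo'), ('bob', 'zephyr'), ('dave', 'zephyr');
""")

def coworkers(conn, person, project):
    """People assigned to the same project as `person` (one self-join)."""
    rows = conn.execute(
        """
        SELECT other.person
        FROM assignments AS me
        JOIN assignments AS other
          ON other.project = me.project AND other.person <> me.person
        WHERE me.person = ? AND me.project = ?
        ORDER BY other.person
        """,
        (person, project),
    )
    return [row[0] for row in rows]

print(coworkers(conn, "alice", "apollo"))  # ['bob', 'carol']
```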
Common Questions
"Is Graph RAG production ready?"
It depends on your definition of production.
- Technology: Neo4j and others are mature.
- Complexity: The pipeline to extract entities and maintain the graph is fragile and expensive.
- Recommendation: Only use it if vector search + metadata filtering fundamentally fails to answer your core questions.
"Can I use Postgres instead?"
Yes, and you probably should.
- Use pgvector for similarity search
- Use standard SQL JOINs for relationships
- This covers 95% of "graph" use cases without the operational overhead of a dedicated graph DB.
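As a rough sketch of the Postgres approach, the query below ranks documents by pgvector's cosine-distance operator (<=>) and joins in the author in the same statement. The table and column names (documents, authors, embedding, author_id) are hypothetical, and the function assumes a DB-API connection that uses %s placeholders, such as one from psycopg.

```python
# Hypothetical schema: documents(id, title, author_id, embedding vector), authors(id, name)
SIMILAR_DOCS_SQL = """
    SELECT d.id, d.title, a.name AS author
    FROM documents AS d
    JOIN authors AS a ON a.id = d.author_id
    ORDER BY d.embedding <=> %s::vector
    LIMIT %s
"""

def to_vector_literal(embedding):
    """Format a list of numbers as pgvector text input, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"

def similar_docs(conn, query_embedding, k=5):
    """Return the k nearest documents together with their authors."""
    cur = conn.cursor()
    cur.execute(SIMILAR_DOCS_SQL, (to_vector_literal(query_embedding), k))
    return cur.fetchall()
```

Passing the embedding as a text literal and casting it with ::vector avoids needing a driver-specific type adapter, at the cost of a little string formatting.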
Next Steps
- Retrieval Fundamentals - Traditional RAG
- Multi-modal RAG - Images and graphs