MMR: Maximal Marginal Relevance
Diversify search results to reduce redundancy and improve coverage
Overview
Maximal Marginal Relevance (MMR) is a technique that balances relevance and diversity in search results. Instead of returning the top K most similar documents (which may be redundant), MMR selects documents that are both relevant to the query AND different from already selected documents.
The Problem: Redundant Results
Standard vector search often returns similar documents:
# Query: "Python programming"
# Top 5 results (redundant):
1. "Python is a programming language"
2. "Python programming language overview"
3. "Introduction to Python programming"
4. "Python: A programming language guide"
5. "Getting started with Python programming"
All results say essentially the same thing, wasting the user's time.
The Solution: MMR
MMR selects diverse results:
# Query: "Python programming"
# MMR results (diverse):
1. "Python is a programming language"
2. "Python data science libraries: NumPy, Pandas"
3. "Python web frameworks: Django, Flask"
4. "Python performance optimization techniques"
5. "Python vs JavaScript comparison"
Each result provides unique information.
How MMR Works
Algorithm
For each selection:
- Relevance: How similar to the query?
- Diversity: How different from already selected documents?
- Balance: Combine both with lambda parameter
Formula
MMR = argmax_{D ∈ R\S} [ λ · sim(D, Q) − (1 − λ) · max_{Di ∈ S} sim(D, Di) ]
Where:
- D = a candidate document
- Q = the query
- R = the set of retrieved candidate documents
- S = the set of already selected documents
- Di = a document already selected (Di ∈ S)
- λ = tradeoff parameter in [0, 1]: 1.0 scores pure relevance, 0.0 pure diversity
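The score translates almost line for line into code. A minimal sketch, assuming embeddings are plain NumPy vectors and sim is cosine similarity (the helper names here are illustrative):
import numpy as np

def mmr_score(candidate_emb, query_emb, selected_embs, lam=0.5):
    """Relevance to the query minus worst-case redundancy with S"""
    def sim(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    relevance = sim(candidate_emb, query_emb)
    redundancy = max((sim(candidate_emb, s) for s in selected_embs), default=0.0)
    return lam * relevance - (1 - lam) * redundancy
At each step, the candidate with the highest mmr_score moves from R\S into S.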
Implementation
Basic MMR
import numpy as np
from sentence_transformers import SentenceTransformer, util
class MMRRetriever:
    def __init__(self, model_name='all-mpnet-base-v2'):
        self.model = SentenceTransformer(model_name)

    def mmr_search(self, query, documents, k=5, lambda_param=0.5):
        """
        Retrieve documents using MMR

        Args:
            query: Search query
            documents: List of candidate documents
            k: Number of documents to return
            lambda_param: Balance between relevance (1.0) and diversity (0.0)
        """
        # Encode query and documents
        query_emb = self.model.encode(query, convert_to_tensor=True)
        doc_embs = self.model.encode(documents, convert_to_tensor=True)

        # Calculate query-document similarities
        query_sims = util.cos_sim(query_emb, doc_embs)[0]

        selected_indices = []
        remaining_indices = list(range(len(documents)))

        # Select first document (most similar to query)
        first_idx = query_sims.argmax().item()
        selected_indices.append(first_idx)
        remaining_indices.remove(first_idx)

        # Select remaining documents
        while len(selected_indices) < k and remaining_indices:
            mmr_scores = []
            for idx in remaining_indices:
                # Relevance to query
                relevance = query_sims[idx].item()

                # Max similarity to already selected documents
                selected_embs = doc_embs[selected_indices]
                doc_sims = util.cos_sim(doc_embs[idx], selected_embs)[0]
                max_sim = doc_sims.max().item()

                # MMR score
                mmr_score = lambda_param * relevance - (1 - lambda_param) * max_sim
                mmr_scores.append((idx, mmr_score))

            # Select document with highest MMR score
            best_idx = max(mmr_scores, key=lambda x: x[1])[0]
            selected_indices.append(best_idx)
            remaining_indices.remove(best_idx)

        return [documents[i] for i in selected_indices]
# Usage
retriever = MMRRetriever()
documents = [
"Python is a programming language",
"Python programming language overview",
"Python data science with NumPy",
"Python web development with Django",
"JavaScript is a programming language",
]
# Plain top-k search would surface the first three near-duplicate documents
standard_results = documents[:3]
# MMR search (diverse)
mmr_results = retriever.mmr_search(
query="Python programming",
documents=documents,
k=3,
lambda_param=0.5
)
print("MMR Results:")
for doc in mmr_results:
print(f"- {doc}")
Optimized MMR with Vector Database
import lancedb
import numpy as np
import torch
from sentence_transformers import SentenceTransformer, util

class OptimizedMMRRetriever:
    def __init__(self, db_path="./vector-db"):
        self.model = SentenceTransformer('all-mpnet-base-v2')
        self.db = lancedb.connect(db_path)

    def mmr_search(self, query, k=5, fetch_k=20, lambda_param=0.5):
        """
        MMR with initial retrieval from vector DB

        Args:
            query: Search query
            k: Final number of results
            fetch_k: Initial retrieval count (should be > k)
            lambda_param: Relevance vs diversity balance
        """
        # Step 1: Retrieve more candidates than needed
        query_emb = self.model.encode(query)
        table = self.db.open_table("documents")
        candidates = table.search(query_emb) \
            .limit(fetch_k) \
            .to_list()

        if not candidates:
            return []

        # Step 2: Apply MMR to candidates
        candidate_embs = np.array([c['vector'] for c in candidates])
        query_emb_tensor = torch.tensor(query_emb)
        candidate_embs_tensor = torch.tensor(candidate_embs)

        # Calculate similarities
        query_sims = util.cos_sim(query_emb_tensor, candidate_embs_tensor)[0]

        selected_indices = []
        remaining_indices = list(range(len(candidates)))

        # Select first (most relevant)
        first_idx = query_sims.argmax().item()
        selected_indices.append(first_idx)
        remaining_indices.remove(first_idx)

        # Select remaining with MMR
        while len(selected_indices) < k and remaining_indices:
            mmr_scores = []
            for idx in remaining_indices:
                relevance = query_sims[idx].item()

                # Max similarity to selected
                selected_embs = candidate_embs_tensor[selected_indices]
                doc_sims = util.cos_sim(
                    candidate_embs_tensor[idx].unsqueeze(0),
                    selected_embs
                )[0]
                max_sim = doc_sims.max().item()

                mmr_score = lambda_param * relevance - (1 - lambda_param) * max_sim
                mmr_scores.append((idx, mmr_score))

            best_idx = max(mmr_scores, key=lambda x: x[1])[0]
            selected_indices.append(best_idx)
            remaining_indices.remove(best_idx)

        return [candidates[i] for i in selected_indices]
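A minimal usage sketch, assuming the "documents" table does not exist yet; its schema (a text column plus a vector column) is what the retriever above reads back:
# One-time setup: populate the table the retriever expects
model = SentenceTransformer('all-mpnet-base-v2')
docs = [
    "Python is a programming language",
    "Python data science with NumPy",
    "Python web development with Django",
]
db = lancedb.connect("./vector-db")
db.create_table("documents", data=[
    {"text": d, "vector": model.encode(d).tolist()} for d in docs
])

# Search with MMR re-ranking
retriever = OptimizedMMRRetriever(db_path="./vector-db")
for r in retriever.mmr_search("Python programming", k=2, fetch_k=3):
    print(r['text'])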
Lambda Parameter Tuning
The lambda parameter controls the relevance-diversity tradeoff:
λ = 1.0 (Pure Relevance)
# Same as standard vector search
results = retriever.mmr_search(query, documents, lambda_param=1.0)
# Returns most similar documents (may be redundant)
Use when:
- Precision is critical
- User query is very specific
- Redundancy is acceptable
λ = 0.5 (Balanced)
# Balance relevance and diversity
results = retriever.mmr_search(query, documents, lambda_param=0.5)
# Returns mix of relevant and diverse documents
Use when:
- General search
- Exploring a topic
- Default setting
λ = 0.0 (Pure Diversity)
# Maximum diversity (may sacrifice relevance)
results = retriever.mmr_search(query, documents, lambda_param=0.0)
# Returns most diverse documents
Use when:
- Brainstorming
- Topic exploration
- Avoiding filter bubbles
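There is no universally right value. A quick way to see the tradeoff on your own corpus is to sweep λ and measure result diversity; this sketch reuses the retriever and documents from the basic example and the calculate_diversity helper defined under Evaluation below:
for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
    results = retriever.mmr_search(
        "Python programming", documents, k=3, lambda_param=lam
    )
    print(f"lambda={lam}: diversity={calculate_diversity(results):.3f}")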
Adaptive Lambda
Adjust lambda based on query:
def adaptive_lambda(query, results_count):
    """Adjust lambda based on context"""
    # Specific query: higher relevance
    if len(query.split()) > 10:
        return 0.7
    # Few results: prioritize relevance
    if results_count < 10:
        return 0.8
    # Many results: increase diversity
    if results_count > 100:
        return 0.3
    # Default: balanced
    return 0.5
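Plugging it in is a one-liner, again reusing the retriever and documents from the basic example:
query = "Python programming"
lam = adaptive_lambda(query, len(documents))
results = retriever.mmr_search(query, documents, k=5, lambda_param=lam)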
Use Cases
1. Question Answering
Provide diverse perspectives:
# Query: "What is climate change?"
# MMR returns:
# - Scientific definition
# - Causes and effects
# - Mitigation strategies
# - Economic impacts
# - Political perspectives
2. Document Summarization
Select representative sentences:
def extractive_summary(document, num_sentences=5):
    """Create an extractive summary using MMR"""
    # Naive sentence split; drop empty fragments
    sentences = [s.strip() for s in document.split('.') if s.strip()]

    # Use the document itself as the "query" to find representative sentences
    summary_sentences = retriever.mmr_search(
        query=document,
        documents=sentences,
        k=num_sentences,
        lambda_param=0.3  # Favor diversity
    )
    return '. '.join(summary_sentences) + '.'
3. Recommendation Systems
Diverse product recommendations:
def recommend_products(user_query, products, k=10):
    """Recommend diverse products"""
    descriptions = [p['description'] for p in products]
    picked = retriever.mmr_search(
        query=user_query,
        documents=descriptions,
        k=k,
        lambda_param=0.4  # Slightly favor diversity
    )
    # Map the selected descriptions back to the product records
    by_description = {p['description']: p for p in products}
    return [by_description[d] for d in picked]
4. Research Paper Discovery
Find papers covering different aspects:
# Query: "machine learning"
# MMR returns papers on:
# - Supervised learning
# - Unsupervised learning
# - Reinforcement learning
# - Neural networks
# - Applications in healthcare
Performance Optimization
Batch Processing
Process multiple queries efficiently:
def batch_mmr_search(queries, documents, k=5):
    """Process multiple queries against one corpus, encoding each text once"""
    # Reuse the encoder from the MMRRetriever defined above
    query_embs = retriever.model.encode(queries, convert_to_tensor=True)
    doc_embs = retriever.model.encode(documents, convert_to_tensor=True)

    results = []
    for query_emb in query_embs:
        # mmr_algorithm stands for the selection loop from
        # MMRRetriever.mmr_search, factored out to take precomputed embeddings
        results.append(mmr_algorithm(query_emb, doc_embs, k))
    return results
Caching
Cache document embeddings:
class CachedMMRRetriever:
    def __init__(self):
        self.model = SentenceTransformer('all-mpnet-base-v2')
        self.doc_emb_cache = {}

    def get_doc_embedding(self, doc):
        """Get cached embedding or compute it"""
        if doc not in self.doc_emb_cache:
            self.doc_emb_cache[doc] = self.model.encode(doc)
        return self.doc_emb_cache[doc]
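The cache pays off when the same corpus is searched repeatedly. A small usage sketch (note the dict is unbounded, so a long-running service may prefer an LRU policy):
cached = CachedMMRRetriever()
# First pass encodes every document; later passes are cache hits
doc_embs = np.stack([cached.get_doc_embedding(d) for d in documents])
# doc_embs can now stand in for model.encode(documents) in the MMR loop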
Approximate MMR
Trade accuracy for speed:
def fast_mmr(query, documents, k=5, sample_size=50):
    """Faster MMR: run the selection loop only over the top candidates"""
    # Rank all documents by plain relevance first
    query_emb = retriever.model.encode(query, convert_to_tensor=True)
    doc_embs = retriever.model.encode(documents, convert_to_tensor=True)
    sims = util.cos_sim(query_emb, doc_embs)[0]
    top_indices = sims.argsort(descending=True)[:sample_size]

    # Apply MMR only to the top candidates
    top_docs = [documents[int(i)] for i in top_indices]
    return retriever.mmr_search(query, top_docs, k=k)
Evaluation
Diversity Metrics
Measure result diversity:
def calculate_diversity(results):
    """Calculate average pairwise dissimilarity (1 - cosine similarity)"""
    # Reuse the encoder from the MMRRetriever defined above
    embs = retriever.model.encode(results)
    total_dissimilarity = 0
    count = 0
    for i in range(len(embs)):
        for j in range(i + 1, len(embs)):
            sim = util.cos_sim(embs[i], embs[j]).item()
            total_dissimilarity += 1 - sim
            count += 1
    return total_dissimilarity / count if count > 0 else 0
# Higher is more diverse
diversity_score = calculate_diversity(mmr_results)
print(f"Diversity: {diversity_score:.3f}")
Relevance-Diversity Tradeoff
def evaluate_mmr(query, results, ground_truth):
    """Evaluate both relevance and diversity"""
    # Relevance: fraction of ground-truth documents that were retrieved
    relevant_count = len(set(results) & set(ground_truth))
    relevance = relevant_count / len(ground_truth)

    # Diversity: average pairwise dissimilarity of the results
    diversity = calculate_diversity(results)

    # Combined score: harmonic mean of the two (F1-style), guarding against 0/0
    if relevance + diversity == 0:
        f1_score = 0.0
    else:
        f1_score = 2 * (relevance * diversity) / (relevance + diversity)

    return {
        'relevance': relevance,
        'diversity': diversity,
        'f1': f1_score
    }
Common Issues
1. Over-Diversification
Problem: Results too diverse, losing relevance
Solution: Increase lambda (e.g., 0.7-0.8)
2. Still Redundant
Problem: Results still similar despite MMR
Solution:
- Decrease lambda (e.g., 0.2-0.3)
- Increase fetch_k to get more candidates
- Improve chunking to create more distinct documents
3. Slow Performance
Problem: MMR is computationally expensive
Solution:
- Reduce fetch_k
- Use approximate MMR
- Cache embeddings
- Pre-compute document-document similarities (see the sketch below)
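That last point can look like this: compute the candidate-candidate similarity matrix once, so the selection loop only does array lookups. A sketch over precomputed similarities (the array names are illustrative):
def mmr_precomputed(query_sims, doc_doc_sims, k=5, lambda_param=0.5):
    """MMR selection driven entirely by precomputed similarity arrays"""
    selected = [int(np.argmax(query_sims))]
    remaining = [i for i in range(len(query_sims)) if i != selected[0]]
    while len(selected) < k and remaining:
        best = max(
            remaining,
            key=lambda i: lambda_param * query_sims[i]
            - (1 - lambda_param) * doc_doc_sims[i, selected].max(),
        )
        selected.append(best)
        remaining.remove(best)
    return selected

# query_sims: 1-D array of query-to-candidate similarities
# doc_doc_sims: pairwise candidate matrix, e.g. util.cos_sim(embs, embs).numpy()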
Best Practices
- Start with λ=0.5 and adjust based on user feedback
- Set fetch_k = 2-4× k for a good candidate pool
- Monitor both relevance and diversity metrics
- A/B test MMR vs. standard retrieval
- Cache embeddings for frequently accessed documents
Next Steps
- Retrieval Fundamentals - Core vector search concepts
- Hybrid Search - Combine semantic and keyword search
- Query Expansion - Improve recall
- Parent Document Retrieval - Better context