Vector Storage Fundamentals
Understanding vector databases, indexing strategies, and storage optimization for RAG systems.
Overview
Vector databases store embeddings and enable fast similarity search. Understanding storage fundamentals is critical for scaling RAG systems.
Vector Database Basics
What Gets Stored
{
  "id": "doc_123",
  "vector": [0.1, 0.2, ..., 0.768],   # 768-dim embedding
  "metadata": {
    "text": "Original chunk text",
    "source": "document.pdf",
    "page": 5,
    "created_at": "2024-01-01"
  }
}
Similarity Metrics
Cosine Similarity (most common):
import numpy as np
def cosine_similarity(a, b):
    # Angular similarity in [-1, 1]; ignores vector magnitude
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
Euclidean Distance:
def euclidean_distance(a, b):
    # Straight-line (L2) distance; lower means more similar
    return np.linalg.norm(a - b)
Dot Product (for normalized vectors):
def dot_product(a, b):
    # Equals cosine similarity when both vectors are unit length
    return np.dot(a, b)
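For unit-normalized vectors these metrics agree: the dot product equals cosine similarity, and squared Euclidean distance reduces to 2 - 2·cosine. A quick check using the functions above, with random illustrative vectors:
a = np.random.rand(768)
b = np.random.rand(768)
a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)  # normalize to unit length

# Dot product == cosine for unit vectors
assert np.isclose(dot_product(a, b), cosine_similarity(a, b))
# ||a - b||^2 == 2 - 2 * cos(a, b) for unit vectors
assert np.isclose(euclidean_distance(a, b) ** 2, 2 - 2 * cosine_similarity(a, b))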
Indexing Strategies
Flat Index (Exact Search)
- Compares query to every vector
- 100% accurate
- O(n) complexity
- Use when: <100k vectors
HNSW (Hierarchical Navigable Small World)
- Graph-based approximate search
- Fast queries (~10ms)
- High recall (>95%)
- Use when: >100k vectors, need speed
IVF (Inverted File Index)
- Clusters vectors, searches nearest clusters
- Memory efficient
- Good for large datasets
- Use when: >1M vectors, limited memory
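All three strategies can be tried side by side. A minimal sketch assuming the faiss library (dimensions and tuning parameters are illustrative):
import numpy as np
import faiss  # pip install faiss-cpu

d = 768
vectors = np.random.rand(100_000, d).astype("float32")

# Flat: exact brute-force search
flat = faiss.IndexFlatL2(d)
flat.add(vectors)

# HNSW: graph-based approximate search (32 = links per node)
hnsw = faiss.IndexHNSWFlat(d, 32)
hnsw.add(vectors)

# IVF: cluster into 1,024 cells, then search only the nearest cells
quantizer = faiss.IndexFlatL2(d)
ivf = faiss.IndexIVFFlat(quantizer, d, 1024)
ivf.train(vectors)   # IVF must learn cluster centroids before adding
ivf.add(vectors)
ivf.nprobe = 16      # cells probed per query: the recall/speed knob

query = np.random.rand(1, d).astype("float32")
distances, ids = ivf.search(query, 5)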
Storage Optimization
Quantization
Reduce vector precision to save space:
# Original: 768 float32 × 4 bytes = 3,072 bytes (~3 KB) per vector
# Quantized: 768 uint8 × 1 byte = 768 bytes per vector (4x smaller)
def quantize_vector(vector, bits=8):
    # Scale values to the integer range [0, 2^bits - 1]
    # (uint8 storage assumes bits <= 8 and max_val > min_val)
    min_val, max_val = vector.min(), vector.max()
    scaled = (vector - min_val) / (max_val - min_val)
    quantized = (scaled * (2**bits - 1)).astype(np.uint8)
    # min_val/max_val must be stored alongside the codes for dequantization
    return quantized, min_val, max_val

def dequantize_vector(quantized, min_val, max_val, bits=8):
    scaled = quantized.astype(np.float32) / (2**bits - 1)
    return scaled * (max_val - min_val) + min_val
Savings: 75% storage reduction with minimal accuracy loss
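A round trip over a single illustrative vector shows how small the reconstruction error is:
import numpy as np

vector = np.random.rand(768).astype(np.float32)
codes, lo, hi = quantize_vector(vector)
restored = dequantize_vector(codes, lo, hi)

# Truncation error is bounded by one quantization step: (hi - lo) / 255
print(np.abs(vector - restored).max())  # e.g. ~0.004 for values in [0, 1]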
Dimensionality Reduction
from sklearn.decomposition import PCA

# Reduce 768 dims to 256 dims: ~67% storage reduction
pca = PCA(n_components=256)
reduced_vectors = pca.fit_transform(original_vectors)

# Queries must pass through the same fitted transform
# (query_vector: a single 768-dim embedding)
reduced_query = pca.transform(query_vector.reshape(1, -1))
Partitioning Strategies
By Metadata
# Store different document types in separate collections
collections = {
    'legal': vector_db.create_collection('legal_docs'),
    'technical': vector_db.create_collection('technical_docs'),
    'marketing': vector_db.create_collection('marketing_docs')
}

# Query only the relevant collection
results = collections['legal'].search(query_vector)
By Time
# Partition by month for time-series data
collection_name = f"docs_{year}_{month}"
vector_db.create_collection(collection_name)
# Query recent data first
recent_results = vector_db.search('docs_2024_12', query)
Backup & Recovery
import json
def backup_collection(collection_name, output_file):
    """Export vectors and metadata to JSON.

    Fine for small collections; at scale, prefer your database's
    native snapshot feature. Assumes vectors are plain lists,
    not numpy arrays (which are not JSON-serializable).
    """
    vectors = vector_db.get_all(collection_name)
    with open(output_file, 'w') as f:
        json.dump(vectors, f)

def restore_collection(collection_name, input_file):
    """Restore a collection from a JSON backup."""
    with open(input_file, 'r') as f:
        vectors = json.load(f)
    vector_db.upsert(collection_name, vectors)
Performance Tuning
Batch Operations
# Bad: insert one at a time -> 1,000 network calls for 1,000 vectors
for vector in vectors:
    db.insert(vector)

# Good: batch insert -> 10 network calls at batch_size=100
db.insert_batch(vectors, batch_size=100)
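As a concrete version of the same idea with Qdrant (the "docs" collection and payload fields are illustrative and assumed to already exist):
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

client = QdrantClient(url="http://localhost:6333")

points = [
    PointStruct(id=i, vector=vec.tolist(), payload={"source": "document.pdf"})
    for i, vec in enumerate(vectors)
]

# Send fixed-size chunks to bound request size and memory
for start in range(0, len(points), 100):
    client.upsert(collection_name="docs", points=points[start:start + 100])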
Connection Pooling
from qdrant_client import QdrantClient
# Create one long-lived client and reuse it across requests
# instead of opening a new connection per call
client = QdrantClient(
    url="localhost:6333",
    timeout=60,
    prefer_grpc=True  # gRPC is typically faster than REST/HTTP
)
Monitoring
Track these metrics:
# Illustrative stats API: substitute your database's own metrics calls
metrics = {
    'total_vectors': db.count(),
    'storage_size_gb': db.get_storage_size() / 1e9,
    'avg_query_latency_ms': db.get_avg_latency(),
    'p95_query_latency_ms': db.get_p95_latency(),
    'index_size_gb': db.get_index_size() / 1e9
}
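With Qdrant, for example, point counts and index status come back from get_collection (reusing the client created above); latency is usually tracked client-side or scraped from the server's Prometheus /metrics endpoint:
info = client.get_collection(collection_name="docs")
print(info.points_count)  # total stored points
print(info.status)        # "green" once indexing has caught up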
Cost Optimization
Illustrative monthly costs for a managed vector database (actual pricing varies by provider and tier):
| Vectors | Storage ($/mo) | Queries ($/mo) | Total ($/mo) |
|---|---|---|---|
| 100K | $5 | $10 | $15 |
| 1M | $50 | $50 | $100 |
| 10M | $500 | $200 | $700 |
Optimization tips:
- Use quantization (75% savings)
- Partition by metadata (query less data)
- Use cheaper storage tiers for cold data
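Raw storage is easy to estimate before committing to a provider. A back-of-the-envelope helper (arithmetic only, ignoring index and metadata overhead):
def storage_gb(num_vectors, dims=768, bytes_per_dim=4):
    # Raw vector bytes divided into gigabytes
    return num_vectors * dims * bytes_per_dim / 1e9

print(storage_gb(10_000_000))                   # 30.72 GB at float32
print(storage_gb(10_000_000, bytes_per_dim=1))  # 7.68 GB after 8-bit quantization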
Next Steps
- Retrieval Fundamentals - Query vector databases effectively
- Production Deployment - Scale to production