Built a RAG system for our documentation - search was great with 1,000 docs. Then I added 10,000 more and queries started timing out. Chroma was getting slow, searches took 30+ seconds, and memory usage was through the roof.
Looked at alternatives. Milvus looked promising - distributed, scalable, battle-tested at companies like Spotify. But the learning curve was steep and setup was complex.
Eventually I figured out when to use each: Chroma for 90% of projects, Milvus when you actually need the scale. Here's what I learned from painful experience.
What vector databases actually do
Vector databases store embeddings - numerical representations of text/images that capture semantic meaning. When you search, it finds similar content by meaning, not just keywords.
For RAG (Retrieval Augmented Generation), this is essential. You embed your documents, store them in the vector DB, then retrieve relevant chunks when answering questions.
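Here's a minimal sketch of what "search by meaning" looks like, using sentence-transformers and the all-MiniLM-L6-v2 model that shows up later in this post (the example documents and scores are made up for illustration):
from sentence_transformers import SentenceTransformer
import numpy as np
# Embed two documents and a query, then compare them by cosine similarity
model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings
docs = [
    "How to install Python packages with pip",
    "Recipe for sourdough bread",
]
query = "adding a dependency to my Python project"
doc_vecs = model.encode(docs)
query_vec = model.encode(query)
# The pip doc scores higher even though it shares no keywords with the query
scores = doc_vecs @ query_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
)
print(scores)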
ChromaDB: The simple choice
Installation is straightforward:
# Install Chroma
pip install chromadb
# Or with Docker (recommended for production)
docker run -p 8000:8000 chromadb/chroma
Basic usage:
import chromadb
# Connect to the Chroma server started above
# (use chromadb.Client() for a purely in-process instance)
client = chromadb.HttpClient(host="localhost", port=8000)
# Create collection
collection = client.create_collection("my_docs")
# Add documents
collection.add(
documents=["This is a document about Python programming"],
metadatas=[{"source": "tutorial", "topic": "python"}],
ids=["doc1"]
)
# Query
results = collection.query(
query_texts=["How do I learn Python?"],
n_results=3
)
Chroma stores everything locally by default. No external services needed. Great for getting started quickly.
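For a purely local setup with no server at all, recent versions also ship a persistent client. A minimal sketch, assuming Chroma 0.4+ and a ./chroma_data directory:
import chromadb
# Local on-disk mode: no server process, and the data survives restarts
client = chromadb.PersistentClient(path="./chroma_data")
collection = client.get_or_create_collection("my_docs")
collection.add(
    documents=["Persistent local storage example"],
    metadatas=[{"source": "notes"}],
    ids=["doc2"]
)
print(collection.count())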
Milvus: The scalable choice
Installation requires more setup:
# Install Milvus with Docker Compose
git clone https://github.com/milvus-io/milvus.git
cd milvus
docker-compose up -d
# Or use Helm for Kubernetes
helm install my-milvus milvus/milvus --set cluster.enabled=true
Basic usage:
from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType
# Connect to Milvus
connections.connect(host="localhost", port="19530")
# Define collection schema
schema = CollectionSchema([
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=384)
])
# Create collection
collection = Collection("my_docs", schema)
# Insert data (column-based: one list per non-auto_id field)
collection.insert([[[0.1, 0.2, ...]]])  # just the embedding column; ids are auto-generated
# Build an index on the vector field and load the collection before searching
collection.create_index(
    field_name="embedding",
    index_params={"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 128}}
)
collection.load()
# Search
results = collection.search(
data=[[0.3, 0.1, ...]], # query vector
anns_field="embedding",
param={"metric_type": "L2", "params": {"nprobe": 10}},
limit=10
)
Milvus separates storage and compute. You'll need etcd, MinIO, and Milvus itself. More complex but scales horizontally.
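Before wiring up an application, it's worth confirming the whole stack is actually reachable. A quick sanity check, assuming pymilvus 2.x against the standalone deployment above:
from pymilvus import connections, utility
# Connect to the standalone instance and confirm the cluster responds
connections.connect(alias="default", host="localhost", port="19530")
print(utility.get_server_version())   # e.g. "v2.3.x"
print(utility.list_collections())     # empty list on a fresh install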
Real GitHub issues and solutions
Issue #1: "ChromaDB memory leaks with large datasets"
Problem: ChromaDB process kept growing in memory until OOM killer killed it. With 100k+ documents, memory usage was 8GB+ and climbing.
What I tried: Increased container memory limit, added swap space, reduced batch sizes. Delayed the inevitable.
Actual fix: ChromaDB keeps its index in memory by default. You have to explicitly opt into disk-backed persistence:
import chromadb
# Persistent client: the index is backed by disk instead of living purely in memory
# (available since v0.4.0)
client = chromadb.PersistentClient(path="./chroma_data")
# Create collection with a lower construction_ef to keep build-time memory down
collection = client.get_or_create_collection(
    name="my_docs",
    metadata={"hnsw:space": "cosine", "hnsw:construction_ef": 32}
)
# Add documents in batches; the persistent client flushes to disk as it goes
# (documents is your full list of chunk texts)
batch_size = 1000
for start in range(0, len(documents), batch_size):
    batch = documents[start:start + batch_size]
    collection.add(
        documents=batch,
        ids=[f"doc_{start + i}" for i in range(len(batch))]
    )
Source: ChromaDB issue #1127 - disk persistence option was added in v0.4.0 but not enabled by default
Issue #2: "Milvus connection timeouts under load"
Problem: Milvus would drop connections randomly under concurrent load. "Connection reset by peer" errors were common.
What I tried: Increased timeout values, added retry logic, scaled the proxy. Still unreliable.
Actual fix: The default connection pool settings were too aggressive:
# milvus_config.py
from pymilvus import connections
# Configure connection pool
connections.connect(
host="localhost",
port="19530",
pool_size=5, # Reduce from default 10
wait_timeout=30, # Increase timeout
max_retry_per_request=3, # Add retries
)
# For search operations
search_params = {
"metric_type": "L2",
"params": {"nprobe": 16},
"guarantee_timestamp": 10000, # Ensure freshness
"retry_on_timeout": True
}
Source: Milvus issue #8921 - connection pool settings are crucial for production
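Even with better pool settings, I'd still wrap searches in client-side retries. A hedged sketch - the attempt count, backoff, and reconnect call are my own choices, not anything pymilvus requires:
import time
from pymilvus import connections

MAX_ATTEMPTS = 3  # hypothetical constant, tune for your workload

def search_with_retry(collection, query_vectors, search_params, limit=10):
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            return collection.search(
                data=query_vectors,
                anns_field="embedding",
                param=search_params,
                limit=limit
            )
        except Exception:
            if attempt == MAX_ATTEMPTS:
                raise
            time.sleep(2 ** attempt)  # simple exponential backoff
            connections.connect(host="localhost", port="19530")  # re-establish the channel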
Issue #3: "Slow queries with millions of vectors"
Problem: Both Chroma and Milvus got slow with 1M+ vectors. Queries that took 100ms with 10k vectors were now 10+ seconds.
What I tried: Increased nprobe (Milvus), switched to brute force search (Chroma). Neither helped much.
Actual fix: Index tuning and shard partitioning:
# ChromaDB: Configure HNSW index
collection = client.get_or_create_collection(
    name="my_docs",
    metadata={
        "hnsw:space": "cosine",
        "hnsw:construction_ef": 64,  # Increase for better recall, slower index builds
        "hnsw:M": 32                 # Increase for better accuracy, more memory
    }
)
# Milvus: HNSW index plus partitioning
collection.create_index(
    field_name="embedding",
    index_name="search_index",
    index_params={
        "index_type": "HNSW",
        "metric_type": "COSINE",
        "params": {"M": 32, "efConstruction": 256}
    }
)
collection.load()
# Partition by date so a search can target a slice of the data (Milvus 2.3+)
for name in ["2024_01", "2024_02", "2024_03"]:
    collection.create_partition(name)
# Search only the partitions you need
results = collection.search(
    data=[query_vector],  # your query embedding
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"ef": 64}},
    limit=10,
    partition_names=["2024_03"]
)
Source: Both repos suggest index tuning is more effective than throwing hardware at the problem
Issue #4: "Embedding format incompatibility"
Problem: Generated embeddings from different sources (OpenAI, Cohere, local) couldn't be stored together. Dimension mismatches everywhere.
What I tried: Re-embedding everything with the same model. Wasted days of computation.
Actual fix: Store embedding dimension in metadata and handle multiple models:
import chromadb
from sentence_transformers import SentenceTransformer
# Initialize different embedding models
model_384 = SentenceTransformer('all-MiniLM-L6-v2')   # 384 dimensions
model_768 = SentenceTransformer('all-mpnet-base-v2')  # 768 dimensions
# Note: a Chroma collection is locked to a single embedding dimension, so in
# practice this means one collection per model, with the model recorded in metadata
def add_with_metadata(collection, text, doc_id, model_name):
    if model_name == "miniLM":
        embedding = model_384.encode(text).tolist()
        dim = 384
    else:
        embedding = model_768.encode(text).tolist()
        dim = 768
    collection.add(
        documents=[text],
        embeddings=[embedding],
        ids=[doc_id],
        metadatas=[{
            "model": model_name,
            "dimension": dim,
            "source": "mixed"
        }]
    )
# Query with the matching model
def query_with_model(collection, query_text, model_name):
    if model_name == "miniLM":
        query_embedding = model_384.encode(query_text).tolist()
    else:
        query_embedding = model_768.encode(query_text).tolist()
    # Filter by model so results come from the same embedding space
    results = collection.query(
        query_embeddings=[query_embedding],
        where={"model": model_name},
        n_results=5
    )
    return results
Source: This isn't in GitHub issues but is a common problem I encountered repeatedly
Comparison: When to use what
| Factor | ChromaDB | Milvus |
|---|---|---|
| Setup complexity | pip install, 5 min | Docker compose, 30+ min |
| Learning curve | Easy | Steep |
| Scaling | Vertical (single node) | Horizontal (cluster) |
| Max vectors (realistic) | ~1-5 million | 100M+ (claimed) |
| Query speed (10k vectors) | ~50-100ms | ~20-50ms (cluster) |
| Query speed (1M vectors) | 5-10s | 100-500ms (cluster) |
| Memory footprint | ~2-4GB for 100k vectors | ~8-16GB for 100k vectors (with cluster overhead) |
| Cost | Free (local) | Free (self-hosted) or $500+/mo (managed) |
| Best for | Projects & MVPs | Production at scale |
Production deployment patterns
ChromaDB with persistent storage:
# docker-compose.yml for Chroma
version: '3.8'
services:
chromadb:
image: chromadb/chroma:latest
ports:
- "8000:8000"
volumes:
- ./chroma_data:/chroma/chroma
environment:
- ALLOW_RESET=TRUE
- ANONYMIZED_TELEMETRY=FALSE
restart: unless-stopped
deploy:
resources:
limits:
memory: 8G
reservations:
memory: 4G
volumes:
chroma_data:
Milvus with Docker Compose (minimal):
# docker-compose.yml for Milvus
version: '3.8'
services:
etcd:
image: quay.io/coreos/etcd:v3.5.5
environment:
- ETCD_AUTO_COMPACTION_MODE=revision
volumes:
- ./etcd:/etcd
command: etcd -advertise-client-urls=http://127.0.0.1:2379
-listen-client-urls=http://0.0.0.0:2379
-initial-advertise-peer-urls=http://127.0.0.1:2380
-listen-peer-urls=http://0.0.0.0:2380
minio:
image: quay.io/minio/minio:RELEASE.2023-03-20T02-21-38Z
environment:
MINIO_ACCESS_KEY: minioadmin
MINIO_SECRET_KEY: minioadmin
volumes:
- ./minio:/minio_data
command: server /minio_data --console-address ":9001"
milvus-standalone:
image: milvusdb/milvus:v2.3.0
command: ["milvus", "run", "standalone"]
environment:
ETCD_ENDPOINTS: etcd:2379
MINIO_ADDRESS: minio:9000
volumes:
- ./milvus:/var/lib/milvus
ports:
- "19530:19530"
- "9091:9091"
depends_on:
- "etcd"
- "minio"
deploy:
resources:
limits:
memory: 16G
reservations:
memory: 8G
Backup and recovery
ChromaDB backup:
import json
import chromadb
from datetime import datetime
client = chromadb.HttpClient(host="localhost", port=8000)
# Export all collections
snapshot_id = f"backup_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
# Get all collections
collections = client.list_collections()
for collection in collections:
# Get all data (embeddings aren't returned unless you ask for them)
data = collection.get(include=["documents", "metadatas", "embeddings"])
# Save to file
with open(f"{snapshot_id}_{collection.name}.json", 'w') as f:
json.dump(data, f)
# Save metadata
with open(f"{snapshot_id}_{collection.name}_metadata.json", 'w') as f:
json.dump(collection.metadata, f)
# Backup complete
print(f"Backed up {len(collections)} collections")
Milvus backup:
# Milvus provides backup tools
# Backup collection data
milvus_cli export collection -c my_docs -s backup_collection.json
# Backup entire database
milvus_cli export database -s backup_milvus.sql
# Restore
milvus_cli import collection -c my_docs -s backup_collection.json
Monitoring and metrics
ChromaDB doesn't have built-in metrics, but you can track usage:
import chromadb
import time
class MonitoredChromaDB:
def __init__(self):
self.client = chromadb.HttpClient()
self.query_count = 0
self.query_times = []
def query(self, collection_name, query_text, n_results=10):
start_time = time.time()
collection = self.client.get_collection(collection_name)
results = collection.query(
query_texts=[query_text],
n_results=n_results
)
elapsed = time.time() - start_time
# Track metrics
self.query_count += 1
self.query_times.append(elapsed)
# Log slow queries
if elapsed > 1.0:
print(f"Slow query detected: {elapsed:.2f}s")
return results
def get_stats(self):
return {
"total_queries": self.query_count,
"avg_query_time": sum(self.query_times) / len(self.query_times) if self.query_times else 0,
"slow_queries": sum(1 for t in self.query_times if t > 1.0)
}
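Usage is a drop-in wrapper around the normal query path:
db = MonitoredChromaDB()
db.query("my_docs", "How do I tune HNSW parameters?")
print(db.get_stats())  # total_queries, avg_query_time, slow_queries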
Migration strategies
Moving from Chroma to Milvus:
# migration_chroma_to_milvus.py
import chromadb
from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType
# Source: Chroma
chroma_client = chromadb.HttpClient()
chroma_collection = chroma_client.get_collection("my_docs")
# Get all data
data = chroma_collection.get()
# Destination: Milvus
connections.connect("default", host="localhost", port="19530")
# Create schema
schema = CollectionSchema([
    FieldSchema("id", DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema("text", DataType.VARCHAR, max_length=65535),
    FieldSchema("embedding", DataType.FLOAT_VECTOR, dim=384)
])
collection = Collection("my_docs", schema)
collection.create_index(
    field_name="embedding",
    index_params={"index_type": "HNSW", "metric_type": "COSINE", "params": {"M": 16, "efConstruction": 200}}
)
# Migrate data
for i, doc in enumerate(data['documents']):
    # Re-embed (or reuse stored embeddings if the export included them)
    embedding = get_embedding(doc)  # Your embedding function
    collection.insert([{
        "text": doc,
        "embedding": embedding
    }])
    if i % 1000 == 0:
        print(f"Migrated {i} documents...")
Performance tuning tips
ChromaDB optimization:
- Use DuckDB for metadata filtering: Faster than Chroma's built-in filters
- Batch inserts: Add documents in batches of 100-1000 (see the sketch after this list)
- Pre-compute embeddings: Don't embed during query if possible
- Use persistent storage: Prevents OOM and enables restarts
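A minimal batching sketch for the batch-insert tip - the batch size of 500 and the pre-computed embeddings list are assumptions, adjust for your data:
BATCH_SIZE = 500

def add_in_batches(collection, documents, embeddings, ids):
    # Insert in fixed-size chunks instead of one giant add() call
    for start in range(0, len(documents), BATCH_SIZE):
        end = start + BATCH_SIZE
        collection.add(
            documents=documents[start:end],
            embeddings=embeddings[start:end],  # pre-computed, so queries never wait on embedding
            ids=ids[start:end]
        )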
Milvus optimization:
- Enable SSD cache: Dramatically improves query performance
- Use partitioning: Split collections by date/category for faster searches (see the partition-key sketch after this list)
- Tune M parameter: Higher M = better recall but slower
- Enable query cache: Milvus can cache frequent queries
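For the partitioning tip, Milvus 2.3+ can also route rows automatically with a partition key. A sketch assuming pymilvus 2.3+ and a hypothetical "category" field:
from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType

connections.connect(host="localhost", port="19530")
# Rows are routed to partitions by "category"; searches that filter on it only scan matching partitions
schema = CollectionSchema([
    FieldSchema("id", DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema("category", DataType.VARCHAR, max_length=64, is_partition_key=True),
    FieldSchema("embedding", DataType.FLOAT_VECTOR, dim=384)
])
collection = Collection("docs_partitioned", schema)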
Cost comparison
Self-hosted costs (monthly estimates):
| Scale | ChromaDB | Milvus (self-hosted) | Milvus (Zilliz Cloud) |
|---|---|---|---|
| Small (10k vectors) | $0 (2GB RAM) | $20 (16GB RAM + 2 CPUs) | $70 |
| Medium (100k vectors) | $0 (8GB RAM) | $100 (64GB RAM + 4 CPUs) | $200 |
| Large (1M vectors) | $100+ (64GB RAM) | $500+ (256GB RAM + 16 CPUs) | $500 |
Note: ChromaDB gets expensive at scale because you need a single large machine. Milvus distributes load.
Bottom line
ChromaDB is perfect for the vast majority of RAG projects. It's simple, fast to set up, and works great up to a few hundred thousand vectors. Most projects never outgrow it.
Milvus is for when you actually need to scale. If you're dealing with millions of vectors, need horizontal scaling, or are building a production service with SLAs, Milvus is worth the complexity.
My recommendation: Start with ChromaDB. Migrate to Milvus when you hit ~500k vectors or notice performance degradation. The migration path is well-defined and both systems speak the same language (essentially).
Links: trychroma.com | milvus.io | zilliz.com (managed Milvus)