Built a RAG system for our documentation - search was great with 1,000 docs. Then I added 10,000 more and queries started timing out. Chroma was getting slow, searches took 30+ seconds, and memory usage was through the roof.
Looked at alternatives. Milvus looked promising - distributed, scalable, battle-tested at companies like Spotify. But the learning curve was steep and setup was complex.
Eventually I figured out when to use each: Chroma for 90% of projects, Milvus when you actually need the scale. Here's what I learned from painful experience.
What vector databases actually do
Vector databases store embeddings - numerical representations of text/images that capture semantic meaning. When you search, it finds similar content by meaning, not just keywords.
For RAG (Retrieval Augmented Generation), this is essential. You embed your documents, store them in the vector DB, then retrieve relevant chunks when answering questions.
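Here's a minimal sketch of what "search by meaning" looks like, using sentence-transformers and the all-MiniLM-L6-v2 model that shows up later in this post (the example documents and scores are made up for illustration):
from sentence_transformers import SentenceTransformer
import numpy as np
# Embed two documents and a query, then compare them by cosine similarity
model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings
docs = [
    "How to install Python packages with pip",
    "Recipe for sourdough bread",
]
query = "adding a dependency to my Python project"
doc_vecs = model.encode(docs)
query_vec = model.encode(query)
# The pip doc scores higher even though it shares no keywords with the query
scores = doc_vecs @ query_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
)
print(scores)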
ChromaDB: The simple choice
Installation is straightforward:
# Install Chroma
pip install chromadb
# Or with Docker (recommended for production)
docker run -p 8000:8000 chromadb/chroma
Basic usage:
import chromadb
# Connect to the Chroma server started above
# (use chromadb.Client() for a purely in-process instance)
client = chromadb.HttpClient(host="localhost", port=8000)
# Create collection
collection = client.create_collection("my_docs")
# Add documents
collection.add(
documents=["This is a document about Python programming"],
metadatas=[{"source": "tutorial", "topic": "python"}],
ids=["doc1"]
)
# Query
results = collection.query(
query_texts=["How do I learn Python?"],
n_results=3
)
Chroma stores everything locally by default. No external services needed. Great for getting started quickly.
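For a purely local setup with no server at all, recent versions also ship a persistent client. A minimal sketch, assuming Chroma 0.4+ and a ./chroma_data directory:
import chromadb
# Local on-disk mode: no server process, and the data survives restarts
client = chromadb.PersistentClient(path="./chroma_data")
collection = client.get_or_create_collection("my_docs")
collection.add(
    documents=["Persistent local storage example"],
    metadatas=[{"source": "notes"}],
    ids=["doc2"]
)
print(collection.count())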
Milvus: The scalable choice
Installation requires more setup:
# Install Milvus with Docker Compose
git clone https://github.com/milvus-io/milvus.git
cd milvus
docker-compose up -d
# Or use Helm for Kubernetes
helm install my-milvus milvus/milvus --set cluster.enabled=true
Basic usage:
from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType
# Connect to Milvus
connections.connect(host="localhost", port="19530")
# Define collection schema
schema = CollectionSchema([
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=384)
])
# Create collection
collection = Collection("my_docs", schema)
# Insert data (column-based: one list per non-auto_id field)
collection.insert([[[0.1, 0.2, ...]]])  # just the embedding column; ids are auto-generated
# Build an index on the vector field and load the collection before searching
collection.create_index(
    field_name="embedding",
    index_params={"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 128}}
)
collection.load()
# Search
results = collection.search(
data=[[0.3, 0.1, ...]], # query vector
anns_field="embedding",
param={"metric_type": "L2", "params": {"nprobe": 10}},
limit=10
)
Milvus separates storage and compute. You'll need etcd, MinIO, and Milvus itself. More complex but scales horizontally.
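Before wiring up an application, it's worth confirming the whole stack is actually reachable. A quick sanity check, assuming pymilvus 2.x against the standalone deployment above:
from pymilvus import connections, utility
# Connect to the standalone instance and confirm the cluster responds
connections.connect(alias="default", host="localhost", port="19530")
print(utility.get_server_version())   # e.g. "v2.3.x"
print(utility.list_collections())     # empty list on a fresh install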
Real GitHub issues and solutions
Issue #1: "ChromaDB memory leaks with large datasets"
Problem: ChromaDB process kept growing in memory until OOM killer killed it. With 100k+ documents, memory usage was 8GB+ and climbing.
What I tried: Increased container memory limit, added swap space, reduced batch sizes. Delayed the inevitable.
Actual fix: ChromaDB keeps its index in memory by default. You have to explicitly opt into disk-backed persistence:
import chromadb
# Persistent client: the index is backed by disk instead of living purely in memory
# (available since v0.4.0)
client = chromadb.PersistentClient(path="./chroma_data")
# Create collection with a lower construction_ef to keep build-time memory down
collection = client.get_or_create_collection(
    name="my_docs",
    metadata={"hnsw:space": "cosine", "hnsw:construction_ef": 32}
)
# Add documents in batches; the persistent client flushes to disk as it goes
# (documents is your full list of chunk texts)
batch_size = 1000
for start in range(0, len(documents), batch_size):
    batch = documents[start:start + batch_size]
    collection.add(
        documents=batch,
        ids=[f"doc_{start + i}" for i in range(len(batch))]
    )
Source: ChromaDB issue #1127 - disk persistence option was added in v0.4.0 but not enabled by default
Issue #2: "Milvus connection timeouts under load"
Problem: Milvus would drop connections randomly under concurrent load. "Connection reset by peer" errors were common.
What I tried: Increased timeout values, added retry logic, scaled the proxy. Still unreliable.
Actual fix: The default connection pool settings were too aggressive:
# milvus_config.py
from pymilvus import connections
# Configure connection pool
connections.connect(
host="localhost",
port="19530",
pool_size=5, # Reduce from default 10
wait_timeout=30, # Increase timeout
max_retry_per_request=3, # Add retries
)
# For search operations
search_params = {
"metric_type": "L2",
"params": {"nprobe": 16},
"guarantee_timestamp": 10000, # Ensure freshness
"retry_on_timeout": True
}
Source: Milvus issue #8921 - connection pool settings are crucial for production
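Even with better pool settings, I'd still wrap searches in client-side retries. A hedged sketch - the attempt count, backoff, and reconnect call are my own choices, not anything pymilvus requires:
import time
from pymilvus import connections

MAX_ATTEMPTS = 3  # hypothetical constant, tune for your workload

def search_with_retry(collection, query_vectors, search_params, limit=10):
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            return collection.search(
                data=query_vectors,
                anns_field="embedding",
                param=search_params,
                limit=limit
            )
        except Exception:
            if attempt == MAX_ATTEMPTS:
                raise
            time.sleep(2 ** attempt)  # simple exponential backoff
            connections.connect(host="localhost", port="19530")  # re-establish the channel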
Issue #3: "Slow queries with millions of vectors"
Problem: Both Chroma and Milvus got slow with 1M+ vectors. Queries that took 100ms with 10k vectors were now 10+ seconds.
What I tried: Increased nprobe (Milvus), switched to brute force search (Chroma). Neither helped much.
Actual fix: Index tuning and shard partitioning:
# ChromaDB: Configure HNSW index
collection = client.get_or_create_collection(
    name="my_docs",
    metadata={
        "hnsw:space": "cosine",
        "hnsw:construction_ef": 64,  # Increase for better recall, slower index builds
        "hnsw:M": 32                 # Increase for better accuracy, more memory
    }
)
# Milvus: HNSW index plus partitioning
collection.create_index(
    field_name="embedding",
    index_name="search_index",
    index_params={
        "index_type": "HNSW",
        "metric_type": "COSINE",
        "params": {"M": 32, "efConstruction": 256}
    }
)
collection.load()
# Partition by date so a search can target a slice of the data (Milvus 2.3+)
for name in ["2024_01", "2024_02", "2024_03"]:
    collection.create_partition(name)
# Search only the partitions you need
results = collection.search(
    data=[query_vector],  # your query embedding
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"ef": 64}},
    limit=10,
    partition_names=["2024_03"]
)
Source: Both repos suggest index tuning is more effective than throwing hardware at the problem
Issue #4: "Embedding format incompatibility"
Problem: Generated embeddings from different sources (OpenAI, Cohere, local) couldn't be stored together. Dimension mismatches everywhere.
What I tried: Re-embedding everything with the same model. Wasted days of computation.
Actual fix: Store embedding dimension in metadata and handle multiple models:
import chromadb
from sentence_transformers import SentenceTransformer
# Initialize different embedding models
model_384 = SentenceTransformer('all-MiniLM-L6-v2')   # 384 dimensions
model_768 = SentenceTransformer('all-mpnet-base-v2')  # 768 dimensions
# Note: a Chroma collection is locked to a single embedding dimension, so in
# practice this means one collection per model, with the model recorded in metadata
def add_with_metadata(collection, text, doc_id, model_name):
    if model_name == "miniLM":
        embedding = model_384.encode(text).tolist()
        dim = 384
    else:
        embedding = model_768.encode(text).tolist()
        dim = 768
    collection.add(
        documents=[text],
        embeddings=[embedding],
        ids=[doc_id],
        metadatas=[{
            "model": model_name,
            "dimension": dim,
            "source": "mixed"
        }]
    )
# Query with the matching model
def query_with_model(collection, query_text, model_name):
    if model_name == "miniLM":
        query_embedding = model_384.encode(query_text).tolist()
    else:
        query_embedding = model_768.encode(query_text).tolist()
    # Filter by model so results come from the same embedding space
    results = collection.query(
        query_embeddings=[query_embedding],
        where={"model": model_name},
        n_results=5
    )
    return results
Source: This isn't in GitHub issues but is a common problem I encountered repeatedly
Comparison: When to use what
| Factor | ChromaDB | Milvus |
|---|---|---|
| Setup complexity | pip install, 5 min | Docker compose, 30+ min |
| Learning curve | Easy | Steep |
| Scaling | Vertical (single node) | Horizontal (cluster) |
| Max vectors (realistic) | ~1-5 million | 100M+ (claimed) |
| Query speed (10k vectors) | ~50-100ms | ~20-50ms (cluster) |
| Query speed (1M vectors) | 5-10s | 100-500ms (cluster) |
| Memory footprint | ~2-4GB for 100k vectors | ~8-16GB for 100k vectors (with cluster overhead) |
| Cost | Free (local) | Free (self-hosted) or $500+/mo (managed) |
| Best for | Projects & MVPs | Production at scale |
Production deployment patterns
ChromaDB with persistent storage:
# docker-compose.yml for Chroma
version: '3.8'
services:
chromadb:
image: chromadb/chroma:latest
ports:
- "8000:8000"
volumes:
- ./chroma_data:/chroma/chroma
environment:
- ALLOW_RESET=TRUE
- ANONYMIZED_TELEMETRY=FALSE
restart: unless-stopped
deploy:
resources:
limits:
memory: 8G
reservations:
memory: 4G
volumes:
chroma_data:
Milvus with Docker Compose (minimal):
# docker-compose.yml for Milvus
version: '3.8'
services:
etcd:
image: quay.io/coreos/etcd:v3.5.5
environment:
- ETCD_AUTO_COMPACTION_MODE=revision
volumes:
- ./etcd:/etcd
command: etcd -advertise-client-urls=http://127.0.0.1:2379
-listen-client-urls=http://0.0.0.0:2379
-initial-advertise-peer-urls=http://127.0.0.1:2380
-listen-peer-urls=http://0.0.0.0:2380
minio:
image: quay.io/minio/minio:RELEASE.2023-03-20T02-21-38Z
environment:
MINIO_ACCESS_KEY: minioadmin
MINIO_SECRET_KEY: minioadmin
volumes:
- ./minio:/minio_data
command: server /minio_data --console-address ":9001"
milvus-standalone:
image: milvusdb/milvus:v2.3.0
command: ["milvus", "run", "standalone"]
environment:
ETCD_ENDPOINTS: etcd:2379
MINIO_ADDRESS: minio:9000
volumes:
- ./milvus:/var/lib/milvus
ports:
- "19530:19530"
- "9091:9091"
depends_on:
- "etcd"
- "minio"
deploy:
resources:
limits:
memory: 16G
reservations:
memory: 8G
Backup and recovery
ChromaDB backup:
import json
import chromadb
from datetime import datetime
client = chromadb.HttpClient(host="localhost", port=8000)
# Export all collections
snapshot_id = f"backup_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
# Get all collections
collections = client.list_collections()
for collection in collections:
# Get all data (embeddings aren't returned unless you ask for them)
data = collection.get(include=["documents", "metadatas", "embeddings"])
# Save to file
with open(f"{snapshot_id}_{collection.name}.json", 'w') as f:
json.dump(data, f)
# Save metadata
with open(f"{snapshot_id}_{collection.name}_metadata.json", 'w') as f:
json.dump(collection.metadata, f)
# Backup complete
print(f"Backed up {len(collections)} collections")
Milvus backup:
# Milvus provides backup tools
# Backup collection data
milvus_cli export collection -c my_docs -s backup_collection.json
# Backup entire database
milvus_cli export database -s backup_milvus.sql
# Restore
milvus_cli import collection -c my_docs -s backup_collection.json
Monitoring and metrics
ChromaDB doesn't have built-in metrics, but you can track usage:
import chromadb
import time
class MonitoredChromaDB:
def __init__(self):
self.client = chromadb.HttpClient()
self.query_count = 0
self.query_times = []
def query(self, collection_name, query_text, n_results=10):
start_time = time.time()
collection = self.client.get_collection(collection_name)
results = collection.query(
query_texts=[query_text],
n_results=n_results
)
elapsed = time.time() - start_time
# Track metrics
self.query_count += 1
self.query_times.append(elapsed)
# Log slow queries
if elapsed > 1.0:
print(f"Slow query detected: {elapsed:.2f}s")
return results
def get_stats(self):
return {
"total_queries": self.query_count,
"avg_query_time": sum(self.query_times) / len(self.query_times) if self.query_times else 0,
"slow_queries": sum(1 for t in self.query_times if t > 1.0)
}
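Usage is a drop-in wrapper around the normal query path:
db = MonitoredChromaDB()
db.query("my_docs", "How do I tune HNSW parameters?")
print(db.get_stats())  # total_queries, avg_query_time, slow_queries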
Migration strategies
Moving from Chroma to Milvus:
# migration_chroma_to_milvus.py
import chromadb
from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType
# Source: Chroma
chroma_client = chromadb.HttpClient()
chroma_collection = chroma_client.get_collection("my_docs")
# Get all data
data = chroma_collection.get()
# Destination: Milvus
connections.connect("default", host="localhost", port="19530")
# Create schema
schema = CollectionSchema([
    FieldSchema("id", DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema("text", DataType.VARCHAR, max_length=65535),
    FieldSchema("embedding", DataType.FLOAT_VECTOR, dim=384)
])
collection = Collection("my_docs", schema)
collection.create_index(
    field_name="embedding",
    index_params={"index_type": "HNSW", "metric_type": "COSINE", "params": {"M": 16, "efConstruction": 200}}
)
# Migrate data
for i, doc in enumerate(data['documents']):
    # Re-embed (or reuse stored embeddings if the export included them)
    embedding = get_embedding(doc)  # Your embedding function
    collection.insert([{
        "text": doc,
        "embedding": embedding
    }])
    if i % 1000 == 0:
        print(f"Migrated {i} documents...")
Performance tuning tips
ChromaDB optimization:
- Use DuckDB for metadata filtering: Faster than Chroma's built-in filters
- Batch inserts: Add documents in batches of 100-1000 (see the sketch after this list)
- Pre-compute embeddings: Don't embed during query if possible
- Use persistent storage: Prevents OOM and enables restarts
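A minimal batching sketch for the batch-insert tip - the batch size of 500 and the pre-computed embeddings list are assumptions, adjust for your data:
BATCH_SIZE = 500

def add_in_batches(collection, documents, embeddings, ids):
    # Insert in fixed-size chunks instead of one giant add() call
    for start in range(0, len(documents), BATCH_SIZE):
        end = start + BATCH_SIZE
        collection.add(
            documents=documents[start:end],
            embeddings=embeddings[start:end],  # pre-computed, so queries never wait on embedding
            ids=ids[start:end]
        )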
Milvus optimization:
- Enable SSD cache: Dramatically improves query performance
- Use partitioning: Split collections by date/category for faster searches (see the partition-key sketch after this list)
- Tune M parameter: Higher M = better recall but slower
- Enable query cache: Milvus can cache frequent queries
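For the partitioning tip, Milvus 2.3+ can also route rows automatically with a partition key. A sketch assuming pymilvus 2.3+ and a hypothetical "category" field:
from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType

connections.connect(host="localhost", port="19530")
# Rows are routed to partitions by "category"; searches that filter on it only scan matching partitions
schema = CollectionSchema([
    FieldSchema("id", DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema("category", DataType.VARCHAR, max_length=64, is_partition_key=True),
    FieldSchema("embedding", DataType.FLOAT_VECTOR, dim=384)
])
collection = Collection("docs_partitioned", schema)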
Cost comparison
Self-hosted costs (monthly estimates):
| Scale | ChromaDB | Milvus (self-hosted) | Milvus (Zilliz Cloud) |
|---|---|---|---|
| Small (10k vectors) | $0 (2GB RAM) | $20 (16GB RAM + 2 CPUs) | $70 |
| Medium (100k vectors) | $0 (8GB RAM) | $100 (64GB RAM + 4 CPUs) | $200 |
| Large (1M vectors) | $100+ (64GB RAM) | $500+ (256GB RAM + 16 CPUs) | $500 |
Note: ChromaDB gets expensive at scale because you need a single large machine. Milvus distributes load.
Bottom line
ChromaDB is perfect for the vast majority of RAG projects. It's simple, fast to set up, and works great up to a few hundred thousand vectors. Most projects never outgrow it.
Milvus is for when you actually need to scale. If you're dealing with millions of vectors, need horizontal scaling, or are building a production service with SLAs, Milvus is worth the complexity.
My recommendation: Start with ChromaDB. Migrate to Milvus when you hit ~500k vectors or notice performance degradation. The migration path is well-defined and both systems speak the same language (essentially).
Links: trychroma.com | milvus.io | zilliz.com (managed Milvus)