The problem I was trying to solve
I had documentation scattered everywhere - Google Docs, Notion, PDFs, random markdown files. Every time I needed to find something, I'd search through all of it manually. Annoying.
Wanted a chatbot that knew all this stuff. Ask it a question, it searches through everything, gives an answer with sources. Seemed simple enough.
Looked into building it myself. LangChain? Too complex. Custom RAG implementation? Days of work. Then found Embedchain. Promised to handle all the messy parts.
What I actually built
Upload your stuff → Embedchain handles chunking, embeddings, vector storage → Ask questions → Get answers with citations. Took like 2 hours.
Now I just ask "what did we decide about the API rate limits?" and it tells me, with links to the exact documents.
So what is Embedchain
Embedchain is a Python framework for building RAG (Retrieval Augmented Generation) apps. Basically, it lets you create AI chatbots that know about your specific data.
What it does:
- Ingests data: PDFs, docs, websites, YouTube, text files - whatever you have
- Chunks it: Breaks documents into pieces that LLMs can handle
- Embeds it: Converts to vectors for semantic search
- Stores it: Uses vector database (can be local or cloud)
- Retrieves: Finds relevant chunks when you ask questions
- Answers: Uses GPT/Claude to generate responses from retrieved context
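To make that pipeline concrete, here's a toy version of the retrieve step in plain Python - word overlap standing in for real embeddings, nothing embedchain-specific:

```python
import re

# Toy retrieval: chunk -> "embed" -> store -> retrieve.
def chunk_text(text, size=40):
    """Split text into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text):
    """Stand-in embedding: the set of lowercase words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, store, k=1):
    """Return the k chunks sharing the most words with the question."""
    q = embed(question)
    return sorted(store, key=lambda c: len(q & embed(c)), reverse=True)[:k]

docs = "The API rate limit is 100 requests per minute. Refunds are allowed within 30 days."
store = chunk_text(docs)
context = retrieve("what is the API rate limit?", store)
# A real RAG app would now hand `context` plus the question to an LLM.
```

Real embeddings replace word overlap with dense vectors and cosine similarity, but the retrieve-then-answer shape is the same.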
Why it's easier than rolling your own:
- One-line data ingestion
- Handles different file types automatically
- Built-in chunking strategies
- Works with any LLM (OpenAI, Claude, local models)
- Supports multiple vector databases
- Simple Python API
It's open source, so you can self-host everything. No sending your data to third-party services if you don't want to.
Setting it up
Install
Pretty straightforward:
pip install embedchain
Get API keys
You'll need at least one LLM API key:
# For OpenAI (GPT-4, GPT-3.5)
export OPENAI_API_KEY="sk-..."
# Or for Anthropic (Claude)
export ANTHROPIC_API_KEY="sk-ant-..."
For vector database, you can use:
- ChromaDB: Local, free, no setup needed (easiest to start)
- Pinecone: Cloud, better for scale (free tier available)
- Weaviate: Self-hosted option
I started with ChromaDB since it's local and free. Moved to Pinecone once I had more data.
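In the versions I used, switching stores came down to the `vectordb` section of a config dict. The provider names and keys below are from my setup and may differ in yours - check the embedchain docs for your release:

```python
# Illustrative config dicts -- provider names and keys may differ by version.
chroma_config = {
    "vectordb": {
        "provider": "chroma",
        "config": {"dir": "db", "collection_name": "my-docs"},
    }
}

pinecone_config = {
    "vectordb": {
        "provider": "pinecone",
        "config": {
            "index_name": "my-docs",
            "metric": "cosine",
            "vector_dimension": 1536,  # matches OpenAI ada-002 embeddings
        },
    }
}

# from embedchain import App
# app = App.from_config(config=pinecone_config)
```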
Building your first bot
Simple example
Here's the bare minimum to get started:
from embedchain import App
# Create app (uses ChromaDB locally by default)
app = App()
# Add some data
app.add("https://en.wikipedia.org/wiki/Artificial_intelligence")
app.add_local("path/to/your/document.pdf")
# Ask questions
result = app.query("What is artificial intelligence?")
print(result)
With configuration
For more control, configure the app and the chunker separately - chunking is set per add() call, not on the app itself:
from embedchain import App
from embedchain.config import AddConfig, AppConfig, ChunkerConfig

# Configure app behavior
config = AppConfig(
    log_level="INFO",
    collect_metrics=False,  # disable telemetry if you want
)

# Configure chunking (how docs are split)
chunker_config = ChunkerConfig(
    chunk_size=500,       # characters per chunk
    chunk_overlap=50,     # overlap between consecutive chunks
    length_function=len,  # how chunk size is measured
)

# Create app, then pass the chunker config when adding data
app = App(config=config)
app.add("path/to/docs", config=AddConfig(chunker=chunker_config))
response = app.query("What's in those docs?")
Different data sources
Embedchain handles various types:
# Web pages
app.add("https://example.com/page")
# Local files
app.add_local("docs/report.pdf")
app.add_local("notes.txt")
app.add_local("data/docs") # whole directory
# YouTube videos
app.add("https://youtube.com/watch?v=xxx")
# Direct text
app.add("Some text you want to add", data_type="text", metadata={"source": "manual"})
# Q&A pairs
app.add(("What's the refund policy?", "30 days, no questions asked"), data_type="qna_pair")
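For bulk ingestion of a mixed directory, I route files by extension. A sketch - the `data_type` strings here are assumptions, so check the loader names your embedchain version supports:

```python
import os

# Map file extensions to embedchain data types. The data_type strings
# ("pdf_file", "mdx", "text") are assumptions -- verify against your version.
EXT_TO_TYPE = {
    ".pdf": "pdf_file",
    ".md": "mdx",
    ".txt": "text",
}

def sources_in(directory):
    """Yield (path, data_type) pairs for files we know how to ingest."""
    for name in sorted(os.listdir(directory)):
        ext = os.path.splitext(name)[1].lower()
        if ext in EXT_TO_TYPE:
            yield os.path.join(directory, name), EXT_TO_TYPE[ext]

# for path, dtype in sources_in("company_docs"):
#     app.add(path, data_type=dtype)
```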
Querying with context
Get citations and see what was retrieved:
# With citations=True, query returns a tuple of answer and sources
answer, sources = app.query(
    "What's our refund policy?",
    citations=True,  # include where the answer came from
)
print(answer)
print(sources)  # retrieved chunks plus their metadata

# Filter retrieval by metadata
response = app.query(
    "Compare our pricing with competitors",
    where={"category": "pricing"},  # matches metadata set at add() time
)
Stuff I've built with it
Company knowledge bot
Fed it all our internal docs - policies, procedures, meeting notes. Now the team asks it questions instead of bugging each other.
# Added company wiki
app.add_local("company_docs/wiki")
app.add_local("company_docs/policies")
app.add_local("company_docs/meeting_notes")
# Now works like this
query = "What's our work from home policy?"
answer = app.query(query)
# Returns: "Employees can work from home up to 3 days per week..."
Research assistant
Upload research papers, ask questions about them. Way faster than reading everything.
# Load papers
import os

for paper in os.listdir("research_papers"):
    app.add_local(os.path.join("research_papers", paper))
# Now can ask
app.query("What are the main findings about transformer attention?")
app.query("Compare the approaches in these three papers")
Customer support bot
Built on product docs, FAQs, and past support tickets (retrieval, not actual training). Handles common questions automatically.
# Product documentation
app.add_local("docs/user_guide.pdf")
app.add_local("docs/api_reference.pdf")
app.add_local("support/faq.md")
app.add_local("support/common_issues.md")
# Deploy as an API (FastAPI example)
from fastapi import FastAPI

api = FastAPI()

@api.post("/chat")
async def chat(question: str):
    answer = app.query(question)
    return {"response": answer}
Personal notes search
All my notes, bookmarks, articles - now searchable by meaning, not just keywords.
# My digital brain
app.add_local("notes/obsidian_vault")
app.add_local("bookmarks/bookmarks.html")
app.add("https://my-blog.com") # my own writing
# Now I can ask
app.query("What did I write about async programming?")
app.query("Find articles about Rust I saved")
Getting better results
Choosing chunk size
Chunk size affects retrieval quality:
# Smaller chunks = more precise retrieval but less context per chunk
chunker_config = ChunkerConfig(
    chunk_size=200,
    chunk_overlap=20,
)

# Larger chunks = more context but less precise retrieval
chunker_config = ChunkerConfig(
    chunk_size=1000,
    chunk_overlap=100,
)

# I found 500-800 works well for most docs
# Adjust based on your content
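A minimal sliding-window chunker shows what the two knobs actually do (the general idea, not embedchain's exact implementation):

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Windows of chunk_size characters; each new window starts
    chunk_overlap characters before the previous one ended."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "x" * 1000
small = sliding_chunks(text, 200, 20)    # many small, precise pieces
large = sliding_chunks(text, 1000, 100)  # few pieces, lots of context each
print(len(small), len(large))  # -> 6 2
```

The overlap exists so a sentence cut at a chunk boundary still appears whole in the neighboring chunk.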
Using better LLMs
GPT-4 is smarter but costs more:
# Use GPT-4 for complex queries; the LLM is picked via the config dict
from embedchain import App

app = App.from_config(config={
    "llm": {
        "provider": "openai",  # reads OPENAI_API_KEY from the environment
        "config": {
            "model": "gpt-4",
            "temperature": 0.3,  # lower = more focused
        },
    },
})

# Or use Claude (sometimes better for long docs)
app = App.from_config(config={
    "llm": {
        "provider": "anthropic",  # reads ANTHROPIC_API_KEY from the environment
        "config": {"model": "claude-2"},
    },
})
Custom instructions
Tell the bot how to behave:
# The system prompt lives on the LLM config
app = App.from_config(config={
    "llm": {
        "provider": "openai",
        "config": {
            "system_prompt": """You are a helpful assistant for ACME Corp.
Answer questions based only on the provided context.
If you don't know, say so - don't make things up.
Be concise but thorough.""",
        },
    },
})
Hybrid search
Combine semantic and keyword search:
# The exact options depend on your embedchain version and vector DB --
# treat these keys as illustrative rather than a stable API
vectordb_config = {
    "hybrid_search": True,    # semantic + keyword
    "bm25_weight": 0.3,       # keyword weight
    "semantic_weight": 0.7,   # semantic weight
}
app = App(vectordb_config=vectordb_config)
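Conceptually, hybrid search just blends two scores per chunk. A standalone sketch of the weighting (not embedchain code):

```python
def hybrid_score(keyword_score, semantic_score, bm25_weight=0.3, semantic_weight=0.7):
    """Blend a keyword (BM25-style) score with a semantic similarity score."""
    return bm25_weight * keyword_score + semantic_weight * semantic_score

def rank(chunks, keyword_scores, semantic_scores):
    """Order chunks by blended score, best first."""
    blended = [hybrid_score(k, s) for k, s in zip(keyword_scores, semantic_scores)]
    return [c for _, c in sorted(zip(blended, chunks), reverse=True)]

# Chunk B wins on exact keywords, but A is semantically closer, and the
# 0.7 semantic weight tips the ranking toward it.
ranked = rank(["A", "B"], keyword_scores=[0.1, 0.9], semantic_scores=[0.8, 0.2])
print(ranked)  # -> ['A', 'B']
```

Raising the keyword weight helps when users search for exact product names or error codes; the semantic weight helps with paraphrased questions.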
Deploying it
Simple web UI
Some embedchain versions ship a minimal web interface - check the docs for your version, since the entry point has changed over releases:
from embedchain import App

app = App()
app.run()  # if available, serves a UI at http://localhost:5000
FastAPI backend
For production deployments:
from embedchain import App
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app_api = FastAPI()
chatbot = App()

class Query(BaseModel):
    question: str

@app_api.post("/chat")
async def chat(query: Query):
    try:
        response = chatbot.query(query.question)
        return {"answer": response}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

# Run with: uvicorn main:app_api --reload
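One thing worth putting in front of any deployment is a response cache, so repeated questions don't burn LLM calls. A minimal sketch - `query_fn` here stands in for `app.query`:

```python
import hashlib

class CachedBot:
    """Cache answers so identical (normalized) questions hit the LLM once.
    `query_fn` stands in for app.query."""

    def __init__(self, query_fn):
        self.query_fn = query_fn
        self.cache = {}
        self.calls = 0  # how many times we actually called the LLM

    def ask(self, question):
        key = hashlib.sha256(question.strip().lower().encode()).hexdigest()
        if key not in self.cache:
            self.calls += 1
            self.cache[key] = self.query_fn(question)
        return self.cache[key]

bot = CachedBot(lambda q: f"answer to: {q}")
bot.ask("What's the refund policy?")
bot.ask("what's the refund policy?  ")  # normalized -> cache hit, no second call
```

In production you'd want a TTL and a shared store like Redis instead of a dict, but the idea is the same.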
Docker deployment
# Dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "app.py"]
# Run
docker build -t embedchain-bot .
docker run -p 5000:5000 -e OPENAI_API_KEY=xxx embedchain-bot
Stuff that didn't work
Poor retrieval
Answers were irrelevant or missing info.
# Fixed by
1. Adjusting chunk size (500 → 800)
2. Increasing overlap (50 → 100)
3. Enabling hybrid search
4. Adding more relevant documents
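To tell whether those tweaks actually helped, I keep a tiny regression check: known questions paired with a keyword the answer must contain. A sketch with a stub standing in for `app.query`:

```python
# Known questions paired with a keyword the answer must contain.
CASES = [
    ("What's the refund window?", "30 days"),
    ("How many WFH days are allowed?", "3 days"),
]

def retrieval_hit_rate(ask, cases):
    """Fraction of cases whose answer mentions the expected keyword."""
    hits = sum(1 for question, expected in cases if expected in ask(question))
    return hits / len(cases)

# Stub answers for illustration; in practice pass app.query.
stub_answers = {
    "What's the refund window?": "Refunds are allowed within 30 days.",
    "How many WFH days are allowed?": "Up to 3 days per week.",
}
rate = retrieval_hit_rate(stub_answers.get, CASES)
print(rate)  # -> 1.0
```

Run it before and after a chunking or prompt change; if the rate drops, the "improvement" hurt retrieval.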
Slow ingestion
Large PDFs took forever to process.
# Fixed by
1. Using async ingestion
2. Processing in batches
3. Switching to faster vector DB (Pinecone)
4. Pre-splitting huge documents
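Pre-splitting needs nothing from embedchain - plain Python at paragraph boundaries works. A sketch (note that a single paragraph longer than `max_chars` still passes through whole):

```python
def presplit(text, max_chars=50_000):
    """Split a huge document at paragraph boundaries into parts of at
    most max_chars (an oversized single paragraph passes through whole)."""
    parts, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > max_chars:
            parts.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        parts.append(current)
    return parts

huge = "intro paragraph\n\n" + "x" * 60_000 + "\n\nclosing paragraph"
parts = presplit(huge)
# Each part can then be ingested separately, e.g. app.add(part, data_type="text")
```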
Hallucinations
Making stuff up that wasn't in the docs.
# Fixed by
1. Adding "don't make things up" to system prompt
2. Lowering temperature (0.7 → 0.3)
3. Requiring citations (shows sources)
4. Using GPT-4 instead of 3.5
Memory issues
ChromaDB ate all my RAM with lots of docs.
# Fixed by
1. Switching to Pinecone (cloud-based)
2. Purging old embeddings
3. Limiting document size before ingestion
Embedchain vs alternatives
| | Embedchain | LangChain | LlamaIndex |
|---|---|---|---|
| Learning curve | Easy | Steep | Moderate |
| Setup time | Minutes | Hours | Hours |
| Code required | Minimal | Lots | Moderate |
| Data sources | Built-in | Manual loaders | Manual loaders |
| Chunking | Auto | Manual | Auto (advanced) |
| Customization | Good | Excellent | Excellent |
| Best for | Quick RAG apps | Complex workflows | Advanced indexing |
Embedchain is the fastest way to get a RAG app running. LangChain and LlamaIndex are more powerful if you need custom stuff.
Would I recommend it?
Yeah, if you need to build a chatbot on your own data and don't want to spend weeks on it. Embedchain handles all the annoying parts - chunking, embeddings, retrieval, storage.
I built our company knowledge bot in an afternoon. It's not perfect - sometimes retrieves irrelevant chunks, occasionally hallucinates. But for 80% of queries, it works great.
The simple API is what won me over. Add data, query data. That's it. Didn't have to learn a whole framework or understand vector databases deeply.
If you outgrow it, you can always move to LangChain. But for most use cases, Embedchain is plenty.
Links: github.com/embedchain/embedchain | Docs: docs.embedchain.ai