Embedchain: Finally Built a Chatbot That Knows My Stuff

Wanted an AI that answers questions using my documents. Embedchain made it actually simple. Open source, Python, works out of the box. Here's how I got it running.

The problem I was trying to solve

Had tons of documentation scattered everywhere - Google Docs, Notion, PDFs, random markdown files. Every time I needed to find something, I'd search through everything manually. Annoying.

Wanted a chatbot that knew all this stuff. Ask it a question, it searches through everything, gives an answer with sources. Seemed simple enough.

Looked into building it myself. LangChain? Too complex. Custom RAG implementation? Days of work. Then found Embedchain. Promised to handle all the messy parts.

What I actually built

Upload your stuff → Embedchain handles chunking, embeddings, vector storage → Ask questions → Get answers with citations. Took like 2 hours.

Now I just ask "what did we decide about the API rate limits?" and it tells me, with links to the exact documents.

So what is Embedchain

Embedchain is a Python framework for building RAG (Retrieval Augmented Generation) apps. Basically, it lets you create AI chatbots that know about your specific data.

What it does:

  • Ingests data: PDFs, docs, websites, YouTube, text files - whatever you have
  • Chunks it: Breaks documents into pieces that LLMs can handle
  • Embeds it: Converts to vectors for semantic search
  • Stores it: Uses a vector database (local or cloud)
  • Retrieves: Finds relevant chunks when you ask questions
  • Answers: Uses GPT/Claude to generate responses from retrieved context
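The pipeline those bullets describe can be sketched in plain Python. This toy version is not Embedchain's internals: it uses bag-of-words counts instead of real embeddings and stops at retrieval (a real RAG app would hand the retrieved chunk plus the question to an LLM), but it shows the chunk → embed → store → retrieve idea.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words count vector (real systems use dense vectors).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# "Ingest": store (chunk, vector) pairs.
chunks = [
    "Refunds are accepted within 30 days of purchase.",
    "The API rate limit is 100 requests per minute.",
]
store = [(c, embed(c)) for c in chunks]

def retrieve(question: str) -> str:
    # Return the most similar chunk; a real app would now pass
    # this chunk and the question to the LLM for answer generation.
    q = embed(question)
    return max(store, key=lambda pair: cosine(q, pair[1]))[0]

print(retrieve("what is the api rate limit?"))
# -> The API rate limit is 100 requests per minute.
```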

Why it's easier than rolling your own:

  • One-line data ingestion
  • Handles different file types automatically
  • Built-in chunking strategies
  • Works with any LLM (OpenAI, Claude, local models)
  • Supports multiple vector databases
  • Simple Python API

It's open source, so you can self-host everything. No sending your data to third-party services if you don't want to.

Setting it up

Install

Pretty straightforward:

pip install embedchain

Get API keys

You'll need at least one LLM API key:

# For OpenAI (GPT-4, GPT-3.5)
export OPENAI_API_KEY="sk-..."

# Or for Anthropic (Claude)
export ANTHROPIC_API_KEY="sk-ant-..."

For vector database, you can use:

  • ChromaDB: Local, free, no setup needed (easiest to start)
  • Pinecone: Cloud, better for scale (free tier available)
  • Weaviate: Self-hosted option

I started with ChromaDB since it's local and free. Moved to Pinecone once I had more data.

Building your first bot

Simple example

Here's the bare minimum to get started:

from embedchain import App

# Create app (uses ChromaDB locally by default)
app = App()

# Add some data
app.add("https://en.wikipedia.org/wiki/Artificial_intelligence")
app.add_local("path/to/your/document.pdf")

# Ask questions
result = app.query("What is artificial intelligence?")
print(result)

With configuration

For more control over how it works:

from embedchain import App
from embedchain.config import AppConfig, ChunkerConfig

# Configure behavior
config = AppConfig(
    log_level="INFO",
    collect_metrics=False  # disable telemetry if you want
)

# Configure chunking (how docs are split)
chunker_config = ChunkerConfig(
    chunk_size=500,        # size of each chunk
    chunk_overlap=50,      # overlap between chunks
    length_function="len"  # how to measure size
)

# Create app with config
app = App(config=config, chunker_config=chunker_config)

# Now add data and query
app.add("path/to/docs")
response = app.query("What's in those docs?")

Different data sources

Embedchain handles various types:

# Web pages
app.add("https://example.com/page")

# Local files
app.add_local("docs/report.pdf")
app.add_local("notes.txt")
app.add_local("data/docs")  # whole directory

# YouTube videos
app.add("https://youtube.com/watch?v=xxx")

# Direct text
app.add({"content": "Some text you want to add", "meta_data": {"source": "manual"}})

# Q&A pairs
app.add({"question": "What's the refund policy?", "answer": "30 days, no questions asked"})

Querying with context

Get citations and see what it retrieved:

# Get response with sources
response = app.query(
    "What's our refund policy?",
    citations=True  # include where the answer came from
)

print(response.answer)
print(response.sources)  # shows which docs were used

# Query with specific context
response = app.query(
    "Compare our pricing with competitors",
    where={"metadata": {"category": "pricing"}}  # filter by metadata
)
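Under the hood, a metadata filter just narrows the candidate set before similarity search. Here's a rough sketch of that filtering step (my own illustration, not Embedchain's code; the exact `where` syntax can vary between versions, so check the docs for yours):

```python
def filter_by_metadata(chunks: list[dict], where: dict) -> list[dict]:
    # Keep only chunks whose metadata matches every key/value pair in `where`.
    return [
        c for c in chunks
        if all(c["metadata"].get(k) == v for k, v in where.items())
    ]

chunks = [
    {"text": "Pro plan is $49/month", "metadata": {"category": "pricing"}},
    {"text": "Reset passwords via email", "metadata": {"category": "support"}},
]

hits = filter_by_metadata(chunks, {"category": "pricing"})
print([c["text"] for c in hits])  # -> ['Pro plan is $49/month']
```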

Stuff I've built with it

Company knowledge bot

Fed it all our internal docs - policies, procedures, meeting notes. Now the team asks it questions instead of bugging each other.

# Added company wiki
app.add_local("company_docs/wiki")
app.add_local("company_docs/policies")
app.add_local("company_docs/meeting_notes")

# Now works like this
query = "What's our work from home policy?"
answer = app.query(query)
# Returns: "Employees can work from home up to 3 days per week..."

Research assistant

Upload research papers, ask questions about them. Way faster than reading everything.

import os

# Load papers
for paper in os.listdir("research_papers"):
    app.add_local(f"research_papers/{paper}")

# Now can ask
app.query("What are the main findings about transformer attention?")
app.query("Compare the approaches in these three papers")

Customer support bot

Fed it product docs, FAQs, past support tickets (retrieval, not actual model training). Handles common questions automatically.

# Product documentation
app.add_local("docs/user_guide.pdf")
app.add_local("docs/api_reference.pdf")
app.add_local("support/faq.md")
app.add_local("support/common_issues.md")

# Deploy as API (FastAPI example)
from fastapi import FastAPI
api = FastAPI()

@api.post("/chat")
async def chat(question: str):
    answer = app.query(question)
    return {"response": answer}

Personal notes search

All my notes, bookmarks, articles - now searchable by meaning, not just keywords.

# My digital brain
app.add_local("notes/obsidian_vault")
app.add_local("bookmarks/bookmarks.html")
app.add("https://my-blog.com")  # my own writing

# Now I can ask
app.query("What did I write about async programming?")
app.query("Find articles about Rust I saved")

Getting better results

Choosing chunk size

Chunk size affects retrieval quality:

# Smaller chunks = more precise but less context
chunker_config = ChunkerConfig(
    chunk_size=200,     # small chunks
    chunk_overlap=20
)

# Larger chunks = more context but less precise
chunker_config = ChunkerConfig(
    chunk_size=1000,    # big chunks
    chunk_overlap=100
)

# I found 500-800 works well for most docs
# Adjust based on your content
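To see what chunk_size and chunk_overlap actually do, here's a minimal character-based sliding-window chunker (my own sketch; Embedchain's real chunkers are token-aware and smarter about boundaries):

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    # Slide a fixed-size window; each step advances chunk_size - chunk_overlap
    # characters, so neighboring chunks share chunk_overlap characters of context.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "abcdefghij" * 5  # 50 characters
chunks = chunk_text(doc, chunk_size=20, chunk_overlap=5)
print(len(chunks))                    # -> 4
print(chunks[0][-5:], chunks[1][:5])  # both "fghij": the shared overlap
```

The overlap is what keeps a sentence that straddles a chunk boundary retrievable from at least one side.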

Using better LLMs

GPT-4 is smarter but costs more:

# Use GPT-4 for complex queries
from embedchain.llm.gpt4 import GPT4Llm

llm = GPT4Llm(
    api_key=os.getenv("OPENAI_API_KEY"),
    model="gpt-4",
    temperature=0.3  # lower = more focused
)

app = App(llm=llm)

# Or use Claude (sometimes better for long docs)
from embedchain.llm.anthropic import AnthropicLlm

llm = AnthropicLlm(
    api_key=os.getenv("ANTHROPIC_API_KEY"),
    model="claude-2"
)

Custom instructions

Tell the bot how to behave:

# Add system prompt
config = AppConfig(
    system_prompt="""You are a helpful assistant for ACME Corp.
    Answer questions based only on the provided context.
    If you don't know, say so - don't make things up.
    Be concise but thorough."""
)

app = App(config=config)

Hybrid search

Combine semantic and keyword search:

from embedchain.vectordb.chroma import ChromaDB

# Enable hybrid search
vectordb_config = {
    "hybrid_search": True,  # semantic + keyword
    "bm25_weight": 0.3,     # keyword weight
    "semantic_weight": 0.7  # semantic weight
}

app = App(vectordb_config=vectordb_config)
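The weighted blend works roughly like this (a sketch of the scoring idea, not ChromaDB's actual implementation; it assumes both scores are already normalized to 0-1):

```python
def hybrid_score(bm25: float, semantic: float,
                 bm25_weight: float = 0.3, semantic_weight: float = 0.7) -> float:
    # Blend a keyword (BM25) score and a semantic similarity score,
    # both assumed normalized to the 0-1 range, using the configured weights.
    return bm25_weight * bm25 + semantic_weight * semantic

# A doc with strong exact-keyword matches but weak semantic similarity...
print(round(hybrid_score(bm25=0.9, semantic=0.2), 2))  # -> 0.41
# ...can be outranked by a semantically strong paraphrase.
print(round(hybrid_score(bm25=0.1, semantic=0.8), 2))  # -> 0.59
```

Shifting bm25_weight up favors exact terms (good for IDs, error codes); shifting semantic_weight up favors paraphrases.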

Deploying it

Simple web UI

Embedchain comes with a built-in UI:

from embedchain import App

app = App()

# Start web interface
app.run()  # opens at http://localhost:5000

FastAPI backend

For production deployments:

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app_api = FastAPI()
chatbot = App()

class Query(BaseModel):
    question: str

@app_api.post("/chat")
async def chat(query: Query):
    try:
        response = chatbot.query(query.question)
        return {"answer": response}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

# Run with: uvicorn main:app_api --reload

Docker deployment

# Dockerfile
FROM python:3.9-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .
CMD ["python", "app.py"]

# Run
docker build -t embedchain-bot .
docker run -p 5000:5000 -e OPENAI_API_KEY=xxx embedchain-bot

Stuff that didn't work

Poor retrieval

Answers were irrelevant or missing info.

# Fixed by
1. Adjusting chunk size (500 → 800)
2. Increasing overlap (50 → 100)
3. Enabling hybrid search
4. Adding more relevant documents

Slow ingestion

Large PDFs took forever to process.

# Fixed by
1. Using async ingestion
2. Processing in batches
3. Switching to faster vector DB (Pinecone)
4. Pre-splitting huge documents
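The batching fix is nothing fancy: group the file paths and ingest a handful at a time instead of everything in one pass. A sketch (the `app.add_local` calls in the comment are the ingestion step from earlier, left commented out):

```python
def batches(paths: list[str], size: int) -> list[list[str]]:
    # Split a list of file paths into fixed-size groups.
    return [paths[i:i + size] for i in range(0, len(paths), size)]

files = [f"doc_{n}.pdf" for n in range(10)]
for batch in batches(files, size=4):
    # Ingest one small batch at a time, e.g.:
    # for path in batch:
    #     app.add_local(path)
    print(len(batch))  # -> 4, 4, 2
```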

Hallucinations

Making stuff up that wasn't in the docs.

# Fixed by
1. Adding "don't make things up" to system prompt
2. Lowering temperature (0.7 → 0.3)
3. Requiring citations (shows sources)
4. Using GPT-4 instead of 3.5

Memory issues

ChromaDB ate all my RAM with lots of docs.

# Fixed by
1. Switching to Pinecone (cloud-based)
2. Purging old embeddings
3. Limiting document size before ingestion

Embedchain vs alternatives

                 Embedchain       LangChain           LlamaIndex
Learning curve   Easy             Steep               Moderate
Setup time       Minutes          Hours               Hours
Code required    Minimal          Lots                Moderate
Data sources     Built-in         Manual loaders      Manual loaders
Chunking         Auto             Manual              Auto (advanced)
Customization    Good             Excellent           Excellent
Best for         Quick RAG apps   Complex workflows   Advanced indexing

Embedchain is the fastest way to get a RAG app running. LangChain and LlamaIndex are more powerful if you need custom stuff.

Would I recommend it?

Yeah, if you need to build a chatbot on your own data and don't want to spend weeks on it. Embedchain handles all the annoying parts - chunking, embeddings, retrieval, storage.

I built our company knowledge bot in an afternoon. It's not perfect - sometimes retrieves irrelevant chunks, occasionally hallucinates. But for 80% of queries, it works great.

The simple API is what won me over. Add data, query data. That's it. Didn't have to learn a whole framework or understand vector databases deeply.

If you outgrow it, you can always move to LangChain. But for most use cases, Embedchain is plenty.

Links: github.com/embedchain/embedchain | Docs: docs.embedchain.ai