Dify V2: Real GitHub Issues Solved
This is an advanced Dify tutorial covering production deployment, real GitHub issues from the repository, and solutions to common problems that aren't documented.
What's Different From V1
- Fixes for real problems reported on the Dify GitHub repository
- Docker production deployment with proper configuration
- API authentication and rate limiting issues
- Vector database performance problems
- Multi-tenant setup and workflow optimization
Production Docker Deployment
Basic setup is covered in the original guide. Here's a production-ready docker-compose configuration:
```yaml
version: "3.8"

services:
  # PostgreSQL with proper settings
  postgres:
    image: postgres:15-alpine
    restart: always
    environment:
      POSTGRES_USER: dify
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-changeme}
      POSTGRES_DB: dify
    volumes:
      - ./postgres/data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "dify"]
      interval: 5s
      timeout: 5s
      retries: 5

  # Redis with persistence
  redis:
    image: redis:7-alpine
    restart: always
    command: redis-server --requirepass ${REDIS_PASSWORD:-changeme}
    volumes:
      - ./redis/data:/data

  # Dify API
  api:
    image: langgenius/dify-api:0.6.9
    restart: always
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_started
    environment: &api-environment  # anchor, reused by the worker below
      # Database
      DB_USERNAME: dify
      DB_PASSWORD: ${POSTGRES_PASSWORD:-changeme}
      DB_HOST: postgres
      DB_PORT: 5432
      DB_DATABASE: dify
      # Redis
      REDIS_HOST: redis
      REDIS_PORT: 6379
      REDIS_PASSWORD: ${REDIS_PASSWORD:-changeme}
      # Security -- generate with: openssl rand -base64 42
      SECRET_KEY: ${SECRET_KEY:?set SECRET_KEY in .env}
      # External API (OpenAI, etc.)
      OPENAI_API_KEY: ${OPENAI_API_KEY}
      OPENAI_API_BASE: ${OPENAI_API_BASE:-https://api.openai.com/v1}
      # Vector DB (the weaviate service itself is configured in the Weaviate section below)
      VECTOR_STORE: weaviate
      WEAVIATE_ENDPOINT: http://weaviate:8080
      # Storage
      STORAGE_TYPE: local
      STORAGE_LOCAL_PATH: /app/storage
    volumes:
      - ./dify/storage:/app/storage

  # Dify Worker (same image and environment as the API)
  worker:
    image: langgenius/dify-api:0.6.9
    restart: always
    command: /bin/bash /docker/entrypoint.sh celery
    depends_on:
      - api
    environment:
      <<: *api-environment

  # Dify Web UI
  web:
    image: langgenius/dify-web:0.6.9
    restart: always
    depends_on:
      - api

  # Nginx reverse proxy
  nginx:
    image: nginx:alpine
    restart: always
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./nginx/ssl:/etc/nginx/ssl:ro
    depends_on:
      - web
```
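The SECRET_KEY must be a real random value, never a guessable default. `openssl rand -base64 42` is the usual recipe; Python's stdlib `secrets` module produces an equivalent key if openssl isn't at hand:

```python
import secrets

# 42 bytes of entropy, URL-safe encoded -- roughly equivalent to
# `openssl rand -base64 42` (yields a 56-character string, no padding)
secret_key = secrets.token_urlsafe(42)
print(secret_key)
```

Put the result in your `.env` file rather than hardcoding it into docker-compose.yml.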
Common Problems & Solutions
Problem: Complex workflows with 10+ nodes throw "RuntimeError: maximum recursion depth exceeded" when executing.
What I Tried: Simplifying the workflow, removing loops, and reducing the node count; it hit a ceiling at around 15 nodes regardless.
Actual Fix: Dify's workflow executor has a recursion limit for graph traversal. The issue is circular references in variable dependencies:
```text
# The problem: circular references in variable dependencies
# Node A references Node B's output
# Node B references Node C's output
# Node C references Node A's output  ← circular!

# Solution: break the cycle with an intermediate node
# Before (circular): A → B → C → A
# After  (linear):   A → B → C → D (intermediate) → E
```

In practice, use a code node to transform values:

```python
def transform_and_break_cycle(input_value):
    # Process the value here without referencing the upstream node again
    processed_value = str(input_value).strip()
    return processed_value
```

Also raise the executor limits in docker-compose.yml:

```yaml
api:
  environment:
    WORKFLOW_MAX_STEPS: 50            # default is 30
    WORKFLOW_MAX_EXECUTION_TIME: 600  # 10 minutes
```
Also enable WORKFLOW_PARALLELISM to run independent nodes in parallel, reducing execution depth.
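Before rebuilding a large workflow by hand, it can save time to check the variable-dependency graph for cycles programmatically. A small illustration (the node names and edge-list format are hypothetical, not a Dify API):

```python
def find_cycle(deps):
    """Return the first cycle in a dependency graph given as
    {node: [nodes whose output it references]}, or None."""
    visiting, visited = set(), set()
    path = []

    def dfs(node):
        visiting.add(node)
        path.append(node)
        for child in deps.get(node, ()):
            if child in visiting:
                # child is on the current path -> we found a cycle
                return path[path.index(child):] + [child]
            if child not in visited:
                cycle = dfs(child)
                if cycle:
                    return cycle
        visiting.discard(node)
        visited.add(node)
        path.pop()
        return None

    for start in list(deps):
        if start not in visited:
            cycle = dfs(start)
            if cycle:
                return cycle
    return None

# The A → B → C → A situation described above:
print(find_cycle({"A": ["B"], "B": ["C"], "C": ["A"]}))  # ['A', 'B', 'C', 'A']
print(find_cycle({"A": ["B"], "B": ["C"], "C": []}))     # None
```

Running this over an exported node list before deployment catches the recursion error at design time instead of in production.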
Problem: Uploaded PDF with clear information, but queries return "No relevant documents found" every time.
What I Tried: Re-uploading documents, changing chunk size, adjusting similarity threshold - nothing worked.
Actual Fix: The default chunking strategy was breaking sentences mid-word. Change to semantic chunking and adjust the embedding model:
Via the API:

```python
import requests

knowledge_base_id = "your-kb-id"

# Update the dataset to use semantic chunking
response = requests.patch(
    f"http://your-dify.com/v1/datasets/{knowledge_base_id}",
    headers={"Authorization": "Bearer your-api-key"},
    json={
        "chunking_mode": "automatic",
        "chunking_method": "semantic",   # not "character"
        "chunk_size": 1000,              # larger chunks
        "chunk_overlap": 200,
        "embedding_model": "text-embedding-3-large",  # better than ada-002
        "retrieval_mode": "semantic_search",
        "similarity_threshold": 0.5,     # default is 0.7, too high
        "top_k": 5,                      # retrieve more documents
    },
)
```

Or via docker-compose:

```yaml
api:
  environment:
    KNOWLEDGE_CHUNKING_METHOD: semantic
    KNOWLEDGE_DEFAULT_CHUNK_SIZE: 1000
    KNOWLEDGE_SIMILARITY_THRESHOLD: 0.5
```
The `similarity_threshold` default of 0.7 is too strict. Lowering it to 0.5–0.6 dramatically improves retrieval.
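To build intuition for the threshold, here is a toy example with made-up three-dimensional vectors (real embeddings have hundreds of dimensions, but the arithmetic is identical): a moderately related chunk scores around 0.62, so a 0.7 cutoff silently drops it while 0.5 keeps it.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product over the product of magnitudes
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

query = [1.0, 0.2, 0.0]
chunks = {
    "exact match": [1.0, 0.2, 0.0],  # similarity 1.0
    "related":     [0.5, 0.6, 0.6],  # similarity ~0.62
    "unrelated":   [0.0, 0.1, 1.0],  # similarity ~0.02
}

for threshold in (0.7, 0.5):
    hits = [name for name, vec in chunks.items() if cosine(query, vec) >= threshold]
    print(threshold, hits)
# 0.7 ['exact match']
# 0.5 ['exact match', 'related']
```

The "No relevant documents found" symptom usually means every chunk landed in the gap between the threshold and the best real similarity score.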
Problem: Production app with ~1000 users getting HTTP 429 "Too Many Requests" errors during peak hours.
What I Tried: Increased worker count, added Redis caching - still hit rate limits.
Actual Fix: Dify's default rate limits are too aggressive. Need to configure per-endpoint limits:
```yaml
# In docker-compose.yml:
api:
  environment:
    # Rate limiting configuration
    API_RATE_LIMIT_ENABLED: "true"
    API_RATE_LIMIT_PER_IP: "1000"        # requests per hour (default 100!)
    API_RATE_LIMIT_PER_API_KEY: "10000"  # per API key
    API_RATE_LIMIT_PER_USER: "5000"      # per authenticated user
    # Bypass for internal services
    API_RATE_LIMIT_TRUSTED_PROXIES: "10.0.0.0/8,172.16.0.0/12"

worker:
  environment:
    # Worker concurrency
    CELERY_CONCURRENCY: 10  # increase from default 4
    CELERY_PREFETCH_MULTIPLIER: 4
    CELERY_TASK_ACKS_LATE: "true"
```
Also configure Nginx in front for additional rate limiting at the edge, so excess requests never reach Dify at all.
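For edge limiting, nginx's `limit_req` module works well in front of the API. A minimal sketch (the zone name, rate, and upstream address are example values):

```nginx
http {
    # 10 MB shared zone keyed by client IP, allowing 10 requests/second
    limit_req_zone $binary_remote_addr zone=dify_api:10m rate=10r/s;

    server {
        listen 80;

        location /v1/ {
            # Allow short bursts; reject the overflow with 429
            limit_req zone=dify_api burst=20 nodelay;
            limit_req_status 429;
            proxy_pass http://api:5001;
        }
    }
}
```

Because the zone is keyed by `$binary_remote_addr`, a single misbehaving client is throttled without affecting everyone else.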
Problem: After 30-40 knowledge base queries, Weaviate throws "Connection pool exhausted" errors and needs restart.
What I Tried: Increasing Weaviate memory, adding more replicas - delays the issue but doesn't fix it.
Actual Fix: Dify doesn't properly close Weaviate connections after queries. Need connection pooling and timeout configuration:
```yaml
# In docker-compose.yml for Weaviate:
weaviate:
  image: semitechnologies/weaviate:1.23.0
  restart: always
  environment:
    # Query limits
    QUERY_MAXIMUM_RESULTS: 10000
    DEFAULT_LIMIT: 100
    # Access and persistence
    AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: "true"
    PERSISTENCE_DATA_PATH: "/var/lib/weaviate"
    # Modules and clustering
    ENABLE_MODULES: "text2vec-openai"
    CLUSTER_HOSTNAME: "node1"
    # These are the important ones:
    GRPC_MAX_MESSAGE_SIZE: 10485760  # 10MB
    DISABLE_TELEMETRY: "true"
  command:
    - "--host"
    - "0.0.0.0"
    - "--port"
    - "8080"
    - "--scheme"
    - "http"
```

In the Dify API config:

```yaml
api:
  environment:
    # Connection pool for Weaviate
    WEAVIATE_CONNECT_TIMEOUT: 30
    WEAVIATE_READ_TIMEOUT: 60
    WEAVIATE_MAX_RETRIES: 3
    WEAVIATE_POOL_SIZE: 20  # increase from default 5
```
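The failure mode is easy to reproduce in miniature: if connections are acquired but never released, any fixed-size pool runs dry. A stdlib-only illustration of the leak and the fix (not Dify's actual client code):

```python
import queue

class ConnectionPool:
    """Toy fixed-size pool: acquire() fails once the pool is exhausted."""

    def __init__(self, size):
        self._pool = queue.Queue()
        for i in range(size):
            self._pool.put(f"conn-{i}")  # stand-ins for real connections

    def acquire(self, timeout=0.1):
        try:
            return self._pool.get(timeout=timeout)
        except queue.Empty:
            raise RuntimeError("Connection pool exhausted")

    def release(self, conn):
        self._pool.put(conn)

pool = ConnectionPool(size=5)

# Leaking connections (never released) exhausts the pool on the 6th query:
leaked = [pool.acquire() for _ in range(5)]
try:
    pool.acquire()
except RuntimeError as e:
    print(e)  # Connection pool exhausted

# Releasing after each query keeps the same pool healthy indefinitely:
for conn in leaked:
    pool.release(conn)
for _ in range(100):
    conn = pool.acquire()
    pool.release(conn)
print("100 queries completed with a pool of 5")
```

That is why raising Weaviate's memory only delays the error: the pool still fills with never-released connections, just more slowly.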
Problem: Uploading PDFs larger than 50MB to knowledge base returns HTTP 413 "Payload Too Large".
What I Tried: Changing nginx client_max_body_size and splitting documents; the nginx change took effect, but Dify still rejected the upload.
Actual Fix: Multiple layers need updating - nginx, Dify API, and the file parser:
1. Nginx configuration (docker-compose.yml):

```yaml
nginx:
  environment:
    - CLIENT_MAX_BODY_SIZE=200M
```

And in nginx.conf:

```nginx
http {
    client_max_body_size 200M;
    client_body_timeout 300s;
}
```

2. Dify API configuration:

```yaml
api:
  environment:
    # File upload limits
    UPLOAD_FILE_SIZE_LIMIT: 209715200  # 200MB in bytes (default 50MB)
    UPLOAD_FILE_BATCH_LIMIT: 20        # max files per upload
    MULTIMODAL_UPLOAD_SIZE_LIMIT: 209715200
```

3. Parser timeout (large files take longer):

```yaml
worker:
  environment:
    TASK_TIMEOUT: 300  # 5 minutes (default 120s)
```
Note: Large PDFs (>100MB) can cause OOM errors during parsing. Consider pre-splitting them outside Dify.
Multi-tenant Setup
Running multiple Dify instances on the same server with different domains:
```yaml
# Multi-tenant docker-compose.override.yml
version: "3.8"

services:
  # Shared PostgreSQL with multiple databases
  postgres:
    environment:
      POSTGRES_MULTIPLE_DATABASES: dify_tenant1,dify_tenant2,dify_tenant3

  # Tenant 1 API
  api_tenant1:
    image: langgenius/dify-api:0.6.9
    restart: always
    environment:
      DB_DATABASE: dify_tenant1
      SECRET_KEY: ${TENANT1_SECRET_KEY}
      CONSOLE_WEB_URL: https://tenant1.yourdomain.com
      APP_WEB_URL: https://tenant1.yourdomain.com
    volumes:
      - ./tenant1/storage:/app/storage

  # Tenant 2 API
  api_tenant2:
    image: langgenius/dify-api:0.6.9
    restart: always
    environment:
      DB_DATABASE: dify_tenant2
      SECRET_KEY: ${TENANT2_SECRET_KEY}
      CONSOLE_WEB_URL: https://tenant2.yourdomain.com
      APP_WEB_URL: https://tenant2.yourdomain.com
    volumes:
      - ./tenant2/storage:/app/storage

  # Nginx with multiple server blocks
  nginx:
    volumes:
      - ./nginx/multitenant.conf:/etc/nginx/nginx.conf:ro
```
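One caveat: the stock postgres image does not act on POSTGRES_MULTIPLE_DATABASES by itself; that variable is conventionally paired with a custom init script mounted into /docker-entrypoint-initdb.d, which the image runs once on first initialization. A minimal sketch of such a script (the filename and ownership grant are illustrative):

```bash
#!/bin/bash
# /docker-entrypoint-initdb.d/create-databases.sh
# Creates one database per comma-separated name on first startup.
set -e

IFS=',' read -ra DBS <<< "$POSTGRES_MULTIPLE_DATABASES"
for db in "${DBS[@]}"; do
  echo "Creating database: $db"
  psql -v ON_ERROR_STOP=1 -U "$POSTGRES_USER" \
    -c "CREATE DATABASE \"$db\" OWNER \"$POSTGRES_USER\";"
done
```

Mount it alongside the data volume (e.g. `./postgres/init:/docker-entrypoint-initdb.d:ro`); it only runs when the data directory is empty, so existing deployments are unaffected.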
Performance Optimization
Caching Strategy
```yaml
# Enable aggressive caching
api:
  environment:
    # Knowledge base caching
    KNOWLEDGE_SEARCH_CACHE_ENABLED: "true"
    KNOWLEDGE_SEARCH_CACHE_TTL: 3600  # 1 hour
    # API response caching
    API_CACHE_ENABLED: "true"
    API_CACHE_TTL: 300  # 5 minutes
    # Vector database query cache
    VECTOR_DB_CACHE_ENABLED: "true"
    VECTOR_DB_CACHE_SIZE: 1000
```
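Conceptually, the search cache is a TTL-keyed lookup: identical queries inside the TTL window skip the vector store entirely. A stdlib-only sketch of the idea (illustrative, not Dify's implementation):

```python
import time

class TTLCache:
    """Tiny time-to-live cache: entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired -- evict and miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

cache = TTLCache(ttl_seconds=0.05)
cache.set("query: refund policy", ["chunk-12", "chunk-40"])
print(cache.get("query: refund policy"))  # ['chunk-12', 'chunk-40']
time.sleep(0.06)
print(cache.get("query: refund policy"))  # None (expired)
```

The TTL trade-off is the same as in the config above: a longer TTL saves more vector-store round trips but serves staler results after knowledge-base updates.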
Monitoring Setup
```yaml
# Add monitoring stack
services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    environment:
      GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_PASSWORD}
```

In prometheus.yml:

```yaml
scrape_configs:
  - job_name: "dify"
    static_configs:
      - targets: ["api:5001"]  # Dify metrics endpoint
```
Backup Strategy
```bash
#!/bin/bash
# backup-dify.sh

BACKUP_DIR="/backups/dify"
DATE=$(date +%Y%m%d_%H%M%S)

# Backup PostgreSQL
docker exec dify-postgres pg_dump -U dify dify > "$BACKUP_DIR/db_$DATE.sql"

# Backup Weaviate data
docker exec dify-weaviate tar czf - /var/lib/weaviate > "$BACKUP_DIR/weaviate_$DATE.tar.gz"

# Backup uploaded files
tar czf "$BACKUP_DIR/storage_$DATE.tar.gz" ./dify/storage

# Keep only the last 7 days (-type f so the directory itself is never deleted)
find "$BACKUP_DIR" -type f -mtime +7 -delete

# Upload to S3 (optional)
aws s3 sync "$BACKUP_DIR" s3://your-bucket/dify-backups/
```
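To run this nightly, a crontab entry is enough (the script path and log location are examples):

```cron
# Run the Dify backup at 02:00 every day, logging output
0 2 * * * /opt/dify/backup-dify.sh >> /var/log/dify-backup.log 2>&1
```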
Recommended Reading
- Introduction and basic setup
- Alternative to Dify
- Advanced RAG implementation
- ChromaDB vs Milvus