Dify V2: Real GitHub Issues Solved
This is an advanced Dify tutorial covering production deployment, real GitHub issues from the repository, and solutions to common problems that aren't documented.
What's Different From V1
- Fixes for real problems reported on the Dify GitHub repository
- Docker production deployment with proper configuration
- API authentication and rate limiting issues
- Vector database performance problems
- Multi-tenant setup and workflow optimization
Production Docker Deployment
Basic setup is covered in the original guide. Here's a production-ready docker-compose configuration:
```yaml
version: "3.8"

services:
  # PostgreSQL with proper settings
  postgres:
    image: postgres:15-alpine
    restart: always
    environment:
      POSTGRES_USER: dify
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-changeme}
      POSTGRES_DB: dify
    volumes:
      - ./postgres/data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "dify"]
      interval: 5s
      timeout: 5s
      retries: 5

  # Redis with persistence
  redis:
    image: redis:7-alpine
    restart: always
    command: redis-server --requirepass ${REDIS_PASSWORD:-changeme}
    volumes:
      - ./redis/data:/data

  # Dify API
  api:
    image: langgenius/dify-api:0.6.9
    restart: always
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_started
    environment: &api-environment  # anchor, reused by the worker below
      # Database
      DB_USERNAME: dify
      DB_PASSWORD: ${POSTGRES_PASSWORD:-changeme}
      DB_HOST: postgres
      DB_PORT: 5432
      DB_DATABASE: dify
      # Redis
      REDIS_HOST: redis
      REDIS_PORT: 6379
      REDIS_PASSWORD: ${REDIS_PASSWORD:-changeme}
      # Security -- generate with: openssl rand -base64 42
      SECRET_KEY: ${SECRET_KEY:?set SECRET_KEY in .env}
      # External API (OpenAI, etc.)
      OPENAI_API_KEY: ${OPENAI_API_KEY}
      OPENAI_API_BASE: ${OPENAI_API_BASE:-https://api.openai.com/v1}
      # Vector DB (the weaviate service itself is configured in the Weaviate section below)
      VECTOR_STORE: weaviate
      WEAVIATE_ENDPOINT: http://weaviate:8080
      # Storage
      STORAGE_TYPE: local
      STORAGE_LOCAL_PATH: /app/storage
    volumes:
      - ./dify/storage:/app/storage

  # Dify Worker (same image and environment as the API)
  worker:
    image: langgenius/dify-api:0.6.9
    restart: always
    command: /bin/bash /docker/entrypoint.sh celery
    depends_on:
      - api
    environment:
      <<: *api-environment

  # Dify Web UI
  web:
    image: langgenius/dify-web:0.6.9
    restart: always
    depends_on:
      - api

  # Nginx reverse proxy
  nginx:
    image: nginx:alpine
    restart: always
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./nginx/ssl:/etc/nginx/ssl:ro
    depends_on:
      - web
```
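The SECRET_KEY must be a real random value, never a guessable default. `openssl rand -base64 42` is the usual recipe; Python's stdlib `secrets` module produces an equivalent key if openssl isn't at hand:

```python
import secrets

# 42 bytes of entropy, URL-safe encoded -- roughly equivalent to
# `openssl rand -base64 42` (yields a 56-character string, no padding)
secret_key = secrets.token_urlsafe(42)
print(secret_key)
```

Put the result in your `.env` file rather than hardcoding it into docker-compose.yml.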
Common Problems & Solutions
Problem: Complex workflows with 10+ nodes throw "RuntimeError: maximum recursion depth exceeded" when executing.
What I Tried: Simplifying the workflow, removing loops, and reducing the node count; it hit a ceiling at around 15 nodes regardless.
Actual Fix: Dify's workflow executor has a recursion limit for graph traversal. The issue is circular references in variable dependencies:
```text
# The problem: circular references in variable dependencies
# Node A references Node B's output
# Node B references Node C's output
# Node C references Node A's output  ← circular!

# Solution: break the cycle with an intermediate node
# Before (circular): A → B → C → A
# After  (linear):   A → B → C → D (intermediate) → E
```

In practice, use a code node to transform values:

```python
def transform_and_break_cycle(input_value):
    # Process the value here without referencing the upstream node again
    processed_value = str(input_value).strip()
    return processed_value
```

Also raise the executor limits in docker-compose.yml:

```yaml
api:
  environment:
    WORKFLOW_MAX_STEPS: 50            # default is 30
    WORKFLOW_MAX_EXECUTION_TIME: 600  # 10 minutes
```
Also enable WORKFLOW_PARALLELISM to run independent nodes in parallel, reducing execution depth.
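Before rebuilding a large workflow by hand, it can save time to check the variable-dependency graph for cycles programmatically. A small illustration (the node names and edge-list format are hypothetical, not a Dify API):

```python
def find_cycle(deps):
    """Return the first cycle in a dependency graph given as
    {node: [nodes whose output it references]}, or None."""
    visiting, visited = set(), set()
    path = []

    def dfs(node):
        visiting.add(node)
        path.append(node)
        for child in deps.get(node, ()):
            if child in visiting:
                # child is on the current path -> we found a cycle
                return path[path.index(child):] + [child]
            if child not in visited:
                cycle = dfs(child)
                if cycle:
                    return cycle
        visiting.discard(node)
        visited.add(node)
        path.pop()
        return None

    for start in list(deps):
        if start not in visited:
            cycle = dfs(start)
            if cycle:
                return cycle
    return None

# The A → B → C → A situation described above:
print(find_cycle({"A": ["B"], "B": ["C"], "C": ["A"]}))  # ['A', 'B', 'C', 'A']
print(find_cycle({"A": ["B"], "B": ["C"], "C": []}))     # None
```

Running this over an exported node list before deployment catches the recursion error at design time instead of in production.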
Problem: Uploaded PDF with clear information, but queries return "No relevant documents found" every time.
What I Tried: Re-uploading documents, changing chunk size, adjusting similarity threshold - nothing worked.
Actual Fix: The default chunking strategy was breaking sentences mid-word. Change to semantic chunking and adjust the embedding model:
Via the API:

```python
import requests

knowledge_base_id = "your-kb-id"

# Update the dataset to use semantic chunking
response = requests.patch(
    f"http://your-dify.com/v1/datasets/{knowledge_base_id}",
    headers={"Authorization": "Bearer your-api-key"},
    json={
        "chunking_mode": "automatic",
        "chunking_method": "semantic",   # not "character"
        "chunk_size": 1000,              # larger chunks
        "chunk_overlap": 200,
        "embedding_model": "text-embedding-3-large",  # better than ada-002
        "retrieval_mode": "semantic_search",
        "similarity_threshold": 0.5,     # default is 0.7, too high
        "top_k": 5,                      # retrieve more documents
    },
)
```

Or via docker-compose:

```yaml
api:
  environment:
    KNOWLEDGE_CHUNKING_METHOD: semantic
    KNOWLEDGE_DEFAULT_CHUNK_SIZE: 1000
    KNOWLEDGE_SIMILARITY_THRESHOLD: 0.5
```
The `similarity_threshold` default of 0.7 is too strict. Lowering it to 0.5–0.6 dramatically improves retrieval.
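To build intuition for the threshold, here is a toy example with made-up three-dimensional vectors (real embeddings have hundreds of dimensions, but the arithmetic is identical): a moderately related chunk scores around 0.62, so a 0.7 cutoff silently drops it while 0.5 keeps it.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product over the product of magnitudes
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

query = [1.0, 0.2, 0.0]
chunks = {
    "exact match": [1.0, 0.2, 0.0],  # similarity 1.0
    "related":     [0.5, 0.6, 0.6],  # similarity ~0.62
    "unrelated":   [0.0, 0.1, 1.0],  # similarity ~0.02
}

for threshold in (0.7, 0.5):
    hits = [name for name, vec in chunks.items() if cosine(query, vec) >= threshold]
    print(threshold, hits)
# 0.7 ['exact match']
# 0.5 ['exact match', 'related']
```

The "No relevant documents found" symptom usually means every chunk landed in the gap between the threshold and the best real similarity score.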
Problem: Production app with ~1000 users getting HTTP 429 "Too Many Requests" errors during peak hours.
What I Tried: Increased worker count, added Redis caching - still hit rate limits.
Actual Fix: Dify's default rate limits are too aggressive. Need to configure per-endpoint limits:
```yaml
# In docker-compose.yml:
api:
  environment:
    # Rate limiting configuration
    API_RATE_LIMIT_ENABLED: "true"
    API_RATE_LIMIT_PER_IP: "1000"        # requests per hour (default 100!)
    API_RATE_LIMIT_PER_API_KEY: "10000"  # per API key
    API_RATE_LIMIT_PER_USER: "5000"      # per authenticated user
    # Bypass for internal services
    API_RATE_LIMIT_TRUSTED_PROXIES: "10.0.0.0/8,172.16.0.0/12"

worker:
  environment:
    # Worker concurrency
    CELERY_CONCURRENCY: 10  # increase from default 4
    CELERY_PREFETCH_MULTIPLIER: 4
    CELERY_TASK_ACKS_LATE: "true"
```
Also configure Nginx in front for additional rate limiting at the edge, so excess requests never reach Dify at all.
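For edge limiting, nginx's `limit_req` module works well in front of the API. A minimal sketch (the zone name, rate, and upstream address are example values):

```nginx
http {
    # 10 MB shared zone keyed by client IP, allowing 10 requests/second
    limit_req_zone $binary_remote_addr zone=dify_api:10m rate=10r/s;

    server {
        listen 80;

        location /v1/ {
            # Allow short bursts; reject the overflow with 429
            limit_req zone=dify_api burst=20 nodelay;
            limit_req_status 429;
            proxy_pass http://api:5001;
        }
    }
}
```

Because the zone is keyed by `$binary_remote_addr`, a single misbehaving client is throttled without affecting everyone else.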
Problem: After 30-40 knowledge base queries, Weaviate throws "Connection pool exhausted" errors and needs restart.
What I Tried: Increasing Weaviate memory, adding more replicas - delays the issue but doesn't fix it.
Actual Fix: Dify doesn't properly close Weaviate connections after queries. Need connection pooling and timeout configuration:
```yaml
# In docker-compose.yml for Weaviate:
weaviate:
  image: semitechnologies/weaviate:1.23.0
  restart: always
  environment:
    # Query limits
    QUERY_MAXIMUM_RESULTS: 10000
    DEFAULT_LIMIT: 100
    # Access and persistence
    AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: "true"
    PERSISTENCE_DATA_PATH: "/var/lib/weaviate"
    # Modules and clustering
    ENABLE_MODULES: "text2vec-openai"
    CLUSTER_HOSTNAME: "node1"
    # These are the important ones:
    GRPC_MAX_MESSAGE_SIZE: 10485760  # 10MB
    DISABLE_TELEMETRY: "true"
  command:
    - "--host"
    - "0.0.0.0"
    - "--port"
    - "8080"
    - "--scheme"
    - "http"
```

In the Dify API config:

```yaml
api:
  environment:
    # Connection pool for Weaviate
    WEAVIATE_CONNECT_TIMEOUT: 30
    WEAVIATE_READ_TIMEOUT: 60
    WEAVIATE_MAX_RETRIES: 3
    WEAVIATE_POOL_SIZE: 20  # increase from default 5
```
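The failure mode is easy to reproduce in miniature: if connections are acquired but never released, any fixed-size pool runs dry. A stdlib-only illustration of the leak and the fix (not Dify's actual client code):

```python
import queue

class ConnectionPool:
    """Toy fixed-size pool: acquire() fails once the pool is exhausted."""

    def __init__(self, size):
        self._pool = queue.Queue()
        for i in range(size):
            self._pool.put(f"conn-{i}")  # stand-ins for real connections

    def acquire(self, timeout=0.1):
        try:
            return self._pool.get(timeout=timeout)
        except queue.Empty:
            raise RuntimeError("Connection pool exhausted")

    def release(self, conn):
        self._pool.put(conn)

pool = ConnectionPool(size=5)

# Leaking connections (never released) exhausts the pool on the 6th query:
leaked = [pool.acquire() for _ in range(5)]
try:
    pool.acquire()
except RuntimeError as e:
    print(e)  # Connection pool exhausted

# Releasing after each query keeps the same pool healthy indefinitely:
for conn in leaked:
    pool.release(conn)
for _ in range(100):
    conn = pool.acquire()
    pool.release(conn)
print("100 queries completed with a pool of 5")
```

That is why raising Weaviate's memory only delays the error: the pool still fills with never-released connections, just more slowly.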
Problem: Uploading PDFs larger than 50MB to knowledge base returns HTTP 413 "Payload Too Large".
What I Tried: Changing nginx client_max_body_size and splitting documents; the nginx change took effect, but Dify still rejected the upload.
Actual Fix: Multiple layers need updating - nginx, Dify API, and the file parser:
1. Nginx configuration (docker-compose.yml):

```yaml
nginx:
  environment:
    - CLIENT_MAX_BODY_SIZE=200M
```

And in nginx.conf:

```nginx
http {
    client_max_body_size 200M;
    client_body_timeout 300s;
}
```

2. Dify API configuration:

```yaml
api:
  environment:
    # File upload limits
    UPLOAD_FILE_SIZE_LIMIT: 209715200  # 200MB in bytes (default 50MB)
    UPLOAD_FILE_BATCH_LIMIT: 20        # max files per upload
    MULTIMODAL_UPLOAD_SIZE_LIMIT: 209715200
```

3. Parser timeout (large files take longer):

```yaml
worker:
  environment:
    TASK_TIMEOUT: 300  # 5 minutes (default 120s)
```
Note: Large PDFs (>100MB) can cause OOM errors during parsing. Consider pre-splitting them outside Dify.
Multi-tenant Setup
Running multiple Dify instances on the same server with different domains:
```yaml
# Multi-tenant docker-compose.override.yml
version: "3.8"

services:
  # Shared PostgreSQL with multiple databases
  postgres:
    environment:
      POSTGRES_MULTIPLE_DATABASES: dify_tenant1,dify_tenant2,dify_tenant3

  # Tenant 1 API
  api_tenant1:
    image: langgenius/dify-api:0.6.9
    restart: always
    environment:
      DB_DATABASE: dify_tenant1
      SECRET_KEY: ${TENANT1_SECRET_KEY}
      CONSOLE_WEB_URL: https://tenant1.yourdomain.com
      APP_WEB_URL: https://tenant1.yourdomain.com
    volumes:
      - ./tenant1/storage:/app/storage

  # Tenant 2 API
  api_tenant2:
    image: langgenius/dify-api:0.6.9
    restart: always
    environment:
      DB_DATABASE: dify_tenant2
      SECRET_KEY: ${TENANT2_SECRET_KEY}
      CONSOLE_WEB_URL: https://tenant2.yourdomain.com
      APP_WEB_URL: https://tenant2.yourdomain.com
    volumes:
      - ./tenant2/storage:/app/storage

  # Nginx with multiple server blocks
  nginx:
    volumes:
      - ./nginx/multitenant.conf:/etc/nginx/nginx.conf:ro
```
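One caveat: the stock postgres image does not act on POSTGRES_MULTIPLE_DATABASES by itself; that variable is conventionally paired with a custom init script mounted into /docker-entrypoint-initdb.d, which the image runs once on first initialization. A minimal sketch of such a script (the filename and ownership grant are illustrative):

```bash
#!/bin/bash
# /docker-entrypoint-initdb.d/create-databases.sh
# Creates one database per comma-separated name on first startup.
set -e

IFS=',' read -ra DBS <<< "$POSTGRES_MULTIPLE_DATABASES"
for db in "${DBS[@]}"; do
  echo "Creating database: $db"
  psql -v ON_ERROR_STOP=1 -U "$POSTGRES_USER" \
    -c "CREATE DATABASE \"$db\" OWNER \"$POSTGRES_USER\";"
done
```

Mount it alongside the data volume (e.g. `./postgres/init:/docker-entrypoint-initdb.d:ro`); it only runs when the data directory is empty, so existing deployments are unaffected.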
Performance Optimization
Caching Strategy
```yaml
# Enable aggressive caching
api:
  environment:
    # Knowledge base caching
    KNOWLEDGE_SEARCH_CACHE_ENABLED: "true"
    KNOWLEDGE_SEARCH_CACHE_TTL: 3600  # 1 hour
    # API response caching
    API_CACHE_ENABLED: "true"
    API_CACHE_TTL: 300  # 5 minutes
    # Vector database query cache
    VECTOR_DB_CACHE_ENABLED: "true"
    VECTOR_DB_CACHE_SIZE: 1000
```
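Conceptually, the search cache is a TTL-keyed lookup: identical queries inside the TTL window skip the vector store entirely. A stdlib-only sketch of the idea (illustrative, not Dify's implementation):

```python
import time

class TTLCache:
    """Tiny time-to-live cache: entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired -- evict and miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

cache = TTLCache(ttl_seconds=0.05)
cache.set("query: refund policy", ["chunk-12", "chunk-40"])
print(cache.get("query: refund policy"))  # ['chunk-12', 'chunk-40']
time.sleep(0.06)
print(cache.get("query: refund policy"))  # None (expired)
```

The TTL trade-off is the same as in the config above: a longer TTL saves more vector-store round trips but serves staler results after knowledge-base updates.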
Monitoring Setup
```yaml
# Add monitoring stack
services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    environment:
      GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_PASSWORD}
```

In prometheus.yml:

```yaml
scrape_configs:
  - job_name: "dify"
    static_configs:
      - targets: ["api:5001"]  # Dify metrics endpoint
```
Backup Strategy
```bash
#!/bin/bash
# backup-dify.sh

BACKUP_DIR="/backups/dify"
DATE=$(date +%Y%m%d_%H%M%S)

# Backup PostgreSQL
docker exec dify-postgres pg_dump -U dify dify > "$BACKUP_DIR/db_$DATE.sql"

# Backup Weaviate data
docker exec dify-weaviate tar czf - /var/lib/weaviate > "$BACKUP_DIR/weaviate_$DATE.tar.gz"

# Backup uploaded files
tar czf "$BACKUP_DIR/storage_$DATE.tar.gz" ./dify/storage

# Keep only the last 7 days (-type f so the directory itself is never deleted)
find "$BACKUP_DIR" -type f -mtime +7 -delete

# Upload to S3 (optional)
aws s3 sync "$BACKUP_DIR" s3://your-bucket/dify-backups/
```
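To run this nightly, a crontab entry is enough (the script path and log location are examples):

```cron
# Run the Dify backup at 02:00 every day, logging output
0 2 * * * /opt/dify/backup-dify.sh >> /var/log/dify-backup.log 2>&1
```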
Recommended Reading
- Introduction and basic setup
- Alternative to Dify
- Advanced RAG implementation
- ChromaDB vs Milvus