Wan-2.1: 4K Video Generation That Actually Works

Been running video generation experiments for 6 months. Stable Diffusion video was okay, but Wan-2.1 changed everything - actual 4K cinematic output that doesn't look like AI. Here's what I learned getting it running on a 24GB GPU setup.

Problem

RTX 4090 with 24GB VRAM. Model loads fine, but generation crashes at frame 3. Tried reducing batch size to 1, still OOM. Watching nvidia-smi, memory usage spikes to 26GB then crashes.

Error: CUDA out of memory. Tried to allocate 2.5GB
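
The error makes more sense with some quick arithmetic. Here's a sketch of the per-frame tensor sizes (the 8x VAE downsample and 16 latent channels are my assumptions for illustration, not published Wan-2.1 numbers):

```python
# Back-of-envelope memory math for 4K frames (fp16 = 2 bytes/element).
# The 8x VAE downsample and 16 latent channels are assumptions, not
# confirmed Wan-2.1 architecture numbers.
def tensor_mib(width, height, channels, bytes_per_el=2):
    """Memory of one width*height*channels tensor, in MiB."""
    return width * height * channels * bytes_per_el / 2**20

pixel_frame = tensor_mib(3840, 2160, 3)              # decoded RGB frame
latent_frame = tensor_mib(3840 // 8, 2160 // 8, 16)  # assumed latent frame

print(f"pixel frame:  {pixel_frame:.1f} MiB")   # ~47.5 MiB
print(f"latent frame: {latent_frame:.1f} MiB")  # ~4.0 MiB
```

A single frame is cheap. What spikes past 24GB is holding activations for many frames at once inside the denoising network, which is why the fix below leans on tiling and chunked generation rather than smaller frames.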

What I Tried

Attempt 1: Reduced resolution from 4K to 1080p - worked but defeated the purpose.
Attempt 2: Enabled gradient checkpointing - slower, still OOM at 4K.
Attempt 3: Used CPU offloading - took 45 minutes per frame.

Actual Fix

Wan-2.1 uses a tiled approach for high-resolution generation. The key is enabling tiling AND using sequential frame generation instead of batch processing. Also, the model needs to be loaded in 8-bit quantization, not 16-bit.

# Working 4K generation config
import torch
from wan import WanModel

# Load model with 8-bit quantization
model = WanModel.from_pretrained(
    "Wan-Video/Wan2.1-4K",
    torch_dtype=torch.float16,
    variant="8bit"  # Critical for 24GB GPUs
).cuda()

# Enable tiling for 4K
generation_config = {
    "width": 3840,
    "height": 2160,
    "num_frames": 120,
    "tiling": {
        "enabled": True,
        "tile_size": 512,  # Process in 512x512 tiles
        "overlap": 64,     # Overlap to prevent seams
    },
    "batch_size": 1,  # Generate frames sequentially
    "compile": False,  # torch.compile increases memory
}

# Generate with memory cleanup, collecting frames as we go
all_frames = []
for i in range(0, 120, 4):  # Process 4 frames at a time
    frames = model.generate(
        prompt="cinematic shot",
        **generation_config,
        frame_start=i,
        frame_end=i + 4
    )
    all_frames.extend(frames)  # keep this chunk's frames
    torch.cuda.empty_cache()   # Clear cache between chunks
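
To see what `tile_size: 512` with `overlap: 64` actually means on a 3840x2160 frame, here's a standalone sketch of the tile layout. This is my own illustration of the general tiled-decoding idea, not Wan-2.1's internal code:

```python
def tile_starts(length, tile, overlap):
    """Start offsets for overlapping tiles covering `length` pixels."""
    stride = tile - overlap
    starts = list(range(0, max(length - tile, 0) + 1, stride))
    # Add a final tile flush with the edge if the stride doesn't land there.
    if starts[-1] + tile < length:
        starts.append(length - tile)
    return starts

xs = tile_starts(3840, 512, 64)  # horizontal tile offsets
ys = tile_starts(2160, 512, 64)  # vertical tile offsets
print(len(xs), len(ys), len(xs) * len(ys))  # tiles per row, per column, total
```

That works out to a 9x5 grid, 45 tiles per frame, each sharing a 64-pixel strip with its neighbors so the seams can be blended away. Only one tile's activations live on the GPU at a time, which is the whole trick.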

Problem

Generated videos look good frame-by-frame, but when played, there's noticeable flickering. Objects appear/disappear between frames, and lighting changes unnaturally. Makes the output unusable for professional work.

What I Tried

Attempt 1: Increased CFG scale - made flickering worse.
Attempt 2: Added negative prompts for "flickering" - no effect.
Attempt 3: Reduced number of inference steps - smoother but blurry.

Actual Fix

The issue was temporal attention not being properly configured. Wan-2.1 has a specific temporal smoothing parameter that needs tuning. Also, using the "consistency" decoder variant instead of the default one makes a huge difference.

# Temporal consistency configuration
generation_config = {
    # Basic settings
    "num_inference_steps": 50,
    "cfg_scale": 7.5,

    # Temporal smoothing - this is the key
    "temporal": {
        "smoothing": 0.85,  # 0-1, higher = smoother but less detail
        "window_size": 8,   # Frames to consider for consistency
        "attention": "full",  # Use full temporal attention
    },

    # Use consistency decoder
    "decoder_type": "consistency",  # Instead of "standard"

    # Frame blending for seamlessness
    "frame_blending": {
        "enabled": True,
        "alpha": 0.3,  # Blend factor
    }
}

# Alternative: Use the temporal consistency scheduler
from wan.schedulers import ConsistencyScheduler

scheduler = ConsistencyScheduler(
    num_steps=50,
    temporal_smoothing=0.85,
    consistency_weight=0.7
)

model.set_scheduler(scheduler)
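
The `smoothing` and `frame_blending.alpha` knobs are easiest to understand as a running blend across frames. This is my own toy sketch of exponential frame blending, not the library's implementation:

```python
def blend_frames(frames, alpha=0.3):
    """Blend each frame with the previous output:
    out[t] = alpha * out[t-1] + (1 - alpha) * frames[t].
    Higher alpha = smoother playback, but more ghosting."""
    out = [list(frames[0])]
    for frame in frames[1:]:
        prev = out[-1]
        out.append([alpha * p + (1 - alpha) * c for p, c in zip(prev, frame)])
    return out

# A toy one-pixel "video" that flickers hard between dark and bright:
flicker = [[0.0], [1.0], [0.0], [1.0]]
print([round(f[0], 3) for f in blend_frames(flicker)])
```

With `alpha=0.3` the frame-to-frame swing drops from 1.0 to at most 0.7, which is exactly the trade-off the config exposes: push alpha (or `smoothing`) too high and flicker turns into ghosting and lost detail.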

What I Learned

Tiling alone wasn't enough; 8-bit quantization plus tiling together is what makes 4K fit in 24GB.
Generate frames in small sequential chunks and clear the CUDA cache between them, or memory creeps up until it crashes.
Flicker is a configuration problem, not a prompt problem: tune temporal smoothing and switch to the consistency decoder instead of fighting it with CFG or negative prompts.

Production Setup

# Install Wan-2.1 with dependencies
git clone https://github.com/Wan-Video/Wan-Video.git
cd Wan-Video

# Create conda environment
conda create -n wan python=3.10
conda activate wan

# Install PyTorch with CUDA support
pip install torch==2.1.0 torchvision==0.16.0 --index-url https://download.pytorch.org/whl/cu121

# Install Wan dependencies
pip install -r requirements.txt

# Download 4K model (optional, can use smaller models)
python scripts/download_model.py --variant 4K

# Verify installation
python -c "from wan import WanModel; print('OK')"

Monitoring & Debugging
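
I keep a small helper that parses the output of `nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits` and warns before an OOM. Sketch below; the sample string is a captured example, and the threshold is just my working default:

```python
def parse_gpu_memory(csv_text):
    """Parse 'used, total' MiB pairs from nvidia-smi's CSV output."""
    gpus = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        gpus.append((used, total))
    return gpus

def near_oom(csv_text, threshold=0.9):
    """Per-GPU flag: is memory use above `threshold` of capacity?"""
    return [used / total > threshold for used, total in parse_gpu_memory(csv_text)]

# Live usage would capture the CSV with:
#   nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits
# Here's a sample captured during a near-OOM 4K run:
sample = "22310, 24564"
print(near_oom(sample))  # -> [True]: 22310/24564 is about 0.91
```

Polling this every few seconds during a long run gives enough warning to checkpoint finished frames before the crash instead of losing the whole batch.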

Red Flags to Watch For

nvidia-smi memory climbing toward the card's limit mid-generation: OOM is seconds away; reduce tile size or chunk size.
Per-frame time jumping from seconds to tens of minutes: CPU offloading has silently kicked in.
Frames that look fine individually but flicker on playback: temporal settings, not the prompt, are the problem.

Related Resources