Wan-2.1: 4K Video Generation That Actually Works
I've been running video generation experiments for 6 months. Stable Diffusion video was okay, but Wan-2.1 changed everything - actual 4K cinematic output that doesn't look like AI. Here's what I learned getting it running on a single 24GB GPU.
Problem 1: Out of Memory at 4K
RTX 4090 with 24GB VRAM. The model loads fine, but generation crashes at frame 3. Reducing batch size to 1 didn't help - still OOM. Watching nvidia-smi, memory usage spikes to 26GB right before the crash:
Error: CUDA out of memory. Tried to allocate 2.5GB
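Some back-of-the-envelope arithmetic shows why full-frame 4K blows past 24GB. The numbers below are illustrative - channel counts and frames-in-flight are my assumptions, not Wan-2.1 internals - but the scale is the point:

```python
# Rough VRAM estimate for full-frame 4K activations (illustrative numbers,
# not Wan-2.1 internals): one fp16 tensor at 4K resolution with a few
# latent-like channels, for a handful of frames held in memory at once.
def activation_gib(width, height, channels, frames, bytes_per_elem=2):
    """Size of one activation tensor in GiB (fp16 = 2 bytes/element)."""
    return width * height * channels * frames * bytes_per_elem / 1024**3

# A single 4-channel, 4-frame fp16 tensor at full 3840x2160:
per_tensor = activation_gib(3840, 2160, 4, 4)
print(f"{per_tensor:.2f} GiB per tensor")  # ~0.25 GiB
# Dozens of such tensors coexist across attention/conv layers during a
# forward pass, so activations alone can push past 24GB before weights
# are even counted - hence the failed 2.5GB allocation.
```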
What I Tried
Attempt 1: Reduced resolution from 4K to 1080p - worked but defeated the purpose.
Attempt 2: Enabled gradient checkpointing - slower, still OOM at 4K.
Attempt 3: Used CPU offloading - took 45 minutes per frame.
Actual Fix
Wan-2.1 uses a tiled approach for high-resolution generation. The key is enabling tiling AND generating frames sequentially instead of in batches. Also, the model weights need to be loaded with 8-bit quantization, not 16-bit (compute can stay in fp16).
```python
# Working 4K generation config
import torch
from wan import WanModel

# Load model with 8-bit quantized weights (compute stays fp16)
model = WanModel.from_pretrained(
    "Wan-Video/Wan2.1-4K",
    torch_dtype=torch.float16,
    variant="8bit",  # Critical for 24GB GPUs
).cuda()

# Enable tiling for 4K
generation_config = {
    "width": 3840,
    "height": 2160,
    "num_frames": 120,     # Total clip length
    "tiling": {
        "enabled": True,
        "tile_size": 512,  # Process in 512x512 tiles
        "overlap": 64,     # Overlap to prevent seams
    },
    "batch_size": 1,       # Generate frames sequentially
    "compile": False,      # torch.compile increases memory
}

# Generate in 4-frame chunks with memory cleanup between them
all_frames = []
for i in range(0, 120, 4):
    frames = model.generate(
        prompt="cinematic shot",
        **generation_config,
        frame_start=i,
        frame_end=i + 4,
    )
    all_frames.extend(frames)      # Keep the chunk, not just the last one
    torch.cuda.empty_cache()       # Clear cache between chunks
```
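To see why tiling helps, here's the tile-grid arithmetic implied by the config. The stride/overlap scheme is my assumption about how tiled generation typically covers a frame, not Wan-2.1's exact implementation:

```python
import math

# How many overlapping 512px tiles cover a 4K frame. Tiles start every
# (tile_size - overlap) pixels; the last tile in each direction is allowed
# to run past the frame edge. This covering scheme is an assumption about
# how tiled decoders generally work, not Wan-2.1's exact code.
def tile_grid(width, height, tile_size=512, overlap=64):
    """Return (cols, rows): the number of tiles along each axis."""
    stride = tile_size - overlap  # 448px between tile origins
    cols = math.ceil((width - tile_size) / stride) + 1
    rows = math.ceil((height - tile_size) / stride) + 1
    return cols, rows

cols, rows = tile_grid(3840, 2160)
print(cols, rows, cols * rows)  # 9 5 45
# Only one ~512x512 tile's activations are resident at a time, which is
# why peak VRAM drops from full-frame levels to something a 24GB card holds.
```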
Problem 2: Temporal Flickering
Generated videos look good frame-by-frame, but when played, there's noticeable flickering. Objects appear/disappear between frames, and lighting changes unnaturally. Makes the output unusable for professional work.
What I Tried
Attempt 1: Increased CFG scale - made flickering worse.
Attempt 2: Added negative prompts for "flickering" - no effect.
Attempt 3: Reduced number of inference steps - smoother but blurry.
Actual Fix
The issue was temporal attention not being properly configured. Wan-2.1 has a specific temporal smoothing parameter that needs tuning. Also, using the "consistency" decoder variant instead of the default one makes a huge difference.
```python
# Temporal consistency configuration
generation_config = {
    # Basic settings
    "num_inference_steps": 50,
    "cfg_scale": 7.5,
    # Temporal smoothing - this is the key
    "temporal": {
        "smoothing": 0.85,    # 0-1, higher = smoother but less detail
        "window_size": 8,     # Frames to consider for consistency
        "attention": "full",  # Use full temporal attention
    },
    # Use consistency decoder
    "decoder_type": "consistency",  # Instead of "standard"
    # Frame blending for seamlessness
    "frame_blending": {
        "enabled": True,
        "alpha": 0.3,  # Blend factor
    },
}

# Alternative: use the temporal consistency scheduler
from wan.schedulers import ConsistencyScheduler

scheduler = ConsistencyScheduler(
    num_steps=50,
    temporal_smoothing=0.85,
    consistency_weight=0.7,
)
model.set_scheduler(scheduler)
```
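To make the frame_blending knob concrete, here's what an alpha blend does to a single pixel across consecutive frames. This is a sketch of linear blending in general - the real decoder presumably blends latents, and the exact formula inside Wan-2.1 is an assumption here:

```python
# Linear frame blending: a fraction `alpha` of the previous frame bleeds
# into the current one, damping frame-to-frame jumps (and flicker) at the
# cost of a little motion lag. Frames are modeled as flat pixel lists.
def blend_frames(prev_frame, cur_frame, alpha=0.3):
    """Blend two frames pixel-wise: alpha * previous + (1 - alpha) * current."""
    return [alpha * p + (1 - alpha) * c for p, c in zip(prev_frame, cur_frame)]

# A pixel jumping 100 -> 200 between frames only moves to 170 on screen,
# which is exactly the flicker-damping effect.
print(blend_frames([100.0], [200.0]))  # [170.0]
```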
What I Learned
- Lesson 1: 4K generation requires 8-bit quantization on 24GB GPUs - 16-bit will OOM.
- Lesson 2: Temporal consistency needs explicit tuning - default settings flicker.
- Lesson 3: Process in small batches (4 frames) with aggressive cache clearing.
- Overall: Wan-2.1 produces cinematic quality, but needs serious hardware tuning. For production, consider multiple GPU setups or cloud rendering.
Production Setup
```shell
# Install Wan-2.1 with dependencies
git clone https://github.com/Wan-Video/Wan-Video.git
cd Wan-Video

# Create conda environment
conda create -n wan python=3.10
conda activate wan

# Install PyTorch with CUDA support
pip install torch==2.1.0 torchvision==0.16.0 --index-url https://download.pytorch.org/whl/cu121

# Install Wan dependencies
pip install -r requirements.txt

# Download 4K model (optional, can use smaller models)
python scripts/download_model.py --variant 4K

# Verify installation
python -c "from wan import WanModel; print('OK')"
```
Monitoring & Debugging
Red Flags to Watch For
- VRAM usage >22GB - reduce tile size or batch size
- Flickering in output - increase temporal smoothing
- Generation taking >30min for 10s video - check for CPU bottlenecks
- Seams between tiles - increase tile overlap
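The checklist above is easy to automate between runs. This is a hypothetical helper - the function name and arguments are mine, and the thresholds are just the ones from this post - useful as a sanity check in a generation loop:

```python
# Watchdog over the red-flag thresholds above. All names and thresholds
# come from this post's checklist; feed it whatever metrics you collect
# (e.g. VRAM from nvidia-smi, wall-clock time per clip).
def check_red_flags(vram_gb, minutes_for_10s, tile_seams=False, flicker=False):
    """Return a list of warning strings, one per triggered red flag."""
    warnings = []
    if vram_gb > 22:
        warnings.append("VRAM >22GB: reduce tile size or batch size")
    if minutes_for_10s > 30:
        warnings.append(">30min for a 10s video: check for CPU bottlenecks")
    if flicker:
        warnings.append("flickering: increase temporal smoothing")
    if tile_seams:
        warnings.append("seams between tiles: increase tile overlap")
    return warnings

print(check_red_flags(23.1, 12))  # flags only the VRAM line
```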