Wan-2.1: 4K Video Generation That Actually Works
I've been running video generation experiments for 6 months. Stable Diffusion video was okay, but Wan-2.1 changed everything - actual 4K cinematic output that doesn't look like AI. Here's what I learned getting it running on a single 24GB GPU.
Problem 1: Out of Memory at 4K
RTX 4090 with 24GB VRAM. The model loads fine, but generation crashes at frame 3. Reducing batch size to 1 didn't help - still OOM. Watching nvidia-smi, memory usage spikes to 26GB right before the crash:
Error: CUDA out of memory. Tried to allocate 2.5GB
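Some back-of-the-envelope arithmetic shows why full-frame 4K blows past 24GB. The numbers below are illustrative - channel counts and frames-in-flight are my assumptions, not Wan-2.1 internals - but the scale is the point:

```python
# Rough VRAM estimate for full-frame 4K activations (illustrative numbers,
# not Wan-2.1 internals): one fp16 tensor at 4K resolution with a few
# latent-like channels, for a handful of frames held in memory at once.
def activation_gib(width, height, channels, frames, bytes_per_elem=2):
    """Size of one activation tensor in GiB (fp16 = 2 bytes/element)."""
    return width * height * channels * frames * bytes_per_elem / 1024**3

# A single 4-channel, 4-frame fp16 tensor at full 3840x2160:
per_tensor = activation_gib(3840, 2160, 4, 4)
print(f"{per_tensor:.2f} GiB per tensor")  # ~0.25 GiB
# Dozens of such tensors coexist across attention/conv layers during a
# forward pass, so activations alone can push past 24GB before weights
# are even counted - hence the failed 2.5GB allocation.
```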
What I Tried
Attempt 1: Reduced resolution from 4K to 1080p - worked but defeated the purpose.
Attempt 2: Enabled gradient checkpointing - slower, still OOM at 4K.
Attempt 3: Used CPU offloading - took 45 minutes per frame.
Actual Fix
Wan-2.1 uses a tiled approach for high-resolution generation. The key is enabling tiling AND generating frames sequentially instead of in batches. Also, the model weights need to be loaded with 8-bit quantization, not 16-bit (compute can stay in fp16).
```python
# Working 4K generation config
import torch
from wan import WanModel

# Load model with 8-bit quantized weights (compute stays fp16)
model = WanModel.from_pretrained(
    "Wan-Video/Wan2.1-4K",
    torch_dtype=torch.float16,
    variant="8bit",  # Critical for 24GB GPUs
).cuda()

# Enable tiling for 4K
generation_config = {
    "width": 3840,
    "height": 2160,
    "num_frames": 120,     # Total clip length
    "tiling": {
        "enabled": True,
        "tile_size": 512,  # Process in 512x512 tiles
        "overlap": 64,     # Overlap to prevent seams
    },
    "batch_size": 1,       # Generate frames sequentially
    "compile": False,      # torch.compile increases memory
}

# Generate in 4-frame chunks with memory cleanup between them
all_frames = []
for i in range(0, 120, 4):
    frames = model.generate(
        prompt="cinematic shot",
        **generation_config,
        frame_start=i,
        frame_end=i + 4,
    )
    all_frames.extend(frames)      # Keep the chunk, not just the last one
    torch.cuda.empty_cache()       # Clear cache between chunks
```
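To see why tiling helps, here's the tile-grid arithmetic implied by the config. The stride/overlap scheme is my assumption about how tiled generation typically covers a frame, not Wan-2.1's exact implementation:

```python
import math

# How many overlapping 512px tiles cover a 4K frame. Tiles start every
# (tile_size - overlap) pixels; the last tile in each direction is allowed
# to run past the frame edge. This covering scheme is an assumption about
# how tiled decoders generally work, not Wan-2.1's exact code.
def tile_grid(width, height, tile_size=512, overlap=64):
    """Return (cols, rows): the number of tiles along each axis."""
    stride = tile_size - overlap  # 448px between tile origins
    cols = math.ceil((width - tile_size) / stride) + 1
    rows = math.ceil((height - tile_size) / stride) + 1
    return cols, rows

cols, rows = tile_grid(3840, 2160)
print(cols, rows, cols * rows)  # 9 5 45
# Only one ~512x512 tile's activations are resident at a time, which is
# why peak VRAM drops from full-frame levels to something a 24GB card holds.
```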
Problem 2: Temporal Flickering
Generated videos look good frame-by-frame, but when played, there's noticeable flickering. Objects appear/disappear between frames, and lighting changes unnaturally. Makes the output unusable for professional work.
What I Tried
Attempt 1: Increased CFG scale - made flickering worse.
Attempt 2: Added negative prompts for "flickering" - no effect.
Attempt 3: Reduced number of inference steps - smoother but blurry.
Actual Fix
The issue was temporal attention not being properly configured. Wan-2.1 has a specific temporal smoothing parameter that needs tuning. Also, using the "consistency" decoder variant instead of the default one makes a huge difference.
```python
# Temporal consistency configuration
generation_config = {
    # Basic settings
    "num_inference_steps": 50,
    "cfg_scale": 7.5,
    # Temporal smoothing - this is the key
    "temporal": {
        "smoothing": 0.85,    # 0-1, higher = smoother but less detail
        "window_size": 8,     # Frames to consider for consistency
        "attention": "full",  # Use full temporal attention
    },
    # Use consistency decoder
    "decoder_type": "consistency",  # Instead of "standard"
    # Frame blending for seamlessness
    "frame_blending": {
        "enabled": True,
        "alpha": 0.3,  # Blend factor
    },
}

# Alternative: use the temporal consistency scheduler
from wan.schedulers import ConsistencyScheduler

scheduler = ConsistencyScheduler(
    num_steps=50,
    temporal_smoothing=0.85,
    consistency_weight=0.7,
)
model.set_scheduler(scheduler)
```
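To make the frame_blending knob concrete, here's what an alpha blend does to a single pixel across consecutive frames. This is a sketch of linear blending in general - the real decoder presumably blends latents, and the exact formula inside Wan-2.1 is an assumption here:

```python
# Linear frame blending: a fraction `alpha` of the previous frame bleeds
# into the current one, damping frame-to-frame jumps (and flicker) at the
# cost of a little motion lag. Frames are modeled as flat pixel lists.
def blend_frames(prev_frame, cur_frame, alpha=0.3):
    """Blend two frames pixel-wise: alpha * previous + (1 - alpha) * current."""
    return [alpha * p + (1 - alpha) * c for p, c in zip(prev_frame, cur_frame)]

# A pixel jumping 100 -> 200 between frames only moves to 170 on screen,
# which is exactly the flicker-damping effect.
print(blend_frames([100.0], [200.0]))  # [170.0]
```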
What I Learned
- Lesson 1: 4K generation requires 8-bit quantization on 24GB GPUs - 16-bit will OOM.
- Lesson 2: Temporal consistency needs explicit tuning - default settings flicker.
- Lesson 3: Process in small batches (4 frames) with aggressive cache clearing.
- Overall: Wan-2.1 produces cinematic quality, but needs serious hardware tuning. For production, consider multiple GPU setups or cloud rendering.
Production Setup
```shell
# Install Wan-2.1 with dependencies
git clone https://github.com/Wan-Video/Wan-Video.git
cd Wan-Video

# Create conda environment
conda create -n wan python=3.10
conda activate wan

# Install PyTorch with CUDA support
pip install torch==2.1.0 torchvision==0.16.0 --index-url https://download.pytorch.org/whl/cu121

# Install Wan dependencies
pip install -r requirements.txt

# Download 4K model (optional, can use smaller models)
python scripts/download_model.py --variant 4K

# Verify installation
python -c "from wan import WanModel; print('OK')"
```
Monitoring & Debugging
Red Flags to Watch For
- VRAM usage >22GB - reduce tile size or batch size
- Flickering in output - increase temporal smoothing
- Generation taking >30min for 10s video - check for CPU bottlenecks
- Seams between tiles - increase tile overlap
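The checklist above is easy to automate between runs. This is a hypothetical helper - the function name and arguments are mine, and the thresholds are just the ones from this post - useful as a sanity check in a generation loop:

```python
# Watchdog over the red-flag thresholds above. All names and thresholds
# come from this post's checklist; feed it whatever metrics you collect
# (e.g. VRAM from nvidia-smi, wall-clock time per clip).
def check_red_flags(vram_gb, minutes_for_10s, tile_seams=False, flicker=False):
    """Return a list of warning strings, one per triggered red flag."""
    warnings = []
    if vram_gb > 22:
        warnings.append("VRAM >22GB: reduce tile size or batch size")
    if minutes_for_10s > 30:
        warnings.append(">30min for a 10s video: check for CPU bottlenecks")
    if flicker:
        warnings.append("flickering: increase temporal smoothing")
    if tile_seams:
        warnings.append("seams between tiles: increase tile overlap")
    return warnings

print(check_red_flags(23.1, 12))  # flags only the VRAM line
```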