Alibaba has launched Wan2.2, the world's first suite of open-source large-scale video generation models built on a Mixture-of-Experts (MoE) architecture, marking a major milestone in AI-driven content creation.
The Wan2.2 release includes a text-to-video model (Wan2.2-T2V-A14B), an image-to-video model (Wan2.2-I2V-A14B), and a hybrid model (Wan2.2-TI2V-5B) that supports both generation modes within a single framework. All three are now open-sourced via Hugging Face, GitHub, and Alibaba Cloud’s ModelScope.
Positioned as a leap forward for developers and creators, Wan2.2 enables cinematic-quality video production with a single prompt, offering granular control over lighting, tone, camera dynamics, and even complex motions like facial expressions and sports sequences.
Built for cinematic control—and computational efficiency
Trained on highly curated aesthetic datasets, Wan2.2’s flagship MoE models are capable of producing visually rich, cinema-style video content while addressing one of the biggest bottlenecks in video AI: computational load.
Each MoE model contains 27 billion parameters, but only 14 billion are activated per generation step, thanks to a dual-expert denoising architecture: one expert handles the high-noise phase, establishing overall scene structure, while the second sharpens detail and texture at lower noise levels. This approach cuts compute demand by up to 50%, making the models significantly more accessible to users with limited hardware.
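The routing described above can be sketched in a few lines. This is a hypothetical illustration, not Wan2.2's actual code: the function and expert names, and the switch point on the noise schedule, are assumptions made for clarity.

```python
# Hypothetical sketch of Wan2.2-style dual-expert denoising routing.
# At each diffusion step only one 14B expert runs, which is why the 27B
# model activates roughly half of its parameters per generation step.

HIGH_NOISE_BOUNDARY = 0.5  # assumed switch point on the noise schedule


def pick_expert(t: float) -> str:
    """Route a denoising step to the expert suited to its noise level.

    t: normalised diffusion timestep, 1.0 = pure noise, 0.0 = clean video.
    """
    if t >= HIGH_NOISE_BOUNDARY:
        return "high_noise_expert"  # lays out the overall scene structure
    return "low_noise_expert"       # sharpens detail and texture


# Walking the schedule from noisy to clean shows the hand-off between experts:
schedule = [1.0, 0.8, 0.6, 0.4, 0.2, 0.0]
routes = [pick_expert(t) for t in schedule]
# Early steps go to the high-noise expert, later steps to the low-noise one.
```

Because only the selected expert's weights participate in each step, the memory and compute footprint per step is that of a 14B model, not a 27B one.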
Further, a cinematic prompt system allows users to tailor generation outputs by categorising aesthetic elements such as lighting direction, colour tone, and scene composition—translating prompts into highly expressive visual storytelling.
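One way to picture this categorised prompting is as structured aesthetic controls appended to a base description. The sketch below is purely illustrative; the category names and helper function are assumptions, not Wan2.2's actual prompt API.

```python
# Hypothetical sketch of composing a "cinematic" prompt from categorised
# aesthetic controls (lighting, tone, camera). Not Wan2.2's real interface.

def build_prompt(subject: str, **aesthetics: str) -> str:
    """Append labelled aesthetic choices to a subject description."""
    controls = ", ".join(
        f"{key.replace('_', ' ')}: {value}" for key, value in aesthetics.items()
    )
    return f"{subject}. {controls}" if controls else subject


prompt = build_prompt(
    "A sprinter exploding off the starting blocks",
    lighting_direction="low golden-hour backlight",
    colour_tone="warm, high contrast",
    camera_dynamics="slow dolly-in at track level",
)
```

Keeping each aesthetic dimension as a named field, rather than free-form text, is what makes this style of prompting controllable and repeatable.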
Greater diversity, realism, and artistic range

Compared to its predecessor, Wan2.1, the new models benefit from 83.2% more video data and 65.6% more image data, resulting in stronger generalisation across complex environments and enhanced artistic expressiveness.
The models can now produce videos with more realistic motion, including accurate hand gestures, facial expressions, and adherence to physical laws, making them suitable for high-fidelity animation, digital filmmaking, and advanced simulations.
A compact model for scalable video generation
For developers prioritising speed and scalability, the Wan2.2-TI2V-5B hybrid model uses a high-compression 3D VAE architecture. It compresses video by 4×16×16 across the temporal and spatial dimensions, for an overall information compression rate of 64, and can generate a 5-second 720p video in just minutes on a consumer-grade GPU, dramatically lowering the entry barrier for real-time content creation.
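The two compression figures can be reconciled with simple arithmetic. Note the 48-channel latent width below is an assumption used to make the numbers work out; the 4×16×16 ratio and the overall rate of 64 are the figures from the release.

```python
# Back-of-the-envelope check of the TI2V-5B VAE's compression figures.
# The 48-channel latent width is an assumption, not a confirmed spec.

t_ratio, h_ratio, w_ratio = 4, 16, 16
spatiotemporal_reduction = t_ratio * h_ratio * w_ratio  # 1024x fewer positions

rgb_channels = 3
latent_channels = 48  # assumed width of the VAE latent

# Positions shrink 1024x, but each latent position carries 16x more channels
# than an RGB pixel, so the net information compression rate is 64.
compression_rate = spatiotemporal_reduction * rgb_channels / latent_channels
print(compression_rate)  # 64.0
```

In other words, the 4×16×16 figure counts the reduction in spatiotemporal positions, while the "rate of 64" accounts for the wider latent channels at each position.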
Continuing momentum in open-source AI
Wan2.2 builds on Alibaba’s momentum in open-source AI, following the release of four Wan2.1 models in February and the Wan2.1-VACE (Video All-in-one Creation and Editing) model in May 2025. Together, the models have already clocked over 5.4 million downloads across Hugging Face and ModelScope, signalling robust developer interest in open video generation tools.


