Alibaba Unveils Wan2.2: World’s First Open-Source MoE Video Generation Model to Deliver Cinematic-Grade AI Videos

Alibaba has launched Wan2.2, the world’s first family of open-source large-scale video generation models built on a Mixture-of-Experts (MoE) architecture, marking a major milestone in AI-driven content creation.

The Wan2.2 release includes a text-to-video model (Wan2.2-T2V-A14B), an image-to-video model (Wan2.2-I2V-A14B), and a hybrid model (Wan2.2-TI2V-5B) that supports both generation modes within a single framework. All three are now open-sourced via Hugging Face, GitHub, and Alibaba Cloud’s ModelScope.

Positioned as a leap forward for developers and creators, Wan2.2 enables cinematic-quality video production with a single prompt, offering granular control over lighting, tone, camera dynamics, and even complex motions like facial expressions and sports sequences.

Built for cinematic control—and computational efficiency

Trained on highly curated aesthetic datasets, Wan2.2’s flagship MoE models are capable of producing visually rich, cinema-style video content while addressing one of the biggest bottlenecks in video AI: computational load.

Each MoE model contains 27 billion parameters, but only 14 billion are activated per generation step, thanks to a dual-expert denoising architecture. One expert handles the high-noise stages of denoising, laying out overall scene structure, while the second sharpens detail and texture at lower noise levels. This approach reduces compute demand by up to 50%, making the models significantly more accessible to users with limited hardware.
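The hand-off between the two experts can be sketched in a few lines. This is a minimal illustration of the idea, assuming a simple noise-level threshold; the expert functions, the switch point, and the schedule below are illustrative stand-ins, not the actual Wan2.2 implementation.

```python
# Minimal sketch of dual-expert denoising: one expert per step,
# selected by the current noise level. All names and values here
# are assumptions for illustration, not Wan2.2's real API.

SIGMA_SWITCH = 0.5  # assumed noise level at which control hands over

def high_noise_expert(latent, sigma):
    # Stand-in for the expert that lays out global scene structure.
    return [v * (1.0 - 0.1 * sigma) for v in latent]

def low_noise_expert(latent, sigma):
    # Stand-in for the expert that refines detail and texture.
    return [v - 0.01 * sigma for v in latent]

def select_expert(sigma):
    # Only one expert runs per step, which is why roughly 14B of the
    # 27B total parameters are active at any given time.
    return high_noise_expert if sigma >= SIGMA_SWITCH else low_noise_expert

latent = [0.8, -0.3, 0.5]
for sigma in (0.9, 0.7, 0.4, 0.1):  # decreasing noise schedule
    latent = select_expert(sigma)(latent, sigma)
```

Because expert selection is a per-step routing decision rather than a per-token one, the memory and compute profile of a single step matches a 14B dense model, not a 27B one.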

Further, a cinematic prompt system allows users to tailor generation outputs by categorising aesthetic elements such as lighting direction, colour tone, and scene composition—translating prompts into highly expressive visual storytelling.
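In practice, categorised prompting of this kind amounts to composing a text prompt from labelled aesthetic fields. The sketch below shows the general pattern; the category names and values are hypothetical examples, not Wan2.2's actual prompt vocabulary.

```python
# Hypothetical sketch of composing a "cinematic" prompt from aesthetic
# categories. Field names (lighting, colour_tone, camera) are
# illustrative, not Wan2.2's documented prompt schema.

def build_prompt(subject, **aesthetics):
    tags = ", ".join(f"{k.replace('_', ' ')}: {v}" for k, v in aesthetics.items())
    return f"{subject}. {tags}" if tags else subject

prompt = build_prompt(
    "A lone sailboat crossing a storm-lit bay",
    lighting="low-key side lighting",
    colour_tone="teal and amber",
    camera="slow dolly-in",
)
```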

Greater diversity, realism, and artistic range

Compared to its predecessor, Wan2.1, the new models benefit from 83.2% more video data and 65.6% more image data, resulting in stronger generalisation across complex environments and enhanced artistic expressiveness.

The models can now produce videos with more realistic motion, including accurate hand gestures, facial expressions, and adherence to physical laws, making them suitable for high-fidelity animation, digital filmmaking, and advanced simulations.

A compact model for scalable video generation

For developers prioritising speed and scalability, the Wan2.2-TI2V-5B hybrid model uses a high-compression 3D VAE architecture. It achieves an overall information compression rate of 64 (4×16×16) and can generate 5-second 720p videos in just minutes on consumer-grade GPUs—dramatically lowering entry barriers for real-time content creation.
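The 4×16×16 figure describes per-axis downsampling: time by 4, height and width by 16 each. A quick sketch of the resulting latent shape, assuming a 24 fps clip (the frame rate and exact shapes are illustrative assumptions, not the Wan2.2-VAE specification):

```python
# Illustrative arithmetic for the 4x16x16 (T x H x W) compression the
# article cites. Factors per axis are from the article; the input
# frame rate is an assumption for the example.

T_FACTOR, H_FACTOR, W_FACTOR = 4, 16, 16

def latent_shape(frames, height, width):
    # Each axis shrinks by its compression factor.
    return (frames // T_FACTOR, height // H_FACTOR, width // W_FACTOR)

# A 5-second 720p clip at an assumed 24 fps:
frames, h, w = 120, 720, 1280
print(latent_shape(frames, h, w))  # -> (30, 45, 80)
```

Operating in this much smaller latent space is what makes generation feasible on consumer-grade GPUs.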

Continuing momentum in open-source AI

Wan2.2 builds on Alibaba’s momentum in open-source AI, following the release of four Wan2.1 models in February and the Wan2.1-VACE (Video All-in-one Creation and Editing) model in May 2025. Together, the models have already clocked over 5.4 million downloads across Hugging Face and ModelScope, signalling robust developer interest in open video generation tools.
