Qwen3

Alibaba Qwen3 – six new dense models & two MoE models

Qwen3 introduces a new generation of open-weight large language models (LLMs) featuring hybrid reasoning modes and multilingual capabilities. The release includes two MoE models (Qwen3-235B-A22B and Qwen3-30B-A3B) and six dense models (0.6B to 32B parameters), all released under the Apache 2.0 license. These models achieve competitive performance against top-tier proprietary models such as GPT-4o and Gemini-2.5-Pro.

Key Features

Hybrid Thinking Modes:

  • Thinking Mode: Enables step-by-step reasoning for complex tasks
  • Non-Thinking Mode: Delivers instant responses for simpler queries
  • Dynamic Control: Users adjust reasoning depth via /think and /no_think tags in the prompt (see the sketch after this list)
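
A minimal sketch of both switches using Hugging Face transformers, assuming a small Qwen3 checkpoint (the model name and prompt are illustrative): `enable_thinking` is the hard switch in the chat template, while /think and /no_think act as soft switches inside a message.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-0.6B"  # illustrative; any Qwen3 checkpoint should work
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

# Hard switch: enable_thinking toggles step-by-step reasoning for the whole chat.
# Soft switch: a trailing /no_think (or /think) overrides it for that turn.
messages = [{"role": "user", "content": "What is 17 * 24? /no_think"}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```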

Multilingual Support

Supports 119 languages/dialects across 10+ language families

Enhanced Capabilities

  • Agentic Functions: Improved tool calling and environmental interaction
  • Code & STEM: Outperforms previous Qwen models in coding/math tasks
  • Context Handling: Up to 128K token context length

Model Specifications

Model Type | Examples        | Parameters        | Context Length
-----------|-----------------|-------------------|---------------
Dense      | Qwen3-32B       | 32B               | 128K
MoE        | Qwen3-235B-A22B | 235B (22B active) | 128K

Training Innovations

  • Data Scaling: 36T tokens (2x Qwen2.5) including synthetic STEM/code data
  • 3-Stage Pretraining:
    • Stage 1 – Foundation: 30T tokens at 4K context
    • Stage 2 – Knowledge-Intensive: 5T tokens with a higher share of STEM, coding, and reasoning data
    • Stage 3 – Long-Context: high-quality long documents extend the context window to 32K

Usage Options

  • APIs: OpenAI-compatible endpoints served by SGLang or vLLM (see the client sketch after this list)
  • Local Tools: Ollama, LMStudio, llama.cpp
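
A minimal client sketch, assuming an OpenAI-compatible server is already running locally (the base URL, port, and model name below are assumptions, e.g. from `vllm serve Qwen/Qwen3-30B-A3B`):

```python
from openai import OpenAI

# Assumes a local vLLM or SGLang server exposing an OpenAI-compatible API.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-30B-A3B",  # must match the model the server was launched with
    messages=[{"role": "user", "content": "Explain MoE routing in two sentences. /no_think"}],
)
print(response.choices[0].message.content)
```

The same client works against Ollama or LM Studio by changing the base URL, since both also expose OpenAI-compatible endpoints.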

Agentic Implementation

The Qwen-Agent framework enables (a minimal sketch follows this list):

  • Dynamic tool calling
  • Code interpretation
  • Multi-modal integration
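
A minimal sketch using Qwen-Agent's Assistant with its built-in code_interpreter tool; the endpoint and model name are assumptions matching the local-server setup above:

```python
from qwen_agent.agents import Assistant

# Assumed local OpenAI-compatible endpoint serving a Qwen3 model.
llm_cfg = {
    "model": "Qwen3-30B-A3B",
    "model_server": "http://localhost:8000/v1",
    "api_key": "EMPTY",
}

# code_interpreter is one of Qwen-Agent's built-in tools; the agent decides
# when to invoke it while answering.
bot = Assistant(llm=llm_cfg, function_list=["code_interpreter"])

messages = [{"role": "user", "content": "Compute the first 10 Fibonacci numbers with Python."}]
responses = []
for responses in bot.run(messages=messages):  # yields the growing response list
    pass
print(responses[-1])  # final assistant message, after any tool calls
```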

Performance

  • Benchmarks: Competes with DeepSeek-R1, Grok-3, and Claude-3.5
  • Efficiency: MoE models match the performance of dense Qwen2.5 models while activating only about 10% of their parameters
  • Specialized Strengths:
    • STEM problem-solving
    • Long-context analysis
    • Low-resource language support

Conclusion

Qwen3 represents a significant leap in open-source LLM development, combining hybrid reasoning modes, unprecedented multilingual support, and efficient MoE architectures. Its flexible deployment options and strong performance across coding, math, and general tasks make it a versatile tool for global AI development. The project’s commitment to open-weight models empowers researchers and developers to build innovative solutions across languages and domains.

Key Takeaways

  • Hybrid Reasoning: Switch between deep analysis and instant responses
  • Multilingual Mastery: 119 languages with specialized low-resource support
  • Efficient MoE: 235B model activates only 22B parameters per query
  • Open Ecosystem: Apache 2.0 license with full model weights
  • Agent Ready: Built-in tool calling and code interpretation
  • Scalable Training: 36T token dataset with synthetic data augmentation

Links

Announcement: https://qwenlm.github.io/blog/qwen3/

GitHub: https://github.com/QwenLM/Qwen3

Hugging Face: https://huggingface.co/collections/Qwen/qwen3-67dd247413f0e2e4f653967f
