Qwen3

Alibaba Qwen3 – six new dense models & two MoE models

Qwen3 introduces a new generation of open-weight large language models (LLMs) featuring hybrid reasoning modes and multilingual capabilities. The release includes two MoE models (Qwen3-235B-A22B and Qwen3-30B-A3B) and six dense models (0.6B to 32B parameters), all released under the Apache 2.0 license. These models achieve competitive performance against top-tier proprietary models such as GPT-4o and Gemini-2.5-Pro.

Key Features

Hybrid Thinking Modes:

  • Thinking Mode: Enables step-by-step reasoning for complex tasks
  • Non-Thinking Mode: Delivers instant responses for simpler queries
  • Dynamic Control: Users adjust reasoning depth via /think and /no_think tags in the prompt (see the sketch after this list)
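
A minimal sketch of both switches using Hugging Face transformers, assuming a small Qwen3 checkpoint (the model name and prompt are illustrative): `enable_thinking` is the hard switch in the chat template, while /think and /no_think act as soft switches inside a message.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-0.6B"  # illustrative; any Qwen3 checkpoint should work
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

# Hard switch: enable_thinking toggles step-by-step reasoning for the whole chat.
# Soft switch: a trailing /no_think (or /think) overrides it for that turn.
messages = [{"role": "user", "content": "What is 17 * 24? /no_think"}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```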

Multilingual Support

Supports 119 languages/dialects across 10+ language families

Enhanced Capabilities

  • Agentic Functions: Improved tool calling and environmental interaction
  • Code & STEM: Outperforms previous Qwen models in coding/math tasks
  • Context Handling: Up to 128K token context length

Model Specifications

Model Type | Examples        | Parameters        | Context Length
-----------|-----------------|-------------------|---------------
Dense      | Qwen3-32B       | 32B               | 128K
MoE        | Qwen3-235B-A22B | 235B (22B active) | 128K

Training Innovations

  • Data Scaling: 36T tokens (2x Qwen2.5) including synthetic STEM/code data
  • 3-Stage Pretraining:
    • Stage 1 – Foundation: 30T tokens at 4K context
    • Stage 2 – Knowledge-Intensive: 5T tokens with a higher share of STEM, coding, and reasoning data
    • Stage 3 – Long-Context: high-quality long documents extend the context window to 32K

Usage Options

  • APIs: OpenAI-compatible endpoints served by SGLang or vLLM (see the client sketch after this list)
  • Local Tools: Ollama, LMStudio, llama.cpp
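
A minimal client sketch, assuming an OpenAI-compatible server is already running locally (the base URL, port, and model name below are assumptions, e.g. from `vllm serve Qwen/Qwen3-30B-A3B`):

```python
from openai import OpenAI

# Assumes a local vLLM or SGLang server exposing an OpenAI-compatible API.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-30B-A3B",  # must match the model the server was launched with
    messages=[{"role": "user", "content": "Explain MoE routing in two sentences. /no_think"}],
)
print(response.choices[0].message.content)
```

The same client works against Ollama or LM Studio by changing the base URL, since both also expose OpenAI-compatible endpoints.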

Agentic Implementation

The Qwen-Agent framework enables (a minimal sketch follows this list):

  • Dynamic tool calling
  • Code interpretation
  • Multi-modal integration
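
A minimal sketch using Qwen-Agent's Assistant with its built-in code_interpreter tool; the endpoint and model name are assumptions matching the local-server setup above:

```python
from qwen_agent.agents import Assistant

# Assumed local OpenAI-compatible endpoint serving a Qwen3 model.
llm_cfg = {
    "model": "Qwen3-30B-A3B",
    "model_server": "http://localhost:8000/v1",
    "api_key": "EMPTY",
}

# code_interpreter is one of Qwen-Agent's built-in tools; the agent decides
# when to invoke it while answering.
bot = Assistant(llm=llm_cfg, function_list=["code_interpreter"])

messages = [{"role": "user", "content": "Compute the first 10 Fibonacci numbers with Python."}]
responses = []
for responses in bot.run(messages=messages):  # yields the growing response list
    pass
print(responses[-1])  # final assistant message, after any tool calls
```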

Performance

  • Benchmarks: Competes with DeepSeek-R1, Grok-3, and Claude-3.5
  • Efficiency: MoE models match the performance of dense Qwen2.5 models while activating only about 10% of their parameters
  • Specialized Strengths:
    • STEM problem-solving
    • Long-context analysis
    • Low-resource language support

Conclusion

Qwen3 represents a significant leap in open-source LLM development, combining hybrid reasoning modes, unprecedented multilingual support, and efficient MoE architectures. Its flexible deployment options and strong performance across coding, math, and general tasks make it a versatile tool for global AI development. The project’s commitment to open-weight models empowers researchers and developers to build innovative solutions across languages and domains.

Key Takeaways

  • Hybrid Reasoning: Switch between deep analysis and instant responses
  • Multilingual Mastery: 119 languages with specialized low-resource support
  • Efficient MoE: 235B model activates only 22B parameters per query
  • Open Ecosystem: Apache 2.0 license with full model weights
  • Agent Ready: Built-in tool calling and code interpretation
  • Scalable Training: 36T token dataset with synthetic data augmentation

Links

Announcement: https://qwenlm.github.io/blog/qwen3/

GitHub: https://github.com/QwenLM/Qwen3

Hugging Face: https://huggingface.co/collections/Qwen/qwen3-67dd247413f0e2e4f653967f
