Alibaba Unveils Qwen3 Series, Boosting AI Capabilities with 36 Trillion Training Tokens
Alibaba has released and open-sourced the Qwen3 series of models, which are designed to seamlessly integrate two thinking modes, support 119 languages, and facilitate agent calls. The Qwen3 series includes two mixture-of-experts (MoE) models and six dense models. The flagship model, Qwen3-235B-A22B, has demonstrated competitive performance in code, mathematics, and general capabilities, matching top models like DeepSeek-R1 and OpenAI's o1. The MoE model Qwen3-30B-A3B outperforms QwQ-32B while activating only 10% as many parameters, and even small models like Qwen3-4B can match the performance of Qwen2.5-72B-Instruct. The MoE approach mimics a team of human specialists: a routing network sends each input to a small subset of expert subnetworks, so only a fraction of the model's parameters do work on any given token, improving overall efficiency.
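To make the efficiency claim concrete, the gist of top-k MoE routing can be sketched in a few lines of PyTorch. This is purely illustrative and not Qwen3's actual architecture; the expert count, layer sizes, and k below are arbitrary assumptions:

```python
# Illustrative top-k MoE routing sketch (not Qwen3's real code): a gating
# network scores the experts per token, and only the k best experts run,
# so active parameters stay a small fraction of total parameters.
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(n_experts)]
        )
        self.gate = nn.Linear(d_model, n_experts)  # the "router"
        self.k = k

    def forward(self, x):                      # x: (tokens, d_model)
        scores = self.gate(x)                  # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)      # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):             # blend the k selected experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = TinyMoE()
print(moe(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```

This per-token sparsity is what lets Qwen3-235B-A22B activate only 22 billion of its 235 billion parameters at a time.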
Alibaba has also open-sourced the weights of the two MoE models: Qwen3-235B-A22B, with 235 billion total parameters and 22 billion activation parameters, and the smaller Qwen3-30B-A3B, with approximately 30 billion total parameters and 3 billion activation parameters. Additionally, six dense models (Qwen3-32B, Qwen3-14B, Qwen3-8B, Qwen3-4B, Qwen3-1.7B, and Qwen3-0.6B) have been open-sourced under the Apache 2.0 license.
The Qwen3 series is described as a "hybrid" model family, capable of both "thinking" through complex problems and responding quickly to simple requests, known as "thinking mode" and "non-thinking mode," respectively. The "thinking mode" allows the model to self-check its reasoning, similar to OpenAI's o3 model, at the cost of higher latency.

The Qwen team's blog post highlights that Qwen3 is trained on approximately 36 trillion tokens, double the amount used for Qwen2.5. The training data includes textbooks, question-answer pairs, and code snippets, among other content. Pre-training is divided into three stages: the first stage pre-trains on over 30 trillion tokens with a context length of 4K tokens to build basic language skills and general knowledge; the second stage raises the proportion of knowledge-intensive data, such as STEM, programming, and reasoning tasks, and pre-trains on an additional 5 trillion tokens; the final stage extends the context length to 32K tokens using high-quality long-context data so the model can handle longer inputs effectively.
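The two modes described above are exposed as a per-request switch. The sketch below follows the usage pattern published in Qwen's Hugging Face model cards; the checkpoint name and generation settings are illustrative assumptions:

```python
# Sketch of toggling Qwen3's thinking / non-thinking modes via the chat
# template, mirroring the usage in Qwen's model cards; the checkpoint
# name and token budget are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-4B"  # any open-sourced Qwen3 checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "How many primes are below 30?"}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # set False for fast, non-thinking replies
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(output[0][inputs.input_ids.shape[-1]:],
                       skip_special_tokens=True))
```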
Alibaba notes that thanks to improvements in model architecture, increased training data, and more efficient training methods, the overall performance of the Qwen3 dense base models is comparable to that of much larger Qwen2.5 base models. For instance, the Qwen3-1.7B/4B/8B/14B/32B-Base models perform similarly to the Qwen2.5-3B/7B/14B/32B/72B-Base models, with the Qwen3 dense base models even outperforming the larger Qwen2.5 models in STEM, coding, and reasoning. The Qwen3 MoE base models achieve performance similar to Qwen2.5 dense base models while using only 10% of the activation parameters, significantly reducing training and inference costs. In the fine-tuning stage, Alibaba used diverse long chain-of-thought data covering tasks and domains such as mathematics, code, logical reasoning, and STEM problems to equip the model with basic reasoning capabilities. Large-scale reinforcement learning was then applied, using rule-based rewards to enhance the model's exploration and exploitation capabilities.
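Alibaba has not published its reward code, but for verifiable domains like mathematics a rule-based reward can be as simple as comparing the model's final answer to a reference, with no learned reward model involved. A hypothetical sketch:

```python
# Hypothetical rule-based reward (not Alibaba's implementation): score a
# math completion 1.0 if its last number matches the reference answer.
import re

def math_reward(completion: str, reference: str) -> float:
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return 1.0 if numbers and numbers[-1] == reference else 0.0

print(math_reward("... so the total is 42.", "42"))  # 1.0
print(math_reward("The answer is 41.", "42"))        # 0.0
```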
Alibaba highlights that Qwen3 excels at tool calling, following instructions, and complying with specific output formats, and recommends using Qwen-Agent to fully leverage Qwen3's agent capabilities. Qwen-Agent internally encapsulates tool-call templates and parsers, significantly reducing coding complexity. In addition to the downloadable weights, Qwen3 can also be used through cloud service providers such as Fireworks AI and Hyperbolic.
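A minimal sketch of that workflow, based on Qwen-Agent's published Assistant interface, is shown below; the model name and local endpoint are placeholders for your own OpenAI-compatible deployment:

```python
# Sketch of a tool-using agent with Qwen-Agent; the endpoint and model
# name are placeholders for a self-hosted OpenAI-compatible server.
from qwen_agent.agents import Assistant

llm_cfg = {
    "model": "Qwen3-30B-A3B",
    "model_server": "http://localhost:8000/v1",
    "api_key": "EMPTY",
}

# Qwen-Agent ships built-in tools such as the code interpreter, so no
# hand-written tool-call parsing is required.
bot = Assistant(llm=llm_cfg, function_list=["code_interpreter"])

messages = [{"role": "user", "content": "Plot y = x**2 for x in 0..10."}]
for responses in bot.run(messages=messages):
    pass  # bot.run streams incremental response lists
print(responses[-1]["content"])
```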
Alibaba's ultimate goal remains the development of Artificial General Intelligence (AGI): AI systems with human-like intelligence. The company plans to enhance the model along multiple dimensions, optimizing model architecture and training methods to expand data scale, increase model size, extend context length, broaden modality coverage, and use environmental feedback to advance reinforcement learning for long-horizon reasoning. The release of Qwen3 marks a significant milestone on Alibaba's journey towards AGI and Artificial Superintelligence (ASI).