
Alibaba Unveils Qwen3 Series, Boosting AI Capabilities with 36 Trillion Tokens

Word on the Street
Monday, Apr 28, 2025 7:11 pm ET
2 min read

Alibaba has released and open-sourced the Qwen3 series of models, which integrate two thinking modes, support 119 languages, and are built for agentic tool calling. The series includes two mixture-of-experts (MoE) models and six dense models. The flagship model, Qwen3-235B-A22B, has demonstrated competitive performance in code, mathematics, and general capabilities, matching top models like DeepSeek-R1 and OpenAI's o1. The MoE model Qwen3-30B-A3B outperforms QwQ-32B while using only about 10% of its activated parameters, and even small models like Qwen3-4B can match the performance of Qwen2.5-72B-Instruct. An MoE architecture mimics human problem-solving by dividing a task into smaller subtasks and routing each one to specialized "expert" sub-networks, much like a team of experts who each handle a different part of a problem, which improves overall efficiency.
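To make the "team of experts" idea concrete, here is a minimal sketch of top-k expert routing, the general mechanism behind MoE layers. It is a toy illustration only: the four scalar "experts", the router scores, and the function names are all made up for this example and say nothing about how Qwen3 itself is implemented.

```python
import math

def softmax(scores):
    """Convert raw router scores into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalize over the chosen experts
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, router_scores, k=2):
    """Run only the selected experts and mix their outputs by router weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, k))

# Hypothetical experts: scalar functions standing in for feed-forward blocks.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
router_scores = [0.1, 2.0, 1.5, -1.0]  # made-up router logits for one token

selected = route_top_k(router_scores, k=2)
output = moe_forward(3.0, experts, router_scores, k=2)
print(selected, output)
```

Only two of the four experts run for this input, which is why an MoE model can hold many parameters in total while spending compute on just a fraction of them per token.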

The open-weight release covers both MoE models: Qwen3-235B-A22B, with over 235 billion total parameters and 22 billion activated parameters, and the smaller Qwen3-30B-A3B, with approximately 30 billion total parameters and 3 billion activated parameters. Six dense models (Qwen3-32B, Qwen3-14B, Qwen3-8B, Qwen3-4B, Qwen3-1.7B, and Qwen3-0.6B) have also been open-sourced under the Apache 2.0 license.
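The model names encode these figures: "235B-A22B" reads as roughly 235 billion total parameters with 22 billion active per token. A quick calculation, using only the approximate counts quoted above, shows what share of each MoE model is active at a time. The dictionary layout and function name are illustrative choices.

```python
# Parameter counts in billions, as quoted in the article (approximate).
models = {
    "Qwen3-235B-A22B": {"total": 235, "activated": 22},
    "Qwen3-30B-A3B": {"total": 30, "activated": 3},
}

def activated_fraction(total, activated):
    """Share of a model's parameters that is used for each token."""
    return activated / total

fractions = {name: activated_fraction(**p) for name, p in models.items()}
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} of parameters active per token")
```

Both models come out at roughly one tenth, in line with the "10%" figure cited for the MoE models.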

The Qwen3 series is described as a "hybrid" model, capable of both reasoning step by step through complex problems and responding quickly to simple requests, known as "thinking mode" and "non-thinking mode," respectively. Thinking mode lets the model check its own reasoning, similar to models such as OpenAI's o3, at the cost of higher latency.

The Qwen team's blog post highlights that Qwen3 was trained on nearly 36 trillion tokens, about double the amount used for Qwen2.5. The training data includes textbooks, question-answer pairs, and code snippets, among other content. Pre-training is divided into three stages. In the first stage, the model is pre-trained on over 30 trillion tokens with a context length of 4K tokens to give it basic language skills and general knowledge. The second stage raises the proportion of knowledge-intensive data, such as STEM, programming, and reasoning tasks, and pre-trains on an additional 5 trillion tokens. The final stage extends the context length to 32K tokens using high-quality long-context data, so that the model can handle longer inputs effectively.
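For the thinking and non-thinking modes described above, an application typically needs to separate the model's reasoning from its final answer. The sketch below assumes, for illustration, that reasoning arrives wrapped in `<think>...</think>` tags and that non-thinking responses carry no such block. The sample strings and the helper name `split_reasoning` are hypothetical.

```python
import re

# Assumed format: reasoning is wrapped in <think>...</think> tags.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text):
    """Return (reasoning, answer); reasoning is '' when no think block exists."""
    match = THINK_BLOCK.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer

# Hypothetical outputs for the same question in each mode.
thinking_output = "<think>17 is odd and has no divisor up to 4.</think>\nYes, 17 is prime."
direct_output = "Yes, 17 is prime."

print(split_reasoning(thinking_output))
print(split_reasoning(direct_output))
```

The same helper handles both modes, so calling code does not need to know in advance which mode produced a response.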

Alibaba notes that, thanks to improvements in model architecture, more training data, and more efficient training methods, Qwen3 dense base models perform on par with Qwen2.5 base models that have more parameters. For instance, the Qwen3-1.7B/4B/8B/14B/32B-Base models perform similarly to the Qwen2.5-3B/7B/14B/32B/72B-Base models, respectively, and Qwen3 dense base models even outperform larger Qwen2.5 models in STEM, coding, and reasoning. Qwen3 MoE base models achieve performance similar to Qwen2.5 dense base models while using only 10% of the activated parameters, significantly reducing training and inference costs. In the fine-tuning stage, Alibaba used diverse long chain-of-thought data covering tasks and domains such as mathematics, code, logical reasoning, and STEM problems to equip the model with basic reasoning capabilities. Large-scale reinforcement learning with rule-based rewards was then applied to strengthen the model's exploration and exploitation capabilities.

Alibaba highlights that Qwen3 excels at tool calling, instruction following, and adhering to specific data formats, and recommends Qwen-Agent for making full use of Qwen3's agent capabilities. Qwen-Agent encapsulates tool-calling templates and parsers internally, significantly reducing code complexity. In addition to the downloadable versions, Qwen3 is available through cloud service providers such as Fireworks AI and Hyperbolic.
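To illustrate the kind of template-and-parser work that an agent framework hides, the following sketch extracts tool calls from model output and dispatches them to Python functions. It is not Qwen-Agent's actual code or API. The `<tool_call>` wrapper with a JSON payload is an assumed format, and `get_weather`, `TOOLS`, and the sample output are hypothetical.

```python
import json
import re

# Assumed format: each call is JSON wrapped in <tool_call>...</tool_call>.
TOOL_CALL = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def get_weather(city):
    # Hypothetical tool; returns canned data instead of calling a real service.
    return {"city": city, "forecast": "sunny"}

TOOLS = {"get_weather": get_weather}

def parse_tool_calls(text):
    """Extract well-formed {"name": ..., "arguments": ...} calls from text."""
    calls = []
    for raw in TOOL_CALL.findall(text):
        try:
            call = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed payloads rather than crash
        if isinstance(call, dict) and "name" in call:
            calls.append(call)
    return calls

def run_tool_calls(text):
    """Execute every recognized tool call and collect the results."""
    results = []
    for call in parse_tool_calls(text):
        tool = TOOLS.get(call["name"])
        if tool is not None:
            results.append(tool(**call.get("arguments", {})))
    return results

model_output = (
    "Let me check.\n<tool_call>\n"
    '{"name": "get_weather", "arguments": {"city": "Hangzhou"}}\n'
    "</tool_call>"
)
print(run_tool_calls(model_output))
```

Even this toy version has to handle malformed payloads and unknown tool names, which gives a sense of why packaging the logic inside a framework reduces the code an application has to carry.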


Alibaba's ultimate goal remains the development of Artificial General Intelligence (AGI): AI systems with human-like intelligence. The company plans to improve the model along several dimensions, refining model architecture and training methods in order to scale up data, increase model size, extend context length, broaden the range of modalities, and use environmental feedback to advance reinforcement learning for long-horizon reasoning. The release of Qwen3 marks a significant milestone in Alibaba's journey towards AGI and Artificial Superintelligence (ASI).
