DeepSeek's New AI Model: A Strong Challenger in the Open-Source Arena

Eli Grant | Thursday, Dec 26, 2024 3:32 pm ET
3 min read


DeepSeek, a Chinese AI research lab backed by High-Flyer Capital Management, has released DeepSeek-V3, the latest version of its frontier model. The Mixture-of-Experts (MoE) model has 671B total parameters, with 37B activated for each token, and was trained on 14.8 trillion tokens. It posts strong results across a broad set of benchmarks.

DeepSeek-V3's strong performance can be attributed to several specific features and capabilities:

1. Mixture-of-Experts (MoE) Architecture: DeepSeek-V3 activates only the most relevant experts for each token, cutting computational overhead while preserving the capacity to handle a wide range of tasks (see the routing sketch after this list).
2. Large Model Size: With 671B total parameters and 37B activated per token, DeepSeek-V3 is one of the largest open-source models available, giving it the capacity to capture complex patterns and relationships in data.
3. High-Quality Training Data: DeepSeek-V3 was trained on a diverse, high-quality corpus of 14.8 trillion tokens, exposing it to a wide range of examples across domains.
4. Efficient Inference and Training: DeepSeek-V3 adopts the Multi-head Latent Attention (MLA) and DeepSeekMoE architectures validated in DeepSeek-V2, enabling efficient inference and cost-effective training.
5. Auxiliary-Loss-Free Load Balancing: DeepSeek-V3 pioneers a strategy that balances expert load without the auxiliary loss term that usually degrades performance, keeping workloads evenly distributed among experts (also illustrated in the routing sketch below).
6. Multi-Token Prediction Training Objective: DeepSeek-V3 uses a Multi-Token Prediction (MTP) objective, which has proven beneficial to model performance and can also serve speculative decoding to accelerate inference (a sketch of the objective follows this list).
7. Knowledge Distillation from DeepSeek-R1: DeepSeek-V3 distills reasoning capabilities from the DeepSeek-R1 series models, incorporating R1's verification and reflection patterns to improve its reasoning performance.
8. FP8 Mixed Precision Training: DeepSeek-V3 trains with an FP8 mixed-precision framework, validating the feasibility and effectiveness of FP8 training at extremely large scale and reducing training cost (see the FP8 sketch after this list).
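
To make points 1 and 5 concrete, here is a minimal sketch of top-k expert routing with bias-based load balancing, in the spirit of DeepSeek-V3's auxiliary-loss-free strategy. The class name, the sigmoid affinity function, and the sign-based bias update rate are illustrative assumptions, not DeepSeek's production implementation.

```python
import torch
import torch.nn as nn

class TopKRouter(nn.Module):
    """Routes each token to its top-k experts; a per-expert bias nudges
    selection toward underloaded experts without an auxiliary loss."""

    def __init__(self, d_model: int, n_experts: int, k: int, bias_rate: float = 1e-3):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        # Bias used only for expert *selection*, never for output weighting.
        self.register_buffer("expert_bias", torch.zeros(n_experts))
        self.k = k
        self.bias_rate = bias_rate

    def forward(self, x: torch.Tensor):
        # x: (tokens, d_model) -> per-expert affinity scores
        scores = torch.sigmoid(self.gate(x))              # (tokens, n_experts)
        # Biased scores decide *which* experts are picked...
        _, topk_idx = torch.topk(scores + self.expert_bias, self.k, dim=-1)
        # ...but gating weights come from the unbiased scores.
        topk_scores = scores.gather(-1, topk_idx)
        weights = topk_scores / topk_scores.sum(-1, keepdim=True)
        if self.training:
            self._update_bias(topk_idx, x.shape[0])
        return topk_idx, weights

    @torch.no_grad()
    def _update_bias(self, topk_idx: torch.Tensor, n_tokens: int):
        # Raise the bias of underloaded experts, lower it for overloaded ones.
        load = torch.bincount(topk_idx.flatten(),
                              minlength=self.expert_bias.numel()).float()
        mean_load = n_tokens * self.k / self.expert_bias.numel()
        self.expert_bias += self.bias_rate * torch.sign(mean_load - load)
```

In a full MoE layer, each selected expert's feed-forward network then processes its assigned tokens, and the outputs are recombined using these weights.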
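
Point 6 can be sketched similarly. The toy objective below adds a second head that predicts the token two positions ahead; DeepSeek-V3's actual MTP modules are full transformer blocks chained at increasing depths, so the linear heads and the 0.3 loss weight here are purely illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MTPHeads(nn.Module):
    def __init__(self, d_model: int, vocab_size: int):
        super().__init__()
        self.next_head = nn.Linear(d_model, vocab_size)  # predicts token t+1
        self.mtp_head = nn.Linear(d_model, vocab_size)   # predicts token t+2

def mtp_loss(heads: MTPHeads, hidden: torch.Tensor, tokens: torch.Tensor,
             lam: float = 0.3) -> torch.Tensor:
    # hidden: (batch, seq, d_model) from the trunk; tokens: (batch, seq) ids.
    # Standard next-token loss: position t predicts token t+1.
    logits1 = heads.next_head(hidden[:, :-1])
    loss1 = F.cross_entropy(logits1.flatten(0, 1), tokens[:, 1:].flatten())
    # MTP loss: position t additionally predicts token t+2.
    logits2 = heads.mtp_head(hidden[:, :-2])
    loss2 = F.cross_entropy(logits2.flatten(0, 1), tokens[:, 2:].flatten())
    return loss1 + lam * loss2
```

The same extra head is what enables speculative decoding at inference time: it drafts a candidate for the following token, which the model then verifies on the next step.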
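
For point 8, the round trip below shows the kind of per-tensor scaling that FP8 mixed-precision training relies on (requires PyTorch 2.1+ for the float8_e4m3fn dtype). Real FP8 pipelines, including DeepSeek-V3's, use fused GPU kernels and finer-grained scaling; this is only a numerical illustration.

```python
import torch

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for e4m3

def fp8_quantize(t: torch.Tensor):
    # Map the tensor's largest magnitude onto the FP8 representable range.
    scale = t.abs().amax().clamp(min=1e-12) / FP8_MAX
    q = (t / scale).to(torch.float8_e4m3fn)  # 1 byte per element in storage
    return q, scale

def fp8_matmul(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    qa, sa = fp8_quantize(a)
    qb, sb = fp8_quantize(b)
    # Dequantize for the multiply; real kernels multiply in FP8 and fold
    # the scales into a higher-precision accumulator instead.
    return (qa.to(torch.bfloat16) @ qb.to(torch.bfloat16)) * (sa * sb)

a, b = torch.randn(64, 128), torch.randn(128, 32)
err = (fp8_matmul(a, b) - a @ b).abs().max()
print(f"max abs error vs. fp32 matmul: {err:.4f}")  # small but nonzero
```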

DeepSeek-V3's performance on various benchmarks demonstrates its strong capabilities:

1. Pile-test (BPB): DeepSeek-V3 scored 0.606 bits per byte. Lower is better here, so this beats Qwen2.5 (0.638), trails LLaMA3.1 (0.542), and matches DeepSeek-V2 (0.606). (See the BPB note after this list.)
2. BBH (EM) 3-shot: 87.5, ahead of Qwen2.5 (79.8), LLaMA3.1 (82.9), and DeepSeek-V2 (78.8).
3. MMLU (Acc.) 5-shot: 87.1, ahead of Qwen2.5 (85.0), LLaMA3.1 (84.4), and DeepSeek-V2 (78.4).
4. MMLU-Redux (Acc.) 5-shot: 86.2, ahead of Qwen2.5 (83.2), LLaMA3.1 (81.3), and DeepSeek-V2 (75.6).
5. MMLU-Pro (Acc.) 5-shot: 64.4, ahead of Qwen2.5 (58.3), LLaMA3.1 (52.8), and DeepSeek-V2 (51.4).
6. DROP (F1) 3-shot: 89.0, ahead of Qwen2.5 (80.6), LLaMA3.1 (86.0), and DeepSeek-V2 (80.4).
7. ARC-Easy (Acc.) 25-shot: 98.9, ahead of Qwen2.5 (98.4), LLaMA3.1 (98.4), and DeepSeek-V2 (97.6).
8. ARC-Challenge (Acc.) 25-shot: 95.3, tied with LLaMA3.1 (95.3) and ahead of Qwen2.5 (94.5) and DeepSeek-V2 (92.2).
9. HellaSwag (Acc.) 10-shot: 88.9, ahead of Qwen2.5 (84.8) and DeepSeek-V2 (87.1) but slightly behind LLaMA3.1 (89.2).
10. PIQA (Acc.) 0-shot: 84.7, ahead of Qwen2.5 (82.6) and DeepSeek-V2 (83.9) but behind LLaMA3.1 (85.9).
11. WinoGrande (Acc.) 5-shot: 84.9, ahead of Qwen2.5 (82.3) but behind LLaMA3.1 (85.2) and DeepSeek-V2 (86.3).
12. RACE-Middle (Acc.) 5-shot: 67.1, behind Qwen2.5 (68.1), LLaMA3.1 (74.2), and DeepSeek-V2 (73.1).
13. RACE-High (Acc.) 5-shot: 51.3, ahead of Qwen2.5 (50.3) but behind LLaMA3.1 (56.8) and DeepSeek-V2 (52.6).
14. TriviaQA (EM) 5-shot: 82.9, ahead of Qwen2.5 (71.9), LLaMA3.1 (82.7), and DeepSeek-V2 (80.0).
15. NaturalQuestions (EM) 5-shot: 40.0, ahead of Qwen2.5 (33.2) and DeepSeek-V2 (38.6) but behind LLaMA3.1 (41.5).
16. AGIEval (Acc.) 0-shot: 79.6, ahead of Qwen2.5 (75.8), LLaMA3.1 (60.6), and DeepSeek-V2 (57.5).
17. HumanEval (Pass@1) 0-shot: 65.2, ahead of Qwen2.5 (53.0), LLaMA3.1 (54.9), and DeepSeek-V2 (43.3).
18. MBPP (Pass@1) 3-shot: 75.4, ahead of Qwen2.5 (72.6), LLaMA3.1 (68.4), and DeepSeek-V2 (65.0).
19. LiveCodeBench-Base (Pass@1) 3-shot: 19.4, ahead of Qwen2.5 (12.9), LLaMA3.1 (15.5), and DeepSeek-V2 (11.6).
20. CRUXEval-I (Acc.) 2-shot: 67.3, ahead of Qwen2.5 (59.1), LLaMA3.1 (58.5), and DeepSeek-V2 (52.5).
21. CRUXEval-O (Acc.) 2-shot: 67.3, ahead of Qwen2.5 (59.1), LLaMA3.1 (58.5), and DeepSeek-V2 (52.5).
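
A note on the first entry: bits-per-byte (BPB) is a compression-style metric, so lower is better. It converts the model's summed negative log-likelihood on the test text (in nats) into bits per byte of raw text. A toy conversion with made-up totals:

```python
import math

# Hypothetical evaluation totals (not DeepSeek's actual measurements):
total_nll_nats = 1.26e9  # summed negative log-likelihood over the test set
total_bytes = 3.0e9      # size of the test text in bytes

bpb = total_nll_nats / (math.log(2) * total_bytes)  # nats -> bits, per byte
print(f"bits per byte: {bpb:.3f}")  # ~0.606: lower means better compression
```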

DeepSeek-V3's showing across these benchmarks marks it as a serious challenger in the open-source AI landscape. Its MoE architecture, large-scale training corpus, and efficient training and inference pipeline all contribute to that standing. As DeepSeek continues to refine and improve its models, investors should keep an eye on this promising AI research lab.

Investors looking to capitalize on the growing AI market should consider DeepSeek a potential opportunity. With its strong performance and innovative approach to AI, the lab is well positioned to remain a force in the open-source arena.

Disclaimer: The above is a summary showing certain market information. AInvest is not responsible for any data errors, omissions, or other information that may be displayed incorrectly, as the data is derived from a third-party source. Communications displaying market prices, data, and other information available in this post are meant for informational purposes only and are not intended as an offer or solicitation for the purchase or sale of any security. Please do your own research when investing. All investments involve risk, and the past performance of a security or financial product does not guarantee future results or returns. Keep in mind that while diversification may help spread risk, it does not assure a profit or protect against loss in a down market.