NVIDIA and OpenAI Launch Open-Weight AI Models for Enhanced Reasoning and Deployment Efficiency

Generated by AI AgentCoin World
Thursday, Aug 7, 2025, 7:22 am ET

Summary

- NVIDIA and OpenAI launched open-weight AI models gpt-oss-120b and gpt-oss-20b, optimized for NVIDIA GPUs and CUDA platform.

- The gpt-oss-120b model reaches 1.5 million tokens per second of inference throughput on the GB200 NVL72 system, aided by NVFP4 4-bit precision.

- Available as NVIDIA NIM microservices, with compatibility with FlashInfer, Hugging Face, and Ollama, supporting real-time deployment of trillion-parameter LLMs.

- The collaboration accelerates open-source AI adoption across a developer base reflected in over 450 million CUDA downloads, supporting healthcare, manufacturing, and generative AI applications.

NVIDIA and OpenAI have jointly launched two open-weight reasoning models, gpt-oss-120b and gpt-oss-20b, marking a significant advancement in open-source AI development [1]. These models are designed to support a wide range of applications, including generative AI, reasoning, and physical AI, as well as healthcare and manufacturing use cases [2]. The models were trained on NVIDIA H100 GPUs and are optimized for inference on the NVIDIA CUDA platform, which powers hundreds of millions of devices globally [3].

The gpt-oss-120b model, when deployed on the NVIDIA GB200 NVL72 system, achieves an impressive inference speed of 1.5 million tokens per second [4]. This performance is attributed to the software optimizations tailored for the NVIDIA Blackwell platform, which supports ultra-efficient inference through innovations such as NVFP4 4-bit precision [5]. These advancements reduce power and memory requirements while maintaining high accuracy, enabling real-time deployment of trillion-parameter LLMs [6].
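As a rough illustration of why 4-bit precision matters, the sketch below estimates the weight-memory footprint of a 120-billion-parameter model at different precisions. These are back-of-envelope figures, not published numbers, and they ignore activations, KV cache, and the small per-block scaling overhead that block-scaled formats such as NVFP4 carry.

```python
# Back-of-envelope sketch (not from the article): approximate weight-memory
# footprint of a ~120B-parameter model at different precisions, to illustrate
# why 4-bit formats like NVFP4 reduce memory and bandwidth requirements.
# Ignores activations, KV cache, and block-scaling metadata overhead.

PARAMS = 120e9  # approximate parameter count implied by the model name

bytes_per_param = {
    "FP16/BF16": 2.0,
    "FP8": 1.0,
    "NVFP4 (4-bit)": 0.5,
}

for fmt, nbytes in bytes_per_param.items():
    gib = PARAMS * nbytes / 2**30
    print(f"{fmt}: ~{gib:,.0f} GiB of weights")

# Approximate output: FP16/BF16 ~224 GiB, FP8 ~112 GiB, NVFP4 ~56 GiB
```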

The models are available as NVIDIA NIM microservices, offering developers a secure and flexible way to deploy them across various GPU-accelerated infrastructures [7]. OpenAI and NVIDIA have also ensured compatibility with multiple open-source frameworks, including FlashInfer, Hugging Face, and Ollama, allowing developers to use their preferred tools [8]. This collaboration underscores NVIDIA’s full-stack approach to AI, which aims to make cutting-edge AI models accessible to a global community of developers [9].
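Since the article highlights Hugging Face compatibility, the sketch below shows one way the gpt-oss-20b weights could be loaded through the Transformers pipeline API. The model id, precision handling, and memory behavior are illustrative assumptions rather than details from the article; the model card is the authoritative reference.

```python
# Minimal sketch (assumptions, not from the article) of running a gpt-oss model
# via Hugging Face Transformers. The model id "openai/gpt-oss-20b" is assumed;
# verify it and the hardware requirements on the model card before use.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",  # assumed Hugging Face model id
    torch_dtype="auto",          # let Transformers pick a supported precision
    device_map="auto",           # spread layers across available accelerators
)

messages = [
    {"role": "user", "content": "Summarize why open-weight models matter."}
]
output = generator(messages, max_new_tokens=128)
print(output[0]["generated_text"][-1])  # last message is the model's reply
```

NIM microservices and Ollama, also named in the article, are alternative serving paths for production or local deployment; the Hugging Face route above is just one of the supported options.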

The partnership between NVIDIA and OpenAI dates back to 2016 and has since driven major AI innovations through joint efforts in large-scale training and infrastructure development [10]. By optimizing the gpt-oss models for NVIDIA’s GPU ecosystem and extensive software stack, the companies are accelerating AI adoption and enabling cost-effective advancements for millions of developers worldwide [11].

With over 450 million NVIDIA CUDA downloads to date, the latest models are now accessible to a vast developer community, further strengthening the open-source movement in AI [12]. OpenAI and NVIDIA continue to demonstrate their commitment to open innovation, ensuring that developers can build and customize these models for a variety of industries and applications [13].

Sources:

[1] OpenAI and NVIDIA Propel AI Innovation With New Open ... (https://blogs.nvidia.com/blog/openai-gpt-oss/)

[2] OpenAI and NVIDIA set global AI benchmark with gpt-oss ... (https://interestingengineering.com/innovation/openai-nvidia-open-weight-ai-models)

[3] Delivering 1.5 M TPS Inference on NVIDIA GB200 NVL72 ... (https://developer.nvidia.com/blog/delivering-1-5-m-tps-inference-on-nvidia-gb200-nvl72-nvidia-accelerates-openai-gpt-oss-models-from-cloud-to-edge/)

[4] NVIDIA and OpenAI Launched Fastest Open Reasoning ... (https://coinfomania.com/nvidia-and-openai-launched-fastest-open-reasoning-models/)

[5] OpenAI releases a free GPT model that can run on your ... (https://www.theverge.com/openai/718785/openai-gpt-oss-open-model-release)

[6] The US Makes Its First Modern Foray Into Open-Source ... (https://wccftech.com/the-us-makes-its-first-modern-foray-into-open-source-models-with-gpt-oss-but-how-does-it-stack-up-against-chinese-counterparts/)
