NVIDIA's Blackwell GPUs Hit Overheating Roadblock, Delaying AI Deployments for Tech Giants
Reports indicate that NVIDIA's next-generation Blackwell chips are once again facing delays because they overheat when installed in high-capacity server racks. The problem has prompted design changes and left customers such as Google, Meta, and Microsoft concerned about whether they can deploy on schedule.
The Blackwell GPUs, intended for AI and high-performance computing, reportedly overheat when used in server racks housing 72 processors, where power consumption can reach up to 120 kilowatts per rack. The overheating has forced NVIDIA to revise the server rack designs repeatedly to prevent degraded GPU performance and hardware damage.
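To put the reported figures in perspective, the short sketch below divides the rack's peak power evenly across its processors. This is only a rough average under the assumption that the 120-kilowatt figure covers the whole rack, including components other than the GPUs; it is not a per-chip specification, and the function name is illustrative.

```python
# Figures as reported in the article; both are treated as approximate.
RACK_POWER_KW = 120        # reported peak power consumption per rack
PROCESSORS_PER_RACK = 72   # reported number of processors per rack


def average_power_per_processor_kw(rack_power_kw: float, processors: int) -> float:
    """Spread rack power evenly over the processors (a rough average only)."""
    if processors <= 0:
        raise ValueError("processors must be a positive integer")
    return rack_power_kw / processors


avg_kw = average_power_per_processor_kw(RACK_POWER_KW, PROCESSORS_PER_RACK)
print(f"Average rack power per processor: {avg_kw:.2f} kW")
```

Because nearly all of the electrical power a rack draws ends up as heat, the cooling system has to remove roughly the same amount, on the order of 120 kilowatts per rack at peak.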
In response, NVIDIA has instructed suppliers to make multiple design modifications to improve cooling. The company is working closely with suppliers and partners to refine the engineering designs so that the final product meets the desired performance and reliability standards.
While such adjustments are a routine part of major technology releases, the latest revisions have further postponed expected delivery dates. NVIDIA described its collaboration with cloud service providers as part of a normal development process, even though the delay is a significant disruption.
The revised version of the Blackwell GPUs entered mass production in late October, with shipments targeted for late January. The delay will affect NVIDIA's key clients, including Google, Meta, and Microsoft, which rely on these GPUs to train advanced large language models.