Nvidia's Blackwell GPUs Face Overheating Woes, Threatening Tech Giants' AI Timelines
Reports indicate that Nvidia's upcoming Blackwell processors overheat significantly when installed in high-capacity server racks. The problem has reportedly forced design changes and delayed projects, raising concerns among major customers such as Google, Meta, and Microsoft about whether they can deploy Blackwell-powered servers on schedule.
According to sources familiar with the matter, the Blackwell GPU is engineered for AI and high-performance computing (HPC), but problems arise when the chips are installed in server racks that house up to 72 processors and draw as much as 120 kilowatts per rack. The overheating, which can limit GPU performance and risks damaging the hardware, has prompted Nvidia to redesign the racks repeatedly, raising concerns that data center deployments will be delayed.
In response, Nvidia is working with suppliers to modify the rack designs and with partners to improve the thermal management systems. Although engineering adjustments of this kind are standard in large-scale technology rollouts, they are expected to push back the delivery schedule.
The revised Blackwell GPUs reportedly entered mass production only at the end of October and are expected to begin shipping in late January of next year at the earliest. Because tech giants such as Google, Meta, and Microsoft depend on Nvidia's GPUs to train AI models, any postponement could substantially disrupt their research schedules and product launches.
Nvidia unveiled the Blackwell series in March and initially scheduled its release for the second quarter, but the launch then ran into setbacks. Although CEO Jensen Huang later said the design flaws had been resolved with TSMC's help, Blackwell chips now seem unlikely to reach the market before the first quarter of the following year.