Apple's MLX Framework Now Supports NVIDIA GPUs with CUDA Backend

Tuesday, Jul 15, 2025, 8:37 pm ET

Apple's machine learning framework, MLX, is gaining support for NVIDIA GPUs through a CUDA backend, enabling developers to run MLX models directly on NVIDIA GPUs. This integration opens up new possibilities for testing, experimentation, and research use cases. While the work is still in progress, core operations such as matrix multiplication and softmax are already supported.
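
At the API level the change should be largely transparent: the same Python code that runs on Apple silicon would target the NVIDIA card when MLX is built with the CUDA backend. The following is a minimal sketch, assuming such a build and that GPU device selection behaves as it does today on Apple hardware; it exercises the two operations the article notes as already supported.

```python
import mlx.core as mx

# Assumption: on a CUDA-enabled MLX build, the GPU device targets
# the NVIDIA card, just as it targets Apple silicon today.
mx.set_default_device(mx.gpu)

# Core operations already reported as working on the CUDA backend:
a = mx.random.normal(shape=(512, 512))
b = mx.random.normal(shape=(512, 512))

logits = mx.matmul(a, b)              # matrix multiplication
probs = mx.softmax(logits, axis=-1)   # softmax over the last axis

mx.eval(probs)  # MLX is lazy; force evaluation of the graph
print(probs.shape)
```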

In a separate development, NVIDIA has introduced the B30 AI GPU, optimized for small to medium AI models and cloud services. The B30 delivers approximately 75% of the performance of the H20 AI GPU, making it a cost-effective option for Chinese tech firms. Demand is significant: in late June, Chinese tech companies placed orders for hundreds of thousands of units, totaling over $1 billion, with deliveries expected in August [1].

The B30 is designed to address two major pain points for Chinese buyers. First, it is a preferred option for inference on small and medium-sized models, a timely fit as workloads shift toward inference. Second, it serves as a low-cost computing power pool for cloud services: a pool built from 100 B30 AI GPUs can support lightweight training of billion-parameter models while cutting procurement costs by 40% and unit power consumption by nearly 30% compared with the H20 [1].
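
A back-of-the-envelope calculation, using only the figures cited above and assuming they compose cleanly at the pool level, illustrates the price-performance trade-off:

```python
# Rough comparison of a B30 pool vs. an H20 pool, using only the
# figures cited in this article (assumptions, not vendor specs).
b30_perf = 0.75          # ~75% of H20 performance
b30_cost = 1.00 - 0.40   # ~40% lower procurement cost
b30_power = 1.00 - 0.30  # ~30% lower unit power consumption

print(f"perf per dollar: {b30_perf / b30_cost:.2f}x the H20")   # ~1.25x
print(f"perf per watt:   {b30_perf / b30_power:.2f}x the H20")  # ~1.07x
```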

The B30's deep compatibility with the CUDA-X ecosystem lets enterprises migrate workloads built on frameworks like PyTorch with minimal rework, saving technical reconstruction costs. That compatibility is crucial to preserving the 'stickiness' of the CUDA ecosystem in mainstream model deployment [1].
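
In practice, that compatibility means standard PyTorch code written against the generic CUDA device should need no changes to target the B30. A minimal illustration using only stock PyTorch APIs (nothing here is B30-specific):

```python
import torch

# Unmodified PyTorch code: the CUDA backend abstracts the specific GPU,
# so the same script targets an H20, a B30, or any other CUDA device.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = torch.nn.Linear(1024, 256).to(device)
x = torch.randn(32, 1024, device=device)
y = model(x)  # runs on whichever NVIDIA GPU is installed
print(y.shape, y.device)
```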

While domestically made AI chips from the likes of Huawei may slightly surpass the B30 in single-card FP16 compute, the B30 retains an advantage in mainstream model deployment efficiency thanks to its CUDA compatibility [1].

References:
[1] https://www.tweaktown.com/news/106374/nvidias-new-b30-ai-gpu-for-china-expected-to-have-significant-demand-75-as-fast-the-h20/index.html
