Nvidia releases open dataset, models for multilingual speech AI
By Ainvest
Friday, August 15, 2025, 3:09 am ET, 1 min read
Nvidia has released an open dataset and two new models for multilingual speech AI. The Granary dataset, together with the Nvidia Canary-1b-v2 and Parakeet-tdt-0.6b-v3 models, targets 25 European languages, including languages with limited available training data such as Croatian, Estonian, and Maltese [1].
The Granary dataset, developed in collaboration with researchers from Carnegie Mellon University and Fondazione Bruno Kessler, contains around a million hours of audio: nearly 650,000 hours for speech recognition and over 350,000 hours for speech translation. It is designed to support the development of high-quality speech recognition and translation models [1].
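As a quick sanity check on the dataset figures, the two reported subsets together account for the "around a million hours" total. The sketch uses the rounded numbers quoted in the article; because the source says "nearly" 650,000 and "over" 350,000 hours, the resulting shares are approximate, not exact dataset statistics.

```python
# Approximate Granary composition as reported in [1]. The source gives
# "nearly 650,000" and "over 350,000" hours, so these are rounded inputs.
asr_hours = 650_000  # speech recognition (ASR) subset
ast_hours = 350_000  # speech translation (AST) subset

# Sum the subsets and compute each one's share of the total.
total_hours = asr_hours + ast_hours
asr_share = asr_hours / total_hours
ast_share = ast_hours / total_hours

print(f"Total: ~{total_hours:,} hours")
print(f"ASR share: ~{asr_share:.0%}, AST share: ~{ast_share:.0%}")
```

Running it prints a total of about 1,000,000 hours, split roughly 65% for recognition and 35% for translation.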
The Nvidia Canary-1b-v2 model is a billion-parameter model trained on Granary for high-quality transcription of European languages, plus translation between English and two dozen supported languages. Meanwhile, the Parakeet-tdt-0.6b-v3 model is a streamlined, 600-million-parameter model designed for real-time or large-volume transcription of Granary’s supported languages [1].
These tools are intended to let developers scale AI applications to serve global users with fast, accurate speech technology in production-scale use cases such as multilingual chatbots, customer service voice agents, and near-real-time translation services. The models are now available on Hugging Face [2] and are designed to run on Nvidia GPU-accelerated systems, where the company's hardware and software frameworks deliver faster training and inference than CPU-only solutions [1].
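For developers who want to try a released checkpoint, the following is a minimal sketch adapted from the usage pattern shown on the Parakeet-tdt-0.6b-v3 model card [2]. It assumes the NVIDIA NeMo toolkit is installed with ASR extras (for example `pip install -U "nemo_toolkit[asr]"`), that the checkpoint can be downloaded from Hugging Face, and that inputs are 16 kHz mono audio files. The `transcribe` helper and the file names are hypothetical, and NeMo's return types have changed across releases, so check the model card for the version you install.

```python
"""Minimal sketch: transcribe local audio with Parakeet-tdt-0.6b-v3 via NVIDIA NeMo.

Assumptions (not from the article): nemo_toolkit[asr] is installed, the
checkpoint can be downloaded from Hugging Face, and inputs are 16 kHz mono
audio files. The transcribe() helper is hypothetical.
"""
import sys

# Hugging Face model ID, taken from reference [2].
PARAKEET_MODEL_ID = "nvidia/parakeet-tdt-0.6b-v3"


def transcribe(paths, model_id=PARAKEET_MODEL_ID):
    """Return one transcript string per input audio path (hypothetical helper)."""
    # Imported lazily so this module can be loaded without NeMo installed.
    import nemo.collections.asr as nemo_asr

    # Downloads the checkpoint on first use and restores the ASR model.
    model = nemo_asr.models.ASRModel.from_pretrained(model_name=model_id)
    # transcribe() takes a list of file paths; recent NeMo releases return
    # hypothesis objects exposing a .text attribute (older ones may differ).
    outputs = model.transcribe(paths)
    return [out.text for out in outputs]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python transcribe.py clip1.wav clip2.wav
    audio_paths = sys.argv[1:]
    for path, text in zip(audio_paths, transcribe(audio_paths)):
        print(f"{path}: {text}")
```

Invoked as `python transcribe.py meeting_fr.wav call_de.wav` (hypothetical files), it prints one transcript line per file. Importing NeMo inside the function keeps the module importable on machines without the toolkit, and the `len(sys.argv) > 1` guard avoids loading the model when no files are given.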
The release of Granary and the new models underscores Nvidia's commitment to inclusivity in AI by making high-quality speech technology more accessible for a broader range of languages. The initiative could open up new markets and opportunities for AI developers and businesses operating in Europe [1].
References:
[1] https://blogs.nvidia.com/blog/speech-ai-dataset-models/
[2] https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3

Editorial disclosure and AI transparency: Ainvest News uses advanced Large Language Model (LLM) technology to synthesize and analyze real-time market data. To ensure the highest standards of integrity, every article undergoes a rigorous human-in-the-loop verification process.
While AI assists with data processing and initial drafting, a professional Ainvest editorial staff member independently reviews, verifies, and approves all content to ensure its accuracy and compliance with the editorial standards of Ainvest Fintech Inc. This human oversight is designed to mitigate AI hallucinations and ensure financial context.
Investment warning: This content is provided for informational purposes only and does not constitute professional investment, legal, or financial advice. Markets carry inherent risks. Users are advised to conduct independent research or consult a certified financial advisor before making any decision. Ainvest Fintech Inc. disclaims all liability for actions taken on the basis of this information.
