Nvidia Releases Open-Source Speech Dataset with 1M Hours of Multilingual Audio, Introduces Two Neural Network Models
By Ainvest
Monday, August 18, 2025, 10:40 am ET, 1 min read
Nvidia has released Granary, an open-source dataset of approximately 1 million hours of audio in 25 European languages. The release targets data scarcity in speech AI, where most models support only a small fraction of the world's languages. Granary is intended to help developers build high-quality speech recognition and translation AI for languages with limited available data, such as Croatian, Estonian, and Maltese [1].
Granary was built with Nvidia's NeMo Speech Data Processor toolkit. Alongside the dataset, Nvidia introduced two neural network models: Nvidia Canary-1b-v2 and Nvidia Parakeet-tdt-0.6b-v3. Canary-1b-v2, a 1-billion-parameter model, is optimized for high-quality transcription and for translation between English and the 24 other supported languages. Parakeet-tdt-0.6b-v3, a streamlined 600-million-parameter model, is designed for real-time or large-volume transcription [2].
Granary was developed by Nvidia's speech AI team in collaboration with Carnegie Mellon University and Fondazione Bruno Kessler. The team built a processing pipeline that turns unlabeled audio into structured, high-quality training data without resource-intensive human annotation. According to Nvidia, this lets developers train models more efficiently: with Granary, they can reach target accuracy for automatic speech recognition (ASR) and automatic speech translation (AST) using about half as much training data as with other popular datasets [1].
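The article does not describe how Granary's pipeline decides which machine-generated transcripts to keep. As a rough illustration of the general idea of turning unlabeled audio into usable training data, the sketch below applies simple quality filters to pseudo-labeled segments (transcripts produced by a model rather than a human). The `Segment` schema, the `keep_segment` and `filter_segments` helpers, and every threshold are hypothetical assumptions for illustration, not Nvidia's actual pipeline or settings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One pseudo-labeled audio segment (hypothetical schema, not Granary's)."""
    audio_path: str
    duration_s: float    # segment length in seconds
    text: str            # transcript produced by an ASR model, not a human
    expected_lang: str   # language the source audio was collected for
    detected_lang: str   # language predicted by a language-ID model
    confidence: float    # model confidence in [0, 1]


def keep_segment(seg: Segment,
                 min_conf: float = 0.7,
                 min_cps: float = 2.0,
                 max_cps: float = 30.0) -> bool:
    """Return True if a pseudo-labeled segment passes simple quality checks.

    All thresholds are illustrative assumptions, not published settings.
    """
    text = seg.text.strip()
    if not text or seg.duration_s <= 0:
        return False
    # Drop segments whose detected language disagrees with the expected one.
    if seg.detected_lang != seg.expected_lang:
        return False
    # Drop transcripts the model itself is unsure about.
    if seg.confidence < min_conf:
        return False
    # A characters-per-second rate outside a plausible range often signals
    # a hallucinated or truncated transcript.
    cps = len(text) / seg.duration_s
    return min_cps <= cps <= max_cps


def filter_segments(segments: List[Segment]) -> List[Segment]:
    """Keep only the segments that pass every check in keep_segment."""
    return [s for s in segments if keep_segment(s)]


if __name__ == "__main__":
    segs = [
        Segment("a.wav", 4.0, "Dobar dan, kako ste?", "hr", "hr", 0.92),
        Segment("b.wav", 4.0, "Thank you for watching!", "hr", "en", 0.95),  # LID mismatch
        Segment("c.wav", 10.0, "Tere", "et", "et", 0.88),  # 0.4 chars/s, too sparse
    ]
    print([s.audio_path for s in filter_segments(segs)])  # ['a.wav']
```

Each check is cheap and independent, so a filter of this kind can run over large volumes of audio without human review; a production pipeline would likely combine many more signals than these three.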
The models are available on Hugging Face, and additional information for developers who want to fine-tune models with Granary is available on GitHub. The release is meant to make speech technology better reflect linguistic diversity, particularly for European languages that are underrepresented in human-annotated datasets. Nvidia positions it for applications that serve users worldwide, including multilingual chatbots, customer service voice agents, and near-real-time translation services [2].
References
[1] https://blogs.nvidia.com/blog/speech-ai-dataset-models/
[2] https://theoutpost.ai/news-story/nvidia-unveils-granary-a-groundbreaking-multilingual-dataset-for-speech-ai-19133/
Nvidia has released Granary, an open-source speech dataset containing 1 million hours of audio across 25 European languages. Granary addresses data scarcity in speech AI and is accompanied by two neural network models for speech transcription and translation. The dataset was created using Nvidia's NeMo Speech Data Processor toolkit and is available for commercial use.

Editorial disclosure and AI transparency: Ainvest News uses advanced large language model (LLM) technology to synthesize and analyze real-time market data. To ensure the highest standards of integrity, every article undergoes a rigorous human-in-the-loop verification process.
While AI assists with data processing and initial drafting, a professional Ainvest editorial staff member independently reviews, verifies, and approves all content to ensure its accuracy and compliance with the editorial standards of Ainvest Fintech Inc. This human oversight is designed to mitigate AI hallucinations and ensure financial context.
Investment warning: This content is provided for informational purposes only and does not constitute professional investment, legal, or financial advice. Markets carry inherent risks. Users are advised to conduct independent research or consult a certified financial advisor before making any decision. Ainvest Fintech Inc. disclaims all liability for actions taken on the basis of this information.