AI-Powered Initiative Targets Endangered Languages in Europe, Expands Language Accessibility on Social Media Platforms
Meta's No Language Left Behind project is expanding its AI language capabilities to include lesser-spoken languages, such as Scottish Gaelic and Welsh. The project draws on the Opus repository, which contains authentic language data, and uses that data to train AI models and improve translation accuracy. Despite this progress, experts such as Professor William Lamb of the University of Edinburgh advise Meta to involve native speakers in refining the technology, as the current translations are insufficient for Gaelic. Meta anticipates a substantial increase in daily translations once the technology is fully implemented.
Introduction:
Meta, formerly known as Facebook, is expanding its No Language Left Behind (NLLB) project to include lesser-spoken languages such as Scottish Gaelic and Welsh. The initiative aims to provide translation services for a wider range of languages, using the Opus repository and AI models. Although progress has been made, experts like Professor William Lamb of the University of Edinburgh emphasize the importance of consulting with native speakers to refine the technology further [1].
Project Overview:
The NLLB project uses the Opus repository, an extensive collection of authentic language data, to train AI models for translation [1]. This open-source platform contains text and speech data in various languages that can be used to train machine learning models. The project also combines data mined from sources like Wikipedia with contributions from experts in natural language processing (NLP) to enhance the AI models [1].
Scottish Gaelic and Welsh:
Meta's NLLB project has identified Scottish Gaelic and Welsh as two of the "low-resource" languages to be included [1]. While AI translation of these languages is progressing, Professor Lamb emphasizes the importance of involving native speakers in the refinement process [1]. Their input is meant to help the translations reflect the nuances and cultural contexts of these languages.
Language Data and Improvements:
According to Meta, languages with fewer than one million sentences of available data are considered "low-resource" languages [1]. Through continuous evaluation and refinement, the NLLB team has improved translation accuracy by 44% over its initial model, released in 2020 [1].
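The one-million-sentence cutoff can be illustrated with a minimal Python sketch. The function name, the placeholder language labels, and the sentence counts below are all assumptions invented for illustration; they are not figures from Meta or from the Opus repository, and this is not how the NLLB team classifies languages internally.

```python
# Threshold taken from the definition attributed to Meta in the text:
# fewer than one million sentences of available data means "low-resource".
LOW_RESOURCE_THRESHOLD = 1_000_000


def is_low_resource(sentence_count: int, threshold: int = LOW_RESOURCE_THRESHOLD) -> bool:
    """Return True when a language has fewer than `threshold` sentences of data."""
    return sentence_count < threshold


# Hypothetical counts, invented for illustration only (not real corpus statistics).
example_counts = {
    "language_a": 250_000,
    "language_b": 1_000_000,
    "language_c": 40_000_000,
}

# Map each placeholder language to its classification under the threshold.
labels = {name: is_low_resource(count) for name, count in example_counts.items()}

for name, low in labels.items():
    print(f"{name}: {'low-resource' if low else 'not low-resource'}")
```

Note that the comparison is strict: a language with exactly one million sentences falls outside the "low-resource" category under this reading of the definition.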
Impact and Future Expectations:
Once fully implemented, Meta anticipates more than 25 billion translations daily across Facebook News Feed, Instagram, and other platforms [1]. The expansion of the NLLB project to include lesser-spoken languages like Scottish Gaelic and Welsh has the potential to significantly improve communication and connectivity for these communities.
Conclusion:
Meta's No Language Left Behind project is making strides in expanding AI language capabilities to include lesser-spoken languages like Scottish Gaelic and Welsh. While progress has been made, consulting with native speakers is crucial to ensure accurate and culturally sensitive translations. By working together, Meta and language experts can create a more inclusive and connected world for all.
References:
[1] https://www.euronews.com/next/2024/06/19/meta-expands-ai-translation-to-200-languages-but-experts-suggest-talking-to-native-speaker