Multilingual Modelling for Resource-Poor Languages

Name of applicant

Johannes Bjerva

Institution

Aalborg University

Amount

DKK 5,000,000

Year

2021

Type of grant

Semper Ardens: Accelerate

What?

Language is the key to accessing the modern technology on which our society relies, such as online search, spelling correction, and automatic translation. However, of the more than 7,000 languages in the world, only a handful have access to such technology. This is partly because state-of-the-art solutions require vast amounts of data, which are unavailable for most languages; these languages are therefore referred to as resource-poor. As a result, most languages are marginalized in current technological development, and will remain so unless fundamental changes are made. My project addresses this issue by exploiting the fact that languages often share systematic similarities, with the aim of increasing technological access for the billions of people who speak resource-poor languages.

Why?

How human language is structured is one of the greatest unsolved questions in science. Modern developments in language technology largely ignore this question, so current algorithms need vast amounts of data to have even the slightest chance of producing linguistically sensible output. In this project, we will dig deeper into the nature of linguistic structure and investigate how it can be used in societally important technology. This requires a solid foundation in both linguistic theory and artificial intelligence and machine learning. The findings from the project will not only lead to fundamental changes in how multilingual NLP (Natural Language Processing) is approached, but also have the potential to make a substantial impact on other scientific fields.

How?

The key to achieving the goals of this ambitious project lies in the composition of the team and its focus on interdisciplinary collaboration. First and foremost, the success of the project relies on its participants having in-depth knowledge of both linguistics, especially linguistic typology, and machine learning and statistics. As PI, I also believe that the cultural diversity of the team will be a strength. We are, in the end, looking to solve problems that are more prevalent in cultural contexts other than the Danish or Scandinavian one. Hence, a diverse team is necessary to ensure the project's societal relevance. In sum, this diversity will allow the project team to approach the fundamental scientific problems with a new mindset.
