The language technology innovator and service provider Tilde has developed considerable expertise in providing language services to the Presidency of the Council of the European Union. We caught up with Mārcis Pinnis, Chief AI Officer, and Artūrs Vasiļevskis, Head of Machine Translation, to find out more about the challenges and promises of this ongoing project.
What triggered Tilde’s involvement with the EU’s public service agenda?
The idea for the EU Presidency Translator Project was born in 2015 when Latvia hosted the EU Presidency. We saw that Presidency staff, delegates, and journalists were dealing with massive amounts of multilingual information, and visitors were struggling to access Latvian source content. As this couldn’t be sustained by human translators alone, we offered to provide a feasible machine translation solution that could ensure the right degree of linguistic quality and provide instant translations from Latvian to English and vice versa. The result was a dual-purpose solution: anyone could use the freely accessible, secure web translator to translate text, documents, and web pages, while professional translators could integrate MT into their professional translation environment. This turned out to be a great proof of concept for machine translation as a means of facilitating multilingualism in the EU.
Overcoming language barriers and facilitating multilingual information exchange were already on the EU’s agenda. The Connecting Europe Facility (CEF) therefore quickly bought into the idea of the EU Presidency Translator, which would connect the European Commission’s eTranslation engines for all 24 official EU languages with our custom-developed engines tailored to the specific needs of each presidency. The EU Presidency Translator project eventually evolved into a complete ecosystem of custom and general engines supporting all EU languages and available to everyone. The first official full-service EU Presidency Translator was launched for the Estonian Presidency of the Council of the European Union in 2017, and since then we have supported presidencies in Bulgaria, Austria, Romania, Finland, and Croatia.
What were your key challenges?
Working with EU presidencies means addressing a broad variety of users. Each host country uses different technologies, and translation and data curation practices differ widely. Each presidency focuses on a different agenda and, therefore, requires content on specific topics. Our job was to develop neural MT systems fine-tuned for each presidency’s agenda, domain, and language, using the best data available.
Our team worked on data collection to adapt the systems to the specific domain needs. Together with our partners, we gathered data from different European institutions and host countries, which allowed us to develop high-quality custom systems for each host country. For example, Finnish is a structurally challenging and morphologically rich language, so the MT engine for Finnish was trained with more than 20 million translated sentences, which is comparable in size to 2,317 Harry Potter books.
Is the general public benefiting from these systems, and changing its understanding of language automation?
Absolutely. The EU Presidency Translator provides the general public with a free, easy-to-use, multilingual communication tool that helps them to draft and understand foreign language text, documents, and websites. It did a great job of introducing the technology as an everyday resource for their translation needs. MT can also be used as a professional service in a business environment. In recent years the community has seen the practical added value of neural MT, and interest from the business sector is growing tremendously.
For each presidency, we organize training sessions for officials to explain the value of the service and to familiarize officials and representatives with the benefits of the platform. In time, they became ambassadors for the solution and spread the word to the general public.
Can translators use the service after the Presidency has concluded?
Importantly, when the presidency period is over, we don’t close down the platform: translators in public administrations can continue to use the system on a daily basis. In Estonia, translators are still using the translation system we set up for their 2017 presidency.
Our top success story so far has been the Finnish presidency, during which the platform translated over 12.8 million words in the July to December 2019 period. Even today, translators in the Prime Minister’s office use the platform on a daily basis, as the service was integrated into their work environment. They use it to translate official Ministry documents and for internal communication.
How do you prepare translators for these tasks?
During each EU presidency, we usually run a workshop to introduce the technology and engines into the content-flow environment. We also include hands-on experience: this active contribution by translators has a far greater impact on them. Remember that MT developers are outsiders; it is much more effective as a learning experience to have insiders, the translators themselves, communicate the desired message.
The first thing is to manage expectations: instead of competing with human translators, MT actually augments them and helps them to be more productive and efficient. During the Finnish presidency, the translators were already familiar with previous-generation MT and were negatively biased. But when they tested our new NMT engines, they were broadly impressed by the quality.
How about data? How do you get what you need for a new EU country language engine?
When we approach a new presidency, we ask if they have appropriate data and management practices to help us, mainly in the form of shareable translation memories. Data acquisition, management, and preparation is a rather complicated and time-consuming process, which is why we consult the host countries about best practices in language data management. This can help in the future but is of no use in the short term. So if we need data, we have to devote time to data collection.
There is no simple answer to getting the best data. It can be in any format, including monolingual data. Extracting parallel data is very time-consuming. And in any case, each country has a different approach to data management. We found that Finland tended to show the highest tech-readiness for applying data to a presidency translation service.
With the help of our research and development team, we plan to introduce adaptive technology (dynamic learning) soon, i.e. to let the system adapt to specific users. This will improve the quality of translations for institutions with different content types, styles, and terminology. We shall also need to address the data scarcity problem for some language pairs.
Tilde’s EU Presidency project is one of the most multilingual public service translation services. While the UN, WIPO, EPO, and PAHO, together with South Africa’s and India’s internal services, all provide some MT-enabled content to their audiences, the European Union is still the only multi-country political organization to communicate with citizens in all of its members’ official languages. Let’s hope that it allows us to continue to educate users, build data resources, and adapt to new content in future multilingual projects.