AI systems are becoming a global trend. Businesses around the world are starting to explore how these systems can benefit them and their customers, but AI is not yet at the stage where it can simply be plugged in and expected to operate. These systems require an immense amount of data and training to provide the desired outputs.
It has therefore become quite a buzz phrase that data is the new oil. Those who possess data are to become the next generation of power bearers. In the language industry, Language Service Providers (LSPs) are often the ones that have access to great quantities of language data. Yet only those that know how to turn that data into a viable business will hold the key to unlocking their potential in the data space.
Based in St. Petersburg, EGO Translating Company is one of those. The company was founded in 1990 and 80% of its services focus on translation and interpretation with the rest focusing on related technologies and platforms. To produce their own technology solutions, they have opened a technology branch including an MT department under which they collect and clean language data to feed back into their MT systems.
We spoke with Margarita Menyaylova, Head of Machine Translation Division, and Evgenia Gorodetskaya, Vice President of Technology Development. In their quest for more language data, they came across the Data Marketplace. They have now published about half a million words of English-Russian language data in manufacturing and related domains. So how did their quest for more data turn into sharing their own data for others to purchase and use to improve their systems?
“Why can’t we also share the data we accumulated over time with other data consumers?” asks Margarita. “We are aware of the importance of language data abundance for the growth of ML systems. And we also know how important and challenging data cleaning is. The fact that the Data Marketplace allows data sellers to buy back a cleaned and anonymized version of their dataset has been the greatest motivator for us.” She sees this as a win-win situation. The marketplace is enriched with more data for the growth of the industry, while LSPs can download the clean and anonymized version of the data they have just uploaded.
The hardest nut to crack for many LSPs is the issue of language data ownership. Who owns the data they process? Is it the translator who provides the translated text? Or the customer who provided the source text? These and other questions were significant for EGO Translating Company as well. “We had a discussion around this with our colleagues and business partners and what convinced us were the answers we found in the Who Owns My Language Data White Paper,” says Margarita. The key takeaway from the white paper is that the existing laws do not provide black and white answers to questions that appear to be simple and straightforward. Seeking clarity around what data you process and for whom is an important first step for any organization. Common sense, in combination with some essential rules of thumb, will help you get to grips with legal compliance. “It’s also good to note that different data processing and privacy rules apply in Russia and other parts of the world. We have milder laws and regulations regarding data ownership and intellectual property,” adds Evgenia.
EGO Translating Company is sure that other LSPs will be just as excited about this opportunity, although they will share similar concerns around data privacy. Margarita and Evgenia also emphasize the importance of analyzing each provider-client agreement separately to decide what data can be monetized safely.
Margarita and Evgenia are both confident that in the end most LSPs will catch up with the new reality of the industry and become more willing to share their language data. “A similar approach is widely used in other industries, such as the IT industry. They freely share their codes, APIs and program solutions as open source. This ecosystem of sharing turns out to be effective for both sides,” says Margarita. “Those who will manage to align with realities of the translation industry will also win in the end. As LSPs, we have more business potential and assets to monetize than just simply translation.”
Their datasets are available for purchase on the Data Marketplace, where AI and ML service providers can use them to train their systems with Russian-English data. In the meantime, EGO Translating Company continues to improve its own MT systems with the cleaned and anonymized version of its datasets that TAUS provides.