Using data is no longer just a good idea. Now that so many businesses are using and monetizing data, building a data-driven business has become essential for keeping up with the competition. According to the technology adoption lifecycle model, the first group of people to use a new technology is called “innovators”, followed by “early adopters”. Next come the “early majority” and “late majority”, and the last group to adopt a new technology is called “laggards”. Being among the early adopters brings a head start advantage.
A head start advantage can be simply defined as a company’s ability to be better off than its competitors as a result of being first to market in a new category. Although no advantage lasts forever, companies that succeed in building durable head start advantages tend to dominate their categories for many years, from a market’s infancy until well into its maturity.
The use of language data in AI and ML applications is a fast-growing technology that has almost passed the early adoption phase. The linguists and language service providers (LSPs) who see the business opportunity in this new data realm will have an advantage over the late majority and laggards.
TransLink, based in Russia and #84 in the global ranking of LSPs, is one of the early adopters of language data monetization. “Having been among the earliest adopters of advanced technologies such as NMT, our team couldn’t have let the opportunity the Data Marketplace offers slip,” says Mikhail Gilin, Head of R&D at TransLink. The company sees its participation in the Data Marketplace as a business opportunity to monetize the language data it generates, and it sees early adoption as a significant head start for growing in this new space. It also believes that making its multilingual data available provides a valuable resource for research into technologies that benefit the language industry as a whole.
On the Data Marketplace, TransLink has published five corpora in the news and sports domains, from Russian to English, German, Spanish and French. This is bilingual translation data collected over the course of the 2018 World Cup, for which TransLink was the main language service provider for written communications. Whenever data sharing comes up, many LSPs return to the same question: who owns the data, and how is privacy protected? “It’s significant to differentiate between the Translation Memories (TM) and the corpora we upload,” says Mikhail. “The datasets can be crawled or based on an existing TM and in that case the segments are altered beyond recognition.” He also highlights that all numeric units are reduced so that they no longer represent any personally identifiable information, and that private information is removed or replaced before a dataset is made available for purchase.
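The kind of clean-up described above can be illustrated with a minimal sketch. It is not TransLink's actual pipeline, which the article does not describe in detail. It assumes, purely for illustration, that "reducing" numbers means masking every digit and that private information is limited to e-mail addresses. The function names and the `<EMAIL>` placeholder are hypothetical.

```python
import re
from typing import Tuple

# Hypothetical patterns for illustration only; a real pipeline would
# cover many more kinds of private information (names, addresses, etc.).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
DIGIT_RE = re.compile(r"\d")


def anonymize_segment(text: str) -> str:
    """Replace e-mail addresses with a placeholder, then mask every digit."""
    text = EMAIL_RE.sub("<EMAIL>", text)
    return DIGIT_RE.sub("0", text)


def anonymize_pair(source: str, target: str) -> Tuple[str, str]:
    """Apply the same masking to both sides of a bilingual segment."""
    return anonymize_segment(source), anonymize_segment(target)


if __name__ == "__main__":
    src, tgt = anonymize_pair(
        "Позвоните по номеру 495 123-45-67 или напишите на press@example.com.",
        "Call 495 123-45-67 or write to press@example.com.",
    )
    print(src)
    print(tgt)
```

Applying identical rules to the source and target sides keeps the two halves of each segment aligned, which matters when the corpus is later used to train MT systems.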
As the Data Marketplace continues to grow, so do the expectations of existing and potential data sellers and buyers. “We have multilingual corpora, and we are considering uploading multilingual data into the Data Marketplace. That improvement could let LSPs form quality multilingual MT systems, both for production and research purposes,” says Mikhail, adding that “the latter is very important, too, because the collection of the exact same corpus in different languages allows for adequate research of the NMT algorithms. We see that as a great scientific opportunity for those doing linguistic research in the translation industry.”
By joining the Data Marketplace ahead of many other LSPs, TransLink is securing its share of the emerging market for language data for AI. “This platform is unique for the moment. And by the time there are any other similar platforms, Data Marketplace will provide a huge advantage both in terms of data processing expertise and overall volume of the uploaded corpora,” says Mikhail Gilin.