
Eric Yu: Cross-Language Big Data Inspires the Future

In this interview, Eric Yu, CEO of GTCOM, explains how cross-language big data inspires the future.

Captains of the Translation Industry Talk About the Single Biggest Thing They've Learnt

TAUS spoke with Eric Yu, CEO of GTCOM (Global Tone Communication Technology Co., Ltd.), the world’s largest services provider in the combined field of translation, big data and artificial intelligence.

At the core of the conversation lies this question: “What is the single biggest lesson that you have learnt about the translation industry?”

Eric Yu: What I have learnt most about the translation industry is that it can be considered just one part of a much more all-encompassing “industry” that we call cross-language big data.

I come from a background deeply rooted in the translation industry. Having majored in conference interpreting, I worked for China Translation Corporation (CTC), the largest LSP in China, where I served as Head of Conference Interpreting, CMO, Assistant President and then Vice President. I was appointed CEO of GTCOM when it was incorporated as a subsidiary of CTC in 2013. What I have realized in this career is that many related language phenomena can now converge in this concept of language as data in a world with artificial intelligence.

As we all know, machine translation is one of the most complicated parts of NLP and artificial intelligence. We call it “the jewel in the crown” of NLP. We started to invest deeply in R&D for machine translation in 2014, giving us a chance to better understand NLP, speech recognition, big data and artificial intelligence.

Then in October 2015, we put forward the “cross-language big data” concept, which basically involves managing data on an Internet scale. For example, instead of searching in English or Russian or Chinese using input terms and finding content only in those languages, what if we simply let people input a word in their language but eliminated all those language labels? That way we would be able to have access to all relevant results in any language - Chinese, English, Russian, German and so on. And what if we were then able to analyze all those data quantitatively and qualitatively? That is the essence of our cross-language big data concept.

Naturally this involves using real-time machine translation. But that is only a kind of “language switch.” I began to wonder what we could achieve if we could analyze all the texts discovered from one search term, or from one piece of news returned to the user: the persons involved, the time, the location, the entities, and all the other knowledge and information contained in that news item or document. What if we then extended this search by gathering all the data from the past ten or twenty years and analyzing them in the same way - qualitatively and quantitatively? That capability is lacking, so we are trying to offer it through this concept.

TAUS: How do your background and interest in interpretation fit into this vision?

Eric: In early 2013, we launched our Global Multilingual Call Center. Previously, in 2008, when I was a member of the Global Advisory Committee on language line services, I proposed that we try to see whether simultaneous interpreting could be provided over the phone. At that time it seemed like a new idea, and it still is.

So we started to build our multilingual call center in 2012, and when we first started GTCOM as an independent company, we launched our Global Multilingual Call Center. Following recent developments, I think this is now the largest call center in China. It provides services in 12 languages and receives an average of 500,000 minutes of calls per month from all over the world. For example, we have more than 400 interpreters working exclusively for China UnionPay. Moreover, we also provide big data analysis. Every month, we analyze all the call data and generate reports, so this Center is no longer just a call center in the traditional sense, but increasingly a big-data center.

TAUS: In terms of size and volume, how would you compare GTCOM to its closest competitors?

Eric: As I said, GTCOM is now a big data company and far bigger than any other translation company in China, and maybe in many other countries!

We started working on machine translation, for example, in 2014 and on big data in 2015. Each year we have invested on average about USD 30 million in R&D for machine translation. We have already filed a dozen patents for MT-related technologies. Our machine translation supports a total of 33 different languages, including Chinese, English, German, French, Japanese, Arabic, Portuguese, Russian and Korean. Among them, about 25 languages can be translated with our own Neural Machine Translation engine.

In addition to our translation and Call Center activities, we also provide video localization services, which are part of our translation services in China. About 85% of the work in this area is carried out using our YeeCaption toolkit, a one-stop smart subtitle translation tool.

Let me give you an example of a video job from March 2017. Our client planned to launch a short video clip business on a platform rather like YouTube, which brought large numbers of videos from overseas into China. So we mobilized nearly 700 translators and localized about 830 hours of multilingual videos from a wide variety of content categories in just ten days. On top of the localization, we have become an IP provider, signing IP copyright partnerships with 37 of the biggest IP providers overseas, making us the exclusive operator for their video content in China. So once again, we have decided to go beyond the technical art of localization and work in distribution and production as well.

TAUS: Your ambitions are clearly far greater than just providing a translation service. Do you feel ready to take on technology companies such as Google yet? 

Eric: Unlike Google and other MT technology providers, we provide a domain-specific MT engine. In certain fields, our MT delivers higher quality as it can be tailored to news domains such as financial news, military news and others. In addition to our machine translation, we can collect data in about 65 different languages from 200+ countries, and keep updating this daily. Globally, we update about 30 million articles and 500 million social media messages daily. According to data analytics firm Palantir, this is estimated to be worth $20 billion.

For us, cross-language big data is unstructured open-source data related to open-source news and social media such as Twitter and Facebook, and WeChat and Weibo in China. We can analyze each piece of unstructured data. Our current line of big data products includes JoveBird, which is designed to take advantage of big data and AI technologies to offer financial investment solutions. Using a set of financial analysis models and powerful cross-language big data processing capability, it helps investors analyze stock-price trends and strategize their investments.

TAUS: Are there any plans to expand into other continents or countries? Or will GTCOM remain a purely China-based operation? 

Eric: We are indeed seeking new partners and targets, for example in localization. We’re also looking at advertising, consulting, and big data companies. China is already a huge market, and we are able to respond to the huge local-market demand of our customers. On the other hand, we have more than 10 products in our big data line-up, including a big data analytical platform for social media and a news toolkit. We also have an industrial big data platform, a mining platform and a series of other big data platforms - all leading big data technologies. So what we hope is that, together with our target partners, we will be able to provide them with cutting-edge data technologies, and help them first explore the Chinese market and later the global market.

TAUS: Is translation slowly becoming a smaller part of your business while big data and AI grow bigger? 

Eric: Yes, our language services are playing a smaller role in overall GTCOM revenue, but at the same time, this market is set to grow significantly. Language services are at a completely different market level compared to big data and AI, which are growing much faster. What I’d like to highlight is that the language industry should be developed in an artificial intelligence direction, not in the traditional way. As I said in my keynote at the FIT congress in Australia, we must be open to AI, and all work and grow together with it for the future of the language industry. We work, for example, with large players such as Haier, GE, Alibaba and many other industry clients. Beneath our entire big data platform, we now have a language technology infrastructure. So our big data platform can handle our key asset - cross-language big data.

That’s the future. Take our industrial big data platform as an example: all those language technologies have been automatically embedded in the big data platform. Underneath the platform we have the language technology infrastructure. In this way, we have established a link between language and data, initiating a brand new future for the language service industry.


Andrew Joscelyne

Long-time European language technology journalist, consultant, analyst and adviser.