TAUS recently held two webinars on how major industry players – buyers and suppliers – perceive the current challenges in the translation industry, ranging from tools and NMT to datafication and more.
We used our “Modern Translation Pipeline” (MTP) model as a benchmark for measuring progress and ideas. And we discovered some interesting developments in terms of quality measurement, jobs, communities and data applications.
The diagram below shows how a datafied, ML-driven vision of the process can drive future workflows. It offers various pathways through the pipeline, depending on different quality requirements.
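To make the idea of quality-dependent pathways concrete, here is a loose sketch of how such routing could work. The tiers, thresholds, and names are illustrative assumptions, not the actual TAUS model:

```python
# Hypothetical sketch of quality-based routing in a datafied pipeline.
# Tier names, thresholds, and fields are illustrative, not the TAUS model.
from dataclasses import dataclass

@dataclass
class Segment:
    text: str
    mt_confidence: float  # score from an MT quality-estimation step
    quality_tier: str     # e.g. "highest", "good-enough", "faut"

def route(segment: Segment) -> str:
    """Pick a pathway through the pipeline for one content segment."""
    if segment.quality_tier == "highest":
        return "mt + full human post-editing + review"
    if segment.quality_tier == "good-enough":
        # High-confidence MT can skip the human loop entirely
        if segment.mt_confidence >= 0.9:
            return "raw mt"
        return "mt + light post-editing"
    return "raw mt"  # FAUT: fully automatic useful translation

print(route(Segment("Add to cart", 0.95, "good-enough")))  # raw mt
```

The point of the sketch is only that, once quality requirements and confidence scores are data, the choice of pathway becomes a routing decision rather than a one-size-fits-all process.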
This blog summarizes the perspectives of each speaker on various key topics, providing a summary of two rich hours of conversation.
The Buyer participants were Paul Leahy (Oracle), Silvio Picinini (eBay), and Chris Pyne (SAP). The Suppliers were Eric Vogt (Moravia), Marcus Casal (Lionbridge), Chris Grebisz (Welocalize), and Azad Ootam (SDL).
The path towards datafication
How far has the industry reached in the journey towards a data-driven, automatable translation process? Our guests all identify with the model while naturally adapting it to their individual status and needs. Buyers vary in their state of advancement, while suppliers focus more on the details in their processes that have yet to be solved.
Chris Pyne (SAP) is at a pivotal moment of moving to a more data-driven model, which will deliver high quality translations based on high confidence scores. He sees two other dimensions to be added – one is how to integrate humans into the loop, and the other is how to switch from a 9am to 5pm to a 24/7 working mode using fully automated management.
Silvio Picinini (eBay) is in mid-process, using a FAUT (fully automatic useful translation) approach to content for every object on sale on their site.
At Oracle, Paul Leahy had his Quantum Leap moment over five years ago and is currently logging almost every aspect of throughput, including how translations are received by customers. Like the others, Oracle is leveraging neural MT (“a game changer”). The company watchword is translation availability rather than translation quality, which is now an accomplished fact.
Chris Grebisz (Welocalize) is finding that entrenched processes and staff issues complicate data capture. Performance and productivity appear to result from invisible sources, so they are mapping the journey as they go along.
Eric Vogt (Moravia) sees datafication as the beginning of a rollercoaster. The average weighted word price has dropped from 35 cents to 2 cents; there is a struggle with proliferating technology, with controlling the chaos at the development stage of new products, and with adapting and rearranging the workflow to fit the process using microservices.
Marcus Casal (Lionbridge) finds it remarkable to be able to deliver more baseline MT output into more languages (sometimes post-edited) and thereby open up more opportunities. As MT becomes more predictable, the aim is to datafy aspects such as content profiles for predicting translation difficulty and which is the best process. MT is only the tip of the iceberg of this huge machine learning and data journey.
Azad Ootam (SDL) reckons that the future will involve reassembling smaller parts of the flow to customize to each customer. This means analyzing content using neural technology and then routing it to the right person or process. But the real underlying challenge for the industry is about the people, about how linguists are equipped with the right tools and processes. Change management is vital, as the latest cool tool is not going to eliminate personal skills.
How do we manage the tool “zoo”?
How large is the “zoo of tools” – the range of technology solutions currently in use? Can it be reduced, concentrated, improved? How do tools impact post-editing?
SP (eBay) reckons their Globalization Management System has limited support for integration with neural MT, and their tool strategy is still a work in progress. For post-editing, though, eBay has very particular constraints on the kind of quality they want – meaning over style – which is often hard to convey to translators.
At SAP, CP has amassed a rich collection of tools from successive acquisitions, so agility rules, and suppliers do not have to accept a single standard overnight. He foresees a reduction in this tool complexity as more efficient cloud tools are rolled out.
PL (Oracle) says that after integrating 80 to 90 companies, they fold each acquisition's processes into the journey towards microservices via a wizard and an API. The core of the GMS is the translation memories, which still offer gold quality for product translation. Oracle is now moving towards neural MT internally for user-assistance content, aiming for complete auto-translation. Tools are also used to parse content for translatability, so impossible content can be flagged early.
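A translatability check of the kind described could look something like the following sketch. The heuristics and function name are hypothetical, chosen only to show the idea of flagging problem content before it enters the pipeline:

```python
import re

# Hypothetical heuristics for flagging content that is hard or impossible
# to translate, so it can be caught early rather than mid-pipeline.
def translatability_issues(text: str) -> list[str]:
    issues = []
    # Placeholder-only strings like "{0}" or "%s" with no real words
    if re.search(r"\{\d+\}|%[sd]", text) and not re.search(r"[A-Za-z]{3,}", text):
        issues.append("placeholder-only string: nothing to translate")
    # A single all-caps token is likely an acronym or identifier
    if len(text.split()) == 1 and text.isupper():
        issues.append("bare acronym/identifier: likely untranslatable")
    # snake_case tokens suggest code leaked into the source text
    if re.search(r"[A-Za-z]+_[A-Za-z_]+", text):
        issues.append("snake_case identifier: may be code, not prose")
    return issues

for s in ["Save your changes", "{0}%s", "HTTP", "user_id missing"]:
    print(s, "->", translatability_issues(s) or "ok")
```

In a real pipeline these checks would of course be far richer (markup balance, length limits, locked terminology), but the principle is the same: treat translatability as data computed up front.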
What are suppliers doing with neural?
AO (SDL) says it’s mainly a question of trial and error, developing different technology solutions for different customers. SDL is also trying out generating content for customers. Time will tell where this will go.
MC (Lionbridge) says that in the shift to ML and AI algorithms, the next step will be mapping work to the best MT model and developing a self-learning approach. Human communities also need to be differentiated – transcreation is not like post-editing – in order to develop smaller, more specialized work communities.
CG (Welocalize) is experimenting with NMT, training the workforce to make the best use of it to produce quality content where needed. Content types are being aligned to resources, to see whether AI can be used to predict volumes in supply chains.
How do MTP quality levels correspond to needs?
CP: SAP uses three levels:
- User interface content (where there is instant feedback) has to be of the highest quality.
- Lightly edited MT requires a special mindset. Perhaps communities would be better at this than translators?
- FAUT quality is improving thanks to NMT (SAP has about 100 engines across various departments) and is applied where there was no translation before.
SAP now needs to extend operations to long-tail languages where data is scarcer.
SP (eBay) is starting to use NMT for parts of their localization program, using Good Enough quality to maximize benefits. They have their own engine for the FAUT output but also partner with MT vendors for content in several languages. eBay is achieving a 16% gain in quality with NMT over human translation!
PL (Oracle) is aiming for a single “good enough” quality for product content, which now means no post-editing on documentation. In the case of NMT, Oracle has ramped up investments using its own content to build new engines. A more difficult case is colloquial language, where obtaining colloquial translation data in Asian languages is harder.
Any notable failures along the way?
PL (Oracle) says its journey ten years ago with Moses statistical machine translation was a failure, because the company would not take it up as a satisfactory service. NMT has had an easier path: engineers throughout the company now use machine learning more generally to develop integrated workflows, so conversations around NMT flow more readily.
SP (eBay) says his idea of Good Enough quality (meaning-focused light reviewing/post-editing) was a failure at first, as it was hard to define the level clearly enough. The right features need to be prioritized and communicated to reviewers.
CP (SAP) evoked a possible future failure! SAP has developed a translation ecosystem – a global network of mostly single-language vendors – that he wants to expand into a crowd-based community of crowd workers, professional translators and LSPs. The key change will be from an 8×5 to a 24/7 paradigm, driven by the same stakeholders worldwide. The experiment could fail.
Suppliers: How do you feel about disintermediation?
MC (Lionbridge) says that humans need to be profiled in the feedback loop around MT. There are multiple overlapping communities, both inside and outside of the company (including translators and transcreators), so they are moving towards a nuanced view of community management. Trying to disintermediate for its own sake may not be the right way forward.
AO (SDL) claims that data is the key, and it must be harnessed into an insight engine. Communities must also be built in areas where professional translators don’t exist. Disintermediation is about knowing how much work is “commodity” and can be done by anyone, and how much requires content personalization. The next step is to think of human resources and other ecosystems and where people are available. The two-hour turnaround will influence this.
CG (Welocalize) suggests that while translation has been about the transformation of content, there is now a new layer – translation with intent. How can the workforce be given the right information to introduce intent into that content (whether from MT or not)? Disintermediation means taking out the middle layer between customers and the LSP. Yet the real point is to create data that informs the supply chain and produces content as a rolling requirement. It’s not clear whether disintermediation will help or hinder this.
How will jobs change as a result of these pipeline developments?
SP (eBay) believes that from a buyer's perspective, the MTP points to jobs for data analysts and reviewers (who would tune the tone of the translation to the voice of the customer). This could be a useful idea for buyer companies, with linguists working with data from MT. It is not transcreation, but it involves a change of role and will be very important.
EV (Moravia) says a single project manager can handle a lot of work. Most of the boring bits are now automated. It’s the same with translations, where lower quality will be handled by MT and higher by transcreation. Planning will be built into the technology, so PMs will have to handle more activity.
Overall, then, we are finding considerable resonance among both buyers and suppliers with the TAUS model of the automated translation pipeline. Naturally each company will need to adapt the details of the pathway to their current state of advancement and their market positioning. But there’s no question about the overall direction the industry will be taking.
This comes out particularly clearly around the question of our quality “levels” – a dynamic concept that can be adapted to the purpose of the content instead of obliging everyone to focus on time-consuming error accountancy.
Yet the overriding message from our participants is that we haven't seen anything yet when it comes to automation through datafication and machine learning. That includes how translator and project manager jobs and communities will adapt, leading in turn to a new round of change management across the industry.