The Past and the Future of the Global Content Industry
On June 25-26, 2019, more than fifty thought leaders gathered at the Grand America Hotel in Salt Lake City with one objective in mind: to define the steps for improving the global content industry ecosystem and making it ready for emerging technologies and the changes to come. This creative process was guided by industry veterans Jaap van der Meer and Renato Beninatto.
To set the context of a world becoming increasingly global, with technology only accelerating the trend, Jaap van der Meer took us on a sentimental journey down translation automation memory lane. He highlighted some of the historical and political events that shaped the language industry as we know it today, making us realize not only that some of the people in the room that day played an active role in that history, but also that some of those challenges are still with us, and that we must move faster in finding ways to overcome them.
The 50s were a decade of firsts in translation automation: the first Machine Translation Conference in 1952, and the first public demonstration of the Georgetown-IBM MT system, translating from Russian into English, in 1954. More conferences followed, along with the creation of the first research groups and the publication of scientific papers. In the mid 50s, the term “artificial intelligence” (AI), coined by John McCarthy, becomes known to the world.
The late 60s and 70s bring the first commercial ventures. In 1968, Peter Toma founds LATSEC to market Systran, the first real commercial MT company and a pioneer that has withstood the test of time. Around the same time, Logos is founded; its system, later released as open source under the name OpenLogos, translates from English and German into French, Italian, Spanish and Portuguese, and is meant to enhance the human translator's work environment.
In 1970, Logos begins working on a rule-based English-Vietnamese engine so that the US authorities could transfer military technology to South Vietnam. A year later, in 1971, a project begins at Brigham Young University to develop an automated translation system. In the mid 70s, Carnegie Mellon University's Computer Science Department develops Harpy, a continuous speech recognition system with a vocabulary of just over 1,000 words.
Skipping the decades known for the launch of various MT companies, we get to 2005. The conditions are better: there are web services, the cloud, statistical machine translation and, soon after, the open source Moses toolkit. TAUS is one year old and coming on strong with the idea that translation should be available to every citizen of the world - The Promise of Translation "Out of the Wall".
Fast forward fourteen years, and there we were, at the TAUS Industry Leaders Forum in Salt Lake City, in the era of Neural Machine Translation (NMT), and amidst the fourth industrial revolution caused by digitalization. The future is change, and we are not ready: our global content pipelines are not yet fully invisible, autonomous and data-driven, and the global content supply chain is largely fragmented. To prepare, we need to bridge the knowledge, operational and data gaps at the industry level, together!
Not Business as Usual
Aiman Copty (Oracle), JP Barraza (Systran) and Stéphan Déry (Canada Translation Bureau) shared their thoughts on traditional roles, processes and data acquisition models, to inspire honest discussions in the following sessions.
Machine Translation (MT) is not a solution that replaces humans; it makes content that you would otherwise have no time for available, explains JP. With the proliferation of NMT and the content explosion, Aiman sees an opportunity for language professionals to become language advisors. What remains a question is where we will look for these experts in the future.
Operationally, the future involves an invisible translation workflow and moving up the value chain in MT with language-specific design. Most products will have machine learning built into them, explains Aiman, and it will be the international element that a device needs to learn. At Canada's Translation Bureau, they’ve already seen a 40-55% efficiency gain from the switch from TM to NMT, but they still outsource tens of millions of dollars' worth of translations. "How do I pay my translators? How do I measure?" remain unsolved questions for Stéphan.
Data is crucial if we want translations to become an extension of the products, Aiman stresses. JP points out that in domains with little data available, you need a human in the loop. The long tail conversation and the race for data have also started for Canada's Translation Bureau, as the government announced its plan to focus on dozens of indigenous languages in the near future.
Buyers Perspective
The panel with Mimi Hills (VMware), Bert Vander Meeren (Google) and Shelby Cemer (Microsoft) set the stage for the buyers' discussion.
The operational gap for buyers seems to lie in the types of content that still require heavy manual work. Ideally, they want to move toward low- or no-touch processing. It also appears that for some, the quality tolerance threshold has gone down for certain pieces of content and locales, while on the service side there is still a single clearly defined threshold. Getting data around these expectations from customers becomes key here, even though user feedback can be hard to mine.
The data gap is clearly visible in the push from the market to go into long tail languages where data is currently missing. At the intersection between operations and data, where business decisions on the market language planning side are made, there is an opportunity for a more consolidated approach.
- Roles and skills: roles are changing, and there is an evident need for cross-pollination between teams and for equipping translators with technical skills. New roles need to emerge on the buyer side:
- Global content performance data analyst - to measure and analyze the performance results
- Technical trainer - to help developer teams learn about localization
- Cultural awareness officer and/or corporate anthropologist - to help central teams make local marketing and business decisions
- Process: a customer-centric process needs to be the end game.
- Innovation: when do you need it?
- When there is a change in volumes and/or the number of target markets
- When business strategy and revenue drivers change
- When market conditions change raising the need to balance budget with time to market
- When exceptional manual processes become the norm
- Quality: there is a need for a standardized quality approach, integrated and flexible, with a clear hierarchy and dimensions that focus on the criteria level. Quality needs to be purpose-driven, and the standard must be able to prove its business usefulness.
- Correlation: the quality paradigm needs to be shared among buyers and give them the ability to make decisions and justify investments.
- Open source or for-profit data marketplace? Open sourcing attracts talent to organizations. However, open sourcing tools might be easier than open sourcing data. The benefit of sharing data is not clear to the data owners, the incentive lies with those who have no data, and derivative work is often hard to differentiate from the source.
- Which data is relevant? Language data for training models, and metadata for adding another layer to the algorithms. Non-language data, like translator data or business data, is useful when it relates to business success metrics, like conversion, but it’s not shareable.
- Cost, quality and data provenance: how can we assess data quality or avoid duplicate data without engaging the expensive human element? As a safeguard, buyers could require that metadata about the provider be delivered along with the data.
- Other relevant actions:
- TAUS could identify scarce language pairs and specific data problems and develop best practices to address them
- Research is imperative in our industry, so more data should be made available to universities
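The duplicate-data question raised above can at least be partially automated. As a minimal sketch (all names and data here are illustrative, not from any panel or product), exact and trivially-varied duplicates in a set of translation segment pairs can be caught by normalizing and hashing each (source, target) pair:

```python
# Hypothetical sketch: deduplicating translation data without human review.
import hashlib
import unicodedata

def normalize(text: str) -> str:
    """Apply Unicode NFKC normalization, lowercase, and collapse whitespace,
    so trivially different copies of a segment hash to the same key."""
    text = unicodedata.normalize("NFKC", text)
    return " ".join(text.lower().split())

def dedupe(pairs):
    """Keep the first occurrence of each normalized (source, target) pair."""
    seen = set()
    unique = []
    for source, target in pairs:
        key = hashlib.sha256(
            (normalize(source) + "\t" + normalize(target)).encode("utf-8")
        ).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append((source, target))
    return unique

pairs = [
    ("Click Save.", "Cliquez sur Enregistrer."),
    ("click  save.", "cliquez sur enregistrer."),  # near-duplicate, dropped
    ("Open the file.", "Ouvrez le fichier."),
]
print(dedupe(pairs))  # two unique pairs remain
```

This only removes surface-level duplicates; catching paraphrased or fuzzy duplicates would need similarity measures (edit distance, embeddings), which is where the cost discussion above begins.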
Technology Solution Providers Perspective
The panelists Jim Saunders (SDL), Jack Welde (Smartling), Sarah Weldon (Google) and Bob Willans (XTM) each share the same perspective when it comes to MT and AI: they see them as catalysts for change and the path to a no-touch automated supply chain. The technology marches forward and we need to catch up.
Customization is the new niche, and that is where data is most needed. When it comes to the knowledge gaps, the panelists stressed the importance of the cultural mind shift that has to occur among people, not only the scarcity of resources.
- External and internal gaps:
- In the industry, the lack of technological understanding creates logical resistance to change
- Within technology companies there is a lack of understanding regarding:
- the selling process on the client side;
- the complexities that other languages add, as the development focus is mainly on English;
- the impact of the work, in terms of sales, clicks, etc.;
- Communication: there is a disconnect between the technology providers and end users, as well as between developers and clients.
- Suggested solution:
- TAUS to create video documentation (a mythbusters series for MT) and drive industry terminology standardization
- Can we standardize and maintain differentiation? Anybody should be able to add the standards to a platform. The competitive advantage is in the platform itself and the layers that are built on top: customization, specialization, the ease of use.
- Next thing for MT:
- moving from the sentence level to the paragraph and document level, handling more context
- augmented and adaptive MT
- Value, provenance & lifecycle: the value of data depends on its utility. To establish utility, data needs to be validated through an automated system and/or manual checks. At the same time, its provenance needs to be clear: which hands did it go through? Lastly, there needs to be an agreement on when data is considered old or obsolete.
- Data governance: who has access to data needs to be clearly defined
- Business opportunity for technology providers: algorithm to clean and prepare the data
Language Service Providers Perspective
“Let’s stop talking about the factory, let’s talk about the outcomes!”, said Jaime Punishill (Lionbridge), comparing the development of the global content industry with fintech. Panelists Erik Vogt (RWS Moravia), Olga Beregovaya (Welocalize), Garry Levitt (thebigword) and Allison McDougall (AMPLEXOR) continued along the same line of thought.
Not all content is created equal, and language service providers need to look at who is going to use the content, where it will be served and consumed and what its shelf life is. The content transformation introduces new value added services based around data, marketing and digital transformation.
They also need to follow customer demand and innovate on behalf of the client at the same time, without losing their reputation for problem solving or their margins. One way to help close the knowledge gap is to go beyond the usual pool of linguists and help customers identify talent.
- Hiring strategic thinkers: service providers are in a position where they need to get more strategic in order to support the multidimensional mass customization. That starts by recruiting strategic thinkers as project managers.
- Learning curve issue: in a continuous improvement cycle, the regular learning curve is interrupted and accelerated. That needs to be taken into consideration when developing training materials.
- Opportunity for rebuilding trust: using new tools and channels to recruit talent for clients.
- Alignment: aligning service providers' and clients' KPIs is key
- Measuring: service providers often lack access to field data and content performance data, so they need to rely on client feedback to drive their strategy. There should be a set of standard success measurements for providers.
- Quality: Human LQA for MT is cumbersome and costly. A possible alternative could be to run MT through A/B testing instead.
- Legacy: there is still a strong cultural resistance to re-thinking the whole process among the providers
- Opportunity: the service provider leads the effort and takes the customer where they want to go, so providers are best positioned to design new metrics for translation and source content profiling.
- Where to start with data? Start by logging your quality and TM data; you can analyze it later.
- CSAT: establishing measurements for customer satisfaction, and actually measuring it, is imperative
- Internal data: having a feedback loop between linguists and the production team is a must
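The A/B testing idea mentioned above can be sketched concretely. Assuming (hypothetically) that two MT engines are shown to comparable user groups and we track a business metric such as the share of users who accept the translated answer, a standard two-proportion z-test tells us whether the observed difference is likely real. The counts below are made up for illustration:

```python
# Hypothetical sketch: comparing two MT engines via A/B testing instead of human LQA.
import math

def two_proportion_z(success_a, total_a, success_b, total_b):
    """Two-sided two-proportion z-test; returns (z, p_value)."""
    p_a = success_a / total_a
    p_b = success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    # Normal CDF via the error function: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Variant A: current engine; variant B: candidate engine (invented counts).
z, p = two_proportion_z(success_a=420, total_a=1000, success_b=465, total_b=1000)
print(f"z={z:.2f}, p={p:.3f}")
```

With these made-up numbers the p-value lands below the conventional 0.05 threshold, so the candidate engine's higher acceptance rate would be treated as significant; the same logic applies to clicks, conversions or any other success metric the provider and client agree on.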
It is clear that the industry is in a phase of transformation and that MT is not the boogeyman anymore. While buyers are focused on high-level topics like making the process invisible, data privacy and GDPR, providers are rushing to develop new value-added services and are worrying about access to data and data ownership. There is a clear need for more collaboration, more consolidation and a shift in focus to the outcomes, not the factory!