Today, AI breakthroughs in surpassing human ability in certain activities make headlines. AI capacity has been improved not only by advances in AI and ML algorithms and available compute power but, more specifically, by the massive increase in digital data. We are now searching for photographs, videos, audio, and language data in speech and writing, in addition to classifying all forms of collected data, all in an effort to teach human perception of real-world experiences to AI systems. This ranges from verbal, non-verbal, visual, and written communication to smart actions often attributed to humans, such as driving, making ethical judgments, and so on.
Today, only an estimated 2% of all newly-created data in Western Europe are processed using AI algorithms once they have left the device on which the data are captured or created. This proportion is expected to grow to well above 50% by the end of 2025. AI is here right now, available to and benefiting us all in the form of autonomous vehicles, weapons, and domestic devices, including smart speakers, games consoles, and more. We are increasingly surrounded by - even embedded in - pervasive automated perception, analysis, and action.
Yet there is little understanding of the consequences of this for our social order and existing industries; until recently they have barely been studied. One field that AI has already deeply penetrated is the language industry, but its potential and limits have not yet been fully defined.
In an effort to explore the impact of AI on the language industry, TAUS has now published the Language Data for AI Report. It describes Language Data for AI (LD4AI for short), a new industry sub-sector resulting from converging trends in the AI and language industries. We begin by dialing back a few decades to analyze how language evolved into data and how that megatrend is starting to reinvent parts of the professional language industry. Our goal with this report is to offer context and perspective for assessing the opportunities and challenges facing both buyers and new providers entering this industry.
We’ve built this report on the findings shared in two previous TAUS reports: the TAUS Data Market White Paper (June 2017) and the Translation Data Landscape Report (Dec 2015). To establish our findings, we carried out one-on-one interviews and questionnaires with data services providers including Lionbridge, Clickworker, Welocalize, Unbabel, Utopia Analytics, BasicAI, and PacteraEdge. To understand the tendencies within the traditional language industry, we surveyed 205 LSPs, asking for their insights on whether the language industry is at least partially transforming into a data industry. Lastly, we interviewed topic experts including Paul Leahy (Oracle), Shahram Khadivi (eBay), Marcello Federico, and Suzanne Fogarty (Microsoft).
LD4AI is well en route to becoming a fully-fledged business sector. In this report, we share insights into the how, why, and where-to of this emerging sub-sector under the following headings:
- Introduction: Previous TAUS reports and the rationale behind this new publication
- Birth of a New Industry Sector: Language Data for AI (LD4AI): Converging trends in the AI and language industries
- Data and Language: Noam Chomsky versus Randolph Quirk - the back story to the growing awareness of the unreasonable effectiveness of data.
- Translation Economics: Meanwhile in the real world: automatic translation has become a thousand times more productive than human translation and laid a solid foundation for expansion into new services and business in the language sector.
- The Data-First Paradigm Shift: There is enough evidence, both from a technology and from an economic perspective, that the translation industry is in the midst of a paradigm shift. Computer-aided translation is making space for a data-first approach.
- The Language Data Landscape: How language service providers branch off into new services and change the translation landscape.
- Profiling the New Cultural Professional: The deep need for native, organic data changes the profile of the workers needed in the global language business.
- Trends and Takeaways: Five takeaways from our explorations in this new sub-sector.
- TAUS Data Marketplace: How the Data Marketplace serves as an enabler and accelerator in the Language Data for AI sector.
How big is the LD4AI sub-sector? Will it get any bigger? How will the role of linguists shift as crowdworking platforms become more common? What LD4AI services are language service providers currently offering? What will be the future of LD4AI services? How does an open marketplace for language data monetization and acquisition fit into this new picture, and can you become a part of it? The full report provides answers to these questions and more. To learn more about the survey results and the notes on interviews with the players and experts, get your free copy!