Hundreds of researchers, students, recruiters, and business professionals came to Brussels this November to learn about recent advances in computational linguistics and Natural Language Processing (NLP) and to share their own findings. The events that brought them all together were EMNLP 2018, one of the biggest Natural Language Processing conferences in the world, and WMT 2018, which has long been one of the most reputable conferences in the field of machine translation (MT).
As expected, MT was one of the most popular topics at both events. My general feeling is that the research community is truly coming to understand the real potential of recent breakthroughs, as shown by the use of the latest architectures, such as the Transformer and RNMT+, as baselines for the work presented on stage.
So, what's new in the world of machine translation and what can we expect in 2019?
1-Back-Translation
The back-translation technique enables the use of synthetic parallel data, obtained by automatically translating monolingual text in the target language, which is cheap and in many cases readily available, into the source language (and vice versa). The synthetic parallel data generated this way is combined with authentic parallel texts and used to improve the quality of NMT systems.
Back-translation was introduced in the early days of NMT, but it has only recently received serious attention from leading researchers, particularly on fundamental questions such as:
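The data flow described above can be sketched in a few lines. This is a minimal illustration, not any paper's pipeline: `back_translate`, `build_training_corpus`, and the ratio cap are hypothetical names, and a toy word-by-word dictionary stands in for the trained target-to-source NMT model a real setup would use.

```python
from typing import Callable, Dict, List, Tuple

# A sentence pair: (source sentence, target sentence).
Pair = Tuple[str, str]


def back_translate(
    target_monolingual: List[str],
    reverse_translate: Callable[[str], str],
) -> List[Pair]:
    """Build synthetic pairs: machine-generated source side, authentic target side."""
    return [(reverse_translate(sentence), sentence) for sentence in target_monolingual]


def build_training_corpus(
    authentic: List[Pair],
    synthetic: List[Pair],
    max_synthetic_ratio: float = 1.0,
) -> List[Pair]:
    """Combine authentic and synthetic pairs, capping the synthetic share.

    max_synthetic_ratio is the number of synthetic pairs allowed per authentic pair.
    """
    cap = int(len(authentic) * max_synthetic_ratio)
    return authentic + synthetic[:cap]


# Hypothetical stand-in for a trained English->German (target->source) model:
# a word-by-word lookup that leaves unknown words untranslated.
TOY_EN_DE: Dict[str, str] = {"the": "das", "house": "Haus", "is": "ist", "small": "klein"}


def toy_en_to_de(sentence: str) -> str:
    return " ".join(TOY_EN_DE.get(word, word) for word in sentence.split())


authentic = [("das Haus ist klein", "the house is small")]
monolingual_en = ["the house is big", "the cat is small"]

synthetic = back_translate(monolingual_en, toy_en_to_de)
corpus = build_training_corpus(authentic, synthetic, max_synthetic_ratio=1.0)
print(synthetic)  # [('das Haus ist big', 'the house is big'), ('das cat ist klein', 'the cat is small')]
print(len(corpus))  # 2
```

The ratio cap is only an illustrative knob for balancing synthetic against authentic data; the value used here is not a recommendation.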
- Does back-translation direction matter?
- How much monolingual back-translated data is necessary to see a significant impact on MT quality?
- Which sentences are worth back-translating and which can be skipped?
Answers to these questions can be found in papers published by Facebook/Google researchers and by the University of Amsterdam. Overall, we are getting smarter at selecting the synthetic data that improves both system performance and translation accuracy in NMT.
2-Byte-Pair Encoding or Character-Based?
Similar to back-translation, byte-pair encoding (BPE), which helps NMT systems translate rare words by using a fixed vocabulary of sub-words, is nothing new. Neither is the idea of training an NMT system at character level, instead of at the word level (at the cost of increased training and decoding time).
Nonetheless, it was interesting to see Google AI present a systematic comparison of BPE and character-based approaches to handling out-of-vocabulary words from a deep learning perspective. The Google team's main conclusion is that both techniques can be applied, and which one fits best depends on the amount and nature of the data that NMT system architects have at their disposal.
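To make the byte-pair encoding idea concrete, here is a minimal sketch of the merge-learning loop in the style of Sennrich et al. (2016): it repeatedly merges the most frequent adjacent symbol pair in a word-frequency table, then replays the learned merges to split a word that never appeared in training. The toy vocabulary and the helper names are illustrative assumptions; production tools such as subword-nmt or SentencePiece add many refinements.

```python
import collections
import re
from typing import Dict, List, Tuple

Symbols = Tuple[str, str]


def get_pair_counts(vocab: Dict[str, int]) -> collections.Counter:
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs: collections.Counter = collections.Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for left, right in zip(symbols, symbols[1:]):
            pairs[left, right] += freq
    return pairs


def merge_pair(pair: Symbols, vocab: Dict[str, int]) -> Dict[str, int]:
    """Replace every standalone occurrence of 'left right' with 'leftright'."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}


def learn_bpe(vocab: Dict[str, int], num_merges: int) -> List[Symbols]:
    """Greedily learn merge operations, most frequent pair first."""
    merges: List[Symbols] = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # on ties, the first pair encountered wins
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges


def segment(word: str, merges: List[Symbols]) -> List[str]:
    """Split a (possibly unseen) word into subwords by replaying merges in order."""
    symbols = list(word) + ["</w>"]
    for left, right in merges:
        merged: List[str] = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


# Toy corpus: words split into characters, with an end-of-word marker.
vocab = {"l o w </w>": 5, "l o w e r </w>": 2, "n e w e s t </w>": 6, "w i d e s t </w>": 3}
merges = learn_bpe(vocab, num_merges=10)
print(merges[:5])  # [('e', 's'), ('es', 't'), ('est', '</w>'), ('l', 'o'), ('lo', 'w')]
print(segment("lowest", merges))  # ['low', 'est</w>']
```

The unseen word "lowest" is covered by two known subwords, which is how a fixed subword vocabulary lets the system handle rare words. A character-based system skips the merges entirely and works on the raw character sequence, trading a smaller vocabulary for longer sequences.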
3-Is Automatic Post-Editing (APE) a Thing?
We have been talking about Automatic Post-Editing for a while now, but will it really be the next big thing in the field of Neural Machine Translation? Judging by what was discussed at WMT 2018, that might not be the case, at least not anytime soon.
The results of the WMT 2018 Shared Task on Automatic Post-Editing were, in fact, slightly discouraging. The researchers ended up concluding that “the output of NMT systems is harder to improve by current neural APE technology” because the “NMT translations are of higher quality.”
In spite of all this, I still believe that some forms of APE for NMT have significant potential. For example, APE could take the form of contrastive examples generated from annotated post-edited data, which could teach NMT systems to avoid certain types of errors.
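The contrastive-example idea is a suggestion rather than an established recipe, so the following is only a minimal sketch under stated assumptions: post-editing data is assumed to arrive as (source, MT output, post-edit) triples, each corrected MT output is paired with its post-edit, and Python's `difflib` locates the edited spans. `ContrastiveExample` and `make_contrastive_example` are hypothetical names, and how such pairs would be fed into NMT training is left open.

```python
import difflib
from typing import List, NamedTuple, Optional, Tuple


class ContrastiveExample(NamedTuple):
    source: str
    preferred: str      # human post-edit
    dispreferred: str   # raw MT output that the post-editor corrected
    edits: List[Tuple[str, str, str]]  # (operation, MT span, post-edited span)


def make_contrastive_example(source: str, mt: str, post_edit: str) -> Optional[ContrastiveExample]:
    """Pair an MT output with its post-edit and record the token-level edits."""
    mt_tokens, pe_tokens = mt.split(), post_edit.split()
    matcher = difflib.SequenceMatcher(a=mt_tokens, b=pe_tokens)
    edits = [
        (op, " ".join(mt_tokens[i1:i2]), " ".join(pe_tokens[j1:j2]))
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]
    if not edits:
        return None  # MT output was accepted as-is: nothing to contrast
    return ContrastiveExample(source, post_edit, mt, edits)


example = make_contrastive_example(
    source="Das Haus ist klein",
    mt="The house is little",
    post_edit="The house is small",
)
print(example.edits)  # [('replace', 'little', 'small')]
print(make_contrastive_example("Hallo", "Hello", "Hello"))  # None
```

Annotated error categories, if available, could be attached to each recorded edit so that examples can be grouped by the type of error they are meant to teach.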
4-Is Marian Winning the NMT Toolkits Game?
It's fascinating to see how the technical requirements for competing NMT toolkits are changing. Only a few months ago, the main requirements were stability and performance; nowadays, feature availability and customization flexibility are becoming the decisive factors. Of course, full support for the three state-of-the-art architectures (RNN, Transformer, and CNN) is a prerequisite.
While OpenNMT was the most popular (or at least most prominent) NMT toolkit in early 2018, Marian now seems to be gaining traction thanks to its speed, its features, and the community behind it. According to Marian's Twitter account, the toolkit was cited almost twice as often as the second most-cited one. In fact, we at Unbabel use Marian as a core component of our NMT engines and are very happy with its performance and progress.
5-Parallel Filtering is a Thing
One of the WMT shared tasks was parallel corpus filtering for MT. Participants were given a very noisy corpus of sentence pairs crawled from the web, and their task was to select a small percentage of the data that maximizes the quality of a machine translation system trained on it.
It was interesting to see that most participants used similar system configurations, which effectively establishes a baseline for applying this technique in real-world scenarios.
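The selection step can be pictured with a deliberately simple sketch. It assumes hand-written heuristics (exact-duplicate removal, untranslated copies, a length ratio) as the quality score; these stand in for whatever scoring a real submission would use. `score_pair`, `filter_corpus`, and the fixed `keep_fraction` are hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def score_pair(src: str, tgt: str) -> float:
    """Heuristic quality score in [0, 1]; higher means more likely a clean translation."""
    src_tokens, tgt_tokens = src.split(), tgt.split()
    if not src_tokens or not tgt_tokens:
        return 0.0  # one side is empty
    if src.strip().lower() == tgt.strip().lower():
        return 0.0  # source copied to target, i.e. untranslated
    # Penalize pairs whose lengths differ a lot (likely misaligned).
    return min(len(src_tokens), len(tgt_tokens)) / max(len(src_tokens), len(tgt_tokens))


def filter_corpus(pairs: List[Pair], keep_fraction: float) -> List[Pair]:
    """Deduplicate, rank by score, and keep the top fraction of non-zero pairs."""
    unique = list(dict.fromkeys(pairs))  # drop exact duplicates, keep first-seen order
    ranked = sorted(unique, key=lambda pair: score_pair(*pair), reverse=True)
    budget = int(len(ranked) * keep_fraction)
    return [pair for pair in ranked[:budget] if score_pair(*pair) > 0.0]


noisy = [
    ("das Haus ist klein", "the house is small"),
    ("das Haus ist klein", "the house is small"),          # duplicate
    ("Impressum", "Impressum"),                            # untranslated boilerplate
    ("Hallo", "hello and welcome to our website today"),   # misaligned
    ("ich mag Katzen", "I like cats"),
    ("", "empty source"),
]
selected = filter_corpus(noisy, keep_fraction=0.5)
print(selected)  # [('das Haus ist klein', 'the house is small'), ('ich mag Katzen', 'I like cats')]
```

Because the score only ranks pairs, swapping in a stronger scorer leaves the selection logic unchanged.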
If you’re curious, here’s an overview of the main findings of the evaluation campaign.
6-Deep Word Representations
Although it was not linked to any particular paper presented at the conference, BERT was widely discussed and often mentioned during presentations and coffee breaks. BERT is a new milestone in NLP: it pushed the state of the art on 11 different NLP tasks, including question answering, named entity recognition, and natural language inference.
Thanks to Google, the pre-trained models are available in multiple languages. Furthermore, Thomas Wolf from Hugging Face reimplemented BERT in PyTorch, making it readily available for research purposes.
7-Have We Finally Solved Machine Translation?
Over the course of 2017 and 2018, it's been claimed, multiple times, that machine translation is close to being solved or almost reaching "human parity" — at least in some language pairs.
Researchers from two acclaimed universities, the University of Zurich and the University of Edinburgh, set out to verify those claims. They found that while the claim may hold at the sentence level, it does not hold at the document level, where the quality gap between machine and human translation is far from closed. They go even further, demonstrating "the need to shift towards document-level evaluation as machine translation improves to the degree that errors which are hard or impossible to spot at the sentence-level become decisive in discriminating quality of different translation outputs."
In the end, we're making huge progress in machine translation, but every time we get better, people's expectations rise. It's our job to try to keep up with them.
If you would like to know more about any of the topics mentioned above or get feedback on the conference, join the TAUS MT User Group call on the 6th of December, where we will discuss these and other findings of EMNLP 2018.