Noising Scheme for Data Augmentation in Automatic Post-Editing

WonKee Lee; Jaehun Shin; Baikjin Jung; Jihyung Lee; Jong-Hyeok Lee

Abstract

This paper describes POSTECH’s submission to the WMT20 shared task on Automatic Post-Editing (APE). Our focus is on increasing the quantity of available APE data to overcome the shortage of human-crafted training data. We implemented a noising module that simulates four types of post-editing errors and introduced it into a Transformer-based multi-source APE model. During training, the noising module implants errors into the target side of parallel corpora to produce synthetic MT outputs, increasing the total number of training samples. We also generated additional training data using the parallel corpora and NMT model that were released for the Quality Estimation task, and we used these data to train our APE model. Experimental results on the WMT20 English-German APE data set show improvements over the baseline in terms of both TER and BLEU: our primary submission achieved an improvement of -3.15 TER and +4.01 BLEU, and our contrastive submission achieved an improvement of -3.34 TER and +4.30 BLEU.
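
The noising idea in the abstract can be illustrated with a minimal sketch. The abstract does not name the four error types, so the sketch assumes they correspond to the usual edit operations (insertion, deletion, substitution, shift); the function names `add_noise` and `make_triplet`, the noise probability `p`, and the uniform choice among operations are all hypothetical and are not taken from the paper.

```python
import random


def add_noise(tokens, vocab, p=0.1, rng=None):
    """Return a noisy copy of `tokens`, standing in for a synthetic MT output.

    Assumption: each token is corrupted with probability `p` by one of four
    operations chosen uniformly (insert, delete, substitute, shift).
    """
    if rng is None:
        rng = random.Random()
    out = []
    for tok in tokens:
        if rng.random() >= p:
            out.append(tok)  # keep the token unchanged
            continue
        op = rng.choice(("insert", "delete", "substitute", "shift"))
        if op == "insert":
            # keep the token and add a random vocabulary word after it
            out.extend([tok, rng.choice(vocab)])
        elif op == "delete":
            pass  # drop the token
        elif op == "substitute":
            out.append(rng.choice(vocab))  # replace with a random word
        else:
            # shift: place the token at a random position among the tokens
            # emitted so far (this may happen to be its original position)
            out.insert(rng.randrange(len(out) + 1), tok)
    return out


def make_triplet(src_tokens, tgt_tokens, vocab, p=0.1, rng=None):
    """Build a (source, synthetic MT, post-edit) triplet from a parallel pair.

    The clean target side plays the role of the post-edited reference.
    """
    return src_tokens, add_noise(tgt_tokens, vocab, p, rng), list(tgt_tokens)
```

Calling `make_triplet` on each sentence pair of a parallel corpus at training time would yield a fresh noisy "MT output" per pass, which is one way the number of distinct training samples could grow without additional human post-edits.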

Anthology ID: 2020.wmt-1.83
Volume: Proceedings of the Fifth Conference on Machine Translation
Month: November
Year: 2020
Address: Online
Editors: Loïc Barrault, Ondřej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Alexander Fraser, Yvette Graham, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, André Martins, Makoto Morishita, Christof Monz, Masaaki Nagata, Toshiaki Nakazawa, Matteo Negri
Venue: WMT
SIG: SIGMT
Publisher: Association for Computational Linguistics
Pages: 783–788
URL: https://aclanthology.org/2020.wmt-1.83
Cite (ACL): WonKee Lee, Jaehun Shin, Baikjin Jung, Jihyung Lee, and Jong-Hyeok Lee. 2020. Noising Scheme for Data Augmentation in Automatic Post-Editing. In Proceedings of the Fifth Conference on Machine Translation, pages 783–788, Online. Association for Computational Linguistics.
Cite (Informal): Noising Scheme for Data Augmentation in Automatic Post-Editing (Lee et al., WMT 2020)
PDF: https://aclanthology.org/2020.wmt-1.83.pdf
Video: https://slideslive.com/38939646
Data: eSCAPE
