Infosys Machine Translation System for WMT20 Similar Language Translation Task
Kamalkumar Rathinasamy, Amanpreet Singh, Balaguru Sivasambagupta, Prajna Prasad Neerchal, Vani Sivasankaran
Abstract
This paper describes Infosys’s submission to the WMT20 Similar Language Translation shared task. We participated in Indo-Aryan language pair in the language direction Hindi to Marathi. Our baseline system is byte-pair encoding based transformer model trained with the Fairseq sequence modeling toolkit. Our final system is an ensemble of two transformer models, which ranked first in WMT20 evaluation. One model is designed to learn the nuances of translation of this low resource language pair by taking advantage of the fact that the source and target languages are same alphabet languages. The other model is the result of experimentation with the proportion of back-translated data to the parallel data to improve translation fluency.- Anthology ID:
- 2020.wmt-1.52
- Volume:
- Proceedings of the Fifth Conference on Machine Translation
- Month:
- November
- Year:
- 2020
- Address:
- Online
- Editors:
- Loïc Barrault, Ondřej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Alexander Fraser, Yvette Graham, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, André Martins, Makoto Morishita, Christof Monz, Masaaki Nagata, Toshiaki Nakazawa, Matteo Negri
- Venue:
- WMT
- SIG:
- SIGMT
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 437–441
- Language:
- URL:
- https://aclanthology.org/2020.wmt-1.52
- DOI:
- Bibkey:
- Cite (ACL):
- Kamalkumar Rathinasamy, Amanpreet Singh, Balaguru Sivasambagupta, Prajna Prasad Neerchal, and Vani Sivasankaran. 2020. Infosys Machine Translation System for WMT20 Similar Language Translation Task. In Proceedings of the Fifth Conference on Machine Translation, pages 437–441, Online. Association for Computational Linguistics.
- Cite (Informal):
- Infosys Machine Translation System for WMT20 Similar Language Translation Task (Rathinasamy et al., WMT 2020)
- Copy Citation:
- PDF:
- https://aclanthology.org/2020.wmt-1.52.pdf
- Video:
- https://slideslive.com/38939615
Export citation
@inproceedings{rathinasamy-etal-2020-infosys, title = "Infosys Machine Translation System for {WMT}20 Similar Language Translation Task", author = "Rathinasamy, Kamalkumar and Singh, Amanpreet and Sivasambagupta, Balaguru and Prasad Neerchal, Prajna and Sivasankaran, Vani", editor = {Barrault, Lo{\"\i}c and Bojar, Ond{\v{r}}ej and Bougares, Fethi and Chatterjee, Rajen and Costa-juss{\`a}, Marta R. and Federmann, Christian and Fishel, Mark and Fraser, Alexander and Graham, Yvette and Guzman, Paco and Haddow, Barry and Huck, Matthias and Yepes, Antonio Jimeno and Koehn, Philipp and Martins, Andr{\'e} and Morishita, Makoto and Monz, Christof and Nagata, Masaaki and Nakazawa, Toshiaki and Negri, Matteo}, booktitle = "Proceedings of the Fifth Conference on Machine Translation", month = nov, year = "2020", address = "Online", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2020.wmt-1.52", pages = "437--441", abstract = "This paper describes Infosys{'}s submission to the WMT20 Similar Language Translation shared task. We participated in Indo-Aryan language pair in the language direction Hindi to Marathi. Our baseline system is byte-pair encoding based transformer model trained with the Fairseq sequence modeling toolkit. Our final system is an ensemble of two transformer models, which ranked first in WMT20 evaluation. One model is designed to learn the nuances of translation of this low resource language pair by taking advantage of the fact that the source and target languages are same alphabet languages. The other model is the result of experimentation with the proportion of back-translated data to the parallel data to improve translation fluency.", }
<?xml version="1.0" encoding="UTF-8"?> <modsCollection xmlns="http://www.loc.gov/mods/v3"> <mods ID="rathinasamy-etal-2020-infosys"> <titleInfo> <title>Infosys Machine Translation System for WMT20 Similar Language Translation Task</title> </titleInfo> <name type="personal"> <namePart type="given">Kamalkumar</namePart> <namePart type="family">Rathinasamy</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Amanpreet</namePart> <namePart type="family">Singh</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Balaguru</namePart> <namePart type="family">Sivasambagupta</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Prajna</namePart> <namePart type="family">Prasad Neerchal</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Vani</namePart> <namePart type="family">Sivasankaran</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <originInfo> <dateIssued>2020-11</dateIssued> </originInfo> <typeOfResource>text</typeOfResource> <relatedItem type="host"> <titleInfo> <title>Proceedings of the Fifth Conference on Machine Translation</title> </titleInfo> <name type="personal"> <namePart type="given">Loïc</namePart> <namePart type="family">Barrault</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Ondřej</namePart> <namePart type="family">Bojar</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Fethi</namePart> <namePart type="family">Bougares</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Rajen</namePart> <namePart type="family">Chatterjee</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Marta</namePart> <namePart type="given">R</namePart> <namePart type="family">Costa-jussà</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Christian</namePart> <namePart type="family">Federmann</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Mark</namePart> <namePart type="family">Fishel</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Alexander</namePart> <namePart type="family">Fraser</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Yvette</namePart> <namePart type="family">Graham</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Paco</namePart> <namePart type="family">Guzman</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Barry</namePart> <namePart type="family">Haddow</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Matthias</namePart> <namePart type="family">Huck</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Antonio</namePart> <namePart type="given">Jimeno</namePart> <namePart type="family">Yepes</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Philipp</namePart> <namePart type="family">Koehn</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">André</namePart> <namePart type="family">Martins</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Makoto</namePart> <namePart type="family">Morishita</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Christof</namePart> <namePart type="family">Monz</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Masaaki</namePart> <namePart type="family">Nagata</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Toshiaki</namePart> <namePart type="family">Nakazawa</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Matteo</namePart> <namePart type="family">Negri</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <originInfo> <publisher>Association for Computational Linguistics</publisher> <place> <placeTerm type="text">Online</placeTerm> </place> </originInfo> <genre authority="marcgt">conference publication</genre> </relatedItem> <abstract>This paper describes Infosys’s submission to the WMT20 Similar Language Translation shared task. We participated in Indo-Aryan language pair in the language direction Hindi to Marathi. Our baseline system is byte-pair encoding based transformer model trained with the Fairseq sequence modeling toolkit. Our final system is an ensemble of two transformer models, which ranked first in WMT20 evaluation. One model is designed to learn the nuances of translation of this low resource language pair by taking advantage of the fact that the source and target languages are same alphabet languages. The other model is the result of experimentation with the proportion of back-translated data to the parallel data to improve translation fluency.</abstract> <identifier type="citekey">rathinasamy-etal-2020-infosys</identifier> <location> <url>https://aclanthology.org/2020.wmt-1.52</url> </location> <part> <date>2020-11</date> <extent unit="page"> <start>437</start> <end>441</end> </extent> </part> </mods> </modsCollection>
%0 Conference Proceedings %T Infosys Machine Translation System for WMT20 Similar Language Translation Task %A Rathinasamy, Kamalkumar %A Singh, Amanpreet %A Sivasambagupta, Balaguru %A Prasad Neerchal, Prajna %A Sivasankaran, Vani %Y Barrault, Loïc %Y Bojar, Ondřej %Y Bougares, Fethi %Y Chatterjee, Rajen %Y Costa-jussà, Marta R. %Y Federmann, Christian %Y Fishel, Mark %Y Fraser, Alexander %Y Graham, Yvette %Y Guzman, Paco %Y Haddow, Barry %Y Huck, Matthias %Y Yepes, Antonio Jimeno %Y Koehn, Philipp %Y Martins, André %Y Morishita, Makoto %Y Monz, Christof %Y Nagata, Masaaki %Y Nakazawa, Toshiaki %Y Negri, Matteo %S Proceedings of the Fifth Conference on Machine Translation %D 2020 %8 November %I Association for Computational Linguistics %C Online %F rathinasamy-etal-2020-infosys %X This paper describes Infosys’s submission to the WMT20 Similar Language Translation shared task. We participated in Indo-Aryan language pair in the language direction Hindi to Marathi. Our baseline system is byte-pair encoding based transformer model trained with the Fairseq sequence modeling toolkit. Our final system is an ensemble of two transformer models, which ranked first in WMT20 evaluation. One model is designed to learn the nuances of translation of this low resource language pair by taking advantage of the fact that the source and target languages are same alphabet languages. The other model is the result of experimentation with the proportion of back-translated data to the parallel data to improve translation fluency. %U https://aclanthology.org/2020.wmt-1.52 %P 437-441
Markdown (Informal)
[Infosys Machine Translation System for WMT20 Similar Language Translation Task](https://aclanthology.org/2020.wmt-1.52) (Rathinasamy et al., WMT 2020)
- Infosys Machine Translation System for WMT20 Similar Language Translation Task (Rathinasamy et al., WMT 2020)
ACL
- Kamalkumar Rathinasamy, Amanpreet Singh, Balaguru Sivasambagupta, Prajna Prasad Neerchal, and Vani Sivasankaran. 2020. Infosys Machine Translation System for WMT20 Similar Language Translation Task. In Proceedings of the Fifth Conference on Machine Translation, pages 437–441, Online. Association for Computational Linguistics.