@inproceedings{holmstrom-etal-2023-bridging,
title = "Bridging the Resource Gap: Exploring the Efficacy of {E}nglish and Multilingual {LLM}s for {S}wedish",
author = {Holmstr{\"o}m, Oskar and
Kunz, Jenny and
Kuhlmann, Marco},
editor = "Ilinykh, Nikolai and
Morger, Felix and
Dann{\'e}lls, Dana and
Dobnik, Simon and
Megyesi, Be{\'a}ta and
Nivre, Joakim",
booktitle = "Proceedings of the Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2023)",
month = may,
year = "2023",
address = "T{\'o}rshavn, the Faroe Islands",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.resourceful-1.13",
pages = "92--110",
abstract = "Large language models (LLMs) have substantially improved natural language processing (NLP) performance, but training these models from scratch is resource-intensive and challenging for smaller languages. With this paper, we want to initiate a discussion on the necessity of language-specific pre-training of LLMs.We propose how the {``}one model-many models{''} conceptual framework for task transfer can be applied to language transfer and explore this approach by evaluating the performance of non-Swedish monolingual and multilingual models{'} performance on tasks in Swedish.Our findings demonstrate that LLMs exposed to limited Swedish during training can be highly capable and transfer competencies from English off-the-shelf, including emergent abilities such as mathematical reasoning, while at the same time showing distinct culturally adapted behaviour. Our results suggest that there are resourceful alternatives to language-specific pre-training when creating useful LLMs for small languages.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="holmstrom-etal-2023-bridging">
    <titleInfo>
      <title>Bridging the Resource Gap: Exploring the Efficacy of English and Multilingual LLMs for Swedish</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Oskar</namePart>
      <namePart type="family">Holmström</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Jenny</namePart>
      <namePart type="family">Kunz</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Marco</namePart>
      <namePart type="family">Kuhlmann</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2023-05</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Proceedings of the Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2023)</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Nikolai</namePart>
        <namePart type="family">Ilinykh</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Felix</namePart>
        <namePart type="family">Morger</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Dana</namePart>
        <namePart type="family">Dannélls</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Simon</namePart>
        <namePart type="family">Dobnik</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Beáta</namePart>
        <namePart type="family">Megyesi</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Joakim</namePart>
        <namePart type="family">Nivre</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <originInfo>
        <publisher>Association for Computational Linguistics</publisher>
        <place>
          <placeTerm type="text">Tórshavn, the Faroe Islands</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Large language models (LLMs) have substantially improved natural language processing (NLP) performance, but training these models from scratch is resource-intensive and challenging for smaller languages. With this paper, we want to initiate a discussion on the necessity of language-specific pre-training of LLMs. We propose how the “one model-many models” conceptual framework for task transfer can be applied to language transfer and explore this approach by evaluating the performance of non-Swedish monolingual and multilingual models on tasks in Swedish. Our findings demonstrate that LLMs exposed to limited Swedish during training can be highly capable and transfer competencies from English off-the-shelf, including emergent abilities such as mathematical reasoning, while at the same time showing distinct culturally adapted behaviour. Our results suggest that there are resourceful alternatives to language-specific pre-training when creating useful LLMs for small languages.</abstract>
<identifier type="citekey">holmstrom-etal-2023-bridging</identifier>
<location>
<url>https://aclanthology.org/2023.resourceful-1.13</url>
</location>
<part>
<date>2023-05</date>
<extent unit="page">
<start>92</start>
<end>110</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Bridging the Resource Gap: Exploring the Efficacy of English and Multilingual LLMs for Swedish
%A Holmström, Oskar
%A Kunz, Jenny
%A Kuhlmann, Marco
%Y Ilinykh, Nikolai
%Y Morger, Felix
%Y Dannélls, Dana
%Y Dobnik, Simon
%Y Megyesi, Beáta
%Y Nivre, Joakim
%S Proceedings of the Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2023)
%D 2023
%8 May
%I Association for Computational Linguistics
%C Tórshavn, the Faroe Islands
%F holmstrom-etal-2023-bridging
%X Large language models (LLMs) have substantially improved natural language processing (NLP) performance, but training these models from scratch is resource-intensive and challenging for smaller languages. With this paper, we want to initiate a discussion on the necessity of language-specific pre-training of LLMs. We propose how the “one model-many models” conceptual framework for task transfer can be applied to language transfer and explore this approach by evaluating the performance of non-Swedish monolingual and multilingual models on tasks in Swedish. Our findings demonstrate that LLMs exposed to limited Swedish during training can be highly capable and transfer competencies from English off-the-shelf, including emergent abilities such as mathematical reasoning, while at the same time showing distinct culturally adapted behaviour. Our results suggest that there are resourceful alternatives to language-specific pre-training when creating useful LLMs for small languages.
%U https://aclanthology.org/2023.resourceful-1.13
%P 92-110
Markdown (Informal)
[Bridging the Resource Gap: Exploring the Efficacy of English and Multilingual LLMs for Swedish](https://aclanthology.org/2023.resourceful-1.13) (Holmström et al., RESOURCEFUL 2023)
ACL
Oskar Holmström, Jenny Kunz, and Marco Kuhlmann. 2023. [Bridging the Resource Gap: Exploring the Efficacy of English and Multilingual LLMs for Swedish](https://aclanthology.org/2023.resourceful-1.13). In *Proceedings of the Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2023)*, pages 92–110, Tórshavn, the Faroe Islands. Association for Computational Linguistics.