Received 03.08.2024, Revised 22.10.2024, Accepted 26.12.2024

Chat-based translation of Slavic languages with large language models

Olena Sokol

Modern large language models (LLMs) have demonstrated significant advances in machine translation, particularly for Slavic languages that are underrepresented in traditional translation datasets. This study aimed to evaluate how effectively LLMs (ChatGPT, Claude, and Llama) translate conversational texts in Slavic languages compared with commercial translators and transformer models. The research used the OpenSubtitles2018 dataset to test translations in seven Slavic languages (Ukrainian, Czech, Bulgarian, Russian, Albanian, Macedonian, and Slovak), applying semantic and stylistic translation quality assessment methods. The findings showed that ChatGPT and Claude outperformed Google Translate and transformer models, particularly on informal conversations, achieving 95% accuracy for Ukrainian and 97% for Bulgarian. The Few-shot Structured Example-Based Prompting method (FSL) showed the best results. The research demonstrated that LLMs significantly improve the quality of informal text translation in Slavic languages by preserving context and the naturalness of dialogue. The analysis also showed that LLMs translated idioms and slang 30% more accurately than traditional machine translation systems, and that the Chain-of-Thought method improved the preservation of cultural context by 25%. The practical value of this research lies in developing effective methods for using LLMs to improve informal text translation in Slavic languages. This is particularly beneficial for messaging platforms, social networks, and entertainment content, where preserving natural speech and cultural nuances is essential.
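As a rough illustration of what few-shot structured example-based prompting can look like in practice, the sketch below assembles a plain-text prompt from a handful of demonstration pairs. It is not the prompt format used in the study: the `Example` class, the `build_few_shot_prompt` function, the instruction wording, and the sample sentences are all hypothetical, chosen only to show the general shape of the technique.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One demonstration pair shown to the model (hypothetical structure)."""
    source: str
    target: str


def build_few_shot_prompt(examples, text, src_lang, tgt_lang):
    """Assemble a structured few-shot translation prompt as plain text."""
    lines = [
        f"Translate informal dialogue from {src_lang} to {tgt_lang}.",
        "Keep the tone, slang, and idioms natural.",
        "",
    ]
    # Each demonstration uses the same labelled layout as the final query.
    for i, ex in enumerate(examples, start=1):
        lines.append(f"Example {i}")
        lines.append(f"{src_lang}: {ex.source}")
        lines.append(f"{tgt_lang}: {ex.target}")
        lines.append("")
    # The final block leaves the target side empty for the model to complete.
    lines.append(f"{src_lang}: {text}")
    lines.append(f"{tgt_lang}:")
    return "\n".join(lines)


# Illustrative sentences only; they are not taken from the study's data.
demo = [
    Example("Привіт, як справи?", "Hi, how's it going?"),
    Example("Та ну, не може бути!", "No way, you're kidding!"),
]
prompt = build_few_shot_prompt(demo, "Побачимося завтра?", "Ukrainian", "English")
print(prompt)
```

The resulting string would be sent as the user message to whichever chat model is being evaluated; keeping the demonstrations in the same labelled layout as the final query is the "structured" part of the approach.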

LLM; prompt engineering; NLP; TER; COMET; text correlation analysis; CHRF
43-52
Sokol, O. (2024). Chat-based translation of Slavic languages with large language models. Information Technologies and Computer Engineering, 21(3), 43-52. https://doi.org/10.63341/vitce/3.2024.43
