
Enhancing multilingual information retrieval: The efficacy of hybrid approaches
1 Shandong University, Shandong, China
* Author to whom correspondence should be addressed.
Abstract
Multilingual Information Retrieval (MLIR) gives users access to information across different languages. This paper examines techniques and tools used in Cross-Language Information Retrieval (CLIR), focusing on query translation, document translation, and hybrid approaches. Query translation uses bilingual dictionaries and machine translation systems to convert a user's query from one language to another, whereas document translation translates the documents into the query language before indexing and retrieval. Hybrid approaches combine the two methods, using the strengths of each to offset the limitations of the other. Our comparative analysis shows that hybrid systems consistently outperform standalone query translation or document translation systems, achieving higher precision, recall, and user satisfaction. For instance, in multilingual legal document retrieval tasks, hybrid systems achieved a precision of 88%, a recall of 82%, and an F1 score of 0.85. These results indicate that hybrid approaches handle the complexities of MLIR effectively and return more accurate and comprehensive results. The study highlights the practical benefits of hybrid CLIR systems and suggests directions for future research on multilingual access to information.
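The three figures reported for legal document retrieval are related by the standard F1 definition, the harmonic mean of precision and recall. The short Python sketch below checks that relationship. It is an illustrative calculation only: the function name `f1_score` and the variable names are assumptions introduced here, not part of the study's evaluation code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract for the hybrid system (88% and 82%).
reported_precision = 0.88
reported_recall = 0.82

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.2f}")  # F1 = 0.85
```

The computed value is about 0.849, which rounds to the 0.85 stated in the abstract, so the reported precision, recall, and F1 score are mutually consistent.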
Keywords
Multilingual Information Retrieval, Cross-Language Information Retrieval, Query Translation, Document Translation
Cite this article
Hu, J. (2024). Enhancing multilingual information retrieval: The efficacy of hybrid approaches. Applied and Computational Engineering, 69, 97-102.
Data availability
The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.
Disclaimer/Publisher's Note
The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
About volume
Volume title: Proceedings of the 6th International Conference on Computing and Data Science
© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish in this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (see Open access policy for details).