Enhancing Multilingual Information Extraction Towards Global Linguistic Inclusivity

Date

2024-08-07

Authors

Nguyen, Van Minh

Publisher

University of Oregon

Abstract

In our interconnected world, the roughly 7,000 languages spoken today present both challenges and opportunities for bridging language barriers. Multilingual information extraction (Multilingual IE) is a crucial area of natural language processing (NLP) that extracts structured information from texts across languages, facilitating global understanding and information equity. Despite recent advances, the field's focus on high-resource languages has marginalized speakers of less-represented languages. Multilingual IE seeks to correct this by embracing linguistic diversity and inclusivity. This dissertation advances Multilingual IE to address the challenges of linguistic diversity, data scarcity, and model generalization, with the aim of making IE technologies more accessible. It focuses on developing algorithms for tasks such as event trigger detection, event argument extraction, entity mention recognition, and relation extraction. The goal is a system capable of accurate information extraction across diverse languages, supporting global communication and cultural preservation. IE also remains important in the era of large language models (LLMs). Although LLMs have broadened the capabilities of NLP, the precise, context-specific information that IE provides is still essential, especially in retrieval-augmented generation (RAG) settings, where it helps ensure that LLMs retrieve accurate and relevant information. This underscores IE's continuing relevance and its critical role in advancing NLP.

Keywords

information extraction, information retrieval, large language models, multilingual, natural language processing, question answering