Enhancing Multilingual Information Extraction Towards Global Linguistic Inclusivity
Date
2024-08-07
Authors
Nguyen, Van Minh
Publisher
University of Oregon
Abstract
In our interconnected world, the diversity of around 7,000 languages presents challenges and opportunities for bridging language barriers. Multilingual information extraction (Multilingual IE) is crucial in natural language processing (NLP) for extracting information from texts across languages, facilitating global understanding and information equity. Despite advancements, the focus on high-resource languages has marginalized speakers of less-represented languages. Multilingual IE seeks to correct this by embracing linguistic diversity and inclusivity. This dissertation enhances Multilingual IE to address challenges of linguistic diversity, data scarcity, and model generalization, aiming to make IE technologies more accessible. It focuses on developing sophisticated algorithms for tasks like event trigger detection, event argument extraction, entity mention recognition, and relation extraction. The goal is to create a system capable of accurate information extraction across diverse languages, supporting global communication and cultural preservation. Furthermore, the importance of IE in the era of large language models (LLMs) remains significant. While LLMs have broadened NLP's capabilities, the precise, context-specific information provided by IE is essential, especially in retrieval-augmented generation (RAG) settings. This underscores IE's ongoing relevance, ensuring LLMs retrieve accurate, relevant information and highlighting IE's critical role in advancing NLP.
Keywords
information extraction, information retrieval, large language models, multilingual, natural language processing, question answering