A Hybrid Approach for Ontology-based Information Extraction

Gutierrez, Fernando

Files

Gutierrez_oregon_0171A_11473.pdf (932.07 KB)

Date

2016-02-23

Authors

Gutierrez, Fernando

Publisher

University of Oregon

Abstract

Information extraction (IE) is the process of automatically transforming written natural language (i.e., text) into structured information, such as a knowledge base. Because natural language is inherently ambiguous, this transformation is highly complex. Moreover, as IE moves from the analysis of scientific documents to the analysis of Internet textual content, we can no longer rely completely on the assumption that the content of the text is correct: in contrast to scientific documents, which are peer reviewed, Internet content is not verified for quality or correctness. Thus, two main issues affect the IE process: the complexity of the extraction process and the quality of the data. In this dissertation, we propose an improved ontology-based IE (OBIE) that addresses these issues of accuracy and content quality. Using a hybrid strategy that combines aspects of IE that are usually considered opposed to each other, or that are not considered at all, we aim to improve IE through more accurate extraction and a new functionality, semantic error detection. Our approach is based on OBIE, a sub-area of IE that reduces extraction complexity by using domain knowledge, in the form of the concepts and relationships of the domain, to guide the extraction process. We address the complexity of extraction by combining information extractors that have different implementations. By integrating different types of implementation into one extraction system, we can produce a more accurate extraction: for each concept or relationship in the ontology, we can select the best implementation for extraction, or we can combine both implementations under an ensemble learning scheme. In tandem, we address the quality of information by determining its semantic correctness with regard to domain knowledge. We define two methods for semantic error detection: predefining the types of errors expected in the text, or applying logic reasoning to the text. This dissertation includes both published and unpublished coauthored material.
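
The per-concept choice of extractor described in the abstract can be illustrated with a minimal sketch. It is not taken from the dissertation. The concept name `Enzyme`, the two toy extractors, the validation sentences, and the use of F1 as the selection criterion are all assumptions made for illustration, and a plain OR-vote stands in for a real ensemble learning method.

```python
import re
from typing import Callable, Dict, List, Tuple

# An extractor answers: does this sentence mention the concept?
Extractor = Callable[[str], bool]
Labeled = List[Tuple[str, bool]]


def f1_score(extractor: Extractor, labeled: Labeled) -> float:
    """F1 of an extractor on (sentence, gold_label) pairs."""
    tp = fp = fn = 0
    for sentence, gold in labeled:
        pred = extractor(sentence)
        if pred and gold:
            tp += 1
        elif pred and not gold:
            fp += 1
        elif gold and not pred:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_extractors(
    candidates: Dict[str, Dict[str, Extractor]],
    validation: Dict[str, Labeled],
) -> Dict[str, str]:
    """For each ontology concept, keep the candidate with the highest F1."""
    return {
        concept: max(
            extractors,
            key=lambda name: f1_score(extractors[name], validation[concept]),
        )
        for concept, extractors in candidates.items()
    }


# Hypothetical extractors for a hypothetical "Enzyme" concept.
def rule_based(sentence: str) -> bool:
    # Hand-written rule: any word ending in "ase".
    return re.search(r"\b\w+ase\b", sentence) is not None


def keyword_based(sentence: str) -> bool:
    # Stand-in for a trained classifier: a single cue word.
    return "catalyz" in sentence.lower()


def either(sentence: str) -> bool:
    # Trivial ensemble: accept if any extractor accepts (OR-vote).
    return rule_based(sentence) or keyword_based(sentence)


# Toy validation data, invented for this sketch.
validation = {
    "Enzyme": [
        ("Amylase breaks down starch.", True),
        ("The enzyme catalyzes the reaction.", True),
        ("The database stores records.", False),
        ("Lipase catalyzes fat hydrolysis.", True),
    ]
}

candidates = {
    "Enzyme": {
        "rule": rule_based,
        "keyword": keyword_based,
        "ensemble": either,
    }
}

chosen = select_extractors(candidates, validation)
```

On this toy data the rule misfires on "database" and the cue word misses "Amylase", so the combined extractor scores highest and `chosen` maps `Enzyme` to `"ensemble"`. With other data, a single implementation could win for a given concept, which is the selection case the abstract mentions.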

Keywords

Information extraction, Logic Reasoning, Ontology

URI

https://hdl.handle.net/1794/19729

Collections

Theses and Dissertations
Computer Science Theses and Dissertations

Scholars' Bank
