A Hybrid Approach for Ontology-based Information Extraction

Gutierrez, Fernando

dc.contributor.advisor	Dou, Dejing
dc.contributor.author	Gutierrez, Fernando
dc.date.accessioned	2016-02-24T00:34:06Z
dc.date.available	2016-02-24T00:34:06Z
dc.date.issued	2016-02-23
dc.description.abstract	Information extraction (IE) is the process of automatically transforming written natural language (i.e., text) into structured information, such as a knowledge base. However, because natural language is inherently ambiguous, this transformation process is highly complex. In addition, as IE moves from the analysis of scientific documents to the analysis of Internet textual content, we cannot rely completely on the assumption that the content of the text is correct. Indeed, in contrast to scientific documents, which are peer reviewed, Internet content is not verified for quality or correctness. Thus, two main issues that affect the IE process are the complexity of the extraction process and the quality of the data. In this dissertation, we propose an improved ontology-based IE (OBIE) by providing solutions to these issues of accuracy and content quality. Based on a hybrid strategy that combines aspects of IE that are usually considered opposed to each other, or that are not considered at all, we intend to improve IE by developing a more accurate extraction and new functionality (semantic error detection). Our approach is based on OBIE, a sub-area of IE, which reduces extraction complexity by including domain knowledge, in the form of concepts and relationships of the domain, to guide the extraction process. We address the complexity of extraction by combining information extractors that have different implementations. By integrating different types of implementation into one extraction system, we can produce a more accurate extraction. For each concept or relationship in the ontology, we can select the best implementation for extraction, or we can combine both implementations under an ensemble learning schema. In tandem, we address the quality of information by determining its semantic correctness with regard to domain knowledge. We define two methods for semantic error detection: by predefining the types of errors expected in the text or by applying logic reasoning to the text. This dissertation includes both published and unpublished coauthored material.	en_US
dc.identifier.uri	https://hdl.handle.net/1794/19729
dc.language.iso	en_US
dc.publisher	University of Oregon
dc.rights	All Rights Reserved.
dc.subject	Information extraction	en_US
dc.subject	Logic Reasoning	en_US
dc.subject	Ontology	en_US
dc.title	A Hybrid Approach for Ontology-based Information Extraction
dc.type	Electronic Thesis or Dissertation
thesis.degree.discipline	Department of Computer and Information Science
thesis.degree.grantor	University of Oregon
thesis.degree.level	doctoral
thesis.degree.name	Ph.D.

Files

Original bundle

Name: Gutierrez_oregon_0171A_11473.pdf
Size: 932.07 KB
Format: Adobe Portable Document Format

Collections

Theses and Dissertations
Computer Science Theses and Dissertations