Computer Science Theses and Dissertations
This collection contains some of the theses and dissertations produced by students in the University of Oregon Computer Science Graduate Program. Paper copies of these and other dissertations and theses are available through the UO Libraries.
Browsing Computer Science Theses and Dissertations by Title
Showing items 1-20 of 136.
Item (Open Access): A Hybrid Approach for Ontology-based Information Extraction (University of Oregon, 2016-02-23)
Gutierrez, Fernando; Dou, Dejing
Information extraction (IE) is the process of automatically transforming written natural language (i.e., text) into structured information, such as a knowledge base. Because natural language is inherently ambiguous, this transformation is highly complex. Moreover, as IE moves from the analysis of scientific documents to the analysis of Internet content, we can no longer rely on the assumption that the content of the text is correct: in contrast to scientific documents, which are peer reviewed, Internet content is not verified for quality or correctness. Thus, two main issues affect the IE process: the complexity of extraction and the quality of the data. In this dissertation, we propose an improved ontology-based IE (OBIE) that provides solutions to these issues of accuracy and content quality. Based on a hybrid strategy that combines aspects of IE usually treated as opposites, or not considered at all, we improve IE with more accurate extraction and new functionality (semantic error detection). Our approach builds on OBIE, a sub-area of IE that reduces extraction complexity by using domain knowledge, in the form of the domain's concepts and relationships, to guide the extraction process. We address the complexity of extraction by combining information extractors that have different implementations; integrating different types of implementation into one extraction system produces a more accurate extraction. For each concept or relationship in the ontology, we can select the best implementation for extraction, or combine implementations under an ensemble learning scheme. In tandem, we address the quality of information by determining its semantic correctness with respect to domain knowledge. We define two methods for semantic error detection: predefining the types of errors expected in the text, or applying logical reasoning to the text. This dissertation includes both published and unpublished coauthored material.
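The ensemble step in the entry above can be pictured with a small sketch: each extractor proposes a value with a confidence, and agreeing extractors reinforce each other. The interfaces and names below are hypothetical illustrations, not code from the dissertation.

```python
# Hypothetical sketch of confidence-weighted voting between extractors for
# one ontology concept. Interfaces are illustrative, not from the thesis.
from typing import Callable, List, Tuple

# An extractor maps a sentence to (extracted value, confidence) for a concept.
Extractor = Callable[[str], Tuple[str, float]]

def ensemble_extract(sentence: str, extractors: List[Extractor]) -> str:
    """Pool candidate extractions and return the highest-scoring value."""
    scores = {}
    for extract in extractors:
        value, confidence = extract(sentence)
        # Summing confidences lets extractors that agree reinforce each other.
        scores[value] = scores.get(value, 0.0) + confidence
    return max(scores, key=scores.get)

# Two toy extractors for a hypothetical "temperature" concept.
rule_based = lambda s: ("37C", 0.6)   # e.g., a pattern-based extractor
learned = lambda s: ("37C", 0.8)      # e.g., a classifier-based extractor
print(ensemble_extract("The sample was kept at 37C.", [rule_based, learned]))
```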
Item (Open Access): A Longitudinal Assessment of Website Complexity (University of Oregon, 2018-09-06)
Mostafavi, Seyed Hooman; Rejaie, Reza
Most people now visit several websites daily for social networking, shopping, reading news, and other purposes, which shows how significant these sites have become in our lives. Businesses can therefore profit considerably by designing high-quality websites that attract more visitors. An important aspect of a good website is its page load time, and many studies have analyzed this aspect from different perspectives. In this thesis, we characterize and examine the complexity of a wide range of popular websites over the past six years in order to discover trends in their complexity metrics, such as the number, size, and type of objects on a page and the number and type of servers contacted to deliver those objects. Moreover, we analyze the correlation between these metrics and page load times.

Item (Open Access): A Tile-Based Approach for Photo-Realistic Volume Rendering (University of Oregon, 2018-09-06)
Mathai, Manish; Childs, Hank
Previous studies on photo-realistic volume rendering have failed to optimize for performance with respect to the cache hierarchy. With this thesis, we consider a tile-based approach for photo-realistic volume rendering, in an effort to improve cache performance and decrease overall execution time. We evaluated the algorithm against the traditional approach with workloads of varying data sizes, resolutions, and samples per pixel. Overall, we ran 48 serial experiments and found that the tile-based approach is consistently faster than the traditional approach, with speedups of up to 20%. Additionally, we determined that the improvement does not carry over directly to parallel platforms like Intel TBB.
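The cache argument behind the tile-based approach is easy to see in miniature. The sketch below contrasts a plain row-major sweep with a tiled traversal that keeps each tile's pixels (and the samples they share) resident in cache. The shading callback and the 32-pixel tile size are assumptions for illustration, not details from the thesis.

```python
# Illustrative sketch of the general tiling idea, not the thesis's renderer.
import numpy as np

def render_row_major(image, shade):
    h, w = image.shape[:2]
    for y in range(h):            # one long sweep; poor reuse across rows
        for x in range(w):
            image[y, x] = shade(x, y)

def render_tiled(image, shade, tile=32):
    h, w = image.shape[:2]
    for ty in range(0, h, tile):              # iterate over tiles...
        for tx in range(0, w, tile):
            for y in range(ty, min(ty + tile, h)):   # ...then within a tile
                for x in range(tx, min(tx + tile, w)):
                    image[y, x] = shade(x, y)

img = np.zeros((256, 256))
render_tiled(img, lambda x, y: (x + y) % 255)
```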
Item (Open Access): Accelerating Machine Learning via Multi-Objective Optimization (University of Oregon, 2021-11-23)
Lim, Robert; Malony, Allen
This dissertation presents approaches to accelerating the training of deep neural networks with high-performance computing resources while balancing learning and systems-utilization objectives. Acceleration of machine learning is formulated as a multi-objective optimization problem that seeks to satisfy multiple objectives under their respective constraints. On the learning side, the objective is a model with high accuracy that eliminates false positives and generalizes beyond the training set. On the systems side, maximizing utilization of the underlying hardware within compute and power budgets bounds the problem. In both scenarios, the search space is combinatorial and contains multiple local minima that in many cases satisfy the global optimum. This work addresses the search for solutions in both performance tuning and neural network training. Specifically, subgraph matching is proposed to bound the search problem and provide heuristics that guide the solver toward the optimal solution. Mixed-precision operations are also proposed, both for solving systems of linear equations and for training image-classification networks, in order to evaluate the stability and robustness of the operations. Use cases in CUDA performance tuning and neural network training demonstrate the effectiveness of the proposed techniques. The experiments were carried out on single- and multi-node GPU clusters and reveal opportunities for further exploration in this critical hardware/software co-design space of accelerated machine learning.
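As one concrete instance of mixed-precision linear solving, here is a minimal iterative-refinement sketch: solve in float32, then accumulate residuals and corrections in float64. It illustrates the general technique the abstract names, not the dissertation's implementation.

```python
# Minimal mixed-precision iterative refinement for Ax = b (illustrative).
import numpy as np

def solve_mixed_precision(A, b, iters=5):
    A32 = A.astype(np.float32)
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        r = b - A @ x                                     # residual in float64
        dx = np.linalg.solve(A32, r.astype(np.float32))   # cheap correction
        x += dx.astype(np.float64)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 100))
b = rng.standard_normal(100)
x = solve_mixed_precision(A, b)
print(np.max(np.abs(A @ x - b)))   # small residual after refinement
```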
Item (Open Access): Accelerating Science with Directive-Based Programming on Heterogeneous Machines and Future Technologies (University of Oregon, 2021-11-23)
Lambert, Jacob; Malony, Allen
Accelerator-based heterogeneous computing has become the de facto standard in contemporary high-performance machines, including upcoming exascale machines. These heterogeneous platforms have been instrumental to the development of computation-based science over the past several years. However, this specialization of hardware has also led to a specialization of the associated programming models, which are often intimidating to scientific programmers and may not be portable or transferable between platforms. Directive-based programming offers a high-level alternative to specialized code, but introduces its own set of challenges: many accelerators, like FPGAs, may not support a directive-based approach, and others, like GPUs and CPUs, may support the standards only selectively. In this dissertation we perform the research required to further enable directive-based computing to consistently accelerate science on heterogeneous platforms. This research takes the form of three major projects: (1) an OpenACC-to-FPGA framework developed to bring FPGAs under the umbrella of directive-based computing, (2) an OpenACC and OpenMP interoperability framework designed to improve the portability and performance of directive-based standards across platforms, and (3) an exploration of exascale-intended platforms with directive-based applications. This dissertation includes previously published and co-authored material, as well as unpublished co-authored material.

Item (Open Access): Accurate Distance Calculation Using GPS While Performing Low Speed Activity (University of Oregon, 2018-09-06)
Bennett, Benjamin; Fickas, Stephen
In the last ten years, GPS technology has become widely available due to the proliferation of smart devices with GPS capability. GPS was introduced to assist with location and navigation, which remain its most common uses today, but it can serve many other purposes as well. For example, GPS can be used to calculate the distance covered during an activity (running, walking, biking, etc.) in applications that promote exercise and a healthy lifestyle. Calculating this distance accurately is challenging because of the error in GPS location measurements and the low speed of many activities. In this thesis I present methods of calculating distance traveled that reduce this error and produce an accurate distance calculation. I also present an application that uses this calculation to help encourage children to become more active.
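The kind of error reduction the GPS entry above describes can be sketched as follows: drop fixes with poor reported accuracy, ignore steps below a noise floor, and sum great-circle (haversine) distances between accepted fixes. The 20 m accuracy cutoff and 3 m minimum step are illustrative assumptions, not the thesis's tuned values.

```python
# Illustrative GPS distance calculation with simple noise filtering.
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    R = 6371000.0
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * R * asin(sqrt(a))

def activity_distance(fixes, max_accuracy_m=20.0, min_step_m=3.0):
    """fixes: list of (lat, lon, accuracy_m). Returns meters traveled."""
    total, last = 0.0, None
    for lat, lon, acc in fixes:
        if acc > max_accuracy_m:      # fix too noisy for low-speed activity
            continue
        if last is None:
            last = (lat, lon)
            continue
        step = haversine_m(last[0], last[1], lat, lon)
        if step >= min_step_m:        # below this, treat movement as jitter
            total += step
            last = (lat, lon)         # only advance from accepted points
    return total

fixes = [(44.0450, -123.0710, 5.0), (44.0452, -123.0710, 8.0)]
print(activity_distance(fixes))
```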
Item (Open Access): An "active vision" computational model of visual search for human-computer interaction (University of Oregon, 2008-12)
Halverson, Timothy E., 1971-
Visual search is an important part of human-computer interaction (HCI). The visual search processes that people use have a substantial effect on the time expended and the likelihood of finding the information they seek. This dissertation investigates visual search through experiments and computational cognitive modeling. Computational cognitive modeling is a powerful methodology that uses computer simulation to capture, assert, record, and replay plausible sets of interactions among the many human processes at work during visual search. This dissertation aims to provide a cognitive model of visual search that can be utilized by predictive interface analysis tools, and to do so in a manner consistent with a comprehensive theory of human visual processing, namely active vision. The model accounts for the four questions of active vision, the answers to which are important to both practitioners and researchers in HCI: What can be perceived in a fixation? When do the eyes move? Where do the eyes move? What information is integrated between eye movements? This dissertation presents a principled progression of the development of a computational model of active vision. Three experiments were conducted that investigate the effects of visual layout properties: density, color, and word meaning. The experimental results provide a better understanding of how these factors affect human-computer visual interaction. Three sets of data, two from the experiments reported here, were accurately modeled in the EPIC (Executive Process-Interactive Control) cognitive architecture. This work extends the practice of computational cognitive modeling by (a) informing the process of developing computational models through the use of eye movement data and (b) providing the first detailed instantiation of the theory of active vision in a computational framework. This instantiation allows us to better understand (a) the effects and interactions of visual search processes and (b) how these processes can be used computationally to predict people's visual search behavior. This research ultimately benefits HCI by giving researchers and practitioners a better understanding of how users visually interact with computers, and it provides a foundation for tools that predict that interaction. This dissertation includes both previously published and co-authored material.

Item (Open Access): Advancing Clinical Natural Language Processing through Knowledge-Infused Language Models (University of Oregon, 2024-01-09)
Lu, Qiuhao; Nguyen, Thien
Pre-trained Language Models (PLMs) have shown remarkable success on general-domain text tasks, but their application in the clinical domain is constrained by specialized language, terminology, and a lack of in-depth understanding of scientific and medical knowledge. As the adoption of Electronic Health Records (EHRs) and intricate clinical documents continues to grow, the need for domain-adapted PLMs in healthcare research and applications becomes increasingly vital. This research proposes strategies for integrating domain-specific knowledge into PLMs to enhance their efficacy in healthcare. Our approach includes (i) fine-tuning models with knowledge graphs and domain-specific textual data, using graph representation learning and data augmentation techniques, and (ii) directly injecting domain knowledge into PLMs through the use of adapters. By employing these methods, the study aims to improve the performance of clinical language models in tasks such as interpreting EHRs, extracting information from clinical documents, and predicting patient outcomes. The advancements achieved in this work hold the potential to significantly influence the field of clinical Natural Language Processing (NLP) and contribute to improved patient care and healthcare innovation.
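A minimal sketch of the adapter idea from the clinical-NLP entry above, written with PyTorch: the pre-trained weights stay frozen while a small bottleneck module is trained. The hidden and bottleneck sizes here are illustrative assumptions, not the dissertation's configuration.

```python
# Bottleneck adapter sketch (illustrative sizes, not the thesis's).
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Down-project, apply a nonlinearity, up-project, add a residual."""
    def __init__(self, hidden_size=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states):
        return hidden_states + self.up(self.act(self.down(hidden_states)))

# Usage: insert after a frozen transformer layer's output.
adapter = Adapter()
h = torch.randn(2, 16, 768)   # (batch, sequence length, hidden size)
out = adapter(h)              # same shape, residual-updated
```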
Item (Open Access): Algorithm capability and applications in artificial intelligence (University of Oregon, 2008-12)
Ray, Katrina
Many algorithms are known to work well in practice on a variety of problem instances, and reusing an existing algorithm for problems besides the one it was designed to solve is often quite valuable. This is accomplished by transforming an instance of the new problem into an input for the algorithm and transforming the algorithm's output into the correct answer for the new problem. To capitalize on the efficiency of the algorithm, these transformations must themselves be efficient. Clearly, not all problems have efficient transformations to a particular algorithm, so there are limits on an algorithm's scope. I am aware of no previous study on determining the capability of an algorithm in terms of the complexity of the problems it can be used to solve. Two examples of this concept are presented by proving the exact capability of the best-known algorithms for solving Satisfiability (SAT) and Quantified Boolean Formulas (QBF). The best-known algorithm for solving SAT is called DPLL. It has been well studied and is continuously being optimized in an effort to develop faster SAT solvers; this body of optimization work makes DPLL a good candidate for solving other problems. The notion of algorithm capability proved useful in applying DPLL to two areas of AI: planning and nonmonotonic reasoning. Planning is PSPACE-complete in general, but NP-complete when restricted to problems that have polynomial-length plans. Optimizing the plan length or introducing preferences increases the complexity of the problem, and although these problems are harder than SAT, they are within the scope of what DPLL can handle. Most problems in nonmonotonic reasoning are also harder than SAT, yet DPLL is a candidate solution for nonmonotonic logics. The complexity of nonmonotonic reasoning in general is beyond the scope of what DPLL can handle, but by knowing the capability of DPLL, one can identify subsets of nonmonotonic reasoning that it can solve. For example, DPLL is capable of solving the problem of model checking in normal default logic. Again, this problem is harder than SAT, but it can still be solved with a single call to a SAT solver. The idea of algorithm capability led to the fascinating discovery that SAT solvers can solve problems that are harder than SAT.
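For readers unfamiliar with DPLL, here is a compact sketch of the textbook algorithm on CNF formulas in DIMACS-style integer notation (3 means x3, -3 means NOT x3). Real solvers layer many optimizations on top of this; the sketch shows only the core recursion of unit propagation plus branching.

```python
# Textbook DPLL sketch on CNF as lists of integer clauses (illustrative).
def simplify(clauses, lit):
    """Remove satisfied clauses; delete the falsified literal elsewhere."""
    return [[l for l in c if l != -lit] for c in clauses if lit not in c]

def dpll(clauses, assignment=()):
    if [] in clauses:
        return None                       # an empty clause means a conflict
    # Unit propagation: a single-literal clause forces that literal.
    units = [c[0] for c in clauses if len(c) == 1]
    if units:
        return dpll(simplify(clauses, units[0]), assignment + (units[0],))
    if not clauses:
        return assignment                 # every clause satisfied
    lit = clauses[0][0]                   # branch on a literal, both ways
    for choice in (lit, -lit):
        result = dpll(simplify(clauses, choice), assignment + (choice,))
        if result is not None:
            return result
    return None

print(dpll([[1, 2], [-1, 3], [-2, -3]]))  # prints a satisfying assignment
```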
Item (Open Access): Algorithm for Enumerating Hypergraph Transversals (University of Oregon, 2018-04-10)
Casita, Roscoe; Norris, Boyana
This paper introduces the hypergraph transversal problem along with the following iterative solutions: naive, branch and bound, and dynamic exponential time (NC-D). Odometers are introduced along with the functions that manipulate them, and the traditional definitions of hyperedge, hypergraph, etc. are redefined in terms of odometers and lists. All algorithms and functions necessary to implement the solution are presented, along with techniques to validate and test the results. Lastly, parallelization, advanced applications, and future research directions are examined.

Item (Open Access): Algorithms for Permutation Groups and Cayley Networks (University of Oregon, 1989-09-06)
Blaha, Kenneth D.
Bases, subgroup towers, and strong generating sets (SGSs) have played a key role in the development of algorithms for permutation groups. We analyze the computational complexity of several problems involving bases and SGSs, and we use subgroup towers and SGSs to construct dense networks with practical routing schemes. Given generators for G ≤ Sym(n), we prove that the problem of computing a minimum base for G is NP-hard. In fact, the problem is NP-hard for cyclic groups and elementary abelian groups. However, for abelian groups with orbits of size less than 8, a polynomial-time algorithm is presented for computing minimum bases. For arbitrary permutation groups, a greedy algorithm for approximating minimum bases is investigated. We prove that if G ≤ Sym(n) has a minimum base of size k, then the greedy algorithm produces a base of size Ω(k log log n).
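A minimal sketch of the naive solution listed in the hypergraph-transversal entry above: try vertex subsets in increasing size and keep those that hit every hyperedge and contain no smaller transversal. This brute force is exponential and purely illustrative of the problem; it does not reflect the paper's NC-D algorithm or its odometer machinery.

```python
# Naive minimal-transversal (minimal hitting set) enumeration (illustrative).
from itertools import combinations

def hits_all(subset, edges):
    """A transversal intersects every hyperedge."""
    return all(subset & e for e in edges)

def minimal_transversals(vertices, edges):
    found = []
    for k in range(1, len(vertices) + 1):     # smallest subsets first
        for combo in combinations(vertices, k):
            s = set(combo)
            # Minimal iff no smaller transversal (found earlier) lies inside.
            if hits_all(s, edges) and not any(t < s for t in found):
                found.append(s)
    return found

edges = [{1, 2}, {2, 3}, {1, 3}]
print(minimal_transversals([1, 2, 3], edges))   # [{1, 2}, {1, 3}, {2, 3}]
```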
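And for the permutation-group entry directly above, the greedy base idea can be sketched as follows: repeatedly pick the point with the largest orbit, then descend to its stabilizer. Using SymPy's permutation groups is our illustrative choice; the dissertation predates that library.

```python
# Greedy base selection sketch using SymPy (illustrative implementation).
from sympy.combinatorics import Permutation, PermutationGroup

def greedy_base(G: PermutationGroup):
    base = []
    n = G.degree
    while G.order() > 1:
        # Greedy choice: the point with the largest orbit cuts the group most.
        point = max(range(n), key=lambda p: len(G.orbit(p)))
        base.append(point)
        G = G.stabilizer(point)   # descend the subgroup tower
    return base

# Example: S4, generated by a 4-cycle and a transposition (array form).
S4 = PermutationGroup(Permutation([1, 2, 3, 0]), Permutation([1, 0, 2, 3]))
print(greedy_base(S4))            # e.g., [0, 1, 2]
```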
Item (Open Access): An Algorithm for Clipping Polygons of Large Geographical Data (University of Oregon, 2017-09-27)
Alghamdi, Areej; Childs, Hank
We present an algorithm for overlaying polygonal data with regular grids and calculating the percentage overlap for each cell in the regular grid. Our algorithm supports self-intersecting polygons, meaning that some spatial regions may be covered by two or more polygons; it identifies these cases and eliminates redundant polygons, preventing erroneous results. We also present an optimized version of our algorithm that uses spatial sorting through interval trees, and we provide a performance comparison between the optimized and unoptimized versions. Finally, we apply our algorithm to geography data, specifically data on bark beetle infestation.

Item (Open Access): An Interface for Embedding the BLAS in Haskell (University of Oregon, 2020-12-08)
Ndemeye, Bosco; Norris, Boyana
Scientific algorithms have been built on top of linear algebra subprograms, historically implemented in languages such as C/C++ or Fortran to optimize their performance, sometimes at the cost of conciseness. Recent work in these languages has sought to optimize the implementations of the subprograms through the use of domain-specific languages (DSLs). However, it has been shown that, with techniques such as fusion, concise yet efficient implementations of array-computation DSLs are possible in functional languages such as Haskell. Consequently, we investigate an interface to a library that supports subprogram computations in Haskell. We apply the "delayed fusion" technique to separate data stored in memory from their delayed versions, giving the user the option to force computation as they see fit. We present implementations of several subprograms, abstracting over the choice of data-sparsity layouts in memory.
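The per-cell overlap computation in the polygon-clipping entry above can be sketched with Shapely (our choice of library; the thesis implements its own clipping). Unioning the input polygons first mirrors the thesis's goal of not double-counting regions covered by several polygons.

```python
# Grid/polygon percentage-overlap sketch using Shapely (illustrative).
from shapely.geometry import box, Polygon
from shapely.ops import unary_union

def grid_overlap(polygons, x0, y0, nx, ny, cell):
    """Return an ny-by-nx grid of fractional overlaps with the polygon set."""
    merged = unary_union(polygons)   # dissolves redundant/overlapping parts
    fractions = []
    for j in range(ny):
        row = []
        for i in range(nx):
            cell_geom = box(x0 + i * cell, y0 + j * cell,
                            x0 + (i + 1) * cell, y0 + (j + 1) * cell)
            row.append(cell_geom.intersection(merged).area / cell_geom.area)
        fractions.append(row)
    return fractions

tri = Polygon([(0, 0), (2, 0), (0, 2)])
print(grid_overlap([tri], 0.0, 0.0, 2, 2, 1.0))   # [[1.0, 0.5], [0.5, 0.0]]
```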
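And the "delayed fusion" idea from the BLAS-in-Haskell entry can be transliterated into a toy Python sketch: elementwise operations compose a function instead of materializing intermediate arrays, and forcing runs one fused loop. The thesis's actual interface is Haskell and type-driven; everything below is illustrative.

```python
# Toy "delayed array" sketch: fusion by function composition (illustrative).
import numpy as np

class Delayed:
    def __init__(self, source, fn=lambda x: x):
        self.source, self.fn = source, fn

    def map(self, g):
        f = self.fn
        # Compose instead of allocating an intermediate array (fusion).
        return Delayed(self.source, lambda x: g(f(x)))

    def force(self):
        """Materialize: one pass applies the whole fused pipeline."""
        return np.array([self.fn(x) for x in self.source])

xs = Delayed([1.0, 2.0, 3.0])
ys = xs.map(lambda x: 2 * x).map(lambda x: x + 1)   # no work done yet
print(ys.force())                                   # [3. 5. 7.]
```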
Item (Open Access): Analysis of a Mobile Computing System for Indoor Environmental Monitoring (University of Oregon, 2021-09-13)
McLaughlin, Joseph; Young, Michal
Sensor networks that collect indoor environmental quality (IEQ) data are frequently used to drive building systems and to inform research and building standards. The research connecting trends in IEQ data with human factors is well established and remains active. Conventional sensor systems often require specialized building infrastructure and are cost-prohibitive, making them available primarily in large-scale indoor settings. In this thesis, we describe an alternative sensor network that uses a mobile sensor to collect IEQ data in traditionally under-served buildings. The network is composed of inexpensive components and relies only minimally on specialized infrastructure. We demonstrate that it achieves an unusually fine spatial resolution, illustrating utility beyond that of static sensor networks in conventional settings.

Item (Open Access): Automated Attacks on Compression-Based Classifiers (University of Oregon, 2014-09-29)
Burago, Igor; Lowd, Daniel
Methods of compression-based text classification have proven useful for various applications. However, in some classification problems, such as spam filtering, a classifier confronts one or many adversaries willing to induce errors in the classifier's judgment on certain kinds of input. In this thesis, we consider the problem of finding thrifty strategies for character-based text modification that allow an adversary to revert a classifier's verdict on a given family of input texts. We propose three statistical statements of the problem that an attacker can use to obtain transformation models that are optimal in some sense. Evaluating these three techniques on a realistic spam corpus, we find that an adversary can transform a spam message (detectable as such by an entropy-based text classifier) into a legitimate one by generating and appending, in some cases, as few additional characters as 20% of the original length of the message.

Item (Open Access): Automated methods to infer ancient homology and synteny (University of Oregon, 2009-06)
Catchen, Julian M., 1978-
Establishing homologous (evolutionary) relationships among a set of genes allows us to hypothesize about their histories: how are they related, how have they changed over time, and are those changes the source of novel features? Likewise, aggregating related genes into larger, structurally conserved regions of the genome allows us to infer the evolutionary history of the genome itself: how have the chromosomes changed in number, gene content, and gene order over time? Establishing homology between genes is also important for constructing models of human disease in other organisms, such as the zebrafish, by identifying and manipulating the zebrafish copies of genes involved in the disease. To make such inferences, researchers compare the genomes of extant species. However, the dynamic nature of genomes, in both gene content and chromosomal architecture, presents a major technical challenge to correctly identifying homologous genes. This thesis presents a system to infer ancient homology between genes that takes into account a major but previously overlooked source of architectural change in genomes: whole-genome duplication. Additionally, the system integrates genomic conservation of synteny (gene order on chromosomes), providing a new source of evidence in homology assignment that complements existing methods. The work applied these algorithms to several genomes to infer the evolutionary history of genes, gene families, and chromosomes in several case studies, and to study several unique architectural features of post-duplication genomes, such as ohnologs gone missing.

Item (Open Access): Automatic Code Rewriting for Performance Portability (University of Oregon, 2024-08-07)
Johnson, Alister; Malony, Allen
Rewriting code for cleanliness, API changes, and new programming models is a common yet time-consuming task. It is particularly important for HPC applications that desire performance portability, since these applications are usually very long-lived and wish to run on many architectures, so they need to be written such that they can make good use of all the available hardware with minimal code changes. Furthermore, it is unknown what future supercomputer hardware and programming models will look like, so applications need to be written in a way that is "future proof" and will require only minimal rewrites later. Localized, syntax-based changes are often mechanical and can be automated with text-based rewriting tools like sed. However, non-localized, semantics-based changes require specialized tools whose complex, hard-coded rules demand compiler expertise. Techniques for source rewriting are thus either too simple for complex tasks or too complex for non-expert users to customize; in either case, developers are often forced to update their code manually instead. This work describes a new approach to code rewriting that exposes complex, semantics-driven rewrite capabilities to users in a simple and natural way. Rewrite rules are expressed as a pair of parameterized "before-and-after" source code snippets: one describes what to match, and the other describes what the replacement looks like. Through this novel and user-friendly interface, programmers can automate and customize complex code changes that require a deep understanding of the language, without any knowledge of compiler internals. This dissertation includes previously published and unpublished co-authored material.

Item (Open Access): Automating Camera Placement for In Situ Visualization (University of Oregon, 2022-05-10)
Marsaglia, Nicole; Childs, Hank
Trends in high-performance computing increasingly require visualization to be carried out using in situ processing. This processing most often occurs without a human in the loop, meaning that the in situ software must be able to carry out its tasks without human guidance. This dissertation explores this topic, focusing on automating camera placement for in situ visualization when there is no a priori knowledge of where to place the camera. We introduce a new approach for this automation that depends on Viewpoint Quality (VQ) metrics, which quantify how much insight a camera position provides. This research involves three major sub-projects: (1) performing a user survey to determine the viewpoint preferences of scientific users and developing new VQ metrics that can predict preference 68% of the time; (2) parallelizing VQ metrics and designing search algorithms so they can be executed efficiently in situ; and (3) evaluating camera placement behavior for time-varying data to determine how often a new camera placement should be considered. In all, this dissertation shows that automating in situ camera placement for scientific simulations is possible on exascale computers and provides insight on best practices.

Item (Open Access): Behavior-based Worm Detection (University of Oregon, 2012)
Stafford, John; Li, Jun
The Internet has become a core component of our lives and businesses. Its reliability and availability are of paramount importance. Many types of malware impact the availability of the Internet, including network worms, botnets, and viruses, and detecting such attacks is a critical component of defending against them. This dissertation focuses on detecting and understanding self-propagating network worms, a type of malware with a proven record of disrupting the Internet.

Item (Open Access): Bicycle Crash Detection: Using a Voice-Assistant for More Accurate Reporting (University of Oregon, 2018-09-06)
Williams, Brian; Fickas, Stephen
It is estimated that over half of bicycle crashes are never reported, for reasons such as the absence of property damage or physical injury. To improve the likelihood that riders will report a crash, I have developed Urban Bike Buddy, a smartphone application that uses the device's internal sensors to detect a crash. The application interacts with Alexa to help guide the user through the crash-reporting process. The innovative features of my work include the ability to initiate communication with Alexa without user interaction. In addition, an intersection controller connected to extra hardware allows riders to request a crossing signal during their approach, based on their riding speed. These features add value for bicycle riders and will help contribute to a safer environment for riders, automobiles, and pedestrians alike.
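Finally, the crash-detection step in the bicycle entry above can be pictured as threshold logic over accelerometer magnitude. The 4g threshold and the debounce window below are assumptions for illustration, not Urban Bike Buddy's actual parameters.

```python
# Threshold-based crash detection over accelerometer samples (illustrative).
from math import sqrt

G = 9.81   # gravity, m/s^2

def detect_crash(samples, threshold_g=4.0, debounce=25):
    """samples: list of (ax, ay, az) in m/s^2. Returns indices of events."""
    events, cooldown = [], 0
    for i, (ax, ay, az) in enumerate(samples):
        if cooldown > 0:
            cooldown -= 1
            continue
        magnitude = sqrt(ax * ax + ay * ay + az * az)
        if magnitude > threshold_g * G:   # sudden spike well above gravity
            events.append(i)              # in the app, this would prompt Alexa
            cooldown = debounce           # suppress duplicates from one impact
    return events

samples = [(0.0, 0.0, 9.8)] * 10 + [(30.0, 25.0, 15.0)] + [(0.0, 0.0, 9.8)] * 10
print(detect_crash(samples))              # -> [10]
```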