Computer Science Theses and Dissertations
This collection contains some of the theses and dissertations produced by students in the University of Oregon Computer Science Graduate Program. Paper copies of these and other dissertations and theses are available through the UO Libraries.
Browsing Computer Science Theses and Dissertations by Subject "Apache Spark"
Item (Open Access)
Distributed Memory Processing of Very Large Graphs
(University of Oregon, 2020-02-27) Riazi, Sara; Norris, Boyana
Big graphs, such as social networks, the internet, biological networks, and knowledge graphs, appear in many domains. However, processing these graphs relies on the accessibility of high-performance frameworks that are able to handle them. One aspect of this accessibility is the usability of the frameworks for a broad community of researchers who do not have sufficient expertise to work with these frameworks. To address this issue, we introduce the GraphFlow framework, a workflow-based framework that provides several graph mining components. GraphFlow uses data-parallel Apache Spark and its GraphX library as the back end, so it can process very large graphs. GraphFlow also supports the construction of experiment pipelines that involve running several components. Integrated into our GraphFlow framework, we also introduce a novel vertex-centric network embedding algorithm, which can learn low-dimensional vectors for the vertices of very large graphs. Our network embedding algorithm can scale to graphs with billions of edges, whereas previous algorithms do not scale to graphs of this size. GraphFlow also supports dynamic graphs using graph snapshots and batch updates. We provide SSSPIncJoint, a novel algorithm for computing single-source shortest paths (SSSP) for dynamic graphs. SSSPIncJoint is significantly more efficient than running SSSP for each snapshot of a dynamic graph.

Item (Open Access)
Insightful Performance Analysis of Many-Task Runtimes through Tool-Runtime Integration
(University of Oregon, 2017-09-06) Chaimov, Nicholas; Malony, Allen
Future supercomputers will require application developers to expose much more parallelism than current applications expose. To help application developers structure their applications so that this is possible, new programming models and libraries, the many-task runtimes, are emerging to allow the expression of orders of magnitude more parallelism than existing models. This dissertation describes the challenges that these emerging many-task runtimes will place on performance analysis, and proposes deep integration between runtimes and performance tools as a means of producing correct, insightful, and actionable performance results. I show how tool-runtime integration can be used to aid programmer understanding of performance characteristics and to provide online performance feedback to the runtime for Unified Parallel C (UPC), High Performance ParalleX (HPX), Apache Spark, the Open Community Runtime, and the OpenMP runtime.