Computer Science Theses and Dissertations
This collection contains some of the theses and dissertations produced by students in the University of Oregon Computer Science Graduate Program. Paper copies of these and other dissertations and theses are available through the UO Libraries.
Browsing Computer Science Theses and Dissertations by Author "Brophy, Jonathan"
Collective Classification of Social Network Spam (Open Access; University of Oregon, 2017-09-06)
Brophy, Jonathan; Lowd, Daniel
Unsolicited messages affect virtually every popular social media website, and spammers have become increasingly proficient at bypassing conventional filters, prompting a stronger effort to develop new detection methods. This thesis takes a two-step approach. First, we build an independent model using features that capture the cases where spam is obvious. Second, we build a relational model that takes advantage of the interconnected nature of users and their comments. By feeding the initial predictions of the independent model into the relational model, we can propagate and jointly infer the labels of all comments at once. This lets us catch obfuscated spam comments that the independent model misses and that can be found only by examining the relational structure of the social network. Our experiments show that models exploiting the underlying structure of the social network detect spam more effectively than models that do not. This thesis includes previously published coauthored material.

Understanding and Adapting Tree Ensembles: A Training Data Perspective (Open Access; University of Oregon, 2023-03-24)
Brophy, Jonathan; Lowd, Daniel
Despite the impressive success of deep-learning models on unstructured data (e.g., images, audio, text), tree-based ensembles such as random forests and gradient-boosted trees remain hugely popular and the preferred choice for tabular or structured data, and they are regularly used to win challenges on data-competition websites such as Kaggle and DrivenData. Yet for all their predictive performance, tree-based ensembles lack certain characteristics that may limit their further adoption, especially in safety-critical or privacy-sensitive domains such as weather forecasting or predictive medical modeling.
This dissertation investigates three shortcomings currently facing tree-based ensembles (lack of explainable predictions, limited uncertainty estimation, and inefficient adaptation to changes in the training data) and posits that numerous improvements to these models can be made by analyzing the relationships between the training data and the resulting learned model. By studying the effects of one or many training examples on tree-based ensembles, we develop solutions that (1) increase their predictive explainability, (2) provide accurate uncertainty estimates for individual predictions, and (3) efficiently adapt learned models to accurately reflect updated training data. This dissertation includes previously published coauthored material.
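The dissertation abstract refers to studying the effects of one or many training examples on a learned tree model. The most direct, and most expensive, way to measure a single example's effect is leave-one-out retraining: fit the model with and without that example and compare the two predictions at a test point. The following is a hypothetical illustration of that naive baseline, not the dissertation's method. A single depth-1 regression tree (a stump) on one numeric feature stands in for a full tree ensemble, and the names `fit_stump` and `loo_influence` are invented for this sketch.

```python
from statistics import mean


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree on one feature by minimizing squared error.

    Illustrative stand-in for a tree ensemble; returns a prediction function.
    """
    pairs = sorted(zip(xs, ys))
    best = None  # (sse, threshold, left_mean, right_mean)
    for k in range(1, len(pairs)):
        # Skip split points between identical feature values.
        if pairs[k - 1][0] == pairs[k][0]:
            continue
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            # Threshold halfway between adjacent sorted feature values.
            thr = (pairs[k - 1][0] + pairs[k][0]) / 2
            best = (sse, thr, lm, rm)
    if best is None:
        # No valid split: predict the overall mean.
        overall = mean(ys)
        return lambda x: overall
    _, thr, lm, rm = best
    return lambda x: lm if x <= thr else rm


def loo_influence(xs, ys, x_test):
    """Influence of each training example on the prediction at x_test.

    Computed by brute-force leave-one-out retraining: the full-data prediction
    minus the prediction of a model trained without example i.
    """
    full_pred = fit_stump(xs, ys)(x_test)
    scores = []
    for i in range(len(xs)):
        xs_i = xs[:i] + xs[i + 1:]
        ys_i = ys[:i] + ys[i + 1:]
        scores.append(full_pred - fit_stump(xs_i, ys_i)(x_test))
    return scores


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6]
    ys = [0, 0, 0, 1, 1, 10]
    print(loo_influence(xs, ys, x_test=6))
```

On this toy data, the outlying example at x = 6 carries all of the influence on the prediction at x = 6, while removing any other example leaves that prediction unchanged. Retraining once per example scales poorly with dataset size, which is the kind of cost that more efficient, structure-aware approaches aim to avoid.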