Essays in Applied Machine Learning and Causal Inference

Waddell, Glen
Lennon, Connor
2022-10-26
https://hdl.handle.net/1794/27737

This dissertation studies how machine learning can be incorporated into existing econometric causal techniques, exploring both the costs and the benefits of that choice. The first chapter uses a simulated instrumental-variables setting to evaluate how easily unmodified machine learning (ML) techniques can be incorporated into the "first stage" problem. The first stage of two-stage least squares (2SLS) is a prediction problem, which suggests gains from using ML in 2SLS's first stage. However, little guidance exists on when ML helps 2SLS or when it hurts. We investigate the implications of inserting ML into 2SLS, decomposing the bias into three informative components. Mechanically, ML-in-2SLS procedures face issues common to prediction and causal-inference settings, as well as issues arising from their interaction. Through simulation, we show that linear ML methods (e.g., post-Lasso) work "well," while nonlinear methods (e.g., random forests, neural nets) generate substantial bias in second-stage estimates, in some cases exceeding the bias of endogenous OLS. This work was performed in conjunction with professors Edward Rubin and Glen Waddell. The chapter author wrote the simulation code used to evaluate and run the methods tested in this chapter, except for the substantial portions used for table creation and for iterating over the different methods, and we designed the DGP function based on those found in Belloni, Chen, Chernozhukov, and Hansen (2012). The second chapter applies machine learning to evaluate an existing causal estimate of the effect of property value on suppression costs in wildfire economics. Models currently in use rely on excluding class A-D wildfires that burn fewer than 300 acres, use property values as an input, and feature differential estimates of per-acre suppression costs in the Eastern and Western United States.
However, restricting suppression cost estimates to large fires ignores wildfires that have high per-acre costs due to aggressive initial-attack strategies, as well as fires occurring in well-managed forests with fewer suppression requirements. This may bias SCI-derived cost estimates and make them overly responsive to changes in local wealth. Estimates from double/debiased vision transformers indicate that SCI parameters overestimate the impact of property value as a contributor to suppression costs. This dissertation includes unpublished and co-authored material.

en-US
All Rights Reserved.
Causal, Learning, Machine, SCI, Wildfire
Electronic Thesis or Dissertation