Exploratory Data Analysis with Clustered Data: Simulation and Application with Oregon’s Statewide Longitudinal Data System using Generalized Linear Mixed-Effects Model Trees
Date
2024-08-07
Authors
Loan, Christopher
Publisher
University of Oregon
Abstract
Simulations were conducted to establish best practices for hyperparameter optimization and for accounting for clustering in Generalized Linear Mixed-Effects Model Trees (GLMM trees). Applying these data-driven best practices, the study explored the relationship between a 9th Grade On-Track to Graduate (9G-OTG) indicator and observed high school graduation within four years. Data originated from two cohorts of the Oregon State Longitudinal Data System (SLDS) and were joined with external datasets. Restricted to complete cases, the data comprised more than 58,000 observations, each with more than 1,500 variables measured at the student, school, district, and zip code levels. GLMM trees explored heterogeneity in a cross-classified multilevel logistic regression that regressed observed graduation on 9G-OTG, accounting for variance in school- and zip-code-level random intercepts. Subgroups were identified for whom the probabilities of graduating among on- and off-track students were systematically heterogeneous, relative to the supraordinate group. Results suggest that for most students, 9G-OTG is a potent early warning indicator of graduation, but systematic variation in the indicator’s effectiveness was found at all levels except the district level. Subgroups were defined by combinations of alternative schools, absences, transferring schools, being enrolled in more than one instructional program, neighborhood unemployment, and sex. Implications and recommendations for measurement, practice, and evaluation are discussed.
Keywords
Generalized Linear Mixed Effects Model Trees, High School Graduation, Hyperparameter Optimization, Model-based Recursive Partitioning, Multilevel Modeling, State Longitudinal Data Systems