A JOINT MODELING APPROACH TO STUDYING ENGLISH LANGUAGE PROFICIENCY DEVELOPMENT AND TIME-TO-RECLASSIFICATION by TYLER H. MATTA A DISSERTATION Presented to the Department of Educational Methodology, Policy and Leadership and the Graduate School of the University of Oregon in partial fulfillment of the requirements for the degree of Doctor of Philosophy December 2016 DISSERTATION APPROVAL PAGE Student: Tyler H. Matta Title: A Joint Modeling Approach to Studying English Language Proficiency Development and Time-to-Reclassification This dissertation has been accepted and approved in partial fulfillment of the requirements for the Doctor of Philosophy degree in the Department of Educational Methodology, Policy and Leadership by: Joseph Stevens Chair Ilana Umansky Core Member Yeow Meng Thum Core Member Sanjay Srivastava Institutional Representative and Scott L. Pratt Dean of the Graduate School Original approval signatures are on file with the University of Oregon Graduate School. Degree awarded December 2016 ii c© 2016 Tyler H. Matta iii DISSERTATION ABSTRACT Tyler H. Matta Doctor of Philosophy Department of Educational Methodology, Policy and Leadership December 2016 Title: A Joint Modeling Approach to Studying English Language Proficiency Development and Time-to-Reclassification The development of academic English proficiency and the time it takes to reclassify to fluent English proficient status are key issues in monitoring achievement of English learners. Yet, little is known about academic English language development at the domain-level (listening, speaking, reading, and writing), or how English language development is associated with time-to- reclassification as an English proficient student. Although the substantive findings surrounding English proficiency and reclassification are of great import, the main focus of this dissertation was methodological: the exploration and testing of joint modeling methods for studying both issues. The first joint model studied was a multilevel, multivariate random effects model that estimated the student- specific and school-specific association between different domains of English language proficiency. The second model was a multilevel shared random effects model that estimated English proficiency development and time-to-reclassification simultaneously and treated the student-specific random effects as latent covariates in the time-to-reclassification model. These joint modeling approaches were illustrated using annual English language proficiency test scores and time-to- reclassification data from a large Arizona school district. iv Results from the multivariate random effects model revealed correlations greater than .5 among the reading, writing and oral English proficiency random intercepts. The analysis of English proficiency development illustrated that some students had attained proficiency in particular domains at different times, and that some students had not attained proficiency in a particular domain even when their total English proficiency score met the state benchmark for proficiency. These more specific domain score analyses highlight important differences in language development that may have implications for instruction and policy. The shared random effects model resulted in predictions of time-to-reclassification that were 97% accurate compared to 80% accuracy from a conventional discrete-time hazard model. The time-to-reclassification analysis suggested that use of information about English language development is critical for making accurate predictions of the time a student will reclassify in this Arizona school district. v CURRICULUM VITAE NAME OF AUTHOR: Tyler H. Matta GRADUATE AND UNDERGRADUATE SCHOOLS ATTENDED: University of Oregon, Eugene, OR Pratt Institute, Brooklyn, NY DEGREES AWARDED: Doctor of Philosophy, Educational Leadership, 2016, University of Oregon Master of Science, Art and Design Education, 2007, Pratt Institute Bachelor of Fine Arts, Fine Arts, 2007, Pratt Institute AREAS OF SPECIAL INTEREST: Latent Variable Modeling Longitudinal Data Analysis Missing Data GRANTS, AWARDS AND HONORS: Graduate Fellow Teaching Award, College of Education, University of Oregon, 2015 Travel Award, Society for Multivariate Experimental Psychology (SMEP), 2014 Scholarship for Education Research, Inter-university Consortium for Political and Social Research (ICPSR), 2014 vi ACKNOWLEDGEMENTS My advisor, Joe Stevens, for his unconditional support throughout my doctoral studies. My dear friend, collaborator, and mentor, Yeow Meng Thum. John Wilson, for providing the data used for this study. All those who made this journey memorable, particularly, Meg Guerreiro, Michael Their, Jim Soland, Quinn Lathrop, and Dan McNeish. My family: Kaity, Deb, Pete, Jan, Paula, Debbie, and Peter. Finally, my best friend and wife, Bianca, to whom this work is dedicated. vii For Bianca viii TABLE OF CONTENTS Chapter Page I. INTRODUCTION . . . . . . . . . . . . . . . . . 1 English Proficiency Development . . . . . . . . . . . 5 Reclassification . . . . . . . . . . . . . . . . . 8 Joint Models for Longitudinal Data . . . . . . . . . . . 10 Theoretical Framework . . . . . . . . . . . . . . 14 II. METHODOLOGY . . . . . . . . . . . . . . . . 20 Sample . . . . . . . . . . . . . . . . . . . 20 Variables . . . . . . . . . . . . . . . . . . . 21 Outcomes . . . . . . . . . . . . . . . . . 21 Covariates . . . . . . . . . . . . . . . . . 24 Missing data . . . . . . . . . . . . . . . . . . 25 Analysis . . . . . . . . . . . . . . . . . . . 28 Research question 1 . . . . . . . . . . . . . . 29 Research question 2 . . . . . . . . . . . . . . 30 Research question 3 . . . . . . . . . . . . . . 31 III. RESULTS . . . . . . . . . . . . . . . . . . . 34 English Language Proficiency Development . . . . . . . . 34 Total English proficiency . . . . . . . . . . . . . 34 Linear growth model . . . . . . . . . . . . . 35 Quadratic growth model . . . . . . . . . . . . 35 School-level model . . . . . . . . . . . . . . 36 School-level model with covariates . . . . . . . . . 37 ix Chapter Page Weakly informative prior model . . . . . . . . . . 39 Domain-specific proficiency models . . . . . . . . . . 39 Reading proficiency . . . . . . . . . . . . . 39 Writing proficiency . . . . . . . . . . . . . . 43 Oral proficiency . . . . . . . . . . . . . . 44 Correlation Among Domain-Specific English Proficiency Development 45 Predicting Time-to-Reclassification . . . . . . . . . . . 50 Hazard models . . . . . . . . . . . . . . . . 50 Hazard model with manifest covariates . . . . . . . . 50 Shared random effects models . . . . . . . . . . . 54 Hazard submodel with latent covariates . . . . . . . 56 Hazard submodel with latent and manifest covariates . . . 59 IV. DISCUSSION . . . . . . . . . . . . . . . . . . 63 Substantive Findings . . . . . . . . . . . . . . . 64 Describing English proficiency development . . . . . . . 64 Predicting time-to-reclassification . . . . . . . . . . 66 Limitations . . . . . . . . . . . . . . . . . . 68 Conclusions and Future Directions . . . . . . . . . . . 70 APPENDICES A. MODEL SPECIFICATION . . . . . . . . . . . . . 73 Univariate AZELLA Models . . . . . . . . . . . . . 73 AZELLA total score models . . . . . . . . . . . . 75 AZELLA total score model 1 . . . . . . . . . . . 75 AZELLA total score model 2 . . . . . . . . . . . 76 AZELLA total score model 3 . . . . . . . . . . . 76 x Chapter Page AZELLA total score model 4 . . . . . . . . . . . 77 AZELLA total score model 5 . . . . . . . . . . . 77 AZELLA reading score models . . . . . . . . . . . 77 AZELLA reading score model 1 . . . . . . . . . . 78 AZELLA reading score model 2 . . . . . . . . . . 78 AZELLA reading score model 3 . . . . . . . . . . 79 AZELLA reading score model 4 . . . . . . . . . . 79 AZELLA reading score model 5 . . . . . . . . . . 80 AZELLA writing score models . . . . . . . . . . . 80 AZELLA writing score model 1 . . . . . . . . . . 80 AZELLA writing score model 2 . . . . . . . . . . 81 AZELLA writing score model 3 . . . . . . . . . . 81 AZELLA writing score model 4 . . . . . . . . . . 82 AZELLA writing score model 5 . . . . . . . . . . 82 AZELLA oral score models . . . . . . . . . . . . 82 AZELLA oral score model 1 . . . . . . . . . . . 83 AZELLA oral score model 2 . . . . . . . . . . . 83 AZELLA oral score model 3 . . . . . . . . . . . 84 AZELLA oral score model 4 . . . . . . . . . . . 84 AZELLA oral score model 5 . . . . . . . . . . . 85 Multivariate AZELLA Growth Models . . . . . . . . . . 85 Multivariate AZELLA model 1 . . . . . . . . . . . 87 Multivariate AZELLA model 2 . . . . . . . . . . . 88 Multivariate AZELLA model 3 . . . . . . . . . . . 89 Time-to-Reclassification Models . . . . . . . . . . . . 90 xi Chapter Page Discrete-time hazard models . . . . . . . . . . . . 90 Hazard model 1 . . . . . . . . . . . . . . . 91 Hazard model 2 . . . . . . . . . . . . . . . 91 Hazard model 3 . . . . . . . . . . . . . . . 91 Shared random effects models . . . . . . . . . . . 91 Shared random effects model 1 . . . . . . . . . . 92 Shared random effects model 2 . . . . . . . . . . 93 Shared random effects model 3 . . . . . . . . . . 94 B. CONVERGENCE . . . . . . . . . . . . . . . . 95 C. FULL RESULTS . . . . . . . . . . . . . . . . . 112 REFERENCES CITED . . . . . . . . . . . . . . . . 123 xii LIST OF FIGURES Figure Page 1. Directed graphs for time-to-reclassification . . . . . . . . . 17 2. Distributions for AZELLA total and sub-test scores by grade . . . 23 3. Variation in FRL and SWD . . . . . . . . . . . . . 26 4. Fitted prototypical AZELLA trajectories by subgroup and test . . . 42 5. Fitted AZELLA trajectories by test . . . . . . . . . . . 49 6. Hazard model with manifest covariates reclassification plot . . . . 53 7. ROC curves for the time-to-reclassification models . . . . . . . 55 8. Fitted AZELLA total score trajectories and probability of reclassification 58 9. Hazard model with latent and manifest covariates reclassification plot . 61 B.1. Total score linear model trace plots 1 . . . . . . . . . . . 97 B.2. Total score linear model trace plots 2 . . . . . . . . . . . 98 B.3. Total score quadratic model trace plots 1 . . . . . . . . . 99 B.4. Total score quadratic model trace plots 2 . . . . . . . . . 100 B.5. Total score school-level model trace plots 1 . . . . . . . . . 101 B.6. Total score school-level model trace plots 2 . . . . . . . . . 102 B.7. Total score school-level model trace plots 3 . . . . . . . . . 103 B.8. Total score covariates model trace plots 1 . . . . . . . . . 104 B.9. Total score covariates model trace plots 2 . . . . . . . . . 105 B.10. Total score covariates model trace plots 3 . . . . . . . . . 106 B.11. Total score covariates model trace plots 4 . . . . . . . . . 107 B.12. Total score weakly informative prior model trace plots 1 . . . . . 108 B.13. Total score weakly informative prior model trace plots 2 . . . . . 109 xiii Figure Page B.14. Total score weakly informative prior model trace plots 3 . . . . . 110 B.15. Total score weakly informative prior model trace plots 4 . . . . . 111 xiv LIST OF TABLES Table Page 1. Grade of reclassification for the sample . . . . . . . . . . 22 2. Demographic characteristics of the sample . . . . . . . . . 27 3. Classification table . . . . . . . . . . . . . . . . 32 4. Parameter estimates for the AZELLA total score growth models . . . 40 5. Parameter estimates for the final domain-specific growth models . . . 41 6. Correlations for domain-specific English proficiency growth . . . . 46 7. Parameter estimates for time-to-reclassification models . . . . . . 51 B.1. Quantitative convergence evidence for the total ELP growth models . . 96 C.1. Parameter estimates for the AZELLA reading score growth models . . 113 C.2. Parameter estimates for the AZELLA writing score growth models . . 114 C.3. Parameter estimates for the AZELLA oral score growth models . . . 115 C.4. Parameter estimates for the multivariate AZELLA growth models . . 116 C.5. Parameter estimates for the reclassification hazard models . . . . . 120 C.6. Parameter estimates for the reclassification shared random effects models 121 xv CHAPTER I INTRODUCTION English learners (EL) are those students who come from homes where the active language is one other than English, and whose proficiency in English is not yet developed to the level where they can profit fully from instruction delivered solely in English (August & Hakuta, 1997). According to the National Center for Education Statistics (2015), the proportion of English learners enrolled in United States’ (US) schools was higher in 2012-13 (9.2% or 4.4 million) than in 2002-03 (8.7% or 4.1 million), and with approximately 4.5 million English learners enrolled in PK-12 public schools nationwide in 2014-15, they are the fastest growing subgroup of students (Office of English Language Acquisition, 2015). This growing population of students faces the doubly difficult task of obtaining English proficiency while also meeting the same content standards as required for their fluent English speaking peers. Research indicates, however, that there are persistent gaps between English learners and their fluent English speaking peers on a number of key academic outcomes. In 2015, there were achievement gaps on the National Assessment of Educational Progress (NAEP) between non-English learners and English learners in both mathematics and reading in Grades 4 and 8 of approximately one standard deviation (U.S. Department of Education, Institute of Education Sciences, & National Center for Education Statistics, 2015). This finding was not new, however. Research has shown persistent gaps between English learners and their fluent English peers generally (Callahan, Wilkinson, & Muller, 2010; Kim & Herman, 2009), and specifically for reading (Kieffer, 2008, 2011), and mathematics achievement (Roberts & Bryant, 2011). Additionally, research has shown significant 1 differences between English learners and their fluent English peers in their opportunity to learn (Abedi & Herman, 2010; Callahan, 2005). Furthermore, English learners have been found to be less likely to graduate from high school or persist into post-secondary education (Heilig, 2011; Kanno & Cromley, 2013; Kao & Thompson, 2003). While there is general agreement that closing these gaps between English learners and their fluent English speaking peers is of critical importance, there is ongoing debate regarding the role that language and language policy plays in this effort (Hakuta, 2011). Current federal policy provides no guidance and states are left with the autonomy to serve English learners in any way they find to be effective as long as those programs are (a) based on sound educational theory, (b) implemented adequately, and (c) are periodically evaluated (Hakuta, 2011). Although states such as Arizona, Massachusetts, and California have passed laws that require English-only instructional programs, evidence suggests these policies have had no impact on academic outcomes for English learners (Koretz & Guo, 2012; Parrish, Perez, Merickel, & Linquanti, 2006). Furthermore, there is a sizable research literature that compares types of English-only instruction to types of dual language instruction (e.g., Cheung & Slavin, 2012; Valentino & Reardon, 2015). While language of instruction remains a key issue in English learner research, there remains little understanding of how English develops over time within these programs. Academic English is characterized as the English language practices required of the four language domains (speaking, listening, reading, and writing) to successfully engage in academic content in school and display knowledge (Council of Chief State School Officers, 2012). When English learners reach academic 2 English proficiency, they are eligible for reclassification from EL status to what is generally referred to as fluent English proficient. Intertwined with the language of instruction debate is the issue of how long it should take for English learners to attain proficiency in academic English and reclassify to fluent English proficient. For example, in addition to four-hour intensive Structured English Immersion (SEI) instruction, Arizona’s Title 15 (§15.752) requires that students be reclassified in one year. Early research has shown, however, that students take between three to five years to attain oral English proficiency and four to seven years to attain academic English fluency (Hakuta, Butler, & Witt, 2000). While the act of reclassification is largely an administrative change, it results in students losing access to language support services and entering the mainstream classroom as well as altering accountability reporting of the EL subcategory of students. Thus, it is critical to ensure reclassification policies are aligned with the most current understanding of how academic English develops for ELs and accurately reflects the attainment of fluent English proficient status. There is scant longitudinal research, however, focused on describing the development of school- aged (K-12) English learners’ academic English across the four language domains as measured by annual state English language proficiency assessments. Understanding the factors that impact English learners’ progress to reclassification is complex (Bailey & Carroll, 2015). There has been a small but growing literature focused on understanding this process, modeling the time it takes English learners to reclassify using discrete-time survival models (Slama, 2014; Thompson, 2015; Umansky & Reardon, 2014). However, this body of research does not account for the association between English language development and time to reclassification. Reclassification decisions are based largely on one’s English 3 proficiency, which is a developmental outcome (Ramsey & O’Day, 2010). Thus, any factor that influences time to reclassification is likely to do so, in full or in part, through the development of English proficiency. The use of longitudinal English language proficiency measures as time-dependent covariates however, would likely be subject to issues of endogeneity. In research using cross-sectional data, covariate endogeneity for a fitted model exists when the covariance between the covariate and the residual does not equal zero (Angrist & Pischke, 2008). In the longitudinal context, covariate endogeneity becomes more complex and has been well studied in both the analysis of repeated measures data (Diggle, Zeger, Liang, & Heagerty, 2002) as well as the analysis of survival data (Kalbfleisch & Prentice, 2011). For survival analysis, a time-dependent variable is exogenous (referred to as external in the survival literature) if its process influences the rate of event occurrence over time, but its future path is not affected by the occurrence of the event (Kalbfleisch & Prentice, 2011). Such variables include defined covariates where the values of the covariate are established in the study design, and ancillary covariates where the stochastic process is outside of the individual under study. A time-dependent variable is endogenous (referred to as internal in the survival literature) if its future path is affected by the event occurrence (Kalbfleisch & Prentice, 2011). These variables are typically the output of a stochastic process associated with the participant, and therefore require their own statistical model (Kalbfleisch & Prentice, 2011). The joint modeling paradigm provides a useful approach for incorporating endogenous time-varying covariates into longitudinal analysis (Kalbfleisch & Prentice, 2011; Rizopoulos & Lesaffre, 2014). 4 This study proposed two statistical models for the analysis of both academic English language proficiency development and the time-to-reclassification. The intention of the first model was to describe how developments in each language domain correlated with the others. The second model was designed to describe the system of reclassification more accurately than conventional approaches. In what follows, I provide an overview of the recent research on English language proficiency development and reclassification followed by a review of the joint modeling literature. From there, I provide a theoretical rational that connects this joint modeling paradigm to questions pertaining to English learner reclassification. I then propose a series of research questions that motivated the use of these models. English Proficiency Development Hakuta et al. (2000) have been influential in the conversation regarding the time it takes for English learners to become proficient in English. Their findings, however, were based on cross-sectional data and did not describe the within-student developmental process for attaining academic English proficiency. The research within August and Shanahan’s (2006) exceptionally thorough literature review of second-language literacy development was almost exclusively cross-sectional. In the context of setting progress and attainment benchmarks for English learners, Cook, Boals, Wilmes, and Santos (2008) described the academic English language development of 12,836 English learners using repeated measures over a three year period. This descriptive analysis also treated the English proficiency outcomes in a cross-sectional manner by reporting grade-level mean scores and did not attempt to model within-student developmental change or the variation in development between students. As has been well documented, the analysis of outcomes in a cross-sectional manner when interest is in describing intra-individual development 5 can confound cohort and age effects and/or conflate inter-individual and intra- individual variation (Diggle et al., 2002; Raudenbush, 2001; Thum, 1994). That is, cross-sectional analysis is valid only when all individuals grow in the same way, or when individual differences, even when present, are considered to be nuisance variation. Longitudinal English proficiency outcomes allows for the fitting of statistical models that provide a smooth empirical approximation of the student-specific developmental process. There is an abundance of longitudinal research focused on the English development of early childhood populations (Conboy & Thal, 2006; Hammer, Lawrence, & Miccio, 2008; Kohnert & Conboy, 2010; Vagh, Pan, & Mancilla-martinez, 2009); whereas, fewer studies exist for school-aged children (Rojas & Iglesias, 2013; Slama, 2012; Uchikoshi, 2012). Of the longitudinal studies of English development for school-aged children, the average developmental process has been described using simple linear functions (Uchikoshi, 2012), piecewise functions (Rojas & Iglesias, 2013), and polynomial functions (Slama, 2012). Much of the longitudinal research on English language development focuses on oral proficiency (Rojas & Iglesias, 2013; Uchikoshi, 2012). Uchikoshi (2012) studied the oral vocabulary development of Spanish and Cantonese English learners between Kindergarten and Grade 2. Limited by only three time points per student, oral vocabulary development was characterized as a simple linear trajectory with variation in initial status and rate of development (Uchikoshi, 2012). Rojas and Iglesias (2013) also analyzed the oral English vocabulary development of a 1,732 English learners during their first three years of schooling. With two measures per year, Rojas and Iglesias fit a piecewise model that described within-grade development as a positive linear process with stagnation over the summer. While 6 these studies contribute to the research on English vocabulary development of English learners, neither study considers the development of the literate domains (writing and reading). To my knowledge, there has been only one published study that examined the English language proficiency growth trajectories of English learners using annual state English language proficiency assessments. Slama (2012) focused on the difference between US-born and foreign-born adolescent ELs in their development of academic English proficiency. Using the composite English proficiency score (i.e., weighted combination of reading, writing, listing and speaking sub-tests), this study described the average within-student trend as a curvilinear function where the rate of growth was relatively large early in the study and decelerated in late grades (Slama, 2012). However the study also found significant variation in initial status, linear growth, and the curvature, resulting in some students’ English development accelerating over time (Slama, 2012). Slama did not describe the functional form of domain-specific academic English proficiency development. Overall, there is scant longitudinal research focused on describing the development of school-aged English learners’ academic English across the four language domains as measured by annual state English language proficiency assessments. Understanding the average developmental process for a group of students, and the individual differences in those developmental trajectories, can prove useful for English learner policy and practice. From a policy perspective, well estimated developmental trajectories can aid in the setting of realistic standards for annual English language development and understand the differential impact of English language programs. Additionally, such information can enable schools 7 to improve practices by accurately identifying individual students whose English language development is not keeping pace and may be in need of intervention. Reclassification Within the broader English learner literature is a body of research focused specifically on the time it takes for students to be reclassified as English proficient. Students are tracked from some original classification, often at the start of Kindergarten, until they are reclassified as fluent English proficient or are censored (i.e., they do not reclassify within the study window or are lost to follow-up). The time spent classified as an English learner —referred to as duration in the survival analysis literature —can then be analyzed using discrete time survival models. Covariates can also be added to the model in an attempt to understand the factors that predict reclassification and to understand how specific decision criteria may impact time to reclassification. The earliest example of this approach was used to investigate the impact of California’s Proposition 227, also called the English Language in Public Schools Statute, on time to reclassification (Parrish et al., 2006). More recently, a new wave of reclassification studies has emerged. Data used in these studies have been collected at the state or district level and have focused on different populations. Slama (2014) analyzed a single cohort of 5,345 English learners across the state of Massachusetts. Umansky and Reardon (2014) focused on Latino English learners from a large urban school district, using nine cohorts over 12 years (fall 2000 through spring 2012) for a total sample of 5,423 students. Finally, Thompson (2015) analyzed nine years of data that spanned 2001- 02 through 2009-10 using eight cohorts for a total of 202,931 English learners from the Los Angeles Unified School District. 8 This body of research indicates that it can take anywhere from three to eight years for the average English learner to reclassify. Slama (2014) found the average English learner reclassified after three years (or in second grade) but 17% of the cohort had not been reclassified by the end of Grade 7. Umansky and Reardon (2014) found that 38% of students reclassified by the end of Grade 5, 75% reclassified by the end of Grade 11, and the median time to reclassification was Grade 7. Similarly, Thompson (2015) found the likelihood of reclassification increased during elementary school peaking after six years (5th grade), and then began to drop again. After nine years, students had a 74% chance of being reclassified (Thompson, 2015). Student and program related factors were also found to influence the likelihood of reclassification. A non-Spanish speaking English learner was found more likely to reclassify than Spanish speaking English learners (Slama, 2014; Thompson, 2015). Slama (2014) also found free/reduced price lunch (FRL) status to be predictive of time to reclassification. In addition to FRL status, Thompson (2015) found that females were more likely to reclassify than males, and those with parents who had higher levels of education, or those who were never identified as special education students were also more likely to reclassify. Umansky and Reardon (2014) found that students in English immersion programs had higher rates of reclassification in early years but the students in two-language programs caught up over time. Thompson also showed that students who ever received bilingual education had a significant interaction with time, as did initial first language proficiency and initial second language proficiency. Within this literature, secondary analyses have investigated which of the reclassification criteria proved to be barriers to reclassification. Rather than using 9 the actual event of reclassification, individual exit criteria are used to estimate the time a student would have reclassified if that sole criterion was used for reclassification. Of the seven criteria used to determine reclassification, Umansky and Reardon (2014) indicated that in early grades (K-5), English proficiency was a larger barrier than the English Language Arts (ELA) criterion, but this shifted after Grade 5. Thompson (2015) found that the barrier to reclassification was not any one specific criterion, but rather, a matter of meeting all criteria in the same year. Joint Models for Longitudinal Data The longitudinal analysis of multiple outcome measures has been, and remains, an active topic of research (Galecki, 1994; Reinsel, 1982; Rizopoulos & Lesaffre, 2014; Wu & Carroll, 1988). Early examples in the social sciences can be traced to the multivariate random effects models (e.g., Bock & Bargmann, 1966; Joreskog, 1970; Joreskog & Goldberger, 1975; Muthen, 1989). Multivariate hierarchal linear models (Raudenbush & Bryk, 2002; Thum, 1994, 1997) can be seen as extensions of a subset of models focused on multiple outcomes. Similarly, multiple outcomes analyzed with latent curve models (Blozis, 2004, 2007; Bollen & Curran, 2006) can perform longitudinal analyses where the functional form of time is unknown and is estimated from the data. A modern synthesis of the above developments can be found in generalized latent variable modeling (Muthe´n, 2002; Skrondal & Rabe-Hesketh, 2004). There are two common frameworks under the joint modeling umbrella: multivariate random effects models (MVREM) and shared random effects models (SREM). MVREMs provide a robust framework for describing the dependence between multiple longitudinal outcomes (see Verbeke & Davidian, 2008; Verbeke, 10 Fieuws, Molenberghs, & Davidian, 2014). When interest is in the association between repeated measures outcomes and event times, SREMs has proven to be a useful modeling paradigm (see Proust-Lima, Se´ne, Taylor, & Jacqmin-Gadda, 2014; Tsiatis & Davidian, 2004). Both frameworks consist of separate submodels for each outcome and are linked together through a latent variable structure. Multivariate random effects models (MVREM) account for the association between two or more repeated measures outcomes through subject-specific latent variables (e.g., random effects; MacCallum, Kim, Malarkey, & Kiecolt-Glaser, 1997; Verbeke et al., 2014). That is, for each outcome, Y1, . . . , Ym, a mixed model is specified. Consider a random intercepts model for two outcomes, Y1 and Y2, for individual i at time j, Y1ij =β1 + ζ1i + β2tij + 1ij Y2ij =β3 + ζ2i + β4tij + 2ij But rather than assuming ζi ≡ [ζ1i, ζ2i] follow independent zero-mean normal distributions —as would be assumed if the outcomes were analyzed separately —their dependence is estimated. Under the MVREM framework, the dependence between Y1 and Y2 is accounted for by the correlation of the latent variables, ρ (ζ1, ζ2). Because each outcome is assumed interdependent conditional on the random effects, ζi, the submodels for Y1 and Y2 carry the same interpretation as their univariate expression. With MVREM, one is required to specify the form of the multivariate random effects. While it is conventional to assume a multivariate normal distribution, Thum (1994, 1997) specified a multivariate-t distribution while Stoolmiller (1994) and Ferrer and McArdle (2003) both specified non-parametric random effects. Verbeke et al. (2014) note that the underlying idea of the MVREM 11 holds irrespective of the number of outcomes. However, as the number of outcomes grow, the dimensionality of the latent structure also grows, which can present computational issues (Verbeke et al., 2014). Fieuws and Verbeke (2006) show that high dimensional models with normal random effects can be estimated by averaging over all pairwise combinations of the outcomes. The MVREM literature contains numerous examples of linear, generalized linear, non-linear, and combinations of such models for multivariate repeated outcomes. For the continuous case, Beckett, Tancredi, and Wilson (2004) estimated a multivariate growth model for four outcomes related to cognitive functioning in an aging population (ages 65 and older). They modeled growth in episodic memory, semantic memory, working memory, and perceptual speed, with a focus on the correlations between their respective rates of decline (linear slopes). Doss, Thum, Sevier, Atkins, and Christensen (2005) employed a latent variable regression of the change in a satisfaction outcome on the change in partner behavior outcome within a doubly-multivariate growth model in a comparison of two types of behavioral couple therapy. That is, two outcomes were tracked for the two members of each couple who are assigned to one of two types of behavioral couple therapy. In the context of generalized linear models, Ribaudo and Thompson (2002) extended the multivariate repeated model to deal with binary outcomes. They estimated the prevalence of reporting six quality of life outcomes between two treatment groups for subjects with previously untreated non-small-cell lung cancer. Finally, Fieuws, Verbeke, Maes, and Vanrenterghem (2008) provide an example of combined linear, generalized linear and non-linear mixed models to predict renal graft failure. For a review this literature, see Verbeke et al. (2014). 12 Like the multivariate random effects models, SREMs require the specification of submodels for each outcome, for example, a SREM for two longitudinal processes, Y1 and and Y2 may be specified as: Y1ij =β1 + ζ1i + β2tij + 1ij Y2ij =β3 + ζ1iλ1 + β5tij + 2ij. Unlike MVREMs, however, the dependence is specified through ‘shared’ random effects that link the two models, ζ1i, where λ1 is a scalar. While SREMs may be fit for any combination of outcomes, they are used primarily for the joint modeling of longitudinal data and event outcomes where the survival model is fit without a subject-specific random effect, or frailty as they’re referred to in the survival literature1. Such a specification enables the researcher to test different hypotheses regarding the relationship between the longitudinal process and the time-to-event process. However, one drawback is that the shared random effects influence the correlation between repeated measures and the dependency between the repeated measure and the time to event (Verbeke et al., 2014). Tsiatis and Davidian (2004) provide a philosophical rational for the use of SREMs for repeated measures and event outcomes. The SREM paradigm was developed in the statistics and biostatistics literatures. The first SREM was published by Wu and Carroll (1988) to deal with what Little (1995) termed latent variable dependent missingness. The use of SREMs to understand the association between a set of repeated measures and the time to some event was popularized by early HIV clinical trials (De Gruttola & Tu, 1994; Tsiatis, Degruttola, & Wulfsohn, 1995; Wulfsohn & Tsiatis, 1997). These 1The term frailty was introduced by Vaupel, Manton, and Stallard (1979) and is an unobserved random proportionality factor that modifies the hazard function of an individual. 13 studies aimed to understand how C4D T-lymphocytes were associated with onset of AIDS or death for those subjects with HIV. Henderson, Diggle, and Dobson (2000) utilized a shared random effects model to reanalyze the effect of drug therapy for schizophrenia patients while simultaneously accounting for attrition. The SREM framework has been less commonly used in educational research. Muthen and Masyn (2005) fit a latent class growth model to students’ aggressive behavior in Grades 1 and 2 that was used to predict the time to removal in Grades 3 through 7 using a discrete time survival process with a latent class frailty. Feldman and Rabe-Hesketh (2012) employed an SREM to understand if achievement trajectories were impacted by possibly non-random dropout in a large national dataset. Their discrete-time hazard submodel included separate parameters for the random intercept and random slope. Estimates from the SREM and a competing model fit to the data assuming the missing data mechanism was ignorable were consistent, suggesting the data were not sensitive to missing data assumptions (Feldman & Rabe-Hesketh, 2012). Finally, Thum and Matta (2015a, 2015b) employed a shared random effect model for longitudinal interim assessments between Grades 4 and 9, SAT and ACT scores between Grades 10 and 12, and a logistic regression model for the probability of taking a college test. The parameter estimates were then used to establish college readiness benchmarks for the interim assessment. While the SREM has had limited application in educational settings, it provides a framework for understanding relationships between multiple processes that cannot be revealed when separate models are used for each process. Theoretical Framework The development of English proficiency and reclassification from English learner to fluent English proficient are two related but separate phenomena. 14 Academic English proficiency is a construct characterized by the English language practices required to successfully engage in academic content in school and display knowledge (Council of Chief State School Officers, 2012), what Schleppegrell (2001) described as “doing school” (p. 432). Teachers of English to Speakers of Other Languages (TESOL; 2006) operationalize academic language as “the language used to acquire a new or deeper understanding of content related to the core curriculum areas and communicate that understanding to others” (p. 18). While academic English is an evolving construct, and one not without criticism (Scarcella, 2003), the practical realization is that one cannot do school with ordinary language alone (Hakuta, 2011). Proficiency is assessed annually across the four language domains: speaking, listening, reading, and writing (Bailey & Huang, 2011). For federal reporting and decision purposes, the four domain measures are combined or assessed in total to create a single measure of academic English proficiency (Abedi, 2008; Cook et al., 2008). Reclassification, on the other hand, is an administrative change in status from English learner to fluent English proficient. Federal regulations require English language proficiency assessments to be part of the criteria used in reclassification decisions (National Research Council, 2011). States are granted the autonomy, however, to include other criteria including (but not limited to) academic content standards, teacher recommendations, or information from parent conferences (Ramsey & O’Day, 2010). Furthermore, states are free to define whether the reclassification decision rules are conjunctive, compensatory, complementary, or mixed (Bailey & Carroll, 2015). For example, Arizona makes reclassification decisions based on the total English language proficiency score only, whereas Californias decision rules use a students combined English language 15 proficiency score, the four domain-specific scores, and their annual ELA score. This suggests that within some states, reclassification is a highly mechanistic process, based largely on a students initial status and rate of development for each decision criterion. In other states, however, the reclassification process may be less direct and also depend on clinical judgments and evaluations. In what follows, I propose three theoretical models that illustrate the possible relationships between the development of English proficiency and reclassification for a state the uses a total English language proficiency score as the only basis for reclassification (e.g., Arizona). Prior research on reclassification has used student-specific covariates as predictors with no explicit connection to English language proficiency, as illustrated in Figure 1a. Variables inside the box labeled “student j” have an j subscript and vary between students while variables inside the box labeled “risk-set t” vary by both students and risk-sets and have both j and t subscripts. Here, r is a dummy variable taking the value 1 if student j is reclassified in risk-set t and 0 otherwise. Student-level covariates, generically referred to as w, are time invariant. The arrow from time t to r represents the baseline survival model, or the regression of whether or not reclassification occurred for student j at time t. The arrows from the student-level covariates to r represent the regression of those covariates on reclassification under the proportional hazards assumption. Figure 1b illustrates a theoretical model where reclassification is predicted by the attributes of a student’s English proficiency development, y. The figure adds a box labeled “occasion i” to represent those variables that vary by measurement occasion and student, and have both i and j subscripts. The arrow from t to y represents a regression of the observed English proficiency score for student 16 student j risk-set t r t w1, . . . , wu (a) Survival model with covariates studentj occasioni y t ζ0 ζ1 risk − sett r t (b) Shared random effects model student j occasion i y t ζ0 ζ1 risk-set t r t w1, . . . , wu x1, . . . , xp (c) Shared random effects model with covariates Figure 1. Directed graphs for time-to-reclassification 17 j on time i. The latent variable ζ0 represents the random intercept while ζ1, which interacts with time t, represents the random slope (Skrondal & Rabe- Hesketh, 2004). This aspect of the directed graph represents a mixed effect growth model where the residual variance is constrained to be equal across occasions. The student-specific deviation in initial status and growth from the population mean, ζ0 and ζ1, are then used as latent covariates to predict reclassification at time t. Figure 1c combines Figure 1a and 1b by using student covariates (w and x) to condition English proficiency development and time to reclassification simultaneously. This theoretical model could be expanded to include a multivariate growth processes with multiple true scores if a state based reclassification on more than one criterion. This dissertation draws on a dataset from a district in Arizona to demonstrate the utility of joint models for English learner research. In particular, the first purpose of this research was to describe the relationship between the development of overall academic English proficiency and each language domain for students in the school district. The second goal was to develop a model that accurately predicts time to reclassification for these students. Specifically, the research questions were: 1. What is the functional form of overall academic English proficiency and for each language domain between Grades 3 and 7? 2. How do developments in each language domain correlate with each other? 3. Does a shared random effects model yield more accurate predictions of time to reclassification than a conventional discrete-time survival model? 18 For Research Question 1, scores from the overall English language proficiency assessment and from each language domain assessment will be modeled across Grades 3 to 7 using a mixed effect modeling framework. For Research Question 2, each of the univariate models from Research Question 1 will be combined to form a multivariate random effects model. Research Question 3 will evaluate the predictive accuracy of three models, a discrete-time survival model for time to reclassification and two shared random effects models for the overall English language proficiency assessment scores and time to reclassification. 19 CHAPTER II METHODOLOGY Sample This study used extant data from one large urban school district in Arizona. The data were from a single cohort of English learners tracked longitudinally from third grade in academic year 2007-08 through seventh grade in 2011-121. During this time, Arizona implemented an English-only instruction law resulting in homogeneity of language programs across schools (Ga´ndara & Orfield, 2010). For each student, annual English language proficiency assessment scores were collected as well as indicators of classification status. Due to the nature of the English learner classification, the data were highly unbalanced where some students may contribute data for only one time point while others contributed observations for the entire time span. The Grade 3 English learner cohort consisted of 277 students, or 20.77% of all third grade students in the district. All 277 English learners were enrolled in self-contained ELD classes. By Grade 7, there were 20 English learners remaining. Of the 20 English learners, 19 were in self-contained English language development classes and 1 was in a classroom with Non-EL students. There were 19 schools serving elementary Grades 3 - 5 and six schools serving middle school Grades 6 and 7. Elementary schools served between 37 and 121 students in Grade 3 and middle schools served between 47 and 297 students in Grade 7. English learners were nested within 18 of the 19 elementary schools and made up between 1.39% and 42.15% of students in their school. In seventh grade, 1Grade 8 data was excluded from the analysis because Arizona implemented a new English proficiency assessment in the 2012-13 academic year and there was limited information regarding construct and scale alignment between the two assessments. 20 English learners attended five of the six middle schools and made up between 11.02% and 26.46% of students in their school. Variables Outcomes. The outcome variables for this study included students’ scores on the Arizona English Language Learner Assessment (AZELLA) as well as the binary indicator of student reclassification from English learner to fluent English proficient. The AZELLA is Arizona’s annual English language proficiency assessment and consists of four sub-tests: reading, writing, listening, and speaking. Each sub-test is a vertically scaled, IRT based assessment that consists of multiple choice, constructed response, short response, and extended response items (Harcourt, 2007). The listening and speaking tests are combined to produce an oral language proficiency score. In addition, items from all four domains are combined to produce an overall English proficiency score, referred to as the total score. The total score is not a composite scores, but an independently constructed and vertically scaled IRT-based score that has its own conditional standard errors with no direct reference to the domain scores (Harcourt, 2007). The vertical scale of the sub-tests and the total score provide a foundation for analysis of the measures across grades. The binary reclassification indicator was coded 0 for each grade a student was classified as an English learner and was coded 1 for the first grade in which a student was reclassified as English language proficient. After reclassification, the student was no longer tracked. For these data, a student was reclassified when they earned an AZELLA total score in the proficient category. The AZELLA technical manual indicates that the total score of the 3 to 5 grade band test is composed of 31% speaking items, 19% listening items, 23% reading items and 27% writing items 21 Table 1. Grade of reclassification for the sample ( N = 277) Reclassified Censored Grade n (%) n (%) 3 6 (2.17) 2 (0.72) 4 130 (46.93) 31 (11.19) 5 41 (14.80) 11 (3.97) 6 40 (14.44) 6 (2.17) 7 4 (1.44) 6 (2.17) N ote. Grade of reclassification refers to the last year a student was classified as an English learner. compared to 30% speaking, 18% listening, 25% reading and 28% writing items for the 6 to 8 grade band test (Harcourt, 2007). This study used AZELLA reading, writing, oral language, and total scores. The AZELLA scale was rescaled by dividing observed scores by 10 for computational purposes. Figure 2 plots the median, first and third quartile, and minimum and maximum rescaled score at each grade for each of the AZELLA domain scores and the total score. However, it is important to realize that English learners reclassify due to their performance on the AZELLA, so each subsequent grade contained fewer students, and those students remaining do not score as high as those students who reclassified. Table 1 provides the frequencies and percentages of students who reclassify at the end of each grade. Censored students are either those students who do not reclassify by the end of seventh grade or who were lost to follow-up due to begin retained a grade or leaving the district. The largest number of students reclassified in Grade 4 (130), and 56 students were censored by the end of Grade 7. 22 3 4 5 6 7 50 55 60 65 70 75 80 Grade AZ EL LA s co re d ist rib u tio ns Total Reading Writing Oral Figure 2. Distributions for AZELLA total and sub-test scores by grade 23 Covariates. In addition to the outcome measures, information pertaining to home language, race/ethnicity, free/reduced priced lunch status, special education status, and sex was collected (see Table 2). Among the sample of English learners, most students reported Spanish as their home language (93.5%). The next most common language was generically described as Other, Non- Indian (1.81%) and the other languages spoken by students in the cohort included Malay (0.36%), Navajo (0.36%), Somali (0.72%), Thai (0.36%), Urdu (0.36%), Vietnamese (0.72%), and Yaqui (0.36%). Within the cohort, most English learners identified as Hispanic (88.45%). While all Hispanic English learners reported their home language to be Spanish, 88.45% of students who reported speaking a home language of Spanish also identified as Hispanic. Because Spanish is the dominant language in the district, Table 2 pools home language into Spanish and non-Spanish. However, preliminary analyses were unable to detect any differences by home language or by race/ethnicity for this sample and therefore will not be discussed any further. Free/reduced price lunch status (FRL) was used as a proxy measure for socioeconomic status. However, eligibility for free/reduced price lunch can vary from year to year based on family income levels. For example, student 54 in Figure 3 did not receive FRL in Grade 3, did in Grades 4 and 5, did not in Grade 6, but did so again in Grade 7. Across the sample, 90.25% of students were eligible for FRL in Grade 3 and 84.48% were eligible every year between Grades 3 and 7. Likewise, 9.75% of students were not eligible to receive FRL in Grade 3 but only 3.97% of students remained ineligible for FRL until Grade 7. A sensitivity analysis indicated little difference in mean overall AZELLA scores depending on FRL status. Thus, FRL was coded as ever being eligible for FRL as 1 and never 24 being eligible for FRL as 0. Furthermore, preliminary analysis was unable to detect any differences between these groups and will not be discussed further. Like free/reduced price lunch status, classification as a student with a disability (SWD) is a time-varying process. While the SWD status for most English learners was invariant over time, some students in the sample were identified with a disability one year, and not the next. Across the sample, 83.75% of students were never diagnosed with a disability and 8.30% were classified as a SWD every year between Grades 3 and 7. The remaining 7.95% transitioned over time. For example, student 150 in Figure 3 was classified as a SWD in Grades 3, 4, and 7, but not in Grades 5 and 6. Because the group of students with varying SWD status was small, sensitivity analysis suggested little difference in mean overall AZELLA scores depending on the definition of FRL. Thus, SWD was coded those English learners who were ever classified as a SWD as 1 and those English learners who were never classified as a SWD as 0. Finally, the sample was 48.38% female and 51.62% male. For the analysis, a code of 0 indicated female English learners while a code of 1 indicated male English learners. Missing data As noted above, the data were highly unbalanced as students who met the reclassification criteria were no longer tested. Much of this imbalance should not be considered missing data as English proficiency scores were never intended to be collected from those students who reclassify. Covariates such as SWD or FRL were missing only when a student’s entire record for that year was missing. Of the 277 English learners, 50 (18.05%) were lost to follow-up. Table 1 indicates two English learners were lost after Grade 3, 31 were lost after Grade 4, 11 were lost after Grade 5, and six were lost after Grade 6. Within each grade, Little’s missing 25 Grade l l l l l l l l l lN Y N Y 3 4 5 6 7 FRL SWD Grade Student 5 Grade l l l l l l l l l lN Y N Y 3 4 5 6 7 FRL SWD Grade Student 15 Grade l l l l l l l l l lN Y N Y 3 4 5 6 7 FRL SWD Grade Student 54 Grade l l l l l l l l l lN Y N Y 3 4 5 6 7 FRL SWD Grade Student 127 Grade l l l l l l l l l lN Y N Y 3 4 5 6 7 FRL SWD Grade Student 134 Grade l l l l l l l l l lN Y N Y 3 4 5 6 7 FRL SWD Grade Student 142 Grade l l l l l l l l l l N Y N Y 3 4 5 6 7 FRL SWD Grade Student 150 Grade l l l l l l l l l lN Y N Y 3 4 5 6 7 FRL SWD Grade Student 177 Grade l l l l l l l l l lN Y N Y 3 4 5 6 7 FRL SWD Grade Student 224 l l l l l l l l l l N Y N Y 3 4 5 6 7 FRL SWD Grade Student 233 l l l l l l l l l lN Y N Y 3 4 5 6 7 FRL SWD Grade Student 235 l l l l l l l l l lN Y N Y 3 4 5 6 7 FRL SWD Grade Student 277 Figure 3. A random sample of students illustrating the variation in free/reduced price lunch status and special education status between Grades 3 and 7 26 Table 2. Demographic characteristics of the sample ( N = 277) Characteristic n % Free or reduced price lunch No 11 3.97 Yes 266 96.03 Sex Female 134 48.38 Male 143 51.62 Student with disabilities No 232 83.75 Yes 45 16.25 Race / Ethnicity American Indian or Alaska Native 15 5.42 Asian 9 3.25 Black or African American 5 1.81 Hispanic 245 88.45 White 3 1.08 Home language Spanish 259 93.50 Other 18 6.50 completely at random (MCAR) test (Little, 1988) was not statistically significant (p > .05) indicating that within each grade, missing data were a random sample of the complete data. However, this does not guarantee that missing data across time were MCAR. Missing data were handled using likelihood-based estimation assuming an ignorable missing data mechanism (Little, 1995; Rubin, 1976). For all growth models, each student contributed all available data. For all survival models, students who were lost to follow-up and thus, never experienced the event of 27 reclassification, were censored assuming they were non-informative. It is important to note, however, that there is no empirical procedure to determine if a missing mechanism (or censoring mechanism) is ignorable. Molenberghs, Beunckens, Sotto, and Kenward (2008) proved that any NMAR model fit to observed data can be reproduced exactly by an MAR counterpart. Analysis All models were fit within the probabilistic modeling language Stan (Gelman, Lee, & Guo, 2015) using Hamiltonian Monte Carlo estimation (Hoffman & Gelman, 2014). For these data, there were k = 1, . . . , 18 schools, j = 1, . . . , nk students in school k, and i = 1, . . . , njk observations for subject j in school k. For each student j in school k, consider the vectors of repeated measures outcomes, yOjk, yRjk, yWjk, for the oral, reading, and writing AZELLA sub-tests, respectively, and yTjk being the AZELLA total score. Furthermore, consider the vector of reclassification indicators, rjk, where rtjk = 1 for the time t a student reclassified to fluent English proficient student, and 0 until reclassification occurred. For all analyses, model fit was evaluated using approximate leave-one-out cross validation (LOO) using Pareto smoothed importance sampling (Gelman, Hwang, & Vehtari, 2014; Vehtari, Gelman, & Gabry, 2016). LOO computes the expected log pointwise predictive density for a new dataset, elpd, and may be multiplied by −2 to be placed on the deviance scale. The deviance scaled elpd is referred to as the LOOIC and is a fully Bayesian information criterion that is viewed as an improvement over the deviance information criterion (DIC) and is more robust than the widely applicable or Watanabe-Akaike information criterion (WAIC) in the finite case with weak priors or influential observations. Model comparison may be conducted by taking the difference of the LOOCI, the 28 ∆LOOIC, and evaluate the difference in ratio to its standard error. Vehtari et al. (2016) suspect that this method of model comparison provides a better sense of uncertainty than conventional approaches that evaluate the difference of deviances in comparison to a χ2 distribution. However, Vehtari et al. warn that such a comparison should not be used to select a single model from a set of potential models. To that end, posterior probability distributions were also evaluated based on their 95% highest posterior density (95% HPD) interval. Research question 1. The first research question was addressed by fitting a series of univariate growth models for each of the four AZELLA outcomes using a systematic search for the best functional form in each case. The goal of this question was to understand the functional form of each outcome which guided model specification in the subsequent analyses. With measures nested in students and students nested in schools, I adopted the notation of (Skrondal & Rabe- Hesketh, 2004) to express the unit-level models generally as yijk = x ′ ijkβ + z (2)′ ijk ζ (2) jk + z (3)′ ijk ζ (3) k + ijk (2.1) Where β was the p × 1 vector of fixed effects corresponding to the known p × 1 vector of explanatory variables, xijk; z (2) ijk was the known q (2) × 1 vector of explanatory variables with student-specific random effects ζ (2) jk ; and z (3) ijk was the known q(3) × 1 vector of explanatory variables with school-specific random effects ζ (3) k . I assumed ζ (2) jk ∼ N (0,T(2)) and ζ(3)k ∼ N (0,T(3)) where T(2) and T(3) are q(2)× q(2) and q(3)× q(3) unstructured variance-covariance matrices, respectively, and it was assumed that ijk ∼ N (0, σ). For those models that excluded the school-level random effects, Equation 2.1 reduced to yijk = x ′ ijkβ + z (2)′ ijk ζ (2) jk + ijk (2.2) 29 Because so many students in the sample were reclassified between third and fifth grade, the number of potential models was limited. Both linear and quadratic functions for the fixed effects were assessed while variance components were specified for the intercept and liner growth component. Unless otherwise noted, a random variable, x, was U(a, b) where a ≤ x ≤ b. If x was a standard deviation, 0 < x and x < ∞. For computational efficiency, Equation 2.1 was specified such that the Cholesky factorized random effects were given independent N (0, 1) priors while T(2) and T(3), re-specified as correlation matrices, were given LKJ (1.5) priors. While an LKJ (1) prior results in a uniform density over all correlation matrices of a given order (Lewandowski, Kurowicka, & Joe, 2009), an LKJ (1.5) was found to drastically reduce the autocorrelation in the chains. The complete set of models specified for Research Question 1 and their formulation is provided in Appendix A. Research question 2. With univariate models specified, Research Question 2 required the joint analysis of the sub-test outcomes for the three measures in a multivariate random effects modeling framework, yRijk =x ′ RijkβR + z (2)′ Rijkζ (2) Rjk + z (3)′ Rijkζ (3) Rk + Rijk (2.3) yWijk =x ′ WijkβW + z (2)′ Wijkζ (2) Wjk + z (3)′ Wijkζ (3) Wk + Wijk (2.4) yOijk =x ′ OijkβO + z (2)′ Oijkζ (2) Ojk + z (3)′ Oijkζ (3) Ok + Oijk. (2.5) Focus was on the estimated student-level covariance matrix for the random effects which estimated the student-specific dependency among the three outcomes. I defined ζ (2) jk ≡ [ ζ (2)′ Rjk, ζ (2)′ Wjk, ζ (2)′ Ojk ]′ , as a q × 1 vector of student-specific random effects where q was the total number of student-specific random effects for all three submodels. Then I assumed ζ (2) jk to have a zero mean and variance-covariance 30 matrix T(2) =  τ (2) 11 τ (2) 12 · · · τ (2)1q τ (2) 21 τ (2) 22 · · · τ (2)2q ... ... . . . ... τ (2) q1 τ (2) q2 · · · τ (2)qq  where √ τ (2) pp was the standard deviation of ζ (2) jkp and ρpg = τ (2) pg / √ τ (2) pp τ (2) gg was the correlation between ζ (2) jkp and ζ (2) jkg for p 6= g. Priors for Equation 2.3, including T(2), were the same as those specified for Equation 2.1. The complete set of model specified for Research Question 2 and their formulation is provided in Appendix A. Research question 3. Research Question 3 focused on the predictive accuracy of time to reclassification for a conventional survival model and a shared random effects model. English learners are eligible for reclassification at discrete times, t = 1, 2, . . . , T . Consider the Tjk × 1 vector of reclassification indicators rjk, where rtjk = 1 when t = Tjk indicating student j in school k was reclassified to fluent English proficient, and was 0 until reclassification. The first model fit was a discrete-time survival model of the form: logit{P(rtjk = 1|wtjk, νk)} = logit(htjk) = w′tjkα+ νk (2.6) where htij was the hazard estimates at for student j in school k at time t, wtjk was a u× 1 vector of fixed effects, including indicators for each discrete time 1, . . . , T , α was a u×1 vector of corresponding fixed effects, and νk was the school-level random intercept. The second model fit was a shared random effects model which jointly estimated a growth submodel for the AZELLA total score as expressed in Equation 2.1 and a discrete-time survival submodel. The hazard submodel used the student-specific random effects estimated by the growth model as latent 31 Table 3. Classification table of cell counts or proportions True Outcome Predicted Outcome rtjk = 1 rtjk = 0 rˆtjk = 1 True Positive (TP) False Positive (FP) rˆtjk = 0 False Negative (FN) True Negative (TN) covariates to predict time to reclassification. The shared random effects model can be expressed as a system of equations: yT ijk = x ′ ijkβ + z (2)′ ijk ζ (2) jk + z (3)′ ijk ζ (3) k + ijk (2.7a) logit(htjk) = w ′ tjkα+ ζ (2)′ jk λ+ νk (2.7b) where ζ (2) jk in Equation 2.7b was a q (2) × 1 matrix of student-specific random effects estimated by the overall English proficiency growth model specified in Equation 2.7a, and λ was a q(2) × 1 vector of fixed effects corresponding to the latent covariates. Priors for Model 2.7a were the same as those specified for Model 2.1. Due to the complexity of the shared-parameter model however, priors for Model 2.7b, in particular α and λ, were specified to constrain the support of the parameters (Gelman, Jakulin, Pittau, & Su, 2008). Specifically, α and λ were specified with N (µ, σ) where −3 ≤ µ ≤ 3, and 6 ≤ σ ≤ 10, which, on the logit scale, provided ample space for the data to dominate the posterior estimates. The complete set of models specified for Research Question 3 and their formulation is provided in Appendix A. The predictive accuracy of the models can be understood through the cross-classification of (rtjk, rˆtjk). Here, rˆtjk = 1 when htjk ≥ pi where pi is some probability threshold between 0 and 1. This cross-tabulation results in four 32 conditions: (a) true positives, (b) false positives, (c) true negatives, and (d) false negatives as seen in Table 3. False positives, or type I errors, are the counts or proportions of those subjects who have not reclassified at time t but were predicted to reclassify based on the model prediction, (rtjk = 0, rˆtjk = 1). False negatives, or type II errors, are the counts or proportions of those students who do reclassify at time t but were not predicted to do so by the model, (rtjk = 1, rˆtjk = 0). The true positive rate (TPR) for the model, defined as P(rˆtjk = 1|rtjk = 1), gives the probability that the model correctly predicted reclassification at time t. The true negative rate (TNR) for the model, defined as P(rˆtjk = 0|rtjk = 0), gives the probability that the model correctly predicted those students who were not ready to reclassify at time t. The overall classification accuracy of the model is then given by the probability that the model made an accurate prediction, P(rˆtjk = 1|rtjk = 1)P(rtjk = 1) + P(rˆtjk = 0|rtjk = 0)P(rtjk = 0) . The receiver operating characteristic (ROC) curve was generated by computing the TPR and TNR for 0 ≤ pi ≤ 1 and provides complete information on the set of all possible TPR/TNR rates. The area under the ROC curve was used to compare the predictive power of the hazard models. 33 CHAPTER III RESULTS In what follows, the notation for all models fit to answer the three research questions were suppressed and are presented in Appendix A. All models were estimated using four chains run for 2500 iterations using the first 1250 iterations for warm-up in each chain, resulting in 5000 samples from the posterior distribution. Evidence of convergence included 0.99 < Rˆ < 1.01 as well as confirming evidence from visual inspection of trace plots. Appendix B provides the convergence evidence for the five total score models and readers interested in reviewing additional convergence evidence are encouraged to contact the author. English Language Proficiency Development A series of mixed effect models were fit to each of the AZELLA tests to understand how AZELLA English proficiency scores developed across Grades 3 to 7 in an Arizona school district. The same model building protocol was followed for the AZELLA total score and each domain score. The first model specified linear development across grades and allowed intercepts and slopes to vary between students. The second model added a quadratic term to the fixed part of the model to determine if development was non-linear across grades. After the curvature of development was determined, the third model allowed the intercepts and slopes to vary between students and schools. The fourth model added student covariates to the model. Finally, the fifth model altered the specification of the priors used in the fourth model to assess the sensitivity of posteriors to the choice of model priors. Total English proficiency. As noted above, five models were fit to identify the best model that described the functional form for overall English 34 proficiency development as measured by the AZELLA total score. Results for AZELLA total score growth models are shown in Table 4. Linear growth model. The first model included fixed and student- specific random effects for the intercept, Grade 3 AZELLA total score, and linear growth in AZELLA total score across grades. Based on the estimates from the linear growth model, the average third grade English learner had a score of 63.72 on the AZELLA total English proficiency test, β1 = 63.72, 95% HPD = [63.48, 63.97], and had an increase of 3 points per year, β2 = 3.00, 95% HPD = [2.78, 3.22]. Third grade test scores were found to vary between students by a standard deviation of 1.5 points, √ τ (2) 11 = 1.54, 95% HPD = [1.31, 1.77], while annual growth was found to vary between students by a standard deviation of 0.62 points, √ τ (2) 22 = 0.62, 95% HPD = [0.44, 0.83]. The correlation between initial status and linear growth on the AZELLA total English proficiency test was τ (2) 21 / √ τ (2) 11 τ (2) 22 = .75, 95% HPD = [.43, .97]. Finally, the estimated within- student residual √ σ was 1.48, resulting in an estimated proportion of total variance between-students of .41. The linear growth model provides a benchmark for comparing the model fit of subsequent models. The elpdpsis-loo was -1261.22 with a standard error of 7.40, which on the deviance scale translated to a LOOIC of 2522.44 with standard error 14.80. The estimated effective number of parameters, ploo, was 42.06 with a standard error of 2.02. Quadratic growth model. The second model added a quadratic term to describe potential non-linearity in average AZELLA total score growth. In the quadratic growth model, the average student was estimated with a score of 63.81 in Grade 3 and had an initial linear growth of 2.44 points and an initial 35 quadratic acceleration of 0.32 points , β3 = 0.32, 95% HPD = [0.17, 0.46]. This model included a quadratic term in the fixed part of the model only as there was insufficient within-student information to estimate variability in the curvature. The quadratic model was a better fitting model than the linear model based on two criteria. First, given the data, the 95% HPD did not contain zero, rather, there was a 95% chance that the parameter for the quadratic term was between 0.17 and 0.46. Second, the LOOIC for the quadratic growth model was less than the LOOIC for linear growth model, ∆LOOIC = −30.79, SE = 4.2, suggesting that the quadratic growth model would produce better out-of-sample predictions than the linear growth model. School-level model. The third model extended the quadratic growth model by further partitioning the initial status and linear growth variance into between-student and between-school components. With the school-level model, the estimates for the fixed effects were unchanged but their interpretation evolved. The average student in the average school was estimated to have an AZELLA total score of 63.88 in Grade 3 and an initial linear change of 2.37 points and an initial quadratic acceleration of 0.36 points. In addition to intercept and linear change varying between students, third grade test scores varied between schools by a standard deviation of 0.39 points,√ τ (3) 11 = 0.39, 95% HPD = [0.03, 0.88], while linear growth varied between schools by a standard deviation of 0.54 points, √ τ (3) 22 = 0.54, 95% HPD = [0.15, 1.01]. The correlation between school-specific intercepts and slopes was estimated to be 0, τ (3) 21 / √ τ (3) 11 τ (3) 22 = −.02, 95% HPD = [−0.77, 0.79]. The estimated proportion of total variance between-schools was .10. 36 Determining if the school-level model fit the data better than the quadratic growth model was less straight forward. First, the 95% HPD for the between-school intercept and slope variance components did not contain zero, which favored their inclusion in the model. However, the 95% HPD for the intercepts-slopes correlation did contain 0. Furthermore, the ∆LOOIC of 41.0, SE = 4.0, suggested that the quadratic growth model would make better out-of-sample predictions than the school-level model. For these data with a limited number of schools, one could argue for the more parsimonious quadratic growth model (Bates, Kliegl, Vasishth, & Baayen, 2015). However, by including the school-level variance components, the student-specific random effects, which are the primary research interest, were arguably better estimated as they did not contain school-level variance. Additionally, when using a Bayesian estimation approach, there is no harm done by fitting a model with a full variance-covariance matrix for both students and schools. School-level model with covariates. The fourth model extended the fixed part of the school-level model by adding two time-invariant dichotomous covariates, ever identified with a disability (SWDe) and sex, to the model. SWDe assigned the value of 0 to those English learners who were never identified with a disability up to Grade 7, and 1 to those English learners who had ever been identified with a disability up to Grade 7. Sex assigned the value of 0 to female English learners and 1 to male English learners. Controlling for all other covariates, English learners who were ever identified with a disability were expected to score 2.37 points lower in third grade than those English learners who were never identified with a disability, β4 = −2.37, 95% HPD = [−2.93, −1.80]. An interaction between SWDe and grade was also estimated, indicating that English learners who were ever identified 37 with a disability were expected to grow linearly by .45 points less than those English learners who were never identified with a disability, controlling for all other covariates, β5 = −0.45, 95% HPD = [−0.85, −0.05]. Finally, male English learners were estimated to score 1.23 points lower in third grade than their female peers, controlling for other covariates, β6 = −1.23, 95% HPD = [−1.68, −0.78]. Evaluation of the 95% HPD and the change in the LOOIC of −48.54, SE = 4.8, supported the school-level model with covariates over school-level model without. Although the inclusion of the grade by SWDe improved model fit, it is important to note that without probing the interaction between Grade and SWDe, this analysis cannot confirm that students identified with a disability grew at a different rate for the entirety of the study (Bauer & Curran, 2010; Stevens & Schulte, 2016). Figure 4a is a plot the estimates from the school-level model with covariates illustrating the average English proficiency development for female and male students who were ever or never identified with a disability against the grade- specific total English proficiency benchmarks. Additionally, the total scores from the 277 English learners were plotted. Females never identified with a disability (green solid line), on average, met the total English proficiency benchmark by fourth grade. Males never identified with a disability (orange solid line), on average, met the benchmark by fifth grade. Females ever identified with a disability (green dashed line) also, on average, met the benchmark by fifth grade. Lastly, males ever identified with a disability (orange dashed line), on average, met the benchmark by sixth grade. All female and male English learners never identified with a disability met the total English proficiency benchmark by sixth grade while some females and males ever identified with a disability had yet to meet the benchmark by the end of seventh grade. 38 Weakly informative prior model. The last model as shown in Table 4 was identical to the school-level model with covariates but specified weakly informative priors for the fixed effects. That is, under the school-level model with covariates, β4, β5, β6 were each specified with the non-informative prior U(−∞, ∞). The weakly informative prior model, however, employed the priors β4 ∼ N (0, 5), β5 ∼ N (0, 2.5), β6 ∼ N (0, 5). Application of this model, however, resulted in parameter estimates that were unchanged. Therefore, I concluded that the school- level model with covariates was not sensitive to the choice of reasonable priors. Domain-specific proficiency models. The same model fitting process described for the AZELLA total score was also used to analyze the AZELLA domain scores. For brevity, only the final models for each outcome were presented in Table 5. Results for the preceding steps in model fitting produced similar results as those found and just described for the AZELLA total scores. Complete results for AZELLA reading, writing, and oral proficiency growth models are shown in Tables C.1, C.2, and C.3 of Appendix C. Reading proficiency. The models fit to the AZELLA reading domain score found that, unlike the total score models, growth in reading was linear. That is, neither the 95% HPD for the quadratic effect of time nor the LOOIC of the quadratic growth model favored the additional fixed effect. For the school- level model, the 95% HPD for the between-school intercept and slope variances did not contain zero, nor did the values of the LOOIC suggest that either model with or without school-level variances predicted new data better, ∆LOOIC = −1.57, SE = 4.6. Like the total score models, the between-school intercept and slope correlation was imprecise and included zero in the 95% HPD , τ (3) 21 / √ τ (3) 11 τ (3) 22 = −.24, 95% HPD = [−0.93, 0.80]. 39 T ab le 4. P ar am et er es ti m at es fo r th e to ta l E n gl is h la n gu ag e pr ofi ci en cy gr ow th m od el s M o d el 1 M o d el 2 M o d el 3 M o d el 4 M o d el 5 M 9 5 % H P D M 9 5 % H P D M 9 5 % H P D M 9 5 % H P D M 9 5 % H P D F ix ed E ff ec ts β 1 [I n it ia l S ta tu s] 6 3 .7 2 [6 3 .4 8 , 6 3 .9 7 ] 6 3 .8 1 [6 3 .5 7 , 6 4 .0 5 ] 6 3 .8 8 [6 3 .5 6 , 6 4 .2 6 ] 6 4 .4 6 [6 4 .1 4 , 6 4 .8 0 ] 6 4 .4 6 [6 4 .1 2 , 6 4 .8 0 ] β 2 [G ra d e] 3 .0 0 [2 .7 8 , 3 .2 2 ] 2 .4 4 [2 .0 9 , 2 .7 9 ] 2 .3 7 [1 .9 1 , 2 .8 2 ] 2 .5 7 [2 .1 3 , 3 .0 0 ] 2 .5 7 [2 .1 4 , 2 .9 8 ] β 3 [G ra d e2 ] 0 .3 2 [0 .1 7 , 0 .4 6 ] 0 .3 6 [0 .2 1 , 0 .5 0 ] 0 .3 4 [0 .1 9 , 0 .4 8 ] 0 .3 4 [0 .1 9 , 0 .4 8 ] β 4 [S W D e] −2 .3 7 [− 2 .9 3 , −1 .8 0 ] −2 .3 7 [− 2 .9 2 , −1 .8 1 ] β 5 [G ra d e] × [S W D e] −0 .4 5 [− 0 .8 5 , −0 .0 5 ] −0 .4 5 [− 0 .8 6 , −0 .0 5 ] β 6 [M a le ] −1 .2 3 [− 1 .6 8 , −0 .7 8 ] −1 .2 4 [− 1 .6 9 , −0 .8 1 ] V a ri a n ce C o m p o n en ts √ τ( 2 ) 1 1 1 .5 4 [1 .3 1 , 1 .7 7 ] 1 .4 6 [1 .2 4 , 1 .6 9 ] 1 .4 5 [1 .2 3 , 1 .6 7 ] 1 .1 4 [0 .9 3 , 1 .3 6 ] 1 .1 4 [0 .9 2 , 1 .3 7 ] √ τ( 2 ) 2 2 0 .6 2 [0 .4 4 , 0 .8 3 ] 0 .8 9 [0 .6 5 , 1 .1 3 ] 0 .8 7 [0 .6 4 , 1 .1 2 ] 0 .6 6 [0 .4 3 , 0 .9 0 ] 0 .6 6 [0 .4 4 , 0 .8 9 ] τ (2 ) 2 1 / √ τ( 2 ) 1 1 τ (2 ) 2 2 0 .7 5 [0 .4 3 , 0 .9 7 ] 0 .7 7 [0 .4 9 , 0 .9 7 ] 0 .7 8 [0 .5 1 , 0 .9 7 ] 0 .6 7 [0 .2 8 , 0 .9 6 ] 0 .6 7 [0 .2 7 , 0 .9 6 ] √ τ( 3 ) 1 1 0 .3 9 [0 .0 3 , 0 .8 8 ] 0 .2 4 [0 .0 1 , 0 .6 0 ] 0 .2 3 [0 .0 1 , 0 .6 0 ] √ τ( 3 ) 2 2 0 .5 4 [0 .1 5 , 1 .0 1 ] 0 .4 5 [0 .1 1 , 0 .8 7 ] 0 .4 5 [0 .1 3 , 0 .8 7 ] τ (3 ) 2 1 / √ τ( 3 ) 1 1 τ (3 ) 2 2 0 .4 9 [− 0 .7 7 , 0 .7 9 ] 0 .5 2 [− 0 .8 3 , 0 .8 6 ] 0 .5 1 [− 0 .8 1 , 0 .8 5 ] √ σ 1 .4 8 [1 .3 7 , 1 .6 0 ] 1 .4 0 [1 .2 9 , 1 .5 2 ] 1 .3 7 [1 .2 6 , 1 .4 9 ] 1 .3 8 [1 .2 6 , 1 .5 0 ] 1 .3 8 [1 .2 7 , 1 .5 0 ] M o d el F it E st im a te (S E ) E st im a te (S E ) E st im a te (S E ) E st im a te (S E ) E st im a te (S E ) el p d p si s- lo o −1 2 6 1 .2 2 (7 .4 0 ) −1 2 4 5 .8 2 (7 .0 5 ) −1 2 6 6 .3 1 (6 .5 5 ) −1 2 4 2 .0 4 (7 .4 7 ) −1 2 4 1 .5 7 (7 .4 8 ) p lo o 4 2 .0 6 (2 .0 2 ) 3 7 .5 2 (1 .9 2 ) 2 8 .2 4 (1 .4 3 ) 3 4 .5 6 (1 .8 3 ) 3 4 .5 7 (1 .8 4 ) L O O IC 2 5 2 2 .4 4 (1 4 .8 0 ) 2 4 9 1 .6 5 (1 4 .1 0 ) 2 5 3 2 .6 2 (1 3 .1 0 ) 2 4 8 4 .0 8 (1 4 .9 4 ) 2 4 8 3 .1 3 (1 4 .9 6 ) 40 Table 5. Parameter estimates for final reading, writing, and oral English language proficiency growth models Reading 4 Writing 4 Oral 4 M 95% HPD M 95% HPD M 95% HPD Fixed Effects β1 [Initial Status] 63.98 [63.50, 64.48] 65.35 [64.85, 65.90] 64.53 [63.89, 65.15] β2 [Grade] 3.36 [2.89, 3.82] 3.06 [2.42, 3.68] 2.16 [1.51, 2.79] β3 [Grade 2] 0.26 [0.05, 0.46] 0.57 [0.31, 0.83] β4 [SWDe] −3.47 [−4.33, −2.63] −3.73 [−4.58, −2.89] −1.41 [−2.18, −0.68] β5 [Grade] × [SWDe] −0.18 [−0.75, 0.40] −1.27 [−1.86, −0.67] −0.08 [−0.63, 0.46] β6 [Male] −0.64 [−1.17, −0.10] −1.18 [−1.82, −0.53] −1.65 [−2.49, −0.85] Variance Components√ τ (2) 11 1.69 [1.35, 2.06] 1.82 [1.42, 2.20] 0.88 [0.51, 1.25]√ τ (2) 22 0.64 [0.23, 1.04] 1.05 [0.69, 1.40] 1.37 [0.90, 1.86] τ (2) 21 / √ τ (2) 11 τ (2) 22 0.28 [−0.25, 0.90] 0.04 [−0.33, 0.53] 0.80 [0.42, 0.98]√ τ (3) 11 0.29 [0.01, 0.81] 0.41 [0.04, 0.93] 0.91 [0.45, 1.62]√ τ (3) 22 0.55 [0.14, 1.08] 0.65 [0.27, 1.16] 0.50 [0.03, 1.25] τ (3) 21 / √ τ (3) 11 τ (3) 22 0.39 [−0.93, 0.77] 0.28 [−0.96, 0.51] 0.44 [−0.81, 0.75] √ σ 2.24 [2.05, 2.42] 2.15 [1.96, 2.36] 2.33 [2.15, 2.54] Predictive Accuracy Estimate (SE) Estimate (SE) Estimate (SE) elpdpsis-loo −1513.84 (11.00) −1500.41 (9.32) −1528.71 (11.13) ploo 64.91 (3.82) 62.21 (3.50) 47.42 (3.61) LOOIC 3027.67 (22.00) 3000.82 (18.63) 3057.42 (22.26) 41 55 60 65 70 75 80 85 AZ EL LA 3rd 4th 5th 6th 7th l l Female, Never SWD Male, Never SWD Female, Ever SWD Male, Ever SWD Benchmark (a) Total 55 60 65 70 75 80 85 3rd 4th 5th 6th 7th (b) Reading 55 60 65 70 75 80 85 Grade AZ EL LA 3rd 4th 5th 6th 7th (c) Writing 55 60 65 70 75 80 85 Grade 3rd 4th 5th 6th 7th (d) Oral (Listening + Speaking) Figure 4. Fitted prototypical trajectories by subgroup for the total AZELLA score and each sub-test. The solid black lines represent the test-specific proficiency benchmarks. The points represent the data used to fit the models. 42 The school-level model with covariates showed students who were ever identified with a disability were expected to score 3.5 points lower on the Grade 3 AZELLA reading test than students never identified with a disability, β4 = −3.47, 95% HPD = [−4.33, −2.63]. There, was little evidence, given the data, that students who were ever identified with a disability grew at a different rate than those never identified with a disability, β5 = −0.18, 95% HPD = [−0.75, 0.40]. Finally, males were estimated to score 0.6 points lower on the Grade 3 reading sub-test than females, controlling for other covariates, β6 = −0.64, 95% HPD = [−1.17, −0.10]. Figure 4b is a plot of the estimates from the school-level model with covariates illustrating the average reading proficiency development for female and male students ever or never identified with a disability against the grade- specific reading proficiency benchmarks. Additionally, the reading domain scores from the 277 English learners were plotted. Females and males never identified with a disability (green solid line, orange solid line), on average, met the reading proficiency benchmark by fifth grade. Females and males ever identified with a disability (green dashed line and orange dashed line), on average, met the benchmark by sixth grade. Writing proficiency. Like the total score models, the models fit to the AZELLA writing domain scores supported a quadratic term to describe the non- linearity in annual growth. Support for the inclusion of between-school variance components was similar to that of the total score model as well. The school-level model with covariates estimated that students who were ever identified with a disability were expected to score 3.5 points lower on the Grade 3 AZELLA writing sub-test, β4 = −3.73, 95% HPD = [−4.58, −2.89], and had an initial linear growth 43 of 1.3 points less than students never identified with a disability, controlling for other covariates, β5 = −1.27, 95% HPD = [−1.86, −0.67]. Finally, males were estimated to score 1.2 points lower than females on the Grade 3 AZELLA writing sub-test, controlling for other covariates, β6 = −1.18, 95% HPD = [−1.82, −0.53]. Figure 4c is a plot of the estimates from the school-level model with covariates illustrating the average writing proficiency development for female and male student ever or never identified with a disability against the grade-specific writing proficiency benchmarks. Additionally, the writing domain scores from the 277 English learners were plotted. Females and males never identified with a disability (green solid line and orange solid line), on average, met the total English proficiency benchmark by fourth grade. Females and males ever identified with a disability (green dashed line add orange dashed line), on average, met the benchmark by sixth grade. Oral proficiency. The last outcome analyzed was the AZELLA oral English proficiency sub-test, which combined speaking and listening domains. A quadratic growth model provided better fit than a linear growth model as demonstrated by both the 95% HPDs for β3, [0.22, 0.74], and LOOIC comparison, ∆LOOIC = −68.65, SE = 7.6. Likewise, 95% HPD and the LOOIC comparison between the quadratic growth model and the school-level model supported the inclusion of between-school intercept and slope variance components, ∆LOOIC = −14.9SE = 7.4. In the school-level model with covariates, students who were ever identified with a disability scored 1.4 points lower on the Grade 3 AZELLA writing test than students never diagnosed with a disability, controlling for other covariates, β4 = −1.41, 95% HPD = [−2.18, −0.68]. Further, males scored 44 1.7 points lower on the Grade 3 oral sub-test than females, controlling for other variables, β6 = −1.65, 95% HPD = [−2.49, −0.85]. Figure 4d is a plot of the estimates from the school-level model with covariates illustrating the average oral proficiency development for female and male student ever or never identified with a disability against the grade-specific oral proficiency benchmarks. Additionally, the oral sub-test scores from the 277 English learners were plotted. Females ever and never identified with a disability and males never identified with a disability (green solid line, green dashed line, and orange solid line), on average, met the total English proficiency benchmark by fourth grade. Males ever identified with a disability (orange dashed line), on average, met the benchmark by fifth grade. Correlation Among Domain-Specific English Proficiency Development Three multivariate mixed effect models were fit to the AZELLA reading, writing and oral domain scores. The first model combined the three school-level models described above, the second model combined the three school-level models with covariates described above, and the third model combined the three weakly informative prior models described above. The fixed effect estimates for the multivariate models mirrored the estimates from the univariate analyses so they will not be reinterpreted here. Results may be found in Table C.4 in Appendix C. To address Research Question 2, I focused on the variation and covariation among student-specific random effects from the multivariate school-level model with covariates. To facilitate interpretation, the six parameters that estimated between- student variation in each intercept and slope were presented as standard deviations, and the 15 parameters that estimated the covariation between those intercepts and slopes were presented as correlations (see Table 6). 45 Table 6. Estimates of student-specific variances (as standard deviations) and correlations of the random intercepts and slopes for the final multivariate mixed effect model for reading, writing, and oral English proficiency growth 1. ζ (2) R1i 2. ζ (2) R2i 3. ζ (2) W1i 4. ζ (2) W2i 5. ζ (2) O1i 6. ζ (2) O2i 1. ζ (2) R1i [Reading Intercept] 1.77 (0.17) 2. ζ (2) R2i [Reading Slope] 0.01 0.79 (0.20) (0.18) 3. ζ (2) W1i [Writing Intercept] 0.82 0.04 1.87 (0.07) (0.19) (0.17) 4. ζ (2) W2i [Writing Slope] 0.06 0.30 −0.04 1.13 (0.17) (0.20) (0.18) (0.18) 5. ζ (2) O1i [Oral Intercept] 0.54 0.34 0.66 0.00 0.99 (0.15) (0.22) (0.13) (0.21) (0.17) 6. ζ (2) O2i [Oral Slope] 0.34 0.48 0.41 −0.12 0.63 1.32 (0.14) (0.18) (0.13) (0.18) (0.16) (0.24) N ote. The numbers are the posterior means and standard deviations (in parentheses). The standard deviations for the six intercepts and slopes were similar to those of their univariate counterparts. None of the 95% HPDs for the six variances contained 0. Like the oral English proficiency univariate model, variation in oral proficiency growth was estimated to be greater than the variation in third grade oral proficiency, √ τ (2) 55 = 0.99, 95% HPD = [0.66, 1.33] and √ τ (2) 66 = 1.32, 95% HPD = [0.87, 1.80], respectively. The means of all 15 student-specific correlation posteriors were greater than 0. Only seven of the 15, however, were estimated with a 95% HPD that did not include zero. The correlations between intercepts and slopes for the same outcome also mirrored the findings from the univariate models. The correlation between oral proficiency intercepts and oral proficiency slopes was the only one of the three to have a 95% HPD that did not include zero, τ (2) 65 / √ τ (2) 55 τ (2) 66 = .70, 95% HPD = [.40, .91]. 46 All three initial status correlation parameters were positively correlated above .5. The strongest correlation was between third grade reading proficiency and third grade writing proficiency, τ (2) 31 / √ τ (2) 11 τ (2) 33 = .82, 95% HPD = [.66, .94], suggesting students with a high third grade reading score would also have a high third grade writing score. The next strongest correlation was between third grade writing proficiency and third grade oral proficiency, τ (2) 53 / √ τ (2) 33 τ (2) 55 = .66, 95% HPD = [.39, .87]. While still greater than .5, the correlation between the third grade reading proficiency and third grade oral proficiency was the weakest among the three, τ (2) 51 / √ τ (2) 11 τ (2) 55 = .54, 95% HPD = [.24, .80]. The only correlation among the three linear growth random effects that was estimated with a 95% HPD that did not cross zero was between reading and oral proficiency, τ (2) 62 / √ τ (2) 22 τ (2) 66 = .48, 95% HPD = [.08, .80], indicating that students who made above average growth in reading were also likely to make above average growth in listening and speaking development. There was no correlation between writing growth and oral proficiency growth, τ (2) 64 / √ τ (2) 44 τ (2) 66 = −.12, 95% HPD = [−.46, .26], or between reading growth and writing growth, τ (2) 42 / √ τ (2) 22 τ (2) 44 = .30, 95% HPD = [−.11, .66]. Figure 9 is a plot of the fitted subject-specific developmental trajectories (mean and 95% HPD), observed scores, and proficiency benchmarks for total English language proficiency and domain-specific proficiencies for a random sample of students. The first row of figures (Figures 5a to 5d) use the parameter estimates from Model 4 of Table 4 to illustrate total English language proficiency development for each student while Figures 5e, to 5p plot the subject-specific developmental trajectories for the three domains as estimated by Model 3 of Table C.4 for the same students. Focusing on Student 88, Figure 5a shows that 47 55606570758085 3r d 4t h 5t h 6t h 7t h (a ) T ot al S co re , ID = 88 55606570758085 3r d 4t h 5t h 6t h 7t h (b ) T o ta l S co re , ID = 9 6 55606570758085 3r d 4t h 5t h 6t h 7t h (c ) T o ta l S co re , ID = 2 4 4 55606570758085 3r d 4t h 5t h 6t h 7t h (d ) T o ta l S co re , ID = 3 4 55606570758085 3r d 4t h 5t h 6t h 7t h (e ) R ea d in g S co re , ID = 88 55606570758085 3r d 4t h 5t h 6t h 7t h (f ) R ea d in g S co re , ID = 9 6 55606570758085 3r d 4t h 5t h 6t h 7t h (g ) R ea d in g S co re , ID = 2 4 4 55606570758085 3r d 4t h 5t h 6t h 7t h (h ) R ea d in g S co re , ID = 3 4 48 55606570758085 3r d 4t h 5t h 6t h 7t h (i ) W ri ti n g S co re , ID = 88 55606570758085 3r d 4t h 5t h 6t h 7t h (j ) W ri ti n g S co re , ID = 9 6 55606570758085 3r d 4t h 5t h 6t h 7t h (k ) W ri ti n g S co re , ID = 2 4 4 55606570758085 3r d 4t h 5t h 6t h 7t h (l ) W ri ti n g S co re , ID = 3 4 55606570758085 3r d 4t h 5t h 6t h 7t h (m ) O ra l S co re , ID = 88 55606570758085 3r d 4t h 5t h 6t h 7t h (n ) O ra l S co re , ID = 9 6 55606570758085 3r d 4t h 5t h 6t h 7t h (o ) O ra l S co re , ID = 2 4 4 55606570758085 3r d 4t h 5t h 6t h 7t h (p ) O ra l S co re , ID = 3 4 F ig u re 5. F it te d A Z E L L A tr a je ct or ie s, 95 % H P D s, an d ob se rv ed sc or es fo r a ra n d om sa m p le of st u d en ts . T h e A Z E L L A to ta l sc or e fi tt ed tr a je ct or ie s ar e b as ed on M o d el 4 of T ab le 4 an d A Z E L L A su b -t es t fi tt ed tr a je ct or ie s ar e b as ed on M o d el 3 of T ab le C .4 . 49 the student met the benchmark in grade 6 which corresponded to when the model predicted Student 88 would meet the benchmark. Figure 5e shows the student met (and was predicted to meet) the reading benchmark in Grade 6. Figure 5i shows the student met (and was predicted to meet) the benchmark for writing in Grade 5. Finally, Figure 5m shows the student did not meet the benchmark for oral proficiency while classified as an English learner but the model had predicted they had. Predicting Time-to-Reclassification A series of discrete-time hazard models and shared random effects models were fit to understand whether initial status and growth in total English language proficiency improved predictions of time to reclassification. The fitted models were then compared based on their ability to accurately predict the time for a student to reclassify. For brevity, only the final hazard models and two hazard submodels from the shared random effects models were presented in Table 7. Interested readers may refer to Appendix C, Tables C.5 and C.6 for the complete results of the hazard models and shared random effects models, respectively. Hazard models. Hazard model with manifest covariates. The first model is a conventional hazard model with manifest covariates, which described the logit of the hazard of reclassification for Grades 3 through 7 conditioning on the two time-invariant dichotomous manifest covariates, SWDe and sex, and the school- level random intercept. A female English learner who was never identified with a disability had a .002 probability of reclassifying in Grade 3, α1 = −3.70, 95% HPD = [−4.70, −2.83]. The probability increased to 0.6 in Grades 4 and 5, α2 = 0.46, 95% HPD = [0.02, 0.92]; , α3 = 0.44, 95% HPD = [−0.14, 1.03]. By 50 T ab le 7. P ar am et er es ti m at es fo r th e fi n al ti m e- to -r ec la ss ifi ca ti on ha za rd m od el an d th e sh ar ed ra n do m eff ec ts ha za rd su bm od el s M o d el 1 M o d el 2 M o d el 3 M 95 % H P D M 95 % H P D M 95 % H P D α 1 [G ra d e 3] −3 .7 0 [− 4. 70 , −2 .8 3] −1 0. 59 [− 13 .1 4, −8 .3 5 ] −8 .6 6 [− 1 0. 95 , −6 .6 5 ] α 2 [G ra d e 4] 0 .4 6 [0 .0 2, 0. 92 ] −0 .1 8 [− 1 .0 9, 0 .7 6 ] 1 .3 1 [0 .2 3, 2 .4 7] α 3 [G ra d e 5 ] 0 .4 4 [− 0. 14 , 1. 03 ] 2 .9 7 [1 .8 5, 4 .2 7 ] 4 .6 7 [3 .2 1, 6 .3 2] α 4 [G ra d e 6 ] 3 .1 1 [2 .1 7, 4. 16 ] 7 .8 9 [5 .9 3, 10 .0 8] 1 0 .5 2 [8 .1 5, 1 3. 19 ] α 5 [G ra d e 7] 2 .6 6 [1 .0 3, 4. 28 ] 8 .8 0 [5 .8 5, 11 .8 9 ] 12 .1 9 [8 .8 7, 15 .7 2] α 6 [S W D e] −2 .8 2 [− 3. 65 , −2 .1 1] −8 .3 5 [− 10 .9 6, −6 .1 5 ] α 7 [M al e] −0 .2 7 [− 0. 71 , 0. 15 ] −0 .8 3 [− 2 .0 3, 0. 3 2] λ 1 [I n it ia l S ta tu s, ζ (2 ) 1 i ] 2 .1 3 [1 .0 0, 3 .3 7] 2 .1 4 [1 .0 5, 3 .3 2 ] λ 2 [L in ea r G ro w th , ζ (2 ) 2 i ] 3 .0 8 [1 .5 6, 4 .7 3] 3 .0 1 [1 .4 9, 4 .8 6 ] √ ν 0 .5 7 [0 .2 5, 1. 00 ] 0 .5 4 [0 .0 4, 1 .4 7] 0 .7 3 [0 .0 8, 1 .7 7 ] M o d el F it E st im at e (S E ) E st im at e (S E ) E st im at e (S E ) el p d p si s- lo o −2 90 .0 7 (1 4. 44 ) −1 67 2 .8 9 (3 2. 48 ) −1 63 7. 0 2 (3 1 .4 8 ) p lo o 1 6. 9 2 (1 .2 6) 13 6 .4 8 (7 .5 7) 11 4 .8 7 (5 .6 8) L O O IC 5 80 .1 5 (2 8. 87 ) 33 45 .7 8 (6 4. 96 ) 3 27 4 .0 3 (6 2. 9 6) N o te . M o d el s 2 a n d 3 a re th e h a za rd su b m o d el s fr o m M o d el s 2 a n d 3 o f T a b el C .6 . 51 middle school (Grades 6 and 7), the probability of reclassification was greater than .9, α4 = 3.11, 95% HPD = [2.17, 4.16]; α5 = 2.66, 95% HPD = [1.03, 4.28]. The mean of the posterior for SWDe was -2.82 logits, α6 = −2.82, 95% HPD = [−3.65 − 2.11]. Expressed as an odds ratio, the odds of reclassification at any grade for an English learner never identified with a disability was 16.72 times that of an English learner ever identified with a disability. The mean of the posterior for sex was -0.27 logits, α7 = −0.27, 95% HPD = [−0.71, 0.15]. The odds of reclassification at any grade for a female English learner was 1.31 times greater than for a male English learner. While the 95% HPD for sex included zero, the inclusion of both predictors improved the estimated out-of-sample predictive ability of the model, ∆LOOIC = −80.3, SE = 17.2. Using a latent response interpretation, the correlation among the latent responses of any two students from the same school was only ρ = ν ν + pi2/3 = 0.572 0.572 + pi2/3 = .09 . Figure 6 is a plot of the estimates from the hazard model with manifest covariates, illustrating the school-specific probabilities of reclassification at each grade for female and male English learners who were ever or never identified with a disability. The primary feature of the plot is the difference in the probability of reclassification between Grades 3 and 7 for English learners who were ever and never identified with a disability, regardless of sex. The probability of reclassification for those never identified with a disability between third grade and seventh grade, regardless of sex, appeared as a step function. At the end of third grade, English learners never identified with a disability had a near 0% chance of reclassification. The probability of reclassification increased to approximately 60% at the end of fourth grade, remained steady through fifth grade, increased again 52 0. 0 0. 2 0. 4 0. 6 0. 8 1. 0 Grade Pr ob ab ilit y of re cla ss ific at io n 3rd 4th 5th 6th 7th Female, Never SWD Male, Never SWD Female, Ever SWD Male, Ever SWD Figure 6. Probability of reclassification by subgroup for the hazard model with manifest covariates 53 to approximately 90% at the end of sixth grade, and remained steady through seventh grade. The probability of reclassification for English learners ever identified with a disability, regardless of sex, was nearly flat between third grade and fifth grade, starting at nearly 0% and never increasing above 10%. The probability of reclassification increased to approximately 50% at the end of sixth grade, and dropped to approximately 40% at the end of seventh grade. The fitted model parameters were used to predict whether or not reclassification would occur for observation i at time t. Using pi = .5 as a probability threshold, the hazard model with manifest covariates was 80.94% accurate when predicting if reclassification does or does not occur for the 720 observations across 277 English learners. The TPR was .76 indicating that 76% of the English learners predicted to reclassify at the end of a given grade by the hazard model with manifest covariates actually reclassified at the end of that grade. The TNR was .83 indicating that 83% of the students predicted by the model to not reclassify at the end of a given grade actually did not reclassify at the end of that grade. Figure 7 is a plot of the ROC curve for the hazard model with covariates (green solid line). The ROC curve illustrates the trade-off between true positive rate and true negative rate (1−TNR) as the probability threshold, pi, increased from 0 to 1. In the context of model comparison, the ROC curve for the hazard model with covariates provided a baseline to compare the predictive power of the hazard submodels from the shared random effects models. Shared random effects models. Both shared random effects models presented below employed the school-level model with covariates to describe AZELLA total score development. The student-specific random effects for initial status, ζ (2) 1i , and linear growth, ζ (2) 2i , were then shared with the discrete-time hazard 54 0.0 0.2 0.4 0.6 0.8 1.0 0. 0 0. 2 0. 4 0. 6 0. 8 1. 0 1 − True Negative Rate Tr u e P o si tiv e R at e Hazard model with manifest covariates Hazard submodel with latent covariates Hazard submodel with latent and manifest covariates Figure 7. ROC curves for the time-to-reclassification models 55 model for reclassification. That is, the student-specific random effects are used as latent covariates to predict time-to-reclassification. Because the primary interest was in the prediction of time to reclassification, the following results will be limited to the hazard submodels. Hazard submodel with latent covariates. The hazard submodel with latent covariates shown in Table 7 estimated the probability of reclassification conditional on a students deviation from the average third grade AZELLA total score, and their deviation from the average linear growth. A student with an average third grade AZELLA total score who made average growth had a less than .001 probability of reclassifying in Grade 3, α1 = −10.18, 95% HPD = [−13.14, −8.35]. The probability increased to .45 in Grade 4, α2 = −0.18, 95% HPD = [−1.09, −0.76]. Between Grades 5 and 7, the probability of reclassification for an English learner with an average third grade AZELLA total score who grew at an average rate was greater than .95, α3 = 2.97, 95% HPD = [1.85, 4.27]; α4 = 7.89 95% HPD = [5.93, 10.08]; α5 = 8.80, 95% HPD = [5.85, 11.89]. Every point a student earned above the average in initial status resulted in an increased likelihood of reclassification of 2.13 logits, λ1 = 2.13, 95% HPD = [1.00, 3.37]. That is, an English learner whose initial status random effect, ζ (2) 1ki, was estimated to be one —resulting in a predicted third grade AZELLA total score one point higher than average —was 8.13 times more likely to reclassify than a student with average third grade AZELLA total score. Every point above average in linear growth resulted in an increased likelihood of reclassification of 3 logits, λ2 = 3.08, 95% HPD = [1.56, 4.73]. That is, an English learner whose linear growth random 56 effect ζ (2) 2ki was estimated to be one was 21.73 times more likely to reclassify than a student who was estimated to grow at an average rate. The shared random effects model is illustrated using the submodel estimates to plot both the subject-specific total score growth trajectory and the subject- specific probability of reclassification for a single individual selected at random from the dataset. Figures 8a to 8d illustrate the average developmental trajectory (dashed line) and the subject-specific developmental trajectory (solid line) for total English proficiency for a sample of students in the dataset. Figures 8e to 8h illustrate the corresponding plot of the probability of reclassification at each grade for those students. When the growth model predicted the student to meet or exceed the benchmark at a give grade, the probability of reclassification exceeded .5. Figures 8b and 8f illustrate that as Student 96’s predicted score approached the benchmark, their probability of reclassification increased. Figures 8c and 8g illustrate the situation where the growth model incorrectly predicted Student 244 to reach the benchmark in grade 5, and thus estimated their probability of reclassification at the end of Grade 5 to be greater than .6. Using pi = .5, the hazard submodel with latent covariates predicted the time-to-reclassification with 97.22% accuracy. Both the TPR and TNR were also .97 indicating that the model accurately predicted when 97% of the student would reclassify and accurately predicted when 97% of the students would not reclassify. The predictive power of the hazard submodel with latent covariates can be compared to the hazard model with manifest covariates across the range of pi by assessing the ROC curves in Figure 7. The overall predictive power of the hazard submodel with latent covariates was greater than the hazard model with manifest 57 55606570758085 3r d 4t h 5t h 6t h 7t h (a ) T ot al S co re G ro w th , ID = 88 55606570758085 3r d 4t h 5t h 6t h 7t h (b ) T ot al S co re G ro w th , ID = 9 6 55606570758085 3r d 4t h 5t h 6t h 7t h (c ) T o ta l S co re G ro w th , ID = 2 4 4 55606570758085 3r d 4t h 5t h 6t h 7t h (d ) T o ta l S co re G ro w th , ID = 3 4 0.00.20.40.60.81.0 3r d 4t h 5t h 6t h 7t h (e ) P( R ec la ss ifi ca ti on ), ID = 88 0.00.20.40.60.81.0 3r d 4t h 5t h 6t h 7t h (f ) P( R ec la ss ifi ca ti o n ), ID = 9 6 0.00.20.40.60.81.0 3r d 4t h 5t h 6t h 7t h (g ) P( R ec la ss ifi ca ti o n ), ID = 2 4 4 0.00.20.40.60.81.0 3r d 4t h 5t h 6t h 7t h (h ) P( R ec la ss ifi ca ti o n ), ID = 3 4 F ig u re 8. T op ro w : M ar gi n al (d as h ed ) an d su b je ct -s p ec ifi c (s ol id ) A Z E L L A to ta l E n gl is h p ro fi ci en cy tr a je ct or ie s fo r a ra n d om sa m p le of st u d en ts . B ot to m ro w : P ro b ab il it y of re cl as si fi ca ti on fo r th e sa m e ra n d om sa m p le of st u d en ts . 58 covariates as indicated by the area under its ROC curve (orange dashed line). The more area under the ROC curve, the better the model’s predictive power. Hazard submodel with latent and manifest covariates. The final hazard model shown in Table 7 added the two time-invariant dichotomous manifest covariates, SWDe and sex, to the hazard submodel with latent covariates. A female English learner who was never identified with a disability who had an average third grade AZELLA total score and who made average growth had a less than a .001 probability of reclassifying in Grade 3, α1 = −8.66, 95% HPD = [10.95, −6.65]. The probability increased to .79 in Grade 4, α2 = 1.31, 95% HPD = [0.23, 2.47]. Between Grades 5 and 7, the probability of reclassification for a female English learner with an average third grade AZELLA total score who made average growth was greater than .99, α3 = 4.67, 95% HPD = [3.21, 6.32]; α4 = 10.52, 95% HPD = [8.15, 13.19]; α5 = 12.19, 95% HPD = [8.87, 15.72]. The mean of the posterior for SWDe was -8.35 logits, α6 = −8.35, 95% HPD = [21.04, −13.50]. Controlling for other covariates, the odds of reclassification at any grade for an English learner ever identified with a disability were 0.0002 times that of an English learner never identified with a disability. The mean of the posterior for sex was not different from zero, α7 = −0.83, 95% HPD = [11.80, −7.38]. Controlling for other covariates, every point above the average in initial status resulted in an increased likelihood of reclassification was 2.14 logits, λ1 = 2.14, 95% HPD = [1.05, 3.32]. That is, an English learner whose estimated initial status random effect, ζ (2) 1ki, was one —resulting in a predicted third grade AZELLA total score one point higher than average —was 8.5 times more likely to reclassify. Controlling for other covariates, every point above average in linear 59 growth resulted in an increased likelihood of reclassification of 3 logits, λ2 = 3.01, 95% HPD = [1.49, 4.86]. That is, an English learner who was estimated to have a linear growth random effect ζ (2) 2ki of one, was 20.33 times more likely to reclassify than a student who was estimated to grow at an average rate, controlling for other covariates. The correlation among the latent responses of any two students from the same school increased from the hazard model with manifest covariates to ρ = ν ν + pi2/3 = 0.732 0.732 + pi2/3 = .14. Figure 9 is a plot of the school-specific probabilities of reclassification at each grade for English learners ever or never identified with a disability who scored one standard deviation above and below average on the third grade AZELLA and who made one standard deviation above and below average linear growth. English learners who were never identified with a disability and who scored one standard deviation above average on the third grade AZELLA and whose linear growth component was one standard deviation above average growth (green solid line), on average, had a 99% change of being reclassified starting in Grade 4. English learners ever identified with a disability who scored one standard deviation above average on the third grade AZELLA and whose linear growth component was one standard deviation above average growth (green dashed line), on average, had a 30% chance of being reclassified in Grade 4, a 90% chance of being reclassified in Grade 5, and a 99% chance of being reclassified in Grades 6 and 7. English learners never identified with a disability who scored one standard deviation below average on the third grade AZELLA and whose linear growth component was one standard deviation below average growth (orange solid line), on average, had less than a 20% chance of being reclassified between Grades 3 and 5 and a 99% chance of being reclassified in Grades 6 and 7. Finally, English learners ever identified 60 0. 0 0. 2 0. 4 0. 6 0. 8 1. 0 Grade Pr ob ab ilit y of re cla ss ific at io n 3rd 4th 5th 6th 7th Never SWD, +1 SD Int & Slp Ever SWD, +1 SD Int & Slp Never SWD, −1 SD Int & Slp Ever SWD, −1 SD Int & Slp Figure 9. Probability of reclassification by subgroup for the hazard submodel with latent and manifest covariates 61 with a disability who scored one standard deviation below average on the third grade AZELLA and whose linear growth component was one standard deviation below average growth (orange dashed line), on average, never had more than a 15% chance of being reclassified by Grade 7. Using the parameters to predict time-to-reclassification, the hazard submodel with latent and manifest covariates predicted the time to reclassification for the students in the sample with 98.47% accuracy. The TPR was .99 and the TNR was .98. While the ROC curve for the hazard submodel with latent and manifest covariates shown in Figure 7 strongly mirrors the ROC curve of the hazard model with latent covariates, it captures slightly more area indicating it is a better predictive model. 62 CHAPTER IV DISCUSSION The primary purpose of this dissertation was to propose a series of statistical models for studying two measures of English learner proficiency development. First, a multilevel, multivariate mixed model was developed to illustrate how each domain of academic English proficiency developed over time, and how development between language domains were correlated. Second, a multilevel shared random effects model was developed to illustrate how heterogeneity in academic English proficiency development contributed to prediction of reclassification from English learner to reclassified fluent English proficient. Secondary purposes of the dissertation included describing how academic English proficiency developed between Grade 3 through Grade 7 for a single cohort of English learners in an Arizona school district, how English proficiency development differed based on ever or never being identified with a disability and sex, as well as characterizing the degree to which academic English proficiency development, disability status, and sex predicted time to reclassification for those students. This research contributed to the English learner literature in several ways. First, only a few studies have focused on the functional form of academic English development (e.g., Cook et al., 2008; Slama, 2012). Of these existing studies, none, to my knowledge, examined the association between language domains over time. By modeling academic English proficiency development at the domain level as a multivariate process, a more nuanced understanding of the developmental process than using total scores alone was provided. 63 Second, prior research on time-to-reclassification has studied the policy implementation without controlling for English proficiency development (Slama, 2014; Thompson, 2015; Umansky & Reardon, 2014), even though reclassification decisions are known to be based largely on measures of one’s English proficiency (Ramsey & O’Day, 2010). Explicitly incorporating English proficiency into a statistical model, however, requires a model specification that incorporates its endogenous properties (Kalbfleisch & Prentice, 2011). This study did so by jointly modeling the longitudinal English proficiency development process and reclassification process simultaneously. The joint modeling solution acknowledged that the initial status and development of English proficiency varied across students, and the variation in that process was associated with time to reclassification. Substantive Findings Describing English proficiency development. The results of the univariate analyses suggested that, on average, this sample of English learners saw their overall academic English proficiency accelerate between third grade and seventh grade. Univariate and multivariate analyses of the subtest scores indicated that this quadratic functional form was not the same across language domains, however. Although average growth for the writing and oral subdomains accelerated over time, a linear model was found to fit best for reading domain development. Furthermore, the quadratic component for oral English development was more than twice as large as the quadratic component for writing development, suggesting that, on average, English learners would meet the oral English proficiency benchmark before meeting the writing benchmark. 64 Both the univariate analysis and multivariate analysis of sub-test scores provided evidence that ever being diagnosed with a disability was negatively associated with third grade reading, writing, and oral proficiency scores. There was also evidence that English learners who had ever been diagnosed with a disability grew more slowly than English learners who were never diagnosed with a disability for the writing domain, but there was no such evidence that this was the case for reading and oral English proficiency. One way to interpret this finding is that writing, more than reading or oral proficiency, is the domain where those English learners identified with disabilities fall farthest behind. However, not knowing specific student exceptionality classifications limits the interpretability of this finding. That is, this particular sample may happen to have more English learners with exceptionalities that impact writing over other language domains. Furthermore, male English learners were expected to score lower than female English learners in third grade scores for all three sub-tests. The multivariate analysis provided estimates of how the development of a given domain correlated with development in another domain. There was a large positive association between third grade English proficiency for each domain. That is, if a student scored higher than average on the reading sub-test, they were also likely to score higher than average on both the writing and oral domain sub-tests in Grade 3. The only association found among the linear growth components of the three language domains was among reading and oral proficiency development. This is not to say other associations did not exist for these students. The relative sparseness of within-student information due to high proportions of reclassification in the early grades could have been one reason for the imprecision in the random effects variance-covariance matrix. 65 A subject-specific domain-level analysis based on a multivariate random effects model has the potential to provide states and districts with additional information to evaluate reclassification criteria. For example, Arizona used only the total score for reclassification. As Figure 5 indicated, however, some students had been reclassified to fluent English proficient without reaching proficiency in each domain. This finding suggests that some students may be reclassified prior to be fully proficient. Such an analysis can also be used to evaluate more stringent reclassification criteria. For example, California requires, among other criteria, that English learners reach proficiency in all domains to be reclassified. Subject-specific trajectories may be used to evaluate the probability that a specific student will meet the proficiency benchmark between the time the test was taken and the end of the year. Such an analysis can indicate if too many students are being retained as English learners when they would likely be fluent English proficient and could be reclassified sooner. Predicting time-to-reclassification. The final discrete-time hazard model predicted time to reclassification based on one’s ever being diagnosed with a disability and sex, and was able to predict the time of a students reclassification with 80% accuracy. The shared random effects model that used one’s deviation from the average third grade English proficiency scores and their deviation from average English proficiency growth was able to predict the time of a students reclassification with 97% accuracy. The predictive accuracy of the final shared random effects model with disability status, sex and random effects as predictors improved classification accuracy to 98%. While the predictive accuracy of the discrete-time hazard model met most conventional criteria for diagnostic models 66 (Steyerberg et al., 2010), using the information from one’s English proficiency was exceptionally accurate in this sample. This finding suggests that for this particular cohort and district, the district adhered closely to Arizona’s reclassification policies that required a student to be reclassified when his or her total English proficiency score reached the benchmark. This particular model, using total English proficiency scores, may need to be altered for those states that use a wider range of reclassification criteria. Furthermore, the predictive accuracy would likely be lower for districts that do not adhere as closely to the reclassification policies, or for states that make reclassification decisions based on additional criteria (e.g., teacher recommendations). Although this study used the shared random effect models for prediction, the statistical framework is also useful for descriptive and inferential analyses of reclassification. To model time to reclassification using survival analysis without including students’ English proficiency development ignores the direct connection between English proficiency development and reclassification. Not controlling for English proficiency in a time-to-reclassification model suggests that any two students who are equal on all covariates being controlled for in the model are equally likely to be reclassified at time t, regardless of their unique English proficiency development. Furthermore, not controlling for English proficiency can bias the estimates of other covariates that may be of interest. For example, in this study, the SWDe parameter from the hazard model with manifest covariates differed in size and interpretation from the SWDe parameter from the hazard submodel that also included latent covariates. 67 Limitations The first set of limitations for this study pertained to the research design. The primary limitation is that the data used in the analysis were from a single cohort from a single district in one state. Because Arizona’s English learner policies are unique, these findings may not be generalizable to other locales, testing systems, or policy contexts. Furthermore, the data came from a single Arizona district and is unlikely to be representative of other districts or the entire state. With only a single cohort used in the analysis, the results presented here may be subject to differences from one cohort to another. Finally, because limited testing occurs after a student reclassifies, alternative functional forms for language development based on the time-of-reclassification could not be adequately assessed. A related limitation is that this research spanned Grades 3 through 7, but all prior research on reclassification had used Kindergarten as the first measurement occasion. Had this research extended back to Kindergarten, there would have likely been greater within-student information that could have warranted the application of a broader range of growth models. Furthermore, this study used a strict cohort design where those students who entered the school district after Grade 3 were excluded. This decision was made because late-arriving students lacked information about prior schooling which could have biased hazard model parameter estimates (Guo, 1993). These decisions limit interpretation of parameter estimates in the current study —rather than applying to all English learners in the district, study findings do not apply to those English learners who reclassify prior to, or enter the district after Grade 3. Yet another study limitation pertains to the measures of English language proficiency analyzed. It is important to note that all analyses used observed 68 AZELLA scores, which contained error. Modeling the observed score results in a residual that contains both measurement error and the deviation in the true score from the fitted trend line (Skrondal & Rabe-Hesketh, 2004). Furthermore, this study used oral English proficiency rather than separate measures for speaking and listening. Using the speaking and listening measures in the multivariate growth model would have provided a more complete description of academic English development. The relatively small number of schools, and a small number of students within some schools, also limited the precision of the school-level variance components. That said, a noisy estimate, particularly within a Bayesian framework, results in uncertainty, not bias. Regarding variance components, the residual variances for each submodel of the multivariate mixed effect models were assumed independent. Future research could test this independence assumption by specifying more complex variance-covariance structures for the residuals. Another limitation, common in most educational research, surrounds the reliability and validity of the free/reduced lunch (FRL) and the student with disabilities status (SWD) variables. The FRL covariate is a dichotomous proxy for the more complex construct of economic status. Furthermore, the actual indicator is time-varying in that a student may qualify one year and not the next. The time-invariant version used in this study, where any student who ever received free/reduced lunch is defined as FRL assumed that those students who received the benefit in any year are representative of the FRL group overall. A similar approach was used for students who ever received services due to their disability status. Like FRL, this assumes students who ever received services are more like those who had received services throughout their schooling. This 69 indicator also did not indicate that those who are categorized as SWD in the district testing system actually receive services for that disability. Finally, limitations of this research include the possibility that the assumption pertaining to the missing data mechanism were incorrect, and the potential omission of other variables that may be related to proficiency development and reclassification. Regarding missing data, subsequent analysis should assess the sensitivity of the parameter estimates to competing missing data assumptions (Xu & Blozis, 2011). As for potential omitted variables, generational status and information about the instructional program and its implementation are two in particular. Slama (2012) found that the average English development of foreign-born English learners differed from US-born English learners. As for program information, Hakuta (2011) pointed out, “[t]here are well implemented and poorly implemented programs...” (p.166). While all students in this sample were enrolled in an SEI program, the implementation of the program likely varied from one school to another which may have impacted rates of English proficiency development. Exploration of program instructional design and fidelity of program implementation are important issues for future study. Conclusions and Future Directions Understanding how academic English proficiency develops and its role in time-to-reclassification is critical for the development of effective policies and programs for the improvement of outcomes for English learners. This study fit a multivariate growth model to estimate the association between reading, writing and oral English proficiency development for a single cohort of students in one Arizona school district. It also fit a shared random effects model to estimate the association between academic English development and time to reclassification. Modeling 70 academic English language development and reclassification simultaneously enabled a more nuanced understanding of the time required to attain English proficiency. While the data used in this study cannot support causal inferences, nor can the results be broadly generalized, the models presented in this dissertation provide an analytical foundation for future research aimed at prediction, description, and testing the development of academic English proficiency and reclassification. An avenue for future research is to extend and assess the utility of the shared random effect model for states that use multiple reclassification criteria. Future research can also focus on the role of school-level variation in academic English development and reclassification. For example, incorporating fidelity measures at the classroom/school level may better estimate how program quality associates with academic English development and time-to-reclassification. Additionally, future research could extend the multilevel shared parameter model to test how school-level random effects for academic English development impact time-to-reclassification. A new requirement in the recent ESSA legislation requires schools to report disaggregated data for English learners who are also students with disabilities. Findings from this dissertation indicate that this specific subgroup may be particularly challenged when it comes to attaining academic English proficiency. Future research should continue to analyze the English proficiency development for English learners with disabilities, and when possible, by specific exceptionality category. This study demonstrated that the inclusion of academic English proficiency in the survival process resulted in improved predictions of time to reclassification. There has been much work related to SREM for dynamic participant-specific 71 predictions in the medical literature (Fieuws et al., 2008; Rizopoulos, 2011; Rizopoulos, Hatfield, Carlin, & Takkenberg, 2014; Yu, Taylor, & Sandler, 2008). These extensions can be used to develop an early warning system to identify those students at risk of becoming long-term English learners and which students may be in need of additional language support. Such a system would provide practitioners with information to aid decisions regarding resource allocation for English learners. Finally, within the English learner research there has been much interest in understanding the quality of reclassification criteria (e.g., Robinson, 2011). A joint modeling approach could be adopted to acknowledge that not all English learners reach the reclassification threshold in the same way, and how they get there may provide valuable information for evaluating reclassification criteria. To that end, a joint modeling approach could also be used to better understand which students succeed and which struggle academically after reclassification. The dynamic nature of English learner populations and the complex policy surrounding their academic achievement requires researchers to utilize more complex analytical frameworks. Joint models are still in their relative infancy and this study sought to illustrate their use for studying English language development and reclassification. With continued research into their strengths and limitations, such models may provide key insights into those policies and programs that help English learners succeed. 72 APPENDIX A MODEL SPECIFICATION This section provides the details of the models estimated to answer Research Questions 1, 2 and 3. I first review those models used to answer Research Question 1, followed by the models used for Research Question 2, and conclude with the models used for Research Question 3. Univariate AZELLA Models Research Question 1 fit a series of mixed models, some that estimate only student-specific random effects and others that estimate school- and student- specific random effects. For the student-specific random effects models, let there be i = 1, 2, . . . , n students and j = 1, 2, . . . , nj longitudinal measures for student j. Define yj as a nj × 1 vector containing the longitudinal measures for student j. I can then specify the mixed model generally using the notation of Skrondal and Rabe-Hesketh (2004), yj = Xjβ + Zjζj + j (A.1) where ij ∼ N (0, σ), ζj ∼ N (0, T) and T =  τ11 τ12 · · · τ1q τ21 τ22 · · · τ2q ... ... . . . ... τq1 τq2 · · · τqq  . (A.2) For Equation A.1, Xj is a known nj × p design matrix corresponding to the p × 1 vector of fixed effects β, Zj is a known nj × q design matrix corresponding to the q × 1 vector of random effects, ζj, and j is an nj × 1 vector of residuals. Extending Equation A.1 to include school-specific random effects requires extended notation. Let there be k = 1, 2, . . . , K schools, j = 1, 2, . . . , nk students 73 in school k, and i = 1, 2, . . . , njk longitudinal measures for student j in school k. Further, let Nk = ∑nk j=1 njk be the total number of measures in school k. Define yjk as a Nk × 1 vector containing the longitudinal measures for student j in school k and yk ≡ [ y′1k, . . . ,y ′ nkk ]′ as an Nk × 1 vector containing all observations in school k. I can then specify the three-level mixed model for school k generally as: yk = Xkβ + Zkζk + k (A.3) where Xk is a known Nk × p design matrix corresponding to the p × 1 vector of fixed effects β. Because this is a so-called three-level model, Zk ≡[ Z (3) k , diag [ Z (2) 1k , . . . ,Z (2) nkk ]] and ζk ≡ [ ζ (3)′ k , ζ (2)′ 1k , . . . , ζ (2)′ nkk ]′ , where Z (3) k is a known Nk×q(3) design matrix corresponding to the q(3)×1 vector of school-specific random effects, ζ (3) k , and Z (2) jk is a known njk×q(2) design matrix corresponding to the q(2)×1 vector of student-specific random effects, ζ (2) jk . For clarity, diag [ Z (2) k1 , . . . ,Z (2) nkk ] is a matrix of size Nk × (nk q(2)). This notation is essentially equivalent to the 3-level mixed model notation presented by Hedeker and Gibbons (2006). Furthermore, we assume jk ∼ N (0, σI), ζ(2)jk ∼ N (0, T(2)), and ζ(3)k ∼ N (0, T(3)), where T(2) =  τ (2) 11 τ (2) 12 · · · τ (2)1q(2) τ (2) 21 τ (2) 22 · · · τ (2)2q(2) ... ... . . . ... τ (2) q(2)1 τ (2) q(2)2 · · · τ (2) q(2)q(2)  and T(3) =  τ (3) 11 τ (3) 12 · · · τ (3)1q(3) τ (3) 21 τ (3) 22 · · · τ (3)2q(3) ... ... . . . ... τ (3) q(3)1 τ (3) q(3)2 · · · τ (3) q(3)q(3)  . (A.4) For subsequent notation, the two level model is notated using Z (2) j and ζ (2) j such that Equation A.1 becomes yj = Xjβ + Z (2) j ζ (2) j + j (A.5) For the prior specification, let T be either T(2) or T(3) and ζj be the corresponding vector of random effects. Define diag(T) ≡ D where D is a 74 vector of standard deviations for the student-specific random effects. Next, let R = D−1T (D−1)′ be the correlation matrix associated with the variance-covariance matrix T. Lastly, let [diag(D)S′] zi = ζi where S′ is the Cholesky decomposition of the correlation matrix R. By doing this, I was able to specify priors for the specified priors for the uncorrelated standard normal random effects, zi, standard deviations, D, and correlation matrix R rather than the variance covariance matrix T directly. AZELLA total score models. AZELLA total score model 1. yT ij = βT1 + ζ (2) Tj1 + ( βT2 + ζ (2) Tj2 ) GRADEij + T ij p(D) ∼ U(0, ∞), p(R) ∼ LKJ (1.5), p(βT1) ∼ U(−∞, ∞), p(βT2) ∼ U(−∞, ∞), p(z (2) j ) ∼ N (0, I), p(σT ijk) ∼ U(0, ∞). where U(a, b) is a uniform distribution over the support x ∈ [a, b], LKJ (η) is an LKJ correlation distribution where η ∈ [0,∞], N (µ, σ) is a normal distribution where µ ∈ R and σ2 ∈ (0,∞], and I is the identity matrix. 75 AZELLA total score model 2. yT ij = βT1 + ζ (2) Tj1 + ( βT2 + ζ (2) Tj2 ) GRADEij + βT3 GRADE 2 ij + T ij p(D) ∼ U(0, ∞), p(R) ∼ LKJ (1.5), p(βT1) ∼ U(−∞, ∞), p(βT2) ∼ U(−∞, ∞), p(βT3) ∼ U(−∞, ∞), p(z (2) j ) ∼ N (0, I), p(σT ijk) ∼ U(0, ∞). AZELLA total score model 3. yT ijk = βT1 + ζ (2) Tjk1 + ζ (3) Tk1 +( βT2 + ζ (2) Tjk2 + ζ (3) Tk2 ) GRADEijk + βT3 GRADE 2 ijk + T ijk p(D(2)) ∼ U(0, ∞), p(D(3)) ∼ U(0, ∞), p(R(2)) ∼ LKJ (1.5), p(R(3)) ∼ LKJ (1.5), p(βT1) ∼ U(−∞, ∞), p(βT2) ∼ U(−∞, ∞), p(βT3) ∼ U(−∞, ∞), p(z (2) jk ) ∼ N (0, I), p(z(3)k ) ∼ N (0, I), p(σT ijk) ∼ U(0, ∞). 76 AZELLA total score model 4. yT ijk = βT1 + ζ (2) Tjk1 + ζ (3) Tk1 + βT4 SWDejk + βT6 MALEjk +( βT2 + ζ (2) Tjk2 + ζ (3) Tk2 + βT5 SWDejk ) GRADEijk + βT3 GRADE 2 ijk + T ijk p(D(2)) ∼ U(0, ∞), p(D(3)) ∼ U(0, ∞), p(R(2)) ∼ LKJ (1.5), p(R(3)) ∼ LKJ (1.5), p(βT1) ∼ U(−∞, ∞), p(βT2) ∼ U(−∞, ∞), p(βT3) ∼ U(−∞, ∞), p(βT4) ∼ U(−∞, ∞), p(βT5) ∼ U(−∞, ∞), p(βT6) ∼ U(−∞, ∞), p(z (2) jk ) ∼ N (0, I), p(z(3)k ) ∼ N (0, I), p(σT ijk) ∼ U(0, ∞). AZELLA total score model 5. yT ijk = βT1 + ζ (2) Tjk1 + ζ (3) Tk1 + βT4 SWDejk + βT6 MALEjk +( βT2 + ζ (2) Tjk2 + ζ (3) Tk2 + βT5 SWDejk ) GRADEijk + βT3 GRADE 2 ijk + T ijk p(D(2)) ∼ U(0, ∞), p(D(3)) ∼ U(0, ∞), p(R(2)) ∼ LKJ (1.5), p(R(3)) ∼ LKJ (1.5), p(βT1) ∼ N (60, 10), p(βT2) ∼ N (3, 6), p(βT3) ∼ N (0, 5), p(βT4) ∼ N (0, 2.5), p(βT5) ∼ N (0, 5), p(βT6) ∼ N (0, 2.5), p(z (2) jk ) ∼ N (0, I), p(z(3)k ) ∼ N (0, I), p(σT ijk) ∼ U(0, ∞). AZELLA reading score models. 77 AZELLA reading score model 1. yRij = βR1 + ζ (2) Rj1 + ( βT2 + ζ (2) Rj2 ) GRADEij + Rij p(D) ∼ U(0, ∞), p(R) ∼ LKJ (1.5), p(βR1) ∼ U(−∞, ∞), p(βR2) ∼ U(−∞, ∞), p(z (2) j ) ∼ N (0, I), p(σRij) ∼ U(0, ∞). AZELLA reading score model 2. yRij = βR1 + ζ (2) Rj1 + ( βR2 + ζ (2) Rj2 ) GRADEij + βR3 GRADE 2 ij + Rij p(D) ∼ U(0, ∞), p(R) ∼ LKJ (1.5), p(βR1) ∼ U(−∞, ∞), p(βR2) ∼ U(−∞, ∞), p(βR3) ∼ U(−∞, ∞), p(z (2) j ) ∼ N (0, I), p(σRij) ∼ U(0, ∞). 78 AZELLA reading score model 3. yRijk = βR1 + ζ (2) Rjk1 + ζ (3) R1k +( βR2 + ζ (2) Rjk2 + ζ (3) Rk2 ) GRADEijk + Rijk p(D(2)) ∼ U(0, ∞), p(D(3)) ∼ U(0, ∞), p(R(2)) ∼ LKJ (1.5), p(R(3)) ∼ LKJ (1.5), p(βR1) ∼ U(−∞, ∞), p(βR2) ∼ U(−∞, ∞), p(βR3) ∼ U(−∞, ∞), p(z (2) jk ) ∼ N (0, I), p(z(3)k ) ∼ N (0, I), p(σRijk) ∼ U(0, ∞). AZELLA reading score model 4. yRij = βR1 + ζ (2) Rjk1 + ζ (3) Rk1 + βR4 SWDejk + βR6 MALEjk +( βR2 + ζ (2) Rjk2 + ζ (3) Rk2 + βR5 SWDejk ) GRADEijk + Rijk p(D(2)) ∼ U(0, ∞), p(D(3)) ∼ U(0, ∞), p(R(2)) ∼ LKJ (1.5), p(R(3)) ∼ LKJ (1.5), p(βR1) ∼ U(−∞, ∞), p(βR2) ∼ U(−∞, ∞), p(βR3) ∼ U(−∞, ∞), p(βR4) ∼ U(−∞, ∞), p(βR5) ∼ U(−∞, ∞), p(βR6) ∼ U(−∞, ∞), p(z (2) jk ) ∼ N (0, I), p(z(3)k ) ∼ N (0, I), p(σRijk) ∼ U(0, ∞). 79 AZELLA reading score model 5. yRij = βR1 + ζ (2) Rjk1 + ζ (3) Rk1 + βR4 SWDejk + βR6 MALEjk +( βR2 + ζ (2) Rjk2 + ζ (3) Rk2 + βR5 SWDejk ) GRADEijk + Rijk p(D(2)) ∼ U(0, ∞), p(D(3)) ∼ U(0, ∞), p(R(2)) ∼ LKJ (1.5), p(R(3)) ∼ LKJ (1.5), p(βR1) ∼ N (60, 10), p(βR2) ∼ N (3, 6), p(βR3) ∼ N (0, 5), p(βR4) ∼ N (0, 2.5), p(βR5) ∼ N (0, 5), p(βR6) ∼ N (0, 2.5), p(z (2) jk ) ∼ N (0, I), p(z(3)k ) ∼ N (0, I), p(σRijk) ∼ U(0, ∞). AZELLA writing score models. AZELLA writing score model 1. yWij = βW1 + ζ (2) Wj1 + ( βW2 + ζ (2) Wj2 ) GRADEij + Wij p(D) ∼ U(0, ∞), p(R) ∼ LKJ (1.5), p(βW1) ∼ U(−∞, ∞), p(βW2) ∼ U(−∞, ∞), p(z (2) j ) ∼ N (0, I), p(σWij) ∼ U(0, ∞). 80 AZELLA writing score model 2. yWij = βW1 + ζ (2) Wj1 + ( βW2 + ζ (2) Wj2 ) GRADEij + βW3 GRADE 2 ij + Wij p(D) ∼ U(0, ∞), p(R) ∼ LKJ (1.5), p(βW1) ∼ U(−∞, ∞), p(βW2) ∼ U(−∞, ∞), p(βW3) ∼ U(−∞, ∞), p(z (2) j ) ∼ N (0, I), p(σWij) ∼ U(0, ∞). AZELLA writing score model 3. yWijk = βW1 + ζ (2) Wjk1 + ζ (3) Wk1 +( βW2 + ζ (2) Wjk2 + ζ (3) Wk2 ) GRADEijk + βW3 GRADE 2 ijk + Wijk p(D(2)) ∼ U(0, ∞), p(D(3)) ∼ U(0, ∞), p(R(2)) ∼ LKJ (1.5) p(R(3)) ∼ LKJ (1.5), p(βW1) ∼ U(−∞, ∞), p(βW2) ∼ U(−∞, ∞), p(βW3) ∼ U(−∞, ∞), p(z (2) jk ) ∼ N (0, I), p(z(3)k ) ∼ N (0, I), p(σWijk) ∼ U(0, ∞). 81 AZELLA writing score model 4. yWijk = βW1 + ζ (2) Wjk1 + ζ (3) Wk1 + βW4 SWDejk + βW6 MALEjk +( βW2 + ζ (2) Wjk2 + ζ (3) Wk2 + βW5 SWDejk ) GRADEijk + βW3 GRADE 2 ijk + Wijk p(D(2)) ∼ U(0, ∞), p(D(3)) ∼ U(0, ∞), p(R(2)) ∼ LKJ (1.5), p(R(3)) ∼ LKJ (1.5), p(βW1) ∼ U(−∞, ∞), p(βW2) ∼ U(−∞, ∞), p(βW3) ∼ U(−∞, ∞), p(βW4) ∼ U(−∞, ∞), p(βW5) ∼ U(−∞, ∞), p(βW6) ∼ U(−∞, ∞), p(z (2) jk ) ∼ N (0, I), p(z(3)k ) ∼ N (0, I), p(σWijk) ∼ U(0, ∞). AZELLA writing score model 5. yWijk = βW1 + ζ (2) Wjk1 + ζ (3) Wk1 + βW4 SWDejk + βW6 MALEjk +( βW2 + ζ (2) Wjk2 + ζ (3) Wk2 + βW5 SWDejk ) GRADEijk + βW3 GRADE 2 ijk + Wijk p(D(2)) ∼ U(0, ∞), p(D(3)) ∼ U(0, ∞), p(R(2)) ∼ LKJ (1.5), p(R(3)) ∼ LKJ (1.5), p(βW1) ∼ N (60, 10), p(βW2) ∼ N (3, 6), p(βW3) ∼ N (0, 5), p(βW4) ∼ N (0, 2.5), p(βW5) ∼ N (0, 5), p(βW6) ∼ N (0, 2.5), p(z (2) jk ) ∼ N (0, I), p(z(3)k ) ∼ N (0, I), p(σWijk) ∼ U(0, ∞). AZELLA oral score models. 82 AZELLA oral score model 1. yOij = βO1 + ζ (2) Oj1 + ( βO2 + ζ (2) Oj2 ) GRADEij + Oij p(D) ∼ U(0, ∞), p(R) ∼ LKJ (1.5), p(βO1) ∼ U(−∞, ∞), p(βO2) ∼ U(−∞, ∞), p(z (2) j ) ∼ N (0, I), p(σOij) ∼ U(0, ∞). AZELLA oral score model 2. yOij = βO1 + ζ (2) Oj1 + ( βO2 + ζ (2) Oj2 ) GRADEij + βO3 GRADE 2 ij + Oij p(D) ∼ U(0, ∞), p(R) ∼ LKJ (1.5), p(βO1) ∼ U(−∞, ∞), p(βO2) ∼ U(−∞, ∞), p(βO3) ∼ U(−∞, ∞), p(z (2) j ) ∼ N (0, I), p(σOij) ∼ U(0, ∞). 83 AZELLA oral score model 3. yOijk = βO1 + ζ (2) Ojk1 + ζ (3) Ok1 +( βO2 + ζ (2) Ojk2 + ζ (3) Ok2 ) GRADEijk + βO3 GRADE 2 ijk + Oijk p(D(2)) ∼ U(0, ∞), p(D(3)) ∼ U(0, ∞), p(R(2)) ∼ LKJ (1.5), p(R(3)) ∼ LKJ (1.5), p(βO1) ∼ U(−∞, ∞), p(βO2) ∼ U(−∞, ∞), p(βO3) ∼ U(−∞, ∞), p(z (2) jk ) ∼ N (0, I), p(z(3)k ) ∼ N (0, I), p(σOijk) ∼ U(0, ∞). AZELLA oral score model 4. yOijk = βO1 + ζ (2) Ojk1 + ζ (3) Ok1 + βO4 SWDejk + βO6 MALEjk +( βO2 + ζ (2) Ojk2 + ζ (3) Ok2 + βO5 SWDejk ) GRADEijk + βO3 GRADE 2 ijk + Oijk p(D(2)) ∼ U(0, ∞), p(D(3)) ∼ U(0, ∞), p(R(2)) ∼ LKJ (1.5), p(R(3)) ∼ LKJ (1.5), p(βO1) ∼ U(−∞, ∞), p(βO2) ∼ U(−∞, ∞), p(βO3) ∼ U(−∞, ∞), p(βO4) ∼ U(−∞, ∞), p(βO5) ∼ U(−∞, ∞), p(βO6) ∼ U(−∞, ∞), p(z (2) jk ) ∼ N (0, I), p(z(3)k ) ∼ N (0, I), p(σOijk) ∼ U(0, ∞) 84 AZELLA oral score model 5. yOijk = βO1 + ζ (2) Ojk1 + ζ (3) Ok1 + βO4 SWDejk + βO6 MALEjk +( βO2 + ζ (2) Ojk2 + ζ (3) Ok2 + βO5 SWDejk ) GRADEijk + βO3 GRADE 2 ijk + Oijk p(D(2)) ∼ U(0, ∞), p(D(3)) ∼ U(0, ∞), p(R(2)) ∼ LKJ (1.5), p(R(3)) ∼ LKJ (1.5), p(βO1) ∼ N (60, 10), p(βO2) ∼ N (3, 6), p(βO3) ∼ N (0, 5), p(βO4) ∼ N (0, 2.5), p(βO5) ∼ N (0, 5), p(βO6) ∼ N (0, 2.5), p(z (2) jk ) ∼ N (0, I), p(z(3)k ) ∼ N (0, I), p(σOijk) ∼ U(0, ∞). Multivariate AZELLA Growth Models Research Question 2 extended Equations A.3 to model the multiple English language proficiency sub-tests simultaneously. Let there be g = 1, 2, . . . ,m outcomes, each measured longitudinally, such that ygjk is a vector of length ngjk that contains the longitudinal measures of outcome g for student j in school k and ygk is a length Ngk = ∑ngk j=1 ngjk vector that contain all observations for outcome g in school k. Next, let yMjk ≡ [ y′1jk, . . . ,y ′ mjk ]′ be the length nMjk = ∑m g=1 ngjk vector that contains all outcomes measured longitudinally for subject j. Finally, let yMk ≡ [y′1k, . . . ,y′mk]′ be the length NMk = ∑m g=1 ∑ngk j=1 ngjk vector of all observations for all outcomes for school k. I can express the three-level multivariate mixed model generally by expanding Equation A.3, yMk = XMkβM + ZMkζMk + Mk (A.6) 85 where XMk ≡ diag [X1k, . . . ,Xmk] is a known NMk × p design matrix corresponding to the p × 1 vector of fixed effects βM ≡ [β′1, . . . ,β′m]′. Define ZMk ≡ diag [Z1k, . . . ,Zmk] and ζM ≡ [ζ ′1, . . . , ζ ′m]′ where Zgk ≡[ Z (3) gk , diag [ Z (2) g1k, . . . ,Z (2) gnkk ]] and ζgk ≡ [ ζ (3)′ gk , ζ (2)′ gk1 , . . . , ζ (2)′ gnkk ]′ . Following Equation A.3, Z (3) gk is an Ngk × q(3)g design matrix corresponding to the school- specific random effects for outcome g, ζ (3) gk and Z (2) gjk is an ngjk × q(2)g design matrix corresponding to the q (2) g × 1 vector of random effects for student j in school k for outcome g, ζ (2) gjk. Given this notation, ζ (2) Mjk ≡ [ ζ (2)′ 1jk , . . . , ζ (2)′ mjk ]′ is the vector of student-specific random effects for all outcomes for student j in school k and ζ (3) Mk ≡ [ ζ (3)′ 1k , . . . , ζ (3)′ mk ]′ is the vector of school-specific random effects for all outcomes for school k. Finally, Mk is the NMk × 1 vector of residuals. I assume gijk are independent N (0, σg), ζ(2)Mki ∼ N (0, T(2)), ζ(3)Mk ∼ N (0, T(3)), where T(2) and T(3) are defined by Equation A.4. Using this generalization, D R, zjk, and zk have the same definitions as in Research Question 1. 86 Multivariate AZELLA model 1. yRijk = βR1 + ζ (2) Rjk1 + ζ (3) Rk1 + ( βR2 + ζ (2) Rjk1 + ζ (3) Rk2 ) GRADEijk + Rijk yWijk = βW1 + ζ (2) Wjk1 + ζ (3) Wk1 + ( βW2 + ζ (2) Wjk1 + ζ (3) Wk2 ) GRADEijk + βW3 GRADE 2 ijk + Wijk yOijk = βO1 + ζ (2) Ojk1 + ζ (3) Ok1 + ( βO2 + ζ (2) Ojk1 + ζ (3) Ok2 ) GRADEijk + βO3 GRADE 2 ijk + Oijk p(D(2)) ∼ U(0, ∞), p(D(3)) ∼ U(0, ∞), p(R(2)) ∼ LKJ (1.5), p(R(3)) ∼ LKJ (1.5), p(βR1) ∼ U(−∞, ∞), p(βR2) ∼ U(−∞, ∞), p(βW1) ∼ U(−∞, ∞), p(βW2) ∼ U(−∞, ∞), p(βW3) ∼ U(−∞, ∞), p(βO1) ∼ U(−∞, ∞), p(βO2) ∼ U(−∞, ∞), p(βO3) ∼ U(−∞, ∞), p(z (2) Rjk) ∼ N (0, I), p(z(3)Rk) ∼ N (0, I), p(z (2) Wjk) ∼ N (0, I), p(z(3)Wk) ∼ N (0, I), p(z (2) Ojk) ∼ N (0, I), p(z(3)Ok) ∼ N (0, I), p(σRijk) ∼ U(0, ∞), p(σWijk) ∼ U(0, ∞), p(σOijk) ∼ U(0, ∞). It is important to note that [ζ (2) Rjk1, ζ (2) Rjk2, ζ (2) Wjk2, ζ (2) Wjk2, ζ (2) Ojk1, ζ (2) Ojk2] and [ζ (3) Rjk1, ζ (3) Rjk2, ζ (3) Wjk2, ζ (3) Wjk2, ζ (3) Ojk1, ζ (3) Ojk2] are both assumed to be distributed multivariate normal with mean zero and 6× 6 unstructured covariance matrices T(2) and T(3). 87 Multivariate AZELLA model 2. yRijk = βR1 + ζ (2) Rjk1 + ζ (3) Rk1 + βR4 SWDejk + βR6 MALEjk +( βR2 + ζ (2) Rjk2 + ζ (3) Rk2 + βR5 SWDejk ) GRADEijk + Rijk yWijk = βW1 + ζ (2) Wjk1 + ζ (3) Wk1 + βW4 SWDejk + βW6 MALEjk +( βW2 + ζ (2) Wjk2 + ζ (3) Wk2 + βW5 SWDejk ) GRADEijk + βW3 GRADE 2 ijk + Wijk yOijk = βO1 + ζ (2) Ojk1 + ζ (3) Ok1 + βO4 SWDejk + βO6 MALEjk +( βO2 + ζ (2) Ojk2 + ζ (3) Ok2 + βO5 SWDejk ) GRADEijk + βO3 GRADE 2 ijk + Oijk p(D(2)) ∼ U(0, ∞), p(D(3)) ∼ U(0, ∞), p(R(2)) ∼ LKJ (1.5), p(R(3)) ∼ LKJ (1.5), p(βR1) ∼ U(−∞, ∞), p(βR2) ∼ U(−∞, ∞), p(βR4) ∼ U(−∞, ∞), p(βR5) ∼ U(−∞, ∞), p(βR6) ∼ U(−∞, ∞), p(βW1) ∼ U(−∞, ∞), p(βW2) ∼ U(−∞, ∞), p(βW3) ∼ U(−∞, ∞), p(βW4) ∼ U(−∞, ∞), p(βW5) ∼ U(−∞, ∞), p(βW6) ∼ U(−∞, ∞), p(βO1) ∼ U(−∞, ∞), p(βO2) ∼ U(−∞, ∞), p(βO3) ∼ U(−∞, ∞), p(βO4) ∼ U(−∞, ∞), p(βO5) ∼ U(−∞, ∞), p(βO6) ∼ U(−∞, ∞), p(z (2) Rjk) ∼ N (0, I), p(z(3)Rk) ∼ N (0, I), p(z (2) Wjk) ∼ N (0, I), p(z(3)Wk) ∼ N (0, I), p(z (2) Ojk) ∼ N (0, I), p(z(3)Ok) ∼ N (0, I), p(σRijk) ∼ U(0, ∞), p(σWijk) ∼ U(0, ∞), p(σOijk) ∼ U(0, ∞). 88 Multivariate AZELLA model 3. yRijk = βR1 + ζ (2) Rjk1 + ζ (3) Rk1 + βR4 SWDejk + βR6 MALEjk +( βR2 + ζ (2) Rjk2 + ζ (3) Rk2 + βR5 SWDejk ) GRADEijk + Rijk yWijk = βW1 + ζ (2) Wjk1 + ζ (3) Wk1 + βW4 SWDejk + βW6 MALEjk +( βW2 + ζ (2) Wjk2 + ζ (3) Wk2 + βW5 SWDejk ) GRADEijk + βW3 GRADE 2 ijk + Wijk yOijk = βO1 + ζ (2) Ojk1 + ζ (3) Ok1 + βO4 SWDejk + βO6 MALEjk +( βO2 + ζ (2) Ojk2 + ζ (3) Ok2 + βO5 SWDejk ) GRADEijk + βO3 GRADE 2 ijk + Oijk p(D(2)) ∼ U(0, ∞), p(D(3)) ∼ U(0, ∞), p(R(2)) ∼ LKJ (1.5), p(R(3)) ∼ LKJ (1.5), p(βR1) ∼ N (60, 10), p(βR2) ∼ N (3, 6), p(βR4) ∼ N (0, 5), p(βR5) ∼ N (0, 5), p(βR6) ∼ N (0, 2.5), p(βW1) ∼ N (60, 10), p(βW2) ∼ N (3, 6), p(βW3) ∼ N (0, 2.5), p(βW4) ∼ N (0, 5), p(βW5) ∼ N (0, 5), p(βW6) ∼ N (0, 2.5), p(βO1) ∼ N (60, 10), p(βO2) ∼ N (3, 6), p(βO3) ∼ N (0, 2.5), p(βO4) ∼ N (0, 5), p(βO5) ∼ N (0, 5), p(βO6) ∼ N (0, 2.5), p(z (2) Rjk) ∼ N (0, I), p(z(3)Rk) ∼ N (0, I), p(z (2) Wjk) ∼ N (0, I), p(z(3)Wk) ∼ N (0, I), p(z (2) Ojk) ∼ N (0, I), p(z(3)Ok) ∼ N (0, I), p(σRijk) ∼ U(0, ∞), p(σWijk) ∼ U(0, ∞), p(σOijk) ∼ U(0, ∞). 89 Time-to-Reclassification Models Research Question 3 estimated a series of discrete-time hazard models and a series of shared random effects models. Discrete-time hazard models. Let there be k = 1, 2, . . . , K schools, and j = 1, 2, . . . , nk students in school k. Event occurrence can only happen at discrete time points, t = 1, 2, . . . , T with a student’s survival time is indicated by Tjk = t. Define htjk = P[tjk = t|tjk ≥ t] as the probability of reclassification for student j in school k at time t given reclassification has not occurred prior to time t. Those students who do not experience the event while under observation are considered censored. Using grouped-time survival parametrization (Allison, 1982; D’Agostino et al., 1990; Hedeker, Siddiqui, & Hu, 2000; Singer & Willett, 1993), each student contributes a tjk × 1 vector of of dichotomous indicators of event status for each discrete time point. Students who experience reclassification at time tjk will have tjk − 1 zeros followed by a one indicating event occurrence. Students who are censored have a tjk × 1 vector of zeros. Using the logit link function, I can express log odds of reclassification for student j at school k at time t as log [htjk/1− htjk] = logit(htjk) = w′tjkα+ νk (A.7) and the probability or reclassification as 1/ (1 + exp [−logit(htjk)]) = 1/ ( 1 + exp [−(w′tjkα+ νk)]) (A.8) where α is a p × 1 vector of fixed effects that contains T intercept terms for the T discrete time points and additional covariates, and wtjk, a p × 1 known vector of fixed effects. Because students are nested in schools, νk is a school-level random intercept assumed to be normally distributed with mean zero and variance ψ. 90 Hazard model 1. logit(htjk) =α1GRADE3tjk + α2GRADE4tjk + α3GRADE5tjk + α4GRADE6tjk + α5GRADE7tjk p(α1) ∼ U(−∞, ∞), p(α2) ∼ U(−∞, ∞), p(α3) ∼ U(−∞, ∞), p(α4) ∼ U(−∞, ∞), p(α5) ∼ U(−∞, ∞). Hazard model 2. logit(htjk) =α1GRADE3tjk + α2GRADE4tjk + α3GRADE5tjk + α4GRADE6tjk + α5GRADE7tjk + νk p(α1) ∼ U(−∞, ∞), p(α2) ∼ U(−∞, ∞), p(α3) ∼ U(−∞, ∞), p(α4) ∼ U(−∞, ∞), p(α5) ∼ U(−∞, ∞), p(ψ) ∼ U(0, ∞). Hazard model 3. logit(htjk) =α1GRADE3tjk + α2GRADE4tjk + α3GRADE5tjk + α4GRADE6tjk + α5GRADE7tjk + α6SWDetjk + α6MALEtjk + νk p(α1) ∼ U(−∞, ∞), p(α2) ∼ U(−∞, ∞), p(α3) ∼ U(−∞, ∞), p(α4) ∼ U(−∞, ∞), p(α5) ∼ U(−∞, ∞), p(α6) ∼ U(−∞, ∞), p(α7) ∼ U(−∞, ∞), p(ψ) ∼ U(0, ∞). Shared random effects models. The final set of models combines Equations A.3 and A.8 and links them by specifying the student-specific random 91 effects estimated by Equations A.3 as latent covariates in Equation A.8. yjk = Xjkβ + Zjkζjk + jk htjk = w ′ tjkα+ ζ (2)′ jk λ+ νk Shared random effects model 1. Note that the hazard submodel estimates are also presented in Table 7 as Model 2. yT ijk = βT1 + ζ (2) Tjk1 + ζ (3) Tk1 + βT4 SWDejk + βT6 MALEjk +( βT2 + ζ (2) Tjk2 + ζ (3) Tk2 + βT5 SWDejk ) GRADEijk + βT3 GRADE 2 ijk + Tijk logit(htjk) =α1GRADE3tjk + α2GRADE4tjk + α3GRADE5tjk + α4GRADE6tjk+ α5GRADE7tjk + ζ (2) jk1 λ1 + ζ (2) jk2 λ2 + νk p(D(2)) ∼ U(0, ∞), p(D(3)) ∼ U(0, ∞), p(R(2)) ∼ LKJ (1.5), p(R(3)) ∼ LKJ (1.5), p(βT1) ∼ N (60, 10), p(βT2) ∼ N (3, 6), p(βT3) ∼ N (0, 2.5), p(βT4) ∼ N (0, 5), p(βT5) ∼ N (0, 2.5), p(βT6) ∼ N (0, 5), p(z (2) jk ) ∼ N (0, I), p(z(3)k ) ∼ N (0, I), p(σT ) ∼ U(0, ∞), p(α1) ∼ N (0, 10), p(α2) ∼ N (0, 10), p(α3) ∼ N (0, 10), p(α4) ∼ N (0, 10), p(α5) ∼ N (0, 10), p(λ1) ∼ N (5, 2), p(λ2) ∼ N (5, 2), p(ψ) ∼ U(0, ∞). 92 Shared random effects model 2. Note that the hazard submodel estimates are also presented in Table 7 as Model 2. yT ijk = βT1 + ζ (2) Tjk1 + ζ (3) Tk1 + βT4 SWDejk + βT6 MALEjk +( βT2 + ζ (2) Tjk2 + ζ (3) Tk2 + βT5 SWDejk ) GRADEijk + βT3 GRADE 2 ijk + T ijk logit(htjk) =α1GRADE3tjk + α2GRADE4tjk + α3GRADE5tjk + α4GRADE6tjk+ α5GRADE7tjk + α6SWDejk + α6MALEjk + νk p(D(2)) ∼ U(0, ∞), p(D(3)) ∼ U(0, ∞), p(R(2)) ∼ LKJ (1.5), p(R(3)) ∼ LKJ (1.5), p(βT1) ∼ N (60, 10), p(βT2) ∼ N (3, 6), p(βT3) ∼ N (0, 2.5), p(βT4) ∼ N (0, 5), p(βT5) ∼ N (0, 2.5), p(βT6) ∼ N (0, 5), p(z (2) jk ) ∼ N (0, I), p(z(3)k ) ∼ N (0, I), p(σT ) ∼ U(0, ∞), p(α1) ∼ N (0, 10), p(α2) ∼ N (0, 10), p(α3) ∼ N (0, 10), p(α4) ∼ N (0, 10), p(α5) ∼ N (0, 10), p(α6) ∼ N (0, 10), p(α7) ∼ N (0, 10), p(ψ) ∼ U(0, ∞). 93 Shared random effects model 3. Note that the hazard submodel estimates are also presented in Table 7 as Model 3. yT ijk = βT1 + ζ (2) Tjk1 + ζ (3) Tk1 + βT4 SWDejk + βT6 MALEjk +( βT2 + ζ (2) Tjk2 + ζ (3) Tk2 + βT5 SWDejk ) GRADEijk + βT3 GRADE 2 ijk + T ijk logit(htjk) =α1GRADE3tjk + α2GRADE4tjk + α3GRADE5tjk + α4GRADE6tjk+ α5GRADE7tjk + α6SWDejk + α6MALEjk+ ζ (2) jk1 λ1 + ζ (2) jk2 λ2 + νk p(D(2)) ∼ U(0, ∞), p(D(3)) ∼ U(0, ∞), p(R(2)) ∼ LKJ (1.5), p(R(3)) ∼ LKJ (1.5), p(βT1) ∼ N (60, 10), p(βT2) ∼ N (3, 6), p(βT3) ∼ N (0, 2.5), p(βT4) ∼ N (0, 5), p(βT5) ∼ N (0, 2.5), p(βT6) ∼ N (0, 5), p(z (2) i ) ∼ N (0, I), p(z(3)i ) ∼ N (0, I), p(σT ) ∼ U(0, ∞), p(α1) ∼ N (0, 10), p(α2) ∼ N (0, 10), p(α3) ∼ N (0, 10), p(α4) ∼ N (0, 10), p(α5) ∼ N (0, 10), p(α6) ∼ N (0, 10), p(α7) ∼ N (0, 10), p(λ1) ∼ N (5, 2), p(λ2) ∼ N (5, 2), p(ψ) ∼ U(0, ∞). 94 APPENDIX B CONVERGENCE This section describes the methods used to assess convergence of the Markov chains specified for each model. It is important to note that approximate convergence cannot be tested empirically. Thus the assessment of mixing and stationarity of the chains was assessed following the guidelines of Gelman, Carlin, et al. (2014). Each model was estimated by specifying four Markov chains, each for 2500 iterations. The conservative choice of discarding the first half of each Markov chain was adopted, leaving 1250 simulations from the target distribution per chain, or a total of 5000 simulated values. Visual inspection of the chains using trace plots provided the initial evidence of both mixing and stationarity. Convergence for each key parameter was further evaluated using the Rˆ statistic. Rˆ is defined as an estimate of the factor by which the scale of the current posterior for a given parameter might be reduced if each chain continued to ∞ (Gelman, Carlin, et al., 2014). Finally, because the simulations within a chain are subject to autocorrelation, the effective number of simulations, neff , was computed for each parameter using the post-warm-up simulations. Gelman, Carlin, et al. (2014) recommended Rˆ < 1.1 and neff = 10 per parameter. Table B.1 provides the Rˆ and neff estimates for key parameter from the AZELLA total score growth models. In addition, Figures B.1 to B.15 provide trace and density plots of those parameter. Chains were run for all models used for this dissertation until Rˆ < 1.1 and neff ≥ 400. Readers interested in convergence evidence for the other models are encouraged to contact the author. 95 T ab le B .1 . Q u an ti ta ti ve co n ve rg en ce ev id en ce fo r th e to ta l E n gl is h la n gu ag e pr ofi ci en cy gr ow th m od el s M o d el 1 M o d el 2 M o d el 3 M o d el 4 M o d el 5 P a ra m et er Rˆ n eff Rˆ n eff Rˆ n eff Rˆ n eff Rˆ n eff β 1 [I n it ia l S ta tu s] 1. 0 0 2 24 0 .6 0 1 .0 0 31 84 .2 6 1 .0 0 21 42 .9 7 1 .0 0 3 51 0 .3 7 1 .0 0 3 38 0 .8 8 β 2 [G ra d e] 1. 00 29 7 5 .5 4 1 .0 0 50 00 .0 0 1 .0 0 28 05 .4 5 1 .0 0 50 00 .0 0 1 .0 0 50 00 .0 0 β 3 [G ra d e2 ] 1 .0 0 50 00 .0 0 1 .0 0 30 50 .3 3 1 .0 0 5 00 0 .0 0 1 .0 0 5 00 0 .0 0 β 4 [S W D e] 1 .0 0 3 86 5 .0 1 1 .0 0 3 50 9 .1 7 β 5 [G ra d e] × [S W D e] 1 .0 0 3 27 6 .2 9 1 .0 0 3 30 4 .3 5 β 6 [M a le ] 1 .0 0 5 00 0 .0 0 1 .0 0 5 00 0 .0 0 √ τ(2 ) 1 1 1. 00 2 1 54 .4 7 1 .0 0 16 36 .2 0 1 .0 0 14 48 .9 6 1 .0 0 1 76 0 .9 8 1 .0 0 2 02 1 .6 0 √ τ(2 ) 2 2 1. 00 1 0 91 .4 3 1 .0 1 66 8 .7 5 1 .0 0 97 1 .5 9 1 .0 0 6 29 .7 7 1 .0 0 95 8 .4 5 ρ (2 ) 2 1 1. 0 0 5 6 7 .8 0 1 .0 1 48 9 .1 1 1 .0 1 49 0 .0 1 1 .0 1 49 3 .5 7 1 .0 1 5 73 .1 7 √ τ(3 ) 1 1 1. 01 74 5 .2 5 1 .0 1 12 70 .4 3 1 .0 0 12 51 .8 1 √ τ(3 ) 2 2 1. 00 10 96 .2 9 1 .0 0 13 19 .3 0 1 .0 0 16 25 .5 7 ρ (3 ) 2 1 1. 00 12 58 .0 3 1 .0 0 11 94 .8 9 1 .0 0 13 96 .8 2 √ σ 1. 0 0 1 96 7 .6 8 1 .0 0 99 1 .3 8 1 .0 0 11 44 .2 3 1 .0 0 1 25 4 .0 2 1 .0 0 1 18 3 .3 3 96 0 20 0 40 0 60 0 80 0 10 00 12 00 63.463.864.2 Ite ra tio n 63 .2 63 .4 63 .6 63 .8 64 .0 64 .2 0.01.02.03.0 β 1 0 20 0 40 0 60 0 80 0 10 00 12 00 2.62.83.03.23.4 Ite ra tio n 2. 6 2. 8 3. 0 3. 2 3. 4 0.01.02.03.0 β 2 0 20 0 40 0 60 0 80 0 10 00 12 00 1.31.41.51.61.7 Ite ra tio n 1. 3 1. 4 1. 5 1. 6 1. 7 01234567 σ F ig u re B .1 . T ra ce an d d en si ty p lo ts fo r th e li n ea r gr ow th m o d el fi x ed eff ec ts 97 0 20 0 40 0 60 0 80 0 10 00 12 00 1.21.41.61.82.0 Ite ra tio n 1. 2 1. 4 1. 6 1. 8 2. 0 2. 2 0.01.02.03.0 τ 1 1(2) 0 20 0 40 0 60 0 80 0 10 00 12 00 0.40.60.81.0 Ite ra tio n 0. 2 0. 4 0. 6 0. 8 1. 0 01234 τ 1 1(2) 0 20 0 40 0 60 0 80 0 10 00 12 00 0.20.40.60.81.0 Ite ra tio n 0. 2 0. 4 0. 6 0. 8 1. 0 0.01.02.0 ρ 2 1(2) F ig u re B .2 . T ra ce an d d en si ty p lo ts fo r th e li n ea r gr ow th m o d el st u d en t- sp ec ifi c va ri an ce co m p on en ts 98 0 20 0 40 0 60 0 80 0 10 00 12 00 63.463.864.2 Ite ra tio n 63 .4 63 .6 63 .8 64 .0 64 .2 64 .4 0.01.02.03.0 β 1 0 20 0 40 0 60 0 80 0 10 00 12 00 1.82.22.63.0 Ite ra tio n 1. 8 2. 0 2. 2 2. 4 2. 6 2. 8 3. 0 0.00.51.01.52.0 β 2 0 20 0 40 0 60 0 80 0 10 00 12 00 0.10.30.5 Ite ra tio n 0. 0 0. 1 0. 2 0. 3 0. 4 0. 5 0. 6 012345 β 3 F ig u re B .3 . T ra ce an d d en si ty p lo ts fo r th e q u ad ra ti c gr ow th fi x ed eff ec ts 99 0 20 0 40 0 60 0 80 0 10 00 12 00 1.01.21.41.61.8 Ite ra tio n 1. 0 1. 2 1. 4 1. 6 1. 8 2. 0 0.01.02.03.0 τ 1 1(2) 0 20 0 40 0 60 0 80 0 10 00 12 00 0.60.81.01.21.4 Ite ra tio n 0. 4 0. 6 0. 8 1. 0 1. 2 1. 4 0.01.02.03.0 τ 1 1(2) 0 20 0 40 0 60 0 80 0 10 00 12 00 0.40.60.81.0 Ite ra tio n 0. 2 0. 4 0. 6 0. 8 1. 0 0.01.02.0 ρ 2 1(2) F ig u re B .4 . T ra ce an d d en si ty p lo ts fo r th e q u ad ra ti c gr ow th m o d el st u d en t- sp ec ifi c va ri an ce co m p on en ts 100 0 20 0 40 0 60 0 80 0 10 00 12 00 63.463.864.264.6 Ite ra tio n 63 .5 64 .0 64 .5 0.00.51.01.52.02.5 β 1 0 20 0 40 0 60 0 80 0 10 00 12 00 1.52.02.53.0 Ite ra tio n 1. 5 2. 0 2. 5 3. 0 0.00.51.01.5 β 2 0 20 0 40 0 60 0 80 0 10 00 12 00 0.10.30.5 Ite ra tio n 0. 1 0. 2 0. 3 0. 4 0. 5 0. 6 012345 β 3 F ig u re B .5 . T ra ce an d d en si ty p lo ts fo r th e sc h o ol -l ev el m o d el fi x ed eff ec ts 101 0 20 0 40 0 60 0 80 0 10 00 12 00 1.21.41.61.8 Ite ra tio n 1. 0 1. 2 1. 4 1. 6 1. 8 0.01.02.03.0 τ 1 1(2) 0 20 0 40 0 60 0 80 0 10 00 12 00 0.40.60.81.01.2 Ite ra tio n 0. 4 0. 6 0. 8 1. 0 1. 2 1. 4 0.01.02.03.0 τ 1 1(2) 0 20 0 40 0 60 0 80 0 10 00 12 00 0.40.60.81.0 Ite ra tio n 0. 2 0. 4 0. 6 0. 8 1. 0 0.01.02.03.0 ρ 2 1(2) F ig u re B .6 . T ra ce an d d en si ty p lo ts fo r th e sc h o ol -l ev el m o d el st u d en t- sp ec ifi c va ri an ce co m p on en ts 102 0 20 0 40 0 60 0 80 0 10 00 12 00 0.00.51.01.5 Ite ra tio n 0. 0 0. 5 1. 0 1. 5 0.00.51.01.5 τ 1 1(3) 0 20 0 40 0 60 0 80 0 10 00 12 00 0.00.51.01.5 Ite ra tio n 0. 0 0. 5 1. 0 1. 5 0.00.51.01.52.0 τ 1 1(3) 0 20 0 40 0 60 0 80 0 10 00 12 00 −1.00.00.51.0 Ite ra tio n − 1. 0 − 0. 5 0. 0 0. 5 1. 0 0.00.20.40.60.8 ρ 2 1(3) F ig u re B .7 . T ra ce an d d en si ty p lo ts fo r th e sc h o ol -l ev el m o d el sc h o ol -s p ec ifi c va ri an ce co m p on en ts 103 0 20 0 40 0 60 0 80 0 10 00 12 00 63.864.264.665.0 Ite ra tio n 64 .0 64 .5 65 .0 0.00.51.01.52.0 β 1 0 20 0 40 0 60 0 80 0 10 00 12 00 2.02.53.03.5 Ite ra tio n 2. 0 2. 5 3. 0 3. 5 0.00.51.01.5 β 2 0 20 0 40 0 60 0 80 0 10 00 12 00 0.10.30.5 Ite ra tio n 0. 1 0. 2 0. 3 0. 4 0. 5 0. 6 012345 β 3 F ig u re B .8 . T ra ce an d d en si ty p lo ts fo r th e sc h o ol -l ev el m o d el w it h co va ri at es fi x ed eff ec ts , 1 of 2 104 0 20 0 40 0 60 0 80 0 10 00 12 00 −3.0−2.0 Ite ra tio n − 3. 5 − 3. 0 − 2. 5 − 2. 0 − 1. 5 0.00.40.81.2 β 4 0 20 0 40 0 60 0 80 0 10 00 12 00 −1.0−0.50.0 Ite ra tio n − 1. 0 − 0. 5 0. 0 0.00.51.01.52.0 β 5 0 20 0 40 0 60 0 80 0 10 00 12 00 −2.0−1.5−1.0−0.5 Ite ra tio n − 2. 0 − 1. 5 − 1. 0 − 0. 5 0.00.51.01.5 β 6 F ig u re B .9 . T ra ce an d d en si ty p lo ts fo r th e sc h o ol -l ev el m o d el w it h co va ri at es fi x ed eff ec ts , 2 of 2 105 0 20 0 40 0 60 0 80 0 10 00 12 00 0.81.01.21.41.6 Ite ra tio n 0. 8 1. 0 1. 2 1. 4 1. 6 0.01.02.03.0 τ 1 1(2) 0 20 0 40 0 60 0 80 0 10 00 12 00 0.40.60.81.01.2 Ite ra tio n 0. 2 0. 4 0. 6 0. 8 1. 0 1. 2 0.01.02.03.0 τ 1 1(2) 0 20 0 40 0 60 0 80 0 10 00 12 00 0.00.40.8 Ite ra tio n 0. 0 0. 2 0. 4 0. 6 0. 8 1. 0 0.00.51.01.52.0 ρ 2 1(2) F ig u re B .1 0. T ra ce an d d en si ty p lo ts fo r th e sc h o ol -l ev el m o d el w it h co va ri at es st u d en t- sp ec ifi c va ri an ce co m p on en ts 106 0 20 0 40 0 60 0 80 0 10 00 12 00 0.00.40.8 Ite ra tio n 0. 0 0. 2 0. 4 0. 6 0. 8 1. 0 1. 2 0.01.02.0 τ 1 1(3) 0 20 0 40 0 60 0 80 0 10 00 12 00 0.00.51.01.5 Ite ra tio n 0. 0 0. 5 1. 0 1. 5 0.00.51.01.52.02.5 τ 1 1(3) 0 20 0 40 0 60 0 80 0 10 00 12 00 −1.00.00.51.0 Ite ra tio n − 1. 0 − 0. 5 0. 0 0. 5 1. 0 0.00.20.40.6 ρ 2 1(3) F ig u re B .1 1. T ra ce an d d en si ty p lo ts fo r th e sc h o ol -l ev el m o d el w it h co va ri at es sc h o ol -s p ec ifi c va ri an ce co m p on en ts 107 0 20 0 40 0 60 0 80 0 10 00 12 00 64.064.464.865.2 Ite ra tio n 64 .0 64 .5 65 .0 0.00.51.01.52.0 β 1 0 20 0 40 0 60 0 80 0 10 00 12 00 1.52.02.53.03.5 Ite ra tio n 1. 5 2. 0 2. 5 3. 0 3. 5 0.00.51.01.5 β 2 0 20 0 40 0 60 0 80 0 10 00 12 00 0.10.30.5 Ite ra tio n 0. 0 0. 1 0. 2 0. 3 0. 4 0. 5 0. 6 012345 β 3 F ig u re B .1 2. T ra ce an d d en si ty p lo ts fo r th e w ea k ly in fo rm at iv e p ri or m o d el fi x ed eff ec ts , 1 of 2 108 0 20 0 40 0 60 0 80 0 10 00 12 00 −3.0−2.0 Ite ra tio n − 3. 5 − 3. 0 − 2. 5 − 2. 0 − 1. 5 0.00.40.81.2 β 4 0 20 0 40 0 60 0 80 0 10 00 12 00 −1.0−0.50.0 Ite ra tio n − 1. 0 − 0. 5 0. 0 0. 5 0.00.51.01.52.0 β 5 0 20 0 40 0 60 0 80 0 10 00 12 00 −2.0−1.5−1.0−0.5 Ite ra tio n − 2. 0 − 1. 5 − 1. 0 − 0. 5 0.00.51.01.5 β 6 F ig u re B .1 3. T ra ce an d d en si ty p lo ts fo r th e w ea k ly in fo rm at iv e p ri or m o d el fi x ed eff ec ts , 2 of 2 109 0 20 0 40 0 60 0 80 0 10 00 12 00 0.81.01.21.4 Ite ra tio n 0. 6 0. 8 1. 0 1. 2 1. 4 1. 6 0123 τ 1 1(2) 0 20 0 40 0 60 0 80 0 10 00 12 00 0.20.40.60.81.0 Ite ra tio n 0. 2 0. 4 0. 6 0. 8 1. 0 0123 τ 1 1(2) 0 20 0 40 0 60 0 80 0 10 00 12 00 0.00.40.8 Ite ra tio n 0. 0 0. 2 0. 4 0. 6 0. 8 1. 0 0.00.51.01.52.0 ρ 2 1(2) F ig u re B .1 4. T ra ce an d d en si ty p lo ts fo r th e w ea k ly in fo rm at iv e p ri or m o d el st u d en t- sp ec ifi c va ri an ce co m p on en ts 110 0 20 0 40 0 60 0 80 0 10 00 12 00 0.00.40.81.2 Ite ra tio n 0. 0 0. 2 0. 4 0. 6 0. 8 1. 0 1. 2 0.01.02.0 τ 1 1(3) 0 20 0 40 0 60 0 80 0 10 00 12 00 0.00.51.01.5 Ite ra tio n 0. 0 0. 5 1. 0 1. 5 0.00.51.01.52.02.5 τ 1 1(3) 0 20 0 40 0 60 0 80 0 10 00 12 00 −1.00.00.51.0 Ite ra tio n − 1. 0 − 0. 5 0. 0 0. 5 1. 0 0.00.20.40.6 ρ 2 1(3) F ig u re B .1 5. T ra ce an d d en si ty p lo ts fo r th e w ea k ly in fo rm at iv e p ri or m o d el sc h o ol -s p ec ifi c va ri an ce co m p on en ts 111 APPENDIX C FULL RESULTS 112 T ab le C .1 . P ar am et er es ti m at es fo r th e re ad in g E n gl is h la n gu ag e pr ofi ci en cy gr ow th m od el s M o d el 1 M o d el 2 M o d el 3 M o d el 4 M o d el 5 M 9 5 % H P D M 9 5 % H P D M 9 5 % H P D M 9 5 % H P D M 9 5 % H P D F ix ed E ff ec ts β 1 [I n it ia l S ta tu s] 6 3 .3 1 [6 2 .9 7 , 6 3 .6 6 ] 6 3 .2 3 [6 2 .8 6 , 6 3 .5 9 ] 6 3 .3 5 [6 2 .9 2 , 6 3 .8 0 ] 6 3 .9 8 [6 3 .5 0 , 6 4 .4 8 ] 6 3 .9 8 [6 3 .5 1 , 6 4 .5 0 ] β 2 [G ra d e] 3 .2 8 [2 .9 7 , 3 .5 9 ] 3 .7 1 [3 .1 9 , 4 .2 3 ] 3 .2 5 [2 .7 8 , 3 .7 2 ] 3 .3 6 [2 .8 9 , 3 .8 2 ] 3 .3 6 [2 .8 9 , 3 .8 1 ] β 3 [G ra d e2 ] −0 .2 2 [− 0 .4 3 , 0 .0 0 ] β 4 [S W D e] −3 .4 7 [− 4 .3 3 , −2 .6 3 ] −3 .4 7 [− 4 .2 9 , −2 .6 3 ] β 5 [G ra d e] × [S W D e] −0 .1 8 [− 0 .7 5 , 0 .4 0 ] −0 .1 8 [− 0 .7 5 , 0 .4 0 ] β 6 [M a le ] −0 .6 4 [− 1 .1 7 , −0 .1 0 ] −0 .6 3 [− 1 .1 8 , −0 .0 9 ] V a ri a n ce C o m p o n en ts √ τ( 2 ) 1 1 2 .1 3 [1 .7 6 , 2 .5 0 ] 2 .1 8 [1 .8 2 , 2 .5 4 ] 2 .1 3 [1 .7 8 , 2 .4 9 ] 1 .6 9 [1 .3 5 , 2 .0 6 ] 1 .6 9 [1 .3 3 , 2 .0 4 ] √ τ( 2 ) 2 2 0 .9 1 [0 .5 5 , 1 .2 6 ] 0 .7 7 [0 .3 7 , 1 .1 7 ] 0 .7 8 [0 .4 3 , 1 .1 5 ] 0 .6 4 [0 .2 3 , 1 .0 4 ] 0 .6 6 [0 .2 5 , 1 .0 6 ] τ (2 ) 2 1 / √ τ( 2 ) 1 1 τ (2 ) 2 2 0 .3 1 [− 0 .1 0 , 0 .7 8 ] 0 .2 5 [− 0 .2 2 , 0 .8 0 ] 0 .3 9 [− 0 .0 6 , 0 .8 7 ] 0 .2 8 [− 0 .2 5 , 0 .9 0 ] 0 .2 6 [− 0 .2 7 , 0 .8 5 ] √ τ( 3 ) 1 1 0 .3 6 [0 .0 1 , 1 .0 4 ] 0 .2 9 [0 .0 1 , 0 .8 1 ] 0 .2 9 [0 .0 1 , 0 .8 6 ] √ τ( 3 ) 2 2 0 .6 0 [0 .1 4 , 1 .1 6 ] 0 .5 5 [0 .1 4 , 1 .0 8 ] 0 .5 5 [0 .1 3 , 1 .0 8 ] τ (3 ) 2 1 / √ τ( 3 ) 1 1 τ (3 ) 2 2 0 .3 8 [− 0 .9 3 , 0 .8 0 ] 0 .3 9 [− 0 .9 3 , 0 .7 7 ] 0 .3 8 [− 0 .9 3 , 0 .7 8 ] √ σ 2 .2 3 [2 .0 4 , 2 .4 3 ] 2 .2 6 [2 .0 7 , 2 .4 6 ] 2 .2 2 [2 .0 3 , 2 .4 1 ] 2 .2 4 [2 .0 5 , 2 .4 2 ] 2 .2 4 [2 .0 5 , 2 .4 3 ] M o d el F it E st im a te (S E ) E st im a te (S E ) E st im a te (S E ) E st im a te (S E ) E st im a te (S E ) el p d p si s- lo o −1 5 1 6 .9 2 (1 0 .0 5 ) −1 5 2 1 .5 9 (1 0 .3 7 ) −1 5 2 0 .8 0 (9 .2 6 ) −1 5 1 3 .8 4 (1 1 .0 0 ) −1 5 1 4 .8 7 (1 1 .1 0 ) p lo o 7 1 .0 4 (3 .8 7 ) 7 3 .2 0 (3 .9 9 ) 6 2 .0 8 (3 .3 3 ) 6 4 .9 1 (3 .8 2 ) 6 5 .9 7 (3 .9 5 ) L O O IC 3 0 3 3 .8 4 (2 0 .1 1 ) 3 0 4 3 .1 7 (2 0 .7 4 ) 3 0 4 1 .6 0 (1 8 .5 2 ) 3 0 2 7 .6 7 (2 2 .0 0 ) 3 0 2 9 .7 5 (2 2 .2 1 ) 113 T ab le C .2 . P ar am et er es ti m at es fo r th e w ri ti n g E n gl is h la n gu ag e pr ofi ci en cy gr ow th m od el s M o d el 1 M o d el 2 M o d el 3 M o d el 4 M o d el 5 M 9 5 % H P D M 9 5 % H P D M 9 5 % H P D M 9 5 % H P D M 9 5 % H P D F ix ed E ff ec ts β 1 [I n it ia l S ta tu s] 6 3 .9 5 [6 3 .5 7 , 6 4 .3 3 ] 6 4 .0 4 [6 3 .6 5 , 6 4 .4 2 ] 6 4 .1 2 [6 3 .6 6 , 6 4 .6 5 ] 6 5 .3 5 [6 4 .8 5 , 6 5 .9 0 ] 6 5 .3 3 [6 4 .8 2 , 6 5 .8 3 ] β 2 [G ra d e] 3 .4 1 [3 .0 9 , 3 .7 3 ] 2 .9 8 [2 .4 4 , 3 .5 3 ] 2 .8 4 [2 .2 0 , 3 .4 7 ] 3 .0 6 [2 .4 2 , 3 .6 8 ] 3 .0 5 [2 .4 2 , 3 .6 8 ] β 3 [G ra d e2 ] 0 .2 1 [0 .0 0 , 0 .4 2 ] 0 .2 6 [0 .0 5 , 0 .4 7 ] 0 .2 6 [0 .0 5 , 0 .4 6 ] 0 .2 6 [0 .0 5 , 0 .4 7 ] β 4 [S W D e] −3 .7 3 [− 4 .5 8 , −2 .8 9 ] −3 .7 0 [− 4 .5 6 , −2 .8 6 ] β 5 [G ra d e] × [S W D e] −1 .2 7 [− 1 .8 6 , −0 .6 7 ] −1 .2 4 [− 1 .8 3 , −0 .6 5 ] β 6 [M a le ] −1 .1 8 [− 1 .8 2 , −0 .5 3 ] −1 .1 8 [− 1 .8 1 , −0 .5 2 ] V a ri a n ce C o m p o n en ts √ τ( 2 ) 1 1 2 .4 6 [2 .0 8 , 2 .8 5 ] 2 .4 3 [2 .0 5 , 2 .8 1 ] 2 .3 8 [2 .0 2 , 2 .7 4 ] 1 .8 2 [1 .4 2 , 2 .2 0 ] 1 .8 0 [1 .3 8 , 2 .1 8 ] √ τ( 2 ) 2 2 1 .2 6 [0 .9 3 , 1 .6 2 ] 1 .3 3 [0 .9 8 , 1 .6 7 ] 1 .1 9 [0 .8 4 , 1 .5 4 ] 1 .0 5 [0 .6 9 , 1 .4 0 ] 1 .0 2 [0 .6 3 , 1 .4 0 ] τ (2 ) 2 1 / √ τ( 2 ) 1 1 τ (2 ) 2 2 0 .1 1 [− 0 .2 1 , 0 .4 9 ] 0 .1 5 [− 0 .1 6 , 0 .5 5 ] 0 .2 9 [− 0 .0 5 , 0 .7 2 ] 0 .0 4 [− 0 .3 3 , 0 .5 3 ] 0 .0 7 [− 0 .3 3 , 0 .6 7 ] √ τ( 3 ) 1 1 0 .4 7 [0 .0 3 , 1 .1 2 ] 0 .4 1 [0 .0 4 , 0 .9 3 ] 0 .4 1 [0 .0 3 , 0 .9 0 ] √ τ( 3 ) 2 2 0 .6 5 [0 .2 7 , 1 .1 8 ] 0 .6 5 [0 .2 7 , 1 .1 6 ] 0 .6 6 [0 .2 9 , 1 .1 6 ] τ (3 ) 2 1 / √ τ( 3 ) 1 1 τ (3 ) 2 2 0 .3 4 [− 0 .9 3 , 0 .6 2 ] 0 .2 8 [− 0 .9 6 , 0 .5 1 ] 0 .2 8 [− 0 .9 5 , 0 .5 3 ] √ σ 2 .1 9 [1 .9 9 , 2 .4 0 ] 2 .1 6 [1 .9 6 , 2 .3 8 ] 2 .1 5 [1 .9 7 , 2 .3 6 ] 2 .1 5 [1 .9 6 , 2 .3 6 ] 2 .1 7 [1 .9 7 , 2 .3 8 ] M o d el F it E st im a te (S E ) E st im a te (S E ) E st im a te (S E ) E st im a te (S E ) E st im a te (S E ) el p d p si s- lo o −1 5 1 4 .5 2 (8 .6 3 ) −1 5 0 9 .8 9 (8 .5 2 ) −1 5 1 6 .8 4 (7 .9 7 ) −1 5 0 0 .4 1 (9 .3 2 ) −1 5 0 3 .1 0 (9 .4 5 ) p lo o 6 6 .6 0 (3 .5 3 ) 6 5 .9 4 (3 .5 4 ) 5 5 .9 3 (3 .0 4 ) 6 2 .2 1 (3 .5 0 ) 6 2 .3 4 (3 .5 5 ) L O O IC 3 0 2 9 .0 4 (1 7 .2 7 ) 3 0 1 9 .7 8 (1 7 .0 4 ) 3 0 3 3 .6 8 (1 5 .9 4 ) 3 0 0 0 .8 2 (1 8 .6 3 ) 3 0 0 6 .2 1 (1 8 .9 0 ) 114 T ab le C .3 . P ar am et er es ti m at es fo r th e or al E n gl is h la n gu ag e pr ofi ci en cy gr ow th m od el s M o d el 1 M o d el 2 M o d el 3 M o d el 4 M o d el 5 M 9 5 % H P D M 9 5 % H P D M 9 5 % H P D M 9 5 % H P D M 9 5 % H P D F ix ed E ff ec ts β 1 [I n it ia l S ta tu s] 6 4 .0 9 [6 3 .7 7 , 6 4 .4 0 ] 6 4 .2 2 [6 3 .9 1 , 6 4 .5 2 ] 6 4 .3 1 [6 3 .7 4 , 6 4 .9 1 ] 6 4 .5 3 [6 3 .8 9 , 6 5 .1 5 ] 6 4 .5 3 [6 3 .8 8 , 6 5 .1 8 ] β 2 [G ra d e] 2 .7 5 [2 .3 8 , 3 .1 1 ] 1 .9 6 [1 .4 0 , 2 .5 3 ] 1 .9 0 [1 .1 9 , 2 .6 2 ] 2 .1 6 [1 .5 1 , 2 .7 9 ] 2 .1 7 [1 .5 2 , 2 .7 8 ] β 3 [G ra d e2 ] 0 .4 9 [0 .2 2 , 0 .7 4 ] 0 .5 7 [0 .3 0 , 0 .8 2 ] 0 .5 7 [0 .3 1 , 0 .8 3 ] 0 .5 7 [0 .3 2 , 0 .8 2 ] β 4 [S W D e] −1 .4 1 [− 2 .1 8 , −0 .6 8 ] −1 .4 2 [− 2 .1 8 , −0 .6 7 ] β 5 [G ra d e] × [S W D e] −0 .0 8 [− 0 .6 3 , 0 .4 6 ] −0 .0 7 [− 0 .6 0 , 0 .4 5 ] β 6 [M a le ] −1 .6 5 [− 2 .4 9 , −0 .8 5 ] −1 .6 7 [− 2 .5 5 , −0 .8 5 ] V a ri a n ce C o m p o n en ts √ τ( 2 ) 1 1 1 .3 0 [0 .9 3 , 1 .6 8 ] 1 .1 7 [0 .8 0 , 1 .5 2 ] 0 .9 8 [0 .6 3 , 1 .3 4 ] 0 .8 8 [0 .5 1 , 1 .2 5 ] 0 .8 9 [0 .5 2 , 1 .2 6 ] √ τ( 2 ) 2 2 0 .9 0 [0 .5 2 , 1 .3 0 ] 1 .4 3 [0 .9 6 , 1 .9 0 ] 1 .5 9 [1 .1 1 , 2 .0 9 ] 1 .3 7 [0 .9 0 , 1 .8 6 ] 1 .3 8 [0 .9 1 , 1 .8 7 ] τ (2 ) 2 1 / √ τ( 2 ) 1 1 τ (2 ) 2 2 0 .8 0 [0 .4 5 , 0 .9 8 ] 0 .8 4 [0 .5 6 , 0 .9 9 ] 0 .8 3 [0 .5 2 , 0 .9 8 ] 0 .8 0 [0 .4 2 , 0 .9 8 ] 0 .7 8 [0 .4 2 , 0 .9 8 ] √ τ( 3 ) 1 1 0 .9 7 [0 .4 9 , 1 .6 5 ] 0 .9 1 [0 .4 5 , 1 .6 2 ] 0 .9 1 [0 .4 6 , 1 .5 3 ] √ τ( 3 ) 2 2 0 .6 5 [0 .0 3 , 1 .5 2 ] 0 .5 0 [0 .0 3 , 1 .2 5 ] 0 .5 0 [0 .0 3 , 1 .2 9 ] τ (3 ) 2 1 / √ τ( 3 ) 1 1 τ (3 ) 2 2 0 .4 3 [− 0 .8 1 , 0 .6 9 ] 0 .4 4 [− 0 .8 1 , 0 .7 5 ] 0 .4 5 [− 0 .8 0 , 0 .7 5 ] √ σ 2 .5 7 [2 .3 8 , 2 .7 7 ] 2 .4 0 [2 .2 3 , 2 .6 0 ] 2 .3 2 [2 .1 3 , 2 .5 3 ] 2 .3 3 [2 .1 5 , 2 .5 4 ] 2 .3 3 [2 .1 5 , 2 .5 4 ] M o d el F it E st im a te (S E ) E st im a te (S E ) E st im a te (S E ) E st im a te (S E ) E st im a te (S E ) el p d p si s- lo o −1 5 7 3 .7 2 (1 4 .6 6 ) −1 5 3 9 .4 0 (1 2 .5 4 ) −1 5 3 1 .9 4 (1 0 .2 3 ) −1 5 2 8 .7 1 (1 1 .1 3 ) −1 5 2 8 .3 4 (1 1 .1 0 ) p lo o 7 1 .5 1 (5 .3 1 ) 6 3 .5 3 (4 .7 3 ) 4 5 .3 4 (3 .2 3 ) 4 7 .4 2 (3 .6 1 ) 4 7 .2 2 (3 .6 2 ) L O O IC 3 1 4 7 .4 4 (2 9 .3 2 ) 3 0 7 8 .7 9 (2 5 .0 8 ) 3 0 6 3 .8 9 (2 0 .4 5 ) 3 0 5 7 .4 2 (2 2 .2 6 ) 3 0 5 6 .6 9 (2 2 .2 1 ) 115 Table C.4. Parameter estimates for the multivariate reading, writing, and oral English language proficiency growth models Model 1 Model 2 Model 3 M 95%HPD M 95%HPD M 95%HPD Fixed Effects Reading βR1, [Initial Status] 63.26 [62.87, 63.67] 63.91 [63.42, 64.39] 63.89 [63.42, 64.36] βR2, [Grade] 3.46 [3.03, 3.89] 3.57 [3.13, 4.01] 3.58 [3.14, 4.01] βR4, [SWDe] −3.50 [−4.33,−2.66] −3.46 [−4.30,−2.62] βR5, [Grade]×[SWDe] −0.16 [−0.74, 0.44] −0.16 [−0.73, 0.41] βR6, [Male] −0.73 [−1.28,−0.17] −0.74 [−1.29,−0.19] Writing βW1, [Initial Status] 64.07 [63.64, 64.53] 65.34 [64.83, 65.86] 65.33 [64.81, 65.85] βW2, [Grade] 2.92 [2.30, 3.53] 3.10 [2.49, 3.70] 3.12 [2.53, 3.72] βW3, [Grade 2] 0.28 [0.07, 0.49] 0.28 [0.07, 0.49] 0.28 [0.07, 0.48] βW4, [SWDe] −3.71 [−4.57,−2.86] −3.68 [−4.54,−2.81] βW5, [Grade]×[SWDe] −1.31 [−1.89,−0.71] −1.31 [−1.91,−0.72] βW6, [Male] −1.26 [−1.91,−0.59] −1.26 [−1.93,−0.58] Oral βO1, [Initial Status] 64.22 [63.63, 64.82] 64.51 [63.85, 65.15] 64.51 [63.86, 65.15] βO2, [Grade] 1.86 [1.18, 2.53] 2.14 [1.50, 2.75] 2.14 [1.49, 2.79] βO3, [Grade 2] 0.63 [0.38, 0.88] 0.62 [0.37, 0.87] 0.62 [0.36, 0.87] βO4, [SWDe] −1.37 [−2.14,−0.61] −1.36 [−2.14,−0.59] βO5, [Grade]×[SWDe] −0.14 [−0.68, 0.41] −0.15 [−0.68, 0.41] βO6, [Male] −1.75 [−2.57,−0.98] −1.74 [−2.55,−0.97] Continued on next page 116 Table C.4 continued Model 1 Model 2 Model 3 M 95%HPD M 95%HPD M 95%HPD Variance Components Student-level√ τ (2) 11 2.21 [1.90, 2.54] 1.77 [1.46, 2.11] 1.77 [1.44, 2.11]√ τ (2) 22 0.90 [0.57, 1.23] 0.79 [0.42, 1.14] 0.80 [0.44, 1.15]√ τ (2) 33 2.43 [2.10, 2.77] 1.87 [1.53, 2.21] 1.87 [1.53, 2.22]√ τ (2) 44 1.31 [0.99, 1.65] 1.13 [0.78, 1.47] 1.13 [0.79, 1.49]√ τ (2) 55 1.10 [0.78, 1.43] 0.99 [0.66, 1.33] 0.99 [0.66, 1.33]√ τ (2) 55 1.52 [1.06, 1.99] 1.32 [0.87, 1.80] 1.33 [0.87, 1.81] τ (2) 21 / √ τ (2) 11 τ (2) 22 0.14 [−0.18, 0.50] 0.01 [−0.34, 0.45] 0.01 [−0.34, 0.43] τ (2) 31 / √ τ (2) 11 τ (2) 33 0.87 [0.76, 0.96] 0.82 [0.66, 0.94] 0.82 [0.66, 0.94] τ (2) 41 / √ τ (2) 11 τ (2) 44 0.23 [−0.05, 0.52] 0.06 [−0.26, 0.39] 0.06 [−0.26, 0.39] τ (2) 51 / √ τ (2) 11 τ (2) 55 0.64 [0.39, 0.86] 0.54 [0.24, 0.80] 0.54 [0.22, 0.80] τ (2) 61 / √ τ (2) 11 τ (2) 66 0.49 [0.25, 0.70] 0.34 [0.04, 0.61] 0.34 [0.05, 0.61] τ (2) 32 / √ τ (2) 22 τ (2) 33 0.14 [−0.17, 0.46] 0.04 [−0.31, 0.42] 0.04 [−0.31, 0.41] τ (2) 42 / √ τ (2) 22 τ (2) 44 0.41 [0.06, 0.71] 0.30 [−0.11, 0.66] 0.31 [−0.11, 0.67] τ (2) 52 / √ τ (2) 22 τ (2) 55 0.38 [−0.01, 0.73] 0.34 [−0.12, 0.73] 0.34 [−0.12, 0.73] τ (2) 62 / √ τ (2) 22 τ (2) 66 0.54 [0.19, 0.81] 0.48 [0.08, 0.80] 0.48 [0.08, 0.80] τ (2) 43 / √ τ (2) 33 τ (2) 44 0.17 [−0.12, 0.49] −0.04 [−0.34, 0.35] −0.05 [−0.35, 0.33] τ (2) 53 / √ τ (2) 33 τ (2) 55 0.73 [0.50, 0.90] 0.66 [0.39, 0.87] 0.66 [0.39, 0.87] τ (2) 63 / √ τ (2) 33 τ (2) 66 0.53 [0.31, 0.73] 0.41 [0.14, 0.66] 0.42 [0.14, 0.67] τ (2) 54 / √ τ (2) 44 τ (2) 55 0.15 [−0.22, 0.52] 0.00 [−0.41, 0.42] −0.01 [−0.42, 0.40] τ (2) 64 / √ τ (2) 44 τ (2) 66 0.06 [−0.27, 0.39] −0.12 [−0.46, 0.26] −0.13 [−0.47, 0.24] τ (2) 65 / √ τ (2) 55 τ (2) 66 0.70 [0.40, 0.91] 0.63 [0.28, 0.89] 0.63 [0.28, 0.89] Continued on next page 117 Table C.4 continued Model 1 Model 2 Model 3 M 95%HPD M 95%HPD M 95%HPD Variance Components School-level√ τ (3) 11 0.28 [0.01, 0.81] 0.26 [0.01, 0.74] 0.25 [0.01, 0.73]√ τ (3) 22 0.51 [0.14, 0.94] 0.50 [0.17, 0.91] 0.50 [0.17, 0.92]√ τ (3) 33 0.38 [0.02, 0.92] 0.36 [0.02, 0.85] 0.37 [0.02, 0.85]√ τ (3) 44 0.57 [0.17, 1.05] 0.61 [0.24, 1.08] 0.60 [0.26, 1.06]√ τ (3) 55 0.97 [0.49, 1.65] 0.93 [0.46, 1.57] 0.92 [0.47, 1.56]√ τ (3) 55 0.60 [0.04, 1.33] 0.50 [0.03, 1.18] 0.51 [0.04, 1.22] τ (3) 21 / √ τ (3) 11 τ (3) 22 −0.16 [−0.76, 0.57] −0.15 [−0.75, 0.57] −0.14 [−0.75, 0.56] τ (3) 31 / √ τ (3) 11 τ (3) 33 0.06 [−0.63, 0.72] 0.03 [−0.64, 0.69] 0.03 [−0.65, 0.69] τ (3) 41 / √ τ (3) 11 τ (3) 44 0.03 [−0.64, 0.66] 0.05 [−0.62, 0.68] 0.04 [−0.64, 0.68] τ (3) 51 / √ τ (3) 11 τ (3) 55 0.02 [−0.64, 0.66] −0.03 [−0.67, 0.62] −0.04 [−0.67, 0.64] τ (3) 61 / √ τ (3) 11 τ (3) 66 0.00 [−0.68, 0.66] 0.02 [−0.65, 0.67] 0.02 [−0.66, 0.68] τ (3) 32 / √ τ (3) 22 τ (3) 33 −0.01 [−0.65, 0.61] −0.07 [−0.69, 0.57] −0.07 [−0.67, 0.56] τ (3) 42 / √ τ (3) 22 τ (3) 44 0.40 [−0.22, 0.85] 0.41 [−0.19, 0.84] 0.41 [−0.19, 0.84] τ (3) 52 / √ τ (3) 22 τ (3) 55 0.27 [−0.34, 0.76] 0.28 [−0.30, 0.76] 0.29 [−0.29, 0.77] τ (3) 62 / √ τ (3) 22 τ (3) 66 0.28 [−0.42, 0.80] 0.24 [−0.45, 0.77] 0.24 [−0.45, 0.77] τ (3) 43 / √ τ (3) 33 τ (3) 44 −0.22 [−0.77, 0.49] −0.30 [−0.80, 0.39] −0.29 [−0.81, 0.40] τ (3) 53 / √ τ (3) 33 τ (3) 55 −0.06 [−0.68, 0.57] −0.17 [−0.73, 0.49] −0.17 [−0.75, 0.49] τ (3) 63 / √ τ (3) 33 τ (3) 66 0.05 [−0.61, 0.68] 0.06 [−0.64, 0.67] 0.06 [−0.61, 0.68] τ (3) 54 / √ τ (3) 44 τ (3) 55 0.23 [−0.36, 0.74] 0.21 [−0.36, 0.71] 0.22 [−0.36, 0.73] τ (3) 64 / √ τ (3) 44 τ (3) 66 0.12 [−0.53, 0.69] 0.10 [−0.54, 0.68] 0.10 [−0.53, 0.68] τ (3) 65 / √ τ (3) 55 τ (3) 66 −0.11 [−0.66, 0.52] −0.08 [−0.65, 0.57] −0.08 [−0.66, 0.56] Continued on next page 118 Table C.4 continued Model 1 Model 2 Model 3 M 95%HPD M 95%HPD M 95%HPD Residual Variances √ σR 2.17 [2.00, 2.35] 2.19 [2.02, 2.37] 2.19 [2.02, 2.37] √ σW 2.09 [1.91, 2.28] 2.11 [1.93, 2.29] 2.10 [1.93, 2.29] √ σO 2.31 [2.12, 2.50] 2.33 [2.14, 2.52] 2.32 [2.14, 2.53] Model Fit Estimate (SE ) Estimate (SE ) Estimate (SE ) elpdpsis-loo −4497.32 (20.51) −4491.78 (21.47) −4491.56 (21.57) ploo 186.89 (7.86) 190.83 (7.83) 191.71 (7.99) LOOIC 8994.65 (41.02) 8983.56 (42.95) 8983.12 (43.15) 119 Table C.5. Parameter estimates for the time-to-reclassification hazard models Model 1 Model 2 Model 3 M 95%HPD M 95%HPD M 95%HPD Fixed Effects α1 [Baseline] −3.89 [−4.82,−3.14] −3.92 [−4.86,−3.12] −3.70 [−4.70, −2.83] α2 [Grade 4] −0.07 [−0.32, 0.18] −0.01 [−0.33, 0.33] 0.46 [0.02, 0.92] α3 [Grade 5] −0.50 [−0.88,−0.11] −0.39 [−0.83, 0.12] 0.44 [−0.14, 1.03] α4 [Grade 6] 0.93 [0.35, 1.55] 1.14 [0.50, 1.81] 3.11 [2.17, 4.16] α5 [Grade 7] −0.45 [−1.88, 0.90] −0.30 [−1.73, 1.02] 2.66 [1.03, 4.28] α6 [Male] −2.82 [−3.65, −2.11] α7 [SWDe] −0.27 [−0.71, 0.15] Variance Components√ ν 0.40 [0.11, 0.78] 0.57 [0.25, 1.00] Model Fit Estimate (SE ) Estimate (SE ) Estimate (SE ) elpdpsis-loo −332.63 (12.95) −330.21 (13.10) −290.07 (14.44) ploo 5.47 (0.62) 12.81 (0.86) 16.92 (1.26) LOOIC 665.26 (25.90) 660.41 (26.20) 580.15 (28.87) 120 Table C.6. Parameter estimates for the time-to-reclassification shared random effects models Model 1 Model 2 Model 3 M 95%HPD M 95%HPD M 95%HPD Growth Model β1 [Initial Status] 64.46 [64.12, 64.80] 64.12 [63.81, 64.43] 64.43 [64.11, 64.76] β2 [Grade] 2.57 [2.14, 2.96] 2.34 [1.97, 2.70] 2.54 [2.16, 2.89] β3 [Grade 2] 0.34 [0.19, 0.48] 0.46 [0.33, 0.59] 0.45 [0.31, 0.58] β4 [SWDe] −2.36 [−2.91,−1.81] −0.87 [−1.40,−0.38] −2.20 [−2.74,−1.68] β5 [Grade] × [SWDe] −0.44 [−0.86,−0.04] −0.29 [−0.60, 0.01] −0.48 [−0.87,−0.08] β6 [Male] −1.24 [−1.68,−0.82] −0.32 [−0.69, 0.05] −1.45 [−1.89,−1.03] Variance Components√ τ (2) 11 1.14 [0.92, 1.36] 1.26 [1.04, 1.49] 1.17 [0.97, 1.37]√ τ (2) 22 0.66 [0.43, 0.91] 0.99 [0.74, 1.24] 0.81 [0.60, 1.03] τ (2) 21 / √ τ (2) 11 τ (2) 22 0.68 [0.28, 0.97] 0.79 [0.53, 0.98] 0.75 [0.43, 0.98]√ τ (3) 11 0.24 [0.01, 0.62] 0.25 [0.02, 0.58] 0.21 [0.01, 0.53]√ τ (3) 22 0.45 [0.12, 0.87] 0.27 [0.02, 0.61] 0.30 [0.03, 0.65] τ (3) 21 / √ τ (3) 11 τ (3) 22 0.52 [−0.80, 0.86] 0.29 [−0.96, 0.67] 0.32 [−0.96, 0.68] √ σ 1.38 [1.27, 1.50] 1.30 [1.21, 1.41] 1.31 [1.21, 1.41] Continued on next page 121 Table C.6 continued Model 1 Model 2 Model 3 M 95%HPD M 95%HPD M 95%HPD Hazard Model α1 [Grade 3] −3.64 [−4.64,−2.81] −10.59 [−13.14,−8.35] −8.66 [−10.95,−6.65] α2 [Grade 4] 0.46 [0.01, 0.93] −0.18 [−1.09, 0.76] 1.31 [0.23, 2.47] α3 [Grade 5] 0.43 [−0.15, 1.05] 2.97 [1.85, 4.27] 4.67 [3.21, 6.32] α4 [Grade 6] 3.05 [2.11, 4.07] 7.89 [5.93, 10.08] 10.52 [8.15, 13.19] α5 [Grade 7] 2.54 [0.92, 4.13] 8.80 [5.85, 11.89] 12.19 [8.87, 15.72] α6 [SWDe] −2.77 [−3.58,−2.05] −8.35 [−10.96,−6.15] α7 [Male] −0.27 [−0.69, 0.16] −0.83 [−2.03, 0.32] λ1 [Initial Status, ζ (2) 1i ] 2.13 [1.00, 3.37] 2.14 [1.05, 3.32] λ2 [Linear Growth, ζ (2) 2i ] 3.08 [1.56, 4.73] 3.01 [1.49, 4.86] √ ν 0.56 [0.25, 0.99] 0.54 [0.04, 1.47] 0.73 [0.08, 1.77] Model Fit Estimate (SE ) Estimate (SE ) Estimate (SE ) elpdpsis-loo −1555.99 (29.41) −1672.89 (32.48) −1637.02 (31.48) ploo 37.98 (2.05) 136.48 (7.57) 114.87 (5.68) LOOIC 3111.99 (58.81) 3345.78 (64.96) 3274.03 (62.96) 122 REFERENCES CITED Abedi, J. (2008). Classification system for English language learners: Issues and recommendations. Educational Measurement: Issues and Practice, 27 (3), 17–31. doi: 10.1111/j.1745-3992.2008.00125.x Abedi, J., & Herman, J. (2010). Assessing English language learners’ opportunity to learn mathematics : Issues and limitations. Teacher College Record , 112 (3), 723–746. Allison, P. D. (1982). Discrete-time methods for the analysis of event histories. Sociological Methodology , 13 (1982), 61–98. Angrist, J. D., & Pischke, J.-S. (2008). Mostly Harmless Econometrics : An Empiricist ’ s Companion. Princeton, NJ: Princeton University Press. doi: 10.1017/CBO9781107415324.004 August, D., & Hakuta, K. (1997). Improving schooling for language-minority children: A research agenda. Washington, DC: The National Academies Press. August, D., & Shanahan, T. (2006). Developing literacy in second-language learners. Mahwah, NJ: Lawrence Erlbaum Associates. Bailey, A. L., & Carroll, P. (2015). Assessment of English language learners in the era of new academic content standards. Review of Research in Education, 39 (1), 253–294. doi: 10.3102/0091732X14556074 Bailey, A. L., & Huang, B. H. (2011). Do current English language development/proficiency standards reflect the English needed for success in school? Language Testing , 28 (3), 343–365. doi: 10.1177/0265532211404187 Bates, D. M., Kliegl, R., Vasishth, S., & Baayen, H. (2015). Parsimonious mixed models. , 1–27. Bauer, D. J., & Curran, P. J. (2010). Probing interactions in fixed and multilevel regression: Inferential and graphical techniques. Multivariate Behavioral Research, 40 (3), 331–349. doi: 10.1207/s15327906mbr4003 Beckett, L. A., Tancredi, D. J., & Wilson, R. S. (2004). Multivariate longitudinal models for complex change processes. Statistics in Medicine, 23 (2), 231–239. doi: 10.1002/sim.1712 123 Blozis, S. A. (2004). Structured latent curve models for the study of change in multivariate repeated measures. Psychological methods , 9 (3), 334–353. doi: 10.1037/1082-989X.9.3.334 Blozis, S. A. (2007). On fitting nonlinear latent curve models to multiple variables measured longitudinally. Structural Equation Modeling , 14 (2), 179–201. doi: 10.1080/10705510709336743 Bock, R. D., & Bargmann, R. E. (1966). Analysis of covariance structures. Psychometrika, 31 (4), 81–87. Bollen, K. A., & Curran, P. J. (2006). Latent Curve Models: A structural equation perspective. Hoboken, NJ: Wiley-Interscience. doi: 10.1353/sof.0.0084 Callahan, R. (2005). Tracking and high school English learners: Limiting opportunity to learn. American Educational Research Journal , 42 (2), 305–328. doi: 10.3102/00028312042002305 Callahan, R., Wilkinson, L., & Muller, C. (2010). Academic achievement and course taking among language minority uouth in U.S. schools: Effects of ESL placement. Educational Evaluation and Policy Analysis , 32 (1), 84–117. doi: 10.3102/0162373709359805 Cheung, A. C. K., & Slavin, R. E. (2012). Effective reading programs for Spanish-dominant English language learners (ELLs) in the elementary grades: A synthesis of research. Review of Educational Research, 82 (4), 351–395. doi: 10.3102/0034654312465472 Conboy, B. T., & Thal, D. J. (2006). Ties between the lexicon and grammar: Cross-sectional and longitudinal studies of bilingual toddlers. Child Development , 77 (3), 712–735. Cook, G. H., Boals, T., Wilmes, C., & Santos, M. (2008). Issues in the development of Annual Measurable Achievement Objectives for WIDA consortium states (WCER Working Paper No. 2008-2). Madison, WI: Wisconsin Center for Education Research School. Council of Chief State School Officers. (2012). Framework for English language proficiency development standards corresponding to the Common Core State Standards and the Next Generation Science Standards. Washington, DC: Council of Chief State School Officers. D’Agostino, R. B., Lee, M. L., Belanger, A. J., Cupples, I. A., Anderson, K., & Kannel, W. B. (1990). Relation of pooled logitstic regression to the time dependent cox regression analysis: The Framingham heart study. Statistics in Medicine, 9 , 1501–1515. 124 De Gruttola, V., & Tu, X. M. (1994). Modelling progression of CD4-lymphocyte count and its relationship to survival time. Biometrics , 50 (4), 1003–1014. Diggle, P., Zeger, S. L., Liang, K., & Heagerty, P. (2002). Analysis of longitudinal data (2nd ed.). Oxford: Oxford University Press. Doss, B. D., Thum, Y. M., Sevier, M., Atkins, D. C., & Christensen, A. (2005). Improving relationships: mechanisms of change in couple therapy. Journal of Consulting and Clinical Psychology , 73 (4), 624–633. doi: 10.1037/0022-006X.73.4.624 Feldman, B. J., & Rabe-Hesketh, S. (2012). Modeling achievement trajectories when attrition is informative. Journal of Educational and Behavioral Statistics , 37 (6), 703 – 736. doi: 10.3102/1076998612458701 Ferrer, E., & McArdle, J. (2003). Alternative structural models for multivariate longitudinal data analysis. Structural Equation Modeling , 10 (4), 493–524. Fieuws, S., & Verbeke, G. (2006). Pairwise fitting of mixed models for the joint modeling of multivariate longitudinal profiles. Biometrics , 62 (2), 424–431. doi: 10.1111/j.1541-0420.2006.00507.x Fieuws, S., Verbeke, G., Maes, B., & Vanrenterghem, Y. (2008). Predicting renal graft failure using multivariate longitudinal profiles. Biostatistics , 9 (3), 419–431. doi: 10.1093/biostatistics/kxm041 Galecki, A. T. (1994). General class of covariance structures for two or more factors in longitudinal data analysis. Communications in Statistics , 23 (11), 3105–3119. Ga´ndara, P., & Orfield, G. (2010). A return to the Mexican room: The segregation of Arizona’s English learners. Los Angeles, CA: Civil Rights Project. Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2014). Bayesian data analysis (3rd ed.). Boca Raton, FL: Chapman & Hall/CRC. doi: 10.1007/s13398-014-0173-7.2 Gelman, A., Hwang, J., & Vehtari, A. (2014). Understanding predictive information criteria for Bayesian models. Statistics and Computing , 24 (6), 997–1016. doi: 10.1007/s11222-013-9416-2 Gelman, A., Jakulin, A., Pittau, M. G., & Su, Y. S. (2008). A weakly informative default prior distribution for logistic and other regression models. Annals of Applied Statistics , 2 (4), 1360–1383. doi: 10.1214/08-AOAS191 125 Gelman, A., Lee, D., & Guo, J. (2015). Stan: A probabilistic programming language for Bayesian inference and optimization. Journal of Educational and Behavioral Statistics , 40 (5), 530–543. doi: 10.3102/1076998615606113 Guo, G. (1993). Event-history analysis for left-truncated data. Sociological Methodology , 23 , 217–243. Hakuta, K. (2011). Educating language minority students and affirming their equal rights: Research and practical perspectives. Educational Researcher , 40 (4), 163–174. Hakuta, K., Butler, Y. G., & Witt, D. (2000). How long does it take English learners to attain proficiency? The University of California Linguistic Minority Research Institute. Hammer, C. S., Lawrence, F. R., & Miccio, A. W. (2008). The effect of summer vacation on bilingual preschoolers’ language development. Clinical Linguistics & Phonetics , 22 (9), 686–702. doi: 10.1080/02699200802028033 Harcourt. (2007). Arizona English Language Learner Assessment (AZELLA) technical manual. San Antonio, Texas: Author. Hedeker, D., & Gibbons, R. D. (2006). Longitudinal data analysis. Hoboken, NJ: John Wiley & Sons. doi: 10.1002/0470036486 Hedeker, D., Siddiqui, O., & Hu, F. B. (2000). Random-effects regression analysis of correlated grouped-time survival data. Statistical Methods in Medical Research, 9 (2), 161–179. doi: 10.1191/096228000667253473 Heilig, J. V. (2011). Understanding the interaction between high-stakes graduation tests and English learners. Teachers College Record , 113 (12), 2633–2669. Henderson, R., Diggle, P., & Dobson, A. (2000). Joint modelling of longitudinal measurements and event time data. Biostatistics , 1 (4), 465–480. doi: 10.1093/biostatistics/1.4.465 Hoffman, M., & Gelman, A. (2014). The No-U-Turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15 (1), 1593–1623. Joreskog, K. G. (1970). A general method for analysis of covariance structures. Biometrika, 57 (2), 239–251. Joreskog, K. G., & Goldberger, A. S. (1975). Estimation of a Model with Multiple Indicators and Multiple Causes of a Single Latent Variable. Journal of the American Statistical Association, 70 (351), 631–639. doi: 10.2307/2285946 126 Kalbfleisch, J. D., & Prentice, R. L. (2011). The statistical analysis of failure time data (2nd ed.). Hoboken, NJ: Wiley-Interscience. Kanno, Y., & Cromley, J. G. (2013). English language learners’ access to and attainment in postsecondary education. TESOL Quarterly , 47 (1), 89–121. doi: 10.1002/tesq.49 Kao, G., & Thompson, J. S. (2003). Racial and ethnic stratification in educational achievement and attainment. Annual Review of Sociology , 29 , 417–442. doi: 10.1146/annurev.soc.29.010202.100019 Kieffer, M. J. (2008). Catching up or falling behind? Initial English proficiency, concentrated poverty, and the reading growth of language minority learners in the United States. Journal of Educational Psychology , 100 (4), 851–868. doi: 10.1037/0022-0663.100.4.851 Kieffer, M. J. (2011). Converging trajectories: Reading growth in language minority learners and their classmates, Kindergarten to Grade 8. American Educational Research Journal , 48 (5), 1187–1225. doi: 10.3102/0002831211419490 Kim, J., & Herman, J. L. (2009). A three-state study of English learner progress. Educational Assessment , 14 (3-4), 212–231. Kohnert, K., & Conboy, B. T. (2010). Lexical and grammatical bilingual preschoolers. Journal of Speech, Language, and Hearing Research, 53 (3), 684–698. Koretz, D., & Guo, Q. (2012). Estimating the impact of the Massachusetts English immersion law on limited English proficient students’ reading achievement. Educational Policy , 27 (1), 121–149. doi: 10.1177/0895904812462776 Lewandowski, D., Kurowicka, D., & Joe, H. (2009). Generating random correlation matrices based on vines and extended onion method. Journal of Multivariate Analysis , 100 (9), 1989–2001. doi: 10.1016/j.jmva.2009.04.008 Little, R. J. A. (1988). A test of missing completely at random for multivariate data with missing values. Journal of the American Statistical Association, 83 (404), 1198 – 1202. Little, R. J. A. (1995). Modeling the drop-out mechanism in repeated-measures studies. Journal of the American Statistical Association, 90 (431), 1112–1121. 127 MacCallum, R. C., Kim, C., Malarkey, W. B., & Kiecolt-Glaser, J. K. (1997). Studying multivariate change using multilevel models and latent curve models. Multivariate Behavioral Research, 32 (3), 215–253. doi: 10.1207/s15327906mbr3203 Molenberghs, G., Beunckens, C., Sotto, C., & Kenward, M. G. (2008). Every missingness not at random model has a missingness at random counterpart with equal fit. Journal of the Royal Statistical Society. Series B , 70 , 371–388. Muthen, B. (1989). Latent variable modeling in heterogeneous populations. Psychometrika, 54 (4), 557–585. doi: 10.1007/BF02296397 Muthen, B., & Masyn, K. (2005). Discrete-time survival mixture analysis. Journal of Educational and Behavioral Statistics , 30 (1), 27–58. Muthe´n, B. O. (2002). Beyond SEM : general latent variable modelling. Behaviormetrika, 29 (1), 81–117. doi: 10.2333/bhmk.29.81 National Center for Education Statistics. (2015). The condition of education 2015: English language learners. Washington, DC: Author. National Research Council. (2011). Allocating federal funds for state programs for English language learners. Washington, DC: National Academies Press. Office of English Language Acquisition. (2015). Profiles of English learners (ELs). Parrish, T. B., Perez, M., Merickel, A., & Linquanti, R. (2006). Effects of the implementation of Proposition 227 on the education of English learners, K-12: Findings from a five year evaluation. Palo Alto, CA: American Institutes for Research and WestEd. Proust-Lima, C., Se´ne, M., Taylor, J. M. G., & Jacqmin-Gadda, H. (2014). Joint latent class models for longitudinal and time-to-event data: a review. Statistical Methods in Medical Research, 23 (1), 74–90. doi: 10.1177/0962280212445839 Ramsey, A., & O’Day, J. (2010). Title III policy : State of the states. Washington, D.C.: U.S. Department of Education. Raudenbush, S. W. (2001). Comparing personal trajectories and drawing causal inferences from longitudinal data. Annual Review of Psychology , 52 , 501–525. doi: 10.1146/annurev.psych.52.1.501 Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical Linear Models (2nd ed.). Thousand Oaks, CA: Sage. 128 Reinsel, G. (1982). Multivariate repeated-measurement or growth curve models with multivariate random-effects covariance structure. Journal of the American Statistical Association, 77 (377), 190–195. Ribaudo, H., & Thompson, S. G. (2002). The analysis of repeated multivariate binary quality of life data: a hierarchical model approach. Statistical Methods in Medical Research, 11 (1), 69–83. Rizopoulos, D. (2011). Dynamic predictions and prospective accuracy in joint models for longitudinal and time-to-event data. Biometrics , 67 (3), 819–829. doi: 10.1111/j.1541-0420.2010.01546.x Rizopoulos, D., Hatfield, L., Carlin, B., & Takkenberg, J. (2014). Combining dynamic predictions from joint models for longitudinal and time-to-event data using Bayesian model averaging. Journal of the American Statistical Association, 109 (January 2016), 1385–1397. doi: 10.1080/01621459.2014.931236 Rizopoulos, D., & Lesaffre, E. (2014). Introduction to the special issue on joint modelling techniques. Statistical Methods in Medical Research, 23 (1), 3–10. doi: 10.1177/0962280212445800 Roberts, G., & Bryant, D. (2011). Early mathematics achievement trajectories: English-language learner and native English-speaker estimates, using the Early Childhood Longitudinal Survey. Developmental Psychology , 47 (4), 916–930. doi: 10.1037/a0023865 Robinson, J. P. (2011). Evaluating criteria for English learner reclassification: A causal-effects approach using a binding-score regression discontinuity design with instrumental variables. Educational Evaluation and Policy Analysis , 33 (3), 267–292. doi: 10.3102/0162373711407912 Rojas, R., & Iglesias, A. (2013). The language growth of Spanish-speaking English language learners. Child Development , 84 (2), 630–646. doi: 10.1111/j.1467-8624.2012.01871.x Rubin, D. B. (1976). Inference and Missing Data. Biometrika, 63 (3), 581. doi: 10.2307/2335739 Scarcella, R. (2003). Academic English: A conceptual framework (Technical Report 2003-1). Santa Barbara, CA: The University of California Linguistic Minority Research Institute. Schleppegrell, M. J. (2001). Linguistic features of the language of schooling. Linguistics and Education, 12 (4), 431–459. doi: 10.1016/S0898-5898(01)00073-0 129 Singer, J. D., & Willett, J. B. (1993). It ’ s about time : Using discrete-time survival analysis to study duration and the timing of events. Journal of Educational Statistics , 18 (2), 155–195. Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized latent variable modeling. Boca Raton, FL: Chapman & Hall/CRC. Slama, R. B. (2012). A longitudinal analysis of academic English proficiency outcomes for adolescent English language learners in the United States. Journal of Educational Psychology , 104 (2), 265–285. doi: Doi 10.1037/A0025861 Slama, R. B. (2014). Investigating whether and when English learners are reclassified into mainstream classrooms in the United States: A discrete-time survival analysis. American Educational Research Journal , 51 (2), 220–252. doi: 10.3102/0002831214528277 Stevens, J. J., & Schulte, A. C. (2016). The interaction of learning disability status and student demographic characteristics on mathematics growth. Journal of Learning Disabilities . doi: 10.1177/0022219415618496 Steyerberg, E. W., Vickers, A. J., Cook, N. R., Gerds, T., Gonen, M., Obuchowski, N., . . . Kattan, M. W. (2010). Assessing the performance of prediction models: A framework for traditional and novel measures. Epidemiology , 21 (1), 128–138. doi: 10.1097/EDE.0b013e3181c30fb2.Assessing Stoolmiller, M. (1994). Antisocial behavior, delinquent peer association, and unsupervised wandering for boys: Growth and change from childhood to early adolescence. Multivariate Behavioral Research, 29 (3), 263–288. doi: 10.1017/CBO9781107415324.004 Teachers of English to Speakers of Other Languages. (2006). PreK-12 English language proficiency standards. Alexandria, VA: Teachers of English to Speakers of Other Languages. Thompson, K. D. (2015). English learners’ time to reclassification: An analysis. Educational Policy , 1 – 34. doi: 10.1177/0895904815598394 Thum, Y. M. (1994). Analysis of individual variation: A multivariate hierarchical linear model for behavioral data (Unpublished doctoral dissertation). Chicago, IL: University of Chicago. doi: 10.1017/CBO9781107415324.004 Thum, Y. M. (1997). Hierarchical linear models for multivariate outcomes. Journal of Educational and Behavioral Statistics , 22 (1), 77–108. doi: 10.3102/10769986022001077 130 Thum, Y. M., & Matta, T. H. (2015a). MAP College Readiness Benchmarks: A research brief. Portland, OR: NWEA. Thum, Y. M., & Matta, T. H. (2015b). Predicting College Readiness from Interim Assessment Results : Growth modeling with selection. Chicago: Paper presented at the Annual Meeting of the American Educational Research Association, Chicago, IL. Tsiatis, A. A., & Davidian, M. (2004). Joint modeling of longitudinal and time-to-event data: An overview. Stat Sinica, 14 , 809–834. Tsiatis, A. A., Degruttola, V., & Wulfsohn, M. S. (1995). Modeling the relationship of survival to longitudinal data measured with error. Applications to survival and CD4 counts in patients with AIDS. Journal of the American Statistical Association, 90 (429), 27–37. Uchikoshi, Y. (2012). Development of vocabulary in Spanish-speaking and Cantonese-speaking English language learners. Applied Psycholinguistics , 35 (2014), 1–35. doi: 10.1017/S0142716412000264 Umansky, I. M., & Reardon, S. F. (2014). Reclassification patterns among Latino English learner students in bilingual, dual immersion, and English immersion classrooms. American Educational Research Journal , 51 (5), 879–912. doi: 10.3102/0002831214545110 U.S. Department of Education, Institute of Education Sciences, & National Center for Education Statistics. (2015). NAEP data explorer. Vagh, S. B., Pan, B. A., & Mancilla-martinez, J. (2009). Measuring growth in bilingual and monolingual children’s English productive vocabulary development: The utility of combining parent and teacher report. Child Development , 80 (5), 1545–1563. Valentino, R. A., & Reardon, S. F. (2015). Effectiveness of four instructional programs designed to serve English learners: Variation by ethnicity and initial English proficiency. Educational Evaluation and Policy Analysis , 37 (4), 612–637. doi: 10.3102/0162373715573310 Vaupel, J. W., Manton, K. G., & Stallard, E. (1979). The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography , 16 (3), 439–454. doi: 10.2307/2061224 Vehtari, A., Gelman, A., & Gabry, J. (2016). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. arXiv . Retrieved from http://arxiv.org/abs/1507.04544 131 Verbeke, G., & Davidian, M. (2008). Joint models for longitudinal data: Introcution and overview. In G. Fitzmaurice, M. Davidian, G. Verbeke, & G. Molenberghs (Eds.), Longitudinal data analysis (pp. 319–326). Boca Raton, FL: Chapman & Hall/CRC. Verbeke, G., Fieuws, S., Molenberghs, G., & Davidian, M. (2014). The analysis of multivariate longitudinal data: A review. Statistical Methods in Medical Research, 23 (1), 42–59. doi: 10.1177/0962280212445834 Wu, M. C., & Carroll, R. J. (1988). Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics , 44 (1), 175–188. doi: 10.2307/2531905 Wulfsohn, M. S., & Tsiatis, a. a. (1997). A joint model for survival and longitudinal data measured with error. Biometrics , 53 (1), 330–339. doi: 10.1111/j.1541-0420.2006.00719.x Xu, S., & Blozis, S. A. (2011). Sensitivity Analysis of Mixed Models for Incomplete Longitudinal Data. Journal of Educational and Behavioral Statistics , 36 (2), 237–256. Yu, M., Taylor, J. M. G., & Sandler, H. M. (2008). Individual prediction in prostate cancer studies using a joint longitudinal survivalcure model. Journal of the American Statistical Association, 103 (481), 178–187. doi: 10.1198/016214507000000400 132