METHODOLOGICAL ISSUES IN THE MULTI-LEVEL ANALYSIS OF SCHOOL ENVIRONMENTS

Jean Stockard

ABSTRACT

For a full understanding of student achievement it is essential to use multi-level analyses, which take into account the characteristics of individual students, as well as the nature of their families, classrooms, schools, and communities. Such analyses can be quite complex, and this article reviews, in non-technical terms, relevant methodological issues. Over the last 30 years statistical analyses available for multi-level models have improved greatly. While researchers once used cross-tabulations and, somewhat later, variants of the general linear model, they can now use hierarchical linear models. These techniques allow the exploration of very complex interaction effects, which are often theoretically expected in multi-level models of achievement. Issues of measurement and model specification can also be more difficult with multi-level than with single-level models and need to be carefully considered. It is suggested that a better understanding and use of multi-level models can help bridge the gap between the "input-output" and "process-product" traditions of educational research.

Advances in Research and Theories of School Management and Educational Policy, Volume 2, pages 217-240. Copyright © 1993 by JAI Press Inc. All rights of reproduction in any form reserved. ISBN: 1-55938-253-8

An overwhelming amount of evidence indicates that students' achievement is influenced by individual characteristics, such as their ability and socioeconomic background. Yet, a great deal of research indicates that classroom, school, and community environments also affect students' learning. For instance, among students with equal measured ability and similar socioeconomic backgrounds, those who are in classrooms and schools with high achievement-related norms and more supportive interpersonal environments tend to have high achievement.
Similarly, students who live in communities whose citizens support and participate in school activities have higher achievement than students in other communities, even when they have equal individual characteristics. (See Stockard & Mayberry, 1992, for an extensive review of this literature.) Thus, most educational researchers today would probably agree that multi-level analyses are essential if we are to fully understand student achievement. Studies of student learning and achievement must take into account not just students' individual characteristics, but also the environments in which they learn.

Analyzing multilevel effects, however, is far from simple. It involves careful attention to the proposed models which are studied, the data used to measure variables in these models, and the techniques used to analyze these data. While there are still many unanswered questions in this area, understanding of the complexities involved has expanded greatly in recent years. Some of these discussions are highly technical and statistical, while others are more accessible to all researchers. Unfortunately, the most recently developed analysis techniques, which are much better suited than earlier methods to handle the various theoretical and substantive complexities, are unfamiliar to many researchers. They are not yet covered in standard statistics textbooks nor included in general statistical packages, and many researchers do not understand how these techniques are related to more familiar methods.

In this paper I review this literature, presenting the material in general terms, with references to the more technical literature for those who are so inclined. I first describe analysis techniques that have been used over the years to examine environmental influences on students' learning. Second, I examine the translation between theory and research design, called the specification of theory; and then explore the issue of measurement, how theory is translated into data.
Finally, I briefly describe how multi-level analyses, and especially the most recently developed analysis techniques, can begin to bridge the unfortunate gap between what are commonly called the "input-output" and "process-product" research traditions in studies of student achievement.

My focus is primarily on quantitative analyses, rather than qualitative or ethnographic/field work. This is not meant to denigrate the latter area of research, for such methods are the only way in which to obtain detailed, subjective, rich accounts of how students learn within classrooms, schools, and communities. Moreover, the questions and issues I discuss regarding measurement and specification of models are not unique to quantitative work. Still, most of the literature regarding methodological issues has focused on quantitative analyses, and thus I primarily discuss that literature.

ANALYSIS TECHNIQUES

The analyses used to examine environmental influences on students' learning developed and changed over the years along with advances in computer technology. The earliest analyses used only aggregated data. For instance, researchers examined the relationship between the average achievement of students within a school or classroom and their average socioeconomic status and the average educational level of their teachers. Unfortunately, such analyses could properly lead only to generalizations about aggregate units, such as schools or classrooms, and could prompt researchers to commit what is known as the "ecological fallacy," a problem discussed in much greater detail below.

Recognizing the limitations of analyses that included only aggregated variables, when their true interest was how both individual level and aggregated or environmental variables affected individual students, researchers began to develop multilevel analysis techniques.
These could be used to explore hypotheses regarding influences from both individual and aggregated level independent variables on dependent variables measured on the individual level. The first of these methods were cross-tabulation procedures that could be accomplished with counter-sorters and hand computations. By the 1970s, when computers became more common, most analysts moved to using variations of regression and the general linear model. With the development of very high speed computers in the 1980s iterative techniques building on regression and Bayesian statistics were proposed. Each development has improved researchers' ability to describe the relationships that underlie the influence of environmental variables on students' learning.

Cross-Tabulation Analyses

In an article published in 1960 and now considered classic, Peter Blau demonstrated how analysts can separate group or contextual influences on individuals from the influences of their own personality or other characteristics. Building on the elaboration techniques popularized by Paul Lazarsfeld and his associates (e.g. Kendall & Lazarsfeld, 1950; Lazarsfeld & Rosenberg, 1955; also see Rosenberg, 1968), Blau demonstrated how the "structural effects of a social value can be isolated by showing that the association between its prevalence in a community or group and certain patterns of conduct is independent of whether an individual holds this value or not" (Blau 1960, p. 180). He suggested that researchers use the characteristics of the individual as a control variable and then look at the relationship between the contextual variable and the dependent variable within each category of this control variable. If differences persisted then one would be better able to conclude that a group-level or contextual variable had an influence over and above the influence of the individuals' own characteristics. (See also Blau, 1957, as well as Davis, Spaeth, & Husen, 1961 for a slight modification of this technique.)

Table 1. Illustrations of Blau's Cross-Tabular Analysis of Structural or Contextual Effects

A. Results supporting the Effect of Schools' Socioeconomic Context on Individuals' College Plans

                     Individuals' Socioeconomic Status
Individual's         Low                      High
Plans to             School SES Context       School SES Context
Attend College       Low    High   Total      Low    High   Total
No                   90%    40%    80%        60%    10%    20%
Yes                  10%    60%    20%        40%    90%    80%
Total                100%   100%   100%       100%   100%   100%
(n)                  400    100    500        100    400    500

B. Results which do not support the Effect of Schools' Socioeconomic Context on Individuals' College Plans

                     Individuals' Socioeconomic Status
Individual's         Low                      High
Plans to             School SES Context       School SES Context
Attend College       Low    High   Total      Low    High   Total
No                   80%    80%    80%        40%    40%    40%
Yes                  20%    20%    20%        60%    60%    60%
Total                100%   100%   100%       100%   100%   100%
(n)                  400    100    500        100    400    500

An example can illustrate Blau's reasoning: Suppose a researcher was interested in studying students' plans to attend college and hypothesized that students with more classmates from higher socioeconomic status (SES) backgrounds would be more likely to aspire to college than students with lower status classmates. This is called a "contextual effect" and has been documented a number of times in the literature (e.g., Alexander & Eckland, 1985; Alexander, McDill, Fennessey & D'Amico, 1979; Mortimore, Sammons, Stoll, Lewis & Ecob, 1988; Stockard & Mayberry, 1992). In testing this hypothesis, it would be important to know that the influence of the socioeconomic context was present whatever the socioeconomic background of the individual was; that is, that the contextual effect was independent of the students' own individual backgrounds. Using the techniques proposed by Blau the researcher would look at data such as that shown in Table 1. (All data are imaginary.) The results in part A of Table 1 indicate that a "contextual" effect does occur.
Students with higher individual socioeconomic backgrounds are more likely to plan to go to college, no matter what type of school they attend. Yet, all students in the high SES schools, whether they come from lower SES or higher SES backgrounds, are more likely than those from lower SES schools to plan to attend college. The influence of the environmental or group level variable is independent of the influence of the students' backgrounds. In contrast, the results in part B would indicate that such a contextual effect does not occur. In Part B higher SES students in both types of schools are more likely to plan to attend college than lower SES students and there is no difference in the college plans of students in the two types of schools.

While the example in part A of Table 1 involves individual effects which parallel those on the group level, it is possible that the results on the individual level may be the inverse or opposite of those on the group level. For instance, individual students with higher socioeconomic backgrounds may be more likely to plan to attend college. Yet, students in higher socioeconomic schools might be less likely to plan college attendance, even with similar levels of ability or individual social status, perhaps because of the intense competition they feel within their schools (cf. Davis, 1966; Bachman & O'Malley, 1986; Stockard & Mayberry, 1992). In addition, results on the individual level might be contingent on or vary with the group level value. For instance, participation in a high SES school might positively affect individuals from a lower socioeconomic background, but participation in a lower SES school might not affect students from a high socioeconomic background at all. Following the tradition of elaboration analysis (see Kendall & Lazarsfeld 1950, Rosenberg, 1968), the techniques suggested by Blau could help researchers find such patterns.
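
Blau's procedure can be sketched in a few lines of code. The following is a minimal illustration, not part of the original analysis: it converts the imaginary percentages of Table 1 into counts and, within each category of individual SES (the control variable), compares the proportion planning college in high and low SES schools. The names PART_A, PART_B, and contextual_differences are hypothetical.

```python
# Imaginary data of Table 1, converted from percentages to counts.
# Keys are (individual SES, school SES context);
# values are (students planning college, group size).
PART_A = {
    ("low", "low"): (40, 400),
    ("low", "high"): (60, 100),
    ("high", "low"): (40, 100),
    ("high", "high"): (360, 400),
}
PART_B = {
    ("low", "low"): (80, 400),
    ("low", "high"): (20, 100),
    ("high", "low"): (60, 100),
    ("high", "high"): (240, 400),
}


def contextual_differences(table):
    """Within each individual-SES category, return the difference in the
    proportion planning college between high and low SES schools."""
    diffs = {}
    for individual_ses in ("low", "high"):
        yes_low, n_low = table[(individual_ses, "low")]
        yes_high, n_high = table[(individual_ses, "high")]
        diffs[individual_ses] = yes_high / n_high - yes_low / n_low
    return diffs


if __name__ == "__main__":
    for label, table in (("A", PART_A), ("B", PART_B)):
        print("Part", label, contextual_differences(table))
```

A difference that persists in both categories of the control variable, as in part A, is what Blau's approach treats as evidence of a contextual effect; in part B the differences vanish. Inverse or contingent patterns would show up as differences with opposite signs or of unequal size across the two categories.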
The cross-tabulation method is intuitively appealing and was widely used. It also represents a clear improvement over simply looking at aggregated and single-level analyses of student learning. Yet the approach required that variables on the individual level be categorized in some manner. To a large extent any such categorization must be arbitrary in nature and cannot fully represent the true or underlying distribution of a variable. Thus these analyses can produce misleading conclusions about the relative presence of group and individual effects (Boyd & Iversen, 1979; Tannenbaum & Bachman, 1964).

In a sometimes biting and satirical article published in 1970, Robert Hauser (1970) explored what he calls the "contextual fallacy." This occurs "when residual differences among a set of social groups, which remain after the effects of one or more individual attributes have been partialled out, are interpreted in terms of social or psychological mechanisms correlated with group levels of one of the individual attributes" (Hauser, 1970, p. 659). Because people are not randomly assigned to social groups, any variable that can describe these groupings must be associated with a variety of individual level characteristics. For instance, the kind of school students attend is often highly related to their socioeconomic background and race and ethnicity. In a contextual analysis, such as that shown above, a researcher generally controls for one and usually not more than 2 or 3 individual level variables while looking for the influence of a group level variable. In so doing the researcher implicitly assumes that the group-level variable is only related to these selected individual level variables and no others. If, in fact, the grouping variable is associated with other individual level variables not included in the analysis (the "residual differences" Hauser refers to), conclusions about the existence of a contextual effect are likely to be faulty.
Hauser's critique hinges on the problem of properly specifying the range of variables that may influence a dependent variable, a problem that can appear with any kind of analytic technique and which we discuss more fully below. Yet the problems associated with grouping of variables are especially noticeable with cross-tabulations and Hauser recommended that researchers interested in contextual and structural effects begin to use techniques associated with the general linear model rather than cross-tabulations.

General Linear-Model Techniques

By the 1970s computers had largely replaced counter-sorters and were widely available to researchers. While analyses belonging to what is called the general linear model (Cohen, 1968; Gordon, 1968), such as multiple regression and complex analyses of variance and covariance, are extremely difficult to do by hand, the computer made it possible to conduct such analyses relatively easily with large amounts of data. Within a few years most researchers agreed with Hauser that such techniques were much more suitable to the analysis of multi-level data than cross-tabulations, even though the cross-tabulation techniques and analysis of covariance, a variation of the general linear model, are essentially equivalent under certain circumstances (Alwin 1976).

The techniques within the general linear model allow a researcher to examine the relationship between a dependent variable and a number of independent variables, which can be on varying levels of analysis. The elements of the general linear model are familiar to most researchers.
The simplest form is the bivariate regression equation:

$$Y_i = a + b_{yx} X_i + e_i \qquad (1)$$

where $Y_i$ is the score or measure on the dependent variable for an individual $i$; $a$ represents the regression intercept or the predicted value of $Y$ when $X = 0$; $b_{yx}$ represents the slope of the regression line, or the predicted change in the dependent variable $Y$ that would occur with a unit change in the independent variable $X$; and $e_i$ represents the error term for individual $i$, or the difference between the actual value of $Y_i$ and the value predicted by the regression equation.

This basic equation may be expanded in a variety of ways. When several intervally measured variables are used as predictors, the equation is simply known as a multiple regression equation. When categoric variables are used as predictors in the form of dummy variables and the equation also includes interaction terms for these predictors, the model is that of analysis of variance. When both continuous and categoric predictor variables as well as their interaction terms are used, the model becomes that of analysis of covariance.

Because of its flexibility this model can easily be expanded to encompass analyses of many of the relationships that are inherent in descriptions of environmental influences on learning. For instance, many studies examine how student achievement is influenced by variables related to individual students, such as measures of their ability, socioeconomic status, and previous achievement; and by variables related to the groups in which these students learn, such as the average socioeconomic status of the class or the school and other characteristics of the classroom, school, or community. A model could be developed that includes each of these variables.
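
The expansions of the basic regression equation described above differ only in the columns placed in the set of predictors. The following minimal sketch, offered under stated assumptions rather than as a standard routine, fits an analysis-of-covariance style model with one continuous predictor, one dummy variable, and their interaction. The data are hypothetical and constructed without error so that the known coefficients are recovered; the small ols helper is written out only to keep the example self-contained, and in practice a statistical package would be used.

```python
def ols(rows, y):
    """Least-squares coefficients of y on the columns of rows, obtained by
    solving the normal equations with Gauss-Jordan elimination."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Hypothetical predictors: a continuous ability score and a 0/1 dummy variable.
ability = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
dummy = [0, 1, 0, 1, 0, 1]

# Columns: intercept, continuous predictor, dummy variable, interaction term.
design = [[1.0, x, d, x * d] for x, d in zip(ability, dummy)]

# Error-free outcome built from assumed coefficients 10, 2, 5, and 1.5.
outcome = [10 + 2 * x + 5 * d + 1.5 * x * d for x, d in zip(ability, dummy)]

coefficients = ols(design, outcome)

if __name__ == "__main__":
    print([round(c, 6) for c in coefficients])
```

Dropping the dummy and interaction columns leaves an ordinary regression; keeping only dummy variables and their interactions gives the analysis of variance form.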
As a simple example, suppose that a researcher wanted to predict $Y$, a measure of student achievement, from students' measured ability ($X_1$), their socioeconomic status ($X_2$), their teachers' verbal ability ($X_3$), and the average socioeconomic status of their school ($X_4$). The resulting prediction equation would be

$$\hat{Y} = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + b_4 X_4 \qquad (2)$$

Estimating the regression coefficients (the $b$'s) through a technique called ordinary least squares yields results that describe the extent to which changes in each of the independent variables (ability, socioeconomic status, teachers' verbal ability, and the average SES of the school) influence changes in the dependent variable (students' achievement), independent, or net, of the influence of each of the other independent variables.

A wide variety of analyses are possible within this general model, each dependent on the hypotheses which a researcher wants to test (see discussion below on specification and Boyd & Iversen, 1979; Burstein, 1980, 1981; de Graaf, 1984; Firebaugh, 1980; Stipak, 1980; Stipak & Hensler, 1982). More complex extensions of this general technique, such as structural equation, dynamic change, and multiple time series models, allow researchers to include feedback loops within the analysis and to examine longitudinal data dealing with such issues as changes in achievement over time (see Rogosa 1980). It is also possible, although still relatively rare in the literature on student learning, to extend the regression model to the analysis of categoric dependent variables with the use of discriminant function analysis and varieties of log-linear modeling (see Goodman, 1978). Structural equation models and the popular LISREL program also allow researchers to explore hypotheses concerning the extent to which measurement error affects the results and to use multiple indicators of measures (see Hayduk, 1987).
These models are extremely flexible and probably among the most useful of the analysis techniques to arise from the general linear models.

By the 1980s most researchers agreed that if a researcher was interested in studying students, rather than using schools or classrooms as the unit of analysis, multilevel analysis using general linear model techniques was preferable to cross-tabulation methods (e.g. Hopkins, 1982; Lincoln & Zeitz, 1980). The analyses based on the general linear model have become widely accepted and are commonly included in statistics textbooks and discussed in methodology classes. Yet, because they are not specifically designed to deal with multilevel analyses, they cannot deal with all of the concerns and complexities involved in understanding environmental influences on individuals.

Hierarchical Linear Models

The most recent analytic techniques suggested for use with multilevel data are called by names such as Hierarchical Linear Models, Random Coefficient Models, and Empirical Bayes Techniques. These are potentially very powerful tools that only became available for widespread use with the development of very high speed computers in the 1980s. To date the techniques have been used by only a few educational researchers and, unlike the general linear and structural equation models, are not available in commonly used statistical packages nor reviewed in statistics textbooks.

Hierarchical linear models can deal with a problem that has not been solved with the general linear model techniques. When researchers use these traditional analyses to examine how both individuals' characteristics and school or classroom policies or procedures affect behaviors of individual students they must assume that the influence of the individual level independent variables is consistent from one school or classroom to another.
For instance, a researcher may examine the influence of socioeconomic status (SES) on students' achievement in a large number of both public and private schools, but the general linear model techniques do not allow the researcher to explore the possibility (if not probability) that the influence of SES on achievement varies from one school to another and, more importantly, to develop models that can explain why the influence of SES varies from one school to another.

The hierarchical linear models help fill this gap. They use the basic outlines of the general linear model, predicting a dependent variable from a series of independent variables that reflect not just the level on which the dependent variable is measured but also those at higher or more aggregated levels of analysis. The researcher first estimates the model in essentially the same way one would with the general linear model. Then, however, the researcher proceeds to use the parameters (the regression coefficients) that were estimated at this first level of analysis (usually the individual level in studies of educational effects) as dependent variables and the variables from higher levels of analysis as independent variables. The results of this second stage of the analysis allow the researcher to see how school and/or classroom related variables affect the way in which individual level variables influence achievement. That is, the model allows the researcher to directly estimate how group level variables, such as characteristics of the school or classroom, influence the way in which individual level variables, such as individuals' SES, affect achievement.

There are a variety of ways of calculating the parameters in these models, all of which involve a series of iterations, or repeated calculations, to develop estimates that most accurately reflect the data.
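
The two-stage logic described above can be illustrated with ordinary least squares alone. The sketch below is a deliberately naive version under stated assumptions: the schools, the sector flag, and the coefficients are hypothetical, and the data contain no error, so each stage recovers the values used to build them. It is not the estimation procedure used in hierarchical linear modeling programs.

```python
def simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical schools: a sector flag (1 = private, 0 = public) and a list of
# (SES, achievement) pairs for the students in each school.
ses_scores = [-1.0, 0.0, 1.0, 2.0]
schools = []
for private in (0, 0, 1, 1, 0, 1):
    true_slope = 4.0 - 3.0 * private  # assumed: SES matters less in private schools
    students = [(s, 50.0 + true_slope * s) for s in ses_scores]
    schools.append((private, students))

# Stage 1: a separate regression of achievement on SES within each school.
stage1_slopes = []
for private, students in schools:
    ses, achievement = zip(*students)
    _, slope = simple_regression(ses, achievement)
    stage1_slopes.append(slope)

# Stage 2: the stage-1 slopes become the dependent variable, predicted from a
# school-level variable (here, sector).
sector = [private for private, _ in schools]
public_slope, sector_effect = simple_regression(sector, stage1_slopes)

if __name__ == "__main__":
    print("SES slope in public schools:", public_slope)
    print("change in SES slope for private schools:", sector_effect)
```

The second-stage coefficient describes how a school characteristic alters the influence of SES on achievement, which is the question the general linear model cannot address directly. With real data the first-stage slopes are estimated with error that differs from school to school, so hierarchical linear model software replaces this naive two-step procedure with iteratively computed estimates.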
These estimates are developed through the use of Bayesian statistics and can be calculated with the aid of computer programs developed by various authors in the field. (See Bryk & Raudenbush, 1989b; Burstein, Kim & Delandshere, 1989; Mason, Wong, & Entwisle, 1983; Raudenbush & Bryk, 1986, 1988, 1989 for extensive discussions of these methods.)

Because the techniques are not yet widely used, relatively few examples of their utility for studies of environmental influences on achievement are in the literature. However, the methods appear to be very well suited for the theoretical issues that face researchers interested in this area. They can handle not just estimations of effects at different levels of analysis, but also changes in effects over time (see Bryk & Raudenbush, 1989b), categoric dependent variables (Wong & Mason, 1985), and, to some extent, estimates of measurement error (DiPrete & Grusky, 1990).

No matter what type of analysis technique a researcher chooses to use, adequate hypotheses and measures are a necessity. No analysis, no matter how sophisticated, can compensate for poor specification of theories or poor measurement, and I turn now to these issues.

SPECIFICATION OF THEORETICAL MODELS

Specification refers to the precise delineation of theoretical models in a testable format. It is especially problematic when dealing with environmental influences precisely because researchers are usually concerned with more than one level of analysis. Sometimes specification errors result because researchers have used aggregated data rather than data that more closely match their theories, and we first discuss issues related to the use of aggregate data. We then move to a more general discussion, examining common specification errors and how these may affect analyses of environmental influences.

Issues in Aggregation

Aggregated data have been used in a variety of ways, some more appropriate than others.
Below I discuss reasons researchers use aggregated data and criticisms of their use.

Why Aggregated Data are Used

Aggregated data have often been employed because there were no alternatives. Researchers, actually interested in studying individual level variables, such as student achievement, have been forced by the nature of the available data to examine variables measured at higher levels of analysis. Thus, instead of looking at the achievement of individual students, they have examined average student achievement in classrooms or schools. If the researcher's theoretical interest is actually at the level of the individual, using data from a higher level can, as we show below, introduce biases and produce misleading results.

It is, however, possible that researchers might choose to examine data at an aggregated level because their theoretical interests lie in explaining processes that occur at that level. Those who advocate using the individual as the unit of analysis focus on the fact that students learn and react to school environments as individuals and that even environmental influences are often only an aggregation of individual effects (Wittrock & Wiley, 1970). Those who advocate using classrooms as the unit of analysis assert that this is appropriate when evaluating the effect of classroom instruction because all students are simultaneously exposed to the treatment variable, or instruction, and it is important to examine the effect of this treatment at the level where it occurs (e.g., Wiley, 1970). Those who advocate using schools or districts as the unit of analysis stress the importance of policy decisions that are based at the school or district level (e.g., Bidwell & Kasarda, 1975, 1976). As long as the theoretical arguments are centered around the level of aggregation at which the data are gathered and analyzed, the work may resist criticism.
Scholars might also choose to analyze experimental data at an aggregated level because they are concerned about the independence of the units that are studied. Experimental treatments or variables are often administered to students within classrooms. Because students receive experimental treatments within groups, it was once argued that it is appropriate to analyze the effect of the experimental variable using the groups, generally classrooms, as the unit of analysis (Lindquist, 1940). More recent examinations of this area, however, suggest that the problem has been overstated and that such experimental data may be more appropriately analyzed by using a multi-level analysis such as those described above (Hopkins, 1982).

Criticisms of Aggregated Data

The first widely read and influential criticism of the use of aggregated data was developed by William Robinson (1950) in his discussion of the "ecological fallacy." He described the problems that can result from using data on an aggregated level of analysis to generalize to a lower level of analysis, for instance, using correlations based on school level data to generalize to individuals within those schools. Robinson demonstrated that correlations based on grouped data can often overestimate the correlations obtained when individual data are used and suggested that great care be taken in drawing conclusions from studies using aggregated data.

Using aggregated data rather than individual level data essentially destroys information about the individual involved and reduces the variability of the variables. It can also mask nonlinear effects that might be present in the data and assumes that all students within a group are essentially similar on the range of characteristics considered. In general, studies that rely only on aggregated data can produce inconsistent estimates of effects.
This problem can appear with both simple and multiple regression models, non-recursive models, and longitudinal models (Burstein, 1980).

Later analyses of the "ecological fallacy" have expanded upon Robinson's conclusions by demonstrating that aggregation bias, or the bias that comes from inferring across levels of analysis, can be understood by analyzing the way in which the grouping of cases relates to the distribution of the variables involved in the analysis. Grouped-level and individual level estimates of effects are only different from each other when the way in which the cases are grouped is related to scores on the independent and/or dependent variables. Unfortunately for researchers, this often happens in studies in education. For instance, classroom groupings are often based on children's prior achievement; the school to which children are assigned is often related, through patterns of neighborhood segregation, to their socioeconomic status and racial and ethnic background. The fact that these individual characteristics are related to both the groups to which children are assigned as well as to individual level variables of interest, such as achievement, means that aggregation bias is likely to occur. Conclusions derived from analyses on an aggregated level cannot be extended to an individual level.

Fortunately, the extent of this aggregation bias may be estimated with a variable which measures group membership. This may be done either directly, through entering the grouping variable into the analysis (see Burstein, 1978; Hannan & Burstein, 1974), so that estimates of effects include an estimate of the effect of grouping itself, or more indirectly, by including the group mean on the independent variable within the analysis (Firebaugh, 1978). Both techniques yield similar results (Burstein, 1980, 1981).
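
Robinson's warning, and the role played by the grouping rule, can be illustrated with a small simulation. The sketch below is only an illustration under assumed values: the 2,000 simulated students, the coefficient of 0.5, the group size of 50, and the fixed random seed are all arbitrary choices. It compares the individual-level correlation with the correlation among group means when groups are formed arbitrarily and when they are formed by sorting on the independent variable.

```python
import random


def correlation(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def group_means(pairs, size):
    """Average consecutive chunks of (x, y) pairs into group-level means."""
    xs, ys = [], []
    for start in range(0, len(pairs), size):
        chunk = pairs[start:start + size]
        xs.append(sum(p[0] for p in chunk) / len(chunk))
        ys.append(sum(p[1] for p in chunk) / len(chunk))
    return xs, ys


rng = random.Random(0)  # fixed seed so the sketch is reproducible
students = []
for _ in range(2000):
    ses = rng.gauss(0, 1)
    achievement = 0.5 * ses + rng.gauss(0, 1)  # assumed individual-level relation
    students.append((ses, achievement))

# Individual-level correlation between SES and achievement.
r_individual = correlation(*zip(*students))

# Grouping unrelated to the variables: students stay in their arbitrary order.
r_arbitrary_groups = correlation(*group_means(students, 50))

# Grouping related to the independent variable: "schools" formed by sorting on SES.
r_sorted_groups = correlation(*group_means(sorted(students), 50))

if __name__ == "__main__":
    print("individual level:     ", round(r_individual, 3))
    print("arbitrary groups:     ", round(r_arbitrary_groups, 3))
    print("groups sorted on SES: ", round(r_sorted_groups, 3))
```

When the groups are formed by sorting on SES, averaging removes most of the individual variation in achievement that is unrelated to SES, and the correlation among group means is far higher than the correlation among individuals. With arbitrary grouping the group-level correlation is not systematically inflated, although with only 40 groups it is estimated imprecisely.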
On a more general level, a number of authors have suggested that estimates of effects differ across levels of aggregation because the theoretical models have been misspecified. In other words, aggregation bias is simply specification bias (Hanushek, Jackson & Kain, 1974; Irwin & Lichtman, 1976; Langbein & Lichtman, 1978). If all of the individual level variables that actually influenced the dependent variable were included in a model, any bias introduced by grouping should be nil. Certainly such an argument makes logical sense, and it has been suggested that the methods for checking for aggregation bias are in reality a search for specification errors.

Typical Specification Errors

Even though researchers now increasingly realize the importance of specifying multi-level models in analyzing educational outcomes, many potential methodological problems related to specification remain. The most often cited problem is that of omitting relevant variables from the analysis. We first examine this area and then discuss theoretical explanations of group effects and other problems.

Omission of Relevant Variables

Omitting relevant variables is a very serious problem because it is likely to directly bias estimates of the effect of the variables of interest. The statistical explanation of why these biased estimates occur can best be understood by considering analyses that use a regression model such as that described in equations (1) and (2) above. In an equation such as (1), $e_i$ represents the error term, or the extent to which the predicted value of $Y$ deviates from the actual value of the dependent variable $Y_i$ for that case. Essentially, $e_i$ represents all possible influences on the dependent variable other than those included in the model. The regression model (and actually all techniques of data analysis) assumes that the error term in the equation is uncorrelated with the independent variables.
That is, any possible influences on the dependent variable other than those represented by the independent variables are assumed to be totally unassociated with the independent variables already in the equation. The estimates of the regression coefficients are correct only if this assumption is true. If the error term is associated with any of the predictor variables, the estimates in the regression equation are said to be biased and the equation is said to be misspecified.

The possibility of such misspecification is unfortunately quite large when survey data are used to analyze environmental influences on learning. While strictly constructed experimental designs can control for the possibility of correlated error terms, such designs are usually impractical in educational research. Most studies of student learning, apart from those in very small laboratory settings, involve the use of survey-based data and intact groups. When researchers wish to estimate the effect of environmental variables, such as the average ability level or the average socioeconomic status of a group, on students' achievement, any estimate of that effect will be biased unless the specified model includes all other variables that are conceivably associated with both the dependent variable and the hypothesized independent variable. For instance, characteristics of teachers in a district may be associated with characteristics of the community and students, with higher status communities and students of higher ability more likely to attract teachers with more extensive qualifications. Any estimate of the effect of teachers' characteristics on students' achievement would then need to come from a model that incorporated not just variables related to the students, but also those related to the school and the community. This problem can occur no matter what level of analysis is involved.
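The teacher-qualification example above can be made concrete with a small simulation. The sketch below is illustrative only: the variable names (community, teacher, achievement) and every coefficient are assumptions chosen for the example, not estimates from any study. Community status is allowed to influence both teacher qualifications and achievement, and the effect of teacher qualifications is then estimated with and without community status in the model.

```python
import numpy as np

def ols(y, *predictors):
    """Return OLS coefficients [intercept, b1, b2, ...] for y on the predictors."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(1)
n = 5000                                           # assumed sample size
community = rng.normal(size=n)                     # hypothetical community status
teacher = 0.6 * community + rng.normal(size=n)     # qualifications track community status
noise = rng.normal(size=n)
achievement = 0.2 * teacher + 0.5 * community + noise   # assumed true teacher effect: 0.2

# Misspecified model: community status is omitted and absorbed into the error term.
b_short = ols(achievement, teacher)[1]
# Model that includes the omitted variable.
b_full = ols(achievement, teacher, community)[1]

# The error term of the short model is correlated with the included predictor.
error_if_omitted = 0.5 * community + noise
corr_error_teacher = np.corrcoef(error_if_omitted, teacher)[0, 1]

print(f"teacher effect, community omitted:  {b_short:.2f}")
print(f"teacher effect, community included: {b_full:.2f}")
print(f"correlation of omitted-model error with teacher: {corr_error_teacher:.2f}")
```

In this hypothetical setup the estimate from the short model is roughly twice the assumed true effect, because the error term carries the influence of community status and is therefore correlated with teacher qualifications. Including the omitted variable brings the estimate back toward the value used to generate the data.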
When researchers are dealing with data from several levels of analysis, however, the probability of specification errors may be increased. For instance, a researcher might suggest that the achievement of individual students could be accounted for by the educational level of teachers and the quality of a school's library. Yet all three of these variables might well be associated with students' ability, attitude toward schooling, and individual socioeconomic status. If the researcher did not also include these individual-level variables in the model, the estimates of the effects of teachers' education and the school library on achievement would be biased. (For more extensive discussions of the statistical problems involved in this area see Burstein, 1980, 1981; for a fascinating debate regarding the effect of specification errors on an analysis see Bidwell & Kasarda, 1975, 1976; Hannan et al., 1976; Alexander & Griffin, 1976.)

Explaining Group Effects

This statistical explanation of why omitting variables from a model of environmental effects can produce biased results holds across all discussions of environmental influences. Yet theoretical distinctions can be made regarding how group effects occur. That is, group effects may result simply from how people are assigned to groups (selection effects) or they may reflect group processes or properties. The input-output tradition of research has tended to focus on identifying selection effects. These selection effects reflect the fact that students are nonrandomly assigned to classrooms, schools, and school districts. Race, ethnicity, socioeconomic status, and ability can all affect these assignments. To the extent that these individual-level variables are associated with both the independent and dependent variables in a model, the results may be misleading.
In other words, if the assignment of students to schools and classrooms is related to background variables that also influence achievement, any differences in achievement between classrooms and schools may reflect these background differences rather than any effect of the schools or classrooms themselves. A properly specified model, which allows for the influence of both the background variables and the grouping variables, can indicate the independent effect of both types of variables on students' learning. Only when the effect of group-level variables on learning persists after the influence of selection-related variables is controlled is it possible to suggest that some group process is influencing student achievement.

The process-product tradition of educational research has focused more on trying to understand how group processes and interactions influence educational outcomes. Yet, as I discuss below, this tradition has generally not simultaneously tried to rule out the possibility that selection effects might be affecting the results.

Most discussions of specification errors focus on the problem of overestimating the effects of group-level variables (e.g., Hannan, 1971; Robinson, 1950), that is, on biases related to the differential selection of students into learning groups. A few scholars, however, have altered this discussion by noting that when we control for individual-level variables we may tend to underestimate the actual effect of schools and other group-level influences (e.g., Bidwell & Kasarda, 1980, p. 403; Centra & Potter, 1980, p. 277). These authors are making a theoretical point and choosing to focus on the ways in which groupings affect the processes within schools. Individual-level variables, such as students' socioeconomic background, as well as group-level variables, such as the schools to which students are assigned, influence students' achievement.
But these variables also influence the process of schooling itself. Multi-level and cross-level analyses of students' learning must take into account not just how students' backgrounds affect their individual rates of learning, but also how groupings of students affect the process of teaching and learning that occurs within classrooms and schools. In the final section of this paper I suggest that the most recently developed statistical techniques may help researchers actually carry out analyses such as these.

Other Problems

While the omission of important variables is undoubtedly the most common and potentially serious specification error in examining cross-level data, a number of other problems may also occur. First, the time order of variables and the specification of cause and effect may be faulty (cf. Eckart & Durand, 1985). For instance, a researcher may suggest that more highly qualified teachers tend to enhance students' learning, while in fact more capable, harder working, and higher achieving students tend to attract more qualified teachers. The direction of causality may be exactly opposite to that hypothesized.

Related to this concern is a common failure, often attributable to limitations of available data, to include time dimensions in hypothesized models. The possibility of simultaneous or reciprocal causal effects as well as longitudinal influences should be considered. If the characteristics of students and teachers within schools and districts could be tracked over time, it would be possible to understand more clearly the reciprocal relationship between the two. It is possible that studies that take the cumulative nature of schooling into account actually indicate a greater degree of environmental influence (e.g., Heyns, 1978; see also Stockard & Mayberry, 1992).
Because students learn over time, it is important that models of student learning incorporate this longitudinal aspect (cf. Bidwell & Kasarda, 1980; Centra & Potter, 1980, p. 277; Eckart & Durand, 1985, pp. 3-6; Stipak, 1980, p. 47).

Most analyses of environmental effects also specify relationships as linear in nature. In fact, however, the relationships may not approximate this form at all, and constraining the analysis in this manner may misestimate the actual effect. The literature on class size illustrates this possibility: most of the influence of decreasing class size on achievement occurs only when classes are smaller than approximately 15 students (Glass, Cahen, Smith, & Filby, 1982). Only studies that take this curvilinear effect into account can demonstrate this relationship (cf. Centra & Potter, 1980, p. 277).

MEASUREMENT ISSUES

As noted earlier, most researchers now agree that it is usually best to use multi-level analyses when examining environmental influences on student learning. In general, the literature on measurement suggests that one should usually measure the dependent variable of student learning on the individual level and the explanatory variables at levels that represent their theoretical influence (cf. Bidwell & Kasarda, 1980; Burstein, 1980, 1981; Stipak & Hensler, 1982). However, if researchers are to draw accurate conclusions about environmental influences, they must not only correctly specify the theoretical models to be examined and analyze them in an appropriate manner; they must also correctly measure the variables that are involved. Measurement rules apply to all research endeavors but are, if possible, even more crucial when examining environmental influences and multi-level models. Errors and inadequacies at one level simply become compounded when extended to another level, especially if composite measures are used.
Below we examine issues related both to the adequacy of measures and to measurement error.

Adequate Measures

When one considers the adequacy of measures, it is important to consider both how the measures are related to the theory one hopes to test and how the measures are constructed.

Matching Theory and Measures

Measures must accurately reflect the theory which is to be tested. Social scientists often use measures that only begin to tap the full complexity of theoretical concepts. Yet if they are to adequately test their theories, it is important that researchers attempt to operationalize concepts in ways that reflect their theories as closely as possible. For instance, a researcher might hypothesize that the socioeconomic composition of a classroom influences the expectations which students receive in their interactions with each other and with teachers. To adequately measure the concepts in the theory, it would be important to measure not just the socioeconomic composition of the classroom, but also the expectations which students and teachers hold regarding achievement. The resulting model would much more accurately reflect the researcher's actual theory and thus would provide a more accurate test.

It is also important to realize that the conceptual meaning of variables may differ from one level of analysis to another. For instance, on the individual level of analysis a measure of socioeconomic status (SES) reflects the individual resources that students bring to the learning situation. On the level of the classroom an aggregated measure of socioeconomic status may reflect the expectations that teachers have for the students, the type of curriculum assigned, the nature of peer expectations, and implicit messages regarding success. At the level of the school and district an aggregated measure of SES may reflect the financial resources of a community.
In addition, studies of the effects of environments on students usually assume that measures of the central tendency of groups indicate their characteristics, focusing on the average socioeconomic status or average ability of students in a classroom or school. Yet many times it may be aspects of the groups other than central tendency that affect the process of schooling. Groups that are very homogeneous may require different teaching methods and produce different environments than groups that are very heterogeneous. Analyses that take this variability into account would benefit from considering measures of variance rather than measures of central tendency. Similarly, groups that deviate a great deal around a rather low mean would probably have substantially different environments than those that deviate a smaller amount around a larger mean. Here, a measure of inequality, such as a coefficient of variation, might be an appropriate indicator (cf. Burstein, 1981, pp. 214-215; Stipak & Hensler, 1982, pp. 155-157). The hierarchical linear model techniques discussed above seem especially well suited to analyzing dispersion of a dependent variable (see Bryk & Raudenbush, 1989a, 1989b; Raudenbush, 1988; Raudenbush & Bryk, 1987).

Standard Rules of Measurement

While all researchers, whether dealing with one level of analysis or several, must endeavor to follow standard rules of measurement, some are especially important to those studying environmental influences on student achievement. Two of the most crucial involve the variation of the measures and categorization.

The effects of environments upon individuals may be underestimated if the groups or environments that are studied are not sufficiently variable or different from each other.
For instance, the effects of resources and facilities on student learning may be underestimated in the current literature because most studies include schools with only a small amount of variation on these variables. Regulations regarding certification of teachers and accreditation of schools ensure that teachers have fairly similar formal qualifications and schools have relatively similar facilities. Studies in countries other than the United States that have incorporated more variation on these variables typically find much larger effects than studies in the United States (Stockard & Mayberry, 1992). Similarly, contradictory results in studies of the effect of class size may be directly traced to the lack of sufficient variation in the independent variable (Glass et al., 1982; Stockard & Mayberry, 1992).

Another standard rule of measurement (and analysis) is that researchers should preserve the original metric of measures and not needlessly categorize variables. If a variable is conceptually continuous in nature, the measure of that variable should reflect that metric as much as possible. As we noted above, the rather arbitrary categorization of variables for cross-tabulation analyses of contextual effects could produce faulty results (see Alwin, 1976; Hauser, 1970). Today, when cross-tabulation techniques are less widely used, researchers may use categoric measures rather than interval or ratio level measures simply as a matter of convenience. If we are to have accurate tests of our theories, however, it is important that researchers strive to measure variables in ways which accurately reflect the underlying metric.

Measurement Error

As with the issue of adequacy, the issue of measurement error affects all social research. With multi-level analyses, some aspects are even more difficult. Research
on student achievement is often conducted within schools and intact classrooms. Even when not all students in a school or a classroom are studied, cluster sampling is often used to minimize research costs. These procedures almost inevitably result in clusters or units from which students are sampled, whether they be schools or classrooms, that are more homogeneous than the total population. Because of the way students are assigned to classrooms and schools, they tend to be more similar to each other than to the total population. In addition, in statistical language, the observations are dependent upon each other, or the errors are correlated. Students within classrooms have many experiences in common: the same teachers, the same peers, usually the same community background, and often similar social class and racial-ethnic backgrounds. They interact with each other on a daily basis, and thus the observations of their behavior cannot be considered independent.

The variance of the error term may also differ across groups. For some groups an independent variable, such as socioeconomic status, may be a strong predictor of a dependent variable such as achievement, while for others it may be somewhat weaker. As we noted above, when either the error terms are correlated or the variance of the error term differs across groups, the value of the regression coefficients will be inaccurate (see Burstein, 1981, pp. 192-193). A technique such as hierarchical linear modeling should be used to assess these differential effects.

Special estimation problems arise when contextual values or aggregated variables estimated from sample data on individuals are used in conjunction with the data from individuals.
For instance, a researcher interested in how socioeconomic status is related to achievement may estimate the socioeconomic status context of a school (symbolized here by S) by taking the average of the measures of socioeconomic status for each of the students (s) through the simple procedure shown in equation (3).

$$\frac{\sum s}{n} = S \qquad (3)$$

The researcher may then examine how both this aggregated measure of socioeconomic context (S) and the measure of socioeconomic status from the individual level (s) affect students' achievement (A), as in equation (4).

$$A = a + b_1 S + b_2 s + e \qquad (4)$$

In this situation the sampling error for the independent variable measured on the aggregated level (S) is correlated with the sampling error for the corresponding variable on the individual level (s) because both variables come from the same underlying measurement. This violates the basic assumption of the linear model involving independence of the error term and can lead to inaccurate estimates. The fewer the number of cases used to estimate a contextual variable, the worse this problem becomes.

One possible solution to this problem is to estimate the value of the contextual variable for each individual by omitting the individual's own value. That is, the measure of the school's socioeconomic context would be estimated from an average of the socioeconomic status of cases other than that individual and would be different for each individual. Although it is not always possible to do so, a far better solution would simply be to gather independent estimates of the contextual variable itself. That is, the researcher could try to assess the aggregate variable more directly through the use of other observations and indicators. This could also result in more valid measures of the aggregate variable.

Researchers have traditionally used estimates of reliability and validity to estimate the extent of error within their measures.
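Returning to the first solution described above, a contextual mean that omits each individual's own value might be computed as in the following minimal Python sketch. The function name, the tiny data set, and the school labels are hypothetical and serve only to show the arithmetic; the sketch assumes that every school contributes at least two students.

```python
import numpy as np

def contextual_means(s, school):
    """Return (full school mean, leave-one-out school mean) for every student.

    `s` holds individual SES scores and `school` the school label of each
    student. The leave-one-out mean averages the scores of all *other*
    students in the same school, so each school needs at least two students.
    """
    s = np.asarray(s, dtype=float)
    _, idx = np.unique(school, return_inverse=True)   # integer code per school
    totals = np.bincount(idx, weights=s)              # sum of s within each school
    counts = np.bincount(idx)                         # number of students per school
    full = totals[idx] / counts[idx]
    leave_one_out = (totals[idx] - s) / (counts[idx] - 1)
    return full, leave_one_out

# Hypothetical data: three students in school "a", two in school "b".
ses = np.array([1.0, 2.0, 3.0, 10.0, 20.0])
school = np.array(["a", "a", "a", "b", "b"])
full, loo = contextual_means(ses, school)
print(full)   # the same value for every student in a school
print(loo)    # differs by student, since the student's own score is excluded
```

The full mean repeats each student's own score inside the contextual measure, which is the source of the correlated sampling error discussed above; the leave-one-out version removes that overlap, though it remains an estimate from the same sample rather than an independent indicator of the school's context.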
While reliability and validity of measures should always be of concern to researchers, such issues have often not been confronted by those dealing with environmental effects. Validity of variables may be assessed in a variety of ways, with, as discussed immediately above, attempts to develop several independent indicators of environmental influences being perhaps the most important (Zeller & Carmines, 1980). Reliability of environmental variables which are not composites of those from lower levels may be assessed through traditional means. It is also possible to estimate the reliability of composite variables (those aggregated from scores on a lower level) from the aggregate level data through methods developed by O'Brien (1984, 1990, 1991).

BRIDGING GAPS IN EDUCATIONAL RESEARCH

Two major areas of educational research have examined environmental influences on student achievement. Some analysts use what has been called an "input-output" approach, focusing on the ways in which student, classroom, and school characteristics or resources (the "inputs") influence student achievement (the "outputs"). Others use a "process-product" orientation, which focuses on how the actual behaviors of teachers and students (the "process" within a classroom) influence student achievement (the "product"). Much useful research has resulted from both orientations. The process-product approach seems especially well suited for exploring the actual process by which students learn, the way in which school resources and environmental influences are translated into individual students' learning. The input-output approach seems especially well suited for analyzing data from larger surveys and estimating and summarizing general patterns of influences of environmental effects.
While most researchers would probably agree that the variables included in both the input-output and the process-product approaches are important influences on achievement, there has been little movement toward merging these two traditions, largely perhaps because their methodological approaches have been fairly distinct. Input-output research tends to utilize large-scale survey data and/or school and district based records of achievement and resources, as well as relatively sophisticated statistical analyses. Many of the methodological advances described in this paper grew out of the input-output tradition. The process-product approach generally relies on classroom and student observations and is less likely to use large samples and sophisticated statistical techniques.

Theoretically, however, it seems clear that a full understanding of environmental influences on student achievement requires a merger of these traditions. In a recent book, I present a conceptual framework that shows how both the general characteristics of students' environments (the "inputs") and the way in which these characteristics are translated and understood by teachers and students (the "process") influence students' learning (the "output" or "product"). That is, both the "social order" of schools, classrooms, and communities, and the "social actions" of students, teachers, administrators, and parents work together to produce higher achievement (Stockard & Mayberry, 1992). We need to understand these multilevel relationships if we are to continue to promote student achievement. I believe that the current challenge for educational researchers is to expand upon and examine multilevel conceptual frameworks such as this through appropriate specification of models, accurate measurement of the variables involved, and suitable analysis techniques.
Multilevel and integrated conceptual frameworks need to be translated into clearly specified and researchable models of achievement. Researchers need to expand the models of learning which they now have to include variables related to both social order and social action and to be careful to ensure that relevant variables are included. Researchers need to understand that students are embedded within classrooms, schools, and communities and that all of these influence the ways in which they learn.

Multiple measures appropriate for a variety of levels of analysis need to be tested for both validity and reliability. It is especially important that these variables reflect the theoretical constructs specified in the model, that is, that they are valid. With multilevel models it is especially important that these measures accurately reflect the construct at the hypothesized level. It is also essential that the measures be reliable, that is, that they can produce consistent estimates of characteristics and behavior when used by different researchers and in different settings.

Finally, we need to use analysis techniques that can incorporate both the analysis of classroom process and the analysis of environmental inputs or structures. I believe that the most recently developed hierarchical models can help to bridge this analytic gap. They allow a researcher to examine how classroom processes, the specific behaviors of teachers and students, influence learning and how larger school environments, including characteristics of classrooms, schools, and communities, affect the way in which these influences take place. Thus they can help researchers analyze models that incorporate notions of how both the social actions within schools and the social order of schools, classrooms, and communities affect student learning. Testing such models, however, will only be possible if, and when, data sets are developed which incorporate both types of data.
This will probably require researchers to show more communication and flexibility than is sometimes apparent. We need data sets that incorporate the beneficial aspects of both the input-output and process-product traditions, data sets that include broad-based samples and data from a variety of environmental sources as well as in-depth analyses of classroom and school processes.

SUMMARY

Educational researchers generally agree that students' learning is influenced not only by their individual characteristics but also by the environments in which they live. Yet analyzing these multilevel influences is far from simple. Techniques for analyzing multilevel data have become much more advanced in the last 30 years, from the early cross-tabulation techniques, through the various applications of the general linear model, to the most recent developments of hierarchical linear models, which allow researchers to examine much more complex interaction effects. Yet no matter how complex an analysis technique is, it is useless without a properly specified theoretical model and adequate measures. While researchers certainly continue to try to develop better measures and to use care in specifying models, I suspect that these two areas are still in need of a great deal of work. If further advances are to be made in understanding environmental influences on education, it seems extremely important that researchers develop better measures, carefully specify their theories, and work toward developing data sets and using analysis techniques that can incorporate the essential aspects of both the input-output and process-product research traditions.

ACKNOWLEDGMENTS

I would like to thank Samuel Bacharach, Sally Bowman, Maralee Mayberry, Robert O'Brien, and Joe Stone for helpful comments on earlier drafts of material related to this article. Any errors and all opinions are, of course, my own responsibility.
REFERENCES

Alexander, K. L., & Eckland, B. K. (1975). Contextual effects in the high school attainment process. American Sociological Review, 40, 402-416.
Alexander, K. L., & Griffin, L. J. (1976). School district effects on academic achievement: A reconsideration. American Sociological Review, 41, 144-152.
Alexander, K., McDill, E., Fennessey, J., & D'Amico, R. (1979). School SES influences - composition or context? Sociology of Education, 52, 222-237.
Alwin, D. F. (1976). Assessing school effects: Some identities. Sociology of Education, 49, 294-303.
Bachman, J. G., & O'Malley, P. M. (1986). Self-concepts, self-esteem, and education experience: The frog-pond revisited (again). Journal of Personality and Social Psychology, 50, 35-46.
Bidwell, C. E., & Kasarda, J. D. (1980). Conceptualizing and measuring the effects of school and schooling. American Journal of Education, 88, 401-430.
Bidwell, C. E., & Kasarda, J. D. (1976). Reply to Hannan, Freeman, and Meyer, and Alexander and Griffin. American Sociological Review, 41, 152-160.
Bidwell, C. E., & Kasarda, J. D. (1975). School district organization and student achievement. American Sociological Review, 40, 55-70.
Blau, P. M. (1960). Structural effects. American Sociological Review, 25, 178-193.
Blau, P. M. (1957). Formal organization: Dimensions of analysis. American Journal of Sociology, 63, 58-69.
Boyd, L. H., Jr., & Iversen, G. R. (1979). Contextual analysis: Concepts and statistical techniques. Belmont, California: Wadsworth.
Bryk, A. S., & Raudenbush, S. W. (1989b). Toward a more appropriate conceptualization of research on school effects: A three-level hierarchical linear model. In R. D. Bock (Ed.), Multilevel analysis of educational data (pp. 396-404). San Diego: Academic Press.
Bryk, A. S., & Raudenbush, S. W. (1989a). Heterogeneity of variance in experimental studies: A challenge to conventional interpretations. Psychological Bulletin, 396-404.
Burstein, L. (1981). The analysis of multilevel data in educational research and evaluation. Review of Research in Education, 8, 158-233.
Burstein, L. (1980). The role of levels of analysis in the specification of educational effects. In R. Dreeben & J. A. Thomas (Eds.), The analysis of educational productivity, Volume I: Issues in microanalysis (pp. 119-190). Cambridge, Massachusetts: Ballinger.
Burstein, L. (1978). Assessing differences between grouped and individual-level regression coefficients: Alternative approaches. Sociological Methods and Research, 7, 5-28.
Burstein, L., Kim, K., & Delandshere, G. (1989). Multilevel investigations of systematically varying slopes: Issues, alternatives, and consequences. In R. D. Bock (Ed.), Multilevel analysis of educational data (pp. 233-276). San Diego: Academic Press.
Centra, J. A., & Potter, D. A. (1980). School and teacher effects: An interrelational model. Review of Educational Research, 50, 273-291.
Cohen, J. (1968). Multiple regression as a general data-analytic system. Psychological Bulletin, 70, 426-443.
Davis, J. A. (1966). The campus as a frog pond: An application of the theory of relative deprivation to career decisions of college men. American Journal of Sociology, 72, 17-31.
Davis, J. A., Spaeth, J. L., & Huson, C. (1961). A technique for analyzing the effects of group composition. American Sociological Review, 26, 215-225.
deGraaf, C. (1984). Multi-level analysis in educational research. In H. Oosthoek & P. Van Den Eeden (Eds.), Education from the multi-level perspective: Models, methodology and empirical findings (pp. 45-70). New York: Gordon and Breach.
DiPrete, T. A., & Grusky, D. B. (1990). The multilevel analysis of trends with repeated cross-sectional data. In C. C. Clogg (Ed.), Sociological methodology 1990 (pp. 337-368). Washington, D.C.: American Sociological Association.
Eckart, D. R., & Durand, R. (1985). Contextual variables and policy arguments: A critique. The Social Science Journal, 22, 1-14.
Firebaugh, G. (1980). Assessing group effects: A comparison of two methods. In E. F. Borgatta & D. J. Jackson (Eds.), Aggregate data: Analysis and interpretation (pp. 13-24). Beverly Hills: Sage Publications.
Firebaugh, G. (1978). A rule for inferring individual-level relationships from aggregate data. American Sociological Review, 43, 557-572.
Glass, G. V., Cahen, L. S., Smith, M. L., & Filby, N. N. (1982). School class size: Research and policy. Beverly Hills, CA: Sage Publications.
Goodman, L. A. (1978). Analyzing qualitative/categorical data: Log-linear models and latent-structure analysis. Cambridge, Massachusetts: Abt Books.
Gordon, R. (1968). Issues in multiple regression. American Journal of Sociology, 73, 592-616.
Hannan, M. T. (1971). Aggregation and disaggregation in sociology. Lexington, Mass.: Heath-Lexington.
Hannan, M. T., & Burstein, L. (1974). Estimation from grouped observations. American Sociological Review, 39, 374-392.
Hannan, M. T., Freeman, J. H., & Meyer, J. W. (1976). Specification of model for organizational effectiveness. American Sociological Review, 41, 136-143.
Hanushek, E. A., Jackson, J., & Kain, J. (1974). Model specification, use of aggregate data, and the ecological correlation fallacy. Political Methodology, 1, 89-107.
Hayduk, L. A. (1987). Structural equation modeling with LISREL: Essentials and advances. Baltimore: Johns Hopkins University Press.
Heyns, B. (1978). Summer learning. New York: Academic Press.
Hopkins, K. D. (1982). The unit of analysis: Group mean versus individual observations. American Educational Research Journal, 19, 5-18.
Irwin, L., & Lichtman, A. J. (1976). Across the great divide: Inferring individual level behavior from aggregate data. Political Methodology, 3, 411-439.
Kendall, P. L., & Lazarsfeld, P. F. (1950). Problems of survey analysis. In R. K. Merton & P. F. Lazarsfeld (Eds.), Continuities in social research (pp. 187-195). Glencoe, Ill.: Free Press.
Langbein, L. I., & Lichtman, A. J. (1978). Ecological inference. Beverly Hills, California: Sage Publications.
Lazarsfeld, P. F., & Rosenberg, M. (Eds.). (1955). The language of social research. New York: Free Press.
Lincoln, J. R., & Zeitz, G. (1980). Organizational properties from aggregate data: Separating individual and structural effects. American Sociological Review, 45, 391-408.
Lindquist, E. F. (1940). Statistical analysis in educational research. Boston: Houghton Mifflin.
Mason, W. M., Wong, G. Y., & Entwisle, B. (1983). Contextual analysis through the multilevel linear model. In S. Leinhardt (Ed.), Sociological methodology 1983-1984 (pp. 72-103). San Francisco: Jossey Bass.
Mortimore, P., Sammons, P., Stoll, L., Lewis, D., & Ecob, R. (1988). School matters. Berkeley: The University of California Press.
O'Brien, R. M. (1991). Correcting measures of relationship between aggregate-level variables. In P. Marsden (Ed.), Sociological methodology (pp. 125-165). Washington, D.C.: American Sociological Association.
O'Brien, R. M. (1990). Estimating the reliability of aggregate-level variables based on individual-level characteristics. Sociological Methods and Research, 18, 473-504.
O'Brien, R. M. (1984). Using generalizability theory to estimate the dependability of aggregate-level variables. Quality and Quantity, 18, 193-205.
Raudenbush, S. W. (1988). Estimating change in dispersion. Journal of Educational Statistics, 13, 148-171.
Raudenbush, S. W., & Bryk, A. S. (1989). Quantitative models for estimating teacher and school effectiveness. In R. D. Bock (Ed.), Multilevel analysis of educational data (pp. 205-232). San Diego: Academic Press.
Raudenbush, S. W., & Bryk, A. S. (1988). Methodological advances in analyzing the effects of schools and classrooms on student learning. Review of Research in Education, 15, 423-475.
Raudenbush, S. W., & Bryk, A. S. (1987). Examining correlates of diversity. Journal of Educational Statistics, 12, 241-269.
Robinson, W. S. (1950). Ecological correlations and the behavior of individuals. American Sociological Review, 15, 351-357.
Rogosa, D. (1980). Time and time again: Some analysis problems in longitudinal research. In C. E. Bidwell & D. M. Windham (Eds.), The analysis of educational productivity, Volume II: Issues in macroanalysis (pp. 153-201). Cambridge, Mass.: Ballinger.
Rosenberg, M. (1968). The logic of survey analysis. New York: Basic Books.
Stipak, B. (1980). Analysis of policy issues concerning social integration. Policy Sciences, 12, 41-60.
Stipak, B., & Hensler, C. (1982). The workshop: Statistical inference in contextual analysis. American Journal of Political Science, 26, 151-175.
Stockard, J., & Mayberry, M. (1992). Effective educational environments. Newbury Park, CA: Corwin/Sage.
Tannenbaum, A. S., & Bachman, J. G. (1964). Structural versus individual effects. American Journal of Sociology, 69, 585-595.
Wiley, D. E. (1970). Design and analysis of evaluation studies. In M. C. Wittrock & D. E. Wiley (Eds.), The evaluation of instruction: Issues and problems (pp. 259-269). New York: Holt, Rinehart and Winston.
Wittrock, M. C., & Wiley, D. E. (Eds.). (1970). The evaluation of instruction: Issues and problems. New York: Holt, Rinehart and Winston.
Wong, G. Y., & Mason, W. M. (1985). The hierarchical logistic regression model for multilevel analysis. Journal of the American Statistical Association, 80, 513-524.
Zeller, R. A., & Carmines, E. G. (1980). Measurement in the social sciences: The link between theory and data. Cambridge: Cambridge University Press.