CONSTRUCT RELEVANT AND IRRELEVANT VARIABLES IN MATH PROBLEM SOLVING ASSESSMENT

by

LISA E. BIRK

A DISSERTATION

Presented to the Department of Educational Methodology, Policy, and Leadership and the Graduate School of the University of Oregon in partial fulfillment of the requirements for the degree of Doctor of Education

June 2013

DISSERTATION APPROVAL PAGE

Student: Lisa E. Birk

Title: Construct Relevant and Irrelevant Variables in Math Problem Solving Assessment

This dissertation has been accepted and approved in partial fulfillment of the requirements for the Doctor of Education degree in the Department of Educational Methodology, Policy, and Leadership by:

Dr. Gerald Tindal, Chairperson
Dr. Julie Alonzo, Core Member
Dr. Gina Biancarosa, Core Member
Dr. McKay Sohlberg, Institutional Representative

and

Kimberly Andrews Espy, Vice President for Research and Innovation; Dean of the Graduate School

Original approval signatures are on file with the University of Oregon Graduate School.

Degree awarded June 2013

© 2013 Lisa E. Birk

DISSERTATION ABSTRACT

Lisa E. Birk
Doctor of Education
Department of Educational Methodology, Policy, and Leadership
June 2013

Title: Construct Relevant and Irrelevant Variables in Math Problem Solving Assessment

In this study, I examined the relation between various construct relevant and irrelevant variables and a math problem solving assessment. I used independent performance measures representing the variables of mathematics content knowledge, general ability, and reading fluency. Non-performance variables included gender, socioeconomic status, language proficiency, and special education qualification. Using a sequential regression and commonality analysis, I determined the amount of variance explained by each performance measure on the Oregon state math assessment in third grade. All variables were independently predictive of math problem solving scores, and used together, they explained 58% of score variance. The math content knowledge measure explained the most variance uniquely (12%), and the measures of math content and general ability explained the most variance commonly (16%).

In the second analysis, I investigated whether additional variance was explained once student demographic characteristics were controlled and how this affected the unique variance explained by each independent performance measure. When demographics were controlled, the model explained slightly more than 1% additional variance in math scores, and the unique variance explained by each independent measure decreased slightly.

This study highlighted the influence of various construct relevant and irrelevant variables on math problem solving scores, including the extent to which a language-free measure of general ability might help to inform likely outcomes. The use of variance partitioning expanded understanding of the unique and common underlying constructs that affect math problem solving assessment. Finally, this study provided more information regarding the influence demographic information has on outcomes related to state math assessments.

CURRICULUM VITAE

NAME OF AUTHOR: Lisa E. Birk
GRADUATE AND UNDERGRADUATE SCHOOLS ATTENDED:
University of Oregon, Eugene
University of Idaho, Moscow

DEGREES AWARDED:
Doctor of Education, Educational Leadership, 2013, University of Oregon
Master of Education, Educational Leadership, 2010, University of Oregon
Bachelor of Science, Special Education, 2005, University of Idaho
Bachelor of Science, Elementary Education, 2005, University of Idaho
Bachelor of Science, Secondary Education: Mathematics major, French minor, 2005, University of Idaho

AREAS OF SPECIAL INTEREST:
Building Effective Teacher Teams and Positive School Culture
Educational Assessment Systems
Diversity in Schools
Early Intervention and Predictive Variables for Academic Success
Special Education Identification Systems and Flexible Services
Mathematics Education

PROFESSIONAL EXPERIENCE:
Student Services Coordinator at Bear Creek Elementary School, Bend-La Pine Schools, 2011-present
Teacher on Special Assignment (Mathematics and Data Support), Bend-La Pine Schools, 2010-2011
Special Education Teacher at Juniper Elementary School, Bend-La Pine Schools, 2007-2010
Special Education Teacher at Fir Grove Elementary School, Roseburg Public Schools, 2005-2007

PUBLICATIONS:
Birk, L. (2009). Mathematics Knowledge Development for Special Education Teachers. University of Oregon Scholar's Bank: One-Goal School Improvement Plans. Retrieved from http://hdl.handle.net/1794/10125

ACKNOWLEDGMENTS

I would like to express thanks to the faculty and staff of the College of Education, who have demonstrated endless dedication to the success of this cohort. Our achievements are a direct reflection of the clear commitment to student learning and success. Additionally, I am so grateful for the friendship and encouragement I have received from the members of the Bend cohort. I am proud to graduate among you and look forward to continued experiences in education together. Finally, I would like to sincerely thank my parents, Erica, Lee, and Lora, for their patience and moral support as I navigated this process. Your perspectives, critique, and encouragement were invaluable, and I am so lucky to have had each of you by my side.

To my parents, Bruce and Sandy, who are the best teachers I have ever had.

TABLE OF CONTENTS

I. INTRODUCTION
   Defining the Construct(s) Measured in State Math Assessments
      Mathematical Content Knowledge
      Problem Solving Ability
   Identification of Measurement Construct (Irrelevant) Variables in Math Problem Solving
      The Influence of (g)
      The Influence of Reading
   Identification of Student Demographic Construct (Irrelevant) Variables in Math Problem Solving
      Gender
      Poverty
      Limited English Proficiency
      Special Education
   The Quantification of Construct Relevant and Irrelevant Variables in Math Problem Solving
      Content Knowledge
      Non-verbal, Content-free General Problem Solving Ability
      Reading Fluency
      Demographic Variables
      Possible Outcomes
   Research Questions

II. METHODOLOGY
   Setting
   Participants
   Curriculum
   Materials
      easyCBM-math Second Grade Spring Benchmark Assessment
      NNAT2 Second Grade Spring Assessment
      DORF Second Grade Spring Benchmark Assessment
      OAKS-math Third Grade Assessment
   Procedures
      Assessment Administration and Training Procedures
      Data Collection and Subject Selection
      Analyses

III. RESULTS
   Descriptive Statistics
   Analysis One: Performance Measures
   Analysis Two: Measures with Student Demographic Characteristics

IV. DISCUSSION
   Summary
   Limitations
      Threats to Internal Validity
      Threats to External Validity
   Interpretations
      Influential and Non-influential Variables in Math Problem Solving
      Utility of Predictive Measures in Assessment
      Defining a Complex Construct
   Implications and Future Research
      Practical Considerations
      Future Studies

APPENDICES
   A. ASSESSMENT EXAMPLES
   B. VARIABLE RELATIONS
   C. DISTRIBUTION OF SCORES FOR STUDY VARIABLES
   D. LITERATURE SEARCH DESCRIPTION

REFERENCES CITED

LIST OF FIGURES

1. Example easyCBM question (grade 2)
2. Example OAKS-math question (grade 3)
3. Pictorial representation of NNAT2 items
4. Student scoring printout (NNAT2)
5. Possible relations among variables in math problem solving
6. Variance partitioning using a commonality analysis
7. Commonality analysis results
8. Distribution of easyCBM-math scores
9. Distribution of NNAT2 scores
10. Distribution of DORF scores
11. Distribution of OAKS-math scores

LIST OF TABLES

1. Valid and Missing Test Data by Gender
2. Valid and Missing Test Data by Free or Reduced Lunch Status (FRL)
3. Valid and Missing Test Data by Special Education Eligibility
4. Valid and Missing Test Data by ELL Qualification
5. Means, Standard Deviations, and Intercorrelations for Variables in Math Problem Solving
6. Sequential Regression Analysis Predicting OAKS-math from easyCBM, NNAT2, and DORF
7. Variance Partition of R² = 58.1% with easyCBM, NNAT2, and DORF (N = 913)
8. Sequential Regression Analysis Predicting OAKS-math from Performance and Non-performance Indicators
9. Comparison of Unique Variance Attributed to Performance Variables Before and After Control of Demographic Variables

CHAPTER I

INTRODUCTION

State assessments in education are intended to measure progress toward proficiency in specific areas of instruction. In mathematics, assessments in each state are developed based on state standards that may include a number of different domains such as (a) measurement, (b) geometry, (c) numbers and operations, and/or (d) algebra (National Council of Teachers of Mathematics [NCTM], 2000). The content standards and embedded domains represent what students must know and be able to do to demonstrate proficiency in mathematics. Researchers have pointed out that state standards vary widely (Webb, 1999). Thus, a student who demonstrates mathematical proficiency in Oregon may not demonstrate the same level of proficiency in Idaho because the assessments are based on different state content standards and proficiency expectations. Additionally, the extent to which current state assessments accurately or adequately measure the standards or domains of interest is a subject of debate (Webb, 1999). Despite the current variability between state content standards in mathematics, problem solving continues to be one of the primary areas of focus in both instruction and assessment (NCTM, 2000, 2006).

In recent years, a group comprising the National Governors Association Center for Best Practices (NGA Center) and the Council of Chief State School Officers (CCSSO) resolved to eliminate differences and create common standards for use by all states (2010). By May 2012, nearly every state had joined the initiative known as the Common Core State Standards (CCSS). The adoption of the CCSS will mean many changes for states in terms of instruction, focus, and assessment as each adapts to the new common expectations. However, despite common standards, some variation is likely to continue because of differences in proficiency standards (cut scores) set independently by each state as well as differences in assessment measures. To monitor achievement, states will choose from two major assessment systems created by two different assessment consortia (Center for K-12 Assessment and Performance Management at ETS [ETS], 2010). Although differences may exist in assessments and state-designated cut scores, problem solving will remain a constant (NGA Center & CCSSO, 2010). Experts agree that problem solving is and will continue to be a primary focus of instruction in mathematics and is critical for the demonstration of proficiency in the subject area (National Council of Teachers of Mathematics [NCTM], National Council of Supervisors of Mathematics [NCSM], Association of State Supervisors of Mathematics [ASSM], & Association of Mathematics Teacher Educators [AMTE], 2010).

Researchers note that large-scale assessments typically reflect a complex combination of two major constructs: (a) declarative knowledge and (b) developing abilities in complex tasks (Haladyna & Downing, 2004). State assessments in mathematics are no exception.
In order to demonstrate proficiency, students must use information that they know about numbers (declarative knowledge) to solve problems in mathematical situations (a developing ability). Mathematical problem solving is a developing ability that is difficult to measure using standard assessment systems. In fact, researchers point out that significant limitations exist when trying to evaluate proficiency around a complicated construct such as math problem solving. They contend that difficult-to-monitor systematic variance will exist within an assessment, despite attempts to limit its influence (Haladyna & Downing, 2004). One type of this systematic variance is construct irrelevant variance.

Certain skills are clearly related to certain constructs. For example, numerical fluency is a skill that will likely help a student be successful on assessments measuring math proficiency. Such competencies are considered construct relevant because they are clearly related to the measurement topic. Construct irrelevant variance (CIV), by contrast, is variance due to the existence of variables that influence an outcome yet are not otherwise related to the concept measured. In any assessment system, CIV can (and probably does) exist (Haladyna & Downing, 2004). For example, English language proficiency is likely a factor that would influence math or science assessment outcomes, yet it has little to do with the constructs being measured. From a research perspective, it would be ideal to eliminate CIV completely; however, this is improbable. In mathematics problem solving, for example, written or spoken language is the medium through which assessment is delivered. Although unrelated directly to ability in math, language proficiency or reading ability may impact math performance outcomes. It is unlikely that large-scale math assessments will change such that language is unnecessary for assessment. This is just one example of how systematic variance continues to exist in the assessment of complex constructs like math problem solving.

As educators work toward student success on state assessments, it is important to identify variables that may impact outcomes on these measures. Further, if these variables can be altered through instruction, teachers will be better able to allocate resources and focus instruction in order to attain better results for student achievement. To do this, one must quantify, understand, and consider the variance accounted for by various influential variables when interpreting outcomes. Therefore, the purpose of this study is to broaden the identification and understanding of construct relevant and irrelevant variables that influence math problem solving outcomes as measured by state assessments in mathematics. Educators will more accurately identify proficiencies and make instructional decisions for students in mathematics when they have greater understanding of the degree to which different variables influence state math assessment outcomes. First, we will consider variables relevant to the constructs represented by state math assessments.

Defining the Construct(s) Measured in State Math Assessments

In this age of accountability, state testing programs are of much interest. It is important, however, to remember that they exist not simply to determine whether or not students do well on the test. Rather, state tests are designed as a way to determine if students are on an academic trajectory toward becoming college and career ready (Conley, 2010).
In order to demonstrate readiness, students must show proficiency in several different content domains, one of which is mathematics. Mathematics is important not only in daily life but is also a necessary competency for technological jobs that exist in increasing numbers in today's society (Jitendra, 2005).

Construct is defined as "the concept or characteristic that a test is designed to measure" (American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME], 1999, p. 173). Using this definition, presumably, the construct represented by a state math assessment is mathematics. However, mathematics, like all major subject areas, is a multi-dimensional construct and therefore difficult to teach, learn, and measure (Haladyna & Downing, 2004). So although state test programs must report different levels of proficiency, proficiency in mathematics is less than clear. What does it mean to be proficient in math?

As previously described, declarative knowledge in mathematics refers to skill competency and efficiency (computation), while developing abilities refer to the use of efficient skills to solve problems in mathematical situations (math problem solving). State math assessments such as the Oregon Assessment of Knowledge and Skills (OAKS-math) focus primarily on the latter (Oregon Department of Education, Office of Assessment and Information Services [ODE], 2012). Because assessments are to measure progress toward college and career readiness, or practical application, the focus on mathematical problem solving over skill competency makes sense. Math problem solving is a complex idea and logically contains two terms: (a) math (related to content) and (b) problem solving (related to either skill or ability).

Mathematical content knowledge. In 2001, the Mathematics Learning Study Committee of the National Research Council (NRC) identified five strands of mathematics proficiency. They recognized a need for integrated adaptive reasoning, strategic competence, conceptual understanding, productive disposition, and procedural fluency. The National Mathematics Advisory Panel (NMAP, 2008, pp. xvi-xvii) mirrored these conclusions, indicating a balanced need for a coherent progression of learning coupled with proficiency with key concepts to solve problems (emphasis in original). From a teaching perspective, mathematics would be much easier to teach if only knowledge of key concepts (declarative knowledge) were expected; however, because proficiency means that students are able to synthesize the key concepts and use them to solve problems, skill knowledge is not enough. Further, a hierarchy of skill development in mathematics is not yet clear. The five strands of mathematics proficiency are tightly intertwined at all levels of math learning (NRC, 2001); thus, critical skills and competencies are not easily isolated or measured.

In an effort to support content delivery, the NCTM (2000) outlined what they believed to be the knowledge and skills that students must be able to demonstrate at each grade level. Like the strands identified by the Math Learning Study Committee, these competencies were broken into five standard domains: numbers and operations, algebra, measurement, data analysis and probability, and geometry.
In each grade band (K-2, 3-5, 6-8, and high school), the NCTM specified which skills a student should master; however, every standard area was important (to varying degrees) in every grade band (emphasis added). A sixth standard, process, held the same expectation in all grades. This standard stated that students should demonstrate the ability to problem solve and, more specifically, to communicate, prove, reason, make connections, and justify in every mathematical task in every grade.

The standards created by the NCTM were influential as states set standards in mathematics, and therefore typically represented (and continue to represent) the content assessed on many state assessments in math. Such is the case in Oregon (ODE, 2012). However, despite the guidance from the NCTM, states were not required to use the suggested standards. Therefore, wide variability of state standards existed, which also affected state assessments. According to one study conducted at the Wisconsin Center for Educational Research, the content of statewide mathematics tests appeared quite varied and addressed a number of different domains to different degrees (Webb, 1999). Some states assessed certain domains more than others, and based on Webb's study, the degree to which the number of questions in any domain represented mastery was also in question.

The recent move by the NGA Center and the CCSSO attempts to eliminate, or at least minimize, differences between states in both instruction and assessment. Like standards of the past, the CCSS include a number of different mathematical standards, including the process standard of problem solving through reasoning, justification, and communication (Common Core State Standards Initiative, 2010). Additionally, this policy movement includes the development of two common assessments used to measure standard achievement (ETS, 2010). Because this movement includes nearly all states, it is likely that math content instruction will become more similar among states than when the NCTM initially suggested standards. However, the extent to which mathematical content knowledge will be accurately measured on the new common assessments remains to be seen. Current studies in which researchers examine the predictive validity of curriculum-based measures of content knowledge to determine likely success on state assessment outcomes provide a foundation for future replication studies using the CCSS assessments. Once the CCSS assessments are in use, researchers can use these previous studies as models to investigate the construct validity of the new assessment systems as well as the construct relevant and irrelevant variance within them. With this knowledge, teachers and researchers will be able to more accurately identify students who are at risk for failure on state assessments and adjust resources and instruction accordingly to support their path toward college and career readiness.

Problem solving ability. As reflected in the standards created by the NCTM and by the NGA Center and CCSSO, problem solving in mathematics is a concept that is critical to mathematical success. It includes the ability to reason, model, justify, and communicate mathematical ideas (NCTM, 2000). In this way, problem solving is construct relevant because it is embedded in mathematical content. It is a skill that is developed. In the CCSS, the problem solving concepts are described as the "standards for mathematical practice" (CCSSI, 2010, p. 10).
These standards require students to (a) make sense of problems and persevere in solving them, (b) reason abstractly and quantitatively, and (c) look for and express regularity in reasoning. These standards of problem solving are related to mathematical skill because students use understanding of numbers to solve problems.

Problem solving is also sometimes referred to as ability (Kaufman, 2009). For example, reflecting the language outlined in the CCSS, words and phrases like reason, communicate, make sense, and solve problems are concepts that extend well beyond the subject of mathematics. We reason when we decide what route to take to the grocery store, we communicate with one another in different settings and in different ways, and we are always trying to make sense of the world around us. In this way, problem solving is not content specific but rather an important ability that we use in every setting every day. Both concepts of problem solving are important because they relate to mathematical testing outcomes; however, as a skill, problem solving is construct relevant, and as an ability, it is construct irrelevant. Because both conceptual frameworks may influence outcomes on large-scale assessments, and because they can be uniquely measured as described in the next section, they can be considered separately as different variables of interest.

Content knowledge and problem solving are sub-constructs represented in state math assessments. They are both construct relevant to mathematics proficiency. Other variables that may influence outcomes but are unrelated directly to mathematics are those that are construct irrelevant. These are more difficult to recognize yet still important to identify. Some of these variables are quantified by performance measures, and others are inherent student demographic characteristics. In the next section, construct irrelevant performance variables related to mathematical problem solving assessments are described.

Identification of Measurement Construct (Irrelevant) Variables in Math Problem Solving

Another line of research indicates that state testing programs test various dimensions of the skills being targeted but also a number of features that are not relevant to the content area of interest (Abedi & Leon, 1999; Abedi, Leon, & Mirocha, 2003). These features are considered to be construct irrelevant and create CIV. Two main types of CIV exist: that which exists within a group and that which exists at the individual level. Categories of group or environmental CIV include test preparation methods, test creation, language load, administration, scoring, and cheating (Haladyna & Downing, 2004). In high-stakes assessments, the influence of these variables is mediated by extensive test protocols that cover each of these areas. For example, to assure consistency in administration, test designers often use scripted directions during assessment. Teachers do not create these protocols, but they do follow them during test administration. In this way, at the group level, teachers have only indirect control over potential CIV because they are bound by the protocols designed to support assessment.

At the individual level, however, teachers have direct control over the interpretations made regarding testing outcomes. CIV for individuals might stem from variables like general ability or reading proficiency (Haladyna & Downing, 2004).
Other student characteristic variables like language facility, socio-economic status, and disability may also influence outcomes, yet they are unrelated to the construct of math problem solving (Abedi, Leon, & Mirocha, 2001). Because teachers have direct control over the influence of CIV at the individual level through assessment interpretation, it is important to understand assessments and their influential variables in depth in order to make accurate decisions about student instruction and intervention. Additionally, according to Haladyna and Downing (2004), more research is needed in this topic area, specifically to better understand the influence of verbal abilities and accommodations on assessment outcomes.

The influence of (g). Problem solving tends to be a construct that has broad reach and can be conflated with intelligence and ability. It is often viewed as a trait that has permanence and is inherent in people (Kaufman, 2009). The history of this concept (particularly intelligence and ability) in the United States began with the first tests used to operationalize the constructs: the Stanford-Binet, the Wechsler, and most recently the Woodcock-Johnson and the Kaufman. Most of these tests purport to measure a general trait that dominates other specific abilities (e.g., motor versus verbal or sequential versus simultaneous processing). This trait, or factor, is described as mental intelligence that underlies performance on any cognitive task (Jensen, 2002).

This content-free concept of problem solving may be important and possibly related to outcomes on state assessments. Researchers agree that this factor, often referred to as general intelligence, or g (Spearman, 1904), exists and helps explain the correlation between cognitive tasks that would otherwise appear unrelated. Specifically in education, g has received much attention over several years and has been shown to be a reliable predictor of success in various academic areas (Brody, 1992; Spearman, 1904). In mathematics, there is evidence that g was highly correlated with math ability outcomes as far back as the early 1900s (Spearman, 1904). Current research indicates a positive correlation between assessments of intelligence and those of math proficiency, and thus problem solving (Fuchs et al., 2006; Hart, Petrill, Plomin, & Thompson, 2009; Mannamaa, Kikas, Peets, & Palu, 2012). The correlations found in studies like these demonstrate a consensus that general intelligence affects individuals as they complete any cognitive task; further, the completion of a cognitive task is, at its core, a type of problem solving. However, in mathematics, a well-developed consensus does not exist about the degree to which g might uniquely influence high-stakes academic outcomes such as state assessments of math problem solving for the average student.

Some researchers believe that traditional general intelligence tests using language (verbal or written) are not sensitive to diverse populations (Naglieri & Das, 2002). Nonverbal general ability assessments have emerged as tools that, according to their authors, researchers can use to measure innate problem solving ability (g) for all subjects regardless of diverse background or native language (Naglieri, 1997; Naglieri, 2008; Raven & Raven, 2003; Wechsler, 1999; Wechsler & Naglieri, 2006).
In correlational studies, researchers demonstrated a correlation between outcomes on nonverbal measures of general ability and outcomes on state assessments and other math and reading measures (Fuchs et al., 2005; Fuchs et al., 2006; Naglieri & Ronning, 2000). The correlations found using these types of measures lend support for researchers to further investigate the influence of general intelligence on academic outcomes, particularly in math, while including subjects who represent diverse populations. With more reliable information regarding the potential link between general ability and math outcomes for all students, researchers and educators can make better decisions as they continue to answer the question of what it means to be proficient in mathematics. By understanding the amount of variance on state math tests that can be attributed to g, teachers will be able to make better instructional decisions to support struggling students in math, and researchers will be able to craft more reliable assessment tools to measure proficiency.

The influence of reading. Another potentially influential variable to consider is reading ability. As outlined by Haladyna and Downing (2004), reading is a skill that often is more important than it should be in assessments of math problem solving or other content areas. The impact of reading on mathematics outcomes has been documented from several different angles. For example, Abedi, Lord, Hofstetter, and Baker (2000) found that linguistic modification of math items in assessment decreased the gap between language minority and language majority students. Helwig, Rozek-Tedesco, Tindal, Heath, and Almond (1999) drew similar conclusions. In that study, students were given one portion of a math assessment in paper-pencil format and the other portion in video format. Student scores were higher when math problems were presented by video without the requirement of reading. Tindal, Heath, Hollenbeck, Almond, and Harniss (1998) conducted a study using read-aloud as an accommodation for math assessments and found that students performed better when the reading task was eliminated. Both studies highlight reading ability as a basic access skill in mathematics for all students, including those representing diverse populations.

One would expect that a disfluent reader would do poorly on a measure of MAZE reading (a short measure of reading comprehension); however, it is less obvious that MAZE measures would positively correlate with measures of math. Various researchers have demonstrated a positive link between outcomes on MAZE reading measures and math measures of problem solving (Jiban & Deno, 2007; Thurber, Shinn, & Smolkowski, 2002; Whitley, 2010). The correlations for MAZE and state testing outcomes were larger than typical in each study and were stronger in the upper grades (fourth and fifth grade) than in the third grade. Additionally, Jiban and Deno (2007) noted that the MAZE task and a task of calculation together accounted for much variance in state testing outcomes. These results are evidence that success on math assessments might be influenced to some degree by reading comprehension proficiency, particularly in the upper elementary grades. Whitley (2010) found that the correlation between a measure of oral reading fluency and state outcomes in math was nearly the same as the correlation found between a MAZE measure and state testing outcomes in math.
Crawford, Tindal, and Stieber (2001) also found moderate correlations between oral reading fluency measures and math achievement. They demonstrated that students who had very low reading fluency were much more likely not to pass the state exam than those who were proficient readers. These two studies highlight the utility of a one-minute measure of reading for the prediction of math outcomes; however, Jiban and Deno (2007) argue that this type of measure should be used as only one piece of information to help determine, interpret, and/or predict future outcomes in math (emphasis added). They demonstrate that single measures do not account for as much variance as does a combination of measures when predicting future success. Rutherford-Becker and Vanderwood (2009) reported that measures of arithmetic fluency and measures of reading comprehension predicted an applied math outcome better than a measure of oral reading fluency alone. In this study, as in several regarding comprehension variables discussed previously, the subjects were in upper elementary school. A clear consensus regarding oral reading fluency is that as student reading ability grows, which is the case in the later elementary years, oral reading fluency becomes a less valuable predictor of outcomes than measures of comprehension (Fuchs & Fuchs, 1993; Silberglitt, Burns, Madyun, & Lail, 2006). Based on this information, fluency measures may provide the most useful predictive information for teachers and researchers regarding proficiency if subjects are in grade three or below. This also tends to be the time in school when early intervention and identification of special supports for students are most often first implemented.

The described studies represent a foundation for the belief that content-free problem solving, often measured by non-verbal ability tests, and reading proficiency, often measured by oral reading fluency probes, may be influential construct irrelevant variables on math problem solving outcomes as measured by state math tests, particularly in the early grades. Additional construct irrelevant variables such as student demographic characteristics are also important to recognize and consider when evaluating math outcomes; they are further described in the following section.

Identification of Student Demographic Construct (Irrelevant) Variables in Math Problem Solving

Other variables that may affect outcomes for students include gender, poverty, language facility, and disability. Each variable presents unique, yet related, considerations. These factors are important for teachers to consider largely because they cannot be influenced by instruction. It is important for teachers and researchers to understand their influence so that assessments can be designed to limit it and truly measure aspects of learning over which teachers have direct control.

Gender. Although there is evidence that the lack of women in Science, Technology, Engineering, and Mathematics (STEM) fields is a current reality, it does not appear to be due to a lack of assessment achievement in mathematics by females (Beede et al., 2011; Hyde, Lindberg, Linn, Ellis, & Williams, 2008; Scafidi & Bui, 2010). Using state assessment data from several states, in both 2008 and 2010, researchers demonstrated that girls and boys performed relatively equally on measures of mathematics achievement (Hyde et al., 2008; Scafidi & Bui, 2010).
Further, Scafidi and Bui demonstrated that this performance was not moderated by participation in other special population categories (ethnicity, ELL, etc.). These studies were conducted in both middle and high schools. Despite the lack of evidence for an actual difference in state content assessment performance between males and females, gender seems to be a significant factor related to educational experience, and likely instruction, because boys typically are overrepresented in special education categories. According to Wehmeyer (2001), this could be due to several factors, including biology, behavior, or bias. Wehmeyer conducted a study involving students who initially qualified for special education. Naturally, most of these students were of elementary age at initial referral. Results indicated that IQ differed significantly between males and females (females scoring slightly lower) and that males most often had behavioral issues associated with their referral to special education. The study also indicated that the behavioral factors might have created bias toward a higher rate of special education referral. From this information, it appears that gender may co-vary with performance indicators such as g as well as non-performance indicators like special education eligibility. It may be an influential, yet construct irrelevant, factor to consider when interpreting academic outcomes for students.

Poverty. Another demographic variable of influence in educational success is poverty. As a group, students who experience poverty are at increased risk for failure on educational outcome measures. According to a meta-analysis conducted by Sirin (2005), socio-economic status (SES) was positively correlated with outcomes on academic measures. Specifically in the area of math, the correlation was very high when compared to the correlations between SES and outcomes in other academic areas such as reading or science. Other researchers have demonstrated that the influence of poverty on math achievement is significant, especially in the early years (Burnett & Farkas, 2009). This may support the belief that once students are exposed to curriculum and good teaching, math deficits can be outgrown. Another study, though, noted that although students may progress once exposed to teaching, they might not ever catch up to their peers who have higher socio-economic status (Jordan, Kaplan, Olah, & Locuniak, 2006). The federal government identifies students living in poverty (i.e., low SES) as a focus group that receives additional educational resources through Title I due to consistently low performance on academic assessments as compared to students of average wealth. These resources are intended to support additional teachers and materials to deliver instructional interventions that diminish the negative impact of poverty. Again, poverty is unrelated to the construct of mathematics, but it appears to be a variable of influence on assessment.

Limited English proficiency. In a similar way, language proficiency correlates highly with state testing outcomes. Testing for students who are learning English has been an area of increased focus during the past 15 years. The Individuals with Disabilities Education Act (IDEA) of 1997 required that all students be included in state testing programs. Jamal Abedi has been a prominent voice in the literature on English Language Learners (ELLs) for several years.
He argued that by using testing results from current assessment systems, educators are in jeopardy of making decisions that have detrimental consequences for this population (Abedi, 2006). For example, reliability and validity information is greatly affected by the fact that typical state assessments are not normed for this particular population. Therefore, assessments may not fairly reflect the abilities of students who are learning English. When this happens, educators might make decisions that are unfair for this group of students based on an inaccurate understanding of proficiency.

Linguistic difficulty of assessments is another feature that impacts ELLs more than native English speakers. This has been documented in several ways. Abedi (2006) describes features such as long phrasing, complex sentences, unfamiliar vocabulary, and conditional clauses, among others, that present unnecessary and unfair negative bias for ELLs. In one study (Abedi, Lord, Kim-Boscardin, & Miyoshi, 2000), researchers presented an assessment in different formats, including one format with a dictionary and another with a translation of the text. They found that ELLs did much better when they had supports for language than without. Additional studies in which researchers modified the language highlighted the alterations as supportive for ELLs (Abedi & Lord, 2001; Abedi, Lord, & Hofstetter, 1998; Abedi, Lord, & Plummer, 1997). Overall, it appears that ELLs are affected differently than proficient speakers of English in their ability to demonstrate proficiency in mathematics on assessments.

Special education. One reason that the ELL subgroup has gained attention is that students who are learning English are often overidentified in special education programs (Sullivan, 2011). In general, both groups (ELLs and students in special education) demonstrate deficits in reading and writing skills when compared to English-speaking peers or those without disabilities (Garcia & Tyler, 2010). It is possible that because reading is an access skill to mathematics assessments, both groups also demonstrate lower scores in math. In this way, a danger exists that an uninformed teacher may believe a student has a learning disability in math when, in reality, he or she may be having difficulty with language more than content.

In addition to reading disabilities, math disabilities or cognitive impairments are conditions that are likely to negatively affect outcomes on math assessments. It is also important to recognize that not all handicapping conditions pose a threat to state assessment outcomes; these include conditions such as orthopedic impairments and articulation concerns. However, regardless of student characteristic or exceptionality code, students within these subgroups experience differences in resource allocation, scheduling, and peer interactions compared with students of the majority in schools. Like poverty, special education eligibility may also be related to other performance indicators (like reading ability) or other non-performance indicators like gender. It is possible that, like gender, special education eligibility may even co-vary with other variables. In fact, it is likely that all of the construct relevant or irrelevant variables interact in different ways. Within the literature, difficulties with working memory, processing speed, attention, and phonological skills are also noted as correlates of math disabilities (Fuchs et al., 2005).
Other variables may influence student outcomes and be of interest to explore; however, because of the methodological confines of this study, described in the next section, special education eligibility, along with the other student demographic variables and performance indicators already identified, was one of the only variables investigated intently. As with the other student characteristics, special education eligibility appears to be a construct irrelevant variable to further investigate and consider when making decisions for students. It is important for teachers to be thoughtful when using state assessment data for any student within a special population group. With careful consideration, they can more accurately interpret student assessment results and make sound decisions regarding instruction or intervention needs for students.

The Quantification of Construct Relevant and Irrelevant Variables in Math Problem Solving

In order to monitor learning fairly, it is important that monitoring tools be free from bias, be equitable to all groups, and produce equal scores for groups that should be equal (AERA, APA, & NCME, 1999). Several researchers over the past 20 years have devised different ways to reach this goal (Abedi & Lord, 2001; Abedi, Lord, & Plummer, 1997). Although attempts to accurately measure math problem solving by reducing CIV are supportive for students, the fact remains that elimination of CIV is unrealistic for large-scale tests, specifically state exams measuring complex constructs like math problem solving through the medium of language. For this reason, it is worthwhile for researchers to identify and quantify relevant and irrelevant variables that may influence outcomes. This way, educators can make more informed decisions regarding test scores, instructional implications, and resource allocation to support students toward the end goal of college and career readiness.

Based on the literature previously described and the methodological confines outlined in the next section, three measurement variables may be important to consider in math problem solving: (a) content knowledge, (b) general ability, and (c) reading ability/language facility. According to the authors of each assessment, these variables can be measured using three different measures: (a) the easyCBM Mathematics Assessment (easyCBM-math), (b) the Naglieri Nonverbal Ability Test-Second Edition (NNAT2), and (c) the Oral Reading Fluency component of the Dynamic Indicators of Basic Early Literacy Skills (DORF). Outcomes on these measures can be compared to outcomes on the Oregon Assessment of Knowledge and Skills (OAKS-math) to determine the variance in math problem solving outcome scores that can be explained by construct relevant and irrelevant performance variables.

Content knowledge. As defined by Alonzo, Tindal, Ulmer, and Glasgow (2006), the easyCBM-math assessment tests math content using universal design for assessment (UDA) principles described by Ketterlin-Geller, Alonzo, and Tindal (2004). UDA is an ideal that promotes accessible assessment for all by reducing the influence of any external factors (environment, disability, etc.) that may act as barriers to access and outcomes. In essence, UDA diminishes CIV in assessment. UDA options used on easyCBM-math include increased white space on the page, fewer answer options, and read-aloud options. For example, the easyCBM-math in third grade has an average of 3.9 words per question on the first 20 questions, and each question has only three possible answers.
This is noticeably different from the practice questions on OAKS-math in third grade. The state example test averaged 15.8 words per question, and each question had four possible answer options. Figures 1 and 2 (see Appendix A for Figures 1-4) visually demonstrate the differences in language load between the two assessments. In technical reports, researchers demonstrate that the easyCBM-math assessment is technically reliable for students in both special and general populations (Nese et al., 2010).

Non-verbal, content-free general problem solving ability. According to Naglieri (2008), the NNAT2 is a measure of general ability that requires no language. Each question is presented in a pictorial format without any words on the page. Figure 3 displays what is presented to students during the testing session. As defined by Naglieri (2011), the NNAT2 is designed to measure general problem solving ability for students who have limited language to the same level of accuracy as their language-proficient counterparts. While easyCBM-math can measure content-embedded problem solving, the NNAT2 can help to quantify the concept of general problem solving that is not content specific (as defined by the authors). This understanding can help to distinguish between the influence of problem solving as a skill and problem solving as an ability. As with easyCBM-math, technical reports by Naglieri note that this measure is reliable for use with special populations.

Reading fluency. According to Good, Gruba, and Kaminski (2009), DORF is a measure of oral reading fluency. Each measure consists of a passage of approximately 240 words from which a student reads aloud for one minute. Although construct irrelevant, reading is an important access skill in math problem solving assessment, and as discussed earlier, researchers have demonstrated a correlation between comprehension measures and outcomes on math assessments (Jiban & Deno, 2007; Thurber et al., 2002; Whitley, 2010). In fact, researchers consider DORF a very accurate predictor of future success on both reading assessments and measures of comprehension (Center on Teaching and Learning [CTL], 2012). DORF measures can help to determine how much influence reading proficiency (or lack thereof) can have on outcomes of math problem solving. Knowledge of this influence can help teachers make good decisions regarding instructional supports for students who are struggling in math, particularly those representative of special populations who may also be struggling with the access skill of reading.

Demographic variables. Teachers can make instructional decisions to support all students on their path toward college and career readiness when they more fully understand the influence variables have on math assessments. When these influential variables can be altered through teaching, teachers can immediately alter instruction and therefore change the course of success for students. Sometimes, however, these variables cannot be altered through teaching.

Although demographic variables are static, it is also important to quantify their influence on math assessment outcomes, particularly for variables that impact special populations. Ideally, score variance on subject area assessments should be explained by predictive variables that can be altered by instruction. This would highlight the belief that instruction in a subject area is important, influential, and can change outcomes such as those in mathematics.
However, if additional variance in assessment scores is explained by demographic characteristics that are not altered by instruction, this may suggest that the outcome measure actually assesses something other than, or in addition to, the learning that occurs in the classroom. This would be important information for teachers and researchers to consider when making any decisions regarding proficiency for students within special populations. If this occurs, it would also be interesting to know to what degree the unique variance explained by each independent measure changes when demographic information is considered. If little or no change occurs, the influence of the various measurement variables (i.e., content, g, and fluency) would continue to be important to consider, regardless of any additional special student factors.

Possible outcomes. Examining each independent variable related to math problem solving assessment can help researchers better understand the various influences on that assessment and thus better understand the construct of math problem solving itself. Figure 5 (see Appendix B for Figures 5-6) shows a graphic representation of possible relations between these measurement variables.

Any two variables may have a certain degree of correlation; however, in order to better recognize the construct or constructs represented in a math problem solving assessment like OAKS-math, one must consider more than correlations. Variance partitioning is one way to recognize variance explained uniquely and commonly by independent variables within a regression. This type of analysis can also clarify the similarities that exist among the various independent variables. Figure 6 shows a representation of this method of analysis.

For example, if students score high on the NNAT2, low on easyCBM-math, and high on OAKS-math, this may suggest that the quality shared by general problem solving ability as defined by the NNAT2 and math problem solving as defined by OAKS-math is much more notable than the quality shared by problem solving as defined by OAKS-math and content knowledge as defined by easyCBM-math. Conversely, if students score poorly on the NNAT2, high on easyCBM-math, and high on OAKS-math, content may be more similar to what is measured on OAKS-math than what is measured by the NNAT2. If students do poorly on DORF, perform well on the NNAT2, score in the average range on easyCBM-math, and do poorly on OAKS-math, this may indicate that reading ability is highly influential on OAKS-math outcomes. Further, it may indicate that reading/language instructional supports may be more critical for students who have this need than instructional supports in math content. If students score high or low on all tests, it may mean that all of the tests are equally similar, or that one or two in particular are highly similar to what is measured on OAKS-math and this similarity overshadows the dissimilar third variable. Other combinations of outcomes may reveal other important information related to the construct or constructs represented in a math problem solving assessment, and variance partitioning could help to determine specific influences on outcomes.
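To make the idea of unique and common variance concrete, the sketch below works through a three-predictor commonality analysis in Python on simulated scores. The variable names (cbm, nnat, dorf, oaks) merely stand in for the actual measures, and every coefficient is invented for illustration; this is a minimal sketch of the partitioning logic, not the analysis code used in this study.

import numpy as np

# Simulated stand-ins for easyCBM-math, NNAT2, DORF, and OAKS-math scores.
rng = np.random.default_rng(0)
n = 913
shared = rng.normal(size=n)
cbm = shared + rng.normal(scale=1.0, size=n)
nnat = shared + rng.normal(scale=1.2, size=n)
dorf = 0.5 * shared + rng.normal(scale=1.5, size=n)
oaks = cbm + 0.8 * nnat + 0.4 * dorf + rng.normal(scale=2.0, size=n)

def r2(*cols):
    # R-squared from an ordinary least squares regression of oaks on cols.
    X = np.column_stack([np.ones(n), *cols])
    beta, *_ = np.linalg.lstsq(X, oaks, rcond=None)
    return 1 - np.var(oaks - X @ beta) / np.var(oaks)

r_c, r_n, r_d = r2(cbm), r2(nnat), r2(dorf)
r_cn, r_cd, r_nd = r2(cbm, nnat), r2(cbm, dorf), r2(nnat, dorf)
r_cnd = r2(cbm, nnat, dorf)

# Unique components: what each measure adds over the other two.
u_c, u_n, u_d = r_cnd - r_nd, r_cnd - r_cd, r_cnd - r_cn
# Pairwise common components (standard three-predictor formulas).
c_cn = r_cd + r_nd - r_d - r_cnd
c_cd = r_cn + r_nd - r_n - r_cnd
c_nd = r_cn + r_cd - r_c - r_cnd
# Variance shared by all three measures.
c_cnd = r_c + r_n + r_d - r_cn - r_cd - r_nd + r_cnd

parts = {"unique cbm": u_c, "unique nnat": u_n, "unique dorf": u_d,
         "common cbm/nnat": c_cn, "common cbm/dorf": c_cd,
         "common nnat/dorf": c_nd, "common all": c_cnd}
assert abs(sum(parts.values()) - r_cnd) < 1e-9  # components sum to total R^2
print({k: round(v, 3) for k, v in parts.items()})

The seven components always sum to the full-model R², so the partition shows how much of the explained variance is attributable to each measure alone and how much is shared across measures, which is exactly the distinction the score patterns above gesture toward.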
In order to do this, I will use a sequential multiple linear regression with commonality analysis to answer the following two research questions: 1. How much unique and common variance in math problem solving as defined by OAKS-math scores can be accounted for by measures of content-embedded problem solving as defined by easyCBM-math, content-free problem solving as defined by the Naglieri Nonverbal Ability Test (Second Edition), and reading fluency as defined by Dibels Oral Reading Fluency? 2. Is any additional variance explained once student demographic characteristics of gender, FRL status, ELL eligibility, and special education eligibility are controlled, and if so, how does the unique variance accounted for by each performance measure change? 26 CHAPTER II METHODOLOGY As a requirement of the Doctorate of Education program, researchers were required to use extant data sets. Students in the D. Ed. Program intend to be practitioners in the field. As such, extant data are used to answer questions of practice in education. For the purposes of this study, the construct relevant and irrelevant performance and demographic variables related to math problem solving assessment were confined to indicators found within the literature that were also available within the scope of daily data collection at a school district level. This study included extant data from one district in the Pacific Northwest. As previously indicated, performance indicators could have included several other interesting variables such as attention/memory ability or achievement in multiple content areas that may have included reading, math, writing, executive functioning, etc. Additionally, student demographic characteristics could have included school movement, early school entry, instructional tier, instructional grouping, or attendance rates. All of these variables may be of interest for future similar studies; however, based on the literature reviewed and the extant data that were accessible, the analyses in this study included data from four assessment measures (easyCBM-math, DORF, NNAT2, and OAKS-math) and four demographic factors (gender, FRL status, ELL status, and IEP status) gathered during the spring of 2011 and the spring of 2012. According to the authors of each specific assessment, easyCBM-math is a measure of content-embedded problem solving, DORF is a measure of reading fluency, NNAT2 is a measure of general problem solving ability, and OAKS-math is a measure of math problem solving. Each 27 measure is explained in more depth in the materials section. Specific information regarding setting, participants, curriculum, measures, procedures, and analyses will be described in the following sections. Setting The participants in this study attended school in a mid-sized school district in the Pacific Northwest. The district is located in a community representing a large geographical area of over 16,000 square miles and a population of nearly 158,000 residents. It supports over 16,000 students in 27 different schools. In this community, there are 16 elementary schools, 5 middle schools, one K-8 school, and five high schools. The community is rapidly growing and the unemployment rate was 12.0% in December 2011. Participants The participants in this study included all students in the district who took the third grade OAKS-math test during the spring of 2012 and as second grade students, took easyCBM-math, the NNAT2, and DORF during the spring of 2011. This describes 913 students. 
During the 2011-2012 school year, there were 629 males (49.0% of the population) and 654 females (51.0% of the population) in the second grade. Students who identified as Caucasian represented 85.9% of the population. The next highest majority group was represented by those who identified as Hispanic or Latino, (10.2%), while other ethnic groups made up the remaining 3.9%. The district Talented and Gifted (TAG) percentage was equal to approximately 7% of the student population at the time of this study. 28 In the reviewed literature, demographic characteristics of gender, socio-economic status (represented by Free or Reduced lunch qualification, “FRL”), English language learner status (ELL), and special education eligibility (IEP) represent subgroups that are impacted in different ways by academic assessments. Because of this, these groups were specifically investigated. For this study, students were considered part of the FRL and IEP subgroup if they had qualified for the subgroup at anytime during either spring of second grade or spring of third grade since this categorization would entitle students to differences in service allocation and support during their time of qualification. Students who were considered ELL students had at least one score of 4 or less on the English Language Proficiency Assessment (ELPA) during either spring of second grade or spring of third grade. Those scoring at level 5 were grouped as if they did not qualify for services because they did not receive specific supports or instruction for needs in English Language Development (ELD) during the time of the study. Once all cases that did not have complete data (i.e. all four test scores) were excluded, the study sample contained 913 valid cases. Excluded students (i.e. missing a score for any measure) were investigated in order to determine any common characteristics. Possible reasons for exclusion included a lack of access to the assessments due to very low cognitive ability or absenteeism during a testing window. Specific information regarding students who were not included is displayed in the data collection and subject selection section. Curriculum In the spring of 2009, the district adopted the Bridges In Mathematics (Bridges) curriculum published by the Math Learning Center. The students in this study were 29 exposed to the Bridges curriculum during second grade. In kindergarten and first grade, teachers used the previous adoption of Investigations in Number, Data, and Space curriculum published by Pearson Education, Inc. According to a district official, during the 2010-2011 school year the district had not yet implemented district-wide interventions for mathematics and the district focus was on implementation of the general classroom curriculum with fidelity (L. Nordquist, personal communication, February 7, 2012). During the time of this study, school district agreements existed regarding time spent in mathematics instruction. In the elementary schools, all students in all-day kindergarten through grade five participated in 60 minutes of math instruction with an additional 15 minutes for Number Corner (another component of the Bridges program) each day. These instructional agreements began along with the new math adoption in the fall of 2010. Prior to this, agreements did not exist regarding time spent in direct subject instruction. 
Materials The students in this study took (a) a content-embedded test of mathematics problem solving and skill with limited language (easyCBM-math), (b) a non-verbal measure of content-free general problem solving ability (NNAT2), (c) a measure of the access skill of oral reading fluency (ORF), and (d) the third grade OAKS-math. Students took the first three tests in the spring of second grade and the latter in the spring of third grade. Each assessment was developed by different groups of researchers within the past six years. All of the assessments have established reliability and validity described in 30 detail in the following sections. Researchers used them in previous studies regarding achievement outcomes with the exception of the NNAT2. The first edition of the NNAT, a paper-pencil test, has only been used in one study relating to math achievement to date. easyCBM-math second grade spring benchmark assessment. All students within the sample took a measure called the easyCBM-math second grade spring benchmark, which according to the authors is a measure of content-based problem solving with a light language load (Alonzo, Lai, & Tindal, 2009). Teachers use this measure, like other curriculum-based measures, to identify students who are at risk for educational failure in the area of mathematics. According to Nese et al. (2010), this assessment represents an adequate general outcome measure of math content knowledge and typically correlates well with large-scale math assessments. In the district, building testing coordinators conduct the assessment three times each year via computer in grades one through eight. Although available in the fall, testing coordinators typically administer the kindergarten assessment beginning in the winter and then again in the spring. For the purposes of this study, participants were included if they were second grade students in the spring of 2011 and completed the second grade spring benchmark assessment. Although the easyCBM measure has language, researchers designed the assessment so that the language impact is minimal (Alonzo et al., 2009). As previously described, it uses UDA principles that are shown to be supportive for all students. There are fewer words on this test than on a state math assessment (average 3.9 vs. 15.8 per question). Additionally, white space is increased and language is simplified (see Figure 1). 31 The test consists of 45 multiple-choice questions related to the focal point standards for each grade outlined by the National Council of Teachers of Mathematics (Alonzo et al., 2006). In Oregon, state standard developers used the focal points as a primary document when creating the Oregon state content standards in mathematics (ODE, 2012). Therefore, although the easyCBM math measure is not directly written to match the Oregon state standards, the content is generally the same in terms of focus and emphasis. To investigate technical adequacy of the easyCBM math measures, researchers used split-half reliability and Cronbach’s alpha to represent internal consistency reliability. Using a sample of 283 subject responses, Cronbach’s alpha was .82. The split- half coefficient was .79. This demonstrates that the easyCBM math measure has adequate reliability as a measure for which it is described (Anderson et al., 2010; Anderson, Alonzo, & Tindal, 2010a; Anderson, Alonzo, & Tindal, 2010b). Anderson et al. (2010a, 2010b) also reported on criterion and content validity evidence for easyCBM math. 
In order to demonstrate criterion validity, researchers determined the relation between the easyCBM math questions and math questions on the TerraNova assessment. In second grade, the three easyCBM math benchmark measures accounted for 66 percent of the variance in the TerraNova score. This was statistically significant. The relation between the spring benchmark score and the TerraNova math score demonstrated concurrent validity. The spring benchmark score accounted for 51 % of the variance in the TerraNova math score (again, statistically significant). To demonstrate content validity, the researchers conducted a Rasch analysis and a confirmatory factor analysis (CFA). All but six spring items had a mean square outfit 32 between 0.7 and 1.3, with most between 0.8 and 1.2. This is considered adequate content validity for high-stakes test items. Based on the unique qualities of this assessment described by the authors, for this study, the second grade spring benchmark form was used as a measurement of content- embedded problem solving with limited language influence. Because it represents mathematical content, it is thought to be construct relevant: related to the construct of mathematical problem solving. NNAT2 second grade spring assessment. The NNAT2 is a measure free of language. This assessment is described as a measure of general problem solving ability. As described on the publisher’s website, the test “uses progressive matrices to allow for a culturally neutral evaluation of students’ nonverbal reasoning and general problem- solving ability, regardless of the individual student’s primary language, education, culture or socioeconomic background” (Pearson, 2012). The author notes that prerequisite skills are not required for the assessment (Naglieri, 2011). The assessment system has seven different levels; however, each student within this study was given test level C. Levels A-G loosely correlate with the grade in which the student receives instruction. An example of a question on the NNAT2 is shown in Figure 3. The computer generates scores in several different formats. These include stanine score, percentile rank, ability index (standardized score), and a scaled score. For this study, the ability index was used. A sample student report is attached (see Figure 4). According to the information regarding updated norms (Naglieri, 2011), researchers used split-half reliability and Kuder-Richardson Formula 20 (KR20) to evaluate internal consistency. Using a sample of 99,004 subjects, the mean was 100.0 33 with a standard deviation of 16.0. The split-half coefficient was .90 and the KR20 was .88. The Standard Error of Measurement (SEM) was 5.22. This demonstrates that NNAT2 has adequate reliability as a measure for which it is described. The manual also refers to studies of validity. Researchers correlated mean scores from the previous version of the assessment to those of the NNAT2. The correlation between tests was .998, indicating a very high level of performance consistency across measures. Further, researchers conducted a correlation between the NNAT2 and the Wechsler Nonverbal Scale of Ability (WNV) (Wechsler & Naglieri, 2006) using a subgroup of gifted students who were part of the updated NNAT2 norms study. The 2011 Naglieri Ability Index (NAI) scores were highly correlated to the WNV Full Scale scores and T-scores (.74). 
There was also a high correlation between the NNAT2 and Matrices indicating the measurement of similar constructs between tests (Balboni, Naglieri, & Cubelli, 2010). It is important to note that Naglieri himself conducted the technical adequacy studies for this measure so the claims should be interpreted with caution. However, this measure is an unusual performance indicator rarely accessible to researchers on a large scale. Most often, measures of general intelligence or general ability are only administered to students who are part of an evaluation for specialized services. In this particular district, this assessment is used as one measure to identify students with intellectual giftedness and as such, it is given to every second grade student near the end of the year. Because of this situational convenience, the results of this measure can be further used to investigate the influence of content-free general problem solving ability on assessment outcomes. 34 For this study, the NNAT2 was used as a pure measure of problem solving ability unrelated to content. This is thought to be one of the two major concepts that underlie proficiency on state assessments, the other skill proficiency. Because this measure does not represent mathematical content, it could be considered construct irrelevant. However, because this study examines mathematical problem solving as it relates to assessment, this test highlights problem solving as a potentially influential construct relevant variable of interest. Study analysis will help to determine the relevant influence of pure problem solving on mathematical problem solving assessment outcomes. DORF second grade spring benchmark assessment. A DORF measure was used to determine the potential influence of the construct irrelevant variable of reading ability on math outcomes. Although only a measure of fluency, researchers have demonstrated that this brief measure alone accounts for as much variance on reading performance outcomes as multiple reading measures combined (CTL, 2012). Further, other researchers have demonstrated a relation between reading performance and math performance on state assessments (Jiban & Deno, 2007) that may be due to the importance of this skill for access to large-scale assessments. The DORF benchmark measure is administered three times each year in the district. The benchmark assessment consists of three one-minute passages (approximately 240 words long) that a student reads. The tester reads scripted directions before and during each administration. For this study, I used the highest score out of the three passages delivered in the spring. Researchers used alternate form, test-retest, and inter-rater reliability to represent DORF as a reliable tool to measure reading fluency. The coefficients of .96, .91, and .99 35 represented very high reliability. The concurrent validity coefficient was .73. This was significant at the .001 level, and was a large effect size. For this study, the spring benchmark measure in second grade was used as a representative measure of reading proficiency that is consistently shown to be an influential variable (although construct irrelevant) on outcomes in math assessments, both predictively and concurrently (Lamb, 2010). OAKS-math third grade assessment. Teachers administer the OAKS-math during the spring of each school year beginning in third grade. This assessment is designed to measure proficiency in the area of mathematics related to third grade standards. 
On this assessment, students use skill efficiency and content comprehension to understand and solve problems in mathematical situations. For this study, this is the operational definition used for math problem solving and this test was the measure that represented the multi-faceted construct. Students were able to take the measure up to three times during the test window. For this study, I used the highest score received on the assessment during the testing window. Researchers continue to reexamine reliability and validity information for this assessment. According to an official from the Oregon Department of Education, they use standard error of measurement for reliability evidence and they explain test development practices for validity evidence. According to this official, the assessment is technically adequate to measure mathematics proficiency in the area of problem solving (S. Slater, personal communication, September 4, 2012). In this study, this assessment was used as the dependent variable representing mathematical problem solving in assessment. 36 Procedures The district had many assessment training and administration protocols in place to support the best possible testing opportunities for students. Because the district is quite large, each building used a testing coordinator to support the administration of assessments at the sites. The assessment administration and training procedures used for each test are described in the following sections. Assessment administration and training procedures. Each assessment had specific instructions that were followed as part of a district training given to site testing coordinators. Many of the testing coordinators had delivered each assessment for more than one year and needed few supports; however, the district testing coordinator was available for any additional questions regarding administration. The district testing coordinator also opened and closed the benchmark window for each assessment. For easyCBM-math, it was expected that testing coordinators in each school read the easyCBM Teacher Manual prior to administering the assessment. The manual contains answers to many common questions teachers have during testing. Students had unlimited time to complete the measure, although the typical student finished the assessment in approximately 25 minutes. Depending on the building resources, as suggested by the district coordinator, the assessment was delivered either in a lab setting (whole class) or in small groups in the classroom on laptop computers. For this particular test, the assessment window was May 16, 2011 to June 3, 2011. Testing coordinators administered the NNAT2 assessment on the computer to all second grade students during the spring of 2011. All testing coordinators received training on administration protocols by the district assessment coordinator prior to giving 37 the assessment. Assessment coordinators have the option of using Spanish as an accommodation instead of English for their verbal directions if they believe that it would benefit the student. However, because pictorial options are also available it is rare that assessment coordinators use this accommodation. The average assessment takes approximately 30 minutes. This assessment, as suggested by the district testing coordinator, was delivered in either a lab setting or on laptops in small groups. The window for this test was May 2, 2011 to June 10, 2011. 
Testing coordinators participated in DORF trainings prior to the testing window with staff members who administered the measure. A reading specialist, special education teacher, or building testing coordinator conducted these trainings. The focus of the training was to review protocols outlined by the assessment system, to practice using the measure, and to calibrate scores between those who would deliver the assessment. These assessments were delivered individually using paper copies of passages during the testing window from May 16, 2011 to June 3, 2011. Assessment proctors kept scores in assessment booklets for each individual student. Scores were then entered into the DIBELS database from which the district testing coordinator gathered results for each school. The OAKS-math assessment had very strict training guidelines to ensure the secrecy of testing items and to support fair and equitable testing practices. Each school testing coordinator participated in a state-standardized training delivered by the district assessment coordinator. School coordinators then trained teachers in the building who delivered the assessment. Each person involved in the testing completed applicable training modules, read the test administration manual, and signed a test security form. By 38 signing the form, individuals verified that they had completed the module trainings and readings. The only individuals allowed in testing areas were district employees who had completed the training. Trained testers administered the OAKS-math measure, either on laptops within classrooms or in a computer lab. Some students were given the assessment in a small group or individual setting based on accommodations determined by an accommodation team or listed on their Individual Education Plan (IEP) or Section 504 Plan. The assessment consisted of approximately 45 questions and typically took about 75 minutes to complete. This test was most frequently delivered during more than one testing opportunity and students had unlimited time to complete the assessment during the assessment window. The window for this test was November 8, 2011 to May 17, 2012. For this study, the highest score attained during the testing window was used. Data collection and subject selection. In this study, extant data were used. The school district initiated the collection of data for each assessment during the window each was conducted and therefore, an exemption from the Independent Research Board was requested. Once approved, the appropriate data set from the district assessment coordinator was requested in early December 2012. The coordinator collected all of the data and converted student names to numbers to protect any sensitive student information beyond the scope of this study. The data set was received in January 2013. In addition to scores of each measurement variable, the data file included demographic information regarding gender, free and reduced lunch eligibility, language proficiency level and special education eligibility. These data were drawn from district records on June 10, 2011 and June 10, 2012. These dates were chosen because all the 39 testing windows were complete. As previously described, students were considered participants in the free or reduced lunch category or special education category if they had participated in the program at any time in second or third grade. Additionally, students who had a score of 4 or lower on ELPA at any time in second or third grade were considered in the ELL subgroup. 
These decisions were made based on access to special services or resource allocations during the time of the study. All information remained confidential using guidelines outlined by the American Psychological Association (APA) to maintain records. As previously mentioned, there were 913 students who took all four assessments. Missing cases were investigated to determine commonalities among non-participants. Information regarding valid cases and those missing for each demographic group and each independent variable is displayed in Tables 1-4. Table 1 Valid and Missing Test Data by Gender Gender Cases Valid Missing n Percent n Percent OAKS Female 541 100.0 0 .0 Male 510 100.0 0 .0 CBM Female 484 89.5 57 10.5 Male 446 87.5 64 12.5 NNAT Female 516 95.4 25 4.6 Male 490 96.1 20 3.9 DORF Female 521 96.3 20 3.7 Male 493 96.7 17 3.3 40 Table 2 Valid and Missing Test Data by Free or Reduced Lunch Status (FRL) FRL Cases Valid Missing n Percent n Percent OAKS No 469 100.0 0 .0 Yes 582 100.0 0 .0 easyCBM No 428 91.3 41 8.7 Yes 502 86.3 80 13.7 NNAT2 No 452 96.4 17 3.6 Yes 554 95.2 28 4.8 DORF No 452 96.4 17 3.6 Yes 562 96.6 20 3.4 Table 3 Valid and Missing Test Data by Special Education Eligibility SPED Cases Valid Missing n Percent n Percent OAKS No 901 100.0 0 .0 Yes 150 100.0 0 .0 easyCBM No 799 88.7 102 11.3 Yes 131 87.3 19 12.7 NNAT2 No 865 96.0 36 4.0 Yes 141 94.0 9 6.0 DORF No 870 96.6 31 3.4 Yes 144 96.0 6 4.0 41 The numbers of missing cases for the easyCBM, NNAT2, and DORF were 121, 45, and 37, respectively. The percentage of missing cases in each demographic group (gender, FRL, IEP, ELL) was nearly the same as the percentage of those missing cases not part of the demographic group. In the original sample of students who took OAKS- math, 50% qualified for free or reduced lunch, 13% qualified for special education, six percent qualified for English Language Learner services, and 48% were male. In the actual sample (including those who took all four assessments), 53% qualified for free or reduced lunch, 14% qualified for special education, six percent qualified for ELL services and 48% were male. These percentages represent 488, 128, 52, and 438 students, respectively. This demonstrates that students who did not take part in an assessment were not markedly different than those who took the assessment in terms of demographic representation. Table 4 Valid and Missing Test Data by ELL Qualification ELL Cases Valid Missing n Percent n Percent OAKS No 992 100.0 0 .0 Yes 59 100.0 0 .0 easyCBM No 877 88.4 115 11.6 Yes 53 89.8 6 10.2 NNAT2 No 949 95.7 43 4.3 Yes 57 96.6 2 3.4 DORF No 956 96.4 36 3.6 Yes 58 98.3 1 1.7 42 Analyses. The analyses addressed the unique and common variance in OAKS- math scores that could be accounted for by three independent performance measures. Measures included (a) content-embedded problem solving, (b) content-free problem solving, and (c) reading fluency. Each measure represented a construct relevant or irrelevant variable of interest in the assessment of mathematics problem solving. First, descriptive statistics for each measure were outlined including correlation coefficients for the measures related to one another. Next, step one of a sequential multiple linear regression was conducted to determine the amount of variance explained in the dependent variable by the independent variables. 
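Both steps of the sequential model described here and in the paragraphs that follow can be sketched in a few lines. This is illustrative only: the file name and column names are hypothetical, and the study's analysis was not necessarily run this way.

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical extant file with one row per student; column names are assumptions.
df = pd.read_csv("district_scores.csv")

# Step 1: performance measures only.
X1 = sm.add_constant(df[["easycbm", "nnat2", "dorf"]])
step1 = sm.OLS(df["oaks"], X1).fit()
print(step1.rsquared)                    # variance in OAKS-math explained in step one

# Step 2: add the demographic (non-performance) indicators.
X2 = sm.add_constant(df[["easycbm", "nnat2", "dorf", "gender", "frl", "ell", "iep"]])
step2 = sm.OLS(df["oaks"], X2).fit()
print(step2.rsquared - step1.rsquared)   # additional variance explained by the block
print(step2.compare_f_test(step1))       # F test of the R-squared change
```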
In order to investigate the explained variance fully, a commonality analysis was then used to partition the variance into that unique to each variable and that common to two or three variables together. These variances were determined using the following formulas:

U(1) = R²y.123 – R²y.23
U(2) = R²y.123 – R²y.13
U(3) = R²y.123 – R²y.12
C(12) = R²y.13 + R²y.23 – R²y.3 – R²y.123
C(13) = R²y.12 + R²y.23 – R²y.2 – R²y.123
C(23) = R²y.12 + R²y.13 – R²y.1 – R²y.123
C(123) = R²y.123 + R²y.1 + R²y.2 + R²y.3 – R²y.12 – R²y.13 – R²y.23

where the numbers represent predictor variables (1 = easyCBM-math, 2 = NNAT2, 3 = DORF) and U/C represent unique and common variance, respectively (Nimon, Lewis, Kane, & Haynes, 2008).
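A minimal sketch of this partition is shown below: it fits each subset regression to obtain the required R² values and then applies the formulas above. The column names are hypothetical, and this is an illustration rather than the code actually used in the study.

```python
from itertools import combinations

import statsmodels.api as sm


def all_subsets_r2(df, y, predictors):
    """R-squared of the regression of y on every non-empty subset of predictors."""
    r2 = {}
    for k in range(1, len(predictors) + 1):
        for subset in combinations(predictors, k):
            X = sm.add_constant(df[list(subset)])
            r2[subset] = sm.OLS(df[y], X).fit().rsquared
    return r2


def commonality(r2, p1, p2, p3):
    """Unique (U) and common (C) variance components for three predictors.

    p1, p2, p3 must be given in the same order used in all_subsets_r2.
    """
    R123 = r2[(p1, p2, p3)]
    R12, R13, R23 = r2[(p1, p2)], r2[(p1, p3)], r2[(p2, p3)]
    R1, R2, R3 = r2[(p1,)], r2[(p2,)], r2[(p3,)]
    return {
        "U1": R123 - R23,
        "U2": R123 - R13,
        "U3": R123 - R12,
        "C12": R13 + R23 - R3 - R123,
        "C13": R12 + R23 - R2 - R123,
        "C23": R12 + R13 - R1 - R123,
        "C123": R123 + R1 + R2 + R3 - R12 - R13 - R23,
    }


# Example (hypothetical column names); the components sum to the full-model R-squared:
# r2 = all_subsets_r2(df, "oaks", ["easycbm", "nnat2", "dorf"])
# parts = commonality(r2, "easycbm", "nnat2", "dorf")
```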
A second step within the sequential multiple linear regression was used to determine if any additional variance was explained once student demographic characteristics were controlled. The unique variance accounted for by the performance measures in the second step was compared to that explained in the first step. These analyses provided information about which variables explained the most variance in performance on a measure of math problem solving (OAKS-math). Additionally, this information provided insight into the represented constructs in math problem solving assessment and the relative importance of each independent variable for success on state outcome measures. Finally, the analyses provided information about the extent to which inherent student demographic characteristics influence outcomes on state assessments in mathematics.

CHAPTER III
RESULTS

Descriptive statistics were calculated for each of the variables in order to determine normal distribution. Correlations were also calculated between all variables. Next, two multiple regression models were run to determine the variance explained by a model including testing variables, followed by a model to determine the additional variance explained by any demographic or non-performance indicator. After the first step in the regression, a commonality analysis was used to determine the amount of variance explained by each variable uniquely, as well as the common variance explained jointly by the variables.

Descriptive Statistics

Descriptive statistics for each variable, as well as intercorrelations, are displayed in Table 5. Each variable had a normal distribution (skewness between -1.0 and 1.0). As the correlation values show, all variables were significantly correlated with OAKS-math scores as well as one another (.36 - .71). OAKS-math scores and easyCBM-math scores were most highly correlated (.71), and NNAT2 ability index scores and DORF scores had the lowest correlation (.36).

Table 5
Means, Standard Deviations, and Intercorrelations for Variables in Math Problem Solving

Variable    M        SD       OAKS     easyCBM   NNAT2     DORF
OAKS        217.30   9.829    ---      .71***    .60***    .51***
easyCBM     36.57    7.478              ---      .58***    .49***
NNAT2       99.75    13.185                       ---      .36***
DORF        106.68   39.471                                 ---

***p < .001.

Analysis One: Performance Measures

A sequential regression was conducted to determine the degree to which each independent construct relevant or irrelevant performance measure predicted OAKS-math scores in third grade (Table 6). After the first step, a commonality analysis was conducted to determine the unique and common variance accounted for by each measure and measures in combination (Table 7).

Table 6
Sequential Regression Analysis Predicting OAKS-math from easyCBM, NNAT2, and DORF

Step and Predictor   B        SE     β      t            R²     Adj. R²   r      sr
Step 1               170.30   1.16          105.35***    .58    .58
  easyCBM            .61      .04    .46    16.22***                      .71    .35
  NNAT2              .20      .02    .27    10.12***                      .60    .22
  DORF               .05      .01    .18    7.33***                       .51    .16

Note. sr = semipartial correlation coefficient. N = 913. ***p < .001.

Sequential regression results indicated that each variable (easyCBM-math, NNAT2, DORF) significantly predicted OAKS-math scores and that together they explained 58.1% of the variance in OAKS-math, F(3, 909) = 419.70, p < .001. Each factor had a positive effect on OAKS-math. For each point increase in easyCBM, an increase of .61 in OAKS-math was predicted, t = 16.22, p < .001, 95% CI [.53, .68]. For each point increase in NNAT2, an increase of .20 in OAKS-math was predicted, t = 10.12, p < .001, 95% CI [.16, .24]. For each point increase in DORF, an increase of .05 in OAKS-math was predicted, t = 7.33, p < .001, 95% CI [.03, .06].

Table 7
Variance Partition of R² = 58.1% with easyCBM, NNAT2, and DORF (N = 913)

U/C         easyCBM (T1)   NNAT2 (T2)   DORF (T3)   R² Partition
U1          12.11%                                   12.11%
U2                         4.71%                     4.71%
U3                                      2.46%        2.46%
C1, 2       15.74%         15.74%                    15.74%
C1, 3       7.13%                       7.13%        7.13%
C2, 3                      0.76%        0.76%        0.76%
C1, 2, 3    15.14%         15.14%       15.14%       15.14%
Sum = r²    50.12%         36.35%       25.49%       --
Sum = R²                                             58.05%

Results from the commonality analysis (variance partitioning) revealed that easyCBM-math and NNAT2 uniquely explained 12.11% and 4.71% of the variance (respectively) in OAKS-math scores. easyCBM-math and DORF jointly explained 7.13% of the variance in OAKS-math. The largest variance partitioning percentages came from the variance explained by all three variables commonly (15.14%) and from the variance explained jointly by easyCBM-math and NNAT2 (15.74%). The lowest variance partitioning percentages came from DORF uniquely (2.46%) and the variance explained jointly by NNAT2 and DORF (.76%). Figures 6 and 7 represent a visual display of the partitioning of variance.

Analysis Two: Measures with Student Demographic Characteristics

To determine if any additional variance was explained once student demographic characteristics were controlled, a second step was added to the sequential regression including variables of gender, FRL, ELL, and IEP (Table 8).

Table 8
Sequential Regression Analysis Predicting OAKS-math from Performance and Non-performance Indicators

Step and Predictor   B        SE     β      t            R²     Adj. R²   r      sr
Step 1               170.30   1.62          105.35***    .58    .58
  easyCBM            .61      .04    .46    16.22***                      .71    .35
  NNAT2              .20      .02    .27    10.12***                      .60    .22
  DORF               .05      .01    .18    7.33***                       .51    .16
Step 2               172.29   1.80          95.87***     .60    .59
  easyCBM            .55      .04    .42    14.49***                      .71    .31
  NNAT2              .20      .02    .27    10.23***                      .60    .22
  DORF               .04      .01    .18    6.76***                       .51    .14
  FRL                -1.37    .46    -.07   -3.00**                       -.33   -.06
  ELL                -.97     .96    -.02   -1.02                         -.25   -.02
  IEP                -.56     .63    -.02   -.88                          -.14   -.02
  Gender             2.02     .43    .10    4.72***                       .14    .10

Note. sr = semipartial correlation coefficient. N = 913. **p < .01. ***p < .001.

Results of the second step of the sequential regression indicated that, controlling for non-performance indicators, the model as a whole was a significant predictor of OAKS-math scores, R² = .60, F(7, 905) = 189.97, p < .001. A closer investigation revealed that although the control of demographic variables added statistically significant predictive power to the model, R² change = .01, F(4, 905) = 7.99, p < .001, only two of the added variables were significantly influential (FRL and gender). Qualification for FRL status had a negative impact on OAKS-math, and males had higher scores (β = -.06 and .10, respectively). Although these variables added predictive power, they explained very little unique variance (sr²) in OAKS-math.
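To put a number on "very little," squaring the semipartial correlations in the sr column of Table 8 (my arithmetic) gives approximately

\[ sr^2_{\mathrm{FRL}} = (-.06)^2 \approx 0.4\% \quad\text{and}\quad sr^2_{\mathrm{gender}} = (.10)^2 = 1.0\% \]

of the variance in OAKS-math scores uniquely attributable to each of these demographic indicators.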
In fact, the control of demographic variables only accounted for an additional 1.4% explained variance in OAKS-math scores. 48 To determine how the unique variance accounted for by each independent performance measure changed once demographic variables were controlled, the semipartial correlation coefficients were squared. These were then compared with the original unique variances attained from the first step in the regression. Results are displayed in Table 9. Table 9 Comparison of Unique Variance Attributed to Performance Variables Before and After Control of Demographic Variables Variance Predictor Step 1 Step 2 Δ Variance Relative Δ Variance easyCBM 12.11% 9.36% - 2.75% -22.71% NNAT2 4.71% 4.67% - .04% -0.85% DORF 2.46% 2.04% - .42% -17.07% Note. Relative Δ Variance = Δ Variance/Step 1 Variance In all cases, the unique variance accounted for by each independent performance variable decreased when demographic variables were controlled. The variance accounted for by easyCBM-math, which started with the largest amount of explained variance attributed to it decreased the most, both actually and relatively. NNAT2 decreased the least (- .04%). Reduction in uniquely explained variance is to be expected when additional variables are added into a model: the more variables, the less opportunity for uniquely explained variance. 49 CHAPTER IV DISCUSSION This study highlighted specific performance and non-performance variables as influential factors for outcomes on high-stakes assessment measures of math problem solving as defined by OAKS-math. In the first analysis, mathematical content knowledge, content-free problem solving ability, and oral reading fluency were used as independent performance variables. In the second analysis, non-performance variables added to the model were gender, FRL, ELL, and IEP status. In the following sections a summary of outcomes is provided followed by a discussion of the limitations for this particular study. In the interpretations section, this study is compared and contrasted with previous research and important topics regarding the use of predictive measures in assessment and construct definition are discussed. The last section contains a discussion of practical considerations and areas for future research. Summary The purpose of this study was to provide additional research on the underrepresented topic of construct validity in large-scale assessments: specifically construct relevant and irrelevant variance as it relates to the assessment of math problem solving. To do this, a sequential multiple linear regression was conducted to determine the relative predictive nature of various performance variables (both construct relevant and irrelevant) to large-scale math assessment outcomes. This was followed by variance partitioning to further understand the unique variance in OAKS-math explained by each variable as well as the variance explained by characteristics held commonly between variables. Next, another regression was used to examine if by controlling for 50 demographic variables more variance in OAKS-math could be explained. Each analysis was conducted in order to better understand the construct of math problem solving as it relates to assessment. As complex constructs, such as math problem solving, are more clearly defined, decisions regarding use of the assessment results can become more valid. This is of particular importance when high-stakes educational decision-making happens based on assessment outcomes. 
Further, a more complete understanding of systematic error in assessment may allow for better assessment design, thus leading to more accurate assessment results and interpretations (Haladyna & Downing, 2004). Studies such as this also provide a foundation for future investigation of mathematical problem solving and how the construct can best be assessed. The four assessments (easyCBM-math, NNAT2, DORF, and OAKS-math) were strongly positively correlated to one another. Each correlation was significant at the p < .001 level; however, the correlation was strongest between easyCBM-math and OAKS- math (r = .71) and weakest between NNAT2 and DORF (r = .36). This makes sense in terms of construct representation. The easyCBM-math assessment and OAKS-math clearly represent math content knowledge in assessment, while DORF and NNAT2 represent what appear to be two relatively different constructs: reading fluency and non- verbal problem-solving (Good et. al., 2009; Pearson, 2012). It is inappropriate and beyond the scope of this study to comment on causation among these variables; however, the significant correlations between and among each performance variable indicate that students who do well on one of the assessments will likely do well on another, regardless of the represented construct. Often, high correlations such as these also pose a threat for multicollinearity, which I will discuss in the following section. 51 Figure 7. Commonality analysis results. This figure illustrates the unique and common variance explained in OAKS-math by three different performance measures. Variance was partitioned using a commonality analysis. U = unique variance, C= common variance. Note. Figure not drawn to scale. As displayed in Figure 7, the results of the first analysis indicated that a large amount of variance in OAKS-math (58.1%) was explained by the three independent measures taken together. Additionally, the unique variance contributed by easyCBM- math (12.1%) was more than that contributed by any other performance variable alone. This finding demonstrates that the uniqueness of easyCBM-math, possibly attributed to mathematical content knowledge (Alonzo et al., 2006; ODE, 2012), is more similar to the 52 construct measured on the OAKS-math than any of the other assessments’ unique constructs. Another interesting finding was that the OAKS-math variance explained by combining measures of content knowledge (easyCBM) and problem-solving ability (NNAT2) was more than any other, unique or common (15.74%). This finding suggests that the quality shared by innate problem solving ability as measured by NNAT2 and content knowledge as measured by easyCBM-math is also a quality foundational to math problem solving as measured by OAKS-math. The other variables (NNAT2 and DORF) uniquely explained smaller amounts of variance (4.71% and 2.74%, respectively). The results of the second analysis indicated that, in general, demographic characteristics did not add much to the variance explained by performance indicators alone. Technically, FRL and gender together accounted for another 1.4% of the variance in OAKS-math scores. While this result is, from a technical standpoint, statistically significant, it is not very interesting. More specifically, the unstandardized beta values for each of the performance indicators did not change from Step 1 of the model to Step 2. 
This indicates that the variance explained by each of the performance indicators was virtually unaffected by the addition of demographic characteristics to the model. Based on this stability, one could conclude with confidence that gender, FRL, ELL, and IEP status have very little impact on outcomes of math problem solving once math content and problem-solving abilities are controlled, which is what one would hope. After all, state assessments should not be measures of demographics. Once demographics were controlled, the unique variance explained by each performance indicator was compared to the variance explained prior to demographic control. The variance explained by each independent measure went down slightly. From a 53 technical standpoint, this result would be expected because whenever more variables are entered into a model, the unique variance attributed to each factor is likely to decrease. The variance attributed to NNAT2 changed the least, while the variance attributed to easyCBM-math changed the most (-0.04% and - 2.75%, respectively). The results of this analysis are more thoroughly interpreted in a following section. Limitations As with any study, limitations to the internal consistency and generalizability exist. These include instructional considerations, mortality, extant data use, demographic representation, grade representation, and statistical conclusion validity. Threats to internal and external validity are outlined in the following sections. Threats to internal validity. A threat to internal validity was instructional controls. For this study, there was no control over the instruction that students received during the year. This threat is important to consider because nearly a year of instruction took place between administration of the independent measures and the OAKS-math test. As explained in Chapter II, the district established instructional agreements for the amount of minimum time that students received instruction in the core mathematics curriculum. Any additional time spent in instruction, including instruction delivered in small groups or individually due to IEP needs, was not investigated. Not only was time in additional instruction not investigated, the quality of instruction due to difference in teachers was not considered. Both of these factors (time and instruction quality) may have affected the results in OAKS-math scores in different ways that without additional study will remain unknown. 54 Threats to external validity. In this study, 1116 students were part of the original data sample. Only 913 subjects had complete data, meaning they completed all four assessments. This means that the mortality for this study was 203 subjects. The demographic characteristics of the missing cases were outlined in previously displayed Tables 1-4. From these analyses, it appears that subjects not included in the study sample were not unlike those included, meaning that there was little evidence to suggest that students were excluded for specific demographic reasons. As minimum criteria, students were only considered to be part of the original data sample if they had taken the OAKS-math assessment in the spring of third grade, as this was the dependent variable. For this reason, there are no missing cases listed under the OAKS-math category. 
Students who did not take part in OAKS-math would undoubtedly be markedly different than those taking the assessment because exclusion from this test most frequently is due to the inability to access the assessment due to extreme educational needs. These students most often qualify for special education and have alternate assessment plans. For each assessment, normal distribution was investigated. Figures 8-11 in Appendix C show normal distributions for each of the variables. Each was relatively normal without skewness or kurtosis issues of concern. However, the loss of scores in each assessment does impact statistical conclusion validity. If fewer students had incomplete assessment scores, the statistics found in the analyses would be more complete. Another limitation is due to the use of an extant data set. Because of the confines of these previously gathered data, I was only able to investigate the influence of the 55 performance and non-performance indicators described in the study. Although this study is relatively small in scope, it does provide a basis for replication using other influential variables. The lack of subject diversity is another threat. The community from which these results were drawn was relatively homogeneous. Based on district information, few students represent ethnic or racial categories different than the Caucasian majority; however, in this study, gender, FRL, ELL, and IEP were the only demographic categories investigated and therefore are the only categories that can be discussed. Of the 913 subjects, 488 students qualified for FRL, 52 qualified for ELL services, and 128 qualified for special education services. In the case of ELL and IEP qualification, these numbers represent only a fraction of the entire population (5.7% and 14%, respectively). Because the numbers were so small, the ELL levels and special education handicapping conditions were not broken into separate categories. With a larger, more diverse sample, the impact of these levels and conditions could have been more thoroughly represented. Another threat to the generalizability of this study is the single grade level focus. For this study, the OAKS-math assessment in third grade was used as the dependent variable. As with all state academic assessments, OAKS-math has questions regarding the knowledge and skills that students should have mastered by the end of third grade (ODE, 2012). The state standards for third grade differ from those in other grade levels. They also differ from standards in other states (Webb, 1999). Future similar studies using common standards should reduce state-to-state differences; however, content mastery will remain different at each grade level. It is also possible that as grade levels increase, the correlations among the independent variables chosen for this study and state outcome 56 assessments at other grade levels may differ. Long-term studies investigating the link between these variables and large-scale assessment outcomes in each grade will help to more completely examine the stability of influence in all grades. Finally, because all measures were highly correlated, multicollinearity could be considered an issue of concern. In typical regression models, this creates a problem because it becomes difficult to determine what variables account for the variance in outcomes. 
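One common diagnostic for this problem is tolerance (or its reciprocal, the variance inflation factor), where the tolerance of predictor j is 1 minus the R² from regressing that predictor on the remaining predictors. As a back-of-the-envelope check using the zero-order correlations in Table 5 (my calculation, not one reported in the study), with r12 = .58 (easyCBM, NNAT2), r13 = .49 (easyCBM, DORF), and r23 = .36 (NNAT2, DORF), the most strongly intercorrelated predictor, easyCBM-math, has

\[ R^2_{\mathrm{easyCBM}} = \frac{r_{12}^2 + r_{13}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{23}^2} = \frac{.58^2 + .49^2 - 2(.58)(.49)(.36)}{1 - .36^2} \approx .43, \qquad \mathrm{Tolerance} = 1 - R^2 \approx .57, \]

well above typical cause-for-concern cutoffs (e.g., .10).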
In this particular study, Tolerance values were greater than .42 for each predictor, so multicollinearity was not an issue; however, this is a problem that is frequently recognized in social science research. One way to minimize this threat is to analyze data using some form of variance partitioning (Zientek & Thompson, 2006). In this study, a commonality analysis was used to support the analysis of predictive and influential variables on OAKS-math outcomes. In the next section, I discuss the use of variance partitioning to support interpretations as well as other interpretations based on the results of this study. Interpretations Educational accountability continues to be a topic of much interest and debate in this country. Each year, district leaders all across the United States work hard to ensure teacher quality and student access to current curriculum and instructional tools in order to support educational learning gains. These learning gains are demonstrated using assessment at the classroom, school, district, state, and national levels. Because high- stakes decisions are made based on assessment results, it is important for teachers and researchers to understand deeply what each assessment truly measures (Haladyna & Downing, 2004). Often, educational assessments purport to measure very broad topics or 57 developing abilities (Messick, 1984). One example of a developing ability is mathematics problem solving. In the following sections, I discuss large-scale issues one might consider in light of the results of this study and how they relate to previous research. First I discuss variables or underlying constructs of particular importance and non-importance in the measurement of math problem solving. Next, I describe the practical use of this information from a formative perspective. Finally, I describe commonality analysis as a useful way to more thoroughly understand variance in high-stakes assessments of complex constructs. Influential and non-influential variables in math problem solving. As far back as 2000, the NCTM outlined domains of mathematical proficiency, each containing specific content knowledge to be mastered in order for one to be a successful mathematician. Since then, researchers have continued to demonstrate the importance of mathematical content knowledge for success on various state assessments in mathematics (Anderson et al., 2010a; Anderson et al., 2010b). In previous studies, easyCBM-math reliably predicted success on OAKS-math and Measures of Student Progress (MSP) over the course of a single school year focused on a specific grade-level set of standards (Anderson et al., 2010a). The results of this study lend additional support to the predictive nature of easyCBM-math; however, the results indicate a predictive quality spanning more than one grade level. This means that not only is mathematical content important for instruction and assessment during the current year, it also has enduring significance. These findings suggest that content knowledge gained at any point in the 58 educational career will likely support more successful outcomes on math problem solving assessments in the future. Based on outcomes from this study and others, problem solving, (g), is indeed influential on outcomes relating to mathematical problem solving (Fuchs et al., 2005; Fuchs et al., 2006; Hart et al., 2009; Mannamaa et al., 2012; Naglieri & Ronning, 2000). 
In each of the reviewed studies, correlations and beta weights were used to demonstrate the relation between general ability and math problem solving. This study adds to the understanding of this link by using variance partitioning. It is noticeable that although g can help to explain much variance in OAKS, the variance that it uniquely explains is quite small. Rather, it is what it shares in common with easyCBM-math that contributes to the most explanation of variance in OAKS-math scores (see Figure 6). This may be the difference between the unique construct of IQ and the commonly held construct of problem solving or problem attack. For example, a student may have a high IQ but choose not to spend any time on using their understanding or knowledge to actually solve a problem. The application of this problem solving ability or base of understanding is more commonly reflected in OAKS-math and easyCBM-math than the level of student ability alone (Alonzo et al., 2006; ODE, 2012; Pearson, 2012;). Without application, ability seems to be of little importance in explaining variance in OAKS-math scores. Similarly, the variance explained by the combination of NNAT2 and easyCBM- math was quite large (~16%). This was expected based on various correlational studies previously described (Fuchs et al., 2005; Fuchs et al., 2006; Naglieri & Ronning, 2000). This result may be a reflection of a quality that is common to all three assessments such as logic. Both content-embedded and content-free problem solving rely heavily on a 59 logical processes by which to problem solve as well as a common-sense understanding of the reasonableness of a potential answer. This is speculation and more research is necessary to determine the differences in these constructs definitively. Several of the variables in this study. All of the construct irrelevant variables in this study, including reading fluency and the non-performance variables of gender, FRL, ELL, and special education eligibility were determined to be only marginally influential on math problem solving outcomes, if at all. For example, using variance partitioning, the influence of DORF was partitioned. As a result, the quality that is unique to reading fluency was compared to the quality that it shares with easyCBM-math. The variance reading fluency uniquely explains in OAKS-math performance was not nearly as large as the variance it jointly explained with easyCBM-math (3.5% and 7.1%, respectively). This was surprising given the research from Crawford et al. (2001) and more recently from CTL (2012) that indicates that DORF may be a predictive measure for success on math outcomes. Other research regarding the link between MAZE tasks and math outcomes leads to the same conclusion (Jiban & Deno, 2007; Thurber et al., 2002; Whitley, 2010). This study lends additional support to the predictive nature of DORF for math problem solving outcomes; however, variance partitioning provides additional important information. Perhaps this marginal unique influence is a reflection of the importance of comprehension over fluency at third grade. As described by other researchers reading comprehension has shown to correlate highly with math outcomes (Jiban & Deno, 2007; Thurber et al., 2002; Whitley, 2010). Although DORF has been shown to be a highly predictive assessment of comprehension (CTL, 2012) it has also been discussed as a 60 variable that has less of a predictive quality as students move beyond the early years in school (Jiban & Deno, 2007). 
This is a reflection of the move from students’ ability to decode fluently to their skill in comprehending what they have read which is a change that happens approximately during the second or third grade. Because students become fluent readers at different times, it is likely that DORF may be more or less predictive accordingly. According to Jiban and Deno (2007), the correlations between MAZE and state testing outcomes were stronger in the older grades than in the younger grades. The current study utilized the single measure of DORF as a proxy for both reading ability and reading comprehension and although it explained much variance, the unique variance explained was quite small. Perhaps comprehension measures would explain more unique variance in OAKS-math scores at this grade level. The quality shared between all of the assessments, particularly the common explained variance by various measures and DORF, may be processing speed. Fuchs et al. (2006) describe cognitive correlates to arithmetic as processing speed and decoding. Both DORF and NNAT2 rely on speed of processing as well. Arithmetic is an obvious construct relevant skill to math problem solving, although not a skill investigated in this study; however, the results of this study may be additional evidence of the importance of the construct shared between processing speed and numeracy more so than the unique quality of decoding or orally producing words. Again, further research is necessary to determine the underlying constructs definitively. The results of this study also lend further credence to Jiban and Deno’s (2007) assertion that DORF should only be used as one piece of information when predicting outcomes in mathematics. They note that no matter how predictive, most often single 61 measures do not account for as much variance as do a combination of variables. Results from this study support their claim. The three performance variables, when taken together, accounted for six times the amount of variance in OAKS-math scores as DORF did alone. When demographic variables in this study were controlled, the explained variance in OAKS-math scores increased only marginally. This means that despite research outlining the influence of each of these construct irrelevant factors on math outcomes (Abedi et al., 1998; Beede et al., 2011; Burnett & Farkas, 2009; Fuchs et al., 2005; Sirin, 2005), the information gathered using performance variables is more predictive of success than any of the non-performance variables in the current study. However, in this study, by controlling for these variables, the variance in OAKS-math scores accounted for uniquely by any of the performance indicators was slightly lowered in all cases. Interestingly, the relative changes in unique variance for easyCBM-math and DORF were far greater than that of NNAT2 (-22.7% & -17.1% vs. -0.9%, respectively). These findings appear to indicate that demographic indicators affect outcomes related to easyCBM-math and DORF more than outcomes on the NNAT2. This would be expected because the NNAT2 test is a measure of general problem solving, which is thought to be a relatively stable ability throughout life (Davis, Arden, & Plomin, 2008; Gustafsson & Undheim, 1992; Larsen, Hartmann, & Nyborg, 2008; Reeve & Lam, 2005) and, as a measure free of language, it is less likely to affect special populations differently (Pearson, 2012). 
The fact that demographic variables accounted for virtually no additional variance in OAKS-math scores was very surprising given the literature base documenting performance differences for these special populations (Abedi, 2006; Burnett & Farkas, 2009; Jordan et al., 2006; Sirin, 2005). One potential reason for this may be attention to test design by researchers and test creators. It is possible that, because of the growing body of research related to discrepant performance by these special populations and the identification of barriers to assessment success, tests like OAKS-math have been designed to limit CIV related to demographic characteristics. This is quite likely. Haladyna and Downing (2004) note an abundant research base in both differential item functioning and test item formatting. This research base was in existence during the creation of the current OAKS-math assessment (ODE, 2012). Additionally, the authors of two of the performance measures used in this study explicitly speak to this consideration in the literature. Both easyCBM-math and NNAT2, according to their authors, were created to limit the influence of access barriers for special populations (Alonzo et al., 2006; Pearson, 2012). This means that in this study, demographic factors would not influence the outcomes on the performance variables, and therefore the performance indicators alone would account for any true variance in achievement on OAKS-math.

It is possible that the small amount of additional variance explained by demographic characteristics is a function of sample size or grade level more than a true lack of additional variance. In this study, FRL and gender were the only variables that explained any additional variance (albeit small). Previously described research suggests similar academic assessment performance by girls and boys at the middle and high school levels (Hyde et al., 2008; Scafidi & Bui, 2010). Perhaps the difference found in this study was due to its focus on early grades rather than later years. Gender and FRL also represented the largest sample sizes. The numbers of students in special education and ELL categories were far smaller than the numbers who were male or who qualified for FRL. In a larger sample of students, ELL or special education eligibility (not to mention other demographic factors not explored in this study) may account for more additional variance in OAKS-math scores beyond the performance measures alone, although further research is needed to make this determination.

By using three performance variables, the model accounted for approximately 58% of the variance in OAKS-math scores. Use of variance partitioning provided additional important information regarding unique and common variance that may indicate constructs of different underlying importance. From construct validity and construct definition perspectives, it is also important to recognize that 42% unaccounted-for variance still exists.
If OAKS-math is purported to be a measure of math proficiency and problem solving, what other factors or constructs make up this score if they are not related to content knowledge as defined by easyCBM-math, general problem solving ability as defined by NNAT2, or the access skill of reading as defined by DORF? Again, a definitive answer to this question is beyond the scope of this study; however, one can speculate as to reasons for the variance left unexplained.

Although easyCBM-math explained much of the variance in OAKS scores, it is ultimately a screener (Alonzo et al., 2006). As such, it was not designed to have the depth or breadth to completely reflect all of the skills or standards that students are exposed to during a school year (Deno, 1985). Perhaps a complete battery of math assessments would more completely reflect all of the learning one could gain throughout the year, but from a cost perspective this is unreasonable. Additionally, easyCBM-math in this study represents standards at the second grade level (Anderson et al., 2010c), while OAKS-math represents third grade standards (ODE, 2012). For this reason, it makes sense that a large amount of variance would be left unexplained. For example, if the NNAT2, DORF, easyCBM-math, and OAKS-math were all given in the spring of third grade, the independent measures taken together would likely have explained even more variance because easyCBM-math and OAKS-math would be measures of the same standards. In fact, one would hope that the more instruction a student had, the less single-point-in-time measures would explain outcomes. That easyCBM-math in second grade explained more than 50% of the variance in OAKS-math scores a year later is, in fact, somewhat depressing from an intervention standpoint; however, as stated previously, this may mean that content in mathematics is largely built by broadening understanding each year rather than by learning completely new skills in isolation.

Another possible explanation for the unaccounted-for variance is teacher use of data. For example, if a particular school has a systematic method to collect and review data, a teacher may recognize struggling students quickly. If a teacher identified a struggling or low-performing student based on end-of-year data in second grade and began to systematically address areas of deficit in understanding, it is likely that the student would not struggle to the same degree they would have had the teacher not intervened. In this scenario, one would hope that the independent measures used in this study would explain very little variance in OAKS-math scores by the end of the third grade year. This would indicate that the intervention designed by the teacher due to his or her use of data was extremely effective and substantially changed the academic trajectory for the student.

Measurement tools may also impact the potential for explained variance. Certain performance skills such as arithmetic or reading comprehension may be important variables to consider in future studies, as well as attention, memory, or executive functioning. Student demographic characteristics that may be of influence include parent education level, days of attendance, school movement, or instructional grouping. We know, based on this study, that any of these factors may overlap with others in terms of their unique qualities and the qualities they would share with OAKS-math outcomes. Perhaps there is a combination of skills that accounts for more variance in OAKS-math scores than the model used in this study. If so, it would be important to recognize which measured skills could be influenced through instruction and to help support teachers so their instruction can be designed accordingly.

Though not of concern in this study, it is important to recognize measurement characteristics as potential barriers to variance explanation in future studies. For example, sometimes an independent measure may have a ceiling effect or a small amount of score variance. When a ceiling effect occurs, students are unable to show the full range of potential that they could demonstrate on the dependent measure. If the independent measure does not have adequate score variance, it is unlikely that its scores could explain much of the variance in scores attained on the dependent measure. Both situations limit the potential for explained variance.
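The attenuation described above can be demonstrated with a small simulation. The example below is illustrative only, using simulated scores and hypothetical names: it imposes an artificial ceiling on a predictor and shows that the variance it explains in an outcome shrinks even though the underlying relation between the two is unchanged.

```python
# Ceiling-effect / restricted-variance simulation (illustrative data only).
import numpy as np

rng = np.random.default_rng(2)
n = 913
true_skill = rng.normal(size=n)
outcome = 0.8 * true_skill + rng.normal(scale=0.6, size=n)      # dependent measure

def r2_simple(x, y):
    """R^2 for a one-predictor regression, i.e. the squared correlation."""
    return np.corrcoef(x, y)[0, 1] ** 2

# Version 1: the independent measure spans its full range.
predictor_full = true_skill + rng.normal(scale=0.3, size=n)

# Version 2: the same measure with an artificial ceiling -- every score above
# the 60th percentile is recorded at the maximum attainable value.
ceiling = np.quantile(predictor_full, 0.60)
predictor_capped = np.minimum(predictor_full, ceiling)

print(f"Full-range predictor:    R^2 = {r2_simple(predictor_full, outcome):.3f}")
print(f"Ceiling-limited version: R^2 = {r2_simple(predictor_capped, outcome):.3f}")
```

The same logic applies to any source of restricted score variance on an independent measure, not only ceilings.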
Based on the high stakes associated with state achievement assessments, there is obvious reason to continue to explore this complex construct and the predictive variables with which it might be associated. Additionally, as discussed in the next section, there is instructional utility in understanding formative variables that influence summative assessment outcomes.

Utility of predictive measures in assessment. From a public standpoint, outcome assessments such as OAKS-math hold much importance. They are used at the district, state, and federal levels to reflect progress toward important outcomes like college and career readiness (Conley, 2010). Although they bear much weight on a large scale, summative assessments such as these hold little utility for teachers. Practically, the information gathered from these types of assessments is rarely used at the classroom level except to demonstrate to families, in a broad sense, whether students became proficient on the standards of importance for the grade level over the course of the school year.

Predictive measures, by contrast, can be widely influential at the classroom level, and results on these assessments will likely influence instructional practices immediately. Seminal work by Deno (1985) and countless studies since demonstrate that Curriculum Based Measures (CBMs) are reliable, fast, and cost-effective assessment tools that can help teachers make everyday instructional decisions to support student outcomes. Many studies over the last 25 years have investigated the predictive qualities of these measures. If measures at the classroom level are reliable, fast, cost-effective, and predictive, teachers are able to use them formatively to support students throughout the year toward success on outcome measures.

The results from this study indicate several construct irrelevant variables that are either only slightly influential or not at all influential for success in math problem solving as defined by OAKS-math. These include the demographic variables and the access skill of reading fluency. Demographic variables are not factors that a teacher can alter, so it is helpful to know that inherent variables like gender, FRL, ELL, and special education status do not greatly influence math problem solving outcomes. On the other hand, oral reading fluency, although only minimally influential for math problem solving outcomes, is a factor that can be changed through instruction.
In addition to instruction, predictive tools such as formative progress monitoring measures, like CBMs, can help teachers gauge progress toward the goal of increased fluency (Good et al., 2009). Based on this study, it is likely that as fluency increases, math problem solving success will also increase, although not necessarily in a causal way.

In this study, the two major construct relevant variables of math content knowledge and general problem solving ability (g) were found to be influential, independently and in combination, for math problem solving outcomes. Content knowledge, like oral reading fluency, is not a static skill or ability and can be altered by instruction. Similarly, easyCBM-math is a predictive and formative measure teachers can use to monitor progress toward content knowledge development as the year progresses (Anderson et al., 2010a). It is likely that as knowledge of content increases, scores on easyCBM-math measures will increase, and at the end of the year, scores on OAKS-math would be higher than they would have been had teachers not had this predictive formative tool to use. The combination of instruction and formative tools to monitor success has significant potential to support student success on math problem solving outcomes.

Based on the results of this study, general problem solving ability (g), as measured by the NNAT2, uniquely explains a portion of the variance in OAKS-math (approximately 5%). It also shares a quality common to OAKS-math and the other measures (approximately 30% explained variance). These data suggest that general problem solving ability may indeed influence outcomes in math problem solving, yet according to the literature, it is thought to be relatively stable throughout life (Davis, Arden, & Plomin, 2008; Gustafsson & Undheim, 1992; Larsen et al., 2008; Reeve & Lam, 2005). None of the studies investigated the change in g for students in third grade specifically. Additionally, the time spans between measurements in each study ranged from months to several years and included groups of all ages. This evidence suggests that even though g is a highly influential variable, teachers may have little success working to change outcomes for this particular construct.

Although the results of the studies reviewed did not indicate that g was a factor that could be altered, one study (Davis, Arden, & Plomin, 2008) investigated changes in general intelligence among groups of twins. In that study, genetic influence over g was most pronounced; however, there was evidence to suggest that environmental influence accounted for 30% of the variance in g. Although limited in scope, this evidence may indicate that certain environmental factors can impact g and thus that g may be alterable. Obviously, much more research in this area is needed to definitively determine whether g can be altered by instruction and how this change could influence academic outcomes.

Variance partitioning may lend additional information to better understand the variables that influence g. For example, it is possible that the unique characteristic attributed to g (perhaps intelligence) is not alterable, while the characteristic common to easyCBM-math and NNAT2 (perhaps problem attack or strategy) is alterable with instruction. This distinction would be important for teachers as they alter instruction to support lagging skills students may have in particular areas. Additional research around the topic of general intelligence stability would help teachers and researchers make sound decisions regarding the use of general ability assessment results to design instructional programs for students.

Defining a complex construct. High-stakes assessments often measure complex constructs like math problem solving. According to Haladyna and Downing (2004), "Each [developing] ability involves contextualized mental models, schemas, or frames, and complex performance that may have multiple correct pathways that depend on knowledge and skills" (p. 17). Because of the complexity of these types of constructs, it is very unlikely that the independent variables or underlying constructs of importance will have non-overlapping variance. Often, studies use correlations or beta weights to indicate the predictive characteristics of specific variables toward outcome measures; however, this may lead to incomplete interpretations. In order to more deeply define a complex construct, variance partitioning may be a better option for analysis (Zientek & Thompson, 2006). Using variance partitioning, one can recognize unique constructs of influence as well as characteristics held commonly between variables. For constructs with much overlapping variance, this exploration can highlight important qualities of each independent variable, especially when each variable may overlap significantly with others. By recognizing unique and common variance, teachers can better target instruction based on constructs over which they have control rather than trying to change influential characteristics that are inherent or static. Prediction and correlations only minimally describe the relation among variables, but the use of variance partitioning may help to support good decision-making based on a deeper construct understanding (Zientek & Thompson, 2006).
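For the simplest case of two predictors, the decomposition behind this reasoning can be written out explicitly. The equations below state the standard two-predictor commonality result for illustration (they are not formulas reported in this study); R^2_{y·12} denotes the squared multiple correlation from regressing the outcome on both predictors, and R^2_{y·1} and R^2_{y·2} the values from each predictor alone.

```latex
% Standard two-predictor commonality decomposition (illustrative only).
\begin{aligned}
U_1 &= R^2_{y \cdot 12} - R^2_{y \cdot 2} & \text{(unique to predictor 1)}\\
U_2 &= R^2_{y \cdot 12} - R^2_{y \cdot 1} & \text{(unique to predictor 2)}\\
C_{12} &= R^2_{y \cdot 1} + R^2_{y \cdot 2} - R^2_{y \cdot 12} & \text{(common to both)}\\
U_1 + U_2 + C_{12} &= R^2_{y \cdot 12}
\end{aligned}
```

With three predictors, as in this study, the same logic yields a unique component for each measure and common components for each pair and for all three together, which is the structure depicted in Figure 6.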
Implications and Future Research

This study highlights several topics of interest for researchers as well as classroom teachers. In the following paragraphs, I will discuss practical implications including cost, early intervention opportunities, and grade-specific considerations. The section ends with possible topics for future research and exploration.

Practical considerations. Teachers should always consider the costs associated with any initiative in the classroom, including the addition of assessment. Costs may be monetary expenses, but more often, costs relate to instructional time. Based on this study, the commonality between easyCBM-math and NNAT2 explained the most variance in OAKS-math outcomes. NNAT2, in this district, is delivered once in second grade for all students, and easyCBM-math is also mandated in second grade. A teacher in this district might consider using this information since it is already available. However, it would not be wise for a teacher from a district not implementing either measure to insist on administering and using both assessment tools. This would waste valuable resources, including materials, time spent in training to administer the assessments, actual student time spent in assessment, and time to use the results. Instead, someone interested in using a predictive formative measure for mathematics problem solving might consider the use of easyCBM-math in his or her classroom, school, or district. Based on results from this study, easyCBM-math uniquely accounted for 12% of the variance in OAKS-math scores but explained 50% of the variance in scores as a whole.
This measure alone would give nearly as much information to a teacher as it would in combination with any other variable, while cutting the needed resources in half. Additionally, it is important for teachers to use assessment as just one of several informative tools to determine student needs in the classroom. Although easyCBM may be the best measure in terms of cost, that does not mean it should be used alone or as a summative measure. Assessments should be considered one of many tools available to teachers (Jiban & Deno, 2007), and they should be used in the way they are intended in order to guarantee validity (Messick, 1984).

The results of this study provide compelling evidence for teachers to have information about their students as early as possible. Typically in schools, each year begins with substantial time spent on creating community within the classroom, followed by assessment, and only then do teachers begin to create specific instructional groupings. This wait time appears to be unnecessary. This study demonstrated that scores on second grade indicators explained a substantial amount of variance in math outcomes even at the end of third grade. In essence, teachers know who is struggling based on information from the previous year. As discussed, although teachers cannot change demographic variables, math content knowledge, reading fluency, and possibly even general ability can be changed through instruction. With a bit of organization, teachers can have access to student information very early and begin specific instructional interventions quickly to help change the academic trajectory for struggling students. This type of information may also be helpful in terms of student class placement to ensure that each student receives the best possible instruction for his or her specific needs.

Because math standards are different at each grade level, we do not yet fully understand the influence of variables like content knowledge, general ability, or oral reading fluency on math problem solving outcomes at each level. It is important, then, that teachers not apply the results of this study freely to any grade level or group of students they support. For example, it would be inappropriate to claim that oral reading fluency predicts math outcomes in eighth grade; without further research, this claim is unwarranted. It would be appropriate, however, to be thoughtful about student reading ability when giving an assessment of math problem solving. One might consider ways to accommodate students so the skill of reading is less influential on the outcome of the math assessment. A third grade teacher might also appropriately rely more heavily on a combination of math content and general ability scores, rather than on DORF or NNAT2 scores alone, when creating classroom intervention groups for math. Teachers should always be cautious when applying the results of specific studies to the classroom due to differences in grade level, population, subject, and setting. With continued research, the generalizability of specific claims may increase greatly. The following section outlines four areas of further research on this topic: the use of performance assessments, research using additional independent variables, studies at differing grade levels, and exploration of the assessment of other complex constructs.

Future studies. Recently, the movement toward the CCSS has also begun to change the traditional system of assessment.
Future assessments will likely incorporate more performance-based tasks, as well as explanatory components, allowing students to demonstrate their thinking in ways not traditionally utilized (ETS, 2010). Haladyna and Downing (2004) refer to performance-based assessment as the best form of measurement for constructs of developing abilities like mathematics problem solving. This type of measurement may provide a more authentic demonstration of math problem solving skill, and studies such as this one provide a foundation for replication studies using the new assessment systems. As explored in this study, researchers can continue to analyze the unique and common variance attributed to several variables thought to be foundational to the construct of math problem solving as measured by the new performance assessments. However, although there may be benefits to performance assessment, other CIV threats exist. For example, it is unlikely that these types of tasks can be assessed solely through technological means. Rater training and inter-rater reliability will be critically important, as human error becomes a consideration in scoring.

Future studies using other independent variables such as computation skills, comprehension, vocabulary, and race and ethnicity will help us more fully understand the factors that influence the construct of math problem solving in assessment. Research in this area may also be important for the development of outcome and formative measures in the future, such as performance or multiple-choice assessments. By identifying the unique and common variance attributable to additional factors, researchers might more completely understand what skills or competencies are being assessed on math problem solving assessments. This knowledge may also support understanding about the degree to which we can alter foundational or influential constructs in order to better promote student success.

Studies such as this one help to explain the predictive characteristics of various variables, but only in relation to specific grade levels, as described in the previous section. Because of this lack of understanding at each grade level, it is important that replication studies at several grade levels be conducted. Although this study attempts to better define the construct of math problem solving and shed light on construct relevant and irrelevant variables of influence, it only touches the surface. There continues to be a need for a more complete understanding of the various skills that influence outcomes on state assessments in all subject areas, and without future research, this void will remain.

Complex constructs are very difficult to define and difficult to adequately assess (Haladyna & Downing, 2004). One interesting factor common among complex constructs is how they are traditionally measured. Reading comprehension, math problem solving, and other academic content areas are typically measured with tests delivered through the medium of language. This introduces systematic variance that is sometimes completely unrelated to the construct of interest. Studies such as this one help to define what is actually measured on these assessments and how much influence language or other variables have on outcomes. Variance partitioning also offers a deeper understanding of the underlying constructs of importance for each complex construct. Future studies involving other complex constructs will help to define predictive and alterable factors of importance for successful outcomes.
These studies will also help to recognize the impact of certain construct irrelevant variables and variance (such as reading facility) on outcomes. To measure student achievement fairly and comment on educational quality responsibly, these factors should be identified and minimized, if not eliminated, from assessment.

APPENDIX A

ASSESSMENT EXAMPLES

Figure 1. Example easyCBM question (grade 2). This figure illustrates the minimal wording used in easyCBM-math questions.

Figure 2. Example OAKS-math question (grade 3). This figure illustrates the relatively greater number of words used in OAKS-math questions compared to easyCBM-math.

Figure 3. Pictorial representation of NNAT2 items. This figure illustrates the item format and assessment procedure for the NNAT2.

Figure 4. Student scoring printout (NNAT2). This figure illustrates the information included on the student scoring printout, including ability index and percentile rank.

APPENDIX B

VARIABLE RELATIONS

Figure 5. Possible relations among variables in math problem solving. This figure illustrates the possible relations between and among construct relevant and irrelevant variables in math problem solving. High, medium, and low refer to possible (not actual) correlation degrees among variables.

Figure 6. Variance partitioning using a commonality analysis. This figure illustrates the unique and common variance between variables that were separated using a commonality analysis. U = unique variance, C = common variance, 1 = DORF, 2 = NNAT2, 3 = easyCBM-math, Y = dependent variable (OAKS-math).

APPENDIX C

DISTRIBUTION OF SCORES FOR STUDY VARIABLES

Figure 8. Distribution of easyCBM-math scores (grade 2). Mean = 36.57, SD = 7.478, N = 913, Skewness = -.697, Kurtosis = .060.

Figure 9. Distribution of NNAT2 scores. Mean = 99.75, SD = 13.185, N = 913, Skewness = .010, Kurtosis = .174.

Figure 10. Distribution of DORF scores (grade 2). Mean = 106.68, SD = 39.471, N = 913, Skewness = .072, Kurtosis = -.383.

Figure 11. Distribution of OAKS-math scores (grade 3). This figure illustrates the mean, standard deviation, number of cases, skewness, and kurtosis values for OAKS-math scores.

APPENDIX D

LITERATURE SEARCH DESCRIPTION

My search for literature on the topic of construct relevant and irrelevant variables in assessments related to math problem solving originated in electronic databases including ERIC, Academic Search Premier, and Google Scholar. I narrowed the search in Google Scholar to retrieve results published since 2006, while the other databases included all date ranges. I searched using various combinations of the following terms: construct, validity, irrelevant, irrelevance, mathematics, problem solving, cognitive correlates, general ability, assessment, variance, variables, elementary, predictive validity, state assessment, and achievement. The search combinations produced a group of 636 journal articles, theses, book chapters, and reports.
I further narrowed these search results based on my interests in (a) predictive variables related to outcomes on math assessments, (b) construct irrelevant variables studied in math assessments, and (c) the construct of math problem solving. Most often I excluded studies or research that did not address correlations to outcomes in math assessments. I also excluded research focused on the impact of specific interventions or on teacher background or training. With these restrictions, I reviewed 262 articles, chapters, theses, reports and studies in addition to their related reference pages. I chose to not restrict my search to journal articles because many of the concepts and terms within this study are based on definitions created by national groups and measurement experts and found in books, reports, and presentations. Additionally, this area of research is relatively new, so by including the most recent work of students (reflected in theses) and experts (sometimes reflected in book chapters) I could more 87 accurately depict the current interest in and impact of construct relevant and irrelevant variables without being restricted to studies with large effect sizes and large populations (most frequently published in journals). 88 REFERENCES CITED Abedi, J. (2006). Psychometric issues in the ELL assessment and special education eligibility. Teachers College Record, 108, 2282-2303. Abedi, J., & Leon, S. (1999). Impact of students’ language background on content-based performance: Analyses of extant data. Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing. Abedi, J., Leon, S., & Mirocha, J. (2001). Examining ELL and non-ELL student performance differences and their relationship to background factors: Continued analyses of extant data. Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing. Abedi, J., Leon, S., & Mirocha, J. (2003). Impact of student language background on content-based performance: Analyses of extant data (CSE Tech. Rep. No. 603). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing. Abedi, J., & Lord, C. (2001). The language factor in mathematics tests. Applied Measurement in Education, 14, 219–234. Abedi, J., Lord, C., & Hofstetter, C. (1998). Impact of selected background variables on students’ NAEP math performance (CSE Tech. Rep. No. 478). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing. Abedi, J., Lord, C., Hofstetter, C., & Baker, E. (2000). Impact of accommodation strategies on English language learners’ test performance. Educational Measurement: Issues and Practice, 19(3), 16–26. Abedi, J., Lord, C., Kim-Boscardin, C., & Miyoshi, J. (2000). The effects of accommodations on the assessment of LEP students in NAEP (CSE Tech. Rep. No. 537). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing. Abedi, J., Lord, C., & Plummer, J. (1997). Language background as a variable in NAEP mathematics performance (CSE Tech. Rep. No. 429). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing. Alonzo, J., Lai, C. F., & Tindal, G. (2009). The development of K-8 progress monitoring measures in mathematics for use with the 2% and general education populations: Grade 2 (Technical Report No. 0920). 
Eugene, OR: Behavioral Research and Teaching: University of Oregon. 89 Alonzo, J., Tindal, G., Ulmer, K., & Glasgow, A. (2006). easyCBM online assessment system. http://easycbm.com. Eugene, OR: Behavioral Research and Teaching, University of Oregon. American Educational Research Association (AERA), American Psychological Association (APA), & National Council on Measurement in Education (NCME) (1999). Standards for Educational and Psychological Testing. Washington, DC: AERA. Anderson, D., Alonzo, J., & Tindal, G. (2010). easyCBM Mathematics Criterion Related Validity Evidence: Oregon State Test (Technical Report No. 1011). Eugene, OR: Behavioral Research and Teaching, University of Oregon. Anderson, D., Alonzo, J., & Tindal, G. (2010). easyCBM Mathematics Criterion Related Validity Evidence: Washington State Test (Technical Report No. 1010). Eugene, OR: Behavioral Research and Teaching, University of Oregon. Anderson, D., Lai, C. F., Nese, J. F. T., Park, B. J., Sáez. L., Jamgochian, E. M., Alonzo, J., & Tindal, G. (2010). Technical Adequacy of the easyCBM Primary-Level Mathematics Measures (Grades K-2), 2009-2010 Version (Technical Report No. 1006). Eugene, OR: Behavioral Research and Teaching, University of Oregon. Balboni, G., Naglieri, J. A., & Cubelli, R. (2010). Concurrent and predictive validity of the Raven Progressive Matrices and the Naglieri Nonverbal Ability Test. Journal of Psychoeducational Assessment. 28, 222-235. doi: 10.1177/0734282909343763. Beede, D. N., Julian, T. A., Langdon, D., McKittrick, G., Khan, B. & Doms, M. E., Women in STEM: A Gender Gap to Innovation (August 1, 2011). Economics and Statistics Administration Issue Brief No. 04-11. Available at SSRN: http://ssrn.com/abstract=1964782 or http://dx.doi.org/10.2139/ssrn.1964782 Brody, N. (1992) Intelligence. San Diego: Academic Press Burnett, K. & Farkas, G. (2009). Poverty and family structure effects on children's mathematics achievement: Estimates from random and fixed effects models. The Social Science Journal, 46, 297-318. Retrieved from http://dx.doi.org/10.1016/j.soscij.2008.12.009 Center for K-12 Assessment and Performance Management at ETS (2010, December). Coming Together to Raise Achievement: New Assessments for the Common Core State Standards. Retrieved from Education Northwest website: http://educationnorthwest.org/resource/1331 Center on Teaching and Learning (CTL). (2012). 2012-2013 DIBELS Data System Update Part I: DIBELS Next Composite Score (Technical Brief No. 1202). Eugene, OR: University of Oregon. 90 Common Core State Standards Initiative (CCSSI). 2010. Common Core State Standards for Mathematics. Washington, DC: National Governors Association Center for Best Practices and the Council of Chief State School Officers. http://www.corestandards.org/ Conley, D. T. (2010). College and Career Ready: Helping all Students Succeed Beyond High School. San Francisco: Jossey-Bass. Crawford, L., Tindal, G., and Stieber, S. (2001). Using oral reading rate to predict student performance on statewide achievement tests. Educational Assessment, 7, 303– 323. Davis, O. S. P., Arden, R., & Plomin, R. (2008). g in Middle Childhood: Moderate Genetic and Shared Environmental Influence Using Diverse Measures of General Cognitive Ability at 7, 9 and 10 Years in a Large Population Sample of Twins. Intelligence. 36(1), 68-80. Deno, S. L., (1985). Curriculum-based measurement: The emerging alternative. Exceptional Children, 52, 219-232. Fuchs, L. S., Compton, D. L., Fuchs, D., Paulsen, K., Bryant, J. D., & Hamlett, C. L. 
(2005). The prevention, identification, and cognitive determinants of math difficulty. Journal of Educational Psychology, 97, 493-513. doi: 10.1037/0022- 0663.97.3.493. Fuchs, L. S. & Fuchs, D. (1993). Formative evaluation of academic progress: How much growth can we expect? School Psychology Review, 22(1), 1-30. doi: 9607083090. Fuchs, L. S., Fuchs, D., Compton, D. L., Powell, S. R., Seethaler, P. M., Capizzi, A. M.,…Fletcher, J. M. (2006). The cognitive correlates of third-grade skill in arithmetic, algorithmic computation, and arithmetic word problems. Journal of Educational Psychology, 98(1), 29-43. doi:10.1037/0022-0663.98.1.29. Garcia, S. B. & Tyler, B. (2010). Meeting the needs of English language learners with learning disabilities in the general curriculum. Theory Into Practice. 49, 113-120. doi: 10.1080/00405841003626585. Good, R. H., Gruba, J., & Kaminski, R. A. (2009). DIBELS Next. Longmont, CO: Cambrium Learning Group. Gustafsson, J.-E., & Undheim, J. O. (1992). Stability and Change in Broad and Narrow Factors of Intelligence from Ages 12 to 15 Years. Journal of Educational Psychology. 84, 141-49. Haladyna, T. M. & Downing, S. M. (2004). Construct-Irrelevant Variance in High-Stakes Testing. Educational Measurement: Issues and Practice 23(1), 17-27. doi: 10.1111/j.1745-3992.2004.tb00149.x 91 Hart, S. A., Petrill, S. A., Plomin, R., & Thompson, L. A. (2009). The ABCs of math: A genetic analysis of mathematics and its links with reading ability and general cognitive ability. Journal of Educational Psychology, 101, 388-402. doi: 10.1037/a0015115. Helwig, R., Rozek-Tedesco, M. A., Tindal, G., Heath, B., and Almond, P. J. (1999). Reading as an access to mathematics problem solving on multiple-choice tests for sixth-grade students. The Journal of Educational Research, 93, 113-125. Hyde, J. S., Lindberg, S. M., Linn, M. C, Ellis, A. B., & Williams, C. E., (2008). Gender similarities characterize math performance. Science, 527(5888), 494-495. Jensen, A. R. (2002). Psychometric g: Definition and substantiation. In R. J. Sternberg & R. L. Grigorenko (Eds.), The general factor of intelligence: How general is it? Retrieved from http://read.amazon.com. Jiban, C. L., & Deno, S. L. (2007). Using math and reading curriculum-based measurements to predict state mathematics test performance: Are simple one- minute measures technically adequate? Assessment for Effective Intervention, 32(2), 78–89. Jitendra, A. K. (2005). Mathematics Assessment: Introduction to the Special Issue. Assessment for Effective Intervention, 30(2), 1-2. doi: 10.1177/073724770503000201. Jordan, N. C., Kaplan, D., Olah, L., & Locuniak, M. N. (2006). Number sense growth in kindergarten: A longitudinal investigation of children at risk for mathematics difficulties. Child Development, 77, 153–175. Kaufman, A. S. (2009). IQ Testing 101. [Google Books version]. Retrieved from http://books.google.com/books?id=Z8i8LeV74m4C&printsec=frontcover&dq=his tory+of+intelligence+testing&source=bl&ots=cldxCOePny&sig=ZzwfoEOGhrDr _7tnTQ3d8VJx_tA&hl=en&sa=X&ei=IwhhUPDjNejKiwLYn4G4CA&ved=0CE UQ6AEwBA#v=onepage&q=history%20of%20intelligence%20testing&f=false. Ketterlin-Geller, L. R., Alonzo, J., & Tindal, G. (2004). Use of focus groups to inform the construction of a universally designed mathematics test (Technical Report No. 29). Eugene, OR: Behavioral Research and Teaching, University of Oregon. Lamb, J. H. (2010). Reading Grade Levels and Mathematics Assessment: An Analysis of Texas Mathematics Assessment Items and Their Reading Difficulty. 
The Mathematics Educator, 20(1), 22-34. 92 Larsen, L., Hartmann, P., & Nyborg, H. (2008). The Stability of General Intelligence from Early Adulthood to Middle-Age. Intelligence. 36(1), 29-34. Mannamaa, M., Kikas, E., Peets, K., & Palu, A., (2012). Cognitive correlates of math skills in third-grade students. Educational Psychology: An International Journal of Experimental Educational Psychology, 32(1), 21-44. Messick, S. (1984). The psychology of educational measurement. Journal of Educational Measurement, 21, 215-237. Naglieri, J. A. (1997). Naglieri Nonverbal Ability Test. San Antonio, TX: Psychological Corporation. Naglieri, J. A. (2008). Naglieri Nonverbal Ability Test- Second Edition (NNAT2). San Antonio, TX: Psychological Corporation. Naglieri, J. A. (2011). Naglieri Nonverbal Ability Test Second Edition: Manual Supplement- Technical Information and Normative Data. San Antonio, TX: Pearson. Naglieri, J. A. & Das, J. P. (2002). Practical implications of general intelligence and PASS cognitive processes. In R. J. Sternberg & R. L. Grigorenko (Eds.), The general factor of intelligence: How general is it? Retrieved from http://read.amazon.com. National Council of Teachers of Mathematics. (2000). Principles and standards for school mathematics. Reston, VA: Author. Available: http://www.nctm.org/standards/content.aspx?id=16909. National Council of Teachers of Mathematics. (2006). Curriculum focal points for prekindergarten through Grade 8 mathematics: A quest for coherence. Reston, VA: National Council of Teachers of Mathematics. National Council of Teachers of Mathematics (NCTM), National Council of Supervisors of Mathematics (NCSM), Association of State Supervisors of Mathematics (ASSM), Association of Mathematics Teacher Educators (AMTE). (2010). Mathematics education organizers unite to support implementation of common core state standards. Retrieved from http://www.nctm.org/standards/content.aspx?id=26088 National Governors Association Center for Best Practices and Council of Chief State School Officers. (2010) National Governors Association and State Education Chiefs Launch Common State Academic Standards. Retrieved from http://www.corestandards.org/articles/8-national-governors-association-and-state- education-chiefs-launch-common-state-academic-standards 93 National Mathematics Advisory Panel. Foundations for Success: The Final Report of the National Mathematics Advisory Panel, U.S. Department of Education: Washington, DC, 2008. National Research Council. (2001). Adding it up: Helping children learn mathematics. J. Kilpatrick, J. Swafford, and B. Findell (Eds.). Mathematics Learning Study Committee, Center for Education, Division of Behavioral and Social Sciences and Education. Washington, DC: National Academy Press. Nese, J. F. T., Lai, C. F., Anderson, D., Jamgochian, E. M., Kamata, A., Sáez. L., Park, B. J., Alonzo, J., & Tindal, G. (2010). Technical Adequacy of the easyCBM Mathematics Measures: Grades 3-8, 2009-2010 Version (Technical Report No. 1007). Eugene, OR: Behavioral Research and Teaching, University of Oregon. Nimon, K., Lewis, M., Kane, R., & Haynes, R. M. (2008). An R package to compute commonality coefficients in the multiple regression case: An introduction to the package and a practical example. Behavior Research Methods, 40(2), 457-466. doi: 10.3758/BRM.40.2.457. Nordquist, L. (February 7, 2012). Personal communication. Oregon Department of Education, Office of Assessment and Information Services. (2012). 
Mathematics test specifications and test blueprints (grade 3). Retrieved from http://www.ode.state.or.us/search/page/?id=496. Pearson. (2012). Introduction to the Naglieri Nonverbal Ability Test- Second Edition (NNAT2). Retrieved from http://www.pearsonassessments.com/haiweb/Cultures/en- US/Site/Community/Education/Products/NNAT2/nnat2.htm. Raven, J., & Raven, J. (2003). Raven Progressive Matrices. In R. Steve & R. S. McCallum (Eds.), Handbook of nonverbal assessment (pp. 223-237). New York: Kluwer. Reeve, C. L., & Lam, H. (2005). The Psychometric Paradox of Practice Effects Due to Retesting: Measurement Invariance and Stable Ability Estimates in the Face of Observed Score Changes. Intelligence. 33, 535-549. Rutherford-Becker, K. J. & Vanderwood, M. L. (2009). Evaluation of the relationship between literacy and mathematics skills as assessed by curriculum-based measures. The California School Psychologist, 14, 23-34. Scafidi, T. & Bui, K., (2010). Gender Similarities in Math Performance from Middle School through High School. Journal of Instructional Psychology, 37, 252-255. 94 Silberglitt, B., Burns, M. K., Madyun, N. H., & Lail, K. E. (2006). Relationship of reading fluency assessment data with state accountability test scores: a longitudinal comparison of grade levels. Psychology in the Schools, 43, 527-535. doi: 10.1002/pits.20175. Sirin, S. R. (2005). Socioeconomic status and academic achievement: A Meta-analytic review of research. Review of Educational Research, 75, 417-453. Slater, S. (September 4, 2012). Personal communication. Spearman, C.E. (1904). General intelligence objectively determined and measured. American Journal of Psychology, 15, 201–293. Sullivan, A. L. (2011). Disproportionality in special education identification and placement of English language learners. Exceptional Children, 77, 317-334. Tindal, G., Heath, B., Hollenbeck, K., Almond, P., and Harniss, M. (1998). Accommodating students with disabilities on large-scale tests: an experimental study. Exceptional Children, 64, 439-450. Thurber, R. S., Shinn, M. R., Smolkowski, K. (2002). What is measured in mathematics tests: construct validity of curriculum-based mathematics measures. School Psychology Review, 31, 498-513. Webb, N. L. (1999). Alignment of Science and Mathematics Standards and Assessments in Four States. (Monograph No. 18). Madison, WI: National Institute for Science Education & Council of Chief State School Officers. Wechsler, D. & Naglieri, J. A. (2006). Wechsler Nonverbal Scale of Ability. San Antonio, TX: Pearson. Wechsler, D. (1999). Wechsler Abbreviated Scale of Intelligence. San Antonio, TX: Psychological Corporation. Wehmeyer, M. L. (2001). Disproportionate Representation of Males in Special Education Services: Biology, Behavior, or Bias? Education & Treatment Of Children (ETC), 24(1), 28. Whitley, Samuel, "Oral reading fluency and MAZE selection for predicting 5th and 6th grade students' reading and math achievement on the Illinois Standards Achievement Test " (2010). Masters Theses. Paper 607. http://thekeep.eiu.edu/theses/607 Zientek, L. R. & Thompson, B. (2006). Commonality analysis: Partitioning variance to facilitate better understanding of data. Journal of Early Intervention, 28, 299-307. doi: 10.1177/105381510602800405.