EVALUATING PASSAGE-LEVEL CONTRIBUTORS TO TEXT COMPLEXITY

by

SHAHEEN MUNIR-MCHILL

A DISSERTATION

Presented to the Department of Special Education and Clinical Sciences and the Graduate School of the University of Oregon in partial fulfillment of the requirements for the degree of Doctor of Philosophy

September 2013

DISSERTATION APPROVAL PAGE

Student: Shaheen Munir-McHill

Title: Evaluating Passage-Level Contributors to Text Complexity

This dissertation has been accepted and approved in partial fulfillment of the requirements for the Doctor of Philosophy degree in the Department of Special Education and Clinical Sciences by:

Roland H. Good, III     Chair
Kelli C. Cummings       Core Member
Elizabeth Harn          Core Member
Laura Lee McIntyre      Core Member
Gina Biancarosa         Institutional Representative

and

Kimberly Andrews Espy   Vice President for Research & Innovation/Dean of the Graduate School

Original approval signatures are on file with the University of Oregon Graduate School.

Degree awarded September 2013

© 2013 Shaheen Munir-McHill

DISSERTATION ABSTRACT

Shaheen Munir-McHill

Doctor of Philosophy

Department of Special Education and Clinical Sciences

September 2013

Title: Evaluating Passage-Level Contributors to Text Complexity

The complexity of text has a number of implications for educators in the areas of instruction and assessment. Text complexity is particularly important in formative assessments, which utilize repeated, alternate, equivalent forms to capture student growth towards a general outcome. A key assumption of such tools is that alternate forms of the assessment are of equal complexity. Consequently, there is a need to better understand what variables contribute to text complexity and how they impact student performance. This study was designed to evaluate features of text that are not typically included in readability estimates but may contribute to text complexity: text cohesion and genre.
Currently, text complexity of oral reading fluency measures is often quantified using readability estimates. It is hypothesized that text cohesion (the extent to which the text functions as a cohesive, meaningful whole), a factor generally excluded from readability estimates, contributes to text variability and variability in student performance. This research evaluated the role of a type of text cohesion (referential cohesion) in text complexity by manipulating the cohesion of passages otherwise assumed to be of equal difficulty. Genre was also considered, as research suggests that genre may impact complexity ratings of texts. Passages were strategically selected to capture four conditions: 1) informational text/low cohesion, 2) informational text/high cohesion, 3) narrative text/low cohesion, and 4) narrative text/high cohesion. Data were collected on reading rate, accuracy, and passage-specific reading comprehension.

Results were analyzed using two-way, univariate ANOVA with dependent observations. Results indicate effects for each of the dependent variables included in the design. For rate and accuracy, results indicate significant interactions between genre and referential cohesion; scores were significantly higher for high cohesion narrative text than low cohesion narrative text and high cohesion informational text. There was a significant main effect of genre on comprehension, with students performing significantly better on the comprehension measure for narrative texts than informational texts. Altogether, these results indicate direct effects of genre and referential cohesion on student reading performance and provide evidence that text cohesion may be a meaningful component of text complexity.

CURRICULUM VITAE

NAME OF AUTHOR: Shaheen Munir-McHill

GRADUATE AND UNDERGRADUATE SCHOOLS ATTENDED:
    University of Oregon, Eugene
    University of Southern California, Los Angeles

DEGREES AWARDED:
    Doctor of Philosophy, School Psychology, 2013, University of Oregon
    Master of Science, Special Education, 2011, University of Oregon
    Bachelor of Arts, Psychology, 2006, University of Southern California

AREAS OF SPECIAL INTEREST:
    Data-Based Decision Making
    Formative Assessment in Reading

PROFESSIONAL EXPERIENCE:
    School psychology intern, Springfield Public Schools, Springfield, OR, 2012-2013
    Guest lecturer, Department of Education Studies, University of Oregon, 2011
    Guest lecturer, Department of Special Education and Clinical Services, University of Oregon, 2011
    Supervised college teaching assistant, Department of Special Education and Clinical Services, University of Oregon, 2010-2011

GRANTS, AWARDS, AND HONORS:
    College of Education Doctoral Research Grant, University of Oregon, 2013
    DIBELS Student Support Award, University of Oregon, 2008, 2010
    Graduate Teaching Fellowship, Department of Special Education and Clinical Services, 2008-2012
    Segal AmeriCorps Education Award, 2007
    Summa cum Laude, University of Southern California, 2006

ACKNOWLEDGEMENTS

I would like to extend my deepest gratitude to the following people: My committee chair and advisor, Dr. Roland Good, III, for his infectious enthusiasm for the work and unwavering belief in my skills as a researcher, which kept me going long after the last piece of chocolate was gone; The members of the dissertation committee, Dr. Gina Biancarosa, Dr. Kelli Cummings, Dr. Elizabeth Harn, and Dr.
Laura Lee McIntyre, for challenging me to never settle for mediocre when I am capable of so much more; The University of Oregon and the Dynamic Measurement Group for financial and emotional support of this project; The graduate student data collectors who volunteered their time and skills to ensure that this research could happen: Emily Barrett, Vincent Campbell, Kelly Collins, Ronda Fritz, and Caitlin Rasplica; And the cooperating schools and teachers, for welcoming me into their classrooms and reminding me why our work is so important.

TABLE OF CONTENTS

Chapter

I. STATEMENT OF THE PROBLEM
    Text Complexity
    The Importance of Text Complexity in Instruction and Assessment
    College and Career Readiness
    School Accountability
    Alternate Forms in Formative Assessment
    Alternate Forms for Evaluating Metrics of Text Complexity
    Text Complexity Metrics
    Reader and Task Considerations
    Quantitative Dimensions
    Qualitative Dimensions
    Evaluating Qualitative Dimensions of Text
    Text Cohesion
    Integrated Model of Cohesion
    Grammatical Cohesion
    Syntactic Structure
    Narrative Structure
    Lexical Cohesion
    Lexical Accessibility and Diversity
    Referential Cohesion
    Selection of Referential Cohesion Index for Study
    Passage Genre
    Purpose of This Research and Hypotheses
    Direct Effects on Oral Reading Fluency Rate
    Direct Effects on Oral Reading Fluency Accuracy
    Direct Effects on Passage-Specific Comprehension
    Research Questions

II. LITERATURE REVIEW
    What Makes Text Difficult?
    Text-Based Features of Text Complexity
    Decoding Difficulty
    Semantic Difficulty
    Syntactic Difficulty
    Coherence and Cohesion
    Reader-Based Features of Text Complexity
    Approaches to Evaluating Text Complexity
    Readability Formulas
    2009 NAEP Reading Framework
    Common Core Standards Framework
    Text Cohesion: A Potential Contribution to the Evaluation of Text Complexity
    Effects of Cohesion as a Whole
    Effects of Referential Cohesion
    Cohesion and Readability: Related but Distinct Constructs
    Interactions Between Cohesion and Genre
    Quantifying Text Cohesion Using Coh-Metrix
    Summary

III. METHODOLOGY
    Independent Variables
    Genre
    Referential Cohesion
    Adjacent Anaphor Overlap
    Adjacent Argument Overlap
    Content Word Overlap
    Stem Overlap
    Latent Semantic Analysis (Sentence All)
    Constructing the RCCS
    Readability
    Manipulating Independent Variables: Passage Selection
    Measure Referential Cohesion and Identify “High” and “Low” Cohesion Passages
    Identify Passages with Similar Readability Scores
    Identify Two Passages Within Each Genre
    Dependent Variables
    Dependent Variable #1: Rate
    Dependent Variable #2: Accuracy
    Dependent Variable #3: Comprehension
    Measures
    Oral Reading Fluency
    Passage Recall
    Conservative
    Liberal
    No Match-Consistent
    No Match-Inconsistent
    Participants
    Procedure
    Data Collector Training
    Data Collection
    Coding of Passage Recalls
    Participant Incentives
    Summary

IV. RESULTS
    Characteristics of the Invited Sample
    Characteristics of the Actual Sample
    Data Transformations
    Descriptive Statistics
    Intercorrelations
    Oral Reading Fluency Rate
    Oral Reading Fluency Accuracy
    Passage-Specific Reading Comprehension
    Summary

V. CONCLUSION
    Discussion
    Interpretations of Non-Significant Relation Between Referential Cohesion and Comprehension
    Potential Effects of Background Knowledge
    Cohesion and Grade Level
    Implications
    Implications for Instruction
    Implications for Curriculum-Based Measurement
    Implications for Measurement of Text Complexity
    Study Limitations
    Next Steps
    Replication
    Future Directions for Measurement of Referential Cohesion
    Future Directions in Measurement of Comprehension
    Summary

REFERENCES CITED

LIST OF FIGURES

Figure

1. Graph of student progress monitoring data illustrating “bounce” in student performance
2. Integrated model of text cohesion
3. A model of relations between independent variables and dependent variables
4. Measurement model of referential cohesion
5. Pairwise comparisons of interaction effects between referential cohesion and genre on oral reading fluency rate
6. Pairwise comparisons of interaction effects between referential cohesion and genre on oral reading fluency accuracy
7. Main effect of genre on passage-specific reading comprehension
8. Revisited model of relations between independent and dependent variables

LIST OF TABLES

Table

1. Qualitative Dimensions of Text Complexity Included in the Common Core Standards Framework
2. Results of Selected Studies Evaluating the Effects of Revisions to Improve Referential Cohesion on Reading Comprehension Performance
3. Expert Reviewer and Passage Author Judgments of Passage Genre
4. Coh-Metrix Variables Included in the Researcher-Developed Referential Cohesion Composite Score (RCCS)
5. Inter-Correlations Between Variables Included in the Referential Cohesion Composite Score (Z-Scores)
6. DIBELS Next Oral Reading Fluency Passages Selected for Study Inclusion
7. Descriptive Statistics for easyCBM Benchmark and Study Passage Rate Scores (First Minute and Pro-Rated Whole Passage)
8. Correlations Between easyCBM Benchmark and Study Passage Rate Scores (First Minute and Pro-Rated Whole Passage)
9. Sample Coding of Student Responses to the Passage Retell Task
10. Descriptive Statistics for Rate (Pro-Rated Whole Passage), Accuracy, and Comprehension for all Passages Included in Study
11. Descriptive Statistics by Risk Level for Rate (Pro-Rated Whole Passage), Accuracy, and Comprehension for all Passages Included in Study
12. Intercorrelations Between Oral Reading Fluency Rate, Oral Reading Fluency Accuracy, and Passage-Specific Comprehension Scores for All Measures
13. Two-Way, Within-Subjects Analysis of Variance Summary Table for the Effect of Genre and Cohesion on Oral Reading Fluency Rate
14. Two-Way, Within-Subjects Analysis of Variance Summary Table for the Effect of Genre and Cohesion on Oral Reading Fluency Accuracy
15. Two-Way, Within-Subjects Analysis of Variance Summary Table for the Effect of Genre and Cohesion on Passage-Specific Reading Comprehension

CHAPTER I

STATEMENT OF THE PROBLEM

Every day, educators face a multitude of questions about the complexity of text. Teachers aiming to match students to text may wonder: how can I assess the difficulty of this book? How do I determine if it is aligned with my student’s skills and needs? How can I be systematic in the assignment of reading materials, so that the demands placed on the student grow commensurate with the student’s skills? Interpreters of test results may ask themselves: how challenging is the text in this assessment? How comparable are alternate forms of this assessment in terms of difficulty? How does the complexity of this assessment align with course content?
These, among other questions, highlight the critical role of text complexity in teaching and assessment. Text complexity is defined by the Common Core Standards in English and Language Arts (2010) as the “inherent difficulty of reading and comprehending a text combined with consideration of reader variables” (Glossary, p. 43). This definition suggests that characteristics of the text itself interact with reader features to determine the complexity of a given text. While reader variables are a critical component of this definition, the definition also highlights that texts contain “inherent difficulty” that is independent of reader variables. The purpose of this research is to evaluate those “inherent” features of text that contribute to complexity.

In this chapter, the background and importance of the study will be outlined. First, the components of text complexity will be defined and described, including decoding difficulty, semantic difficulty, syntactic complexity, genre, and especially text cohesion, the extent to which a text hangs together to form a coherent whole. Second, the importance of text complexity in instruction as well as in assessment will be described, with particular focus on the role of text complexity in formative assessment. Third, considerations in evaluating text complexity will be described, including: 1) consideration of the reader and task, 2) quantitative measures, and 3) qualitative dimensions. In the fourth section, a next step for improving estimates of text complexity by quantifying traditionally qualitative features of text will be proposed. Specifically, it is proposed that text cohesion can be quantified, and should be included along with passage genre in measures of text complexity. Because cohesion can be described and disaggregated in many different ways, the model underlying the use of the term cohesion in this context will be described.
Finally, this chapter will describe how a measure of text cohesion along with estimates of genre may be used to evaluate text complexity of formative assessment tools.

Text Complexity

Text complexity refers to the text-derived difficulty of a given passage. While reader-based factors such as background knowledge also contribute to the difficulty of the passage, this discussion of text complexity focuses on those features central to the text itself that make the passage more or less challenging to decode and understand. Many components are involved in text complexity, including decoding difficulty, semantic difficulty, syntactic difficulty, genre, and text cohesion, among others. Decoding difficulty describes the decoding demands placed on the student, sometimes measured by the average length of words in the passage or the average number of syllables per word. Semantic difficulty captures the semantic requirements of the text, especially the familiarity and difficulty of the vocabulary in the text. Syntactic difficulty refers to the role of the syntactic structure of the text in supporting reader decoding and understanding, both at the sentence level (e.g., variability of sentence structure) and at the global level (e.g., passage flow and organization). According to the Florida Center for Reading Research (2006), genre refers to “different styles of text that reflect a variety of purposes which children encounter when reading” (e.g., narrative, informational). While many classifications exist, research on text complexity generally focuses on the differential demands of narrative or prose versus informational or expository text. Finally, text cohesion describes the extent to which the text hangs together to form a coherent whole. Cohesive texts provide appropriate linkages between ideas, concepts, narrative elements (e.g., time, setting, characters), and themes to support reader comprehension of the text.
This list of text complexity elements is not comprehensive; other text-based features may contribute to the difficulty of the passage, such as the complexity of the ideas and concepts expressed; however, the described elements can be more readily operationalized and measured and, as a result, are the major areas explored in the literature.

The Importance of Text Complexity in Instruction and Assessment

Understanding and capturing the components that contribute to text complexity has implications for both instruction and assessment. Instructionally, the Common Core Standards Initiative (2010) stresses that students develop skills to be able to read and comprehend texts of increasing complexity as they progress through school. This expectation is based on data documenting the importance of comprehending complex texts in college and the workplace. In assessment, knowledge and understanding of text complexity have implications for both summative and formative assessment. For summative assessments such as state accountability tests, understanding of text complexity may help to improve test construction and interpretation. For formative assessment, controlling text complexity is critical in facilitating accurate individual decisions. Additionally, improved measures of text complexity will facilitate the development of better progress monitoring materials. This section describes the role of text complexity in the instruction and assessment domains, and builds a case for better understanding features that contribute to text complexity.

College and career readiness. According to a 2006 report by college readiness test developer ACT, Inc., the ability to answer questions about complex texts appears to differentiate between students who achieve the benchmark on the ACT reading test and those who do not (ACT, 2006).
As described in the report, the complexity of all reading passages was rated on a three-point qualitative scale, and performance on those passages rated as “complex” was a clearer differentiator than inference making or cognitive skills such as identifying the main idea or the meanings of words in context. In fact, students performing below the benchmark performed no better than chance on these test items, highlighting this skill’s impact on overall reading proficiency. These findings were consistent across gender, race/ethnicity, and socio-economic status.

Additionally, there is evidence that college and workplace texts are significantly more complex than high school texts, and that this discrepancy in text complexity has increased over time (Common Core Standards Initiative Appendix A, 2010). Evaluations of the complexity of K-12 school reading materials indicate that complexity demands have steadily decreased on measures of readability and vocabulary since the middle of the 19th century. Students are also provided with more scaffolding and support in reading school texts, decreasing independent reading demands. At the same time, the complexity of college and career reading materials has increased, with increasing emphasis on informational texts like periodicals and independent reading. This discrepancy in K-12 and college/career reading demands indicates that students graduating from high school may be unprepared for the reading demands of college and the workforce.

Accordingly, educators need accurate measures of text complexity to 1) identify target levels of complexity students should attain, and 2) provide systematic increases in complexity by grade to attain those standards. In order to achieve these goals, researchers and educators must better understand what contributes to text complexity and how to teach students strategies to understand complex text.
First, educators must understand the features of texts that impact student comprehension. Only then can educators prepare students for college and career reading demands by sequencing and ordering text in systematic steps of increasing complexity, so that students develop skills on less complex text early, develop skills with text of increasing complexity in elementary and middle school, and are able to engage with text of high complexity linked to college and career readiness in high school. Consequently, an evaluation of text complexity factors is a critical prerequisite to building skill in text with increasing complexity and improving understanding of texts that may lack inherently supportive text structures.

School accountability. As a result of legislation such as No Child Left Behind, summative assessment data are being used to make decisions about school effectiveness. These decisions have potentially serious implications for school funding and resource allocation. While each state has its own accountability assessment, these results are being used nationally to interpret state performance in reading and content areas. However, state assessments are not designed to be comparable in difficulty. Consequently, student performance may differ as a function of the assessment, rather than state instructional practices or student performance. Capturing text complexity information about the assessments could alleviate some of these concerns in two ways. First, standards of text complexity could be used during test development, so that assessments are written within a given band of text complexity for each grade level assessed. This would allow for better understanding of the demands of each assessment and comparability across states. Second, tests may be evaluated on the basis of text complexity post hoc to better interpret student performance.
For example, clear operationalizations of text complexity designations would allow evaluators to better understand what skills the assessments are measuring, and how students are performing compared to those skills.

Alternate forms in formative assessment. Accurate measurement of text complexity plays a particularly important role in general outcome measurement, where student performance is monitored over time using a common metric and criterion. This type of assessment, called formative assessment, is used to inform instructional practices and facilitate decision-making. Unlike summative assessments, which capture student achievement at the conclusion of an instructional unit, formative assessments allow educators to evaluate student learning as it is occurring and adjust instruction in response to student needs. This is accomplished through the administration of repeated alternate equivalent forms, which capture student growth towards a general outcome or goal.

Formative assessment is made possible due to the development of assessment tools like curriculum-based measurement (CBM). CBM is an approach to assessing student progress towards critical skills. Unlike mastery measurements, which capture discrete skill mastery, goal-oriented monitoring of basic early literacy skills allows CBM to monitor the development of skills toward a meaningful outcome (Deno, 2003). An essential component in CBM is its repeatability; CBM is designed to capture growth over time. Performance can be plotted on an individual student graph, to allow for evaluation of past, present, and projected rates of growth.

Within this CBM framework, educators can make decisions about student performance by comparing expected and actual rates of progress. By plotting baseline performance and a goal, educators can create an aimline illustrating the rate of progress necessary to reach the goal.
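
As a minimal illustration of the aimline logic in this section, the following Python sketch draws a straight line from a baseline score to a goal and flags the monitoring points that fall below it. The function names, the weekly spacing of monitoring points, and all score values are hypothetical and chosen only for illustration; a straight-line aimline is assumed for simplicity and is not presented as the procedure of any particular CBM tool.

```python
def aimline(baseline, goal, n_weeks):
    """Expected score at weeks 0..n_weeks on a straight line from baseline to goal.

    Assumption: progress is monitored weekly and the aimline is linear.
    """
    slope = (goal - baseline) / n_weeks  # expected gain per week
    return [baseline + slope * week for week in range(n_weeks + 1)]


def weeks_below_aimline(scores, expected):
    """Return the week indices at which the observed score is below the aimline."""
    return [week for week, (observed, target) in enumerate(zip(scores, expected))
            if observed < target]


# Hypothetical data: baseline of 40, goal of 60, four weeks of monitoring.
expected = aimline(baseline=40, goal=60, n_weeks=4)
observed = [40, 42, 51, 50, 61]  # illustrative scores, not study data
below = weeks_below_aimline(observed, expected)
print(expected)
print(below)
```

In this hypothetical series, the scores at weeks 1 and 3 fall below the aimline. Whether such points reflect a true lack of progress or passage-related variability is the question that makes form equivalence important.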
Student performance data are then plotted and compared to the aimline in order to make decisions about student progress (Deno & Marston, 2006). Consistent performance below the aimline indicates a need to adjust support, while performance at or above the aimline indicates a strong likelihood that the student will achieve the desired level of performance.

A key assumption of formative assessment measures like CBM is that all alternate forms of the assessment are of equal complexity. Equivalent forms of CBM probes allow changes in student performance to be attributed to student growth rather than probe effects. Consequently, form equivalency of CBM probes is critical in making valid and reliable decisions about student performance. Passage equivalency in text complexity is thus a major component of CBM probe development and selection. Test developers may attempt to control passages to be of uniform complexity through a variety of means, such as expert review, targeted readability criteria, and pilot testing (Albano & Rodriguez, 2012).

Passages of relatively uniform text complexity are also used to evaluate student change in response to instruction. With effective reading instruction, student performance should increase as a result of growth in student skill. In practice, however, student performance is not always so consistent. Students may display “bounce,” or inconsistent performance, across multiple progress monitoring points (see Figure 1). Such variability presents challenges for decision-making, as estimates of the student’s true level of skill and progress are clouded by inconsistent performance (Parker, Vannest, Davis, & Clemens, 2010). There are at least three types of factors that may contribute to such variability in student performance: passage-level factors (e.g., readability, genre, and cohesion), student-level factors (e.g., background knowledge and interest), and environmental factors (e.g., testing conditions and familiarity of tester).
While student-level and environmental factors may be challenging to control, it is important to evaluate means of reducing variability due to passage-level factors in order to reduce bounce as much as possible.

Alternate forms for evaluating metrics of text complexity. Because uniformity of text complexity is a key assumption of CBM, alternate forms provide an opportunity to evaluate formulas for capturing text complexity. Specifically, CBM offers a technology to evaluate relations between measures of text complexity and student performance. For example, CBM passages written to adhere to strict standards of readability allow researchers to hold readability constant while evaluating the effects of other measures that may contribute to text complexity. The unique nature of CBM alternate forms – far more alternate forms than a published norm-referenced test – allows this technology to be used in ways that other assessments cannot in order to evaluate those indices of text complexity.

Figure 1. Graph of student progress monitoring data illustrating “bounce” in student performance (y-axis: words correct per minute, 0 to 100).

Text Complexity Metrics

The Common Core Standards (2010) propose a three-part model for measuring text complexity – reader and task considerations, quantitative dimensions, and qualitative dimensions. Authors of the Common Core Standards suggest that estimates of text complexity include evaluation of all three of these domains. Each of these domains, along with considerations for evaluation, is described below.

Reader and task considerations. Reader and task considerations capture the features of text complexity that are individual to the student and the environment of the reading task. These considerations include individual reader characteristics, such as background knowledge, decoding skills, and comprehension strategies.
They also capture features of the environment that may impact performance, such as day of the week, time of day, and environmental stimuli like the presence or absence of other students. Finally, these considerations capture the interaction between the reader and the environment. For example, a student’s individual reading skills may impact performance before lunch or when there are multiple activities occurring in the classroom, but not in other environments or contexts. While these considerations likely contribute to the complexity of a reading task for a given child in a given environment, these features are challenging to capture. Because student and task considerations vary by individual student and context by definition, it is difficult to evaluate or control differences in text complexity due to student and task factors.

Quantitative dimensions. Quantitative measures capture the features of text complexity that can be quantified and counted. Quantitative features are generally included in readability formulas. Readability formulas are based on readily observable, countable features of the text that are generally organized into three factors – decoding difficulty, semantic difficulty, and syntactic difficulty. Decoding difficulty describes the demands the text places on student decoding skill. Decoding difficulty is not included in all readability formulas (the Lexile Framework for Reading, for example, omits it), but can be quantified by counting the number of characters per word or the number of syllables per word (e.g., Powell-Smith, Good, & Atkins, 2010). Semantic difficulty addresses the semantic requirements of the words in the text, such as the familiarity or uniqueness of the vocabulary used. For example, in the Lexile Framework, semantic difficulty is quantified as the mean log word frequency, based on a corpus of approximately 600 million words (Lennon & Burdick, 2004).
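As a rough illustration of the decoding and semantic counts just described, the sketch below computes mean characters per word and mean log word frequency. It is not the Lexile formula or any published readability formula; the tokenizer, the choice of a base-10 logarithm, the rule of skipping words missing from the frequency table, and the frequency values themselves are all assumptions made for illustration.

```python
import math
import re


def tokenize(text):
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def mean_chars_per_word(words):
    """Decoding proxy: average number of characters per word."""
    return sum(len(word) for word in words) / len(words)


def mean_log_frequency(words, freq_per_million):
    """Semantic proxy: mean log10 frequency of the words found in a
    frequency table. Words missing from the table are skipped, which
    is an assumption of this sketch."""
    logs = [math.log10(freq_per_million[word])
            for word in words if word in freq_per_million]
    return sum(logs) / len(logs)


# Hypothetical frequencies per million words, invented for illustration.
toy_frequencies = {"the": 10000.0, "squid": 1.0, "swims": 10.0}
words = tokenize("The squid swims.")
print(mean_chars_per_word(words), mean_log_frequency(words, toy_frequencies))
```

A lower mean log frequency indicates rarer vocabulary, which a readability formula would treat as greater semantic difficulty.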
Syntactic difficulty captures features of grammar and sentence structure. For example, in the Lexile Framework, syntactic difficulty is quantified as the mean length of sentences in the text. While these counts capture some of the passage-level contributors to text complexity, readability formulas only consider surface-level features of the text and may fail to capture other features that affect the comprehensibility of the text (e.g., Foorman, 2009; Hiebert, 2011). In some studies, readability formulas have been found to account for some of the variability in text (e.g., Briggs, 2011), but research has consistently found substantial variability in student performance that is unexplained by readability formulas (e.g., Francis, Santi, Barr, Fletcher, Varisco, & Foorman, 2008; Ardoin, Williams, Christ, Klubnik, & Wellborn, 2010). While some of the unexplained variability may be attributed to student and task considerations, other text-level features of text complexity that are controllable may also contribute.

One approach to improving readability formulas as measures of text-level text complexity is to attempt to quantify some of the qualitative features of text complexity. For example, the overlap of content and structure may be captured by counting the proportion of sentences that contain overlapping content words, nouns, arguments, or sentence stems. Similarly, the conceptual overlap of words in the text can be evaluated by using latent semantic analysis to generate a quantitative measure of semantic relations. The Coh-Metrix program (McNamara, Louwerse, Cai, & Graesser, 2005) is designed to provide a quantifiable metric that may correspond with some of these qualitative features of text complexity. The Coh-Metrix program provides quantitative counts of a number of components of text cohesion, and may be one tool to aid researchers in evaluating the qualitative dimensions of text.

Qualitative dimensions.
Qualitative features of text complexity are features of the text that may affect a reader’s understanding of text, but that may be challenging to quantify. The Common Core Standards (2010) describe four qualitative factors that may impact text complexity: 1) the levels of meaning or author’s purpose for writing, 2) overall structure and format of the text, 3) use of language in conventional vs. unconventional ways and clarity of the language used, and 4) the background knowledge demands of the text. Additionally, other features such as the complexity of ideas or author’s message may impact the demands placed on the reader. Unlike quantitative features of text, the Common Core Standards suggest that qualitative features of text are best evaluated through expert judgment and discussion; however, it may be possible to capture some traditionally qualitative features of text complexity through quantitative analysis.

Evaluating Qualitative Dimensions of Text

As noted above, quantitative estimates of text complexity may benefit from the inclusion of some features of text typically reserved for qualitative analysis. While the quantitative measurement of traditionally qualitative features of text will not replace qualitative analysis, it may help improve quantitative estimates of complexity by including a broader range of the variables that impact text complexity. In particular, it may be possible to quantify some aspects of the overall structure and format of the text, one of the four qualitative factors identified by the Common Core Standards. Two components of the overall text structure and format – text cohesion and passage genre – may be suitable for such an analysis.

Text Cohesion

One feature of text structure and format that may be amenable to being quantified and incorporated into estimates of text complexity is text cohesion. Text cohesion describes the extent to which text hangs together to form a coherent whole (Morris & Hirst, 1991).
This definition suggests a qualitative assessment of the overall structure and clarity of the text in delivering the intended message. However, text cohesion can be disaggregated into component parts (cohesive devices), which may support understanding of the qualitative components of text complexity. Evaluation of these devices may be one means of understanding and evaluating the qualitative features that contribute to text complexity and the linkages between the qualitative and quantitative dimensions of text.

Text cohesion can be described as the extent to which a passage constitutes a unified whole. Specifically, cohesion captures the ties between idea units within a text, and is what differentiates a passage from a series of sentences. Take, for example, the following sentences:

The nation of Fiji is made up of more than 300 individual islands. The almost century-long occupation by the British ended in 1970. The sugar in sugarcane is extracted with water or by diffusion.

As they stand, there is little to nothing connecting these sentences together to form a meaningful whole. While an experienced reader may use background knowledge or inferencing skills to attempt to create meaningful connections between the sentences (e.g., by using known information about Fiji’s colonization by the British to infer that the “century-long occupation” refers to Britain’s occupation of Fiji), such connections are not supported by the text. Cohesion may be imposed upon these sentences by creating connections between ideas, such as:

The nation of Fiji is made up of more than 300 individual islands. Fiji was occupied by the British for almost a century, but this occupation came to an end in 1970. One of the primary industries of Fiji is sugar processing, which requires sugar from sugarcane to be extracted with water or by diffusion.
In this example, the sentences are connected by a number of devices, such as the repetition of the key word “Fiji” in all three sentences and the use of conjunctions like “but” and “which.” Unlike the first example, these sentences now constitute a cohesive text. While the latter example demonstrates how cohesion connects sentences into meaningful texts, it could certainly be re-written to make such connections even more explicit. In doing so, the text would represent a greater degree of cohesion than either of the provided examples. Texts may vary in the degree of cohesion because cohesion exists on a continuum; the presence or absence of ties between items in the text and across the entire text affects the cohesiveness of the selected passage.

In order to better understand what makes a text cohesive, researchers have examined the construct of cohesion in a variety of different ways (e.g., Halliday & Hasan, 1976; Kintsch & van Dijk, 1978). The Halliday and Hasan model of cohesion defines cohesion as the semantic relations within the text, which can be coded at three levels: 1) the semantic system, 2) the lexicogrammatical system, and 3) the phonological and orthographic systems. This model focuses heavily on devices within a text that promote cohesion – co-reference, substitutions, ellipses, conjunctions, and reiteration – and does not emphasize how these devices interact with the reader to impact interpretation or understanding of the text. In contrast, the Kintsch and van Dijk model focuses primarily on the interactions between the surface-level textbase (e.g., many of the devices described by Halliday and Hasan), the meaning of the passage, and the reader. This model emphasizes cohesion as a means of supporting the reader’s construction of text meaning.
Each of these models provides important contributions to the field’s understanding of cohesion; rather than selecting one existing model of cohesion, this paper presents a synthesis of different approaches to evaluating cohesion, which will be called the integrated model of cohesion.

Integrated model of cohesion. In the integrated model of cohesion, cohesion is conceptualized in two distinct ways – grammatical cohesion and lexical cohesion. Each type of cohesion, as well as the elements and devices within grammatical and lexical cohesion, is described below. The entire model is summarized in Figure 2.

Figure 2. Integrated model of text cohesion.

Grammatical cohesion. Grammatical cohesion refers to ties and connections between elements of the text that are grammatical in nature (Halliday & Hasan, 1976). One component of grammatical cohesion is the redundancy and complexity of sentence structure, called the syntactic structure. Additionally, grammatical cohesion includes the use of a predictable structure across the text – such as the logical inclusion of predictable elements of story grammar – and the maintenance of consistency of space and time, which can be described as the narrative structure of the text.

Syntactic structure. The syntactic structure of a text captures the variability of sentence structures across the text as well as the complexity of such sentence structures (Graesser, McNamara, & Kulikowich, 2011). These two components – syntactic redundancy and syntactic complexity – capture the effect of syntactic structure on the cohesion of the text as a whole. Syntactic redundancy is the repetition of sentence structures across the text (Stanovich, 1980). Syntactic redundancy is related to syntactic priming and syntactic parallelism, because the repetition of syntactic structure primes the reader for syntactic processing.
This device contributes to the cohesion of a text because it structurally links a series of sentences and allows for efficient processing of text meaning. Additionally, syntactic redundancy contributes to faster processing times (as measured by reading rates), separate from the effects of lexical word repetition (Ledoux, Traxler, & Swaab, 2007). For an example of syntactic redundancy, consider the following sentences: “Ben and Alice had a picnic. Ben and Alice were happy.” While these sentences share lexical features (e.g., the words “Ben” and “Alice”), they also share a grammatical structure – they are both single independent clauses following a subject-subject-verb format. The repeated use of the same sentence structure builds familiarity so that the reader can attend to passage meaning. At a broader level, consider this section: each cohesive device is described in a single paragraph in four steps: a) a description of the device, b) an explanation of the device’s connection to text cohesion, c) an example of the device, and d) means to measure the device. Because this structure is repeated across all paragraphs within the section, the syntactic structure is redundant. Syntactic redundancy can be challenging to measure, but the authors of a program called Coh-Metrix have developed a method of parsing sentences into part-of-speech categories to create a tree-style representation of the syntactic structure, which can be compared to other sentences to obtain a measure of syntactic similarity (Graesser & McNamara, 2011).

A second feature of the syntactic structure of a text, syntactic complexity, refers to the complexity of sentence structures within the text (Graesser, McNamara, Louwerse, & Cai, 2004). For example, a syntactically simple sentence may contain just one independent clause, while a syntactically complex sentence may contain multiple independent and dependent clauses.
Syntactic complexity contributes to cohesion because the complexity of grammatical structures can support or hinder reader understanding of connections across the text (Pearson, 1974). For example, consider the sentences: “All African elephants have tusks. Their tusks are highly sought by ivory hunters.” Both sentences follow the basic format subject+verb+object. Now, consider these sentences: “Unlike their Asian counterparts, of which only males grow ivory tusks, all African elephants male and female have tusks. These tusks are popular among artists and have become a target of ivory hunters, who have impacted elephant populations by aggressively hunting tusked elephants.” These sentences utilize more complex structures, including the use of multiple clauses in a single sentence. While the second example may be more informative to a skilled reader, the use of complex sentence structures may impact a novice reader’s ability to pick out important information or understand how pairs of sentences are related. Syntactic complexity is generally measured by counting the number of words per sentence; longer sentences are generally indicative of more complex syntactic structures, while shorter sentences generally capture simpler syntactic structures. Some researchers (see Graesser, McNamara, & Kulikowich, 2011; Graesser, McNamara, Louwerse, & Cai, 2004) have also quantified syntactic complexity by measuring the number of causal verbs, intentional actions or events, syntactic similarity, type-token ratio (for each word, the type-token ratio is one divided by the number of occurrences of the word), and mean number of modifiers per noun phrase.

Narrative structure. The narrative structure of a text refers to the consistency of the overall structure of the text (van den Broek & Gustafson, 1999). Narrative structure is included in a discussion of cohesion because it represents the continuity of a recognizable text structure across the entire text.
The narrative structure of a text includes four components: story grammar, spatial consistency, temporal consistency, and causal consistency.

Story grammar refers to the use of predictable text components or devices, such as the presentation of characters, a setting, a problem or initiating event, and resolution or conclusion (Jungjohann, 2008). Story grammar contributes to text cohesion because it allows readers to access schema of how the text should function, and build a mental representation of the text by adding new information from the text to the existing model. Breaks in this global-level consistency may make it more challenging for readers to attend to relevant information and build a clear representation of the text. For example, consider a text that presents the problem resolution without ever stating the initiating event. Such a text may impact a less skilled reader’s comprehension of that problem resolution and its role in the overall text. Story grammar may be measured using a qualitative analysis of the text guided by structured questions or checklists.

Spatial consistency refers to the maintenance of orientation in space across the text (Zwaan, Radvansky, Hilliard, & Curiel, 1998; Tapiero, 2007). Spatial consistency can be achieved by maintaining a single spatial orientation or following a logical and explicit progression of space. Spatial consistency contributes to text cohesion because it creates spatial links between sentences and across the text. For example, consider the following sentences: “The detective entered the museum through large, imposing doors. Inside the foyer, the cool drafts and low, rumbling echoes only added to the detective’s sense of anxious anticipation.” These sentences demonstrate the maintenance of spatial consistency because the character’s orientation in space – in the first sentence she enters the building and in the second sentence she is inside the foyer – is logically connected across the sentences.
Spatial consistency may be measured by counting the number of spatial indicators – words that provide information about space like “inside” or “over” – in the text.

Temporal consistency refers to the continuity of time across the text (Zwaan et al., 1998; Zwaan, 1996; Tapiero, 2007). Texts that maintain temporal consistency present the passage of time clearly and explicitly, as opposed to jumping between various periods in time or presenting time ambiguously. Temporal consistency contributes to text cohesion because it creates temporal links between sentences and across the text. An example of temporal consistency is highlighted in the following sentences: “Lucille’s day began with a large bowl of oatmeal and a glass of orange juice. After breakfast, Lucille showered and dressed for her big day.” Temporal consistency is maintained in these sentences because events occur sequentially and the passage of time is made explicit to the reader. Temporal consistency may be measured by counting the frequency of temporal connectives like “next,” “before,” or “after.”

Finally, causal consistency refers to the maintenance of a logical cause and effect structure in the text (Zwaan et al., 1998; Tapiero, 2007). Causal consistency contributes to text cohesion because it allows the reader to link initiating ideas or actions with the resulting effect and generate causal inferences while reading (Zwaan & Radvansky, 1998). The following sentence illustrates causal consistency by explicitly stating the causal relationship between idea units: “Elijah went to the grocery store because he was out of milk.” The two clauses – “Elijah went to the grocery store” and “he was out of milk” – are connected by the word “because,” which identifies that Elijah going to the store is the effect of being out of milk.
Causal consistency may be measured by counting the frequency of causal connectives – words that indicate causal relations between ideas – such as “because,” “therefore,” and “as a result of.”

Lexical cohesion. Lexical cohesion refers to ties and connections between elements of the text that are due to lexical similarity (Morris & Hirst, 1991). Lexical cohesion preserves the continuity of word meaning across text through the use of lexically similar ideas. Two primary components of lexical cohesion are lexical accessibility and diversity and referential cohesion.

Lexical accessibility and diversity. Lexical accessibility and diversity captures the extent to which the vocabulary in the text is understandable to the reader (Graves & Graves, 2003). This “understandability” of vocabulary is influenced by the familiarity of the vocabulary, redundancy of vocabulary, and the concreteness of vocabulary.

The familiarity of text vocabulary refers to how familiar vocabulary is to the reader (Graesser, McNamara, Louwerse, & Cai, 2004). While text familiarity will vary by reader due to background knowledge, there are features of vocabulary familiarity that are inherent to the text; specifically, the commonness or uniqueness of words in discourse. The familiarity of text vocabulary is related to text cohesion because unfamiliar vocabulary may hamper a reader’s ability to connect thoughts and ideas across the text. For example, texts that use challenging, unique, or content-specific vocabulary to describe a common construct, such as “cephalopod” for “squid,” may impact a reader’s ability to integrate the vocabulary with other information provided in the text due to a lack of familiarity. Vocabulary familiarity can be evaluated by measuring word frequency in the written language based on a corpus of available text.

The redundancy of vocabulary in the text refers to the repetition of vocabulary across the text.
The redundancy of vocabulary, or reiteration, as it is termed by Halliday and Hasan (1976), is related to cohesion because it links texts through a common referent. An example of lexical redundancy can be seen in the following sentences: “She walked carefully through the old building, ducking out of the way of several cobwebs. She stopped at the largest cobweb, where a dark spider sat in the center waiting patiently for her prize.” In these sentences, the word “cobweb” is reiterated through lexical redundancy. The redundancy of vocabulary in a text can be evaluated by measuring the type-token ratio.

Finally, the concreteness of text vocabulary refers to the level of concreteness or abstractness of words in the text (Graesser et al., 2004). Word concreteness is related to cohesion because it affects the reader’s global representation of the text. Word concreteness can be a product of the word itself – such as abstract constructs like “love” or “freedom” – or can be due to insufficient word meaning information – such as the sentence “We saw her duck.” One way in which word concreteness is measured is polysemy, which captures the number of senses or meanings of a word. Words with greater polysemy are more abstract because the word can mean many different things. Additionally, words can be assigned concreteness scores by human raters, as in the Coh-Metrix program.

Referential cohesion. Referential cohesion captures cohesive ties developed through the continuity of reference (Freebody & Anderson, 1983). Specifically, referential cohesion captures elements in the text that are only interpretable by reference to something else. This type of cohesion is a subtype of lexical cohesion because word meaning is derived from previously provided information. Referential cohesion has many components, which can be grouped together as endophora reference and exophora reference.
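Before turning to the two types of reference, the type-token ratio mentioned above as a measure of vocabulary redundancy can be sketched briefly. The sketch assumes the common passage-level definition (number of distinct word types divided by the total number of word tokens) and a simple lowercase tokenizer; both are illustrative assumptions rather than the procedure used by any particular program.

```python
import re


def type_token_ratio(text):
    """Distinct word types divided by total word tokens.
    Lower values indicate more repetition of vocabulary."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        raise ValueError("text contains no word tokens")
    return len(set(tokens)) / len(tokens)


# The example sentences used earlier for syntactic redundancy:
# 8 distinct words across 11 word tokens.
print(type_token_ratio("Ben and Alice had a picnic. Ben and Alice were happy."))
```

Because the ratio tends to fall as passages get longer, comparisons are most meaningful between passages of similar length.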
Endophora reference refers to word meanings that are derived through reference to previously provided information presented in the text itself (Halliday & Hasan, 1976). Endophora reference is a key component of cohesion because it requires the reader to integrate new information (the reference) with previously provided information in the text (the referent). Examples include personal pronouns like “she,” “they,” and “his,” which only have meaning through reference to other words in the text. Endophora reference can be measured by counting things like anaphor overlap (co-reference between pronouns and referent), argument overlap (shared nouns, pronouns, or noun phrases), content word overlap (re-occurrence or overlap of key content words), and stem overlap (shared morphological elements).

Exophora reference can also be described as situational reference because it describes instances in which word meaning is derived through reference to background information and/or conceptually similar vocabulary available in the lexicon (Halliday & Hasan, 1976). Exophora reference is related to text cohesion because it enables the reader to build a coherent representation of the text meaning through the integration of prior knowledge and background information with the text itself. For example, consider the following sentences: “John went for a run. He likes to exercise.” In these sentences, common situational knowledge links words like “run” and “exercise.” Consequently, a reader can interpret the relationship between the two sentences based on situational or background knowledge about the construct “run.” Exophora reference also captures general (non text-based) reference like “One must be polite to others,” in which the meaning of “one” and “others” is not explicitly stated in the text but generally accepted. Exophora reference can be measured using Latent Semantic Analysis, a method of statistically capturing the semantic relations between words.
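To make the idea of content word overlap concrete, the following minimal sketch computes the proportion of adjacent sentence pairs that share at least one content word, applied to the two Fiji passages presented earlier. It is a simplified stand-in and not the Coh-Metrix algorithm: the sentence splitter, the small stopword list, and the exact-match rule (no stemming or pronoun resolution) are assumptions made for illustration.

```python
import re

# Small hand-picked list of function words; an assumption of this sketch.
STOPWORDS = {
    "the", "a", "an", "of", "is", "was", "by", "in", "to", "and", "or",
    "but", "for", "with", "up", "than", "more", "this", "which", "be",
    "one", "from", "almost", "made", "came",
}


def split_sentences(text):
    """Split on whitespace that follows sentence-final punctuation."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def content_words(sentence):
    """Set of lowercase alphabetic words that are not stopwords."""
    return {w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOPWORDS}


def adjacent_overlap(text):
    """Proportion of adjacent sentence pairs sharing a content word."""
    sets = [content_words(s) for s in split_sentences(text)]
    pairs = list(zip(sets, sets[1:]))
    if not pairs:
        return 0.0
    return sum(1 for first, second in pairs if first & second) / len(pairs)


low_cohesion = (
    "The nation of Fiji is made up of more than 300 individual islands. "
    "The almost century-long occupation by the British ended in 1970. "
    "The sugar in sugarcane is extracted with water or by diffusion."
)
high_cohesion = (
    "The nation of Fiji is made up of more than 300 individual islands. "
    "Fiji was occupied by the British for almost a century, but this "
    "occupation came to an end in 1970. One of the primary industries of "
    "Fiji is sugar processing, which requires sugar from sugarcane to be "
    "extracted with water or by diffusion."
)
print(adjacent_overlap(low_cohesion), adjacent_overlap(high_cohesion))
```

Under these assumptions the first passage has no adjacent pairs that share a content word, while in the second passage every adjacent pair is linked by the repeated word “Fiji.”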
Selection of referential cohesion index for study. While all of these components represent aspects of cohesion, referential cohesion was selected for further evaluation in this study. As previously noted, referential cohesion represents the extent to which words and ideas are related across sentences and the entire passage to create explicit connections for the reader (McNamara, Graesser, Cai, & Kulikowich, 2011). Texts that are high in referential cohesion contain words and ideas that overlap across the text, so that connections between text elements are made explicit to the reader (Hiebert & Pearson, 2010). Referential cohesion was selected for evaluation because it likely captures a meaningful component of text complexity that is omitted from typical text complexity evaluations.

First, referential cohesion likely measures something different than readability ratings. Typically, readability ratings capture decoding difficulty (how difficult the words in the text are to decode), semantic difficulty (how rare the words are in the lexicon), and syntactic difficulty (sentence structure). In theory, syntactic difficulty should be able to capture complexity across sentences; in practice, many readability formulas capture this construct by assessing the mean number of words within each sentence. Consequently, it is hypothesized that readability measures do not capture how words and ideas connect across the entire text.

Second, referential cohesion appears to play an important role in explaining variability in student reading performance. In an analysis by Graesser and colleagues (2011), referential cohesion was second only to narrativity in explaining variance in student reading performance (14.1% compared to 18.5%), and was a stronger predictor of reading performance than syntactic simplicity, word concreteness, causal cohesion, verb cohesion, logical cohesion, and temporal cohesion.
This suggests that referential cohesion plays an important role in predicting student reading proficiency. Referential cohesion has also been identified as a particularly strong predictor of reading comprehension (McNamara, Graesser, Cai, & Kulikowich, 2011).

Passage Genre

A second feature of text structure and format that may be incorporated into estimates of text complexity is passage genre. Genre is defined as a category of text characterized by similarities in form, style, or subject matter. While genre is generally described qualitatively – for example, text is characterized as either narrative or informational – this qualitative dimension may have an impact on other types of quantitative features of text complexity. For example, existing research has identified systematic differences in oral reading fluency performance (e.g., Briggs, 2012; Saenz & Fuchs, 2002) and passage-specific comprehension (e.g., Cervetti, Bravo, Hiebert, Pearson, & Jaynes, 2009; Best, Floyd, & McNamara, 2008) on narrative vs. informational text.

For the purpose of this research, genre was included as a variable of study as a means of better understanding text cohesion. Specifically, passage genre was included for two reasons: 1) to expand upon existing research on the effects of text cohesion on reading performance, and 2) to explore potential interactions between genre and cohesion. First, much of the existing research evaluating the effects of cohesion on reading fluency and/or reading comprehension has used exclusively narrative or exclusively informational text. Consequently, it is unclear whether findings apply to the other genre. Second, research that has examined cohesion across genres suggests that there may be an interaction between cohesion and genre on reading comprehension.
For example, Best and colleagues (Best, Ozuru, Floyd, & McNamara, 2006) found that high cohesion texts supported reader comprehension better than low cohesion texts for narrative texts, but that there was no difference in comprehension as a function of cohesion for the informational texts. These results suggest that there is a need to better understand how cohesion functions in both narrative and informational texts to support reader comprehension as well as reading fluency.

Purpose of This Research and Hypotheses

The purpose of this study is to evaluate referential cohesion and passage genre as features of text complexity that may enhance the utility and precision of formative assessment tools. Specifically, this design evaluated the effects of referential cohesion and genre on reading rate, accuracy, and passage-specific comprehension on passages deemed equivalent by existing means of quantifying text complexity (i.e., readability formulas). The study design allowed for examination of main effects of referential cohesion and genre as well as interaction effects. The primary hypothesis was that readers perform better – read more correct words in one minute, with a higher degree of accuracy, and with better comprehension – on passages with high referential cohesion compared to passages with low referential cohesion, when readability is held constant. Similarly, it is hypothesized that genre acts directly on oral reading fluency, accuracy, and passage-specific comprehension, with increases in all three dependent variables for narrative texts compared to informational texts. It is also hypothesized that a relation exists between the independent variables, which is the reason for the inclusion of genre in the study design. However, analysis of this hypothesis is beyond the scope of this study.

Figure 3.
A model of relations between independent variables (genre, referential cohesion) and dependent variables (oral reading fluency rate, oral reading fluency accuracy, passage-specific reading comprehension).
Note. Arrows represent hypothesized relations between independent and dependent variables.

This study design allowed for examination of direct effects between independent and dependent variables.

Direct effects on oral reading fluency rate. It was hypothesized that referential cohesion acts directly on oral reading fluency rate, because high referential cohesion increases the predictability of text, which may lead to increases in word reading speed (e.g., semantic priming). It was also hypothesized that genre acts directly on rate, as narrative texts follow predictable structures that may lead to more efficient reading.

Direct effects on oral reading fluency accuracy. As with rate, it was hypothesized that referential cohesion acts directly on oral reading fluency accuracy by increasing the predictability of text through lexical redundancies. Assuming that students read words correctly the first time, it was predicted that the repetition of previously mastered words would increase accuracy of reading. It was also hypothesized that genre acts directly on accuracy, as informational texts may contain a greater number of content-specific words.

Direct effects on passage-specific comprehension. It was hypothesized that referential cohesion acts directly on passage-specific comprehension by making explicit the relations between ideas in the text. It was hypothesized that increased explicitness leads to increased passage-specific reading comprehension. It was also hypothesized that genre acts directly on passage-specific comprehension, as narrative texts may be more predictable in structure and consequently lead to greater understanding.

Research questions.
Evaluation of these hypotheses was guided by the following research questions:

1. When readability is held constant, do students read more correct words per minute on passages with higher referential cohesion than passages with lower referential cohesion?
2. When readability is held constant, do students read passages with higher referential cohesion with greater accuracy than passages with lower referential cohesion?
3. When readability is held constant, do students perform better on a measure of passage-specific reading comprehension for passages with higher referential cohesion than passages with lower referential cohesion?

Additionally, it was hypothesized that the effects of cohesion may interact with the genre of the passage. This hypothesis included evaluation of main effects of genre and interactions between genre and referential cohesion, as described in the following questions:

4. When readability and referential cohesion are held constant, do students read more correct words per minute on narrative texts than informational?
5. When readability and referential cohesion are held constant, do students read narrative texts with greater accuracy than informational texts?
6. When readability and referential cohesion are held constant, do students perform better on a measure of passage-specific reading comprehension on narrative texts than informational?
7. If differences in oral reading performance are noted on high and low cohesion passages (questions 1, 2, and 3), do the effects depend on whether the text is narrative or informational?

CHAPTER II

LITERATURE REVIEW

What Makes Text Difficult?

Before evaluating methods of assessing the difficulty of a text, one must first understand what variables contribute to text complexity.
Researchers seem to agree that both text and reader variables not only independently contribute to the complexity of a given text, but also interact to affect text complexity (Anderson & Pearson, 1984; Common Core Standards Initiative, 2010; McKeown, Beck, Sinatra, & Loxterman, 1992; Hiebert & Fisher, 2007). In a synthesis of existing work on text complexity, Graves and Graves (2003) summarize text complexity factors as including ten features: vocabulary, sentence structure, passage length, elaboration, coherence and unity, text structure, familiarity of content and background knowledge required, audience appropriateness, quality and verve of the writing, and interestingness. Graves and Graves divide these factors into two broad categories – text-based and reader-based features.

Text-Based Features of Text Complexity

Graves and Graves (2003) identify vocabulary, sentence structure, length, elaboration, coherence and unity, and text structure as text-based features of text complexity. The remaining features are described as reader-based features of text complexity, and consequently will not be the focus of this review. These six text-based features can be grouped into three domains: semantic complexity (vocabulary, elaboration), syntactic complexity (sentence structure, length), and coherence and cohesion (coherence and unity, text structure). An additional domain, decoding difficulty, is added to this discussion, as Graves and Graves fail to capture this feature of complexity in their domains. Text-based variables are identified as features inherent to the text itself. While Graves and Graves acknowledge that no text is completely independent from the reader, these features largely describe variability captured in the text itself, detached from reader skills and knowledge.

Decoding difficulty. At its most basic level, the complexity of a passage is affected by how difficult the words in that passage are to decode.
Word recognition is a foundational reading skill (Archer, Gleason, & Vachon, 2003), and while decoding skills and deficits vary from reader to reader, there are features of the word itself that can support or hinder efficient decoding (Hiebert, 1998). For example, English has a fairly opaque or deep orthography, in which grapheme-phoneme correspondences are not always consistent (Baker, Stoolmiller, Good, & Baker, 2011; Ehri, 2005). Multiple graphemes may represent the same sound, as in hay, late, and sleigh (all are pronounced with the long ā sound, but spellings vary). Conversely, one grapheme may represent multiple phonemes, as in the letter “a” in ago, apple, and wary. Additionally, longer words may be more difficult for readers to decode (Powell-Smith, Good, & Atkins, 2010).

Semantic difficulty. While decoding difficulty refers to how difficult words in the passage are to decode, semantic difficulty refers to how difficult words are to understand. Research suggests that semantic difficulty is a strong predictor of overall passage difficulty (Graves & Graves, 2003). In general, texts containing lots of challenging words tend to be more challenging overall. However, the semantic difficulty of a passage is not just about how “easy” or “hard” individual words are; rather, it is the appropriateness of the words for the context that impacts a reader’s ability to comprehend a passage. Consequently, texts with “harder” vocabulary may be easier for a reader to comprehend if that vocabulary is necessary to convey the author’s meaning.

Syntactic difficulty. Syntactic difficulty captures the structural features of the text, both at the sentence-level and at the passage-level. At the sentence-level, syntactic difficulty refers to sentence length and complexity.
While shorter sentences are generally considered to reduce the difficulty of a text (see description of readability formulas below), short sentences that fail to resemble spoken language and lack connectives between ideas may be more difficult for readers to comprehend. There is also evidence that texts with varied sentence structure may be easier to read and comprehend than those with limited, short sentence structures (Hiebert, 1998). At the passage-level, syntactic difficulty captures the organization of the text as a whole. This includes how ideas are sequenced, the use of illustrations, headings, and the expression of relationships between ideas (Risko & Walker-Dalhouse, 2011; Beers & Nagy, 2009). Genre also contributes to passage-level text structure, as narrative and expository texts tend to be organized differently (Graves & Graves, 2003). Finally, Graves and Graves argue that overall passage length contributes to difficulty, as length may prompt expectations for the reader, and may be indicative of text structure.

Coherence and cohesion. Coherence serves as a bridge between text-based and reader-based features of text complexity, as it captures how the text supports the reader’s formation of a mental representation of the text. Coherence includes many features of text complexity described above, such as text organization, sequencing, explicitness of relationships, and language (McKeown, Beck, Sinatra, & Loxterman, 1992). Coherence also captures the extent to which the reader must make inferences in order to understand the meaning of the text. The coherence of a text cannot be measured, because coherence by definition is influenced by reader skills and background knowledge. However, the text-based features that support coherence are described as text cohesion factors and can be measured (Graesser, McNamara, Louwerse, & Cai, 2004).
Text cohesion includes the kinds of variables that influence reader understanding of the text, such as co-reference and overlap, the incidence of connective words, connectives between causes and effects, and semantic similarity of words in a passage (McNamara, Louwerse, McCarthy, & Graesser, 2010). Highly cohesive texts support reader comprehension by increasing comprehensibility, while low cohesion texts require the reader to use his/her skills and background knowledge to piece together the meaning of the text.

Reader-Based Features of Text Complexity

Graves and Graves (2003) group the remaining four variables – familiarity of content and background knowledge required, audience appropriateness, quality and verve of the writing, and interestingness – together as reader variables, because they involve the reader and the reader’s interaction with a text. In general, these features capture what the reader brings to the text: background knowledge about the nature of reading and the content of the text, age and developmental level, and interests and preferences. Because these features vary from reader to reader and can’t necessarily be captured or controlled, reader-based contributions are generally omitted from text complexity measurement approaches.

Approaches to Evaluating Text Complexity

Researchers and practitioners have developed frameworks for evaluating text-based features of text complexity, including readability formulas, the National Assessment of Educational Progress (2008) framework, and the Common Core Standards (2010) framework. These approaches evaluate text complexity in slightly different ways, but all identify means of quantifying text-based features of complexity.

Readability formulas. One approach to assessing the difficulty of a text is to focus strictly on the quantitative features of the text.
In this approach, the language elements present in a selection of text are counted and used to predict reader performance on a criterion measure of comprehension. These scores can then be used to create readability formulas, which can be applied to new texts to determine the text complexity. Readability formulas typically assess text complexity on three of the four domains described above: decoding difficulty, semantic difficulty, and syntactic difficulty (Powell-Smith, Good, & Atkins, 2010). Decoding difficulty is generally assessed by counting the number of letters or syllables in each word. Semantic difficulty is generally assessed by counting the number of low-frequency or rare words in the passage – as determined by a list of high-frequency words (e.g., the Dale list) or a corpus of text (e.g., Lexile Framework for Reading). Syntactic difficulty is generally assessed by counting sentence length – either the number of words per sentence, or the number of syllables per sentence. Together, these scores are combined to create an overall indicator of the complexity of the passage.

While readability formulas have a long history in the assessment of text complexity, they have not gone without criticism. First, readability formulas are designed to allow for predictions of student comprehension skills; however, in practice readability scores don’t relate strongly to student comprehension of the text (Hiebert, 2011). For example, passages with short sentences and frequent words would lead to “easier” designations of text complexity, but such texts may not actually be easier to comprehend (Hiebert, 2011). Beck and colleagues (Beck, McKeown, Omanson, & Pople, 1984) illustrate this argument in a study focusing on how text revisions impact student comprehension of the passage.
In this work, researchers manipulated stories from basal readers to be more coherent from both bottom-up (e.g., altering specific wording or phrasing) and top-down (e.g., re-organizing events to be more conceptually consistent) perspectives. In doing so, researchers increased both the number of words in the stories as well as the readability ratings, which increased by one grade level for each story. Participants in the Beck et al. study were then presented with either the original or the revised passages, and were directed to read the passages as they would during a basal reading lesson. After completing each passage, comprehension was assessed using a measure of passage recall and a multiple-choice comprehension test. Beck and colleagues found that students who read the revised passages, which had higher readability estimates (i.e., were less readable) than the original passages, scored higher on both the recall and multiple-choice comprehension assessments. Specifically, students who read the revised passages recalled more information that was central to the passage narrative, and answered more comprehension questions than students in the control group. These findings suggest that something other than text readability – namely, text coherence and cohesion – affects student comprehension of text. Given that comprehension is the goal of reading, these findings cast doubt on the ability of readability formulas to fully capture the text elements that contribute to text complexity.

More recently, McNamara (2001) examined the relationship between coherence and reader skill level. While this research focused on how coherence interacts with reader skills to support comprehension, which is beyond the scope of this discussion, it also sheds some light on the relationship between coherence and readability. Like Beck, McNamara manipulated science texts to be less or more cohesive.
Passages were generally low in cohesiveness, so the majority of revisions sought to increase text coherence by: replacing pronouns with nouns, adding elaborations, inserting words to connect relationships between ideas (e.g., however, because), increasing content overlap across sentences, inserting headings, adding explicit topic sentences, and rearranging sentence order. McNamara found meaningful differences in readability between the high- and low-coherence texts – while the high-coherence texts contained 900 words in 50 sentences, the low-coherence texts contained 650 words in 48 sentences. As a result, readability grade-level estimates ranged from 11.2 (high-coherence) to 9.3 (low-coherence). These differences in readability suggest that the low-coherence passages should be easier to read – in other words, to comprehend – than the high-coherence passages. However, McNamara found that low-coherence passages were only easier to read if readers had high levels of background knowledge about the topic. Without such pre-existing knowledge, readers benefitted from reading text with high levels of cohesion, even if readability was more difficult as a result.

Second, there is evidence to suggest that the variables included in readability formulas may contribute to reading performance differently based on the type of text. Research by Cohen and Steinberg (1983) has examined the semantic difficulty indicator used in readability formulas within the context of science textbooks. Traditionally, readability formulas have used word lists like the Dale List of 3000 Familiar Words (Dale & Chall, 1948) to identify rare or unfamiliar words. However, elementary science textbooks tend to use words that do not appear on such word lists but appear repeatedly and are defined within the context of the text, suggesting that these words are not truly unfamiliar.
Consequently, readability formulas using this approach to capture semantic difficulty may overestimate the text complexity of science textbooks. Cohen and Steinberg evaluated this argument by analyzing the types of unfamiliar words present in elementary science textbooks. Using three commercially available elementary science textbooks, the researchers categorized unfamiliar words (which, according to the Dale List, made up almost 15% of evaluated words) into three categories – technical (words that were the subject of the text or were defined in the text), technical support (words that are not as recognizable as technical words but are commonly used in science), and non-technical (words that are not common in science or central to the content of the text). Cohen and Steinberg found that the majority of unfamiliar words were technical words, and that the inclusion of these technical words in the percent of rare/unfamiliar words count in many readability formulas inflated readability estimates for science texts.

Similarly, readability estimates may fail to capture the unique contributors to text complexity that occur in other specialized texts, like poetry and early reading texts. According to Foorman (2009), meaning in poetry texts is often tied to language and text structure, rather than word frequency or sentence length – two features central to readability estimates. Foorman argues that poetry may include vocabulary that would be considered unfamiliar based on word lists or banks, but readers can extract meaning from the text by relying on text structure. Thus, the difficulty of such a passage may be inaccurately captured by readability estimates, which fail to assess text beyond surface-level characteristics.
In contrast, early reading texts tend to contain a number of high-frequency words, which would correspond with lower readability estimates; however, evidence suggests that the majority of words included in early reading texts only occur once, and consequently fail to provide enough exposures for such high-frequency words to be integrated into student sight word vocabularies (Foorman, Francis, Davidson, Harm, & Griffin, 2004, as cited in Foorman, 2009). As a result, even texts with a low percentage of rare/unfamiliar words can be challenging for early readers.

Finally, some researchers have questioned the validity of a single score in capturing the complexity of a text, particularly longer texts. For example, the Lexile Map rates the narrative text Pride and Prejudice as a 1100 Lexile, corresponding with approximately an 8th-12th grade level. However, individual chapters of the text show great variability in readability estimates, from 670 (3rd grade) to 1310 (college) (Hiebert, 2011). This suggests that a single readability estimate cannot capture text complexity across a variable text. Additionally, the use of a single measure of text complexity limits the treatment utility of using readability estimates to select appropriate texts (Graesser, McNamara, & Kulikowich, 2011). While a placement system like the Lexile Framework for Reading may place two readers at the same level, their individual needs may not be met by the same text.

2009 NAEP reading framework. The National Assessment of Educational Progress (NAEP), or “Nation’s Report Card,” is an ongoing effort to collect data on national student achievement in academic subject areas such as reading and mathematics (National Assessment Governing Board, 2008). The NAEP is administered to a demographically representative sample of students in grades 4, 8, and 12, and can be used to assess student achievement at the national and state levels as a whole and for targeted subgroups.
The most recent NAEP reading assessment was administered in 2009 in accordance with the 2009 NAEP Reading Framework. A central assumption of the NAEP is that text increases in complexity from grade 4 to grade 12. Consequently, an evaluation of passage complexity is critical to the selection of appropriate testing materials. According to the Framework, selected texts must be “of the highest quality, evidencing characteristics of good writing, coherence, and appropriateness for each grade level” (p. 27) and must become “successively more complex” (p. 16) at each grade level. In general, the complexity of potential passages is evaluated by considering the following variables: passage length, quality of writing, interestingness, writing style, text organization, sentence structure, vocabulary, supplementary materials (e.g., definitions of technical terms), and elaboration. Specific text structures and features are presented for each grade level and each type of text included in the assessment (fiction, literary nonfiction, poetry, exposition, argumentation/persuasive text, and procedural text/documents). Evaluation of the identified contributors to text complexity is based primarily on expert judgment, but must also include story and concept mapping and at least two research-based readability formulas.

Common Core Standards framework. Text complexity is a central component in the Common Core Standards (2010), an initiative towards universal standards in English/Language Arts, History/Social Studies, Science, Math, and Technical Subjects. These standards are based on existing educational research, and are designed to support schools in targeting the skills students need for college and workplace success. Embedded within the Common Core Standards is the expectation that students read and comprehend text of increasing complexity as they progress through their schooling.
In order to assess the complexity of a given text, the Common Core Standards describe a three-fold evaluation approach. Within this model, each individual evaluation contributes to the overall evaluation of the text.

First, the Common Core Standards recommend a quantitative evaluation of text complexity. The Common Core Standards do not endorse any particular method of quantitative analysis; rather, they suggest a thoughtful review of existing tools to best match the measurement tool with the purpose. Some of the quantitative tools suggested for review include traditional readability formulas, newer readability methodologies like the Lexile Framework, and the Coh-Metrix system for assessing text cohesion. The Common Core Standards caution users that many quantitative measures may underestimate complex text (e.g., text with complex ideas, multiple meanings, etc.), and consequently it is important to remember that quantitative analysis is just one component of a thorough evaluation of text complexity.

Second, the Common Core Standards recommend a qualitative evaluation of the text. This evaluation includes analysis of four factors: levels of meaning, text structure, language conventionality and clarity, and knowledge demands (see Table 1 for additional information about each factor). These factors are not intended to be captured quantitatively; rather, the Common Core Standards recommend using evaluator judgment and expertise to determine the contributions of each of these factors to the overall difficulty of the passage. The Common Core Standards stress that quantitative measures alone do not capture all elements of text complexity, and this qualitative evaluation is a necessary supplement to quantitative analysis.

Table 1. Qualitative Dimensions of Text Complexity Included in the Common Core Standards Framework.
Dimension | Less Complex | More Complex
Levels of meaning or purpose | Single layer of meaning | Multiple levels of meaning
Levels of meaning or purpose | Explicitly stated purpose | Implicit purpose, may be hidden or obscure
Structure | Simple | Complex
Structure | Explicit | Implicit
Structure | Conventional | Unconventional
Structure | Events related in chronological order | Events related out of chronological order
Structure | Traits of a common genre or subgenre | Traits specific to a particular discipline
Structure | Simple graphics | Sophisticated graphics
Structure | Graphics unnecessary or merely supplementary to understanding the text | Graphics essential to understanding the text and may provide information not otherwise conveyed in the text
Language Conventionality and Clarity | Literal | Figurative or ironic
Language Conventionality and Clarity | Clear | Ambiguous or purposefully misleading
Language Conventionality and Clarity | Contemporary, familiar | Archaic or otherwise unfamiliar
Language Conventionality and Clarity | Conversational | General academic and domain-specific
Knowledge Demands: Life Experiences | Simple theme | Complex or sophisticated themes
Knowledge Demands: Life Experiences | Single theme | Multiple themes
Knowledge Demands: Life Experiences | Common, everyday experiences or clearly fantastical situations | Experiences distinctly different from one’s own
Knowledge Demands: Life Experiences | Single perspective | Multiple perspectives
Knowledge Demands: Life Experiences | Perspective(s) like one’s own | Perspective(s) unlike or in opposition to one’s own
Knowledge Demands: Cultural/Literary Knowledge | Everyday knowledge and familiarity with genre conventions required | Cultural and literary knowledge useful
Knowledge Demands: Cultural/Literary Knowledge | Low intertextuality (few if any references/allusions to other texts) | High intertextuality (many references/allusions to other texts)
Knowledge Demands: Content/Discipline Knowledge | Everyday knowledge and familiarity with genre conventions required | Extensive, perhaps specialized discipline-specific content knowledge required
Knowledge Demands: Content/Discipline Knowledge | Low intertextuality (few if any references to/citations of other texts) | High intertextuality (many references to/citations of other texts)

Third, the Common Core Standards propose an evaluation of reader and task considerations.
While the previous two evaluations focus on text-based variability in text complexity, this evaluation shifts focus to reader variables such as cognitive skills, motivation, knowledge, and experiences (RAND Reading Study Group, 2002). Additionally, this evaluation should include a review of the complexity of the academic task assigned, as the academic expectation (e.g., skimming vs. studying) may impact how challenging the text is for that particular purpose. For example, a science text with organizing headings and highlighted vocabulary may be easier for the purpose of skimming for key points and more challenging for the purpose of identifying specific information.

These three components are then combined to assign a grade band to the text. Unlike readability indices, which assign a quantitative readability score to the passage, the Standards provide recommended placement in one of the following grade bands: 2-3, 4-5, 6-8, 9-10, and 11-college/career readiness level. Because these bands span multiple grades, it is expected that students in the lower range require scaffolding and support to comprehend the text, while students at the upper range should be able to read and comprehend the text independently.

While the Common Core Standards specify that all three components are equally important in an evaluation of a text’s complexity, each of the three methods should not be given equal weight for every text. Professional judgment is required to determine how appropriate each assessment is for the selected text. For example, the authors argue that a quantitative tool such as a readability formula may provide valuable information in evaluating a dramatic text, but may fail to capture the difficulty of a poem. A thoughtful evaluation of text complexity should include consideration of how to weigh each component based on the individual text.
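As a rough illustration of the quantitative approaches reviewed in this section, the surface counts that readability formulas rely on (word length, unfamiliar words, sentence length) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions and does not reproduce any published readability formula, the Lexile Framework, or Coh-Metrix. The function names, the small stop-word set, the stand-in familiar-word list, and the two sample passages are all hypothetical and were invented for this illustration. The adjacent-sentence overlap score is only a crude proxy for the kind of cross-sentence information that these surface counts leave out.

```python
import re

# Hypothetical stop-word set; a real analysis would use a fuller list.
STOPWORDS = {"the", "a", "an", "and", "it", "he", "she", "was", "is",
             "in", "on", "to", "of", "there"}

def split_sentences(text):
    """Split on ., !, or ? followed by whitespace (a deliberately crude rule)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Lowercase a sentence and return its word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())

def surface_features(text, familiar_words):
    """Counts of the kind readability formulas combine (assumed, simplified)."""
    sentences = split_sentences(text)
    words = [w for s in sentences for w in tokenize(s)]
    unfamiliar = sum(1 for w in words if w not in familiar_words)
    return {
        "words_per_sentence": len(words) / len(sentences),         # syntactic proxy
        "letters_per_word": sum(len(w) for w in words) / len(words),  # decoding proxy
        "percent_unfamiliar": 100 * unfamiliar / len(words),       # semantic proxy
    }

def adjacent_overlap(text):
    """Share of adjacent sentence pairs that repeat at least one content word."""
    content = [set(tokenize(s)) - STOPWORDS for s in split_sentences(text)]
    pairs = list(zip(content, content[1:]))
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if a & b) / len(pairs)

# Hypothetical passages: same events, with nouns repeated or replaced by pronouns.
HIGH = "The fox found a den. The den was warm. The fox slept in the den."
LOW = "The fox found a den. It was warm. He slept there."
# Hypothetical stand-in for a familiar-word list such as the Dale list.
FAMILIAR = STOPWORDS | {"fox", "found", "warm", "slept"}

if __name__ == "__main__":
    for name, passage in (("high", HIGH), ("low", LOW)):
        print(name, surface_features(passage, FAMILIAR), adjacent_overlap(passage))
```

In this toy case the pronoun-heavy passage has shorter sentences and a lower share of unfamiliar words, so surface counts alone would rate it as easier, while its adjacent-sentence overlap is lower than that of the repeated-noun passage.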
Text Cohesion: A Potential Contribution to the Evaluation of Text Complexity

Text cohesion captures the extent to which a text hangs together as a coherent whole (Morris & Hirst, 1991). Cohesion is different from the construct of coherence, which describes the mental picture a reader constructs during reading based on both the text and background knowledge. While coherence addresses the interaction between the text and the reader in constructing meaning, text cohesion focuses exclusively on the supportiveness of the text in facilitating comprehension. Consequently, the cohesive features of a text can be evaluated, and may support our understanding of the complexity of the text structure.

Effects of cohesion as a whole. Support for the role of cohesion in oral reading performance is provided by comparisons of student performance on tasks of word-list reading fluency and passage reading fluency. This allows researchers to evaluate the effect of text cohesion – inherent in the passage – versus the absence of cohesion – inherent in the word lists. If oral reading fluency is strictly a product of efficient decoding skills and cohesiveness of text plays no role, performance should be similar on a passage or the same words in a list.

Jenkins and colleagues (Fuchs, Fuchs, Hosp, & Jenkins, 2001; Jenkins, Fuchs, van den Broek, Espin, & Deno, 2003a; Jenkins, Fuchs, van den Broek, Espin, & Deno, 2003b) contrasted word-list and passage reading performance for students across a range of reading skills. Students were administered two brief, fluency-based measures – one of word-list reading skill and another of passage reading skill – and a group-administered test of reading comprehension. Mean fluency score for passage reading was significantly higher than the mean fluency score for word-list reading.
Furthermore, regression analyses indicated that passage fluency uniquely explained 42% of the variance in reading comprehension scores, while word-list fluency uniquely explained only 1%. These findings suggest that the cohesiveness inherent in the passage may contribute to comprehensibility. Researchers performed additional analysis using passage fluency as the outcome variable and reading comprehension and word-list fluency as predictors. They found that word-list fluency uniquely explained 11% of the variance in passage reading while reading comprehension uniquely explained an additional 28% above and beyond word-list fluency. This may be explained by the fact that word lists by nature lack cohesion – words are unrelated and unconnected. Passages, on the other hand, contain more cohesion, which may be contributing to student ability to read connected text with appropriate rate and accuracy. These results indicate that cohesive text both 1) facilitated oral reading fluency, and 2) increased the relation between fluency and comprehension.

More recent work using a sample of students receiving both English- and Spanish-language instruction evaluated the contribution of comprehension in explaining passage fluency within and across languages (Baker, Stoolmiller, Good, & Baker, 2011). In this work, participants were assessed using measures of word reading fluency, passage reading fluency, and comprehension in order to evaluate relations between skills. In addition to evaluating performance on all measures across languages, researchers were interested in examining the effects of comprehension on passage reading fluency performance, after controlling for word reading skills. Results suggest that passage meaning and context contribute to oral reading fluency.
First, researchers found that scores on the measures of word reading fluency were significantly lower than scores on passage reading fluency in both languages, indicating that a cohesive passage context contributes to oral reading rate. Second, correlations between passage reading fluency and comprehension scores were significantly higher than correlations between word reading fluency and comprehension in both languages. These findings indicate that factors like cohesion both increase the comprehensibility of text and are likely to increase oral reading fluency.

Effects of referential cohesion. Evidence suggests that, in addition to global cohesion, specific cohesive elements contribute to text complexity. Specifically, measures of referential cohesion have been linked to differences in reading performance. In one study, researchers quantified features of text cohesion to create two cohesion indices: referential overlap (referential cohesion) and vocabulary accessibility (Duran, Bellissens, Taylor, & McNamara, 2007). The referential overlap score captured the degree to which a text displayed conceptual redundancy, or relatedness between sentences. Vocabulary accessibility went beyond typical measures of word frequency to capture word familiarity, ambiguity, and abstractness. These indices were selected because they were hypothesized by the authors to be key features of text complexity. While both scores were significantly correlated with a measure of readability, the correlations were low to moderate (.32 to .54), suggesting that the cohesion indices were measuring a construct similar but not identical to that measured by the readability estimates. Scores on these cohesion indices were then used to group four texts on two topics as easy or difficult. Participants read all four passages, and measures of reading rate and passage retell were obtained. Results indicate significant differences in both reading times and retells for easy vs.
difficult texts, suggesting that these measures of text cohesion are capable of distinguishing between high and low complexity texts.

Additionally, work by Posner and Snyder (1975) supports a relation between comprehension and fluency by asserting that the context of a word – which refers to the relations between words in the text, a key feature of text cohesion – facilitates increased word recognition through the activation of semantic networks. While Posner and Snyder did not describe it as such, this facilitation of word reading by the context can be described as a type of exophora reference, because situational knowledge activates networks of similar words (e.g., as in Latent Semantic Analysis). According to Posner and Snyder, each word processed by the reader activates a network of semantically related words, and thus speeds recognition of any subsequent stimuli that fall within the network. As the reader continues to read, the conscious expectancy process inhibits the retrieval of unexpected words. In theory, these processes should support more efficient reading of words that carry similar meaning as opposed to those read out of context or meaning. Referential cohesion is tied to this process because cohesion captures linkages between words through endophora and exophora reference. When texts are highly cohesive, readers can anticipate what is coming because the entire text is constructed as a unified whole, while texts that lack cohesion may disrupt the expectancy process by lacking clear relations between words and ideas.

Stanovich and West (1981) found support for the Posner-Snyder expectancy theory in their evaluation of sentence context on word recognition. Sentence context is one component of cohesion, as context facilitates relations between words and ideas. In this work, Stanovich and West manipulated the decoding difficulty of words in sentences and measured reader reaction times.
Results demonstrated an interaction between word difficulty and cohesiveness of text – the more difficult the words, the greater the effect of text cohesiveness on reaction times. When context sentences were more cohesive, readers read faster, while context sentences with less cohesion increased reading times. Stanovich and West argue that reading speed increased in cohesive contexts because semantic activation occurs while readers decode difficult words. Consequently, the difficult words act to prime the reader to remaining words in the sentence, a process which is only effective if words in the text are related to the cohesive whole.

In addition to these studies on the effects of referential cohesion at a global level, a number of studies explore the role of multiple referential cohesion devices in comprehension. In these types of studies passages are re-written to improve cohesion as well as other features of text complexity (such as syntactic structure), and student performance is compared on original and revised texts. For example, one study by McNamara and colleagues (1996) evaluated the effects of cohesion on reading performance by making the following revisions: replacing ambiguous pronouns with nouns or noun phrases (a component of endophora reference, a referential cohesion device), connecting unfamiliar concepts to familiar ones through elaboration, adding connectives between sentences, increasing argument overlap (a type of endophora reference), and manipulating the syntactic structure of the text by adding topic headers and topic sentences. While this type of research makes it challenging to isolate the specific effects of any one element of cohesion or cohesive device, these studies do support the inclusion of these elements and devices in the integrated model of cohesion (see Table 2 for a summary of selected research on multiple cohesive devices).
Studies have identified many of the cohesive devices included in the integrated model to be related to improvements in reading comprehension when studied in combination with other devices. Consequently, these elements of cohesion and cohesive devices have been identified as meaningful components of text cohesion in the integrated model of cohesion.

Table 2. Results of Selected Studies Evaluating the Effects of Revisions to Improve Referential Cohesion on Reading Comprehension Performance.

Study | Devices evaluated | Findings
Ozuru, Briner, Best, & McNamara, 2010 | Consistency, endophora reference (anaphor reference, argument overlap) | Text revisions were related to higher quality responses when asked to self-explain the text. However, performance on open-ended comprehension questions was higher for original (low cohesion) texts.
Ozuru, Dempsey, & McNamara, 2009 | Consistency, endophora reference (anaphor reference), exophora reference (content word overlap), semantic and syntactic structures | Revised texts were associated with improved comprehension on passage-specific open-ended comprehension questions. Interactions were found between reader skill level and cohesion on comprehension.
McNamara, 2001 | Consistency, endophora reference (anaphor reference), exophora reference (content word overlap), semantic and syntactic structures | Revised texts were associated with improvements in passage-specific comprehension questions for students with low background knowledge.
Vidal-Abarca, Martinez, & Gilabert, 2000 | Causal consistency, endophora reference (argument overlap) | Revisions to argument overlap alone did not improve comprehension as measured by inference questioning and recall. However, revisions to both devices resulted in larger improvements in comprehension than revision to causal connectives alone.
McNamara et al., 1996 | Consistency, endophora reference (anaphor reference, argument overlap), semantic and syntactic structures | Revised texts were associated with improvements in comprehension as measured by a recall task, open-ended questions, and a card sorting task. Interactions were noted between background knowledge and text cohesion.
Britton & Gulgoz, 1991 | Endophora reference (anaphor reference, argument overlap), syntactic structure | Revised texts were associated with improvements in comprehension as measured by a recall task, multiple choice questions, and a keyword association task.
Beck, McKeown, Omanson, & Pople, 1984 | Background knowledge, conjunctions, content problems, endophora reference (anaphor reference), semantic and syntactic structures | Revised texts were associated with improvements in comprehension as measured by a recall task.

Note: Devices that are highlighted in bold are identified as components of referential cohesion.

Cohesion and Readability: Related but Distinct Constructs

Research suggests that readability formulas and measures of text cohesion do not evaluate text in the same way. In a recent study, Hiebert (2011) used the Lexile Framework Lexile score and component scores (sentence length and word frequency) and a measure of referential cohesion derived from the Coh-Metrix framework (Graesser, McNamara, Louwerse, & Cai, 2004) to evaluate exemplar texts as identified by the Common Core Standards (2010). Her findings indicate that rank orderings of text complexity differ fairly dramatically depending on the metric used. For example, a text identified as the “easiest” or least complex by the overall Lexile score was ranked the “hardest” or most complex text by the referential cohesion score. Additionally, correlations between referential cohesion and the Lexile measures were not statistically significant, implying that referential cohesion is capturing something different from the Lexile readability measures. Consequently, there is a need to further evaluate the contributions of text cohesion to text complexity.

Interactions Between Cohesion and Genre

While cohesion is the primary variable of interest in this study, genre was also selected as a variable for this research. Genre was included in the design because there is evidence to suggest that cohesion may impact student reading performance differently based on the genre of the passage. For example, Best and colleagues (Best, Ozuru, Floyd, & McNamara, 2006) had students read two narrative and two expository texts selected from school textbooks. All passages were re-written to include both a high cohesion and a low cohesion version (which included manipulations of referential cohesion); students read one high cohesion text within each genre and one low cohesion text within each genre. Comprehension was then measured using a multiple choice question format. Results indicated a main effect for genre, with students earning higher comprehension scores on narrative texts than on expository texts. Researchers also found a main effect for cohesion, with students demonstrating greater comprehension on high cohesion passages than low cohesion passages. Finally, researchers found a significant interaction between genre and cohesion. Students demonstrated greater comprehension on high cohesion narrative passages than low cohesion narrative passages, but did not perform differently on high cohesion versus low cohesion expository texts. These results suggest that cohesion supports reader comprehension for narrative texts, but may be less important for expository texts. Further study is necessary to better understand the relationship between genre and cohesion on comprehension as well as oral reading fluency, and implications for formative assessment.
Quantifying Text Cohesion Using Coh-Metrix

Coh-Metrix was developed to assess text beyond the two to three components typically included in readability analysis; Coh-Metrix provides quantitative information on 54 domains of text cohesion and readability, including lexicons, syntax, and latent semantic analysis (LSA) (McNamara, Louwerse, McCarthy, & Graesser, 2010). These variables are categorized into five broad indices: 1) readability, 2) general word and text information (characteristics of words in the text, such as frequency of usage), 3) syntax (syntactic complexity, syntactic composition, and frequency of the syntactic classes in text), 4) referential and semantic indices (relationships between words in the text), and 5) situation model dimensions (aspects of the text that contribute to a reader’s mental model). These indices are designed to analyze text on multiple levels of language and discourse, consistent with multilevel theoretical frameworks of text comprehension (Graesser, McNamara, & Kulikowich, 2011).

Research on the Coh-Metrix tool suggests that the program is capable of differentiating between texts with high cohesion and those with low cohesion, and captures something different from text readability. In one study, the Coh-Metrix authors manipulated natural texts (i.e., texts culled from existing literature such as textbooks and encyclopedias) to create two versions of each passage: one that was highly cohesive, and another that lacked text cohesion (McNamara, Louwerse, McCarthy, & Graesser, 2010). As many features of text cohesion are available in the Coh-Metrix program, the researchers selected four indices – LSA, co-reference (referential cohesion), connectives, and ratio of incidence of causal connectives to change-of-state verbs.
Results indicate that readability formulas failed to differentiate between high and low cohesion texts while the Coh-Metrix tool successfully differentiated between high cohesion and low cohesion texts on all of the selected indices. These findings support the validity of Coh-Metrix in assessing cohesion and the sensitivity of the tool to discriminate between texts.

Other scholars have suggested that Coh-Metrix is capable of differentiating between texts that other measures of text difficulty might deem equivalent. In one such argument, Elfenbein (2011) explored previous research in which passage equivalency was key to parsing out specific text complexity effects. Elfenbein described the work of McKoon and Ratcliff (1992), in which three versions of a passage were developed to be equivalent in passage difficulty but variable in the level of inference required of readers. Central to the design is the equivalency of the three versions of the passage, as they provide control for the hypothesis that it is the manipulation of level of inference that impacts reading performance. However, Elfenbein entered each version into the Coh-Metrix tool, and found a number of linguistic differences between the passages. For example, passages varied on the incidence of connectors and proportion of overlapping content words. These results indicate that the Coh-Metrix program may be more capable of capturing distinctions between texts than other means of complexity evaluation.

Finally, research on Coh-Metrix suggests that the detailed information provided by the program produces more accurate estimates of text difficulty than surface-level readability characteristics. In work by Crossley and colleagues (Crossley, Greenfield, & McNamara, 2008), researchers compared the validity of three complexity indices derived from Coh-Metrix in predicting reading difficulty for English Language Learners (ELLs).
Coh-Metrix variables were selected to provide information on three domains: lexical (word frequency), syntactic (syntactic structure similarity across adjacent sentences), and meaning construction (content word overlap). The lexical and syntactic indices captured much of the same information as readability formulas, while the meaning construction domain went deeper to capture a key component of referential cohesion. When using performance on a cloze task as a criterion, researchers found that all three predictors accounted for 86% of the variance in cloze performance for the ELL sample. This is an increase over previous work done by the authors, in which surface-level indices accounted for 72% of the variance explained. These findings suggest that the inclusion of a measure of referential cohesion may allow us to make better predictions about text complexity and student performance.

Summary

The body of evidence suggests that text cohesion contributes to reader performance on measures of fluency and comprehension, and consequently may be an important component of text complexity. Research indicates that readers read more quickly and comprehend better when sentences and passages are more cohesive, and when words are provided in a cohesive context that imparts a meaning or purpose for reading. In short, cohesion matters. Therefore, methods of assessing the complexity of text that focus solely on the decodability, semantic, and syntactic features of the text and not how the words form a coherent whole may not capture the potential impact that cohesion has on reading proficiency. Measures that capture the contributions of cohesion may provide an important improvement to the assessment of text complexity.
In sum, it is important to be able to understand, quantify, and control text complexity for the purposes of: building student skills in reading and understanding increasingly complex text, preparing students for the reading demands of college and the working world, improving summative assessment practices, and reducing variability in formative assessment tools to facilitate better decision making. Additionally, the Coh-Metrix tool may provide a means to evaluate text beyond those features captured in readability formulas. Coh-Metrix allows for a quantitative evaluation of features of text cohesion, and can support researcher understanding of the potential contributions of cohesion to text complexity. Consequently, Coh-Metrix can be used as a tool in evaluating the referential cohesion of a passage, and the effects of referential cohesion on reading performance.

CHAPTER III

METHODOLOGY

As outlined in Chapter 1, it was hypothesized that referential cohesion and passage genre have direct and indirect effects on student oral reading fluency rate, accuracy, and passage-specific comprehension. In order to evaluate these hypotheses, this study included two qualitative independent variables, each with two levels – referential cohesion (high/low) and genre (narrative/informational). The study evaluated the effects of these independent variables on three dependent variables – oral reading fluency rate, oral reading fluency accuracy, and passage-specific reading comprehension. Participating students read four passages that were strategically selected to manipulate referential cohesion and genre while tightly controlling readability. Selected passages represented the following conditions: 1) informational text/low cohesion, 2) informational text/high cohesion, 3) narrative text/low cohesion, and 4) narrative text/high cohesion.
The study design allowed for evaluation of direct effects of referential cohesion and genre on rate, accuracy, and passage-specific comprehension, as well as interaction effects between referential cohesion and passage genre on the dependent variables.

Independent Variables

Two independent variables were manipulated in this study: genre and referential cohesion.

Genre. Passage genre was identified as either narrative or informational. Genre for all passages was determined by the authors of the selected measure and verified by expert ratings. Genre is defined as a dichotomous qualitative variable, with two levels: narrative and informational. Narrative texts are defined as writing that conveys experience, either real or imaginary, and uses time as its deep structure. It can be used for many purposes, such as to inform, instruct, persuade, or entertain (Common Core Standards Initiative Appendix A, 2010, p. 23). Informational texts are texts that convey information accurately. This kind of writing serves one or more closely related purposes: to increase readers’ knowledge of a subject, to help readers better understand a procedure or process, or to provide readers with an enhanced comprehension of a concept. Informational/explanatory writing addresses matters such as types…and components…; size, function, or behavior…; how things work…; and why things happen (Common Core Standards Initiative Appendix A, 2010, p. 23).

Author judgments of genre were evaluated by a panel of graduate student expert reviewers using these definitions. All reviewers have received graduate-level training in school psychology, and have studied early literacy intervention and assessment. Reviewers were provided with selected passages in a random order and the Common Core Standards Initiative (2010) definitions of narrative and informational text, and asked to label passages as narrative or informational. Reviewers were in 100% agreement with each other and passage authors on genre assignments.
See Table 3 for genre definitions and reviewer and author ratings.

Table 3. Expert Reviewer and Passage Author Judgments of Passage Genre.

Passage | Reader 1 | Reader 2 | Reader 3 | Author Judgment
1 | Informational | Informational | Informational | Informational
2 | Narrative | Narrative | Narrative | Narrative
3 | Narrative | Narrative | Narrative | Narrative
4 | Informational | Informational | Informational | Informational

Note: Definitions of narrative and informational text were provided from the Common Core Standards Initiative (2010). Narrative texts are defined as texts that convey experience, either real or imaginary, and use time as the structure. They can be used for many purposes, such as to inform, instruct, persuade, or entertain. Informational texts are texts that convey information accurately. These kinds of text serve one or more closely related purposes: to increase readers' knowledge of a subject, to help readers better understand a procedure or a process, or to provide readers with an enhanced comprehension of a concept.

Referential cohesion. A researcher-developed referential cohesion composite score was created for this study. The researcher-developed referential cohesion composite score (RCCS) was meant to capture variables that are conceptually related to the construct of referential cohesion as described in the integrated model of cohesion. In order to determine which variables to include in the RCCS, the primary researcher performed a qualitative evaluation of all Coh-Metrix variables to identify variables that can be linked to the referential cohesion devices outlined in the integrated cohesion model. This evaluation was based on existing work using Coh-Metrix to measure referential cohesion (see Hiebert, 2011; Graesser, McNamara, & Kulikowich, 2011) and correspondence to the integrated model of cohesion. A total of five variables were identified.
These five variables were determined to be explicitly related to the integrated model of cohesion: adjacent anaphor overlap, adjacent argument overlap, content word overlap, stem overlap, and latent semantic analysis (sentence all). Figure 4 outlines the relations between these variables and referential cohesion, as conceptualized in the integrated model of cohesion. Each of these variables is described in detail below and summarized in Table 4.

[Figure 4. Measurement model of referential cohesion. Diagram labels: cohesive element; cohesive devices (endophora reference, exophora reference); measurements (adjacent anaphor overlap, adjacent argument overlap, stem overlap, content word overlap, latent semantic analysis).]

Adjacent anaphor overlap. The Coh-Metrix adjacent anaphor overlap variable captures the proportion of anaphor (pronouns that refer to previous nouns) references between adjacent sentences. For example, in the sentences “Jasmine stayed up all night studying for a physics exam. In the morning, she was exhausted,” the anaphor “she” in the second sentence refers to the referent “Jasmine.” Adjacent anaphor overlap is a feature of endophora reference because anaphors are co-referent within the text, and do not require the reader to reference information outside of the text.

Adjacent argument overlap. The Coh-Metrix adjacent argument overlap variable captures the proportion of adjacent sentences that share arguments. An argument refers to a noun, pronoun, or noun phrase. Consider the following sentences: “Jimmy’s family went out for ice cream. Jimmy chose chocolate, because it is his favorite flavor.” In both of these sentences, the noun “Jimmy” is used, linking the sentences through a shared reference. Adjacent argument overlap is a component of endophora reference, because it maintains a continuity of reference within the text itself.

Content word overlap.
The Coh-Metrix content word overlap variable captures the proportion of content words that overlap between adjacent sentences. For example, consider these sentences: “The American Civil War was initiated by the secession of several states from the Union. A total of eleven states declared their secession and formed the Confederate States of America.” In these sentences, “secession” is a key content word and is represented in both sentences, linking the sentences through content word overlap. Unlike LSA, in which conceptually similar words are activated through shared semantic networks, content word overlap is a feature of endophora reference because it creates explicit linkages within the text. In other words, content word overlap does not require the reader to infer relations between words based on conceptual similarity; instead, these relations are made explicit through continuity of reference within the text.

Stem overlap. The Coh-Metrix stem overlap variable captures the proportion of all sentence pairs in a paragraph that share one or more word stems. In this context, a stem refers to a core morphological element. For example, the words “electricity” and “electrical” share a common morphological element – the word part “electric,” which informs the reader that both words refer to flow of electrical charges. The overlap of the word part “electric” informs the reader that both sentences are referring to the same thing, idea, or concept – they are co-referent. Stem overlap is a feature of endophora reference because co-reference is contained within the text through the repetition of shared morphological elements.

Table 4. Coh-Metrix Variables Included in the Researcher-Developed Referential Cohesion Composite Score (RCCS).

Variable | Description | Example | Discussion
Adjacent argument overlap | Proportion of adjacent sentences that share common arguments (nouns, pronouns, or noun phrases) | Cell division occurs to reproduce and replace cells. The division of cells with a membrane-bound nucleus and organelles (eucaryotic cells) involves two distinct but overlapping stages, mitosis and cytokinesis. | The word “cells” overlaps between two adjacent sentences.
LSA sentence all | Conceptual similarity of word meanings across all sentences | The field was full of lush, green grass. The horses grazed peacefully. The young children played with kites. The women occasionally looked up, but only occasionally. A warm summer breeze blew and everyone, for once, was almost happy. | The words in the text tend to be thematically related to a pleasant day in an idyllic park scene: green, grass, children, playing, summer, breeze, kites, and happy.
Content word overlap | Proportion of content words that overlap between adjacent sentences | One stage of cell division is mitosis. Mitosis occurs to replicate the cell's genetic material in the nucleus. | The words “cell” and “mitosis” are content-specific words that recur across sentences.
Stem overlap | Proportion of all sentence pairs in a paragraph that share one or more word stems (core morphological element) | The division of cells with a membrane-bound nucleus and organelles (eucaryotic cells) involves two distinct but overlapping stages, mitosis and cytokinesis. Mitosis occurs to replicate the cell's genetic material in the nucleus, whereas cytokinesis occurs to divide the gel-like liquid surrounding the cell's nucleus, called cytoplasm. | The word “division” has a stem overlap with “divide.”
Adjacent anaphor overlap | Proportion of anaphor (pronouns that refer to previous nouns) references between adjacent sentences | There are four distinct phases of mitosis called prophase, metaphase, anaphase, and telophase. These four phases are well known to researchers who can easily observe them with, for example, the simple light microscope. | The pronoun “them” refers to “phases” in the previous sentence.

Source: McNamara, D.S., Louwerse, M.M., Cai, Z., & Graesser, A. (2005).

Latent semantic analysis (sentence all).
The Coh-Metrix latent semantic analysis (sentence all) variable captures the conceptual similarity of word meanings across all sentences using a procedure called latent semantic analysis (LSA). Latent semantic analysis is a computer-based method of capturing the semantic similarity of words in the text based on frequent word co-occurrence (Magliano & Millis, 2003). For example, words like “solar system” and “planets” are more likely to co-occur than words like “solar system” and “blueberry”; LSA captures the similarity of words through statistical analysis and provides an overall score between 0 and 1. While latent semantic analysis evaluates text on a semantic level, it can be considered a component of referential cohesion and not lexical accessibility and diversity because it captures the extent to which words in a text relate to one another and activate similar semantic networks, a feature of exophora reference.

Prior work has established a precedent for evaluating referential cohesion using a researcher-constructed variable. For example, recent work by Hiebert (2011) used individual Coh-Metrix variables to create a referential cohesion composite. In this work, Hiebert created a referential composite score using argument overlap and stem overlap variables. These variables capture components of the endophora reference domain of referential cohesion, but they do not represent all of the available information that may contribute to referential cohesion – namely, other aspects of endophora reference (such as anaphor reference) and exophora reference. As a result, the RCCS was created for this study to capture a broader range of cohesive devices that contribute to referential cohesion.

Constructing the RCCS. Before constructing the RCCS the researcher evaluated inter-correlations between the five variables to be included in the composite, as variables that capture a common construct should, in theory, be inter-correlated.
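As a rough illustration of this kind of check, the sketch below computes pairwise Pearson correlations among passage-level index scores. It is only a sketch under stated assumptions: the scores, the variable names, and the helper functions `pearson` and `correlation_table` are hypothetical and are not taken from the study or from Coh-Metrix, and no significance testing is shown.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_table(data):
    """Correlate every pair of variables; data maps name -> list of scores."""
    return {(a, b): pearson(data[a], data[b]) for a, b in combinations(data, 2)}


# Hypothetical index scores for five passages (NOT the study's data).
scores = {
    "argument_overlap": [0.40, 0.55, 0.35, 0.70, 0.60],
    "content_word_overlap": [0.10, 0.14, 0.09, 0.20, 0.15],
    "stem_overlap": [0.45, 0.50, 0.30, 0.75, 0.55],
}

table = correlation_table(scores)
for (a, b), r in table.items():
    print(f"{a} x {b}: r = {r:.2f}")
```

With five variables, the same function would yield the ten unique pairs that make up the upper triangle of a correlation matrix.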
Results of these correlations are presented in Table 5.

Table 5. Inter-Correlations Between Variables Included in the Referential Cohesion Composite Score (Z-Scores).

Variable | Content word overlap | Stem overlap | Adjacent anaphor overlap | Latent semantic analysis
Adjacent argument overlap | .78 ** | .54 ** | .24 | .50 **
Content word overlap | | .38 ** | .20 | .55 **
Stem overlap | | | -.43 ** | .63 **
Adjacent anaphor overlap | | | | -.29 *

Note: Correlations marked with a * are significant at the p < .05 level. Correlations marked with a ** are significant at the p < .01 level.

In general, small to modest correlations were found between the variables included in the RCCS. This was to be expected, as each variable captures a small and distinct component of the larger construct of referential cohesion. Some of the individual variables were not correlated; this was also expected, as it is not possible for these variables to occur together. For example, correlations between adjacent argument overlap and adjacent anaphor overlap were expected to be non-significant because anaphors are used in lieu of, rather than in addition to, arguments. In other words, sentence pairs that demonstrate adjacent argument overlap will, by definition, fail to demonstrate adjacent anaphor overlap because arguments are used in place of anaphors.

The first step in creating the RCCS was to set each variable to the same scale of measurement by converting it to a z-score for use in a unit-weighted improper linear model (Dawes, 1979). The Coh-Metrix program provides raw counts for each variable included in the RCCS, and the metric varies based on what is being calculated (e.g., proportion, frequency, etc.). Using Coh-Metrix analysis output for all considered passages (N = 29), means and standard deviations were calculated for each variable included in the RCCS. These means and standard deviations were then used to convert raw scores into z-scores.
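The standardization step, together with the unit-weighted averaging and quartile split described in this section, can be sketched as follows. This is a minimal illustration rather than the study's actual procedure: the raw scores are hypothetical, the function names are invented for the example, the use of the sample standard deviation is an assumption, and the quartile cut points rely on the default method of Python's `statistics.quantiles`, which may differ from the percentile method used in the study.

```python
from statistics import mean, quantiles, stdev


def z_scores(values):
    """Standardize scores using their own mean and sample SD (assumed)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def composite_score(indices):
    """Unit-weighted composite; indices maps name -> raw scores per passage."""
    standardized = [z_scores(raw) for raw in indices.values()]
    # Equal weighting: average the z-scores across variables for each passage.
    averaged = [mean(per_passage) for per_passage in zip(*standardized)]
    # Re-standardize the averaged score so the composite is itself a z-score.
    return z_scores(averaged)


def cohesion_labels(composite):
    """Label passages above the 75th or below the 25th percentile."""
    q1, _, q3 = quantiles(composite, n=4)
    # "middle" is a label added for this sketch only.
    return ["high" if c > q3 else "low" if c < q1 else "middle" for c in composite]


# Hypothetical raw scores for eight passages (NOT the study's data).
raw_indices = {
    "adjacent_anaphor_overlap": [0.05, 0.20, 0.10, 0.15, 0.25, 0.12, 0.18, 0.40],
    "adjacent_argument_overlap": [0.30, 0.45, 0.50, 0.40, 0.55, 0.60, 0.35, 0.80],
    "content_word_overlap": [0.05, 0.08, 0.10, 0.07, 0.12, 0.11, 0.09, 0.20],
    "stem_overlap": [0.25, 0.40, 0.45, 0.35, 0.50, 0.55, 0.30, 0.75],
    "lsa_sentence_all": [0.10, 0.15, 0.22, 0.18, 0.25, 0.28, 0.14, 0.45],
}

rccs = composite_score(raw_indices)
labels = cohesion_labels(rccs)
for number, (score, label) in enumerate(zip(rccs, labels), start=1):
    print(f"Passage {number}: composite = {score:+.2f} ({label})")
```

Averaging z-scores gives each variable the same influence regardless of its original metric, which is the point of a unit-weighted model when there is no basis for differential weights.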
Once variables were converted to a standard metric, they could be combined to create a composite score. Because there were no hypothesized differences in how each variable contributes to the overall referential cohesion of a passage, all variables were equally weighted by averaging the z-scores together (Dawes, 1979). The resulting score was also converted to a z-score, which became the RCCS. RCCS scores ranged from -1.12 to 3.17. It is unknown how these values relate to other methods of measuring referential cohesion.

While the RCCS was measured quantitatively, referential cohesion was treated as a qualitative variable with two levels: high cohesion (RCCS above the 75th percentile) and low cohesion (RCCS below the 25th percentile). This decision was made to maximize differences in referential cohesion and to reflect how text complexity estimates are used in application. For example, educators may select texts based on the assigned reading level (i.e., 1st grade, 2nd grade, 3rd grade, etc.). While text complexity estimates may vary on a quantitative scale, in practice educators rely on ordinal scales to interpret and apply text complexity information.

Readability

In order to evaluate the unique contribution of referential cohesion to reader comprehension and reading rate and accuracy, readability was held constant across all passages. All passages considered for inclusion in the study were evaluated using the Lexile® Framework for Reading. The Lexile Framework for assessing text complexity evaluates text on two domains: syntactic and semantic complexity. As in other widely available readability formulas (see Klare, 1974), Lexiles use mean sentence length as a proxy for syntactic complexity.
Where Lexiles differ from other readability formulas is the evaluation of semantic complexity; rather than using high-frequency word lists to categorize uniqueness of words, the Lexile measure draws from a corpus of texts containing nearly 600 million words (Lennon & Burdick, 2004). Based on these two variables, texts are assigned a Lexile score ranging from 200 to 1700+, with lower scores indicating higher readability (i.e., lower text complexity) and higher scores indicating lower readability (i.e., higher text complexity).

For this study, passages were selected to meet specific criteria for referential cohesion, readability, and genre. As a result, it was not possible to target passages within a specified readability range (e.g., selecting only highly readable passages). Instead, the pool of passages was first narrowed based on referential cohesion and then genre, and, from the remaining pool, passages with nearly identical readability were selected for study inclusion.

Manipulating Independent Variables: Passage Selection

Passages were strategically selected from the available set of passage probes to allow for the testing of the main effects of referential cohesion and genre and two-way interaction effects between the independent variables. Because the selected population is third grade students, all third grade DIBELS Oral Reading Fluency (DORF) passages were considered for study inclusion. All benchmark and progress monitoring passages were considered, for a total of 29 passages. From these 29 passages, four passages were selected to test the effects of referential cohesion and genre on reading rate, accuracy, and comprehension using the following procedure:

Measure referential cohesion and identify “low” and “high” cohesion passages. Once the battery of potential passages was identified, all passages were analyzed using the Coh-Metrix tool and the referential cohesion composite score (RCCS) was computed.
The 29 passages were then divided into quartiles based on RCCS score in order to identify passages with high and low cohesion. All passages below the 25th percentile (RCCS < -0.75, n = 7) were considered for inclusion as “low cohesion passages,” and all passages above the 75th percentile (RCCS > 0.48, n = 7) were considered for inclusion as “high cohesion passages.”

Identify passages with similar readability scores. The passages in the DIBELS Next assessment battery were developed with the intent of closely controlling readability. Within each grade level, only texts that represented readability within a specified range were included in the final measure (Powell-Smith et al., 2010). Despite such attempts to control for readability, passages included in the third grade set of oral reading fluency measures range in readability from 640 to 860 on the Lexile scale (according to the Common Core Standards [2012] Lexile to grade correspondences, this range of Lexile scores spans third and fourth grades; using the MetaMetrics [2013] Lexile to grade correspondences, these scores span third through eighth grades). Because this study is designed to evaluate the effects of referential cohesion when readability is held constant, readability scores were examined for each potential passage. From the sample of seven “low cohesion passages” and seven “high cohesion passages,” the researcher identified passages that were nearly identical in Lexile score.

Identify two passages within each genre. Once high and low cohesion passages with similar readability scores were identified, the researcher identified one high cohesion and one low cohesion text within each genre. A total of four passages were selected to capture the following conditions: 1) informational text/low cohesion, 2) informational text/high cohesion, 3) narrative text/low cohesion, and 4) narrative text/high cohesion. See Table 6 for detailed information about each of the selected passages.

Table 6. DIBELS Next Oral Reading Fluency Passages Selected for Study Inclusion.

Passage               Probe                     Genre           Lexile   RCCS    Condition
A Woodland Path       Progress Monitoring #7    Narrative       760      0.85    Narrative/High cohesion
Living in Singapore   BOY Benchmark #3          Narrative       750      -1.19   Narrative/Low cohesion
Raising a Calf        MOY Benchmark #2          Informational   790      1.22    Informational/High cohesion
Save the Turtles!     Progress Monitoring #11   Informational   790      -0.81   Informational/Low cohesion

Note: BOY = beginning of year, MOY = middle of year, RCCS = referential cohesion composite score.

Dependent Variables

Three dependent variables were selected for examination: reading rate, accuracy, and passage comprehension. These variables all capture components of oral reading fluency, defined as: “efficient, effective word recognition skills that permit a reader to construct the meaning of text. Fluency is manifested in accurate, rapid, expressive oral reading and is applied during, and makes possible, silent reading comprehension” (Pikulski & Chard, 2005, p. 3).

Dependent variable #1: rate. As noted by Pikulski & Chard (2005), one component of oral reading fluency is reading with sufficient rate. Traditionally, rate, as measured by the number of words read correctly in one minute (wcpm), is the primary score obtained in oral reading fluency measures. Reading rate was selected as a dependent variable because: 1) it captures the complex integration of multiple skills necessary for reading with comprehension, and 2) there is evidence that reading rate is a strong indicator of overall reading performance. First, reading rate captures the process of mastering decoding skills to the point of automaticity, a critical component of reading with comprehension as recognized by automaticity, interactive, and reciprocal relationship theories of reading development (Fuchs, Fuchs, Hosp, & Jenkins, 2001).
Second, empirical research indicates that measures of oral reading fluency (which determine reading competence largely on wcpm scores) may be more highly correlated with a criterion test of reading comprehension than more direct methods of measuring reading comprehension, such as question answering, retelling, and cloze procedures (Fuchs et al., 2001). Additionally, strong correlations have been found between rate scores and student performance on high-stakes state testing (Wood, 2006; Stage & Jacobsen, 2001).

Dependent variable #2: accuracy. Oral reading accuracy score, expressed as a percentage of words read correctly, captures a second component of Pikulski & Chard’s (2005) definition of fluency. Accuracy was included as a dependent variable because accurate reading is a critical part of reading proficiency, as accuracy is necessary for comprehension. As noted by Kame’enui and Simmons (2001), “fluency as an index of sheer speed without accuracy is a reckless indicator of processing, cognitive or otherwise. Instead, fluency should always serve to index both accuracy and speed” (p. 206). Consistent with this argument, a measure of accuracy is included in the evaluation of student oral reading fluency.

Dependent variable #3: comprehension. It was hypothesized that passage comprehension is one mechanism by which referential cohesion affects oral reading fluency rate and accuracy. As a result, it was essential that the selected reading comprehension measure capture passage-specific comprehension, rather than global comprehension or verbal reasoning skills. While many tools are available to capture passage-specific reading comprehension, a passage recall task was selected to measure comprehension. The selected task allows for measurement of comprehension at the individual idea unit-level, providing a far greater sample of student responses than a question-based task.
It also specifically targets the text and the text’s impact on understanding, rather than student-level comprehension construction and integration skills. While these are important and meaningful components of comprehension, understanding the specific comprehension processes students use in reading a text is beyond the scope of this study.

Previous work in the area of text cohesion and reading comprehension has used a variety of comprehension measures, such as cloze (e.g., Greenfield, 1999), recall (e.g., Beck, McKeown, Omanson, and Pople, 1984; Beck, McKeown, Sinatra, and Loxterman, 1991; Britton and Gulgoz, 1991; McNamara & Kintsch, 1996; Vidal-Abarca, Martinez, and Gilabert, 2000), multiple-choice questions (e.g., Britton and Gulgoz, 1991; McNamara & Kintsch, 1996), open-ended questions (e.g., Beck, McKeown, Sinatra, and Loxterman, 1991; McNamara & Kintsch, 1996; Vidal-Abarca, Martinez, and Gilabert, 2000; McNamara, 2011), and keyword sorting or association (e.g., Britton and Gulgoz, 1991; McNamara & Kintsch, 1996). In selecting a comprehension measure for this study, it was important that the comprehension measure could be used in combination with a measure of oral reading fluency, as both measures are based on the same passage. This restriction makes it challenging to use cloze or maze procedures, as they are not designed to be used after a student has already read the complete passage to measure oral reading fluency. Multiple-choice and open-ended questions can be given after a student completes a one-minute timed read of the passage for fluency, but student performance is largely related to the quality of the questions and, in the case of multiple-choice tasks, response choices. Additionally, comprehension questions generally sample understanding from select portions of the passage; while main idea-type questions may capture whole-passage comprehension, they require student-level comprehension skills that are independent of the task.
For example, these types of questions may require a student to use deductive or inductive reasoning skills, which represent general comprehension skills rather than specific understanding of the passage itself. For these reasons, a recall task was selected as the measure of comprehension.

Measures

Oral reading fluency. Student oral reading rate and accuracy were assessed using the passages from the Dynamic Indicators of Basic Early Literacy Skills (DIBELS) – Next Edition, a curriculum-based measurement system. One of the measures, DIBELS Oral Reading Fluency (DORF), reportedly captures a number of components of overall reading proficiency including advanced phonics skills, accurate fluent reading of connected text, and reading comprehension (Good, Kaminski, Dewey, Wallin, Powell-Smith, & Latimer, 2011). According to the DIBELS Next Technical Manual (Good et al., 2011), the standard error of measurement (SEM) for the wcpm score for a single DORF passage in third grade is 11.29. The SEM can be used to compute a confidence interval for an individual test score. In order to calculate a 95% confidence interval, the SEM is multiplied by 1.96. For example, if a student earned a DORF rate score of 92 wcpm, there is 95% confidence that the student’s true score lies within the range of 70 to 114 wcpm.

In order to evaluate the effect of referential cohesion on oral reading fluency and comprehension, the DIBELS Oral Reading Fluency (DORF) passages were administered and scored in a non-standardized manner. Students were administered each passage using standardized DIBELS Next directions, but, rather than stop the student after one minute, the examiner allowed the student to read the entire text. The examiner recorded the total time it took for the student to read the passage, and this time was used to calculate an overall words correct per minute (wcpm) rate score (words correct/total time in seconds * 60).
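The arithmetic in this passage, along with the accuracy calculation that the text describes next, can be illustrated with a short sketch. The function names and the sample reading (230 words correct out of 240 in 150 seconds) are hypothetical and chosen only for illustration. The SEM of 11.29 and the multiplier of 1.96 come from the text.

```python
def confidence_interval(score, sem, z=1.96):
    """Return the (lower, upper) bounds of a confidence interval around a score."""
    margin = z * sem
    return score - margin, score + margin


def pro_rated_wcpm(words_correct, total_seconds):
    """Words correct per minute when the whole passage is read and timed."""
    return words_correct / total_seconds * 60


def accuracy_percent(words_correct, total_words):
    """Percentage of the words in the passage that were read correctly."""
    return words_correct / total_words * 100


# Worked example from the text: a rate score of 92 wcpm with SEM = 11.29
lower, upper = confidence_interval(92, 11.29)
print(round(lower), round(upper))  # 70 114

# Hypothetical student: 230 of 240 words correct in 150 seconds
rate = pro_rated_wcpm(230, 150)
accuracy = accuracy_percent(230, 240)
```

For the hypothetical student, the pro-rated rate works out to about 92 wcpm and the accuracy to about 95.8%.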
Accuracy score was calculated based on performance on the entire passage (words correct/total words in passage * 100).

These alternative procedures were used for two reasons. First, text cohesion and readability estimates were calculated based on the entire passage. It is possible that the first minute of the passage (which will vary for each student) is more or less readable or cohesive than the entire passage; allowing students to read the entire passage ensures that readability and referential cohesion ratings align with the actual text students are exposed to. Second, standardized administration constrains the extent to which the student can access the comprehension measure. Specifically, students who read more text within the minute are able to demonstrate comprehension of a greater number of idea units than students who read at a slower rate. Allowing all students to read the entire passage provided all students with exposure to the same amount of content.

In order to assess the validity of these non-standardized procedures, two additional types of oral reading fluency data were collected. First, the primary researcher accessed school-wide easyCBM oral reading fluency data collected as a part of school-wide benchmarking procedures for all students who participated in the study. These data were collected during the same two-week period as study data collection. While passage development and equating procedures differ from those used by the authors of the passages used in the study, easyCBM oral reading fluency probes follow standardized CBM procedures to provide a measure of oral reading fluency rate (words read correctly per minute; wcpm). Second, during data collection examiners recorded student scores for the first minute of administration in addition to scores for the entire passage (referred to as “first minute” scores). Descriptive statistics for all passages can be found in Table 7.
An examination of correlations between the three types of scores indicates that the non-standardized DIBELS Next wcpm scores (referred to as “pro-rated” scores) correlate strongly with easyCBM and first minute DIBELS Next wcpm scores for all passages (r = .91 to .98). These correlations suggest that the non-standardized, pro-rated procedure did not compromise the validity of the rate measure, and that the results and conclusions are likely to be applicable to common educational practice for progress monitoring. See Table 8 for correlations for all passages.

Table 7. Descriptive Statistics for easyCBM Benchmark and Study Passage Rate Scores (First Minute and Pro-Rated Whole Passage).

Passage                                                N    M        SD
easyCBM Benchmark                                      74   118.12   40.50
Narrative/High Referential Cohesion Pro-Rated          74   94.55    38.05
Narrative/High Referential Cohesion First Minute       74   96.54    39.68
Narrative/Low Referential Cohesion Pro-Rated           74   88.28    33.16
Narrative/Low Referential Cohesion First Minute        74   84.15    34.36
Informational/High Referential Cohesion Pro-Rated      74   90.97    33.71
Informational/High Referential Cohesion First Minute   74   85.72    32.06
Informational/Low Referential Cohesion Pro-Rated       74   89.09    32.81
Informational/Low Referential Cohesion First Minute    74   91.12    33.16

Table 8. Correlations Between easyCBM Benchmark and Study Passage Rate Scores (First Minute and Pro-Rated Whole Passage).

                    N-HRC       N-HRC       N-LRC       N-LRC       I-HRC       I-HRC       I-LRC       I-LRC
                    Pro-Rated   First Min   Pro-Rated   First Min   Pro-Rated   First Min   Pro-Rated   First Min
easyCBM             .94         .91         .95         .93         .94         .94         .94         .91
N-HRC Pro-Rated                 .97         .97         .96         .98         .96         .97         .94
N-HRC First Min                             .95         .95         .96         .95         .96         .94
N-LRC Pro-Rated                                         .98         .97         .96         .97         .95
N-LRC First Min                                                     .96         .95         .96         .94
I-HRC Pro-Rated                                                                 .98         .97         .95
I-HRC First Min                                                                             .96         .95
I-LRC Pro-Rated                                                                                         .97

Note: N-HRC = Narrative/high referential cohesion passage, N-LRC = Narrative/low referential cohesion passage, I-HRC = Informational/high referential cohesion passage, I-LRC = Informational/low referential cohesion passage. All correlations are significant at the p < .01 level.

Passage recall. Student passage-level comprehension was measured using a passage recall task. This task is based on the work of McMaster and colleagues (McMaster et al., 2012), who adapted the coding scheme of van den Broek and colleagues (e.g., Kendeou & van den Broek, 2005; Linderholm & van den Broek, 2002) in order to capture student recall at the idea-unit level. In this type of recall, students are asked to retell the passage and responses are coded based on how closely students captured the meaning or gist of each idea unit in the text.

In preparation for data collection, each original passage was parsed into individual idea units. An idea unit is defined as a distinct, identifiable, and meaningful idea, which generally includes a subject and a verb and constitutes an independent or dependent clause. For example, consider the following sentence from a selected passage: “Of the seven species of sea turtles, the largest is the leatherback.” This sentence was parsed into two idea units because there are two distinct thoughts expressed in the sentence – 1) “of the seven species of sea turtles” (idea: there are seven species of sea turtles), and 2) “the largest is the leatherback” (idea: the largest species of sea turtle is the leatherback). This definition allows for researcher judgment in identifying important main ideas, allows the researcher to capture the specific information of interest, and is consistent with previous work by McMaster et al. (2012). Each passage was parsed into 34 to 37 individual idea units.

The recall task was administered immediately after each oral reading fluency passage. Students were presented with study-standardized recall directions. All student recalls were recorded for transcription and coding. Student recalls were untimed.
General prompts were provided until students indicated that they could not remember any additional information about the text (e.g., Lynch & van den Broek, 2007).

After data collection concluded, student recalls were parsed into individual idea units and compared to the original text idea units. Based on this comparison, recalled idea units were coded as: 1) conservative, 2) liberal, 3) no match-consistent, or 4) no match-inconsistent. These codes were developed by McMaster and colleagues (McMaster et al., 2012), and definitions are consistent with those provided by the original authors. For the purpose of this research, one code was omitted (highly connected). This code was designed to capture the number of causal connectives in the student’s recall. Because causal consistency (a feature of grammatical cohesion) was not a variable of interest for this research, it was omitted from the design. Descriptions of each code follow, and examples of these codes as applied to actual student responses can be found in Table 9.

Conservative. Conservative recalled idea units are literal or near-literal retellings of the targeted idea unit. A conservative response accurately captures the meaning or gist of the idea unit and includes most or all of the words in the original text. A conservative response also captures all important components of the original idea unit (e.g., all characters or actions).

Liberal. Liberal recalled idea units are non-literal retellings of the targeted idea unit. A liberal response somewhat captures the primary meaning of the idea unit, but may be summarized in the reader’s own words. Additionally, a liberal response may omit a detail or important component from the original idea unit.

No match-consistent. No match-consistent recalled idea units are retellings that cannot be matched directly to an idea unit in the text, but represent a logical or valid inference based on the text.
No match-consistent responses are consistent with the text meaning but go beyond what is included in the original text. The inclusion of this code in the coding scheme allows the comprehension measure to capture readers who have gone beyond the text to form a mental representation of the passage meaning.

No match-inconsistent. No match-inconsistent recalled idea units do not match directly with an idea unit and are inconsistent with the meaning of the text. These responses may be incorrect recall of text information, student opinion, off-track responses, etc.

Each retell was assigned a score for each of the four codes: 1) total number of conservative responses, 2) total number of liberal responses, 3) total number of no match-consistent responses, and 4) total number of no match-inconsistent responses. The total number of conservative responses, liberal responses, and no match-consistent responses were added together to obtain a total number of consistent recall responses. This score was then divided by the total number of idea units from the original text, in order to obtain a proportion of consistent (or “correct”) responses. It was possible for students to earn a comprehension score that exceeded one (i.e., student provided conservative or liberal responses for all idea units in the text and provided no match-consistent responses); however, this did not occur.

This comprehension score is slightly different from that used by McMaster and colleagues (McMaster et al., 2012). In McMaster’s work, the number of conservative, liberal, highly connected (code not used in this study), no match-consistent, and no match-inconsistent codes were added together and divided by the total number of idea units in the story. This procedure was not used in the present study because: 1) the highly connected code was not used in this study, and 2) the author determined that no match-inconsistent responses do not indicate passage-specific comprehension, and consequently should not be counted toward the student’s comprehension score in this study.

Table 9. Sample Coding of Student Responses to the Passage Retell Task.

Original idea unit: The largest is the leatherback.
Student response: Leatherbacks are the biggest sea turtles.
Code: Conservative
Rationale: The student’s response is a near literal retelling of the original text. The response captured the central idea of the idea unit – that leatherbacks are the largest – and included the implicit co-referent – leatherbacks are the largest of the sea turtles.

Original idea unit: Other types of sea turtles are not able to do this.
Student response: And other turtles can't really do that.
Code: Conservative
Rationale: While this student used his/her own words, this response captures all of the important parts of the original idea unit – that there are other types of turtles, and that they are not able to do something.

Original idea unit: One thing Nell and her family had to get used to was the rain.
Student response: They had to get used to the rain
Code: Conservative
Rationale: The student’s response includes all of the important features of the original text – that the subject is Nell and her family (captured by the use of “they”) and that they had to get used to the rain. It also includes most of the words from the original idea unit, making it a near-literal retelling.

Original idea unit: In the clearing was the most beautiful waterfall they had ever seen.
Student response: They found a waterfall
Code: Liberal
Rationale: The student’s response captured the main idea of the original idea unit, which is that the children found a waterfall. However, the response is missing key details, such as the waterfall being the most beautiful that the children had ever seen.

Original idea unit: The whole family moved
Student response: She moved somewhere
Code: Liberal
Rationale: The student’s response captured the gist of the idea unit (someone moved) but failed to include a key detail – that it was the whole family and not just the protagonist that moved. Inclusion of this detail would make this response Conservative.

Original idea unit: They are called leatherbacks because they have a softer, more flexible shell than other turtles.
Student response: They have much softer and more flexible shells than other turtles.
Code: Liberal
Rationale: This response captured the central idea of the original idea unit – that the leatherback’s shell is softer and more flexible than that of other turtles. However, it omits a key detail – that this shell is why leatherbacks were given that name. This omission makes this a liberal response (it captures the gist, but excludes some key words or details from the original).

Original idea unit: (none)
Student response: You must not throw plastic bags or anything in the ocean
Code: No Match-Consistent
Rationale: This response cannot be matched to an idea unit in the original text, as the text never explicitly stated that people should not throw trash into the ocean. However, the text did state that plastic bags are harmful to sea turtles, and that people are beginning to recycle and throw away fewer plastic bags. It would be a logical inference to make that people should not throw plastic bags into the ocean (because they are harmful to turtles).

Original idea unit: (none)
Student response: Nell wanted an ice cream cone,
Code: No Match-Consistent
Rationale: The text states that the protagonist (Nell) stopped and stared at the snow cones, and then her dad bought her one. While her mental state is not stated explicitly, it would be reasonable to deduce that she was staring at the snow cone and got one because she wanted one.

Original idea unit: (none)
Student response: They like hiking.
Code: No Match-Consistent
Rationale: The original text does not contain an idea unit in which it is explicitly stated that the characters enjoy hiking. However, the text states that the characters hike every day, spend the whole day exploring, and are excited when they find a new path. It is reasonable to infer that the characters like to hike.

Original idea unit: (none)
Student response: You never put dresses on turtles.
Code: No Match-Inconsistent
Rationale: This response cannot be matched to an idea unit in the original text, making it a No Match response. It is inconsistent with the original text because it is not a logical and reasonable inference that could be made from the text.

Original idea unit: (none)
Student response: Who would name a girl Nell?
Code: No Match-Inconsistent
Rationale: This response cannot be matched to an idea unit in the text. It is an inconsistent response because it is a non-sequitur.

Original idea unit: (none)
Student response: Well they lived in a small cottage
Code: No Match-Inconsistent
Rationale: The original text does not provide any information about where the characters live, nor is it suggested that their home is small. It is not an unreasonable inference, but is also not supported in any way by the text, making it an inconsistent response.

Participants

Participants were recruited from two public elementary schools in the Pacific Northwest. Neither school currently uses DIBELS Next Oral Reading Fluency (DORF) passages for screening or progress monitoring. All third grade students at each participating school were invited to participate through an open recruitment letter. A total of 117 students were invited to participate. The parents/guardians of 14 students did not provide consent to participate. Consent forms were not returned for 12 students. Thus, consent to participate was provided for 91 students. Of these students, one was no longer enrolled at the school when testing began and was consequently not assessed. Additionally, seven of the 91 students with consent did not participate in the study due to absences or scheduling conflicts during the testing window. Consequently, 83 students participated in the study. Of these 83 students, 74 had complete data (rate, accuracy, and comprehension scores for all four passages). Incomplete data were due to passage spoilage (n = 2) or student request to discontinue testing (n = 7). All analyses included only students with complete data (n = 74). Third grade students were selected
Third grade students were selected 79 because, by third grade, students should have developed enough reading skills to be able to complete the tasks and show meaningful variability in reading competence. Procedure Data collector training. Data were collected by graduate students in the special education and clinical services department at the University of Oregon. All data collectors reported having some prior training in DIBELS Next administration. In addition, all data collectors were required to attend a training session in administration and scoring of oral reading fluency passages. This training was led by the primary researcher, who has attended DIBELS Next Essentials and Mentor trainings and is a member of the DIBELS Mentor Network. The training session included background on the measure and its use, review of administration and scoring procedures, and opportunities to practice scoring oral reading fluency probes. At the conclusion of the training, examiner inter-rater agreement data were collected based on live administration of passages to the trainer. Data collector scores were compared to a master key, which was developed by the trainer. All data collectors achieved at least 90% inter-rater agreement with the master key (range: 97% to 98% inter-rater agreement). One data collector chose to complete a second inter-rater agreement check for extra practice. Inter- rater agreement was calculated by dividing the number of items (words) in agreement (correct or incorrect) by the total number of items (words) in the passage, for a percent agreement score. Data collection. All participating students were asked to read all four selected DORF passages. Passages were presented in random order (nconditions = 24), and examiners and students were blind to passage condition. All passages were administered 80 using standardized study directions. 
A discontinue rule was included in the standardized procedures for students who could not read any correct words in the first line; however, no students met criteria for implementation of the discontinue rule. After each passage, students completed the passage recall task using researcher-developed directions. Passage recalls were audio recorded to allow for transcription and idea-unit level coding. Administration time for all four passages was approximately 30 minutes per student. Efforts were made to restrict testing to a single testing session; however, due to variables outside of the researcher’s control (e.g., unanticipated interruption, testing taking longer than expected), some students were tested across two sessions. All students were assessed during the same two-week time period in January of 2013.

Inter-rater agreement data were collected on 20% of the final sample (complete data only, n = 15). For each examiner, inter-rater agreement was collected for 17% to 21% of students tested. Inter-rater agreement data were collected by comparing item-level scores (words scored as correct or incorrect) for the entire passage as scored by the examiner and a shadow scorer (primary researcher). Inter-rater agreement ranged from 95% to 100% agreement.

Coding of passage recalls. After all data were collected, the audio recordings of the passage retells were transcribed by a professional transcription company. Upon receipt of written transcription of passage recalls, coding of responses began. All passage recalls were coded by the primary investigator by parsing the student recall into idea units and assigning a code to the idea unit based on correspondence with the original text. Based on these codes, each retell was assigned frequency scores for each type of code (conservative, liberal, no match-consistent, no match-inconsistent). The total number of
The total number of consistent responses (conservative, liberal, no match-consistent) was divided by the total number of idea units in the original passage to obtain the comprehension score.

Participant Incentives

As a thank you for participating in the study, teachers of participating classrooms (n = 5) were given gift cards to a local bookstore, to be used to purchase curricular and other materials for the classroom. These materials were intended to benefit the entire classroom, not just the students who participated in the study. Funding for these gift cards was provided by the research department of the participating school district.

Summary

This study evaluated the effects of referential cohesion and passage genre on oral reading fluency rate, oral reading fluency accuracy, and passage-specific comprehension. This research utilized an experimental, within-subjects, repeated measures design with two qualitative independent variables and three quantitative dependent variables. Strengths of the design include the control of passage readability, the development of a referential cohesion composite which includes all components of referential cohesion as outlined in the integrated model of cohesion, and the repeated measures design.

One notable strength of this design is the passage-specific reading comprehension measure. This measure was carefully selected to capture the reader’s understanding of specific elements of each individual passage, rather than a student’s global reasoning or inferencing skills. Unlike other measures of reading comprehension, which may measure how much a student recalls or student comprehension of specific features of the text, this recall task captures the breadth of student comprehension of the text (by measuring comprehension of the entire passage) and the depth of student understanding (by capturing literal and non-literal responses as well as logical inferences).
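The two proportion scores defined in this chapter, percent inter-rater agreement and the passage-specific comprehension score, can be sketched as follows. The function names, the data layout (one True/False per word, a dictionary of code counts), and the example values are hypothetical illustrations, not study data.

```python
# Codes that count as "consistent" responses, as listed in the text.
CONSISTENT_CODES = ("conservative", "liberal", "no match-consistent")


def percent_agreement(examiner, shadow):
    """Share of words that both scorers marked the same way.

    Each argument holds one boolean per word (True = scored correct).
    """
    if len(examiner) != len(shadow) or not examiner:
        raise ValueError("score lists must be non-empty and of equal length")
    agreements = sum(a == b for a, b in zip(examiner, shadow))
    return agreements / len(examiner)


def comprehension_score(code_counts, passage_idea_units):
    """Consistent responses divided by idea units in the original passage."""
    consistent = sum(code_counts.get(code, 0) for code in CONSISTENT_CODES)
    return consistent / passage_idea_units


# Made-up example: scorers disagree on 1 of 20 words.
agreement = percent_agreement([True] * 19 + [False], [True] * 20)

# Made-up example: 7 consistent responses against a 30-idea-unit passage.
recall_codes = {
    "conservative": 4,
    "liberal": 2,
    "no match-consistent": 1,
    "no match-inconsistent": 3,
}
score = comprehension_score(recall_codes, passage_idea_units=30)
```

Both scores are bounded between 0 and 1. Responses coded as no match-inconsistent are left out of the numerator, following the definition above.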
CHAPTER IV

RESULTS

The final design included two qualitative, within-subjects independent variables with two levels, which were analyzed using two-way analysis of variance (ANOVA) with dependent observations. Three univariate ANOVAs were performed, one for each quantitative dependent variable. Analysis allowed for evaluation of main independent variable effects (genre and referential cohesion) and interaction effects. Analyses evaluated the following research questions:

1. When readability is held constant, do students read more words correctly per minute on passages with higher referential cohesion than passages with lower referential cohesion?
2. When readability is held constant, do students read passages with higher referential cohesion with greater accuracy than passages with lower referential cohesion?
3. When readability is held constant, do students perform better on a measure of passage-specific reading comprehension for passages with higher referential cohesion than passages with lower referential cohesion?
4. When readability and referential cohesion are held constant, do students read more correct words per minute on narrative texts than informational texts?
5. When readability and referential cohesion are held constant, do students read narrative texts with greater accuracy than informational texts?
6. When readability and referential cohesion are held constant, do students perform better on a measure of passage-specific reading comprehension on narrative texts than informational texts?
7. If differences in oral reading performance are noted on high and low cohesion passages (questions 1, 2, and 3), do the effects depend on whether the text is narrative or informational?

Characteristics of the Invited Sample

A total of 117 students across five third grade classrooms in two schools were invited to participate in the study, though only 116 were still enrolled at the time of winter benchmark assessment.
Forty-five students attended School 1, and 71 students attended School 2. All invited students were administered the easyCBM winter benchmark by school personnel as a part of the schools’ universal screening processes. Scores are used by school personnel to identify a student’s level of risk for future reading failure (cut points determined by the easyCBM authors and participating school district). Students falling in the “low risk” range performed at or above the 50th percentile on national norms, and are considered to have a low risk of future reading failure. Students falling in the “some risk” range performed between the 20th and 49th percentiles on national norms, and are considered to have some risk of future reading failure without strategic reading intervention. Students falling in the “high risk” range performed below the 20th percentile on national norms, and are considered to be at a high risk for future reading failure without intensive intervention.

At School 1, 22% of all third grade students scored in the “high risk” range on the easyCBM measure of oral reading fluency, 28% of students scored in the “some risk” range, and 50% of students scored in the “low risk” range. At School 2, 24% of all third grade students scored in the “high risk” range, 41% of students scored in the “some risk” range, and 35% of students scored in the “low risk” range. Within the easyCBM system, a typical school should have 50% of students in the low risk range, as the goal is set based on the 50th percentile. Based on this context, the performance of students at School 1 is consistent with other schools using the easyCBM system. School 2, on the other hand, is not consistent with other schools using easyCBM, as only 35% of students fell in the low risk range. Consequently, School 2 may represent a lower performing school than other schools using easyCBM.

In addition, statewide assessment data provide information about school functioning and context.
For participating schools, the most recent statewide assessment data available to the public are from the 2010-2011 school year. At School 1, 85% of third grade students met or exceeded the standard on the state assessment in reading. At School 2, 91% of third grade students met or exceeded the standard. At the district level, 81% of students in grades 3-5 met or exceeded the standard. Consequently, participating schools represent slightly higher achievement on the state assessment than the district average.

Characteristics of the Actual Sample

All participating students were tested on the easyCBM Passage Reading Fluency (PRF) measure during the same two weeks as study data collection, as a part of the school’s benchmarking process. These easyCBM PRF scores were used to: 1) validate the use of non-standardized scoring procedures for DIBELS Next Oral Reading Fluency, and 2) better understand the skill level of participating students. First, Pearson correlation coefficients were computed for passage rate scores (pro-rated whole passage rate and first minute only rate) and easyCBM benchmark rate scores. The strength and significance of these correlation coefficients support the validity of the pro-rated rate score, which was used for all subsequent analyses. Second, easyCBM PRF scores were sorted based on level of risk in order to describe the skill level of participating students. Of the 74 students included in analysis, 13 students performed in the high risk range on the winter easyCBM PRF assessment, which represents 18% of the sample. Thirty-two students performed in the some risk range on easyCBM PRF, which represents 43% of the sample. Finally, 29 students fell in the low risk range, which represents 39% of the sample.

Additionally, these easyCBM PRF scores can be used to compare the final sample used for analysis with the sample of students who were excluded from analysis due to incomplete data.
A total of nine students were excluded from the final sample due to incomplete data. This is approximately 11% of the students who were tested. Of these nine students, five had easyCBM PRF scores in the high risk range (56%), two had easyCBM PRF scores in the some risk range (22%), and two had easyCBM PRF scores in the low risk range (22%). Compared to the sample of students with complete data, a greater percentage of students with incomplete data fell in the high risk range (18% of complete sample, 56% of incomplete sample). Accordingly, the complete sample included a greater percentage of students performing in the some risk (43% of complete sample, 22% of incomplete sample) and low risk (39% of complete sample, 22% of incomplete sample) ranges than excluded students. This suggests that data were not missing at random, and that the final sample used for analysis may underrepresent low-performing students.

Data Transformations

Two of the dependent variables, oral reading fluency accuracy and passage-specific comprehension, were measured using counts of correct or appropriate responses divided by the total possible number of responses, resulting in a proportion. Because these proportions were derived from counts, the homogeneity of variance assumption is violated. Additionally, oral reading fluency accuracy scores were negatively skewed, violating the assumption of normality (range across all passages: 0.80 to 1.00). In order to make the data better fit the assumptions of ANOVA, these scores were transformed using the arcsine square root transformation (McDonald, 2009). The arcsine square root transformation is appropriate for these scores, as both accuracy scores and passage-specific comprehension scores were expressed as proportions and were constrained to the range of 0 to 1. These transformed values were then used for all ANOVAs.
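The arcsine square root transformation named above, together with its inverse, can be sketched as follows. The function names are illustrative, and the result is assumed to be in radians (the text does not say which unit was used).

```python
import math


def arcsine_sqrt(p):
    """Arcsine square root transform of a proportion in [0, 1] (radians)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return math.asin(math.sqrt(p))


def back_transform(x):
    """Inverse transform, valid for x between 0 and pi/2."""
    return math.sin(x) ** 2


transformed = arcsine_sqrt(0.96)          # e.g. an accuracy proportion
recovered = back_transform(transformed)   # returns to the proportion scale
```

Back-transforming a mean of transformed scores generally does not reproduce the arithmetic mean of the raw proportions, so back-transformed descriptive statistics can differ slightly from statistics computed directly on the proportions.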
Descriptive statistics reported in Table 10 and Table 11 and scores presented in graphs were back transformed to proportion scores.

Descriptive Statistics

Before exploring evidence related to research questions, descriptive statistics were computed for each variable of interest (rate, accuracy, comprehension) for each passage. See Table 10 for descriptive data for the entire sample. Additionally, descriptives are provided by student skill level (determined by easyCBM risk level on the Passage Reading Fluency measure) in Table 11.

As expected, students falling in the low risk range on easyCBM earned higher rate and accuracy scores on study passages than students falling in the some risk range, who earned higher scores than students falling in the high risk range. Differences in rate and accuracy scores across groups were consistent across all passages. While differences are noted in comprehension scores across skill levels, standard deviations indicate that these may not be meaningful differences. Further analysis by skill level was not completed due to the small sample size of each group.

Table 10. Descriptive Statistics for Rate (Pro-Rated Whole Passage), Accuracy, and Comprehension for all Passages Included in Study.

                                          Rate             Accuracy        Comprehension
Passage                                   M       SD       M      SD       M      SD
Narrative/High Referential Cohesion       94.55   38.05    0.96   0.01     0.23   0.02
Narrative/Low Referential Cohesion        88.28   33.16    0.95   0.01     0.24   0.03
Informational/High Referential Cohesion   90.97   33.71    0.95   0.01     0.17   0.02
Informational/Low Referential Cohesion    89.09   32.81    0.96   0.01     0.18   0.02

Note: N for all passages was 74. Accuracy and comprehension mean scores are expressed as proportions. Rate score represents words correct per minute based on pro-rated, whole passage reading.

Intercorrelations

In order to better understand relations between variables, intercorrelations for all dependent variables are reported in Table 12.
These correlations indicate that, with the exception of the narrative/low referential cohesion accuracy and narrative/low referential cohesion comprehension scores, all scores are significantly correlated. This suggests that the measures may all be capturing a related construct or constructs. Correlations between rate scores ranged from .97 to .98, indicating strong alternate form reliability. Correlations between comprehension scores were lower but still fairly strong, ranging from .57 to .66. This indicates good alternate form reliability, though not as strong as rate. Correlations between accuracy scores ranged from .69 to .75, indicating strong alternate form reliability across passages. Across score types, rate and accuracy scores were moderately correlated, ranging from .58 to .69. This is to be expected, as poor accuracy would affect a reader’s fluency score; however, strong accuracy alone does not ensure a high rate score. Correlations between comprehension and rate (.25 to .35) and comprehension and accuracy (.24 to .33) were more modest.

Table 11. Descriptive Statistics by Risk Level for Rate (Pro-Rated Whole Passage), Accuracy, and Comprehension for all Passages Included in Study.

                  Low Risk             Some Risk            High Risk
Measure           N   M       SD       N   M       SD       N   M       SD
Narrative/High Referential Cohesion
  Rate            29  131.17  29.19    32  75.56   15.68    13  49.77   13.08
  Accuracy        29  0.98    0.00     32  0.96    0.00     13  0.92    0.01
  Comprehension   29  0.28    0.02     32  0.20    0.02     13  0.20    0.01
Narrative/Low Referential Cohesion
  Rate            29  120.55  24.42    32  74.88   14.14    13  49.31   11.85
  Accuracy        29  0.98    0.00     32  0.95    0.00     13  0.90    0.01
  Comprehension   29  0.30    0.04     32  0.21    0.04     13  0.19    0.02
Informational/High Referential Cohesion
  Rate            29  123.34  24.73    32  78.75   13.61    13  48.85   12.40
  Accuracy        29  0.97    0.01     32  0.95    0.00     13  0.91    0.01
  Comprehension   29  0.22    0.01     32  0.14    0.02     13  0.14    0.01
Informational/Low Referential Cohesion
  Rate            29  120.31  24.19    32  77.25   13.73    13  48.62   13.85
  Accuracy        29  0.98    0.00     32  0.95    0.00     13  0.91    0.01
  Comprehension   29  0.22    0.03     32  0.17    0.02     13  0.14    0.01

Note: Accuracy and comprehension mean scores are expressed as proportions. Rate score represents words correct per minute based on pro-rated, whole passage reading. Risk levels determined by performance on winter easyCBM Passage Reading Fluency benchmark.

Table 12. Intercorrelations Between Oral Reading Fluency Rate, Oral Reading Fluency Accuracy, and Passage-Specific Comprehension Scores for All Measures.

            N-LRC   I-HRC   I-LRC   N-HRC   N-LRC   I-HRC   I-LRC   N-HRC   N-LRC   I-HRC   I-LRC
            Rate    Rate    Rate    Comp    Comp    Comp    Comp    Acc     Acc     Acc     Acc
N-HRC Rate  .97**   .98**   .97**   .27*    .32**   .34**   .33**   .62**   .69**   .62**   .66**
N-LRC Rate          .97**   .97**   .27*    .31**   .35**   .34**   .58**   .69**   .58**   .66**
I-HRC Rate                  .97**   .28*    .30*    .32**   .34**   .58**   .67**   .64**   .64**
I-LRC Rate                          .25*    .28*    .32**   .31**   .58**   .68**   .58**   .69**
N-HRC Comp                                  .59**   .66**   .57**   .27*    .24*    .31**   .26*
N-LRC Comp                                          .59**   .59**   .28*    .20     .29*    .25*
I-HRC Comp                                                  .52**   .29*    .26*    .33**   .32**
I-LRC Comp                                                          .31**   .32**   .25*    .33**
N-HRC Acc                                                                   .75**   .69**   .70**
N-LRC Acc                                                                           .58**   .80**
I-HRC Acc                                                                                   .59**

Note: N-HRC = Narrative/high referential cohesion passage, N-LRC = Narrative/low referential cohesion passage, I-HRC = Informational/high referential cohesion passage, I-LRC = Informational/low referential cohesion passage, Comp = Comprehension Score, Acc = Accuracy Score. Correlations flagged with * are significant at the p < .05 level. Correlations flagged with ** are significant at the p < .01 level.

Oral Reading Fluency Rate

In order to evaluate research questions 1, 4, and 7, a two-way ANOVA with dependent observations was performed with oral reading fluency rate (pro-rated, whole passage) as the dependent variable. It was hypothesized that students would read more correct words per minute on passages with high referential cohesion than passages with low referential cohesion.
It was also hypothesized that referential cohesion and genre would interact; however, the nature of this interaction was not hypothesized. There was a significant interaction between genre and referential cohesion on oral reading fluency rate, F(1, 73) = 10.80, p < .05. See Table 13 for the ANOVA summary table.

Table 13. Two-Way, Within-Subjects Analysis of Variance Summary Table for the Effect of Genre and Cohesion on Oral Reading Fluency Rate

Source                    df    SS        MS        F
Genre                     1     141.98    141.98    3.54
Genre*Subject             73    2927.77   40.11
Cohesion                  1     1228.41   1228.41   27.33*
Cohesion*Subject          73    3281.34   44.95
Genre*Cohesion            1     356.84    356.84    10.80*
Genre*Cohesion*Subject    73    2211.91   33.04

Note: F values marked with a * are significant at the p < 0.05 level.

This interaction effect was further evaluated with pairwise comparisons using the Bonferroni procedure to control family-wise Type I error at .05. Results indicate that rate scores were significantly higher for the high cohesion narrative text (M = 94.55, SD = 38.05) than the low cohesion narrative text (M = 88.28, SD = 33.16). The effect size for this comparison is considered very small, d = 0.18, based on Cohen’s convention (1988, p. 49, equation 2.3.8). Cohen’s d was selected to measure effect size because it provides information about the magnitude of the effect, which can be used to interpret the practical significance of the findings. However, it is important to note that Cohen’s d may underestimate the strength of the effect for a power analysis as it does not take into consideration the correlation between the measures. The results of the second pairwise comparison indicated that rate scores were significantly higher for the high cohesion narrative text (M = 94.55, SD = 38.05) than the high cohesion informational text (M = 90.97, SD = 33.71). The effect size for this comparison is also considered very small, d = 0.10 (Cohen, 1988).
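The two rate effect sizes above can be reproduced from the means and standard deviations reported with each comparison if d is taken as the mean difference divided by the root mean square of the two standard deviations. That formula is an assumption made for this sketch. The text cites Cohen (1988) but does not spell out the computation, and the function and variable names are illustrative.

```python
import math


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Mean difference over the root mean square of the two SDs.

    Assumed formula; it ignores the correlation between repeated measures.
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# High vs. low cohesion narrative passages (values reported in the text).
d_cohesion_narrative = cohens_d(94.55, 38.05, 88.28, 33.16)

# High cohesion narrative vs. high cohesion informational passages.
d_genre_high_cohesion = cohens_d(94.55, 38.05, 90.97, 33.71)
```

Rounded to two decimals, these give 0.18 and 0.10, matching the reported values. Because the denominator uses between-student spread rather than the spread of within-student differences, the result is small even when nearly every student shows the same direction of difference.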
The pairwise comparison of high cohesion informational text and low cohesion informational text was non-significant, as was the pairwise comparison of low cohesion narrative text and low cohesion informational text. See Figure 5 for illustration of the referential cohesion differences by genre.

This interaction indicates that, when referential cohesion is high, rate is higher on narrative passages (M = 94.55, SD = 38.05) than informational passages (M = 90.97, SD = 33.71). Follow-up pairwise comparisons indicate that there is not a significant effect of referential cohesion on informational text reading rate. These findings suggest that referential cohesion may provide greater support for student oral reading fluency rate for narrative passages, but less support for informational passages. Viewed another way, for passages with low cohesion, informational passages are read at about the same rate as narrative passages. For passages with high cohesion, informational passages are read at a lower rate than narrative passages.

Figure 5. Pairwise comparisons of interaction effects between referential cohesion and genre on oral reading fluency rate.

[Figure 5 plot: rate in wcpm (y-axis, 70 to 100) by genre (x-axis: Narrative, Informational), with separate lines for high and low cohesion.]

Oral Reading Fluency Accuracy

In order to evaluate research questions 2, 5, and 7, a two-way ANOVA with dependent observations was performed with oral reading fluency accuracy (based on the entire passage) as the dependent variable. It was hypothesized that students would read passages with high referential cohesion with greater accuracy than passages with low referential cohesion. It was also hypothesized that referential cohesion and genre would interact; however, the nature of this interaction was not hypothesized. The two-way ANOVA yielded a significant interaction between genre and cohesion on oral reading fluency accuracy, F(1, 73) = 16.19, p < .05. See Table 14 for the ANOVA summary table.

Table 14.
Two-Way, Within-Subjects Analysis of Variance Summary Table for the Effect of Genre and Cohesion on Oral Reading Fluency Accuracy.

Source                    df    SS      MS      F
Genre                     1     0.01    0.01    2.83
Genre*Subject             73    0.15    0.00
Cohesion                  1     0.00    0.00    0.05
Cohesion*Subject          73    0.23    0.00
Genre*Cohesion            1     0.03    0.03    16.19*
Genre*Cohesion*Subject    73    0.12    0.00

Note: F values marked with a * are significant at the p < 0.05 level.

This interaction effect was further evaluated with pairwise comparisons using the Bonferroni procedure to control family-wise Type I error at .05. Results indicate that accuracy scores were significantly higher for the high cohesion narrative text (M = .96, SD = .01) than the low cohesion narrative text (M = .95, SD = .01). The effect size for this comparison is considered small, d = 0.25 (Cohen, 1988). Additionally, accuracy scores were significantly higher for the high cohesion narrative text (M = .96, SD = .01) than the high cohesion informational text (M = .95, SD = .01). The effect size for this comparison is also considered small, d = 0.33 (Cohen, 1988). The pairwise comparison of high cohesion informational text and low cohesion informational text was non-significant, as was the pairwise comparison of low cohesion narrative text and low cohesion informational text. See Figure 6 for illustration of the referential cohesion differences by genre.

Figure 6. Pairwise comparisons of interaction effects between referential cohesion and genre on oral reading fluency accuracy.

As with rate, these findings indicate that high referential cohesion was related to greater accuracy for narrative texts (M = .96, SD = .01), while low referential cohesion was related to greater accuracy for informational texts (M = .96, SD = .01). This means that students read narrative passages with high referential cohesion with a greater degree of accuracy than narrative passages with low referential cohesion.
Pairwise comparisons indicate that there was no significant effect of referential cohesion on informational text accuracy. These findings suggest that referential cohesion supports greater accuracy of oral reading fluency for narrative passages but not informational passages.

[Figure 6 plot: accuracy (y-axis, 85% to 99%) by genre (x-axis: Narrative, Informational), with separate lines for high and low cohesion.]

Passage-Specific Reading Comprehension

In order to evaluate research questions 3, 6, and 7, a two-way ANOVA with dependent observations was performed with the comprehension score (proportion of consistent responses) as the dependent variable. It was hypothesized that students would earn higher comprehension scores on passages with high referential cohesion than passages with low referential cohesion. It was also hypothesized that referential cohesion and genre would interact; however, the nature of this interaction was not hypothesized. No hypotheses about main effects for genre were made. Surprisingly, the interaction effect between referential cohesion and genre on passage-specific comprehension was non-significant. An evaluation of main effects indicated that the main effect of referential cohesion was non-significant. See Table 15 for the ANOVA summary table.

Table 15. Two-Way, Within-Subjects Analysis of Variance Summary Table for the Effect of Genre and Cohesion on Passage-Specific Reading Comprehension

Source                    df    SS      MS      F
Genre                     1     0.40    0.40    44.61*
Genre*Subject             73    0.65    0.01
Cohesion                  1     0.03    0.03    2.08
Cohesion*Subject          73    0.87    0.01
Genre*Cohesion            1     0.00    0.00    0.02
Genre*Cohesion*Subject    73    0.71    0.01

Note: F values marked with a * are significant at the p < 0.05 level.

This finding indicates that referential cohesion, as measured in this study, was not significantly related to passage-specific comprehension. There was a significant main effect for genre on passage-specific reading comprehension, F(1, 73) = 44.61, p < .05.
Passage-specific comprehension scores were higher on narrative passages than informational passages. These findings indicate that students demonstrated significantly better comprehension of narrative passages (M = 0.24, SD = 0.02) than informational passages (M = 0.18, SD = 0.01). The effect size for this comparison is considered medium, d = 0.55, based on Cohen’s convention (1988). See Figure 7 for illustration of the main effect of genre on passage-specific comprehension (reported means are transformed back from the arcsine square root transformation used for analysis).

Figure 7. Main effect of genre on passage-specific reading comprehension.

[Figure 7 plot: comprehension score (y-axis, 0.00 to 0.30) by genre (x-axis: Narrative, Informational).]

Summary

This study evaluated the effects of two qualitative independent variables with two levels each, genre (narrative/informational) and referential cohesion (high/low), on oral reading fluency rate, oral reading fluency accuracy, and passage-specific comprehension. Results indicate that genre and referential cohesion have an interaction effect on rate and accuracy, with strongest performance on the high cohesion narrative text. Performance on high cohesion narrative text was significantly greater than on low cohesion narrative text and high cohesion informational text. Surprisingly, there was no effect of referential cohesion on passage-specific comprehension, on informational text reading accuracy, or on informational text reading rate. For passage-specific comprehension, there was a main effect for genre, indicating that students performed better on a measure of passage-specific comprehension on narrative texts than informational texts.
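The two-way ANOVA with dependent observations used throughout this chapter tests each effect against its own effect-by-subject interaction, which is why every summary table pairs an effect row with an Effect*Subject row. A minimal sketch of that partition for a 2 x 2 within-subjects design follows. It assumes complete data for every student, the function name and data layout are illustrative, and the scores are randomly generated placeholders rather than study data.

```python
import random
from statistics import mean

LEVELS = (0, 1)  # two levels per factor, e.g. 0 = high cohesion, 1 = low


def rm_anova_2x2(y):
    """Two-way within-subjects ANOVA for a 2 x 2 design.

    y[s][g][c] is the score of subject s at genre level g and cohesion
    level c. Returns (sums of squares, F ratios, error degrees of freedom).
    """
    n = len(y)
    subjects = range(n)
    grand = mean(y[s][g][c] for s in subjects for g in LEVELS for c in LEVELS)
    m_s = [mean(y[s][g][c] for g in LEVELS for c in LEVELS) for s in subjects]
    m_g = [mean(y[s][g][c] for s in subjects for c in LEVELS) for g in LEVELS]
    m_c = [mean(y[s][g][c] for s in subjects for g in LEVELS) for c in LEVELS]
    m_gc = [[mean(y[s][g][c] for s in subjects) for c in LEVELS] for g in LEVELS]
    m_sg = [[mean(y[s][g]) for g in LEVELS] for s in subjects]
    m_sc = [[mean(y[s][g][c] for g in LEVELS) for c in LEVELS] for s in subjects]

    ss = {
        "subject": 4 * sum((m - grand) ** 2 for m in m_s),
        "genre": 2 * n * sum((m - grand) ** 2 for m in m_g),
        "cohesion": 2 * n * sum((m - grand) ** 2 for m in m_c),
        "genre*cohesion": n * sum(
            (m_gc[g][c] - m_g[g] - m_c[c] + grand) ** 2
            for g in LEVELS for c in LEVELS
        ),
        "genre*subject": 2 * sum(
            (m_sg[s][g] - m_s[s] - m_g[g] + grand) ** 2
            for s in subjects for g in LEVELS
        ),
        "cohesion*subject": 2 * sum(
            (m_sc[s][c] - m_s[s] - m_c[c] + grand) ** 2
            for s in subjects for c in LEVELS
        ),
        "genre*cohesion*subject": sum(
            (y[s][g][c] - m_sg[s][g] - m_sc[s][c] - m_gc[g][c]
             + m_s[s] + m_g[g] + m_c[c] - grand) ** 2
            for s in subjects for g in LEVELS for c in LEVELS
        ),
    }
    # Each effect has 1 df (two levels), so MS_effect equals SS_effect;
    # each error term has (2 - 1) * (n - 1) = n - 1 df.
    df_error = n - 1
    f_ratios = {
        effect: ss[effect] / (ss[effect + "*subject"] / df_error)
        for effect in ("genre", "cohesion", "genre*cohesion")
    }
    return ss, f_ratios, df_error


# Placeholder data: 74 simulated students, matching the reported error df of 73.
rng = random.Random(0)
scores = []
for _ in range(74):
    ability = rng.gauss(90, 30)
    scores.append([[ability + rng.gauss(0, 5) for _c in LEVELS] for _g in LEVELS])

ss, f_ratios, df_error = rm_anova_2x2(scores)
```

The F ratios in Tables 13 through 15 follow the same pattern, for example 356.84 / 33.04 ≈ 10.80 for the genre by cohesion interaction on rate. In practice an established statistics package would normally be used instead of hand-written sums of squares.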
CHAPTER V

CONCLUSION

Discussion

The purpose of this study was to evaluate the effects of referential cohesion and passage genre on student reading proficiency (measured by oral reading fluency rate, accuracy, and passage-specific comprehension) within the context of curriculum-based measurement. The results of this study provide evidence that referential cohesion and genre affect student performance on oral reading fluency passages when readability is held constant. Specifically, these results indicate that high referential cohesion supports student rate and accuracy for narrative passages, but does not significantly increase oral reading fluency rate or accuracy for informational passages.

As outlined in the model of relations, it was hypothesized that genre and referential cohesion would have direct effects on oral reading fluency rate, accuracy, and passage-specific comprehension. As indicated in Figure 8, the study design allowed for evaluation of direct effects of genre and referential cohesion on the dependent variables. Results are consistent with hypothesized direct relations between: 1) referential cohesion and rate, 2) referential cohesion and accuracy, 3) genre and rate, 4) genre and accuracy, and 5) genre and comprehension.

Figure 8. Revisited model of relations between independent and dependent variables. Dashed arrows represent interaction effects. Solid black arrow represents a direct, main effect.

[Figure 8 diagram: independent variables (Referential Cohesion, Genre) linked to dependent variables (Comprehension, Fluency, Accuracy).]

Interpretations of non-significant relation between referential cohesion and comprehension. One potential interpretation of these findings is that referential cohesion may affect reading comprehension, but the selected measure failed to capture these effects. While the recall task was selected only after careful consideration, it is possible that this task lacked the sensitivity to detect differences in comprehension performance.
Reading comprehension is a large and complex construct, and existing technologies for measuring reading comprehension target individual features of understanding (such as the ability to retell a story using a high number of words or answer specific questions about events in the text). These challenges are articulated by Pearson & Hamm (2005):

Comprehension…is a phenomenon that can only be assessed, examined, or observed indirectly. We talk about the “click” of comprehension that propels a reader through a text, yet we never see it directly…We quiz them on “the text” in some way – requiring them to recall its gist or its major details, asking specific questions about its content and purpose, or insisting on an interpretation and critique of its message. All of these tasks, however challenging or engaging they might be, are little more than the residue of the comprehension process itself. (p. 14)

This statement captures some of the challenges educators and researchers face in assessing reader comprehension of a text. As Pearson and Hamm (2005) argue, every measure of comprehension “carries with it a cost,” (p. 62) as researchers have yet to find a single measure that best captures the complex process of reading comprehension. For this study, a recall task was selected to measure the “residue” of the comprehension process; however, other measures may be used to measure passage-specific reading comprehension, such as multiple-choice or open-ended comprehension questions, cloze or maze procedures, or counts of words in a recall.
It is possible that the selected comprehension measure was not sensitive to differences in passage-specific comprehension, or failed to capture the aspects of comprehension affected by referential cohesion; however, it is unknown whether other currently available alternatives would be any more sensitive. While it is possible that this measure was limited in sensitivity, it is also a strength of this design and was selected to provide the most sensitivity to effects possible, based on currently available technologies for measuring passage-specific comprehension.

A second potential interpretation of these findings is that referential cohesion only affects comprehension enough to increase fluency, but does not impact global understanding of the passage. For example, referential cohesion may reduce student hesitations or re-reads of text necessary for understanding by making connections between ideas explicit. While this would still impact rate of reading, it may not directly impact comprehension, as the compensatory strategies named (hesitating or re-reading) allow for comprehension of the meaning of the idea unit. If students were limited to reading only the first minute of text, the impact on rate may more directly affect comprehension, as a slower rate would limit the amount of text available to comprehend. However, because students in this study read the entire passage, differences in rate did not limit student comprehension scores (i.e., students were exposed to the entire passage, even if rate was slow). In order to evaluate this hypothesis, future research should adjust scoring criteria to capture hesitations and repetitions and evaluate differences between high and low cohesion passages.

A third potential interpretation of these findings is that cohesion may impact comprehension, but that this effect is not stronger than individual and environmental contributors to reader comprehension.
According to the members of the RAND Reading Study Group (2002), reader comprehension of text is based on three elements: the reader, the text itself, and the purpose for reading. The current study evaluated one feature (referential cohesion) of one of these elements (the text). While referential cohesion may have some effect on passage-specific comprehension, it is possible that this effect is overpowered by reader and environmental variables. For example, a reader might have a strong preference for narrative texts and consequently attend more to text meaning on narrative texts, which would overshadow any minimal benefits of referential cohesion.

Potential effects of background knowledge. One challenge in interpreting any measure of passage-specific comprehension is the potential effects of reader background knowledge. This is especially relevant to the present study, as students were only presented with one passage per condition. While background knowledge is a reader variable and is challenging to control, it has the potential to have a strong effect on student performance on measures of comprehension. Additionally, there is evidence to suggest that background knowledge also contributes to oral reading fluency rate (Klauda & Guthrie, 2008). Consequently, educators must consider how to develop and build upon background knowledge in instruction, as well as identify how background knowledge may impact assessment tools.

Cohesion and grade level. The present study only evaluated the effects of referential cohesion for third grade students. While referential cohesion did appear to be important in the oral reading fluency of third grade students, it is possible that referential cohesion may impact performance differentially by grade level. For example, referential cohesion may have a stronger effect on reading performance in the early grades, because readers are still learning the alphabetic code and may rely on context clues to supplement limited decoding skills.
Similarly, referential cohesion may be less important in later grades, as increased background knowledge and other student-level factors may have a greater impact on reading proficiency. Additional work is needed to understand whether effects can be generalized to the larger population of school-aged children.

Implications

Implications for instruction. The results of this study suggest that readers are impacted by the referential cohesion and genre of a passage, indicating a need for targeted instruction in approaching texts that may not inherently support fluent reading. Students may benefit from exposure to texts with high cohesion as well as low cohesion, so that they have strategies for decoding and understanding the challenging texts they face when reading to learn and during assessment. As proposed in the Common Core Standards (2010), educators should systematically introduce texts of higher complexity, including texts with lower referential cohesion. Students may benefit from instruction in explicitly identifying cohesive ties and using these ties to track passage meaning. Additionally, results indicate significant differences in student reading comprehension of narrative and informational texts, indicating that general comprehension instruction may be insufficient in supporting readers to comprehend various types of texts. Readers may benefit from comprehension instruction targeted to specific genres of texts; while general strategies may apply to all texts, students may need additional support in using effective strategies to extract information from informational texts.
For example, students may benefit from instruction focused on purposes of reading (e.g., reading for enjoyment versus reading for information), using structural elements in informational texts to identify information (e.g., table of contents, headings, tables and figures), and self-questioning strategies specific to informational texts (e.g., “What was this section about?” “What new words did I learn and what do they mean?”).

Implications for curriculum-based measurement. Analyses of student rate and accuracy scores indicate significant differences between passages due to referential cohesion and genre. However, it is necessary to consider the practical significance of these differences in interpretation of variability in CBM scores. Results of pairwise comparisons indicate significant differences in rate scores between the narrative/high referential cohesion passage and narrative/low referential cohesion passage (94.55 wcpm and 88.28 wcpm), as well as significant differences in rate scores between the narrative/high referential cohesion passage and the informational/high referential cohesion passage (94.55 wcpm and 90.97 wcpm). While these differences were statistically significant, consideration of effect sizes indicates that these effects are very small. Additionally, it is necessary to consider how differences of this magnitude affect instructional practices. Based on normative growth rates, a school may set a goal in which a student’s rate score increases by two words per week. With such a goal, differences in rate scores of five to seven wcpm would affect educator interpretation of student oral reading fluency rate.
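The scale of these differences can be made concrete with a short calculation. The following is a minimal sketch and was not part of the study's analysis: it takes the passage means reported above, expresses a pairwise difference in weeks of growth under the example goal of two words per week, and compares it with a band of 1.96 × SEM, using the SEM of 11.29 reported for third grade DIBELS Next passages. The function name and the keys of the returned dictionary are hypothetical, chosen only for illustration.

```python
def describe_difference(mean_a, mean_b, weekly_goal=2.0, sem=11.29):
    """Summarize a difference between two passage means (in wcpm).

    weekly_goal: example growth goal in wcpm per week (assumed value from the text).
    sem: standard error of measurement reported for the passages.
    """
    diff = mean_a - mean_b
    half_width = 1.96 * sem  # half-width of a 95% band around a single score
    return {
        "difference_wcpm": round(diff, 2),
        "weeks_of_growth": round(diff / weekly_goal, 1),
        "ci_half_width_wcpm": round(half_width, 2),
        "within_band": abs(diff) <= half_width,
    }


# Means reported for the pairwise comparisons of rate scores.
narrative_high = 94.55
narrative_low = 88.28
informational_high = 90.97

print(describe_difference(narrative_high, narrative_low))
print(describe_difference(narrative_high, informational_high))
```

Under the example goal, the difference between the two narrative passages corresponds to roughly three weeks of expected growth.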
However, the SEM for third grade DIBELS Next passages is reported as 11.29, indicating that the 95% confidence interval for a given rate score is approximately +/- 22 wcpm (1.96 × 11.29 ≈ 22.1). While the present study used non-standardized scoring procedures, the resulting scores were strongly correlated with standardized scores, so the SEM is likely still applicable. Therefore, differences in rate scores between passages of five to seven wcpm fall within the SEM of the third grade DIBELS Next passages. While such differences may have a meaningful impact on educator interpretation of student oral reading fluency skills, they are within the expected range for the selected passages.

The practical significance of differences in accuracy scores may be even more limited. Results of pairwise comparisons indicate significant differences in accuracy scores between the narrative/high referential cohesion passage and narrative/low referential cohesion passage (96% accuracy and 95% accuracy), as well as significant differences in accuracy scores between the narrative/high referential cohesion passage and the informational/high referential cohesion passage (96% accuracy and 95% accuracy). While these differences were statistically significant, effect sizes were small, indicating that the differences were minimal. Additionally, in practice there is little meaningful difference between 95% and 96% accuracy. Based on current research on oral reading fluency accuracy, both represent scores on texts that would be considered at a student’s instructional reading level (Hasbrouck, 1998). Consequently, differences in performance on oral reading fluency accuracy due to referential cohesion and genre will likely have little impact on educator interpretation of scores.

In addition to interaction effects on oral reading fluency rate and accuracy, the main effect of genre on comprehension may have an impact on assessment practices.
Results of this study indicate a significant difference in passage-specific reading comprehension due to genre, with readers earning higher comprehension scores on narrative passages. The effect size of this difference was considered a medium effect (d = 0.55), indicating moderate practical significance. While existing CBM systems do not currently use the coded recall task for comprehension, many systems do integrate a measure of comprehension into the battery of benchmark measures (e.g., DIBELS Next Oral Reading Fluency-Recall and easyCBM Multiple Choice Reading Comprehension). Consequently, equivalency of alternate forms has implications for comprehension as well as oral reading fluency rate and accuracy. The results of the present study suggest that passage genre may impact equivalency of alternate forms in measuring comprehension. Specifically, these results indicate that students perform significantly better on narrative passages than on informational passages. This poses a challenge for CBM development, as readers may comprehend and respond to narrative and informational texts differently. Consequently, test developers must consider means of reducing variability in student comprehension of narrative and informational passages. One approach, which expands upon the present study, is to continue to explore variables that may impact comprehension and include these variables in estimates of text complexity. A second approach is to administer both a narrative and an informational passage at each data point, and compare performance within genres rather than between. Further research is necessary to evaluate the feasibility and practical benefits of these and other approaches to controlling for genre differences in reader comprehension.

Implications for measurement of text complexity. These findings suggest that genre and referential cohesion may contribute to the complexity of reading curriculum-based measurement oral reading fluency passages.
Because Lexile (readability) scores were held constant across passages, these results indicate that Lexile scores did not entirely capture differences in passages due to genre and referential cohesion. While passage differences are inevitable in alternate forms, the results of this study indicate that such differences may have a meaningful impact on student performance on measures of oral reading fluency rate, accuracy, and comprehension. These findings provide evidence that referential cohesion and genre contribute to reading rate and accuracy; consequently, there may be a benefit in considering referential cohesion in estimates of passage complexity.

Additionally, this study presents a method of quantifying the effects of referential cohesion that is (a) feasible, and (b) sensitive enough to capture differences in passages. First, the use of a referential cohesion composite score to capture referential cohesion is easily accessible and feasible. The Coh-Metrix program is available to the public, and allows passages to be analyzed easily and quickly. The variables related to referential cohesion can be combined into a composite score using commercially available software, allowing for the measurement of the referential cohesion of selected passages. Second, these referential cohesion scores were capable of distinguishing between high and low cohesion passages, even within a set of passages that was designed to tightly control text complexity. This is evidenced by differences in student performance as a function of this variable. Future research should further evaluate the validity of the RCCS in distinguishing between high and low cohesion passages by comparing the RCCS to expert ratings of the cohesiveness of selected passages.

Furthermore, these findings indicate that qualitative evaluation of text complexity may fail to fully capture the contribution of referential cohesion.
Traditionally, the cohesiveness of a passage has been perceived as a qualitative feature of text, best evaluated through expert judgment and discussion (Common Core Standards, 2010). The text complexity of the selected passages was evaluated primarily by quantitative analyses (readability ratings); however, the authors of the measure report that anecdotal information was included in the overall assessment of passage difficulty. Based on the recommendations of the Common Core Standards, this qualitative analysis should be sufficient in capturing the variability due to text cohesion. However, the results of this study indicate that this qualitative analysis did not capture differences in the referential cohesion of the selected passages. As these differences were found to impact student reading performance, the ability to differentiate between highly cohesive and less cohesive passages appears to be an important feature of text complexity. Additional research is needed to determine if targeted qualitative analysis focused on referential cohesion can capture meaningful differences; however, these results suggest that current methods of qualitative analysis did not differentiate between the selected passages even though these differences were related to changes in student reading performance.

Study Limitations

One important limitation of the design is that each condition was represented by only a single passage. Consequently, it is difficult to know if the differences captured in analysis are due to condition or unique passage effects because the two are confounded. It is possible that there were characteristics of the selected passages that were not fully explained by Lexiles, genre, or referential cohesion estimates that impacted reading performance.
Specifically, significant effects were associated with the high cohesion narrative passage; it is possible that there is something about that specific passage that aided in oral reading fluency rate and accuracy, in addition to or instead of high referential cohesion. It is also possible that readers responded to this specific passage differently than other passages, perhaps due to reader factors such as interest and background knowledge. Additional research should include additional passages for each condition in order to minimize passage effects due to text factors unrelated to referential cohesion.

A second limitation of this study is that the design did not include reader skill level as a factor, so differential effects by skill level were not examined. There is evidence to suggest that the effects of text cohesion on comprehension may vary based on reader proficiency (e.g., O’Reilly & McNamara, 2007). This design did not explore that issue and instead evaluated the effects of referential cohesion on readers across skill levels; however, it is possible that findings may not be applicable to subsets of students with very high or very low skills. Student skill level was not selected for inclusion in the study design because the focus of this work is text-based contributors to text complexity in the context of curriculum-based measurement (CBM). Because the same CBM passages are administered to all students regardless of skill level, student skill level was not included as a central independent variable in this design. However, future research should evaluate the role of student skill level in text complexity both in CBM passages and other types of reading assessments.

A third limitation is that the administration procedures for the DIBELS passages and the scoring of oral reading fluency rate did not follow standardized procedures, which may affect the ability to generalize results to general outcome progress monitoring.
Correlations of the non-standardized rate score with first-minute and easyCBM scores were strong, suggesting that the non-standardized, pro-rated procedure did not compromise the validity of the rate measure; however, follow up studies that more closely resemble standard CBM administration and scoring will be necessary to support the effects of genre and referential cohesion in educational practice.

A fourth limitation of this study design is that passages were controlled to represent readability only within a constrained range. It is possible that results would vary if readability scores were held constant at a lower or higher range (i.e., more or less readable texts). Consequently, future research should evaluate the role of referential cohesion on comprehension and reading rate for highly readable and less readable passages.

A fifth limitation is that passages were selected from a small sample of texts (third grade DIBELS Next Oral Reading Fluency passages), and effects may be sample-specific. Referential cohesion levels (high and low) were assigned based on ratings from the included set of passages; it is possible that other samples of passages (e.g., DORF passages from other grade levels or oral reading fluency passages from other curriculum-based measurement systems) may represent more or less variability in referential cohesion scores. Future research should replicate the procedures for measuring referential cohesion used in this study with other sets of passages to evaluate the generalizability of these results.

A sixth limitation is the use of a coded recall task to measure comprehension. As discussed in Chapter 3, this measure may fail to capture meaningful differences in passage-specific comprehension. The use of different scoring schemes may yield different results, as might the use of different measures of passage-specific comprehension.
Additionally, recall tasks rely on oral language skills, so it is possible that performance on the recall task was confounded with oral language ability. However, it is important to note that reading comprehension remains a difficult construct to measure across the field, and the selected measure was determined to be the best available tool to measure passage-specific reading comprehension.

A final limitation of the study design involves data collection and coding procedures. While inter-observer agreement data were collected on oral reading fluency scores, these data do not verify the procedural fidelity of administration of the oral reading fluency and passage-specific comprehension measures. The study would have been strengthened by the use of a procedural fidelity data collection tool, such as a checklist, to assess data collector fidelity to standardized data collection procedures. Additionally, all coding of student recalls was completed by a single researcher. Consequently, the reliability of the assigned codes is unknown. This may be remedied post-hoc by having a second researcher trained in the coding scheme verify coding of a proportion of recalls.

Next Steps

Replication. Replication allows for limitations to be addressed through small changes in study design. Perhaps most critically, replication is needed with more than one passage per study condition. As discussed above, it is possible that effects were due to differences in individual passages that were not captured by Lexile scores, genre, or referential cohesion measurement. Consequently, future work should include multiple passages in each condition. In order to address possible differential effects by skill level, future work should include student skill level as an independent variable in the study design. Effects should be evaluated across the entire sample and by skill group in order to identify potential differences in performance.
Similarly, future research should expand the readability level of passages included in the study design. Not only did this study focus on a specific grade-level, but within that level passages were selected due to similarity of Lexile scores. Future work should replicate the basic study design using a greater range of passages, including passages and participants at various grade-levels, as well as passages within each grade-level with higher and lower Lexile scores. Future research should also replicate the study design using passages from a different source, to evaluate whether effects are isolated to the passages included in this study. Such research would allow for the generalization of effects beyond third grade DORF passages within a specified range of Lexile scores.

Future directions for measurement of referential cohesion. The results of this study suggest that it may be worthwhile to continue to explore means of measuring the referential cohesion of passages selected for assessment and instruction. This study design presented one means of quantifying referential cohesion as a composite of the various devices within a text that support continuity of reference. Study results suggest that this composite may be sensitive to some differences in referential cohesion, possibly due to the inclusion of multiple devices that affect referential cohesion. It is recommended that future research continue to explore the use of a composite score, as individual devices may support referential cohesion while failing to capture the contributions of other devices. For example, two texts may represent similar levels of referential cohesion, but such cohesion may be accomplished in various ways. While one passage may maintain continuity of reference through the use of adjacent argument overlap, the second passage may accomplish strong referential cohesion through a completely different device, adjacent anaphor overlap.
The use of a composite score allows the referential cohesion of these passages to be compared, even though referential cohesion is maintained through different means. However, the measurement of individual devices may have value in understanding why a text has strong or weak referential cohesion. As in previous work on the effects of cohesive devices on reading comprehension (see Table 2), an examination of individual devices may allow educators and passage developers to revise texts to better support reader oral reading fluency rate and accuracy.

While the selected referential cohesion composite score demonstrates promise in the measurement of referential cohesion, future work should evaluate alternative means of creating a composite score to capture the referential cohesion of a passage. For example, future work should consider the weighting of the various cohesive devices in the creation of the composite score. For this research, all device scores were weighted equally. However, the composite score may be strengthened if weighting of the individual devices were driven by a theory on relations between these devices. Additionally, future research should evaluate whether the inclusion of all devices in the composite score is necessary. In previous work, Hiebert (2011) created a referential cohesion composite score using only the argument overlap and stem overlap variables. Consequently, future work should evaluate whether the inclusion of additional devices contributes to the sensitivity of the composite score.

Finally, future work on the measurement of referential cohesion should consider the use of an external criterion for determining whether a passage represents high or low referential cohesion. In the present work, referential cohesion composite scores (RCCS) ranged from -1.12 to 3.17. However, it is unknown how this range of cohesion scores should be interpreted.
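One way a composite of this kind might be assembled is sketched below. This is an illustration under stated assumptions, not the procedure used to compute the RCCS: each device score is standardized (z-scored) across the passage set and the standardized scores are averaged, with equal weights by default and optional weights that a theory of the devices might supply. The index names, the passage values, and the function names are all hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of raw scores across passages."""
    m = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # No variability on this device: it cannot separate passages.
        return [0.0 for _ in values]
    return [(v - m) / sd for v in values]


def composite_scores(passages, weights=None):
    """Return a weighted mean of standardized device scores per passage.

    passages: dict mapping passage name -> dict of device name -> raw score.
    weights: optional dict of device name -> weight (equal weights if omitted).
    """
    names = list(passages)
    devices = list(next(iter(passages.values())))
    if weights is None:
        weights = {device: 1.0 for device in devices}
    total_weight = sum(weights[device] for device in devices)
    standardized = {
        device: zscores([passages[name][device] for name in names])
        for device in devices
    }
    return {
        name: sum(weights[d] * standardized[d][i] for d in devices) / total_weight
        for i, name in enumerate(names)
    }


# Hypothetical device scores for three passages (illustrative values only).
example_passages = {
    "passage_a": {"argument_overlap": 0.60, "stem_overlap": 0.55, "anaphor_overlap": 0.20},
    "passage_b": {"argument_overlap": 0.35, "stem_overlap": 0.40, "anaphor_overlap": 0.45},
    "passage_c": {"argument_overlap": 0.20, "stem_overlap": 0.25, "anaphor_overlap": 0.10},
}

for name, score in composite_scores(example_passages).items():
    print(name, round(score, 2))
```

Passing a `weights` mapping changes the contribution of each device, which is one way the weighting question raised above could be explored. Because the scores are standardized within the passage set, a composite built this way is only interpretable relative to that set.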
Instead, this range of scores poses a number of questions that impact interpretation: Do these scores represent a wide or narrow range of referential cohesion? How do these scores compare to other methods of measuring referential cohesion? One means of beginning to address these questions is to compare quantitative estimates of referential cohesion to qualitative evaluation. For example, expert reviewers may assign referential cohesion ratings to passages, which can be compared to the RCCS. While this process alone would be insufficient to understand the range of the RCCS, it may help researchers understand if quantitative differences are detected and identified as meaningful through qualitative review.

Future directions in measurement of comprehension. It is recommended that future work continue to evaluate methods of measuring reading comprehension in relation to referential cohesion. One possibility is to explore secondary analysis of recall data based on the comprehension codes assigned to recalled idea units. Rather than evaluating comprehension as a single score, future research may examine proportion or frequency scores for each type of recall response (conservative, liberal, no match-consistent, no match-inconsistent). In particular, future research should focus on differences in the no match-consistent score, as it is possible that highly cohesive texts support deeper comprehension, which may be captured by the extent to which a reader goes beyond what is stated explicitly in the text.

Additionally, future work should evaluate alternative methods of coding recalled responses. With the selected coding scheme, recalled idea units that were matched to an idea unit could only be assigned one of two codes: conservative and liberal. However, a qualitative examination of recalled idea units suggests variability in the types of responses within each code.
In particular, the liberal code captured all responses that could be matched to an idea unit but did not represent a near verbatim recall of all relevant details. For example, responses for the original idea unit “Every day, Carrie and her teenage brother Jackson explored a new part of the preserve,” included: “every day they liked to go to a hike,” and “Carrie and her brother, Jackson, were going to take hikes at this preserve they found.” These responses capture different components of the original idea unit – the first captures that the siblings hike every day, while the second omits “every day” but includes the detail that the siblings hike in the preserve. One alternative to the selected coding scheme is to assign a rating on an ordinal scale to each recalled idea unit based on alignment to the original idea unit.

Future work should also include an additional measure of comprehension in order to verify results using the coded recall. Consistent findings would provide support for the use of a coded recall task as a means of measuring passage-specific comprehension with sensitivity. Inconsistent findings – specifically, significant relations between referential cohesion and reading comprehension – would indicate a need for further explanation of potential relations between referential cohesion and reading comprehension.

Summary

The complexity of text, which is defined by the Common Core Standards in English and Language Arts (2010) as the “inherent difficulty of reading and comprehending a text combined with consideration of reader variables” (Glossary, p. 43), has a number of implications for educators. Understanding and capturing the components that contribute to text complexity matters for both instruction and assessment.
Instructionally, the Common Core Standards Initiative (2010) stresses that students develop skills to be able to read and comprehend texts of increasing complexity as they progress through school. This expectation is based on data documenting the importance of comprehending complex texts in college and the workplace. In assessment, knowledge and understanding of text complexity has implications for both summative and formative assessment. For summative assessments such as state accountability tests, understanding of text complexity may help to improve test construction and interpretation. For formative assessment, controlling text complexity is critical in facilitating accurate individual decisions. Additionally, improved measures of text complexity will facilitate the development of better progress monitoring materials.

This study focused on the role of text complexity in assessment, specifically, formative assessment. Text complexity is particularly important in formative assessments because such assessments utilize repeated, alternate, equivalent forms to capture student growth towards a general outcome or goal, and a key assumption of such tools is that alternate forms of the assessment are of equal complexity. Consequently, there is a need to better understand what variables contribute to text complexity, and how they impact student performance on formative assessments. This study was designed to evaluate features of text that are not typically included in readability estimates but may contribute to the complexity of the passage: passage genre and text cohesion. Specifically, the study evaluated the role of text cohesion and genre on student oral reading fluency (reading with sufficient rate and accuracy) and comprehension performance, for the purpose of enhancing the utility and precision of formative assessment tools.
Research questions addressed main effects for text cohesion and genre on reading rate, accuracy, and comprehension, and interactions between passage genre and text cohesion. Univariate ANOVAs allowed for evaluation of direct effects of genre and referential cohesion on oral reading fluency rate, accuracy, and passage-specific comprehension. Results indicated effects for each of the dependent variables included in the study design. For oral reading fluency rate, results indicate a significant interaction between genre and referential cohesion on rate: when referential cohesion was high, rate was higher on narrative passages than informational passages. Follow up pairwise comparisons indicated that rate scores were significantly higher for the high cohesion narrative text than the low cohesion narrative text and the high cohesion informational text. For oral reading fluency accuracy, results also suggest a significant interaction between genre and referential cohesion on accuracy: high referential cohesion was related to greater accuracy for narrative texts, while low referential cohesion was related to greater accuracy for informational texts. As with rate, pairwise comparisons indicated that accuracy scores were significantly higher for the high cohesion narrative text than the low cohesion narrative text and the high cohesion informational text. For passage-specific reading comprehension, there were no significant effects of referential cohesion. There was a significant main effect of genre on comprehension, with students performing significantly better on the passage-specific comprehension measure for narrative texts than informational texts. Altogether, these results indicate direct relations between both genre and referential cohesion and student reading performance. The presence of these relations has implications for the development and interpretation of formative assessment tools.
These findings indicate that genre and referential cohesion have a significant impact on student reading performance, and may contribute to complexity of reading CBM passages. Consequently, there is evidence that these features of text should be considered in estimates of text complexity. Additionally, this study provides evidence that referential cohesion may be able to be measured quantitatively. The metric used in this study, the Referential Cohesion Composite Score (RCCS), was easily developed using readily available technologies, and can be used to measure the referential cohesion of any set of passages. Results of this study indicate that the RCCS was able to differentiate passages with high and low referential cohesion, and that those differences were related to differences in oral reading fluency rate and accuracy.

REFERENCES CITED

ACT, Inc. (2006). Reading between the lines: What the ACT reveals about college readiness in reading. Iowa City, IA: Author.

Albano, A. D., & Rodriguez, M. C. (2012). Statistical equating with measures of oral reading fluency. Journal of School Psychology, 50, 43-59.

Anderson, R. C., & Pearson, P. D. (1984). A schema-theoretic view of basic processes in reading comprehension. Handbook of reading research, 1, 255-291.

Archer, A. L., Gleason, M. M., & Vachon, V. L. (2003). Decoding and fluency: Foundation skills for struggling older readers. Learning Disability Quarterly, 26, 89-101.

Ardoin, S. P., Williams, J. C., Christ, T. J., Klubnik, C., & Wellborn, C. (2010). Examining readability estimates’ predictions of students’ oral reading rate: Spache, Lexile, and Forcast. School Psychology Review, 39, 277-285.

Baker, D. L., Stoolmiller, M., Good, R. H., & Baker, S. K. (2011). Effect of reading comprehension on Passage Fluency in Spanish and English for second-grade English learners. School Psychology Review, 40, 331-351.

Beck, I. L., McKeown, M. G., Omanson, R. C., & Pople, M. T. (1984).
Improving the comprehensibility of stories: The effects of revisions that improve coherence. Reading Research Quarterly, 19, 263-277.

Beck, I. L., McKeown, M. G., Sinatra, G. M., & Loxterman, J. A. (1991). Revising social studies text from a text-processing perspective: Evidence of improved comprehensibility. Reading Research Quarterly, 251-276.

Beers, S. F., & Nagy, W. E. (2009). Syntactic complexity as a predictor of adolescent writing quality: Which measures? Which genre? Reading and Writing, 22, 185-200.

Best, R. M., Floyd, R. G., & McNamara, D. S. (2008). Differential competencies contributing to children’s comprehension of narrative and expository texts. Reading Psychology, 29, 137-164.

Best, R., Ozuru, Y., Floyd, R. G., & McNamara, D. S. (2006). Children’s text comprehension: Effects of genre, knowledge, and text cohesion. In Proceedings of the 7th international conference on learning sciences (pp. 37-42). International Society of the Learning Sciences.

Briggs, R. N. (2011). Investigating variability in student performance on DIBELS oral reading fluency third grade progress monitoring probes: Possible contributing factors (unpublished doctoral dissertation). University of Oregon, Eugene, OR.

Britton, B. K., & Gulgoz, S. (1991). Using Kintsch’s computational model to improve instructional text: Effects of repairing inference calls on recall and cognitive structures. Journal of Educational Psychology, 83, 329-345.

Cervetti, G. N., Bravo, M. A., Hiebert, E. H., Pearson, P. D., & Jaynes, C. A. (2009). Text genre and science content: Ease of reading, comprehension, and reader preference. Reading Psychology, 30, 487-511.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.

Cohen, S. A., & Steinberg, J. E. (1983). Effects of three types of vocabulary on readability of intermediate grade science textbooks: An application of Finn's transfer feature theory.
Reading Research Quarterly, 19, 86-101.

Common Core State Standards Initiative. (2010). Common core state standards for English language arts & literacy in history/social studies, science, and technical subjects. Washington, DC: National Governors Association Center for Best Practices and the Council of Chief State School Officers.

Crossley, S. A., Greenfield, J., & McNamara, D. S. (2008). Assessing text readability using cognitively based indices. TESOL Quarterly, 42, 475-493.

Dale, E., & Chall, J. S. (1948). A formula for predicting readability. Educational Research Bulletin, 27, 11-28.

Dawes, R. M. (1979). The robust beauty of improper linear models in decision making. American Psychologist, 34, 571-582.

Deno, S. L. (2003). Developments in curriculum-based measurement. Journal of Special Education, 37, 184-192.

Deno, S. L., & Marston, D. (2006). Curriculum-based measurement of oral reading: An indicator of growth in fluency. In S. J. Samuels & A. E. Farstrup (Eds.), What research has to say about fluency instruction (pp. 179-203). Newark, DE: International Reading Association.

Duran, N. D., Bellissens, C., Taylor, R. S., & McNamara, D. S. (2007). Quantifying text difficulty with automated indices of cohesion and semantics. Proceedings of the 29th Annual Meeting of the Cognitive Science Society. Austin, TX: Cognitive Science Society.

Ehri, L. C. (2005). Learning to read words: Theory, findings, and issues. Scientific Studies of Reading, 9, 167-188.

Elfenbein, A. (2011). Research in text and the uses of coh-metrix. Educational Researcher, 5, 246-248.

Florida Center for Reading Research (2006). Empowering teachers. Retrieved from http://www.fcrr.org/assessment/et/resources/glossary3.html

Foorman, B. R. (2009). Text difficulty in reading assessment. In E. H. Hiebert (Ed.), Reading more, reading better (pp. 231-250). New York, NY: Guilford.

Francis, D. J., Santi, K. L., Barr, C., Fletcher, J. M., Varisco, A., & Foorman, B. R. (2008).
Form effects on the estimation of students’ oral reading fluency using DIBELS. Journal of School Psychology, 46, 315-342.
Freebody, P., & Anderson, R. C. (1983). Effects of vocabulary difficulty, text cohesion, and schema availability on reading comprehension. Reading Research Quarterly, 18, 277-294.
Fuchs, L. S., Fuchs, D., Hosp, M. K., & Jenkins, J. R. (2001). Oral reading fluency as an indicator of reading competence: A theoretical, empirical, and historical analysis. Scientific Studies of Reading, 5, 239-256.
Good, R. H., Kaminski, R. A., Cummings, K., Dufour-Martel, C., Petersen, K., Powell-Smith, K., Stollar, S., & Wallin, J. (2011). DIBELS next. Eugene, OR: Dynamic Measurement Group.
Good, R. H., Kaminski, R. A., Dewey, E. N., Wallin, J., Powell-Smith, K. A., & Latimer, R. J. (2011). DIBELS next technical manual. Eugene, OR: Dynamic Measurement Group.
Graesser, A. C., & McNamara, D. S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3(2), 371-398.
Graesser, A. C., McNamara, D. S., & Kulikowich, J. M. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics. Educational Researcher, 40, 223-234.
Graesser, A. C., McNamara, D. S., Louwerse, M. M., & Cai, Z. (2004). Coh-Metrix: Analysis of text on cohesion and language. Behavior Research Methods, Instruments, & Computers, 36, 193-202.
Graves, M. F., & Graves, B. B. (2003). Scaffolding reading experiences: Designs for student success (2nd ed.). Norwood, MA: Christopher-Gordon.
Greenfield, G. (1999). Classic readability formulas in an EFL context: Are they valid for Japanese speakers? Unpublished doctoral dissertation, Temple University, Philadelphia, PA.
Halliday, M. A. K., & Hasan, R. (1976). Cohesion in English. London, England: Longman.
Hasbrouck, J. E. (1998). Reading fluency: Principles for instruction and progress monitoring. Professional Development Guide.
Austin, TX: Texas Center for Reading and Language Arts, University of Texas at Austin.
Hiebert, E. H. (1998). Text matters in learning to read (Report 1-001). Ann Arbor, MI: Center for the Improvement of Early Reading Achievement.
Hiebert, E. H. (2001). Standards, assessment, and text difficulty. In A. E. Farstrup & S. J. Samuels (Eds.), What research has to say about reading instruction (3rd ed.). Newark, DE: International Reading Association.
Hiebert, E. H. (2011). Using multiple sources of information in establishing text complexity (Reading Research Report 11.03). Santa Cruz, CA: TextProject, Inc.
Hiebert, E. H., & Fisher, C. W. (2007). Critical word factor in texts for beginning readers. Journal of Educational Research, 101, 3-11.
Hiebert, E. H., & Pearson, P. D. (2010). An examination of current text difficulty indices with early reading texts (Reading Research Report 10.01). Santa Cruz, CA: TextProject, Inc.
Jenkins, J. R., Fuchs, L. S., van den Broek, P., Espin, C., & Deno, S. L. (2003a). Accuracy and fluency in list and context reading of skilled and RD groups: Absolute and relative performance levels. Learning Disabilities Research and Practice, 18, 237-245.
Jenkins, J. R., Fuchs, L. S., van den Broek, P., Espin, C., & Deno, S. L. (2003b). Sources of individual differences in reading comprehension and reading fluency. Journal of Educational Psychology, 95, 719-729.
Jungjohann, K. (2010, January). Reading and writing in the content area. Lecture conducted at the University of Oregon, Eugene, OR.
Kame’enui, E. J., & Simmons, D. C. (2001). Introduction to this special issue: The DNA of reading fluency. Scientific Studies of Reading, 5, 203-210.
Kendeou, P., & van den Broek, P. (2005). The effects of readers’ misconceptions on comprehension of scientific text. Journal of Educational Psychology, 97, 235-245.
Kintsch, W., & van Dijk, T. A. (1978). Toward a model of text comprehension and production. Psychological Review, 85, 363–394.
Klare, G. R. (1974).
Assessing readability. Reading Research Quarterly, 10, 62-102.
Klauda, S. L., & Guthrie, J. T. (2008). Relationships of three components of reading fluency to reading comprehension. Journal of Educational Psychology, 100(2), 310-321.
Ledoux, K., Traxler, M. J., & Swaab, T. Y. (2007). Syntactic priming in comprehension: Evidence from event-related potentials. Psychological Science, 18(2), 135-143.
Lennon, C., & Burdick, H. (2004). The Lexile framework as an approach for reading measurement and success [white paper]. Retrieved from http://www.learningwithjamesgentry.com/Resources/TAKSDP/Lexile-Reading-Measurement-and-Success-0504.pdf
Linderholm, T., & van den Broek, P. (2002). The effects of reading purpose and working memory capacity on the processing of expository text. Journal of Educational Psychology, 94, 778-784.
Lynch, J. S., & van den Broek, P. (2007). Understanding the glue of narrative structure: Children's on- and off-line inferences about characters’ goals. Cognitive Development, 22, 323-340.
Magliano, J. P., & Millis, K. K. (2003). Assessing reading skill with a think-aloud procedure and latent semantic analysis. Cognition and Instruction, 21, 251-283.
McDonald, J. H. (2009). Handbook of biological statistics (2nd ed.). Baltimore, MD: Sparky House Publishing.
McKeown, M. G., Beck, I. L., Sinatra, G. M., & Loxterman, J. A. (1992). The contribution of prior knowledge and coherent text to comprehension. Reading Research Quarterly, 27, 78-93.
McMaster, K. L., van den Broek, P., Espin, C. A., White, M. J., Rapp, D. N., Kendeou, P., Bohn-Gettler, C. M., & Carlson, S. (2012). Making the right connections: Differential effects of reading intervention for subgroups of comprehenders. Learning and Individual Differences, 22, 100-111.
McNamara, D. S. (2001). Reading both high-coherence and low-coherence texts: Effects of text sequence and prior knowledge. Canadian Journal of Experimental Psychology, 55, 51-62.
McNamara, D. S., Graesser, A.
C., Cai, Z., & Kulikowich, J. M. (2011). Coh-Metrix easability components: Aligning text difficulty with theories of text comprehension. Paper presented at the annual meeting of the American Educational Research Association, New Orleans, LA.
McNamara, D. S., & Kintsch, W. (1996). Learning from texts: Effects of prior knowledge and text coherence. Discourse Processes, 22, 247-288.
McNamara, D. S., Kintsch, E., Songer, N. B., & Kintsch, W. (1996). Are good texts always better? Text coherence, background knowledge, and levels of understanding in learning from text. Cognition and Instruction, 14, 1-43.
McNamara, D. S., Louwerse, M. M., Cai, Z., & Graesser, A. (2005). Coh-Metrix version 1.4. Retrieved from http://cohmetrix.memphis.edu
McNamara, D. S., Louwerse, M. M., McCarthy, P. M., & Graesser, A. C. (2010). Coh-Metrix: Capturing linguistic features of cohesion. Discourse Processes, 47, 292-330.
MetaMetrics (2013). Lexile-to-grade correspondence. Retrieved from http://www.lexile.com/about-lexile/grade-equivalent/grade-equivalent-chart/
Morris, J., & Hirst, G. (1991). Lexical cohesion computed by thesaural relations as an indicator of the structure of text. Computational Linguistics, 17, 21-48.
National Assessment Governing Board (2008). Reading Framework for the 2009 National Assessment of Educational Progress. Washington, DC: American Institutes for Research.
O’Reilly, T., & McNamara, D. S. (2007). Reversing the reverse cohesion effect: Good texts can be better for strategic, high-knowledge readers. Discourse Processes, 43, 121-152.
Ozuru, Y., Briner, S., Best, R., & McNamara, D. S. (2010). Contributions of self-explanation to comprehension of high- and low-cohesion texts. Discourse Processes, 47, 641-667.
Ozuru, Y., Dempsey, K., & McNamara, D. S. (2009). Prior knowledge, reading skill, and text cohesion in the comprehension of science texts. Learning and Instruction, 19, 228-242.
Parker, R. I., Vannest, K. J., Davis, J. L., & Clemens, N. H. (2010).
Defensible progress monitoring data for medium- and high-stakes decisions. Journal of Special Education, XX(X), 1-11.
Pearson, P. D. (1974). The effects of grammatical complexity on children's comprehension, recall, and conception of certain semantic relations. Reading Research Quarterly, 155-192.
Pearson, P. D., & Hamm, D. N. (2005). The assessment of reading comprehension: A review of practices – Past, present, and future. In S. G. Paris & S. A. Stahl (Eds.), Children’s reading comprehension and assessment. Mahwah, NJ: Lawrence Erlbaum.
Pikulski, J. J., & Chard, D. J. (2005). Fluency: Bridge between decoding and reading comprehension. The Reading Teacher, 58(6), 510–519.
Posner, M. I., & Snyder, C. R. R. (1975). Facilitation and inhibition in the processing of signals. In P. M. A. Rabbitt & S. Dornic (Eds.), Attention and performance V (pp. 669-682). New York, NY: Academic Press.
Powell-Smith, K. A., Good, R. H., & Atkins, T. (2010). DIBELS next oral reading fluency readability study (Technical Report No. 7). Eugene, OR: Dynamic Measurement Group.
RAND Reading Study Group. (2002). Reading for understanding: Toward an R&D program in reading comprehension. Santa Monica, CA: RAND.
Risko, V. J., & Walker-Dalhouse, D. (2011). Drawing on text features for reading comprehension and composing. The Reading Teacher, 64, 376-378.
Roberts, R., Good, R., & Corcoran, S. (2005). Story retell: A fluency-based indicator of reading comprehension. School Psychology Quarterly, 20, 304-317.
Saenz, L. M., & Fuchs, L. S. (2002). Examining the reading difficulty of secondary students with learning disabilities: Expository versus narrative text. Remedial and Special Education, 23, 31-41.
Stage, S. A., & Jacobsen, M. D. (2001). Predicting student success on a state-mandated performance-based assessment using oral reading fluency. School Psychology Review, 30, 407-419.
Stanovich, K. E. (1980).
Toward an interactive-compensatory model of individual differences in the development of reading fluency. Reading Research Quarterly, 16, 32-71.
Stanovich, K. E., & West, R. F. (1981). The effect of sentence context on ongoing word recognition: Tests of a two-process theory. Journal of Experimental Psychology: Human Perception and Performance, 7, 658-672.
Tapiero, I. (2007). Situation models and levels of coherence: Toward a definition of comprehension. Mahwah, NJ: Lawrence Erlbaum Associates.
van den Broek, P., & Gustafson, M. (1999). Comprehension and memory for texts: Three generations of reading research. In S. R. Goldman, A. C. Graesser, & P. van den Broek (Eds.), Narrative comprehension, causality, and coherence: Essays in honor of Tom Trabasso (pp. 15-34). Mahwah, NJ: Lawrence Erlbaum Associates.
Vidal-Abarca, E., Martinez, G., & Gilabert, R. (2000). Two procedures to improve instructional text: Effects on memory and learning. Journal of Educational Psychology, 92, 107-116.
Wood, D. E. (2006). Modeling the relationship between oral reading fluency and performance on a statewide reading test. Educational Assessment, 11, 85-104.
Zwaan, R. A. (1996). Processing narrative time shifts. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 386-397.
Zwaan, R. A., & Radvansky, G. A. (1998). Situation models in language comprehension and memory. Psychological Bulletin, 123, 162-185.
Zwaan, R. A., Radvansky, G. A., Hilliard, A. E., & Curiel, J. M. (1998). Constructing multidimensional situation models during reading. Scientific Studies of Reading, 2, 199-220.