EXAMINING THE USE OF VIDEO ANALYSIS ON TEACHER INSTRUCTION AND TEACHER OUTCOMES: A META-ANALYSIS

by

MCKENZIE MELINE

A DISSERTATION

Presented to the Department of Special Education and Clinical Sciences and the Graduate School of the University of Oregon in partial fulfillment of the requirements for the degree of Doctor of Philosophy

June 2020

DISSERTATION APPROVAL PAGE

Student: McKenzie Meline

Title: Examining the Use of Video Analysis on Teacher Instruction and Teacher Outcomes: A Meta-Analysis

This dissertation has been accepted and approved in partial fulfillment of the requirements for the Doctor of Philosophy degree in Special Education and Clinical Sciences by:

Dr. Beth Harn, Chairperson and Advisor
Dr. Elisa Jamgochian, Core Member
Dr. Sylvia Linan-Thompson, Core Member
Dr. Kathleen Strickland-Cohen, Core Member
Dr. Audrey Lucero, Institutional Representative

and Kate Mondloch, Interim Vice Provost and Dean of the Graduate School.

Original approval signatures are on file with the University of Oregon Graduate School.

Degree awarded June 2020

© 2020 McKenzie Meline

DISSERTATION ABSTRACT

McKenzie Meline
Doctor of Philosophy
Special Education and Clinical Sciences
June 2020

Title: Examining the Use of Video Analysis on Teacher Instruction and Teacher Outcomes: A Meta-Analysis

The purpose of this replicated systematic review (SR) and meta-analysis was to examine the literature base of single-case research design studies using video analysis to determine the intervention's effectiveness on teacher outcomes. Using a primary search along with ancestral, citation, and first-author searches, this study evaluated participant, student, and setting characteristics in dissertations and peer-reviewed articles published from 2010-2020. A total of 24 included articles were coded for descriptive analysis and design quality. For the meta-analysis, a total of 16 articles were reviewed for statistical analysis, in which a between-case standardized mean difference was used to calculate effect sizes. Results indicate praise (n = 6) and fidelity of implementation (n = 6) had the largest effect sizes, which continue to define video analysis as a promising practice. Recommendations for future practice include continued studies using video analysis with diverse educators, students, and settings that meet design quality standards, as well as increasing sample sizes to establish the generalizability of video analysis. Addressing these recommendations will support video analysis becoming an evidence-based practice (EBP) for educator development.
CURRICULUM VITAE

NAME OF AUTHOR: McKenzie Meline

GRADUATE AND UNDERGRADUATE SCHOOLS ATTENDED:

University of Oregon, Eugene
California Polytechnic, San Luis Obispo

DEGREES AWARDED:

Doctor of Philosophy, Special Education, 2020, University of Oregon
Master of Arts, Education with specialization in Special Education, 2011, California Polytechnic, San Luis Obispo
Bachelor of Science, Liberal Studies, 2010, California Polytechnic, San Luis Obispo

AREAS OF SPECIAL INTEREST:

Teacher Preparation
English Learners
Education Policy

PROFESSIONAL EXPERIENCE:

Graduate Teaching Assistant, University of Oregon, 2019-2020
Practicum Supervisor, University of Oregon, 2016-2019
Learning Support Coordinator, GEMS World Academy (Singapore), 2014-2016
Special Education and Secondary Teacher, GEMS The World Academy (Saudi Arabia), 2012-2014
English Language Development Teacher, Beolgyo School District (South Korea), 2011-2012

GRANTS, AWARDS, AND HONORS:

Culbertson Scholarship Fund, General Scholarship, University of Oregon, 2019

PUBLICATIONS:

Harn, B., & Meline, M. (2018). Developing critical thinking and reflection in teachers within teacher preparation. In G. Mariano (Ed.), Handbook of Research on Critical Thinking Strategies in Pre-Service Learning Environments (pp. 126-145). IGI Global.

McCroskey, C., Brafford, T., Reardon, K., Meline, M., & Harn, B. (in press). IDEA: History and legal issues. In D. Fisher & L. A. Jung (Eds.), Encyclopedia of Education. New York, NY: Routledge.

Thier, M., Martinez, C. R., Jr., Al-Resheed, F., Storie, S., Sasaki, A., Meline, M., Rochelle, J., Witherspoon, L., & Yim-Dockery, H. Cultural adaptation of promising, evidence-based, and best practices: A scoping literature review. Prevention Science, 21(1), 53-64. doi:10.1007/s11121-019-01042-0

ACKNOWLEDGMENTS

I wish to express my gratitude to my doctoral dissertation committee members, Drs. Elisa Jamgochian, Sylvia Linan-Thompson, Audrey Lucero, and Kathleen Strickland-Cohen, with a special thank you to my advisor, Dr. Beth Harn. This journey would not have been possible without your continuous effort and genuine support.

It's been a true privilege embarking on this journey with the love and support of my friends and family. To my family: thank you, mom, dad, Brandon, Dustin, and Lindsay, for showing unwavering support on this crazy journey and only providing slight judgment whenever I pulled out my computer to do work during family gatherings. Your cheering was heard in Oregon. To my friends, Sophia, Laura, Jesse, Fran, and Zachary, who took me on adventures when I needed them the most: thank you for reminding me that life is too much fun and too precious to spend behind a computer screen. You have reminded me that I am not the sum of this program. To my academic family: thank you for your wisdom and advice that have helped guide me through the program. Specifically, Dr. Angela Ingram, Dr. Kyle Reardon, Tasia Brafford, Stephanie St. Joseph, Aaron Mowery, and Stacy Arbuckle, I appreciate your tireless hours editing this dissertation and coding articles. You all have been reliable in more than one way. And finally, to my dog, Bulka, who has tolerated my absence and kept me company on countless late-night work sessions. You're the best dog! I could continue, but then this section would be longer than my dissertation, and I figured this paper was long enough.

This dissertation is dedicated to my family and friends who believed in me even when I didn't see it myself.
Thanks for being my guiding light through this journey.

TABLE OF CONTENTS

I. INTRODUCTION
    Research Questions
    Literature Review
        Self-Reflective Practices
        Video Analysis
        Application to the Current Study
        Conclusion

II. METHODOLOGY
    Data Collection Process
    Eligibility Criteria
    Coding Variables
    Title and Abstract Review
        Full-Text Review
    WWC Pilot Single-case Design Standards Review
    Data Analysis

III. FINDINGS
    Descriptive Analysis
        Research Question 1
        Research Question 2
    Statistical Analysis
        Research Question 3
    Relation to the Parent Study
        Research Question 4
    Comparison of Study Characteristics

IV. DISCUSSION
    Research Question 1
        Study and Participant Characteristics
        Student and Setting Characteristics
    Research Question 2
    Research Question 3
        ES by Participant Characteristics
        ES by Type of Dependent Variable
    Research Question 4
    Limitations
    Implications for Future Practice
    Conclusion

APPENDICES
    A. Qualtrics Form for Coding Study Characteristics
    B. Qualtrics Form for Coding WWC Design Quality Standards

REFERENCES CITED

LIST OF FIGURES

1. PRISMA Flowchart
2. Forest Plot Displaying BC-SMD ES
3. Forest Plot Displaying BC-SMD ES Based upon DV
4. Forest Plot Displaying BC-SMD ES for Participant Characteristics

LIST OF TABLES

1. Operational Definition of the Coding Variables
2. WWC Pilot Single-case Design Standards Coding Variables
3. IRR Across Phases
4. Study Characteristics of the Included Articles
5. Participant Characteristics of the Included Articles
6. Student and Setting Characteristics of the Included Articles
7. Educator, Student, and Study Characteristics
8. WWC Design Quality Standards Results

CHAPTER I
INTRODUCTION

Evidence-based practices (EBPs) are scientific and empirically-based approaches shown to be effective, efficient ways to produce desired outcomes (Odom, 2009; Odom et al., 2016). Across the field of education, EBPs demonstrate effective strategies targeting a variety of student skills (Harn, 2017). EBPs are essential for maximizing instructional time and improving student outcomes for the most at-risk students. The use of EBPs in school settings has been adopted into federal policy and is endorsed by the Every Student Succeeds Act (ESSA; 2015), which served as a continuation of the Individuals with Disabilities Education Act's (IDEA; 2004) requirement of "utilizing research-based interventions, curriculum, and practices" (§1465(b)(2)(D)) by mandating that academic and behavioral intervention programs targeting at-risk learners be evidence-based. This requirement aims to ensure that intervention programs for the most at-risk learners yield the same results as those established in the research.
Effective, empirically-based programs are also needed when working to improve teacher outcomes, particularly instructional quality (Darling-Hammond, 2017; Yoon et al., 2007). The professional development and preparation educators receive need to be effective and efficient as well as minimize the use of school resources (e.g., time, money, materials, etc.). These trainings should target and aim to improve educator behaviors (e.g., instructional quality, data monitoring, classroom management skills, etc.) and, ultimately, student outcomes. Through the identification of best practices for both teachers and students, educational practices become more intentional, deliberate, and effective, thus increasing the efficacy of instructional practices for our most at-risk learners.

To establish the most effective practices, multiple studies implementing a specific intervention or practice need to be examined for effectiveness by determining consistent, significant, positive outcomes. Clearinghouses, such as the What Works Clearinghouse (WWC), the National Center on Intensive Intervention (NCII), and Evidence for ESSA, are the mechanisms that typically examine numerous studies using the intervention and then label the practice as evidence-based or in need of further research or evidence. Clearinghouses identify EBPs by first evaluating the quality of the research design and then identifying whether or not the intervention has positive, significant outcomes. Replication studies and systematic reviews (SRs) are important pieces of the process for classifying a practice as evidence-based or in need of additional research. Therefore, each of these fundamental investigations and their contributions to the determination of EBPs are discussed below.

Importance of Replicating Studies

One way to identify an EBP is through the replication of studies (Therrien et al., 2016). A replication study is a "study [that] purposefully replicates, extends, further investigates, or uses as its basis one or more previously conducted studies" (Cook et al., 2016, p. 226). Replication studies validate or refute the positive findings of a previous study (Cook, 2014) and determine if findings are generalizable to other participants and/or settings (Schmidt, 2009).

Studies need to be replicated to determine if a practice is effective and can be classified as an EBP. For single-case studies using a small sample size of a minimum of three participants, positive results cannot be extended to make larger claims that a practice is effective and generalizable to other participants or settings. Therefore, the studies must be replicated multiple times to affirm the findings of a parent study (i.e., the study being replicated). For an intervention studied with single-case research designs (SCRDs), a commonly used methodology in special education research, the standards for being considered an EBP include: (a) a minimum of five studies using SCRD published in peer-reviewed journals, (b) a demonstration of a functional relation to show the efficacy of the treatment in each of the five studies, (c) variation across a minimum of three research groups or settings, and (d) documentation of an effect for a total of 20 participants across all studies (Horner et al., 2005; Horner & Kratochwill, 2012); these thresholds are illustrated in the sketch below. Replication studies establish EBPs by using similar procedures that include a different and larger number of participants (Cook et al., 2016).
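Because the 5-3-20 thresholds above are simple counting rules, they can be made concrete with a minimal, purely illustrative Python sketch; the function and field names below are hypothetical and are not drawn from the WWC or Horner et al. (2005).

```python
from dataclasses import dataclass

@dataclass
class ScrdStudy:
    peer_reviewed: bool        # published in a peer-reviewed journal
    functional_relation: bool  # demonstrated a functional relation
    research_group: str        # identifies the research team or setting
    n_participants: int        # participants for whom an effect was documented

def meets_scrd_ebp_criteria(studies: list[ScrdStudy]) -> bool:
    """Apply the 5-3-20 thresholds: at least five qualifying SCRD studies,
    conducted by at least three different research groups, documenting an
    effect for at least 20 participants in total across studies."""
    qualifying = [s for s in studies if s.peer_reviewed and s.functional_relation]
    return (
        len(qualifying) >= 5
        and len({s.research_group for s in qualifying}) >= 3
        and sum(s.n_participants for s in qualifying) >= 20
    )
```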
Unfortunately, replication studies are conducted infrequently because they are not valued as highly as other forms of research (Cook et al., 2016). Very few studies replicate previous studies (Makel & Plucker, 2014; Makel et al., 2012), limiting the ability to confirm whether a given practice is truly effective or generalizable to other settings or participants. For example, Cook et al. (2016) investigated the prevalence of replication studies in special education from 2013-2014 by conducting a literature search across six journals. The investigation resulted in a total of 83 reviewed articles, of which only 9% (n = 26) were identified as replication studies. Of the 26 replication studies, 15 were single-case designs and 11 were group designs. This finding indicates that there is a dearth of replication studies within the special education literature. Replication studies make a significant contribution to the field; however, they need to be viewed as a valuable contribution.

When studies are being replicated, there are often issues with the methodological procedures that should be noted when interpreting findings. These issues tend to include (a) author overlap (Therrien et al., 2016) or (b) inconsistencies within the replicated study (Ioannidis, 2005; Ioannidis, 2012; Pashler & Wagenmakers, 2012). Author overlap occurs when an author of the parent study is a research team member of the replication study, which may result in author bias and the conflation of results. When different authors conduct the replication study, the lack of interference from the parent author minimizes bias and objectively confirms or refutes the previous findings of the parent study (Makel & Plucker, 2014; Makel et al., 2012). Therefore, when conducting replication studies, it is recommended that other authors conduct the replication to avoid author overlap and reduce the possibility of bias (Cook et al., 2016). Additionally, many replication studies are not regarded as high-quality studies according to the What Works Clearinghouse standards (Therrien et al., 2016). The purpose of replication studies is to verify or refute the findings of the parent study or to generalize the results to different settings or populations. If a replication study does not adhere to design quality standards (i.e., it is methodologically weak), the study makes erroneous assumptions regarding the treatment's effects and the generalizability of the practice. This undermines the primary purpose of replication studies, which is to determine if a practice is evidence-based (Therrien et al., 2016). In summary, quality replication studies are essential for the validation of parent studies and should be conducted more often and by different authors to support the identification of EBPs.

Importance of Systematic Reviews (SRs)

Systematic reviews are a fundamental part of the EBP identification process because they provide a consolidated and synthesized review of the literature (Moher et al., 2007). These reviews provide up-to-date information across studies and research groups to give insight on the commonalities of findings regarding a specific topic within the field. SRs typically serve as a starting point for granting agencies, researchers, and even practitioners who want to examine the most recent innovations and the empirical data supporting these practices (Moher et al., 2007). When reviewing SRs, Moher et al.
(2007) discovered that few of the studies (17.7%) reported being an updated version of a previously conducted SR. This indicates that SRs tend not to build upon the current literature and further demonstrates the lack of replication present in the field. Researchers, therefore, typically conduct SRs without extending previous research to include additional, current publications, which leaves out recent trends within the field and presents discontinuous information. By replicating SRs, researchers augment current understandings within the field, validate findings, and/or refute false positives (Zwaan, 2018), which in turn helps push the field of special education forward.

To improve the field of teacher development in special education, the present study replicated Dr. Morin's (2017) dissertation entitled The Use of Video Analysis to Change Special Educators' Instructional Practices: A Single-Case Study and Meta-Analysis. The parent study was a meta-analysis examining the overall and moderator effects of video analysis (VA), including the efficacy of VA based upon educator, student, and setting characteristics. This current study extends Dr. Morin's research by replicating her SR research methods while incorporating the most recent advancements in statistical analysis for meta-analyses using SCRD methodology. Therefore, the current study consists of the following research questions:

1) What is the status of the literature base on VA regarding study characteristics (i.e., publication type), participant characteristics (i.e., role, education level, experience level, age), student characteristics (i.e., disability type, student outcomes), and setting characteristics (i.e., grade level, group size, type of instruction, setting)?

2) What is the status of the literature base on the research design quality for the included articles as measured by the What Works Clearinghouse (WWC) design quality standards (i.e., meets, meets with reservations, does not meet)?

3) What is the magnitude of effect of VA interventions on the instructional practices of educators?

4) How has the literature base on VA changed since 2016 as reported by Morin's (2017) systematic review?

To maintain the continuity of the progress of VA within the education field, this meta-analysis extends the search of the parent study to include the most recent literature from 2016 to 2020 and utilizes the most recent statistical procedures for calculating the treatment effect in studies using SCRD. The proposed SR aims to corroborate and validate the parent meta-analysis findings by searching overlapping years from 2010-2016. To assist in fidelity to the methodological process of the parent SR, the researcher contacted Dr. Morin for greater specificity and details regarding the SR procedures. The parent author is not an active participant in the proposed study so as to avoid author overlap and prevent bias in the current study's findings.

Literature Review

Not only are EBPs important for providing quality instruction to at-risk students, but the training tools used to improve the instructional quality of the educators working with the most at-risk students need to be empirically based as well (Darling-Hammond, 2017; Yoon et al., 2007). The process for training educators must be effective and efficient, use minimal school resources, adhere to time constraints, and demonstrate instructional growth over time.
There are a range of methods to improve instructional practices (e.g., professional development, coaching, consultation, etc.), each of which entails providing performance feedback (PF) to educators. PF is an EBP shown to improve teaching behaviors and instructional quality (Fallon et al., 2015; Scheeler et al., 2004; Sinclair et al., 2019). PF involves a meeting between a consultant (specialist) and a consultee (teacher) to discuss how to improve instructional practices, such as reviewing student data, examining the fidelity of implementation (FOI), and discussing strategies for improvement (Fallon et al., 2015). The traditional way to implement PF requires the consultant and consultee to meet regularly and conduct follow-up observations, which is not time efficient. This makes PF difficult to implement effectively in an authentic classroom setting (e.g., general education, special education, intervention, etc.) due to significant time constraints for teachers and interventionists (Reinke et al., 2007). Not implementing PF properly affects the efficacy of the practice and reduces the significance of instructional outcomes. Therefore, complementary and alternative PF practices that are well-suited for an authentic school setting should be considered.

Self-reflection and VA are two practical practices used to provide PF that address the time constraints of the observer or person providing the feedback while still providing the feedback necessary to improve instructional quality. Self-reflection and VA are considered promising practices, rather than EBPs, due to the limited number of quality studies demonstrating their effectiveness (Morin, Ganz et al., 2019; Morin, Nagro et al., 2019; Nagro & Cornelius, 2013) and the lack of a consistent definition and clear components necessary for implementation (Beauchamp, 2015; Collin et al., 2013). These two promising practices should be considered when trying to improve the instructional practices of educators in an authentic educational setting because both can be implemented despite the complexities encountered in a classroom. A brief review of the research on each strategy is discussed next.

Self-Reflective Practices

Self-reflection is the careful consideration of one's actions to make decisions that inform future practice (Dewey, 1933). Self-reflection is a process that leads to deeper thinking, analysis, and understanding of one's actions and the impact they have on others. To further understand how to apply self-reflection to a school setting, this section includes a description of self-reflection, its utility and implementation, and the limitations of this practice.

Description of Self-Reflection

Within education, self-reflection is comprised of four hierarchical levels of reflective practice: (a) describing the teaching; (b) analyzing the choices and behaviors; (c) judging the outcomes and the instructional decisions made; and (d) applying the analysis to future practice (Nagro et al., 2017; deBettencourt & Nagro, 2018). Teachers advance through each of these steps, but to achieve the higher levels of reflective thinking, support needs to be scaffolded. This form of additional support is usually provided by personnel (e.g., a specialist, consultant, coach, supervisor, etc.), who help guide teachers through the reflective process. Teachers are eventually able to effectively and efficiently reflect on their own once they are provided with multiple practice opportunities and feedback.
The tools needed to support teachers through this process are discussed further in the following sections.

Utility and Implementation of Self-Reflection

Reflection is a common practice used in teacher preparation programs for preservice teachers or as a professional development strategy to enhance the skills of inservice teachers and paraprofessionals (Benedict et al., 2016; Harn & Meline, 2019). Within these settings, journaling, lesson studies, case-based instruction, and discussions are common tools for reflection. Journals are written reflections used to capture the teacher's thoughts and feelings about his or her instruction (Etscheidt et al., 2012). Lesson studies, which were originally intended as a professional development tool for inservice teachers, are now being used in teacher preparation programs. The lesson study process consists of five steps: "(a) preparation, (b) collaborative planning, (c) teaching the lesson, (d) observation and data collection, and (e) debriefing and data analysis" (Roberts et al., 2018, p. 238). The teachers work together to determine the lesson objective, identify student goals, and create a lesson. This lesson is then taught to the students while the rest of the team observes, takes notes on the interactions, and monitors the achievement of the lesson goals. Afterwards, the team debriefs the lesson and, in some cases, may choose to revise and reteach it (Fernandez, 2002).

Case-based instruction uses example narratives (e.g., vignettes, protocols, etc.) to create a scenario focused on a specific classroom problem that the teacher analyzes, helping to build a connection between theory and practice (Kagan, 1993). Case studies are versatile, reflective tools that can be used in various content areas and with both inservice and preservice teachers to highlight a diverse range of problematic situations that may be encountered in a classroom setting. Discussions, a common component of many reflective practices, invite professionals to come together to talk about a common topic or challenge. Discussions afford an opportunity for teachers with various perspectives to explore and share ideas that can be put into practice (Borko et al., 2008). These various self-reflective activities demonstrate the versatility, utility, and feasibility of self-reflection for all educators.

Limitations of Self-Reflection

Although reflection minimizes the time required of an observer or supervisor, some of these self-reflective practices are time consuming for the educators. For example, lesson studies take three to four weeks to complete, typically including 10-15 hours of team meetings. This is a substantial time commitment for teachers, who do not have the excess time necessary for the proper implementation of lesson studies, thereby limiting the utility of this practice (Fernandez, 2002). Sims and Walsh (2009) implemented lesson studies across two years as part of a teacher preparation program for early intervention preservice teachers. The preservice teachers were required to analyze lessons, participate in classroom discussions, and think critically about their research lesson. The primary goal of the program was for preservice teachers to focus on the jointly designed lesson, commonly referred to as a research lesson, and not to critique the teacher's instructional behaviors.
The preservice teachers met as a group to collaboratively design a research lesson, which one preservice teacher delivered to his/her class. The group then reconvened to review how the lesson went and designed a revised lesson that addressed the challenges encountered in the first delivery. The other preservice teachers in the group then taught the revised lesson to their classes. The design of this lesson study proved to be challenging because of the lack of classroom time needed to develop a completed research lesson. As a result, preservice teachers were required to finish the research lesson outside of university class time, and it was no longer a collaborative process. This caused the group conversations to revolve around the teacher's delivery of the lesson and the presentation of the adapted research lesson rather than the examination of the lesson objectives and student interactions. Furthermore, the probing questions asked in the discussion groups were broad, and the preservice teachers' responses were regarded as superficial, demonstrating that additional guidance is needed to properly self-reflect (Sims & Walsh, 2009).

When considering the implementation of reflection, it is important to select the reflective practice that aligns with the needs of the educators and fits the school context, taking into account resource and time constraints. Even if reflective practices are practical for the school context, a teacher's reflective capabilities need to be considered because they can impact and consume school resources. Reflection is not an inherent trait of educators, and studies show educators experience difficulties when self-reflecting on their own (Sims & Walsh, 2009; Tracz et al., 2005; van Es & Sherin, 2002). Without direct instruction on how to reflect, reflective activities (e.g., journaling, lesson studies, case-based instruction, discussions, etc.) are ineffective and require frequent opportunities to practice (Nagro & deBettencourt, 2018). Therefore, teachers need additional support and multiple opportunities to practice and learn how to reflect better. This takes time and additional resources upfront until the teacher is able to reflect independently.

For example, Spalding and Wilson (2002), as part of a teacher preparation program, had preservice teachers submit journal entries reflecting on their experiences in their practicum sites. The preservice teachers' initial journal entries focused on a descriptive analysis of what happened in the classroom, which is the most rudimentary level of reflection (Nagro et al., 2017; deBettencourt & Nagro, 2018). The preservice teachers had to learn how to progress from descriptive reflective practices to higher-level critical thinking that extended beyond stating what one is doing and instead involved reflection on one's actions. To achieve this, the researchers explicitly taught the preservice teachers how to self-reflect by having them review the narratives of other teachers and identify reflective components. The preservice teachers were then able to transfer these reflective components into their own journal entries. Additionally, the preservice teachers reported that instructor feedback helped develop their reflective practices.
When following up with these teachers in their second year of teaching, they continued to use self-reflection to enhance their instructional practices (Spalding & Wilson, 2002), indicating that self-reflection is practical for teachers and transforms them into lifelong learners (Harn & Meline, 2019; Tripp & Rich, 2012). In order to sustain the efficacy of self-reflection as it is applied in the field to improve instructional quality, reflection needs to be modeled and taught, and it requires additional feedback from others in order to reach higher levels of self-reflection.

Finally, self-reflection is considered a promising practice, not an EBP (Beauchamp, 2015; Collin et al., 2013). The effectiveness of self-reflection needs to be further examined for its impact on student and teacher outcomes. In one instance, Richards et al. (2012) had physical education (PE) preservice teachers write a case study based upon their experiences in the classroom. The case study topics included classroom management, adapted PE, collaboration, ethical decisions, and other common issues experienced by beginning PE teachers. As part of the coursework, the preservice teachers had peers and instructors provide feedback, which encouraged the preservice teachers to deepen their reflective practices. In the end, the preservice teachers gained a deeper understanding of how to resolve issues they might encounter in the school setting because they were provided with different perspectives on how to overcome challenges. The process encouraged the preservice teachers to think critically and reflect on the complexities of teaching (Richards et al., 2012). Regrettably, the study neglected to measure the change in instructional skills and its impact on student outcomes. This is a common issue for studies using self-reflection, which indicates a greater need for an SR to examine the effectiveness of reflective practices within the literature base. To be considered an EBP, self-reflection needs a consistent definition (Beauchamp, 2015; Collin et al., 2013) and must demonstrate its efficacy through analysis of educator and student outcomes.

In summary, there are various types of reflection (i.e., journaling, lesson studies, case-based instruction, and discussions) that can be implemented in diverse classroom settings, and self-reflection is a promising practice that has demonstrated positive results (Borko et al., 2008; Richards et al., 2012; Spalding & Wilson, 2002). Although teachers report that they like the practice of self-reflection, it is evident that the ability to effectively self-reflect is not an inherent skill (Tracz et al., 2005; van Es & Sherin, 2002). Studies using self-reflection found preservice teachers provided basic descriptions of their teaching behaviors without using more sophisticated skills to analyze, judge, and apply changes to their teaching (deBettencourt & Nagro, 2018; Richards et al., 2012; Spalding & Wilson, 2002; Sims & Walsh, 2009). Educators need to be explicitly taught how to be self-reflective through scaffolded procedures to develop higher-order reflective skills. This reflects the need for specialized personnel (e.g., consultants, coaches, etc.) and additional time for teachers to work continuously with the specialist to promote effective self-reflective practices.
However, with the use of a rubric or framework to guide reflection, self-reflection could develop beyond the descriptive level and progress toward higher reflective practices, such as judgment of teaching practices and application of changes to future practice (deBettencourt & Nagro, 2018). Finally, to better understand the effectiveness of self-reflection, more research needs to be conducted to demonstrate that teachers who engage in these practices show improvements associated with teacher and student outcomes (Harn & Meline, 2019).

Video Analysis (VA)

Video analysis is a promising practice commonly used for developing teacher skills and reflective practice (Morin, Ganz et al., 2019; Morin, Nagro et al., 2019; Nagro & Cornelius, 2013) by incorporating video technology that gives teachers the ability to review their own instruction or the instruction of their peers. This provides teachers the opportunity to think critically about the interaction between their teaching behaviors and students (Tripp & Rich, 2012). To further understand how to implement VA, this section provides a description of VA, its utility and implementation, a comparison to self-reflection, and the limitations of VA.

Description of VA

VA consists of three main components: video recording, video review, and analysis of teaching strategies (Mosley Wetzel et al., 2017). The teachers record a lesson, watch the video footage, and analyze teaching behaviors through reflection or discussion about what they observed. Because of the use of video, teachers view actions they may have forgotten, are made aware of their behaviors, and/or notice students' responses (Knight et al., 2012; Sherin & van Es, 2005). This gives teachers a more complete and accurate perception of the classroom instruction and student interactions (Nagro & Cornelius, 2013; Rich & Hannafin, 2009). Additionally, teachers can replay the footage and review an event multiple times (Tripp & Rich, 2012) to promote a more objective view of teaching behaviors. These specific teaching behaviors can be referenced and used as examples to validate the feedback being provided. Finally, since VA requires the analysis of teaching behaviors through reflective practices and is used in tandem with reflective thinking, educators need to be supported through the VA process to reach the higher levels of reflection.

Utility and Implementation of VA

VA is a versatile reflective tool that can be implemented in a variety of ways in terms of (a) the reflection process, (b) the content being viewed, and (c) who provides feedback. VA is used simultaneously with self-reflection. Self-reflection can be incorporated into VA through video editing, video annotation, or other self-reflection practices (Osipova et al., 2011). Video editing involves video recording a teacher providing instruction and editing the footage to highlight key incidents (Calandra et al., 2008). Video annotation gives teachers the ability to self-reflect on their teaching by adding a caption to a video segment (Rich & Hannafin, 2009). This provides video evidence by linking the commentary with the teaching behavior. The third type of VA is video self-reflection, which involves teachers analyzing and making connections about their teaching behaviors while reviewing classroom footage (Nagro & Cornelius, 2013). Additionally, the educator can view a variety of video footage content: published videos, peer videos, and personal videos (Zhang et al., 2011).
Published videos, or video cases, are targeted videos that show the teaching environment, student behaviors, and instructional content occurring in an authentic classroom setting. Published videos allow the observer to pinpoint potential areas of concern or to view a variety of teaching behaviors, providing the opportunity to make decisions prior to entering the classroom (Olson et al., 2016; Zhang et al., 2011). Peer videos are videos of colleagues providing instruction and give the observer the opportunity to learn new teaching techniques (Zhang et al., 2011). Personal videos act as a mirror as the observers view themselves providing instruction, which gives them the advantage of seeing things that may not have been noticed while teaching the class (Zhang et al., 2011).

Finally, VA can also differ in who observes and provides feedback on the instruction. Feedback can be provided in a group setting (Tripp & Rich, 2012; Hong & Van Riper, 2016; McDuffie et al., 2014), by an expert reviewer (Weber et al., 2018; Lee & Wu, 2006), or by the individuals themselves (Nagro et al., 2017). These variations and this flexibility in the implementation of VA allow for a range of possibilities for how to use the practice and make it feasible for complex classroom settings.

VA has also been shown to be an effective, promising practice for both preservice and inservice teachers across various grade levels and content areas (Morin, Ganz et al., 2019). As part of a yearlong professional development seminar, Sherin and van Es (2005) had mathematics teachers attend monthly meetings where they watched video clips of each other's classroom teaching. The facilitator drove the group dialogue by prompting teachers with open-ended discussion questions. The researchers recorded, transcribed, and analyzed the group sessions. The results indicated the teachers' focus throughout the course shifted from teacher behaviors and pedagogy to student thinking. This growth in reflective skills demonstrates the teachers' ability to improve their reflective practices over time by using video and guided discussion questions.

Comparison to Reflective Practices

Since VA includes a reflective component, VA is ineffective unless teachers are guided through the process to reach the higher-level self-reflective practices. For instance, van Es and Sherin (2002) had six preservice teachers write essays before and after watching their instruction. The preservice teachers participated in three one-hour-long Video Analysis Support Tool (VAST) sessions where each teacher reviewed video from their own classroom as well as their peers' classrooms. After each video, the preservice teachers were provided with open-ended prompts focusing on student thinking, teachers' roles, and classroom discourse. The preintervention essays were descriptive in content and only discussed what was occurring in the classroom. After the intervention, the preservice teachers' essays included a deeper analysis of the classroom events. Through
To support self-reflective practices, VA extends beyond the traditional methods of reflection that rely on memory (Nagro, 2020) by providing a more accurate and objective descriptions of the teacher and student behaviors (Nagro & Cornelius, 2013; Rich & Hannafin, 2009). Robinson and Kelley (2007) compared standalone reflective practices and value-added reflective practices in combination with VA. Preservice teachers practiced role-playing teaching interactions and reflected on their performance. The preservice teachers that engaged in VA acquired higher levels of reflection in comparison to the control group which only reflected on their performance. Tripp and Rich (2012) further confirm VA’s contribution to reflection and change in instructional practices. When interviewing teachers who participated in VA and peer feedback, the teachers reported VA (a) provided them an opportunity to view their teaching from an alternative perspective, (b) gave them greater confidence in the feedback provided, (c) inspired teachers to take action and be held accountable for making a change in their instructional behavior, (d) made them more inclined to implement the recommended changes, and (e) teachers perceived that they had improved their instructional skills. Overall, VA complements reflective practice by providing an accurate portrayal of the instruction that leads to improved instructional growth and teacher development. Limitations of VA 18 Some of the major limitations of VA are the amount of time required for teachers participating in VA, the scaffolding of activities for novice users of VA, concerns about the generalizability of VA, and its relation to student outcomes. As previously discussed, VA is a versatile practice that can vary in terms of self-reflective components, video content, and who provides feedback which contributes to its use in complex classroom settings. However, educators need to learn how to properly conduct VA. To effectively observe their instruction, novice users of VA need guidance on what to observe to help them properly reflect on their instruction. Hong and Van Riper (2016) found that using instructional videos where the teacher viewed new instructional practices modeled in an authentic classroom setting was a useful professional development tool for inservice teachers and paraprofessionals. This process helped teachers in reviewing the videos critically, and the teachers discovered new instructional strategies. Regrettably, there was no indication that these skills were applied to their own classroom setting or impacted student outcomes. As a result, studies examining VA as a behavior change strategy should also examine its generalizability and its relation to student outcomes (Morin, Nagro et al., 2019). Summarizing VA VA is an effective behavior change strategy (Sherin & van Es, 2005) and, with recent technological advancements, has become a feasible practice within the classroom (Morin, Ganz et al., 2019; Morin, Nagro et al., 2019, Nagro et al., 2017) and can be applied to a diverse range of instructional behaviors for teachers. Through the use of video technology, the teacher or supervisor reviews the classroom instruction at any time resulting in less time and fewer resources spent on the actual observation. Instead, more 19 time and resources are allocated to essential components of an observation such as reviewing the video and providing feedback (Weber, et al., 2018). 
Given that self-reflection is used simultaneously with VA, VA is ineffective unless interventionists or teachers are guided through the process. Teachers need to be told which instructional items to observe to make the practice more efficacious. Therefore, studies with guided VA, which use a rubric or observation tool to support reflection, are more effective in improving reflective practices and use of instructional strategies for inservice and preservice teachers (Nagro, et al. 2017). Although VA is a supported and liked practice by teachers (Tracz et al., 2005), there is a limited amount of studies conducted that empirically support explore the efficacy of the practice (Nagro & Cornelius, 2013) and its impact on student outcomes (Morin, Nagro et al., 2019). There is also a dearth of research measuring VA’s impact on teacher effectiveness which prohibits it from being classified as an EBP (Morin, Ganz et al., 2019; Morin, Nagro et al., 2019; Nagro & Cornelius, 2013). Therefore, the current study intends to extend the field of research by examining the impact of VA on teacher instructional quality as measured by teacher outcomes (e.g., behavior specific praise, FOI, opportunities to respond, etc.). Application of VA to the Current Study The current study examines studies that have used VA as a form of self-reflection and performance feedback in the hopes of providing evidence of VA as an EBP. This study builds upon Morin’s (2017) SR and meta-analysis that examined the effect sizes of SCRD using VA as a treatment as well as examining study characteristics (i.e., participants, students, and setting). The current study extends Morin’s (2017) study by 20 employing the most recent research-validated meta-analysis practices to calculate study effect sizes using between-case standardized mean difference (BC-SMD) in addition to incorporating the most recent literature base from 2016-2020. Another component of the current study is to examine the relation between VA and teacher outcomes. Most recent VA studies are qualitative in nature and examine the educators’ perspective about the feasibility and utility of VA (Hong & Van Riper, 2016; Mosley Wetzel et al., 2017; Trip & Rich, 2012), so the current study examines the impact of VA on various teacher outcomes (e.g., praise, fidelity of implementation, opportunities to respond, etc.). Furthermore, to be considered an EBP, VA needs to be included in more studies that meet quality design standards, so the methodological rigor is also analyzed in the current study. Conclusion In conclusion, schools need effective and efficient ways to monitor and promote instructional quality, especially for teachers providing interventions to our most at-risk students. Given that VA is a practice that incorporates self-reflection, the focus of this current study is the use of VA as teacher-preferred practices to improve instructional skills. As an teacher development tool, VA has strengths (e.g., feasibility, less observer time, etc.) and weaknesses (e.g., time allocation, scaffolding of reflective practice, etc.), but VA requires further examination to demonstrate its effectiveness in improving student learning as well as identifying specific methods to make it more readily used in schools. This study examines the literature base involving VA and its impact on teacher outcomes. 
CHAPTER II
METHODOLOGY

Meta-analyses enhance systematic literature reviews by synthesizing the findings across multiple studies using the same intervention treatment (Borenstein et al., 2009). This provides greater statistical power for evaluating the intervention package than the analysis of an individual study. Additionally, it provides more robust information about the generalizability of the intervention across settings, participants, students, and other variables (Tanner-Smith et al., 2016). The purpose of this SR replication and meta-analysis is to gather and systematically review SCRD studies using VA as a treatment to improve educator teaching behaviors. The studies meeting the inclusionary criteria were descriptively analyzed for study characteristics (i.e., design, participant, student, and setting characteristics) and statistically analyzed for treatment effectiveness.

The present study replicated the SR procedures used in a recent meta-analysis: The Use of Video Analysis to Change Special Educators' Instructional Practices: A Single-Case Study and Meta-Analysis (Morin, 2017). Replication studies consist of two different types of replications: direct replications and conceptual replications. A direct replication is the recreation of the core components of a parent study (Schmidt, 2009; Zwaan et al., 2018). For the purpose of this study, the SR of the literature base was a direct replication of Morin's (2017) dissertation; the current study adheres to the exact SR methods of the parent study, discussed later. A conceptual replication deviates from the parent study by intentionally altering a component (Makel & Plucker, 2014; Makel et al., 2012; Schmidt, 2009) and possibly the effect size (Zwaan et al., 2018). Due to statistical analysis advancements for meta-analyses that analyze effect sizes for SCRD, the current study departs from the parent study by calculating BC-SMD effect sizes rather than the parent study's Tau-U average effect size.

Tau-U is a non-parametric effect size that combines the non-overlapping data points and trend between the baseline and intervention phases to determine the percentage of non-overlapping data between two phases (Parker et al., 2011). Although preferred over other non-parametric measures because it (a) includes the use of all data points, (b) controls for baseline trend, (c) uses simplified calculation procedures that include trend and non-overlap between phases, (d) is sensitive, and (e) has greater statistical power than other single-case analysis methods (Parker et al., 2011), Tau-U is only intended for an individual participant, and it is inappropriate to use a Tau-U estimated effect size when calculating an overall study omnibus effect size due to multiple dependent variables (DVs) being measured within one study. Recent advancements have enabled an overall estimated effect size to be calculated across participants within a study that measure the same DV using a BC-SMD (Hedges et al., 2012; 2013). Therefore, the statistical analysis portion of this dissertation is a conceptual replication of Morin's (2017) study: instead of a Tau-U average effect size, the present study reports multiple BC-SMD estimated effect sizes per study.
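To make the contrast between these two metrics concrete, the following is a simplified sketch, in LaTeX notation, of the quantities involved. The symbols are illustrative and follow the general form of the models in Parker et al. (2011) and Hedges et al. (2012); they are not the exact estimation equations applied later in this study. For a single case with \(n_A\) baseline and \(n_B\) intervention observations, the nonoverlap component of Tau-U is

\[ \text{Tau} = \frac{n_{\text{pos}} - n_{\text{neg}}}{n_A \, n_B}, \]

where every baseline point is compared with every intervention point, and \(n_{\text{pos}}\) and \(n_{\text{neg}}\) count the pairs in the improving and deteriorating directions; Tau-U further subtracts pairwise baseline trend from the numerator. Because this quantity is defined per case, it does not aggregate naturally across cases or DVs. The BC-SMD instead begins from a simple two-level model for case \(i\) at session \(j\),

\[ y_{ij} = \beta_0 + \beta_1 T_{ij} + u_i + e_{ij}, \qquad u_i \sim N(0, \tau^2), \quad e_{ij} \sim N(0, \sigma^2), \]

where \(T_{ij}\) indicates the intervention phase, and standardizes the treatment effect by the total between-case plus within-case variation:

\[ \delta = \frac{\beta_1}{\sqrt{\tau^2 + \sigma^2}}. \]

This denominator is what places SCRD results on a scale comparable to a between-groups standardized mean difference.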
This chapter introduces the SR and meta-analysis methodology used in this study to investigate the following research questions:

1) What is the status of the literature base on VA regarding study characteristics (i.e., publication type), participant characteristics (i.e., role, education level, experience level, age), student characteristics (i.e., disability type, student outcomes), and setting characteristics (i.e., grade level, group size, type of instruction, setting)?

2) What is the status of the literature base on the research design quality for the included articles as measured by the What Works Clearinghouse (WWC) design quality standards (i.e., meets, meets with reservations, does not meet)?

3) What is the magnitude of effect of VA interventions on the instructional practices of educators?

4) How has the literature base on VA changed since 2016 as reported by Morin's (2017) systematic review?

This section discusses the (a) data collection process, (b) eligibility criteria, (c) coding variables, and (d) data analysis for the meta-analysis component of the SR. To further authenticate the methods of the replicated SR, the researcher reached out to the primary investigator of the parent meta-analysis for additional support and clarification.

Data Collection Process

With SRs, there is variation in the search and data collection procedures that results in inconsistencies across SR studies, which can decrease the quality of, and confidence in, the results (Moher et al., 2007). To maintain the study's integrity, the proposed meta-analysis follows the standards and procedures outlined by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA; Moher et al., 2009). The PRISMA guidelines require a flowchart to report the number of documents included and excluded at each phase of the study (see Figure 1). The study also replicates Morin's (2017) SR methods, with minor adjustments made to the article collection part of the study. These adjustments were made due to the limited access to resources that the parent study author used. The direct replication of the SR portion of the meta-analysis required that multiple types of searches be conducted to collect relevant articles using VA as a treatment. The researcher conducted two types of searches: (a) a primary search of research databases using predetermined search terms and (b) a forward and backward search that included ancestral, citation, and first-author searches to gather any additional documents that were not identified by the research databases in the primary search.

Eligibility Criteria

Once the documents were gathered, they needed to be reviewed to determine if they met the eligibility criteria to be included in the study. In this section, the eligibility criteria used to identify articles included in the SR are discussed.

Inclusionary/Exclusionary Criteria

When conducting an SR, the Cochrane Collaboration, an organization that establishes protocols for conducting SRs and publishes SRs, recommends using the Participants, Intervention, Comparison, Outcomes, and Study design (PICOS) acronym when determining the inclusionary criteria (Methley et al., 2014). The inclusionary criteria used in this study were predefined by the parent study but also met the PICOS guidelines for selection criteria.
To be included in the study, the document needed to (a) use single-case research methodology, (b) have a minimum of one educator (a teacher or preservice teacher) as a participant, (c) take place in an early intervention to grade 12 setting, (d) require the analysis of the preservice teacher's or teacher's video, (e) use an evaluation or feedback component, (f) have comparative data (e.g., pre/posttest, baseline/intervention phases, graphs with data points, etc.), (g) measure a dependent variable related to teacher outcomes, (h) have behavioral or observable teacher outcomes, (i) be conducted in the United States and in English, and (j) be published as a peer-reviewed journal article or dissertation. If some of the criteria were not evident in the title and/or abstract, the researcher and coders did not code those criteria, in order to adhere to an objective coding process. For example, if the title and abstract did not specifically state that the study included SCRD methodology but met the other criteria (e.g., used video analysis, included an educator, had observable teacher outcomes, etc.), the coder was instructed not to code the methodology and to move the article to full-text review to determine the type of methodology used.

Documents were excluded if the study (a) used qualitative or quantitative methods, (b) did not have a minimum of one teacher or preservice teacher as a participant, (c) had professionals working in non-school based facilities (e.g., home, clinical setting, direct care facilities), (d) included videos of other professionals (e.g., exemplar videos of other people), (e) lacked an evaluation or feedback component, (f) had no comparative data, (g) had no dependent variable related to teacher outcomes, (h) included unobservable or non-behavioral outcomes (e.g., surveys, reflections, ability to reflect, content knowledge tests, etc.), (i) was conducted in another language or not in the United States, or (j) was a review and/or discussion article.

Coding Variables

In alignment with the parent study, the selected documents were coded for the following study characteristics: (a) type of study design (e.g., multiple probe, multiple baseline, reversal, AB designs, etc.), (b) publication type (i.e., dissertation or peer-reviewed article), and (c) design quality (i.e., meets WWC design quality standards, meets WWC design quality standards with reservations, or does not meet WWC design quality standards). The documents were examined for the following participant variables: (a) role (i.e., inservice; preservice; paraprofessional; other; not reported), (b) education level (i.e., high school/general education development diploma; some college, associate's degree, or specialized training; completed Bachelor's degree; Master's degree; not reported), (c) experience level (i.e., 0 years; 1-2 years; 3+ years; not reported), and (d) age (i.e., 18-29 years; 30-39 years; 40-49 years; 50 years and over; not reported).
The researcher coded the documents for the following student and setting characteristics: (a) group size (i.e., one-to-one; small group; large group; other; not reported), (b) type of instruction (i.e., academic; communication or language; life skills; other; not reported), (c) grade level (i.e., preschool; elementary; middle school; high school; not reported), (d) setting (i.e., self-contained; inclusion; resource classroom; general education; other; not reported), and (e) disability (i.e., developmental disability; physical disability; mental disability; emotional or behavioral disorders; learning disabilities; cognitive disabilities; other; not reported). Table 1.1 operationally defines each of these variables. All variables were coded across the included studies with the option of "not reported" for each variable.

Primary Search

The primary search was conducted on May 4-5, 2020 to identify peer-reviewed articles and dissertations completed between 2010 and 2020. The following research databases were systematically searched: (a) ERIC (n = 4,106), (b) APA PsycNET (n = 1,968), (c) Teacher Reference Center (n = 817), and (d) Academic Search Premier (n = 2,954). The parent meta-analysis also included Education Source and Education Full Text; however, because the researcher did not have access to these databases, they were not included in this study. Additionally, the parent study used PsycInfo, PsycArticles, and Academic Search Complete, which were substituted with similar databases (i.e., APA PsycNET and Academic Search Premier) that the researcher had access to.

Search terms were entered into these databases to find articles relevant to the study. For the primary search, the parent study used a combination of terms from three word sets: (a) educator terms (i.e., teacher*, "teach* assistant*", paraprofessional*, and "instructional assistant*"), (b) video*, and (c) components of VA terms (i.e., analy*, evaluat*, reflect*, and feedback*). In the parent study, combinations of the terms from each set were searched systematically within the database using one search bar. For example, in ERIC, the first search included the terms teacher* AND video* AND analy* in one search bar. The next search in ERIC included teacher* AND video* AND evaluat* in one search bar, and so forth. Under the guidance of the University of Oregon education librarian, the present study modified the search terms and procedures of the parent study to decrease the number of irrelevant hits and minimize duplicate articles. For the present study, a Boolean search containing all of the terms from the three sets was conducted using individual search bars (an illustrative rendering of the combined query follows below). The terms for different types of educators (i.e., teacher OR paraeducator OR "teacher assistant" OR "instructional assistant") were searched in the first search bar. The term "video" was searched in the second search bar. Then, the type of analysis (i.e., analy* OR evaluat* OR reflect* OR feedback) was searched in the third search bar. After the documents were collected from the educational databases, the identified documents were transferred to Zotero, a reference management software, where duplicate files were removed (n = 866). Then, these documents were uploaded to Covidence (covidence.org), an online systematic review management system, where articles were compared across databases and additional duplicate articles were excluded (n = 2,339). Within Covidence, the researcher organized and managed the coding procedures across the coding team.
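Concretely, the three search fields combine into a single Boolean expression of roughly the following form. Field syntax varies by database, so this is an illustrative rendering of the strategy described above rather than the literal string entered into any one interface:

(teacher OR paraeducator OR "teacher assistant" OR "instructional assistant")
  AND video
  AND (analy* OR evaluat* OR reflect* OR feedback)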
A final total of 6,640 articles were identified for the title and abstract review (see Figure 1.1).

Ancestral, Citation, and First Author Search

An SR is a collection of relevant research that is synthesized and analyzed (Cooper et al., 2019; Levy & Ellis, 2006) and, therefore, must include a population of research studies that meet the inclusionary criteria. To do this, the search needs to extend beyond the parameters of a reference database search to ensure that the SR includes all of the relevant articles possible. Therefore, a backward and a forward search can be utilized to retrieve articles outside of the reference databases. A backward search involves reviewing the published articles that precede the original article. This type of search includes a backward author search, a backward reference search, and a previously used keywords search (Levy & Ellis, 2006). A forward search looks for publications that appeared after the original article. This includes a forward reference search, in which articles citing the original article are identified, and a forward author search, in which articles published by the same author as the original article are sought. Conducting these types of searches expands the search process by identifying articles outside of the reference databases and other electronic sources (Levy & Ellis, 2006).

Table 1.1. Operational Definitions of the Coding Variables

Role
- Inservice teacher: An inservice teacher is a lead classroom teacher or the primary individual responsible for delivering instruction. An inservice teacher may also be referred to as the lead teacher, special education teacher, general education teacher, credentialed teacher, teacher in-charge, etc.
- Paraprofessional: A paraprofessional provides support to students and is supervised by a credentialed or lead teacher. A paraprofessional may also be referred to as an aide, educational assistant, instructional aide, 1:1 aide, etc.
- Preservice teacher: A preservice teacher is an individual currently enrolled in a teacher preparation program. A preservice teacher may also be referred to as a teacher candidate.

Group Size
- One-to-one: One-to-one group size is the ratio of one student to one educator (i.e., inservice teacher, paraprofessional, or preservice teacher).
- Small group: A small group is a subset of students from a larger group who receive instruction. A small group could include centers, reading groups, etc.
- Large group: A large group is all students in a classroom who receive instruction at the same time. A large group could include whole group reading instruction, morning circle time, etc.

Type of Instruction
- Academics: Academic skills are tools students need to complete intellectual tasks. Academic skills focus on math, reading, language arts, science, writing, etc. Within each of these categories, there is a subset of skills. For example, reading could include phonics, fluency, reading comprehension, etc.
- Communication: Communication skills are tools students need to be able to relay information. Communication skills may include asking questions, making requests, using AAC, responding to questions, etc.
- Life skills: Life skills are tools students need to accomplish tasks in their daily lives. Life skills include toileting, cooking, grocery shopping, dressing, eating, hygiene, etc.

Grade Level
- Preschool: Preschool includes students who are younger than 6 years of age OR are in grades K and below.
- Elementary school: Elementary school includes students who are less than 12 years of age OR are in grades 1-5.
- Middle school: Middle school includes students who are less than 14 years of age OR are in grades 6-8.
- High school: High school includes students who are 14 years of age or older OR are in grades 9-12.

Setting
- General education: General education is the typical classroom. General education is determined if none of the students in the class had a disability or if there is no mention of students with a disability.
- Self-contained: A self-contained classroom is where students with a disability spend all or a majority of their school time. A self-contained classroom includes a special education classroom, separate school, or specialized school for students with disabilities.
- Resource: A resource classroom is where students with disabilities spend some of their time in a separate classroom receiving instruction. Students in this setting also spend time in a general education classroom setting.
- Inclusion: An inclusion setting is a classroom with students with and without disabilities receiving instruction.

Student Disability
- Developmental disability: A developmental disability is a disability that is present before adulthood. Developmental disabilities include autism spectrum disorder, intellectual disability, Down syndrome, or other developmental disorders.
- Physical disability: A physical disability is a condition that impairs mobility. A physical disability may include cerebral palsy.
- Mental disability: A mental disability is a condition that affects emotions, thinking, and/or behavior. A mental disability may include anxiety disorder, conduct disorder, bipolar disorder, depression, schizophrenia, and attention deficit hyperactivity disorder.
- Emotional or behavioral disability: An emotional or behavioral disability interferes with a person's ability to sustain relationships and results in frequent use of inappropriate behavior. An emotional or behavioral disability may include oppositional defiant disorder.
- Learning disability: A learning disability is a condition that impairs a student from acquiring a skill or knowledge at a rate similar to same-aged peers. A learning disability may include dyslexia or a specific learning disability.
- Cognitive disability: A cognitive disability impairs mental functioning. A cognitive disability may include a brain injury or cognitive impairment.
- Other: Other disabilities may include multiple disabilities or other health impairments.
- Disability not reported: Disabilities not reported may include a developmental delay (e.g., fine motor, literacy, language, cognitive, etc.), general challenging behavior, or no disability identified.

Note. Education level, experience level, and age are concrete descriptions and, therefore, are not included in the table.

Following the primary search of the databases, the researcher conducted a backward and forward search, which included an ancestral, citation, and first author search, to identify any additional documents that may have been omitted. An ancestral search examines the reference lists of the included articles to locate potential articles that may meet the eligibility criteria (Levy & Ellis, 2006). For the present study, the researcher examined the reference lists of the included articles (n = 1,494). Articles that did not meet the 2010-2020 year and publication type inclusionary criteria were immediately excluded (n = 1,313).
After duplicates were removed (n = 45), 136 articles were included for review (see Figure 1.1).

A citation search identifies sources that referenced the original article (Cooper et al., 2019; Levy & Ellis, 2006). Google Scholar was used to find articles that cited the original included articles by using the "cited by" feature (n = 544). Articles that did not meet the 2010-2020 year and publication type inclusionary criteria were immediately excluded (n = 44). After duplicates were removed (n = 120), 380 articles were included for review (see Figure 1.1).

Finally, the researcher completed a first author search (Levy & Ellis, 2006). In the parent study, Morin (2017) used Scopus, an online abstract and citation database, to identify additional articles written by the first author; because the researcher was unable to access this program, these searches were conducted within a similar program called Web of Science, a subscription-based citation database. The researcher conducted a first author search by entering the author's first and last name into the Web of Science search bar. When multiple authors with the same first and last name were identified in the search, the researcher used the university affiliation to ensure the correct author was chosen. Then, Web of Science identified articles associated with the author. Through this process, 164 additional articles were collected and identified. Articles that did not meet the 2010-2020 year and publication type inclusionary criteria were immediately excluded (n = 40). After duplicates were removed (n = 18), 106 articles were included for review (see Figure 1.1).

Across the forward and backward searches, any article not within the inclusionary years of 2010-2020 was immediately excluded, as was any non-peer-reviewed or discussion article (e.g., books, book chapters, reviews, etc.). After the documents were collected from the research databases, the same procedures used in the primary search were followed. The identified documents were stored in Zotero and then uploaded to Covidence, where duplicates were removed (n = 183). The researcher and coders then coded the articles (n = 622) during the title and abstract phase and the full-text review phase (see Figure 1.1).

Figure 1.1. The PRISMA Flowchart for search results from the SR of studies using VA. Adapted from Moher, D., Liberati, A., Tetzlaff, J., Altman, D. G., The PRISMA Group (2009). Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. PLoS Med, 6(7), e1000097. doi:10.1371/journal.pmed.1000097

Title and Abstract Review

In the title and abstract review phase, articles were excluded if they met any of the exclusionary criteria: (a) using qualitative or quantitative methods, (b) not having a minimum of one teacher or preservice teacher as a participant, (c) having professionals working in non-school based facilities (e.g., home, clinical setting, direct care facilities), (d) including videos of other professionals (e.g., exemplar videos of other people), (e) lacking an evaluation or feedback component, (f) having no comparative data, (g) having no dependent variable related to teacher outcomes, (h) including unobservable or non-behavioral outcomes (e.g., surveys, reflections, ability to reflect, content knowledge tests, etc.), (i) being conducted in another language or not in the United States, and (j) being a review and/or discussion article.
The researcher reviewed and coded the titles and abstracts of all documents gathered from the primary search (n = 6,640) and the ancestral, citation, and first author search (n = 622) to determine whether they met the inclusionary criteria. Four additional coders coded 20% of the documents (coder training and reliability are discussed later).

Full-Text Review

After the documents were identified as not meeting any of the exclusionary criteria in the title and abstract phase, the documents advanced to the full-text review, where the articles were evaluated to ensure that they met all of the inclusionary criteria. Each article was examined for the following inclusionary criteria: (a) use of single-case research methodology, (b) a minimum of one participant (i.e., a teacher, paraprofessional, or preservice teacher), (c) a teacher in an early intervention to grade 12 setting, (d) the analysis of the preservice teacher's or teacher's video, (e) an evaluation or feedback component, (f) comparative data (e.g., pre/posttest, treatment/control groups, graphs with data points, etc.), (g) measurement of a dependent variable related to teacher outcomes, (h) behavioral or observable teacher outcomes, (i) conducted in the United States and in English, and (j) published as a peer-reviewed journal article or dissertation.

Studies that met these criteria were then coded for study characteristics. The researcher and coders used a Qualtrics survey form to determine participant variables: (a) role (i.e., inservice; preservice; paraprofessional; other; not reported), (b) education level (i.e., high school/general education development diploma; some college, associate's degree, or specialized training; completed Bachelor's degree; Master's degree; not reported), (c) experience level (i.e., 0 years; 1-2 years; 3+ years; not reported), and (d) age (i.e., 18-29 years; 30-39 years; 40-49 years; 50 years and over; not reported). The researcher and coding team also coded the documents for student and setting characteristics: (a) group size (i.e., one-to-one; small group; large group; other; not reported), (b) type of instruction (i.e., academic; communication or language; life skills; other; not reported), (c) grade level (i.e., preschool; elementary; middle school; high school; not reported), (d) setting (i.e., self-contained; inclusion; resource classroom; general education; other; not reported), and (e) disability (i.e., developmental disability; physical disability; mental disability; emotional or behavioral disability; learning disability; cognitive disability; other; not reported).

From the primary search's title and abstract review, 161 articles were included in the full-text review. Through the review process, 144 articles were excluded, resulting in 17 articles included in the SR. From the ancestral, citation, and first author search title and abstract review, 52 articles were included in the full-text review. Through the review process, 45 articles were excluded, resulting in seven articles included in the SR. After completing the review process for both the primary search and the ancestral, citation, and first author search, a total of 24 articles were included in the descriptive analysis of the SR. Due to statistical limitations, not all studies were included in the meta-analytic portion of the study (n = 8 excluded); the reasons for this are discussed in greater detail later in this chapter.
What Works Clearinghouse (WWC) Pilot Single-Case Design Standards Review

After narrowing the documents to only those that met the inclusionary criteria, each individual study's methods were examined for adherence to the What Works Clearinghouse Standards Handbook Version 4.0 (What Works Clearinghouse, 2020; WWC). This information was used to answer research question (RQ) two, which investigated the design quality of the studies. Studies that did not meet the design quality standards were included in the descriptive analysis but were excluded from the statistical analysis. Studies were examined using the WWC pilot single-case design standards, which include the following: (a) manipulation of the variable (Standard 1), (b) inter-assessor agreement (IAA; Standard 2), (c) demonstration of effect (Standard 3), (d) number of data points per phase (Standard 4), and (e) multiple-probe design only standards (Standard 5), along with an Overall Design Rating (see Table 1.2).

Design Standard 1: Manipulation of the variable is coded as either reporting the manipulation of the independent variable (1) or not reporting the manipulation of the independent variable (0).

Design Standard 2: Inter-assessor agreement (IAA) consists of three sub-standards: IAA reporting (Standard 2A), IAA frequency (Standard 2B), and IAA quality (Standard 2C). IAA is either reported (1) or not reported (0). IAA frequency is coded as reporting IAA for a minimum of 20% of the sessions within each condition (2), reporting IAA for a minimum of 20% of the sessions without disaggregating by treatment or phase (1), or not reporting IAA for a minimum of 20% of the sessions (0). IAA quality is coded as the study meeting the minimum of 80% for percent agreement or 60% for Kappa (1) or not meeting the minimum of 80% for percent agreement or 60% for Kappa (0).

Design Standard 3: Demonstration of effect is determined by demonstrating the intervention effect with three attempts over three points in time (1) or not demonstrating the intervention effect with three attempts over three points in time (0). For alternating treatment designs, the study needs to demonstrate the intervention effect with three attempts over three points in time with a minimum of two conditions (1) or fail to do so (0).

Design Standard 4: Number of data points per phase is coded as a minimum of five data points in the baseline and treatment phases (2), a minimum of three data points in the baseline and treatment phases (1), or fewer than three data points in the baseline and treatment phases (0). For alternating treatment designs, the number of data points per phase is coded as a minimum of five data points in the baseline and treatment phases (2), a minimum of four data points in the baseline and treatment phases (1), or fewer than four data points in the baseline and treatment phases (0).

Design Standard 5: Multiple-probe designs consist of three sub-standards: initial baseline (Standard 5A), probe points before the intervention (Standard 5B), and considerations for additional probe points (Standard 5C).
Standard 5A (initial baseline) is coded as a minimum of three consecutive data points within the first three sessions of baseline for each level (2), a minimum of one data point within the first session of baseline for each level (1), or no data point within the first session of baseline for each level (0). Standard 5B (probe points before intervention) is coded as a minimum of three consecutive data points within the first three sessions before introducing the intervention for each level (2), a minimum of one data point within the first session before introducing the intervention for each level (1), or no data point within the first session before introducing the intervention for each level (0). Standard 5C (consideration of additional probe points) is coded as follows: a score of 1 is given when each unit of analysis (e.g., participant, behavior, etc.) still in baseline when the intervention was introduced for the previous unit of analysis had a data point when the previous unit(s) first received the intervention or reached the prespecified intervention criterion (i.e., 3 out of 5 correct before entering intervention), AND that data point is consistent in level and trend with the previous baseline data points in that unit; a score of 0 is given when such a data point is missing or is not consistent in level and trend with the previous baseline data points in that unit.

Finally, the Overall Design Rating is reported as obtaining the highest score possible across all standards (2), receiving a score of 1 on Standard 2 or 4 without receiving a 0 on any of the Design Standards (1), or receiving a score of 0 on one or more of the Design Standards (0). These standards are defined in Table 1.2.

Data Analysis

Data analysis consists of inter-rater reliability (IRR), synthesis of the descriptive data, and analysis of the quantitative data. Each of these components is discussed in the following section.

Inter-rater Reliability (IRR)

IRR was calculated for both the identification of documents and the coding of included studies. To obtain reliability in the identification phase, the researcher was the primary coder across all phases, and four additional coders (three doctoral students and one undergraduate) double coded 20% of the documents. Each coder attended a training session where they learned about the eligibility criteria and the coding variables for included documents. During the training session, coders coded a practice article as a group by identifying the inclusionary criteria within the article. Next, the coders coded a second article independently, and the group then discussed the discrepancies and resolved any issues. Once an agreement of at least .81 (Landis & Koch, 1977) was obtained on two consecutive articles, over 20% (n = 1,337) of the primary search articles were double coded. If the two coders disagreed, a third coder reviewed the article to determine whether the article would be included in the study. A reliability of 94% agreement was achieved across all coders (Table 1.3).
For the title and abstract review of the ancestral, citation, and first author search, a doctoral student who participated in the coding of the primary search articles double coded 20% (n = 124) of the articles. Similar to the primary search, if the two coders disagreed, the disagreement was discussed until a consensus was reached. A reliability of 96% agreement was achieved across coders (see Table 1.3).

Similar IRR training procedures were applied for the full-text review and the scoring of WWC single-case quality design standards. For the full-text review of the primary search, the researcher and a doctoral student served as the primary coders (n = 81 and n = 80, respectively) and another doctoral student coded 20% of the articles (n = 32). A reliability of 96% agreement was achieved across coders (see Table 1.3). For the full-text review of the ancestral, citation, and first author search, the researcher and a doctoral student served as the primary coders (n = 26 each) while another doctoral student coded 20% of the articles (n = 11). All documents were identified and coded using an online Qualtrics form (Appendix A). A reliability of 99% agreement was achieved across coders (see Table 1.3). For the WWC single-case quality design standards coding, the researcher served as the primary coder (n = 24) and a doctoral student coded 20% of the articles (n = 5). A Qualtrics form was used when determining the studies' design quality (Appendix B). A reliability of 94% agreement was achieved across coders (see Table 1.3).

For all phases, percent agreement was used to determine IRR. Percent agreement is calculated by dividing the total number of agreements by the sum of agreements and disagreements and multiplying by 100, i.e., A/(A + D) x 100 (Cooper et al., 2007; Watkins & Pacheco, 2000). Each phase had a minimum of 20% of the articles double coded. Inter-rater reliability was calculated across all phases, with the average percent agreement for the coders ranging from 94% to 99%. The average percent agreement across all phases was 96%. Table 1.3 displays the phase, number of coders, total articles coded, number of articles double coded, and average percent agreement for each phase of the study.

Table 1.2. What Works Clearinghouse Pilot Single-Case Design Standards Coding Variables. Shown are the scores and criteria definitions for each WWC design standard.

Design Standard 1: Manipulation of the Independent Variable
  1 = Reports the manipulation of the independent variable
  0 = Does not report the manipulation of the independent variable

Design Standard 2: Reporting Inter-Assessor Agreement (IAA)
  Reporting IAA (Standard 2A)
    1 = Reports IAA
    0 = Does not report IAA
  IAA Frequency (Standard 2B)
    2 = A minimum of 20% of the sessions within each condition
    1 = A minimum of 20% of the sessions without disaggregating by treatment or phase
    0 = No reporting of IAA for a minimum of 20% of the sessions
  IAA Quality (Standard 2C)
    1 = Meets the minimum agreement of 80% for percent agreement or 60% for Kappa
    0 = Does not meet the minimum agreement of 80% for percent agreement or 60% for Kappa

Design Standard 3: Demonstration of Treatment Effects
  1 = Intervention effect shown by three attempts over three points in time; for alternating treatment designs, shown by three attempts over three points in time with a minimum of two conditions
  0 = Intervention effect not shown by three attempts over three points in time; for alternating treatment designs, not shown by three attempts over three points in time with a minimum of two conditions

Design Standard 4: Number of Data Points Per Phase
  2 = A minimum of five data points in the baseline and treatment phases (five for alternating treatment designs)
  1 = A minimum of three data points in the baseline and treatment phases (four for alternating treatment designs)
  0 = Fewer than three data points in the baseline and treatment phases (fewer than four for alternating treatment designs)

Design Standard 5: Multiple Probe Designs
  Initial Baseline (Standard 5A)
    2 = A minimum of three consecutive data points within the first three sessions of baseline for each level
    1 = A minimum of one data point within the first session of baseline for each level
    0 = Does not include a minimum of one data point within the first session of baseline for each level
  Probe Points Before the Intervention (Standard 5B)
    2 = A minimum of three consecutive data points within the first three sessions before introducing the intervention for each level
    1 = A minimum of one data point within the first session before introducing the intervention for each level
    0 = Does not include a minimum of one data point within the first session before introducing the intervention for each level
  Consideration of Additional Probe Points (Standard 5C)
    1 = Each unit of analysis (e.g., participant, behavior, etc.) that was still in baseline when the intervention was introduced for the previous unit of analysis had a data point when the previous unit(s) first received the intervention or reached the prespecified intervention criterion (i.e., 3 out of 5 correct before entering intervention), AND this data point is consistent in level and trend with the previous baseline data points in that unit
    0 = Each such unit of analysis did not have a data point when the previous unit(s) first received the intervention or reached the prespecified intervention criterion, or this data point is not consistent in level and trend with the previous baseline data points in that unit

Overall Design Quality
  2 = The highest score possible across all standards
  1 = A score of 1 on Standard 2 or 4 and no score of 0 on any of the Design Standards
  0 = A score of 0 on one or more of the Design Standards
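The decision rule for the Overall Design Rating can be summarized in a short sketch. The function below is illustrative only (the function name and score dictionaries are hypothetical, not part of the WWC handbook), and it assumes each applicable standard has been scored against its own maximum (1 or 2):

# Illustrative sketch of the WWC overall design rating decision rule.
# `scores` maps each applicable standard to its earned score, and
# `maxima` maps each standard to its highest possible score (1 or 2).

def overall_design_rating(scores: dict, maxima: dict) -> int:
    """Return 2 (meets), 1 (meets with reservations), or 0 (does not meet)."""
    # A score of 0 on any applicable standard: does not meet standards.
    if any(score == 0 for score in scores.values()):
        return 0
    # The highest possible score on every applicable standard: meets standards.
    if all(scores[std] == maxima[std] for std in scores):
        return 2
    # Otherwise (e.g., a 1 on Standard 2B or 4 with no zeros): reservations.
    return 1

# Example: Capizzi et al. (2010), as coded in Table 1.8.
scores = {"S1": 1, "S2A": 1, "S2B": 2, "S2C": 1, "S3": 1, "S4": 1}
maxima = {"S1": 1, "S2A": 1, "S2B": 2, "S2C": 1, "S3": 1, "S4": 2}
print(overall_design_rating(scores, maxima))  # 1 (meets with reservations)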
Table 1.3. Inter-rater reliability across phases

Phase | Number of coders | Number of articles coded | Number of articles double-coded | Average percent agreement
Primary search: title/abstract | 4 double-coders | n = 6,640 | n = 1,337 | 94%
Primary search: full-text review | 3 (2 primary coders, 1 double-coder) | n = 161 | n = 32 | 96%
Ancestral, citation, and first-author search: title/abstract | 1 double-coder | n = 622 | n = 124 | 96%
Ancestral, citation, and first-author search: full-text review | 3 (2 primary coders, 1 double-coder) | n = 52 | n = 11 | 99%
WWC quality design standards | 1 double-coder | n = 24 | n = 5 | 94%

Synthesis of the Data

Data were extracted using the GetData Graph Digitizer (http://getdata-graph-digitizer.com), a free software program that retrieves the coordinate points from digital graphs to obtain an estimate of the data. The software requires the researcher to input a JPEG image of the graphs from the included single-case studies and returns an estimate of the data points for the baseline and intervention phases. Generalization and maintenance phases were excluded from the data, as these phases do not demonstrate an immediate effect of the intervention, which is the focus of the current study.

Then, for the purpose of this meta-analysis, further analysis of the data was conducted to calculate an effect size for each study's dependent variable(s). The BC-SMD was calculated to determine the average effect across multiple participants. A standardized mean difference is the "effect size obtained by subtracting the mean outcome of the comparison group from the mean outcome of the treatment group and dividing that difference by an estimate of its standard deviation" (Shadish et al., 2015, p. 101). In the case of SCRD, the comparison is made between the intervention mean and the baseline mean (Shadish et al., 2008). Participant data for each included study's DV were input into Pustejovsky's (2020) single-case design hierarchical linear model (scdhlm) calculator, a free, online R-based web application. This program allows the synthesis of single-case studies by providing a parametric average effect size of data from different cases by calculating the BC-SMD (Shadish et al., 2015).

The BC-SMD was chosen to calculate the average effect size because of its ability to account for trend and dependency within an SCRD (Shadish et al., 2015). In the case of meta-analyses, the BC-SMD allows for the statistical analysis of the average effect size of multiple participants within a study. Unlike non-parametric measures such as Tau-U that calculate overlapping data at the individual participant level, BC-SMD allows for average effect size calculation at the study level while still accounting for variability between the cases (Pustejovsky, 2018; Shadish et al., 2015). This allows an individual study's results to be compared with a larger body of literature. Having comparable results makes BC-SMD ideal for meta-analyses because multiple studies with different variables can be analyzed and compared (Pustejovsky, 2018; Shadish et al., 2015). Additionally, BC-SMD calculates a d statistic rather than a p-value. A d statistic accounts for the variability (e.g., sample size, study design, length of phases, outcome measure scales, etc.) that may impact the magnitude of the effect size (Pustejovsky, 2018). For instance, within single-case studies, the type of SCRD can vary, along with the length of the baseline phase and how the DV is measured, all of which impact the effect size of studies using the same treatment.
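In notation, the standardized mean difference defined above is, for an SCRD comparison of intervention to baseline,

$$d = \frac{\bar{y}_{T} - \bar{y}_{B}}{\hat{\sigma}},$$

where $\bar{y}_{T}$ and $\bar{y}_{B}$ are the intervention and baseline means and $\hat{\sigma}$ is an estimate of the (between-case) standard deviation. The small-sample correction discussed next multiplies $d$ by a factor of approximately $1 - 3/(4\nu - 1)$, where $\nu$ is the degrees of freedom, yielding Hedges' g (Hedges, 1981). This is a schematic statement of the estimator, not the full derivation implemented in the scdhlm calculator.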
One fault of the d statistic is that it overestimates the effect size when the sample size is small, which is typically the case for SCRD. Pustejovsky's (2020) scdhlm calculator automatically corrects for this (Shadish et al., 2015) by reporting a Hedges' g (Hedges, 1981), which applies a small-sample bias adjustment and allows for valid comparisons between SCRD studies. Therefore, for the purpose of this research study, the BC-SMD was used to calculate the Hedges' g average effect size of the studies included in the meta-analysis.

According to Valentine et al. (2016), the BC-SMD does have limitations. First, a functional relation needs to be confirmed by conducting a visual analysis. Second, only multiple-baseline, multiple-probe, and reversal designs can be analyzed. Finally, all studies must have a minimum of three participants (Valentine et al., 2016). Therefore, once the SR identified all included studies, only the studies that met the BC-SMD requirements were analyzed during the meta-analytic component of the study.

Finally, a random-effects model accounts for the variability within and across studies (e.g., sampling error, intervention characteristics, etc.; Borenstein et al., 2009). The participants in the studies are not representative of the population, and there is variability in the interventions using VA; to account for this variability, a random-effects effect size was calculated. Additionally, across the single-case studies, the dependent variables were measured with various instruments using different scales. For example, studies using VA as a treatment measured teacher quality with many outcome measures, encompassing opportunities to respond (OTRs; Smith, 2015; Westover, 2010), FOI (Capizzi et al., 2010; Fedders, 2011; Murphy et al., 2015), instructional quality (Coogle et al., 2019; Knight et al., 2018), and/or praise (Capizzi et al., 2010; Pinter et al., 2015; Smith, 2015; Starling, 2015; Thompson et al., 2012; Westover, 2010), with some studies measuring multiple outcomes and others measuring just one. Also, the same teacher outcome (e.g., praise, FOI, opportunities to respond, negative response, etc.) was often measured differently across studies. For example, praise was measured as the rate of behavior-specific praise per minute (Capizzi et al., 2010), the frequency of praise per 15 minutes (Pinter et al., 2015), and the percent of intervals with specific praise (Smith, 2015). To account for this, the BC-SMD random-effects estimated effect size is interpreted, using the absolute value of the effect size, as a small effect (0.2-0.49), medium effect (0.5-0.79), or large effect (equal to or greater than 0.8; Cohen, 1988). Negative effect sizes demonstrate that the target behavior decreased after the introduction of the intervention. For example, Hawkins and Heflin (2011) conducted a study measuring both behavior-specific praise statements (BSPS) and non-behavior-specific praise statements (NBSPS). For the NBSPS, the implementation of VA as an intervention decreased the behavior from the baseline to the intervention phase, demonstrating a negative effect. The calculation of these effect sizes quantifies the magnitude of the relation between the intervention and each dependent variable. For studies measuring more than one DV, there are multiple effect sizes per study. Therefore, the meta-analysis in the current study includes the statistical analysis of BC-SMD effect sizes for individual DVs within the included studies.
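As a minimal sketch of this interpretation rule (the function name and structure are illustrative, not drawn from any cited source), the banding described above can be expressed as:

# Illustrative interpretation of a BC-SMD estimate using Cohen's (1988)
# benchmarks, applied to the absolute value of the effect size as
# described above; direction is reported separately.

def interpret_bcsmd(es: float) -> str:
    magnitude = abs(es)
    if magnitude >= 0.8:
        label = "large"
    elif magnitude >= 0.5:
        label = "medium"
    elif magnitude >= 0.2:
        label = "small"
    else:
        label = "below the small-effect benchmark"
    direction = "decrease" if es < 0 else "increase"
    return f"{label} effect ({direction} in the target behavior)"

print(interpret_bcsmd(-1.2))  # large effect (decrease in the target behavior)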
CHAPTER III
FINDINGS

This study utilized an SR and a meta-analysis to identify the effectiveness of VA within the literature base. The purpose of this review was to gain insight into the different characteristics of the studies and to determine the effectiveness of VA as a treatment for educators. This chapter reports the results of the SR and meta-analysis on VA. The specific procedures for searching, identifying, and coding articles, including IRR, are reported in Chapter II. After the articles were identified, the graphical data were extracted from the studies and analyzed using the meta-analytic methods described in the previous chapter. The results of the descriptive analysis (i.e., Research Question 1 and Research Question 2), statistical analysis (i.e., Research Question 3), and relation to the parent study (i.e., Research Question 4) are discussed below.

Descriptive Analysis of Studies Using VA

The SR of the literature resulted in 24 articles that met the inclusionary criteria described in Chapter II, and the descriptive characteristics of those studies are reported in relation to Research Questions 1 and 2 of this study.

Research Question 1 (RQ 1): What is the status of the literature base on VA regarding study characteristics (i.e., publication type), participant characteristics (i.e., role, education level, experience level, age), student characteristics (i.e., disability type, student outcomes), and setting characteristics (i.e., grade level, group size, type of instruction, setting)?

RQ 1 examines the characteristics, described in Table 1.1, most apparent within the literature of SCRD studies using VA as a treatment. The study, educator, student, and setting characteristics were coded at the study level (see Tables 1.4, 1.5, and 1.6). For example, Smith (2015) reported that the study took place in resource and self-contained classrooms within elementary (n = 3), middle (n = 2), and high schools (n = 2); therefore, Smith (2015) was coded as taking place in resource and self-contained classrooms in elementary, middle, and high schools. These coding procedures were consistent across similar articles that reported aggregated descriptive information. The findings for (a) study characteristics, (b) educator characteristics, (c) student characteristics, and (d) setting characteristics are reported below.

Analysis of Study Characteristics. The SR included both peer-reviewed articles and dissertations using VA as a treatment. Westover (2010) is a dissertation and Westover and Martin (2014) is a peer-reviewed article gathered in the collection process; because they are identical studies using the same data and reporting the same outcomes, the study was coded as both a dissertation and a peer-reviewed article. Of the included articles, 15 (63%) were peer-reviewed and 10 (42%) were dissertations. Table 1.4 shows the publication type of the included articles. The design quality of the articles is discussed under the following research question.

Analysis of Educator Characteristics. Across all of the articles, the studies included various participant characteristics, including role (i.e., inservice, paraprofessional, or preservice), age (i.e., 18-29 years, 30-39 years, 40-49 years, 50+ years, or not reported), education level (i.e., high school/GED, some college, bachelor's degree, master's degree, or not reported), and teaching experience (i.e., 0 years, 1-2 years, or 3+ years).
The findings indicate that a majority of the studies (63%, n = 15) reported that the participants were inservice teachers, 50% (n = 12) reported that the participants ranged from 18-29 years of age, and 71% (n = 17) included participants having three or more years of experience. Most of the studies (54%, n = 13) included participants who held a bachelor's degree. Table 1.5 displays all educator characteristics across studies, and Table 1.7 displays a synthesized version of the study characteristics.

Analysis of Student Characteristics. Across all of the articles, the studies included students with various disabilities (i.e., developmental disability, physical disability, mental disability, emotional or behavioral disability, learning disability, cognitive disability, other disability, or disability not reported) in various grade levels (i.e., preschool, elementary school, middle school, high school, post-secondary, or not reported). The findings indicate that 50% (n = 12) of the studies included participants who worked with students classified as having a developmental disability, which includes autism spectrum disorder, intellectual disability, Down syndrome, or other developmental disorders. Additionally, the studies included students in different grade levels: 46% (n = 11) of the studies took place at the elementary school level and 33% (n = 8) at the preschool level. Table 1.6 shows the complete list of the disabilities and grade levels of the students in the included studies, and Table 1.7 displays a synthesized version of the study characteristics.

Analysis of Setting Characteristics. Finally, each article was examined for setting characteristics categorizing group size (i.e., one-to-one, small group, large group, or not reported), type of instruction (i.e., academic, communication, life skills, or not reported), and instructional setting (i.e., general education, self-contained, resource, inclusion, or not reported). The studies primarily took place in small groups (46%, n = 11) and inclusion classrooms (50%, n = 12). In these settings, 54% (n = 13) of the studies focused on academic skill development, as defined in Chapter II. Table 1.6 shows the different types of setting characteristics and the total number of studies taking place in each setting, and Table 1.7 displays a synthesized version of the study characteristics.

Research Question 2 (RQ 2): What is the status of the literature base on the research design quality for the included articles as measured by the What Works Clearinghouse (WWC) design quality standards (i.e., meets, meets with reservations, does not meet)?

RQ 2 examined the research design quality of the literature base of single-case studies using VA as a treatment. The design quality was measured by evaluating the studies using the criteria included in the WWC design quality standards (see Table 1.2). Studies that met all of the criteria and received an overall study score of two were identified as Meets Standards (see description in Chapter II). Studies that met a portion of the criteria and received an overall study score of one were identified as Meets with Reservations. Studies that did not meet the standards and received an overall study score of zero were identified as Does Not Meet. Of the included articles, 13% (n = 3) met the WWC standards, 58% (n = 14) met the WWC standards with reservations, and 29% (n = 7) did not meet the WWC standards.
Table 1.8 shows each study’s adherence to the individual WWC single-case design quality standards along with an overall study rating. 52 Table 1.4. Study characteristics of the included articles. Article Publication Type Design Design Quality Number of Participants Alexander et al. (2012) PR AB Doesn’t meet standards (0) 2 Bishop et al. (2015) PR MPD Meets standards with reservations (1) 3 Capizzi et al. (2010) PR MBD Meets standards with reservations (1) 3 Coogle et al. (2019) PR MPD Meets standards with reservations (1) 3 D’Agostino et al. (2020) PR MPD Meets standards with reservations (1) 6 Englund (2010) Diss. MBD Meets standards with reservations (1) 6 Fedders (2011) Diss. MBD Meets standards with reservations (1) 3 Hager (2012) PR MBD Doesn’t meet standards (0) 1 Hawkins, & Heflin (2011) PR MBD Meets standards with reservations (1) 3 Knight et al. (2018) PR MBD Meets standards (2) 8 Leins Dvorchak (2015) Diss. MBD Meets standards with reservations (1) 5 Lynes (2012) Diss. ABCD Doesn’t meet standards (0) 6 MacVittie (2018) Diss. ABC Meets standards (2) 3 McLeod et al. (2019) PR MBD Doesn’t meet standards (0) 2 53 Table 1.4. (continued). Article Publication Type Design Design Quality Number of Participants Morin (2017) Diss. MBD Meets standards (2) 5 Murphy et al. (2015) PR AB Doesn’t meet standards (0) 2 Pelletier et al. (2010) PR MBD Meets standards with reservations (1) 3 Pinter et al. (2015) PR MBD Meets standards with reservations (1) 4 Robinson (2011) PR MBD Doesn’t meet standards (0) 4 Smith (2015) Diss. MBD Meets standards with reservations (1) 6 Snyder (2013) Diss. MBD Doesn’t meet standards (0) 4 Starling (2015) Diss. MBD Meets standards with reservations (1) 4 Thompson et al. (2012) PR MPD Meets standards with reservations (1) 3 Westover (2010) Diss. and PR MBD Meets standards with reservations (1) 3 Note. Diss. = dissertation; PR = peer-reviewed; MBD = multiple-based line design; MPD = multiple probe design. 54 Table 1.5. Participant characteristics of the included articles. Article Participants Age Role Education Experience Alexander et al. (2012) Susan, Rachel NR Preservice NR 0 years, 3+ years Some college or Bishop et al. (2015) Natalie, 18-29 years old, Rhonda, Brenda 30-39 years old Inservice specialized training, 1-2 years, 3+ Bachelor’s, Master’s years Capizzi et al. (2010) Amy, Sarah, 18-29 years old, Scott 30-39 years old Preservice NR 1-2 years, NR Coogle et al. (2019) Andreia, Hadi, 30-39 years old, 1-2 years, 3+ Abigail 50+ years old Inservice Bachelor’s degree years D’Agostino et al. Amy, Betty, 18-29 years old, Some college or Carey, Danielle, 30-39 years old, Inservice specialized training, 1-2 years, 3+ (2020) Emily, Fae 40-49 years old, Bachelor’s degree, years 50+ years old Master’s degree Center 1 (PA, 18-29 years High school or GED, PB, PC) old,30-39 years Some college or Englund (2010) old, 40-49 years Inservice specialized training, 1-2 years, 3+ Center 2 (PD, old, 50+ years Bachelor's degree, years PE, PF) old Master's degree Fedders (2011) Teacher 1-3 18-29 years old Inservice NR 0 years, 1-2 years, 3+ years Hager (2012) Jennifer 18-29 years old Preservice Some college or specialized training 0 years 55 Table 1.5. (continued). Article Participants Age Role Education Experience Hawkins, & Heflin Cantelli, Thomas, 18-29 years old, 1-2 years, 3+ (2011) Williams 30-39 years old Inservice Master’s degree years Knight et al. 
Hawkins & Heflin (2011) | Cantelli, Thomas, Williams | 18-29 years old, 30-39 years old | Inservice | Master's degree | 1-2 years, 3+ years
Knight et al. (2018) | Teachers 1-8 | NR | Inservice | NR | 1-2 years, 3+ years
Leins Dvorchak (2015) | Davis, Kate, Rover, Rita, Moss | NR | Inservice | Bachelor's degree, Master's degree | 3+ years
Lynes (2012) | Teachers 1-6 | NR | Inservice | Some college or specialized training, Bachelor's degree | 1-2 years, 3+ years
MacVittie (2018) | Katie, Cassie, Mary | 30-39 years old | Inservice | Bachelor's degree, Master's degree | 3+ years, NR
McLeod et al. (2019) | Kelly, Mimi | NR | Preservice | Bachelor's degree | 0 years
Morin (2017) | Stephanie, Crystal, Mary Anne, Pamela, Angela | 18-29 years old, 30-39 years old | Inservice, paraprofessional | Bachelor's degree | 0 years, 3+ years
Murphy et al. (2015) | Hannah, Lydia | 18-29 years old | Paraprofessional | High school or GED, some college or specialized training | 1-2 years
Pelletier et al. (2010) | Layla, Bob, Sam | NR | Inservice | NR | NR
Pinter et al. (2015) | Linda, Ava, Leeza, Mick | NR | Inservice | Master's degree | 3+ years
Robinson (2011) | Anna, Deborah, Sandra, Mary | 18-29 years old, 50+ years old | Paraprofessional | High school or GED, Bachelor's degree | 1-2 years, 3+ years
Smith (2015) | Beth, Julia, Kat, Chelsey, Mary, Katie | 18-29 years old | Preservice | Some college or specialized training | 0 years
Snyder (2013) | Amanda, Leah, Kristin, Tricia | 18-29 years old, 40-49 years old | Paraprofessional | High school or GED, some college or specialized training, Bachelor's degree | 1-2 years, 3+ years
Starling (2015) | Participants 1-4 | NR | Inservice | NR | NR
Thompson et al. (2012) | Anna, Jane, Gail | 40-49 years old, 50+ years old | Inservice | Bachelor's degree, NR | 3+ years, NR
Westover (2010) | Dyads A, B, C | 40-49 years old, 50+ years old | Paraprofessional | High school or GED, Bachelor's degree | 0 years, 3+ years

Note. NR = not reported.

Table 1.6. Student and setting characteristics of the included articles.

Article | Disability Type | Grade Level | Group Size | Instruction | Setting
Alexander et al. (2012) | NR | Elementary | Small group | Academic skills | Resource classroom
Bishop et al. (2015) | NR | Preschool | One-to-one | NR | Inclusion
Capizzi et al. (2010) | Developmental disability, emotional or behavioral disability, learning disability | Elementary | NR | Academic skills | Resource classroom
Coogle et al. (2019) | Developmental disability | Preschool | One-to-one | Communication skills | Inclusion
D'Agostino et al. (2020) | Developmental disability | Preschool | One-to-one | Communication skills | Inclusion
Englund (2010) | NR | Preschool | NR | Communication skills | Inclusion
Fedders (2011) | Developmental disability | Elementary | One-to-one | Academic skills | Self-contained classroom
Hager (2012) | Cognitive disability | Elementary | Small group | Academic skills | NR
Hawkins & Heflin (2011) | Mental disability, emotional or behavioral disorders | High | Small group, large group | Academic skills | Self-contained classroom
Knight et al. (2018) | NR | Middle | NR | NR | NR
Leins Dvorchak (2015) | NR | Middle | Large group | Academic skills | Inclusion
Lynes (2012) | NR | Preschool | Small group | Communication skills | Inclusion
MacVittie (2018) | Developmental disability, emotional or behavioral disability, learning disability | Elementary | Small group | Academic skills | Inclusion
McLeod et al. (2019) | Developmental disability, physical disability, emotional or behavioral disability | Preschool | Small group | NR | Inclusion
Morin (2017) | Developmental disability, physical disability, mental disability, learning disability, NR | Elementary, post-secondary | One-to-one, small group, large group | Academic skills | Inclusion
Murphy et al. (2015) | Developmental disability, physical disability | Elementary | One-to-one | Communication skills | Inclusion
Pelletier et al. (2010) | Emotional or behavioral disability | NR | One-to-one | Communication skills | NR
Pinter et al. (2015) | Developmental disability, emotional or behavioral disability, learning disability, cognitive disability, other disability | Middle, high | Small group | Academic skills, life skills | Self-contained classroom
Robinson (2011) | Developmental disability | Preschool | One-to-one | Communication skills | Inclusion
Smith (2015) | Developmental disability, emotional or behavioral disability, learning disability, other disability, NR | Elementary | Small group, large group | Academic skills | Self-contained classroom
Snyder (2013) | NR | Preschool | Small group | Academic skills | Inclusion
Starling (2015) | NR | Elementary | Small group | Academic skills | Self-contained classroom
Thompson et al. (2012) | NR | Elementary | Large group | NR | General education classroom
Westover (2010) | Developmental disability | Elementary | One-to-one | Academic skills | Self-contained classroom

Note. NR = not reported.

Table 1.7. Educator, student, and setting characteristics across the included articles.

Study Characteristics | Total (n)

Educator characteristics
Role: Inservice | 15
Role: Paraprofessional | 5
Role: Preservice | 5
Age: 18-29 years old | 12
Age: 30-39 years old | 8
Age: 40-49 years old | 5
Age: 50+ years old | 6
Age: Not reported | 8
Education level: High school/GED | 7
Education level: Some college | 6
Education level: Bachelor's degree | 13
Education level: Master's degree | 7
Education level: Not reported | 7
Teaching experience: 0 years | 7
Teaching experience: 1-2 years | 11
Teaching experience: 3+ years | 17
Teaching experience: Not reported | 4

Student characteristics
Student disability: Developmental disability | 12
Student disability: Physical disability | 3
Student disability: Mental disability | 3
Student disability: Emotional or behavioral disability | 7
Student disability: Learning disability | 5
Student disability: Cognitive disability | 2
Student disability: Other disability | 2
Student disability: Disability not reported | 11
Grade level: Preschool | 8
Grade level: Elementary school | 11
Grade level: Middle school | 3
Grade level: High school | 2
Grade level: Post-secondary | 1
Grade level: Not reported | 1

Setting characteristics
Group size: One-to-one | 9
Group size: Small group | 11
Group size: Large group | 5
Group size: Not reported | 3
Type of instruction: Academic | 13
Type of instruction: Communication | 7
Type of instruction: Life skills | 1
Type of instruction: Not reported | 4
Instructional setting: General education | 1
Instructional setting: Self-contained | 6
Instructional setting: Resource | 2
Instructional setting: Inclusion | 12
Instructional setting: Not reported | 3

Note. Participant, student, and setting characteristics are reported at the study level.

Table 1.8. What Works Clearinghouse design quality standards results for included articles.

Article | S1 | S2A | S2B | S2C | S3 | S4 | S5A | S5B | S5C | Overall Design Quality
Alexander et al. (2012) | 1 | 1 | 2 | 1 | 0 | 1 | N/A | N/A | N/A | 0
Bishop et al. (2015) | 1 | 1 | 1 | 1 | 1 | 2 | N/A | N/A | N/A | 1
Capizzi et al. (2010) | 1 | 1 | 2 | 1 | 1 | 1 | N/A | N/A | N/A | 1
Coogle et al. (2019) | 1 | 1 | 1 | 1 | N/A | N/A | 2 | 1 | 1 | 1
D'Agostino et al. (2020) | 1 | 1 | 2 | 1 | N/A | N/A | 2 | 1 | 1 | 1
Englund (2010) | 1 | 1 | 1 | 1 | 1 | 1 | N/A | N/A | N/A | 1
Fedders (2011) | 1 | 1 | 1 | 1 | 1 | 1 | N/A | N/A | N/A | 1
Hager (2012) | 1 | 0 | 0 | 0 | 0 | 1 | N/A | N/A | N/A | 0
Hawkins & Heflin (2011) | 1 | 1 | 1 | 1 | 1 | 2 | N/A | N/A | N/A | 1
Knight et al. (2018) | 1 | 1 | 2 | 1 | 1 | 2 | N/A | N/A | N/A | 2
Leins Dvorchak (2015) | 1 | 1 | 1 | 1 | 1 | 2 | N/A | N/A | N/A | 1
Lynes (2012) | 1 | 1 | 1 | 1 | 1 | 0 | N/A | N/A | N/A | 0
MacVittie (2018) | 1 | 1 | 2 | 1 | 1 | 2 | N/A | N/A | N/A | 2
(2019) 1 1 2 0 1 2 N/A N/A N/A 0 Morin (2017) 1 1 2 1 1 2 N/A N/A N/A 2 63 Table 1.8 (continued). Overall Article Standard 1 Standard 2 Standard 3 Standard 4 Standard 5 Design (Probe) Quality Murphy et al. (2015) 0 0 0 0 0 1 N/A N/A N/A 0 Pelletier et al. (2010) 1 1 1 1 1 1 N/A N/A N/A 1 Pinter et al. (2015) 1 1 1 1 1 2 N/A N/A N/A 1 Robinson (2011) 1 1 0 1 1 N/A 1 1 1 0 Smith (2015) 1 1 1 1 1 1 N/A N/A N/A 1 Snyder (2013) 1 1 2 0 1 1 N/A N/A N/A 0 Starling (2015) 1 1 2 1 1 1 N/A N/A N/A 1 Thompson et al. (2012) 1 1 1 1 1 2 N/A N/A N/A 1 Westover (2010) 1 1 1 1 1 2 N/A N/A N/A 1 Note. Standard 1 includes manipulation of the independent variable. Standard 2 includes reporting on inter assessor agreement (IAA), and frequency and quality of inter-assessor agreement. Standard 3 includes treatment effects. Standard 4 includes points per phase. Standard 5 (probe design only) includes initial baseline points, points before intervention, and additional probe points. N/A = not applicable. 64 Statistical Analysis of Studies Using VA For the meta-analysis portion of the study, articles were analyzed for treatment effectiveness by calculating the BC-SMD for participants within a study. To run the statistical analyses, individual participant data was extracted from the graphs within each study using the Getgraph data’s software. During this process, one study (Leins Dvorchak, 2015) was excluded from the statistical analysis portion of the meta-analysis because the data used a celeration graph in which the data were unable to be extracted using the Getgraph data’s software nor could the data be visually extracted. Additionally, due to the limitations of the BC-SMD calculator, only studies (a) demonstrating a functional relation, (b) using multiple-baseline, multiple-probe, or reversal designs, and (c) have a minimum of three participants were included in the analysis (Pustejovsky et al., 2014; Shadish et al., 2015; Valentine et al., 2016). Each included article’s methods were read to determine if there was a functional relation and to identify which SCRD design type (i.e., multiple-baseline, multiple-probe, or reversal designs) was used along with the number of participants. This criterion eliminated the following four studies: Alexander et al. (2012); Hager (2012); McLeod et al. (2019); and Murphy et al. (2015). Additional studies were excluded from analysis because they did not meet the WWC design quality standards (Lynes, 2012; Robinson, 2011; Snyder, 2013). Lynes (2012) did not have the minimum number of data points per phase; and Robinson (2011) and Snyder (2013) did not report IAA resulting in an overall study score of zero. As a result, of the total 24 articles identified, 16 were included in the meta-analysis. Research Question 3 (RQ 3): What is the magnitude of effect of VA interventions on the instructional practices of educators? 65 RQ 3 analyzes the magnitude of effect of the use of VA as an intervention for educator instructional practices. Figure 1.2 displays a forest plot for each included study. The forest plot shows the effect size (ES) and confidence interval for the individual DVs for each included study (Shadish et al., 2015). The BC-SMD ES across the studies range from -4.70 to 4.02. Effect Size by DV. The included studies measured praise (n = 9), implementation (n = 6), student outcomes (n = 6), negative response (n = 5), opportunities to respond (OTR; n = 3), instructional quality (n = 2), error correction (n = 1); redirect (n = 1); and instructional time (n = 1). 
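As context, the BC-SMD calculation described above can in principle be reproduced with the scdhlm R package (Pustejovsky, 2020) cited in the references. The following is a minimal sketch only, not the analysis code used in this study; it relies on the package's bundled Laski multiple-baseline dataset as a stand-in for data digitized from an included study's graphs.

# Minimal sketch: estimating a BC-SMD for a multiple-baseline design.
# install.packages("scdhlm")
library(scdhlm)

data(Laski)  # bundled example data: case, outcome, time, treatment

# Moment estimator of the design-comparable effect size
# (Hedges, Pustejovsky, & Shadish, 2013)
Laski_ES <- with(Laski, effect_size_MB(outcome, treatment, case, time))
Laski_ES        # prints the adjusted BC-SMD (Hedges' g) and its variance
CI_g(Laski_ES)  # approximate 95% confidence interval

In the current study, the digitized participant data for each eligible study would take the place of the example dataset, yielding one BC-SMD per DV within a study.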
Across the DVs, the largest ES were for praise (n = 6; Capizzi et al., 2010; Hawkins & Heflin, 2011; Morin, 2017; Smith, 2015; Starling, 2015; Westover, 2010), FOI (n = 5; Bishop et al., 2015; Capizzi et al., 2010; Coogle et al., 2019; Fedders, 2011; Pelletier et al., 2010), student outcomes (n = 3; Coogle et al., 2019; D'Agostino et al., 2020; Westover, 2010), instructional quality (n = 2; Englund, 2010; Knight et al., 2018), OTR (n = 2; D'Agostino et al., 2020; Westover, 2010), and errors (n = 1; Westover, 2010). Praise (n = 1; Pinter et al., 2015), student outcomes (n = 1; Fedders, 2011), and OTR (n = 1; Smith, 2015) had a medium effect size (see Figure 1.3). Confidence intervals are reported because they convey the precision and stability of an ES (Borenstein, 1994; Borenstein et al., 2009). Although these ES show a wide range, with a number of them being very large, other studies using BC-SMD report similar findings (Barton et al., 2017; Maggin et al., 2017). Figure 1.3 displays a forest plot showing the BC-SMD ES for individual studies based upon the DV.

Figure 1.2. A forest plot displaying the BC-SMD ES for individual studies and the DVs.

Figure 1.3. A forest plot displaying the BC-SMD ES for individual studies based upon the DV.

Relation to the Parent Study

The current study is a direct replication of Morin's (2017) SR and a conceptual replication of her meta-analysis used to examine treatment effects of VA for studies using SCRD methods. The parent study used Tau-U to calculate the omnibus ES of each study and moderator effects, which Shadish et al. (2008, 2015) state is not recommended practice given multiple DVs. With new statistical developments for analyzing data in meta-analyses using SCRD, the current study calculated BC-SMD ES. Because the ES are not comparable across the two meta-analyses, the comparison only examined the SR process involving the descriptive characteristics across both studies. Additionally, because access to research databases and reference software differed, the sets of included articles varied.

Morin's (2017) SR gathered articles from 1976-2016 with a total of 28 included articles. The current study overlaps and extends Morin's (2017) study by conducting a search between 2010-2020. The current study's SR included a total of 24 articles; 13 articles were originally included in Morin's (2017) SR (i.e., Alexander et al., 2012; Bishop et al., 2015; Capizzi et al., 2010; Englund, 2010; Fedders, 2011; Hager, 2012; Hawkins & Heflin, 2011; Lynes, 2012; Pelletier et al., 2010; Pinter et al., 2015; Robinson, 2011; Snyder, 2013; Westover, 2010) and 11 were newly identified in the current study. Of these 11 articles, six were published in the years following Morin's (2017) SR (i.e., Coogle et al., 2019; D'Agostino et al., 2020; Knight et al., 2018; MacVittie, 2018; McLeod et al., 2019; Morin, 2017) and five were identified within the same search years as Morin's (2017) SR (i.e., Leins Dvorchak, 2015; Murphy et al., 2015; Smith, 2015; Starling, 2015; Thompson et al., 2012). These five were identified due to differences in access to library research databases and reference databases.

Finally, the current study coded participant, student, and setting characteristics at the study level while Morin (2017) disaggregated the data and examined each characteristic at the participant level. For example, Capizzi et al. (2010) had two participants.
Participant 1 was an undergraduate with five years of teaching experience completing her practicum in an elementary resource classroom teaching academics to students with moderate disabilities. Participant 1's group size and specific student disabilities were not reported. Participant 2 was an undergraduate with no teaching experience completing her practicum in a middle school teaching academics in a small group setting. Participant 2's classroom setting and specific student disabilities were not reported. The current study reports characteristics at the study level, with unreported information coded as not reported (NR). The current study reports overall study data because the BC-SMD ES are reported at the study level and not at the participant level, as Morin's (2017) Tau-U ES were. As a result, there is no direct comparison between Morin's (2017) findings and the current study. The following section compares the overlapping years using only the identical articles as well as the extension of the literature base, which included the six articles published between 2016-2020.

Research Question 4 (RQ 4): How has the literature base on VA changed since 2016 as reported by Morin's (2017) systematic review?

RQ 4 looks at how the literature base on VA has changed since 2016 as reported by Morin's (2017) SR. The following section compares the (a) study characteristics, (b) design quality standards, and (c) DVs measured. When reviewing these descriptive results, it should be noted that the current study's findings are reported at the study level and Morin's (2017) findings are reported at the participant level.

Comparison of Study Characteristics

As previously mentioned in RQ 1, a majority (63%; n = 15) of the included studies reported that the participants were inservice teachers; 50% (n = 12) of studies reported participants ranging from 18-29 years of age; and 67% (n = 16) of studies included participants having three or more years of experience. Of these studies, 13 (54%) had participants with a bachelor's degree. In comparison, Morin's (2017) findings were based on the 105 participants within the 28 included articles. Of these participants, 52% (n = 55) were inservice teachers; 21% (n = 22) were between the ages of 18-29 years; and 41% (n = 43) had four or more years of teaching experience. Of the reported educational backgrounds, 24% (n = 25) of the participants had a bachelor's degree. From 2016-2020, the majority of studies reported that the participants were inservice teachers (n = 5), were between 30-39 years of age (n = 4), held a bachelor's degree (n = 5), and had three or more years of experience (n = 5). This indicates that studies continue to include participants who are inservice teachers, hold a bachelor's degree, and have three or more years of experience. The only difference is the increased age of the participants.

In the current study, the included studies most often took place in small group settings (46%, n = 11) and in inclusion classrooms (50%, n = 12). In these settings, 54% (n = 13) of the studies had participants providing academic skills development. In comparison, Morin's (2017) findings indicate that 39% (n = 41) of the participants provided one-to-one instruction; 32% (n = 33) of the participants taught in a small group setting; and 39% (n = 41) of the participants focused on academic skills development.
A majority of the instruction took place in self-contained (39%, n = 41) or inclusion (31%, n = 33) classrooms. An extension of the study years indicates that the most recent studies took place in inclusion classrooms (n = 5) in a small group (n = 3) or one-to-one setting (n = 3). In these settings, teachers provided academic instruction (n = 2), communication instruction (n = 2), or the instruction was not reported (n = 2).

Across the current study, students had various disabilities. Twelve (50%) studies included educators who worked with students with a developmental disability (i.e., autism spectrum disorder, intellectual disability, Down syndrome, or other developmental disorders). Eleven (46%) studies were coded as "Not Reported" for student disability, which means that the study did not state the student's disability or the student had a developmental delay (e.g., fine motor, literacy, language, cognitive, etc.), general challenging behavior, or no identified disability. Similarly, Morin (2017) found that students most commonly (38%, n = 15) had developmental disabilities. Of the recently published articles, the most commonly reported disability was developmental disability (n = 5), which indicates a continued trend of VA being implemented with participants who provide instruction to students with developmental disabilities.

Additionally, the studies included students in different grade levels. Forty-six percent (n = 11) of the studies included students in elementary schools and 33% (n = 8) included students in preschools. In comparison, Morin (2017) found that 34% (n = 36) of the participants provided instruction in a preschool setting and 33% (n = 35) provided instruction in an elementary school setting. These findings are similar to the current study. In the years since Morin's (2017) review, the newly identified studies took place in preschools (n = 3) and elementary schools (n = 2), indicating that the past and most recent studies continue to focus on these settings.

Comparison of Publication Type and Design Quality Standards. The current study identified 15 peer-reviewed articles and 10 dissertations, with Westover (2010) coded as both a dissertation and a peer-reviewed article. Three studies (13%) met the WWC standards, 14 (58%) studies met with reservations, and seven (29%) studies did not meet WWC standards. Morin (2017) identified 61% (n = 17) peer-reviewed articles and 39% (n = 11) dissertations. Of these, 50% (n = 14) of studies met the standards with reservations and 39% (n = 11) did not meet them. In the most recent studies (i.e., Coogle et al., 2019; D'Agostino et al., 2020; Knight et al., 2018; MacVittie, 2018; McLeod et al., 2019; Morin, 2017), four articles were peer reviewed and three met the WWC quality standards, indicating that these studies' designs methodologically adhere to the standards.

Comparison of DVs Measured. The current study reported teacher outcomes while Morin's (2017) meta-analysis reported student outcomes. Given this limitation, no comparison of DVs can be made between the studies. The analysis of the teacher outcomes relies on the findings of this meta-analysis and the trend in study DVs following Morin's publication date. From 2016-2020, the current study identified teacher outcomes in the following categories: praise (n = 2), student outcomes (n = 2), OTR (n = 2), implementation (n = 2), and instructional quality (n = 1).
This indicates that the most recent literature base focused on measuring praise, student outcomes, OTR, and implementation.

CHAPTER IV

DISCUSSION

Through recent improvements in digital technology, VA has become a more commonly used tool for improving educator instructional quality (Knight et al., 2012). Currently, the VA literature base provides evidence on how to implement VA as either a teacher preparation tool for preservice teachers or as a professional development tool for inservice teachers and paraprofessionals. The purpose of this SR and meta-analysis was to understand the contribution that VA has made to the field of educator development. After completing a thorough SR of articles published between 2010-2020, a total of 24 articles were identified that matched the inclusion criteria discussed in Chapter III. This chapter (a) summarizes the findings of each research question, (b) addresses the limitations of the SR and meta-analysis, (c) provides implications for future practice, and (d) draws a conclusion about the current use of VA.

RQ 1: What is the status of the literature base on VA regarding study characteristics (i.e., publication type), participant characteristics (i.e., role, education level, experience level, age), student characteristics (i.e., disability type, student outcomes), and setting characteristics (i.e., grade level, group size, type of instruction, setting)?

To better understand the current literature base of SCRD using VA, RQ 1 descriptively analyzes the (a) study and participant characteristics and (b) student and setting characteristics. Each of these characteristics and subcategories is addressed in greater detail below.

Study and Participant Characteristics

The current SR identified 15 (63%) peer-reviewed articles and 10 (42%) dissertations meeting the inclusionary criteria. Additionally, the majority of the studies included inservice teachers (n = 15; 63%), participants with three or more years of experience (n = 17; 71%), participants who were 18-29 years of age (n = 12; 50%), and participants who had bachelor's degrees (n = 13; 54%). These findings suggest that participants in studies using VA are typically inservice teachers with a minimum of three years of teaching experience. This result is consistent with previous research. For example, Webster et al. (2012) found similar results using video self-reflection as part of a treatment package with 51 Head Start teachers who had an average of 10 years of teaching experience. The Head Start teachers were randomly assigned to an experimental group (i.e., immediate or delayed video self-reflection) or a control group. The participants in the immediate and delayed video self-reflection groups increased the number of praise statements given, demonstrating that experienced inservice teachers improved their instructional skills by participating in VA as a professional practice. With more evidence demonstrating its effectiveness for inservice teachers, VA could be used as a professional development tool to help support educators.

Importantly, inservice teachers are not the only educators who interact with students and provide targeted supports. Preservice teachers and paraprofessionals both serve instructional roles and could potentially benefit the most from VA; however, they were less frequently studied. In comparison to inservice teachers, only five studies (21%) included preservice teachers and five studies (21%) included paraprofessionals.
Given these low study numbers, more studies using VA need to include both preservice teachers and paraprofessionals to determine VA treatment effects with all types of educators.

Student and Setting Characteristics

The findings also indicate that the majority of studies (n = 12; 50%) included students identified as having a developmental disability and that about half of the studies (n = 11; 46%) took place in an elementary school. In terms of classroom setting, half of the studies (n = 12; 50%) took place in inclusion classrooms; most studies provided instruction in a small group setting (n = 11; 46%); and over half of the studies (n = 13; 54%) focused on teaching academic skills. Because students in these studies are receiving intervention or special education services and are the most at-risk students, instruction needs to be provided by a highly qualified and trained interventionist who has strong content and instructional knowledge (Johnson et al., 2013). To address this need, VA could be used as a professional development or teacher preparation tool to help support educators with little or no training in education. As a result, studies using VA need to be more inclusive of preservice teachers and paraprofessionals. Furthermore, studies need to include student outcomes to determine the efficacy of VA for at-risk students requiring individualized support.

RQ 2: What is the status of the literature base on the research design quality for the included articles as measured by the What Works Clearinghouse (WWC) design quality standards (i.e., meets, meets with reservations, does not meet)?

Of the 24 included studies, three (13%) met the WWC design quality standards, 14 (58%) met the standards with reservations, and seven (29%) did not meet the standards. These findings demonstrate that approximately one-third of the studies did not meet the WWC standards. These rates are similar to other meta-analyses using SCRD (Barton et al., 2017; Barton et al., 2020; Maggin et al., 2017). For example, Barton et al. (2020) conducted a review of SCRD focused on student play interventions. As part of the study, the authors reviewed the methodological rigor of the 27 included articles using the WWC standards and found that seven (26%) met the design quality standards, eleven (41%) met them with reservations, and nine (33%) did not meet the WWC standards. The WWC standards were designed to address concerns about the reliability and interpretation of visual analysis of SCRD (Horner & Kratochwill, 2012). These findings indicate that the standards are not being implemented regularly and point to the relative immaturity of the methodology. Due to the lack of studies adhering to high-quality design standards in the current study's SR, it can be concluded that more rigorous methods are required in this area of research to support its use as a potential EBP (Horner & Kratochwill, 2012; Odom, 2009). For an approach to be recognized as an EBP, the practice used as a treatment in SCRD must have (a) a minimum of five studies using single-case research methodology published in peer-reviewed journals, (b) a demonstration of a functional relation for each study, (c) variation in a minimum of three different research groups or settings, and (d) documentation of an effect for a total of 20 participants across all studies (Horner et al., 2005; Horner & Kratochwill, 2012), as shown in the sketch below.
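The sketch is a hypothetical illustration in R: the study names, research groups, and counts below are invented for illustration and are not the coded data from this review. It simply tallies the four criteria over study-level metadata.

# Hypothetical metadata (invented for illustration only).
studies <- data.frame(
  study               = c("Study A", "Study B", "Study C", "Study D", "Study E"),
  peer_reviewed       = c(TRUE, TRUE, TRUE, TRUE, TRUE),
  functional_relation = c(TRUE, TRUE, TRUE, FALSE, TRUE),
  research_group      = c("Group 1", "Group 2", "Group 3", "Group 1", "Group 4"),
  n_participants      = c(4, 6, 5, 3, 7)
)

meets_ebp <- function(d) {
  # Criterion (b): keep only peer-reviewed studies demonstrating a functional relation
  d <- d[d$peer_reviewed & d$functional_relation, ]
  c(five_studies        = nrow(d) >= 5,                          # criterion (a)
    three_groups        = length(unique(d$research_group)) >= 3, # criterion (c)
    twenty_participants = sum(d$n_participants) >= 20)           # criterion (d)
}
meets_ebp(studies)
# With the invented metadata above, the practice would fall short only on the
# five-study criterion: five_studies = FALSE, three_groups = TRUE,
# twenty_participants = TRUE.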
This meta-analysis includes SCRDs, which inherently have small sample sizes; thus, multiple studies are necessary to meet the requirement of an adequate sample size (Horner et al., 2005). When applying these criteria within the current review, studies using behavior-specific praise (n = 9) as the dependent variable met the requirements for an EBP. Praise showed promise by having large effect sizes (ES; g = 0.88-2.66), but given the wide confidence intervals, the results should be interpreted with caution. The need to further replicate studies measuring praise and its ES is discussed in greater detail below.

RQ 3: What is the magnitude of effect of VA interventions on the instructional practices of educators?

Although practices using SCRD have a standard of demonstrating a functional relation to be identified as an EBP using visual analysis standards, Horner and Kratochwill (2012) also urge the field to calculate a standardized ES. A standardized ES would allow results from SCRDs to be compared across research design methodologies (Pustejovsky, 2018; Shadish et al., 2015) and further validate the efficacy of the practice. To determine the ES of the included studies, a BC-SMD was used, which allows for a comparison across study designs (Pustejovsky, 2018). This research question examines the magnitude of effect by (a) participant characteristics and (b) type of dependent variable.

ES by Participant Characteristics

When analyzing the participant characteristics (i.e., educator role, age, education level, and teaching experience), findings show that the studies with a large ES (g > 0.80) included the following: Bishop et al. (2015); Capizzi et al. (2010); Coogle et al. (2019); D'Agostino et al. (2020); Englund (2010); Fedders (2011); Hawkins and Heflin (2011); Knight et al. (2018); Morin (2017); Pelletier et al. (2010); Smith (2015); Starling (2015); and Westover (2010). Across the studies with large effect sizes, ten included inservice teachers. Of the studies demonstrating a large ES that did not involve inservice teachers, two (Capizzi et al., 2010; Smith, 2015) included preservice teachers and two (Morin, 2017; Westover, 2010) included paraprofessionals as participants. Of these studies, nine had participants with three or more years of teaching experience, six had participants holding a bachelor's degree or higher, and eight had participants between the ages of 18 and 29.

The studies with a medium effect (g = 0.50-0.79) included the following three studies: Fedders (2011), Pinter et al. (2015), and Smith (2015). Two studies with a medium effect size (Pinter et al., 2015, g = 0.66; Fedders, 2011, g = 0.57) included inservice teachers, one study including preservice teachers (Smith, 2015; g = 0.72) had a medium effect size, and one study including paraprofessionals (Westover, 2010; g = -0.68) had a negative medium effect size. Three studies included participants with no teaching experience and three studies included participants with three or more years of teaching experience. One study had participants with a high school or GED and bachelor's degree, one study had participants with some college experience, and one study had participants who held a master's degree. Figure 1.4 shows the ES based on participant characteristics.
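As a minimal sketch of the magnitude conventions applied throughout this research question (following Cohen, 1988), the banding can be written out directly; the example values below are effect sizes reported elsewhere in this section.

# Magnitude bands used in this chapter: g < 0.20 no effect; 0.20-0.49 small;
# 0.50-0.79 medium; g >= 0.80 large. The absolute value is used so that
# expected decreases (e.g., reprimands) are banded by size, with the sign
# interpreted separately.
classify_g <- function(g) {
  cut(abs(g),
      breaks = c(0, 0.20, 0.50, 0.80, Inf),
      labels = c("no effect", "small", "medium", "large"),
      right = FALSE)
}

classify_g(c(1.73, 0.72, -0.68, -0.13))
# returns: large, medium, medium, no effect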
Although no conclusions can be made as to why studies that included more experienced teachers tended to have a large ES, one reason could be that more experienced teachers, who may have more confidence, were more likely to participate in such a study than teachers with less experience. The large ES could also simply be a function of the larger number of studies that included this population.

ES by Type of Dependent Variable

When examining the ES by dependent variable (DV), results indicated that the effect of VA as an intervention varied by the outcome measure being used. A total of nine DVs (i.e., praise, FOI, student outcomes, instructional quality, negative response, opportunities to respond, errors corrected, redirect, and instructional learning time) were measured across the studies. When examining the large effect sizes (g > 0.80), behavior-specific praise and FOI each had six studies, student outcomes had three studies, instructional quality had two studies, opportunities to respond had two studies, and errors corrected had one study. Fedders (2011) measured negative response, which had a large negative effect (g = -4.70). When examining the medium effect sizes (g = 0.50-0.79), behavior-specific praise, student outcomes, and opportunities to respond had one study each. Additionally, there was one study with a medium negative effect (Westover, 2010; g = -0.68), which measured student outcomes (i.e., no response). No response was defined as the student not responding to the paraprofessional within 10 seconds (Westover, 2010). This student behavior had a negative effect, meaning that the behavior decreased or students responded more quickly, which is the expected trend for a no-response behavior.

Finally, of the remaining studies, results demonstrated that one study had no effect (g < 0.2; Starling, 2015) and five had a small effect (g = 0.20-0.49; Hawkins & Heflin, 2011; MacVittie, 2018; Smith, 2015; Thompson et al., 2012; Westover, 2010). It is important to note that one study (Starling, 2015) measured two DVs that had small negative effects. These DVs were negative specific praise statements (g = -0.02) and reprimands (g = -0.13); it would be expected that these behaviors would decrease once the intervention was introduced, resulting in negative ES.

The DVs with the largest ES were praise and FOI. The six studies that measured praise had moderate to large ES (g = 0.88-2.66). FOI was the DV in six studies with moderate to large ES (g = 1.07-3.64). Interestingly, the confidence intervals for both were quite wide, which is common in SCRD meta-analyses and is present in other studies (Barton et al., 2017; Maggin et al., 2017); however, it does demonstrate considerable variability of the effect. The reason VA may impact these DVs so strongly is that praise and FOI are discrete teaching behaviors that are easily identifiable and measurable, making them ideal DVs for studies. Additionally, these procedural behaviors are easier to implement in comparison to less discrete behaviors, such as redirection and instructional time, which were also DVs in some of the included studies but had smaller ES. Interestingly, there was variability in the effectiveness of VA within studies that measured multiple DVs, indicating that VA may impact some teacher behaviors differently than others.
For example, Smith (2015) examined the use of VA and measured OTR, negative response, instructional learning time, and praise, finding ES of g = 0.72, g = 0.34, g = 0.20, and g = 1.73, respectively. The effect sizes ranged from small (instructional learning time) to large (praise). Similarly, Westover (2010) also measured multiple DVs and obtained ES ranging from small (redirect, g = 0.26) and negative moderate (student outcomes, g = -0.68) to large (student outcomes, g = 1.27; praise, g = 1.83; error correction, g = 1.95; OTR, g = 2.15). These two studies further demonstrate the differing impact of VA on various DVs. Figure 1.3 shows the ES based on all the DVs across studies.

Given that the study characteristics (i.e., participants) and VA interventions were consistent within these studies yet yielded different impacts by DV, the field should carefully consider the role that the type of DV may play in the efficacy of VA. Nagro et al. (2020) advocate for the field of VA to advance its understanding of the practice to include more challenging teaching behaviors, such as classroom management skills, and discuss the challenges of feasibly implementing studies with more complex teaching behaviors. To implement VA with classroom management, Nagro et al. (2020) suggest the following procedures: (a) recording the lesson, (b) reviewing the video while using an observation tool to focus attention on the targeted instructional components, (c) reflecting using a structured graphic organizer, (d) revising instruction for the betterment of students, and (e) repeating the process. Discrete and less complex teaching behaviors, such as praise and FOI, are easily observable and measurable, which may increase reliability as well as utility in those studies compared to more complex instructional behaviors. One way to mitigate this measurement challenge is through the use of a standardized observation tool on which participants and coders are trained to reliability. Tools such as the Quality Intervention Delivery and Receipt (QIDR; Harn et al., 2011), the Classroom Assessment Scoring System (CLASS; Pianta et al., 2008), and the Framework for Teaching (FFT; Danielson, 2011) may assist in evaluating more complex teaching behaviors when used as a graphic organizer to guide reflection, as suggested by Nagro et al. (2020).

RQ 4: How has the literature base on VA changed since 2016 as reported by Morin's (2017) systematic review?

Comparison of the current meta-analysis to the parent study's (Morin, 2017) SR indicates that there has been little change regarding the type of articles and study characteristics (e.g., samples, research design, setting characteristics) examining VA. From 2016-2020, six VA-related articles (i.e., Coogle et al., 2019; D'Agostino et al., 2020; Knight et al., 2018; MacVittie, 2018; McLeod et al., 2019; Morin, 2017) were published. Similar to the parent study, the more recent studies reported that the participants were primarily inservice teachers (n = 5), between 30-39 years of age (n = 4), held a bachelor's degree (n = 5), and had three or more years of experience (n = 5). The most recent studies also primarily took place in inclusion classrooms (n = 5), delivered in small groups (n = 3) or in one-to-one environments (n = 3).
In these settings, teachers provided academic instruction (n = 2), communication instruction (n = 2), or the instruction was not reported (n = 2). The student characteristics also centered on students with developmental disabilities (n = 5). Finally, the most recent studies measured praise (n = 2), student outcomes (n = 2), OTR (n = 1), implementation (n = 1), and instructional quality (n = 1). Although technological advancements have made VA a more feasible tool for teacher development (Knight et al., 2012), these findings show a slight stagnation in the development of the field and indicate the need to increase and extend VA research to address the currently identified gaps.

One consideration would be to examine the reasons why the use of VA has not increased over the years. A plausible explanation is that teachers feel uncomfortable viewing their instruction (Mosley Wetzel et al., 2017). However, as educators watch videos of themselves, they become more comfortable and accustomed to watching themselves teach (Hong & Van Riper, 2016). This exposure to VA and self-reflective practices transforms teachers into lifelong learners (Benedict et al., 2016; Harn & Meline, 2019; Tripp & Rich, 2012) who analyze and adapt their teaching to better support their students. Teachers can become more responsive to their students through the use of VA.

Limitations

There were multiple limitations within this study. At the SR level, limitations occurred regarding access to the resources needed to replicate Morin's (2017) dissertation. Therefore, the current study used similar, but not identical, research and reference databases to those used in the parent study. This altered the articles collected in the primary search and the ancestral, citation, and first author searches. For example, Snyder (2013) was included in the parent study but was not identified in the current SR's collection process; therefore, this study included the article in the full-text review because it met all the inclusionary criteria. Snyder (2013) was included in the descriptive data but was ultimately excluded from the statistical analysis because the study had fewer than three participants. Additionally, Morin (2017) included Lindsey (2013), which could not be obtained for the current study due to database and website restrictions. Finally, two articles were excluded because the video component featured an exemplar teacher rather than the participant (Digennaro-Reed et al., 2010) or because the study took place in a setting outside of the US (Stephenson et al., 2011). These restrictions made it challenging to conduct a direct replication of Morin's (2017) meta-analysis and highlight an important issue in attempting to replicate SRs: the process and procedure of replication studies need to be reproducible (Zwaan et al., 2018).

Additionally, this study used different research and reference databases as well as search engines than the parent study, which resulted in differences between the collected articles. Even when articles were identified in the searches and coded in the title and abstract review, it was challenging to obtain access to the full text of some articles. Articles that were identified from the search but could not be obtained through interlibrary loan (n = 7) were not included; all of these were dissertations. The inclusionary criteria also limited the types of studies that could be examined.
Self-reflection is an essential piece of VA, and the growth of teachers and reflective practices should be examined; yet the SR inclusionary criteria required that the teacher outcomes be observable and measurable. This restricted the ability to determine if the teacher's reflective ability, as a component of VA, had resulted in higher levels of self-reflection.

Due to the limitations of the statistical analysis, I was unable to (a) isolate VA from other treatment packages, (b) examine moderator effects, and (c) calculate the robust variance estimation (RVE). When looking at the studies using VA as an intervention, VA may have been included as part of a treatment package. For example, Coogle et al.'s (2019) study used a treatment package that combined both bug-in-ear coaching and VA reflection. Educators received real-time coaching through bug-in-ear and also received an email with their instructional video, which they were to review and reflect upon. The use of two interventions simultaneously made it challenging to determine whether VA alone or the treatment package (i.e., VA and bug-in-ear coaching) was effective.

Relatedly, given the lack of sufficient data and the small number of studies, Borenstein et al. (2009) recommend not statistically summarizing the moderators and question the reliability of the calculations were they to be conducted. Given the current statistical procedures for meta-analyses, the current study did not calculate moderator effects for SCRD studies using VA. Moderator analysis can be conducted using a t-test, analysis of variance, or regression model to determine the moderating effect of variables such as participant, setting, and student characteristics (Shadish et al., 2014). Regrettably, due to the small sample size and variability in the DVs across the included studies, a moderator analysis could not be conducted.

Finally, newly recommended meta-analytic methods propose calculating the omnibus ES using robust variance estimation (RVE), which accounts for unknown covariance and sampling distributions (Hedges et al., 2010). RVE is used for dependent ES (Fisher & Tipton, 2015; Hedges et al., 2010; Tanner-Smith et al., 2016), which occur in SCRD, to determine the effect of a treatment on different outcomes (Hedges et al., 2010). Typical procedures for a meta-analysis include first calculating the BC-SMD ES of individual DVs within a study and then calculating the RVE for the effect sizes of a DV across studies (a hypothetical sketch of this step appears below). Unfortunately, due to the multiple DVs (i.e., praise, implementation, student outcomes, instructional quality, error correction, instructional learning time, negative response, OTR, and redirect) across the included articles and the low number of studies examining similar DVs, the RVE could not be calculated.

Figure 1.4. A forest plot displaying the BC-SMD ES for individual studies based upon the participant characteristics.

Implications for Future Practice

The results of this SR and meta-analysis identify the issues of small sample sizes and a lack of methodological rigor for SCRD. These two concerns limit the generalizability of VA and prevent it from being classified as an EBP. One way to overcome the issue of a small sample size and lack of rigorous methods is to use alternative research designs as recommended by Odom (2009) and Odom et al. (2005).
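Returning to the RVE step described in the limitations above, the sketch below is a hypothetical illustration of how dependent BC-SMD estimates could be pooled with the robumeta package (Fisher & Tipton, 2015) once enough studies share a DV. The data frame is invented and the sampling variances are placeholders, not values from this review.

# Hypothetical illustration: pooling dependent effect sizes with RVE.
library(robumeta)

es_data <- data.frame(
  study = c(1, 1, 2, 2, 3),                 # multiple DVs nested within studies
  g     = c(1.73, 0.72, 1.83, 2.15, 0.88),  # BC-SMD estimates (Hedges' g)
  v     = c(0.40, 0.35, 0.50, 0.55, 0.30)   # sampling variances (invented)
)

# Intercept-only model with correlated-effects weights; rho is the assumed
# within-study correlation among the dependent effect sizes.
rve_fit <- robu(g ~ 1, data = es_data, studynum = study,
                var.eff.size = v, modelweights = "CORR", rho = 0.8)
print(rve_fit)  # omnibus ES with a robust standard error

With this few studies, the small-sample degrees of freedom would be very low, which is consistent with the decision not to report an omnibus ES in the current study.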
The most common approach when studying VA is the use of SCRD, but using group design or mixed-methods approaches with VA would help the field better understand its actual utility by both participant type (i.e., inservice, preservice, paraprofessional) and dependent variable. By increasing the number of participants in studies using VA, results can be more broadly generalized. These larger and more rigorous studies need to have diverse study characteristics and measure more complex classroom teaching behaviors to determine the effectiveness of VA. Group designs are more appropriate for better understanding the impact of an intervention (Odom, 2009; Odom et al., 2005). This could be applied to VA to better understand the role that study characteristics (i.e., participants) have on different dependent variables (e.g., OTR, praise, instructional quality, etc.).

Conclusion

While the field continues to frequently use VA, the nature of the studies, primarily using single-case methodological approaches, minimizes our ability to call it an EBP. The current study highlights some of the challenges that must be addressed for VA to be considered an EBP. First, to become an EBP, studies using VA as a treatment must diversify the participant, student, and setting characteristics to increase the generalizability of the practice. Future studies should adhere to WWC standards for SCRD to improve our understanding of the utility of VA. Another option is to consider alternative methodologies (e.g., quasi-experimental, group design, etc.) with larger sample sizes and varied study settings to enable different types of analyses that can be used to determine potential moderating variables related to the effectiveness of VA (e.g., type of DV, participant, etc.). Additionally, this study highlights a challenge in truly "replicating" a study because differential access to and use of search engines yields variable access to studies (e.g., dissertations). Finally, the statistical methods for completing a meta-analysis using SCRD are still evolving, so comparing these results to the outdated practice in the parent study is ill-advised.

In relation to the use of VA as an intervention, the measured outcomes (DVs) also need to be linked to student outcomes (Morin, Ganz, et al., 2019). This is particularly critical for professional development tools such as VA that aim to improve instructional skills that result in increased student outcomes. Across the included studies, five different studies (28%) measured student outcomes. More research needs to be conducted to understand under what conditions and in what manner VA can be used to more effectively improve instructional practices and impact student outcomes. In conclusion, VA continues to be a promising practice. Once the previously mentioned challenges are addressed and advancements in statistical analysis are made, VA has the potential to be identified as an effective EBP.

APPENDIX A

QUALTRICS FORM FOR CODING STUDY CHARACTERISTICS

[Qualtrics form pages not reproduced.]

APPENDIX B

QUALTRICS FORM FOR CODING WHAT WORKS CLEARINGHOUSE DESIGN QUALITY STANDARDS

[Qualtrics form pages not reproduced.]

REFERENCES CITED

*Alexander, M., Williams, N. A., & Nelson, K. L. (2012). When you can't get there: Using video self-monitoring as a tool for changing the behaviors of pre-service teachers. Rural Special Education Quarterly, 31, 18-24. https://doi.org/10.1177/875687051203100404

Barton, E. E., Murray, R., O'Flaherty, C., Sweeney, E. M., & Gossett, S. (2020). Teaching object play to young children with disabilities: A systematic review of methods and rigor. American Journal on Intellectual and Developmental Disabilities, 125, 14-36. https://doi.org/10.1352/1944-7558-125.1.14
Barton, E. E., Pustejovsky, J. E., Maggin, D. M., & Reichow, B. (2017). Technology-aided instruction and intervention for students with ASD: A meta-analysis using novel methods of estimating effect sizes for single-case research. Remedial and Special Education, 38(6), 371-386. https://doi.org/10.1177/0741932517729508

Beauchamp, C. (2015). Reflection in teacher education: Issues emerging from a review of current literature. Reflective Practice, 16, 123-141. https://doi.org/10.1080/14623943.2014.982525

Benedict, A., Holdheide, L., Brownell, M., & Foley, A. M. (2016). Learning to teach: Practice-based preparation in teacher education [Special issues brief]. American Institutes for Research. https://ceedar.education.ufl.edu/wp-content/uploads/2016/07/Learning_To_Teach.pdf

*Bishop, C. D., Snyder, P. A., & Crow, R. E. (2015). Impact of video self-monitoring with graduated training on implementation of embedded instructional learning trials. Topics in Early Childhood Special Education, 35, 170-182. https://doi.org/10.1177/0271121415594797

Borenstein, M. (1994). The case for confidence intervals in controlled clinical trials. Controlled Clinical Trials, 15, 411-428. https://doi.org/10.1016/0197-2456(94)90036-1

Borenstein, M. (2009). Effect sizes for continuous data. In H. Cooper, L. V. Hedges, & J. C. Valentine (Eds.), The handbook of research synthesis and meta-analysis (2nd ed., pp. 221-235). Russell Sage Foundation.

Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to meta-analysis: Statistics in practice. Wiley.

Borko, H., Jacobs, J., Eiteljorg, E., & Pittman, M. E. (2008). Video as a tool for fostering productive discussions in mathematics professional development. Teaching and Teacher Education, 24, 417-436. https://doi.org/10.1016/j.tate.2006.11.012

*Capizzi, A. M., Wehby, J. H., & Sandmel, K. N. (2010). Enhancing mentoring of teacher candidates through consultative feedback and self-evaluation of instructional delivery. Teacher Education and Special Education, 33, 191-212. https://doi.org/10.1177/0888406409360012

Calandra, B., Gurvitch, R., & Lund, J. (2008). An exploratory study of digital video editing as a tool for teacher preparation. Journal of Technology and Teacher Education, 16(2), 137-153.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Erlbaum.

Collin, S., Karsenti, T., & Komis, V. (2013). Reflective practice in initial teacher training: Critiques and perspectives. Reflective Practice, 14, 104-117. https://doi.org/10.1080/14623943.2012.732935

*Coogle, C. G., Nagro, S., Regan, K., O'Brien, K. M., & Ottley, J. R. (2019). The impact of real-time feedback and video analysis on early childhood teachers' practice. Topics in Early Childhood Special Education. Advance online publication. https://doi.org/10.1177/0271121419857142

Cook, B. G. (2014). A call for examining replication and bias in special education research. Remedial and Special Education, 35, 233-246. https://doi.org/10.1177/0741932514528995

Cook, B. G., Collins, L. W., Cook, S. C., & Cook, L. (2016). A replication by any other name: A systematic review of replicative intervention studies. Remedial and Special Education, 37, 223-234. https://doi.org/10.1177/0741932516637198
Cooper, H., Hedges, L., & Valentine, J. (Eds.). (2019). The handbook of research synthesis and meta-analysis. Russell Sage Foundation. http://www.jstor.org/stable/10.7758/9781610448864

Covidence systematic review software. Veritas Health Innovation, Melbourne, Australia. Available at www.covidence.org

*D'Agostino, S., Douglas, S. N., & Horton, E. (2020). Inclusive preschool practitioners' implementation of naturalistic developmental behavioral intervention using telehealth training. Journal of Autism and Developmental Disorders, 50, 864-880. https://doi.org/10.1007/s10803-019-04319-z

Danielson, C. (2011). The framework for teaching evaluation instrument. The Danielson Group.

Darling-Hammond, L., Hyler, M. E., & Gardner, M. (2017, June). Effective teacher professional development. Learning Policy Institute. https://learningpolicyinstitute.org/product/effective-teacher-professional-development-report

DeBettencourt, L. U., & Nagro, S. A. (2018). Tracking special education teacher candidates' reflective practices over time. Remedial and Special Education, 40, 277-288. https://doi.org/10.1177/0741932518762573

Dewey, J. (1933). How we think. Prometheus Books.

Digennaro-Reed, F. D., Codding, R., Cantania, C. N., & Maguire, H. (2010). Effects of video modeling on treatment integrity of behavioral interventions. Journal of Applied Behavior Analysis, 43, 291-295. https://doi.org/10.1901/jaba.2010.43-291

*Englund, L. W. (2010). Evaluating and improving the quality of teacher's language modeling in early childhood classrooms [Doctoral dissertation, University of Nevada, Las Vegas]. Digital Scholarship@UNLV. https://digitalscholarship.unlv.edu/thesesdissertations/722/

Etscheidt, S., Curran, C. M., & Sawyer, C. M. (2012). Promoting reflection in teacher preparation programs: A multilevel model. Teacher Education and Special Education, 35, 7-26. https://doi.org/10.1177/0888406411420887

Every Student Succeeds Act of 2015, Pub. L. No. 114-95 § 114 Stat. 1177 (2015). https://www.congress.gov/114/plaws/publ95/PLAW-114publ95.pdf

Fallon, L. M., Collier-Meek, M. A., Maggin, D. M., Sanetti, L. M., & Johnson, A. H. (2015). Is performance feedback for educators an evidence-based practice? A systematic review and evaluation based on single-case research. Exceptional Children, 81, 227-246. https://doi.org/10.1177/0014402914551738

*Fedders, A. M. (2011). The effect of video self-monitoring on novice special educators' implementation of advanced direct instruction reading techniques [Doctoral dissertation, University of California, Santa Barbara]. ProQuest. https://www.proquest.com/docview/923804213

Fernandez, C. (2002). Learning from Japanese approaches to professional development: The case of lesson study. Journal of Teacher Education, 53, 393-405. https://doi.org/10.1177/002248702237394

Fisher, Z., & Tipton, E. (2015). Robumeta: An R package for robust variance estimation in meta-analysis [Statistical package]. https://cran.r-project.org/web/packages/robumeta/vignettes/robumetaVignette.pdf

Gaudin, C., & Chaliès, S. (2015). Video viewing in teacher education and professional development: A literature review. Educational Research Review, 16, 41-67. https://doi.org/10.1016/j.edurev.2015.06.001

Gearing, R. E., El-Bassel, N., Ghesquiere, A., Baldwin, S., Gillies, J., & Ngeow, E. (2011). Major ingredients of fidelity: A review and scientific guide to improving quality of intervention research implementation. Clinical Psychology Review, 31(1), 79-88. https://doi.org/10.1016/j.cpr.2010.09.007
*Hager, K. D. (2012). Self-monitoring as a strategy to increase student teachers' use of effective teaching practices. Rural Special Education Quarterly, 31, 9-17. https://doi.org/10.1177/875687051203100403

Harn, B. (2017). Making RTI effective by coordinating the system of instructional supports. Perspectives on Language and Literacy, 43, 15-18.

Harn, B. A., Forbes-Spear, C., Fritz, R., & Berg, T. (2011). Quality of Intervention Delivery and Receipt (QIDR) observation tool. Eugene, OR.

Harn, B., & Meline, M. (2019). Developing critical thinking and reflection in teachers within teacher preparation. In G. J. Mariano & F. J. Figiliano (Eds.), Handbook of research on critical thinking strategies in pre-service learning environments (pp. 126-145). IGI Global.

*Hawkins, S. M., & Heflin, L. J. (2011). Increasing secondary teachers' behavior-specific praise using a video self-modeling and visual performance feedback intervention. Journal of Positive Behavior Interventions, 13, 97-108. https://doi.org/10.1177/1098300709358110

Hedges, L. V. (1981). Distribution theory for Glass's estimator of effect size and related estimators. Journal of Educational Statistics, 6, 107-128. https://doi.org/10.3102/10769986006002107

Hedges, L. V., Pustejovsky, J. E., & Shadish, W. R. (2012). A standardized mean difference effect size for single case designs. Research Synthesis Methods, 3, 224-239. https://doi.org/10.1002/jrsm.1052

Hedges, L. V., Pustejovsky, J. E., & Shadish, W. R. (2013). A standardized mean difference effect size for multiple baseline designs across individuals. Research Synthesis Methods, 4, 324-341. https://doi.org/10.1002/jrsm.1086

Hedges, L. V., Tipton, E., & Johnson, M. C. (2010). Robust variance estimation in meta-regression with dependent effect size estimates. Research Synthesis Methods, 1, 39-65. https://doi.org/10.1002/jrsm.5

Hong, C. E., & Van Riper, I. (2016). Enhancing teacher learning from guided video analysis of literacy instruction: An interdisciplinary and collaborative approach. Journal of Inquiry and Action in Education, 7(2), 94-110. https://eric.ed.gov/?id=EJ1133602

Horner, R. H., Carr, E. G., Halle, J., McGee, G., Odom, S., & Wolery, M. (2005). The use of single-subject research to identify evidence-based practice in special education. Exceptional Children, 71, 165-179. https://doi.org/10.1177/001440290507100203

Individuals with Disabilities Education Act, 20 U.S.C. § 1400 (2004). https://sites.ed.gov/idea/

Ioannidis, J. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124. https://doi.org/10.1371/journal.pmed.0020124

Ioannidis, J. (2012). Why science is not necessarily self-correcting. Perspectives on Psychological Science, 7, 645-654. https://doi.org/10.1177/1745691612464056

Johnson, E. S., Carter, D. R., & Pool, J. L. (2013). Introduction to the special issue: The critical role of a strong tier 2 system. Intervention in School and Clinic, 48, 195-197. https://doi.org/10.1177/1053451212462877

Kagan, D. M. (1993). Contexts for the use of classroom cases. American Educational Research Journal, 30, 703-723. https://doi.org/10.3102/00028312030004703

Knight, J., Bradley, B. A., Hock, M., Skrtic, T. M., Knight, D., Brasseur-Hock, I., Clark, J., Ruggles, M., & Hatton, C. (2012). Record, replay, reflect. Journal of Staff Development, 33(2), 18-23. https://www.proquest.com/docview/1015816235

*Knight, D., Hock, M., Skrtic, T. M., Bradley, B. A., & Knight, J. (2018). Evaluation of video-based instructional coaching for middle school teachers: Evidence from a multiple baseline study. The Educational Forum, 82, 425-442. https://doi.org/10.1080/00131725.2018.1474985
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159-174. https://doi.org/10.2307/2529310

Lee, G. C., & Wu, C. C. (2006). Enhancing the teaching experience of pre-service teachers through the use of videos in web-based computer-mediated communication (CMC). Innovations in Education and Teaching International, 43, 369-380. http://doi.org/10.1080/14703290600973836

*Leins Dvorchak, J. (2015). Increasing secondary teachers' use of praise with video performance feedback [Doctoral dissertation, University of Pittsburgh]. http://d-scholarship.pitt.edu/25179/

Levy, Y., & Ellis, T. J. (2006). A systems approach to conduct an effective literature review in support of information systems research. Informing Science: The International Journal of an Emerging Transdiscipline, 9, 181-212. https://doi.org/10.28945/479

Lindsey, R. (2014). Increasing the use of prompting strategies: A multiple baseline study across pairs of paraeducators of students with disabilities [Doctoral dissertation, Johns Hopkins University]. ProQuest. www.proquest.com/docview/1507591541

*Lynes, M. J. (2012). The effects of self-evaluation with video on the use of oral language development strategies by preschool teachers [Doctoral dissertation, The University of Utah]. https://www.proquest.com/docview/1012121473

*MacVittie, N. S. (2018). Guided self-reflection with video and changes in teacher instructional behaviors [Doctoral dissertation, George Mason University]. ProQuest. www.proquest.com/docview/2070496459

Maggin, D. M., Pustejovsky, J. E., & Johnson, A. H. (2017). A meta-analysis of school-based group contingency interventions for students with challenging behavior: An update. Remedial and Special Education, 38, 353-370. https://doi.org/10.1177/0741932517716900

Makel, M. C., Plucker, J. A., & Hegarty, B. (2012). Replications in psychology research: How often do they really occur? Perspectives on Psychological Science, 7, 537-542. https://doi.org/10.1177/1745691612460688

Makel, M. C., & Plucker, J. A. (2014). Facts are more important than novelty: Replication in the education sciences. Educational Researcher, 43, 304-316. https://doi.org/10.3102/0013189X14545513

*McLeod, R. H., Kim, S., & Resua, K. A. (2019). The effects of coaching with video and email feedback on preservice teachers' use of recommended practices. Topics in Early Childhood Special Education, 38, 192-203. https://doi.org/10.1177/0271121418763531

Methley, A. M., Campbell, S., Chew-Graham, C., McNally, R., & Cheraghi-Sohi, S. (2014). PICO, PICOS and SPIDER: A comparison study of specificity and sensitivity in three search tools for qualitative systematic reviews. BMC Health Services Research, 14, 579. https://doi.org/10.1186/s12913-014-0579-0

Moher, D., Liberati, A., Tetzlaff, J., Altman, D. G., & The PRISMA Group (2009). Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. PLoS Medicine, 6(7), e1000097. https://doi.org/10.1371/journal.pmed1000097

Moher, D., Tetzlaff, J., Tricco, A. C., Sampson, M., & Altman, D. G. (2007). Epidemiology and reporting characteristics of systematic reviews. PLoS Medicine, 4(3). https://doi.org/10.1371/journal.pmed.0040078

*Morin, K. (2017). The use of video analysis to change special educators' instructional practices: A single-case study and meta-analysis [Doctoral dissertation, Texas A&M University]. Texas A&M University Libraries: OAKTrust. http://hdl.handle.net/1969.1/165113
Morin, K. L., Ganz, J. B., Vannest, K. J., Haas, A. N., Nagro, S. A., Peltier, C. J., Fuller, M. C., & Ura, S. K. (2019). A systematic review of single-case research on video analysis as professional development for special educators. The Journal of Special Education, 53, 3-14. https://doi.org/10.1177/0022466918798361

Morin, K. L., Nagro, S., Artis, J., Haas, A., Ganz, J. B., & Vannest, K. J. (2019). Differential effects of video analysis for special educators related to intervention characteristics, dependent variables, and student outcomes: A meta-analysis of single-case research. Journal of Special Education Technology. Advance online publication. https://doi.org/10.1177/0162643419890250

Mosley Wetzel, M., Maloch, B., & Hoffman, J. V. (2017). Retrospective video analysis: A reflective tool for teachers and teacher educators. The Reading Teacher, 70, 533-542. https://doi.org/10.1002/trtr.1550

*Murphy, A., Robinson, S. E., Cote, D. L., Karge, B. D., & Lee, T. (2015). A teacher's use of video to train paraprofessionals in pivotal response techniques. The Journal of Special Education Apprenticeship, 4(2), 1-19. https://eric.ed.gov/?id=EJ1127774

Nagro, S. A. (2020). Reflecting on others before reflecting on self: Using video evidence to guide teacher candidates' reflective practices. Journal of Teacher Education, 71, 420-433. https://doi.org/10.1177/0022487119872700

Nagro, S. A., & Cornelius, K. E. (2013). Evaluating the evidence base of video analysis: A special education teacher development tool. Teacher Education and Special Education, 36, 312-329. https://doi.org/10.1177/0888406413501090

Nagro, S. A., & deBettencourt, L. U. (2019). Reflection activities within clinical experiences: An important component of field-based teacher education. In Handbook of research on field-based teacher education (pp. 565-586). IGI Global.

Nagro, S. A., deBettencourt, L. U., Rosenberg, M. S., Carran, D. T., & Weiss, M. P. (2017). The effects of guided video analysis on teacher candidates' reflective ability and instructional skills. Teacher Education and Special Education, 40, 7-25. https://doi.org/10.1177/0888406416680469

Nagro, S. A., Hirsch, S. E., & Kennedy, M. J. (2020). A self-led approach to improving classroom management practices using video analysis. TEACHING Exceptional Children, 53, 24-32. https://doi.org/10.1177/0040059920914329

Odom, S. L. (2009). The tie that binds: Evidence-based practice, implementation science, and outcomes for children. Topics in Early Childhood Special Education, 29, 53-61. https://doi.org/10.1177/0271121408329171

Odom, S. L., Brantlinger, E., Gersten, R., Horner, R. H., Thompson, B., & Harris, K. R. (2005). Research in special education: Scientific methods and evidence-based practices. Exceptional Children, 71, 137-148. https://doi.org/10.1177/001440290507100201

O'Donnell, C. L. (2008). Defining, conceptualizing, and measuring fidelity of implementation and its relationship to outcomes in K-12 curriculum intervention research. Review of Educational Research, 78, 33-84. https://doi.org/10.3102/0034654307313793

Olson, J. K., Bruxvoort, C. N., & Vande Haar, A. J. (2016). The impact of video case content on preservice elementary teachers' decision-making and conceptions of effective science teaching. Journal of Research in Science Teaching, 53(10), 1500-1523.
Journal of Research in Science Teaching, 53(10), 1500-1523. Osipova, A., Prichard, B., Boardman, A. G., Kiely, M. T., & Carroll, P. E. (2011). Refocusing the lens: Enhancing elementary special education reading instruction through video self-reflection. Learning Disabilities Research & Practice, 26, 158-171. https://doi.org/10.1111/j.1540-5826.2011.00335.x Parker, R. I., Vannest, K. J., Davis, J. L., & Sauber, S. B. (2011). Combining nonoverlap and trend for single-case research: Tau-U. Behavior Therapy, 42, 284-299. https://doi.org/10.1016/j.beth.2010.08.006 Partin, T. C. M., Robertson, R. E., Maggin, D. M., Oliver, R. M., & Wehby, J. H. (2009). Using teacher praise and opportunities to respond to promote appropriate student behavior. Preventing School Failure, 54, 172-178. https://doi.org/10.1080/10459880903493179 Pashler, H., & Wagenmakers, E.-J. (2012). Editors' introduction to the special section on replicability in psychological science: A crisis of confidence? Perspectives on Psychological Science, 7, 528-530. https://doi.org/10.1177/1745691612465253 *Pelletier, K., McNamara, B., Braga-Kenyon, P., & Ahearn, W. H. (2010). Effect of video self-monitoring on procedural integrity. Behavioral Interventions, 25, 261-274. https://doi.org/10.1002/bin.316 Pianta, R. C., La Paro, K. M., & Hamre, B. K. (2008). Classroom Assessment Scoring System™: Manual K-3. Paul H. Brookes Publishing. *Pinter, E. B., East, A., & Thrush, N. (2015). Effects of a video-feedback intervention on teachers' use of praise. Education and Treatment of Children, 38, 451-472. https://doi.org/10.1353/etc.2015.0028 Pustejovsky, J. E. (2018, February 1). Effect size measures for single-case research: General considerations. Advanced Training Institute on Single-Case Research Methods. https://singlecaseinstitute.uoregon.edu/2018/02/01/effect-sizes-and-single-case-research/ Pustejovsky, J. E. (2020). scdhlm: Estimating hierarchical linear models for single-case designs [Statistical package]. https://cran.r-project.org/web/packages/scdhlm/scdhlm.pdf Pustejovsky, J. E., Hedges, L. V., & Shadish, W. R. (2014). Design-comparable effect sizes in multiple baseline designs: A general modeling framework. Journal of Educational and Behavioral Statistics, 39, 368-393. https://doi.org/10.3102/1076998614547577 Reinke, W. M., Lewis-Palmer, T., & Martin, E. (2007). The effect of visual performance feedback on teacher use of behavior-specific praise. Behavior Modification, 31, 247-263. https://doi.org/10.1177/0145445506288967 Rich, P. J., & Hannafin, M. (2009). Video annotation tools: Technologies to scaffold, structure, and transform teacher reflection. Journal of Teacher Education, 60, 52-67. https://doi.org/10.1177/0022487108328486 Richards, K. A. R., Templin, T. J., & Gaudreault, K. L. (2013). Understanding the realities of school life: Recommendations for the preparation of physical education teachers. Quest, 65, 442-457. https://doi.org/10.1080/00336297.2013.804850 Roberts, C. A., Benedict, A. E., Kim, S. Y., & Tandy, J. (2018). Using lesson study to prepare preservice special educators. Intervention in School and Clinic, 53(4), 237-244. https://doi.org/10.1177/1053451217712974 Robinson, L., & Kelley, B. (2007).
Developing reflective thought in preservice educators: Utilizing role-plays and digital video. Journal of Special Education Technology, 22(2), 31-43. https://doi.org/10.1177/016264340702200203 *Robinson, S. E. (2011). Teaching paraprofessionals of students with autism to implement pivotal response treatment in inclusive school settings using a brief video feedback training package. Focus on Autism and Other Developmental Disabilities, 26, 105-118. https://doi.org/10.1177/1088357611407063 Scheeler, M. C., Ruhl, K. L., & McAfee, J. K. (2004). Providing performance feedback to teachers: A review. Teacher Education and Special Education, 27, 396-407. https://doi.org/10.1177/088840640402700407 Schmidt, S. (2009). Shall we really do it again? The powerful concept of replication is neglected in the social sciences. Review of General Psychology, 13, 90-100. https://doi.org/10.1037/a0015108 Shadish, W. R., Hedges, L. V., Horner, R. H., & Odom, S. L. (2015). The role of between-case effect size in conducting, interpreting, and summarizing single-case research. National Center for Education Research. https://files.eric.ed.gov/fulltext/ED562991.pdf Shadish, W. R., Rindskopf, D. M., & Hedges, L. V. (2008). The state of the science in the meta-analysis of single-case experimental designs. Evidence-Based Communication Assessment and Intervention, 2, 188-196. https://doi.org/10.1080/17489530802581603 Sherin, M., & van Es, E. (2005). Using video to support teachers' ability to notice classroom interactions. Journal of Technology and Teacher Education, 13(3), 475-491. Sims, L., & Walsh, D. (2009). Lesson study with preservice teachers: Lessons from lessons. Teaching and Teacher Education, 25(5), 724-733. Sinclair, A. C., Gesel, S. A., LeJeune, L. M., & Lemons, C. J. (2019). A review of the evidence for real-time performance feedback to improve instructional practice. The Journal of Special Education, 54, 90-100. https://doi.org/10.1177/0022466919878470 *Smith, C. L. (2015). Effects of video feedback and self-assessment on the performance of evidence-based teaching strategies [Doctoral dissertation, University of Georgia]. Athenaeum@UGA. https://athenaeum.libs.uga.edu/handle/10724/34655 *Snyder, C. K. (2013). Effects of training on early childhood special education paraeducators' use of early literacy strategies during book reading [Doctoral dissertation, University of Kansas]. KU ScholarWorks. https://kuscholarworks.ku.edu/handle/1808/15109 Spalding, E., & Wilson, A. (2002). Demystifying reflection: A study of pedagogical strategies that encourage reflective journal writing. Teachers College Record, 104(7), 1393-1421. Stains, M., & Vickrey, T. (2017). Fidelity of implementation: An overlooked yet critical construct to establish effectiveness of evidence-based instructional practices. CBE Life Sciences Education, 16(1), rm1. https://doi.org/10.1187/cbe.16-03-0113 *Starling, N. R. (2015). The effectiveness of video self-modeling on increasing and sustaining teacher use of behavior-specific praise in the alternative classroom [Doctoral dissertation, University of Connecticut]. OpenCommons@UConn. https://opencommons.uconn.edu/cgi/viewcontent.cgi?article=6991&context=dissertations Stephenson, J., Carter, M., & Arthur-Kelly, M. (2011). Professional learning for teachers without special education qualifications working with students with severe disabilities. Teacher Education and Special Education, 34, 7-20. https://doi.org/10.1177/0888406410384407 Tanner-Smith, E. E., Tipton, E., & Polanin, J. R. (2016). Handling complex meta-analytic data structures using robust variance estimates: A tutorial in R. Journal of Developmental and Life-Course Criminology, 2, 85-112. https://doi.org/10.1007/s40865-016-0026-5 Therrien, W. J., Mathews, H. M., Hirsch, S. E., & Solis, M. (2016). Progeny review: An alternative approach for examining the replication of intervention studies in special education. Remedial and Special Education, 37, 235-243. https://doi.org/10.1177/0741932516646081 *Thompson, M. T., Marchant, M., Anderson, D., Prater, M. A., & Gibb, G. (2012).
Effects of tiered training on general educators' use of specific praise. Education and Treatment of Children, 35, 521-546. https://doi.org/10.1353/etc.2012.0032 Tracz, S. M., Daughtry, J., Henderson-Sparks, J., Newman, C., & Sienty, S. (2005). The impact of NBPTS participation on teacher practice: Learning from teacher perspectives. Educational Research Quarterly, 28(3), 35-50. https://files.eric.ed.gov/fulltext/EJ718123.pdf Tripp, T. R., & Rich, P. J. (2012). The influence of video analysis on the process of teacher change. Teaching and Teacher Education, 28, 728-739. https://doi.org/10.1016/j.tate.2012.01.011 Valentine, J. C., Tanner-Smith, E. E., Pustejovsky, J. E., & Lau, T. S. (2016). Between-case standardized mean difference effect sizes for single-case designs: A primer and tutorial using the scdhlm web application. Campbell Systematic Reviews, 12, 1-31. https://doi.org/10.4073/cmdp.2016.1 Van Es, E. A., & Sherin, M. G. (2002). Learning to notice: Scaffolding new teachers' interpretations of classroom interactions. Journal of Technology and Teacher Education, 10(4), 571-596. Watkins, M. W., & Pacheco, M. (2000). Interobserver agreement in behavioral research: Importance and calculation. Journal of Behavioral Education, 10(4), 205-212. https://doi.org/10.1023/A:1012295615144 Weber, K. E., Gold, B., Prilop, C. N., & Kleinknecht, M. (2018). Promoting pre-service teachers' professional vision of classroom management during practical school training: Effects of a structured online- and video-based self-reflection and feedback intervention. Teaching and Teacher Education, 76, 39-49. https://doi.org/10.1016/j.tate.2018.08.008 *Westover, J. M. (2010). Increasing the literacy skills of students who require AAC through modified direct instruction and specific instructional feedback [Doctoral dissertation, University of Oregon]. ProQuest. https://www.proquest.com/docview/749881596 Westover, J. M., & Martin, E. J. (2014). Performance feedback, paraeducators, and literacy instruction for students with significant disabilities. Journal of Intellectual Disabilities, 18, 364-381. https://doi.org/10.1177/1744629514552305 What Works Clearinghouse. (2020). What Works Clearinghouse standards handbook (Version 4.1). U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance. https://ies.ed.gov/ncee/wwc/Docs/referenceresources/WWC-Standards-Handbook-v4-1-508.pdf Yoon, K. S., Duncan, T., Lee, S. W.-Y., Scarloss, B., & Shapley, K. L. (2007). Reviewing the evidence on how teacher professional development affects student achievement (REL 2007-No. 033). U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance. https://files.eric.ed.gov/fulltext/ED498548.pdf Zhang, M., Lundeberg, M., Koehler, M. J., & Eberhardt, J. (2011). Understanding affordances and challenges of three types of video for teacher professional development. Teaching and Teacher Education, 27, 454-462. https://doi.org/10.1016/j.tate.2010.09.015 Zwaan, R. A., Etz, A., Lucas, R. E., & Donnellan, M. B. (2018). Making replication mainstream. Behavioral and Brain Sciences, 41, e120. https://doi.org/10.1017/S0140525X17001972