MISSION ACCEPTED: A CASE STUDY EXAMINING THE RELATIONSHIP OF 
KHAN ACADEMY WITH STUDENT LEARNING 
 
 
 
 
 
 
 
 
 
 
 
 
 
by 
 
GEOFFREY BARRETT 
 
 
 
 
 
 
 
 
 
 
 
 
 
A DISSERTATION 
 
Presented to the Department of Educational Methodology, Policy, and Leadership,  
and the Graduate School of the University of Oregon 
in partial fulfillment of the requirements for the degree of 
Doctor of Education  
 
March 2018 
 
 
DISSERTATION APPROVAL PAGE 
 
Student: Geoffrey Barrett 
 
Title: Mission Accepted: A Case Study Examining the Relationship of Khan Academy 
with Student Learning 
 
 
This dissertation has been accepted and approved in partial fulfillment of the 
requirements for the Doctor of Education degree in the Department of Educational 
Methodology, Policy, and Leadership by: 
  
Kathleen Scalise Chairperson 
Michael D. Bullis Core Member 
Keith Hollenbeck Core Member 
Joanna Goode Institutional Representative 
 
and 
 
Sara D. Hodges Interim Vice Provost and Dean of the Graduate School  
 
Original approval signatures are on file with the University of Oregon Graduate School. 
 
Degree awarded March 2018 
  
 
 
 
 
 
 
 
 
 
 
 
 
 
 
© 2018  Geoffrey Barrett  
  
 
DISSERTATION ABSTRACT 
 
Geoffrey Barrett 
 
Doctor of Education 
 
Department of Educational Methodology, Policy, and Leadership 
 
March 2018 
 
Title: Mission Accepted: A Case Study Examining the Relationship of Khan Academy 
with Student Learning  
 
This study examined the implementation of the website Khan Academy as a primary 
resource for mathematics instruction. Participants were high school students aged 15-18 
years enrolled in the traditional mathematics courses of Algebra 1, Geometry, and Algebra 
2. A pre-test/post-test research design was implemented over the course of a six-week 
period of instruction.  I wanted to examine whether Khan Academy was associated with 
positive learning outcomes over the six-week period as compared to measures of 
normalized growth.  
Additionally, I asked whether a beta program to personalize instruction on Khan 
Academy was associated with statistically significantly better outcomes compared to the 
regular Khan Academy course sequences alone. To address my questions, I randomly 
assigned students into treatment and comparison groups. As a measure of learning growth, 
I used the Northwest Evaluation Association’s Measures of Academic Progress (MAP), 
administered once to establish a pre-treatment baseline and again at the end of the program. 
I compared pre- and post-test means. Overall, I found that students in both groups 
showed positive growth that was statistically significantly different from normal expected 
growth. However, I did not find a statistically significant difference between the two 
groups.   
In terms of practical implementation, the results of this study suggest that use of 
Khan Academy as a primary instructional resource is associated with positive learning 
outcomes in this data set.  Further study with larger sample sizes to confirm these 
preliminary results is recommended. 
 
 
CURRICULUM VITAE 
 
NAME OF AUTHOR:  Geoffrey Barrett 
 
 
GRADUATE AND UNDERGRADUATE SCHOOLS ATTENDED: 
 
 University of Oregon, Eugene 
 University of New Mexico, Albuquerque 
 University of Iowa, Iowa City 
  
 
 
DEGREES AWARDED: 
 
 Doctor of Education, 2018, University of Oregon 
 M.A., Special Education, 2002, University of New Mexico  
 B.A., History (Biology minor), 1989, University of Iowa  
  
 
 
AREAS OF SPECIAL INTEREST: 
 
 Educational Technology 
  
 Education of Homeless Students 
 
 
PROFESSIONAL EXPERIENCE: 
 
 Teacher, West Lane Technology Learning Center, 10 years   
 
 Teacher, Robert F. Kenney Charter High School, 5 years 
  
 
 
 
 
ACKNOWLEDGMENTS 
 
 
 
I wish to express sincere appreciation to my committee chair, Dr. Kathleen 
Scalise, whose guidance in the preparation of this manuscript was essential. Also, special 
thanks are due to Dr. Michael D. Bullis who served for a time as my committee chair and 
offered vital support and mentorship. In addition, I wish to thank Dr. Keith Hollenbeck 
and Dr. Joanna Goode for their insights into this project. West Lane Technical Learning 
Center provided the space and internet resources required to complete this study with 
support from director, Ron Osibov. Finally, I would not have been able to finish this 
project without the support of my wife, Lois, and daughters, Faolan and Siobhan. 
  
 
 
 
 
 
 
 
Dedicated to my wife, Lois Pribble, who inspires me to be better. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
TABLE OF CONTENTS 
Chapter Page 
 
 
I. INTRODUCTION AND LITERATURE REVIEW ................................................ 1 
 Background ............................................................................................................ 1 
      Statement of Problem ............................................................................................. 2 
 Definitions.............................................................................................................. 2 
          
            Blended Learning ............................................................................................. 2 
 
            Online Learning ............................................................................................... 2 
 
       Mastery-based Learning................................................................................... 3 
 
            Student-centered Learning ............................................................................... 3  
 
            Student-directed Learning ................................................................................ 3   
 
      Growth of Online Learning ............................................................................... 3 
 
      Literature Review................................................................................................... 3 
 Effectiveness of Online Learning .................................................................... 4 
           The Pedagogy of Khan Academy ..................................................................... 5 
            Khan Academy Implementation Strategies ..................................................... 8 
            Effectiveness of Khan Academy ...................................................................... 10 
 Research Questions ................................................................................................ 15 
II. METHODS.............................................................................................................. 16 
 Population Sample ................................................................................................. 18 
 Random Assignment ........................................................................................ 19 
      Setting ................................................................................................    21 
      Khan Academy Implementation Strategy .............................................................. 22 
      Measuring Growth ....................................................................................    23 
      Data Collection .....................................................................................    24 
      Data Analysis .......................................................................................    25 
III. RESULTS AND DISCUSSION ............................................................................ 26 
 Effect of MAP Recommended Practice Pilot ........................................................ 27 
 Examining Aspects of Attrition ............................................................................. 28 
 Overall Proficiency Growth ................................................................................... 29 
 Comparison to Expected Growth Norms ............................................................... 29 
 Comparison to Expected Growth Norms ............................................................... 30 
Discussion .............................................................................................................. 34 
 Limitations ............................................................................................................. 36 
IV. DISCUSSION ........................................................................................................ 39 
 Limitations .............................................................................................................    41 
 Implications for Practice ........................................................................................    43 
      Conclusions ............................................................................................................    43 
APPENDIX  .................................................................................................................    49 
 Implementation Guide ...........................................................................................    49 
REFERENCES CITED ................................................................................................ 52 
 
LIST OF FIGURES 
 
Figure Page 
 
 
1. Example of MAP Recommended Pathway............................................................ 18 
 
2. Initial Frequency of Participant Percentile Ranks ................................................. 19 
 
3.   Khan Academy Learning Pathway ........................................................................    23 
 
4.   Percent of participants exceeding normal expected growth ..................................    31 
 
5.   Percentile Rank Range ...........................................................................................    32 
 
6.   Reported satisfaction with learning progress .........................................................    36 
 
7.   Likeliness of watching videos ................................................................................    37 
 
8.   Frequency of seeking instructor assistance ............................................................    37 
 
9.   Frequency of note-taking .......................................................................................    38 
 
10.   Khan Academy Learning Pathway ......................................................................    51 
 
 
LIST OF TABLES 
 
Table Page 
 
 
1. Pre-test/post-test research design .. ........................................................................ 17 
 
2. Comparison vs. treatment, original pairs.. ............................................................. 26 
 
3. Comparison vs. treatment, rematched pairs ........................................................... 29 
4. Overall pre- and post-test means, t-test results ...................................................... 29 
5.  Comparison of RIT growth statistical significance (p-value) by quartile .............    30 
6.   Observed RIT growth vs. expected growth ...........................................................    33 
7.   Comparison of RIT growth statistical significance (p-value) by quartile ..............    34 
8.   Relationship between RIT score change and initial score .....................................    35 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
  
 
CHAPTER I 
INTRODUCTION AND LITERATURE REVIEW  
The purpose of this study is to examine the relationship of Khan Academy with 
learning outcomes for high school math students.  Although Khan Academy is used 
world-wide and its influence in educational settings is growing, there have been few 
examinations of its effectiveness in terms of student learning outcomes. The goal of this 
study is to help fill that gap in the literature. To examine the question 
of Khan Academy’s effects on student learning outcomes, a group of high school-aged 
students engaged in a six-week treatment in which they used the platform as a 
primary resource for acquiring math skills.  
Khan Academy is a non-profit educational organization that offers online tutoring 
in a variety of subject areas.  It is best known for its catalog of mathematics instructional 
videos. Since its beginnings around 2011, it has evolved from a YouTube video 
catalog of discrete mathematical skills into a full-fledged, mastery-based, student-
centered tutorial program. Pertaining to secondary education, Khan Academy offers full 
content courses, referred to as missions, in traditional learning pathways such as Algebra 
1, Geometry, Algebra 2, and Pre-calculus, as well as in an integrated math approach, 
Mathematics 1, Mathematics 2, and Mathematics 3.  Recently, Khan Academy 
introduced the MAP Recommended Practice feature which allows teachers, referred to as 
coaches, to personalize individual learning pathways based on student MAP scores.  
Presently, Khan Academy is used by millions of learners globally and is often 
incorporated into primary and secondary school math programs. Despite its widespread 
use and influence on math instruction around the world, few scholars have focused 
exclusively on whether the use of the platform has a positive effect on students’ learning 
outcomes.   
Statement of Problem 
Educators and researchers have raised concerns about both Khan Academy and 
distance learning, despite the tremendous growth of both. Some educators criticized the 
learning platform for lacking a pedagogical foundation and instructional sophistication 
(Kai, 2012; Strauss, 2012). Another critique of Khan Academy is that teachers often do 
not implement the program in the way originally envisioned by its founder, 
Salman Khan (Cargile & Harkness, 2014). Over time, Khan Academy has revamped 
much of its exercise content and modes of delivery, although many of the videos remain 
unchanged. This study examined the outcomes of Khan Academy as an instructional tool.  
The purpose of this study is not to address all the criticisms of Khan Academy but to 
focus on student learning outcomes. The central question is whether students gain 
proficiency through use of the platform as an instructional tool. 
Definitions 
 To avoid confusion, some commonly used terms will be used according to the 
following definitions. 
Blended Learning. Blended learning refers to a classroom situation in which 
some part of the instruction occurs in a face-to-face setting while some part is computer- 
and internet-based and conducted remotely from a brick-and-mortar location. 
Online Learning.  Online learning occurs entirely through the internet with no 
physical face-to-face interaction between teacher and student.  It should be noted that 
with technology available at this date, the line between blended and online learning can 
be blurred.  Presently available applications support remote face-to-face interactions 
through video conferencing, voice over internet protocol (VoIP), and instant 
messaging.  For the purposes of this study, online learning will refer to instruction that is 
conducted entirely over the internet and not in a physical face-to-face setting.  
Mastery-based Learning. Mastery-based learning focuses on each individual 
student’s learning growth. Rather than proceed along a course of study at a uniform pace 
with all other students, as in a traditional classroom, regardless of whether learning 
growth has occurred, mastery-based learning requires students to achieve a threshold of 
proficiency before moving on to the next task or concept (Kulik, Kulik, Bangert, & 
Bangert-Drowns, 1990). This study will consider mastery-based programs to be 
instruction that requires a student to demonstrate a specific level of proficiency in order 
to progress to the next level.    
Student-centered Learning.  Student-centered learning refers to instruction that 
is specifically and deliberately designed to focus on the needs of the student, including 
consideration of past history of success or lack of success in learning, socio-emotional 
issues impacting performance, and present levels of academic proficiency.   
Student-directed Learning. A student-directed approach allows students 
multiple options in order to reach their learning goals. These options can include how 
much time to spend on a particular learning task, the sequence of task focus, and types of 
learning tasks to access.  
Literature Review and Conceptual Framework 
 
I conducted a web search for peer-reviewed articles with the search terms Khan 
Academy and mathematics utilizing the University of Oregon Library search 
function.  The search returned 215 potentially relevant articles.  After reviewing the 
abstracts, I eliminated commentaries and articles dealing with topics unrelated to this study. 
I retained articles reporting original research. There were 21 articles that met 
my criteria. I then classified each article into the categories Effectiveness of Online 
Learning, Khan Academy Implementation Strategies, Khan Academy Student 
Engagement, and Effectiveness of Khan Academy (2). I review each category below.  
Effectiveness of Online Learning. Studies support the hypothesis that online 
learning is at least as effective as face-to-face instruction. Some moderating factors, such 
as blended learning, may increase the positive effects of online learning, but even in the 
absence of those factors, online instruction as a means of delivering instruction has 
positive support in the literature. A 2013 meta-analysis of 45 studies found that “purely 
online learning has been equivalent to face-to-face instruction in effectiveness” (Means, 
Toyama, Murphy, & Baki, 2013). That study also found that the moderating influence of 
blended learning resulted in higher effects. The authors cautioned against interpreting 
results as suggesting “that online learning is superior as a medium” (Means et al., 2013, p. 
36).  The authors suggest that varying the kinds of learning activities proved most 
effective across strategies. In one example, a quasi-experimental study 
examining the effects of an online math tutoring program, ASSISTments, found that 
students using the program statistically significantly outperformed students in 
the comparison school. The effect was greater for students identified as requiring special 
assistance (Koedinger, McLaughlin, & Heffernan, 2010). 
A key factor to be considered here is the context of this investigation, situated in 
the growth of online or distance learning for education. The growth of Khan Academy is 
embedded in a broader increase in popularity of distance learning. Before the advent of 
the internet, distance learning was most commonly practiced by correspondence. Today it 
is more often the case that instruction is delivered via the internet. Many educational 
companies offer completely online courses. Programs like Odysseyware, Connections 
Academy, and K12 offer full online curricula for kindergarten through high school.  
Other course management systems, many of them free, such as Google Classroom, 
Schoology, and Edmodo, provide platforms for individual teachers to build their own 
online course content.   
Due to the availability of online learning opportunities, high school enrollment in 
online courses has steadily increased in recent years. According to the U.S. Department 
of Education (DOE), 1.3 million high school students were enrolled in distance learning 
classes in 2009-10. Distance learning has also grown in higher education, with an 
estimated 5.8 million students enrolled in at least one online course and 2.8 million 
enrolled in exclusively distance education courses  (Allen, Seaman, Poulin, & Straut, 
2016). Given this growth, it is important to examine the effectiveness of online resources 
available to teachers and students on the internet. 
Effectiveness of Summer Math Programs. One measure of the effectiveness of using 
Khan Academy in a summer math program is to compare the results to other summer 
math programs. An obstacle to making a definitive comparison is the lack of research 
comparing student achievement against normal growth expectations in summer programs. 
Examples of research that measure learning growth with a pre-/post-test design do exist. 
One such study, conducted by researchers at Indiana University, examined the effects 
of a two-week summer program and found that students did experience positive learning 
gains (Timme et al., 2013). In that study, students received instruction in physics, AP 
Physics, pre-calculus, and AP Calculus. The program did not address entire course 
content but focused on prerequisites necessary for success in each regular high school 
course. Researchers found positive results in this study. 
 It is important to note some differences in the design of that program compared to 
the design of the program in this study. The Indiana University program did not attempt 
to deliver a semester-long course in a summer program. Also, the courses addressed by 
that study were higher-level courses for high school students, while this study 
examined the effects of a summer program on growth in required high school math content, 
including Algebra 1, Geometry, and Algebra 2, rather than more advanced content. 
Finally, and perhaps most important for comparison purposes, the pre-/post-test design of 
the Indiana study tested students using identical tests containing content directly 
addressed in the program. The current study compared student growth in the program to 
normal growth expectations using a widely used, standardized measurement instrument, 
the NWEA Measures of Academic Progress. Course content was not specifically tailored 
to address the standardized test, but consisted of the entire course content for a typical 
high school mathematics course.  
Conceptual Framework: The Pedagogy of Khan Academy. The conceptual 
framework of Khan Academy instruction is mastery learning, which is defined above as 
requiring students to achieve a threshold of proficiency before moving on to the next task 
or concept (Kulik et al., 1990). 
The efficacy of mastery-based programs is well-established in the literature. A 
1990 meta-analysis of 108 controlled studies found that mastery-based programs had a 
positive effect on student assessment performance for upper-elementary through college-
level age groups (Kulik et al., 1990). The study did find that students in 
mastery-based programs often take more time to complete the course of study.  Nevertheless, 
the meta-analysis found an average gain of 0.5 standard deviations on final 
examination scores. Further, low-aptitude students benefitted more from mastery-based 
programs than high-aptitude students, although both groups benefitted. Finally, students in 
mastery-based programs tended to be more satisfied with the learning experience than 
students in more traditional settings.  
Khan Academy is a mastery-based, student-directed learning resource.  
Instructors have the ability to set pacing recommendations and assign playlists, but within 
the learning platform students are able to decide for themselves what resources to use and 
when to use them to achieve mastery. For example, students are able to access videos 
which can be watched multiple times or opt to not use videos at all.  They are also able to 
consult example solutions, called hints, to determine for themselves what mistakes they 
are making. Immediate feedback informs the student if they were successful or not.  If 
unsuccessful, the student can access the entire solution and thus learn from his or her 
mistake. Students demonstrate mastery of a skill by providing the correct answer five 
times in a row.  After that, the specific skill is added to the student’s mastery challenges.  
From that point, each correct response on a mastery challenge raises the student’s 
mastery status by one level.  After the student demonstrates consistent competency, the 
skill is upgraded to mastered, indicating the student has acquired this skill and knowledge. 
Incorrect responses on the mastery challenges result in that skill being returned to a 
“Needs Practice” classification and the process begins again.   
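
The mastery progression described above can be sketched as a small state machine. This is an illustrative sketch only, not Khan Academy's actual implementation: the class name, the method names, and the intermediate level names ("Practiced", "Level One", "Level Two") are assumptions introduced for the example. Only the five-in-a-row rule, the one-level promotion per correct mastery challenge, and the return to "Needs Practice" after an incorrect response come from the description above.

```python
from dataclasses import dataclass

# Level names other than "Needs Practice" and "Mastered" are assumed for illustration.
LEVELS = ["Needs Practice", "Practiced", "Level One", "Level Two", "Mastered"]
STREAK_REQUIRED = 5  # five correct answers in a row, per the description above


@dataclass
class SkillProgress:
    """Hypothetical tracker for one student's progress on one skill."""
    level: int = 0   # index into LEVELS
    streak: int = 0  # consecutive correct practice answers

    @property
    def status(self) -> str:
        return LEVELS[self.level]

    def answer_practice(self, correct: bool) -> None:
        """Record a practice response; only counts while the skill needs practice."""
        if self.level != 0:
            return
        self.streak = self.streak + 1 if correct else 0
        if self.streak >= STREAK_REQUIRED:
            self.level = 1  # skill is now added to the mastery challenges
            self.streak = 0

    def answer_mastery_challenge(self, correct: bool) -> None:
        """Record a mastery-challenge response for a skill already practiced."""
        if self.level == 0:
            return  # skill has not yet entered the mastery challenges
        if correct:
            # Each correct response raises the status by one level, capped at Mastered.
            self.level = min(self.level + 1, len(LEVELS) - 1)
        else:
            # An incorrect response returns the skill to "Needs Practice".
            self.level = 0
            self.streak = 0
```

Under these assumptions, a student would need five consecutive correct practice answers followed by three correct mastery challenges to reach "Mastered", and a single incorrect mastery challenge would restart the process.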
Khan Academy Implementation Strategies. Khan Academy is a web-based 
program that is available free of charge to anyone.  It has two main components that are 
regularly accessed by learners. First, it supports a catalogue of videos that contain 
explanations of math concepts as well as procedural algorithms for hundreds of skills.  
Second, it provides problem sets that cover skills and concepts from beginning 
math, such as single-digit addition, through calculus.  Recently, Khan Academy upgraded 
secondary course content to include the traditional pathways Algebra 1 and 2, Geometry, 
Pre-Calculus and Trigonometry, as well as the integrated math pathways more common 
internationally, Mathematics 1, Mathematics 2, and Mathematics 3.  The pedagogy of Khan 
Academy is student-directed, competency-based mastery of identified skills in each 
course content area.  
 The content and framework of Khan Academy supports several different 
implementation strategies.  Those strategies primarily include a) personalized learning 
tool, b) supplemental resource, c) flipped classroom method, and d) primary course 
resource.  Flexible teachers are able to combine and evolve strategies to fit the needs of 
their students (Murphy, Gallagher, Krumm, Mislevy, & Hafter, 2014). I will describe 
each of these below. 
 Informal use of Khan Academy as a personalized learning tool is likely the most 
popular strategy. This type of use involves either an instructor making recommendations 
or individuals independently seeking assistance through the platform. In either case, 
students access specific videos or problem sets to assist their learning as needed.  
Recently, Khan Academy added a feature, still in beta form, that allows instructors to 
match scores obtained on the MAP to playlists generated by the program’s engine.  That 
feature has not been widely used and this study is likely the first academic examination of 
it.  
 Use as a supplemental resource is likely the most common method of 
implementation (Murphy et al., 2014).  In this strategy, a teacher encourages or schedules 
regular use of Khan Academy videos and problem sets as a skill building utility.  
Teacher-led instruction is the primary instructional strategy with Khan Academy as a 
secondary resource. This implementation strategy is conducive to traditional classrooms 
with Khan Academy use occurring outside of class time. Students are typically awarded 
extra credit points for participation. 
 In the flipped classroom method, instructors assign activities on Khan Academy 
for students to complete prior to the scheduled class meeting. The goal is to focus 
classroom time on guided and independent practice instead of lecture, which is replaced 
by use of Khan Academy outside of class. The founder of Khan Academy, Salman 
Khan, initially embraced the flipped classroom as the most effective implementation 
strategy for the platform. 
 Implemented as a primary course resource, Khan Academy serves as both the 
primary method of concept introduction and the source of problem sets for 
practice.  The Khan Academy website allows for tracking of mastery of skills through 
practice and intermediate assessment and re-assessment. Utilizing this method, the 
student learns independently and at their own pace.  The center of focus is on the student 
and the process of learning, rather than on the teacher or style of teaching. 
  
Effectiveness of Khan Academy. Khan Academy is recognized as an important 
tool available to teachers.  For example, one study described the platform as enabling 
“powerful on-line classes” (Ruipérez-Valiente, Muñoz-Merino, Leony, & Delgado Kloos, 
2015).  The influence of Khan Academy extends beyond borders.  A study conducted in 
Chile asserted that it is “beneficial for students’ math skills” (Light & Pierson, 2014).  
Income-strapped India turned to Khan Academy in some communities as a substitute for 
teachers and books (Learning & Subbarayan, 2012).  The perception of Khan Academy 
as an effective educational tool for the teaching of mathematics is widespread. 
Emerging evidence provides some qualified support for that view. For example, a 
2014 statewide pilot study involving almost 6,000 students found a positive relationship 
between use of Khan Academy and proficiency growth.  Despite the recognition of Khan 
Academy and almost universal support for the platform, the literature on its effectiveness 
remains scant.  I will review in detail the largest studies to date, a state-wide pilot 
conducted in Idaho mentioned above and a similar study conducted in California.  
A statewide pilot study conducted in Idaho, Learning Gets Personal (hereafter, 
the Idaho Study) found a positive relationship between student use of Khan Academy and 
proficiency growth as measured by the MAP  (cite: Learning Gets Personal).  Students 
who completed a higher portion of their mission, defined as the assigned course of study, 
showed more score gains than students who completed less.  Percent of mission 
completion also positively correlated with percentile rank improvement, demonstrating 
gains against the normal distribution of MAP test takers.  
The Idaho Study was a large-scale pilot project that included more than 5,000 
participants in grades 3-8 from 43 different schools throughout the state. The duration of 
the study was one school year.  Researchers administered pre- and post-assessments 
using the MAP as the measurement instrument.  Instructor participants were early 
adopters of Khan Academy who agreed to a set of classroom conditions including use of 
Khan Academy at least one hour per week and use of the MAP to measure growth. In 
addition, instructors were required to attend a professional development session at the 
start of the 2013-14 school year and complete weekly surveys on 
implementation.  Beyond those requirements, instructors had wide latitude in terms of 
implementation strategies adopted, which I will discuss below. 
The Idaho Study reported generally positive results for the effectiveness of Khan 
Academy classroom use. Most notably, percent of mission completion showed a positive 
relationship with learning. Students who completed 0-10% of their assigned mission 
achieved expected annual growth, those who completed more than 40% achieved more 
than 1.5 times their expected annual growth, and those who completed more than 60% 
achieved 1.8 times their expected growth.  All groups averaged at least expected growth 
achievement. These results, though positive, should be interpreted with some caution.  
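
The growth multiples reported above express observed growth as a ratio of expected growth. A minimal sketch of that arithmetic follows. The function name and the example scores are hypothetical and are not data from the Idaho Study; the expected growth value is assumed to come from published norms for a student's grade and starting score.

```python
def growth_multiple(pre_score: float, post_score: float, expected_growth: float) -> float:
    """Return observed score growth as a multiple of expected growth."""
    if expected_growth <= 0:
        raise ValueError("expected_growth must be positive")
    return (post_score - pre_score) / expected_growth


# Hypothetical scores chosen for illustration only.
example = growth_multiple(pre_score=220.0, post_score=229.0, expected_growth=5.0)
print(round(example, 2))  # 1.8
```

A value of 1.0 means the student grew exactly as expected, and values above 1.0 indicate growth beyond the norm.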
The generalizability of the Idaho Study is limited by a number of confounding 
factors.  First, as mentioned previously, there was little control over implementation 
strategies.  Khan Academy supports many types of classroom use, from watching videos 
occasionally for extra help to primary classroom resource.  The only requirement of the 
study was that teachers agreed to incorporate use of Khan Academy for one hour per 
week.  Similar to the findings of Koedinger et al. (2010), the results of the Idaho Study suggest 
that more exposure to and engagement with the platform results in a larger positive 
effect.   
Secondly, the Idaho Study experienced a high attrition rate.  While more than 
10,000 students initially participated in the study, 5,304 completed it.  The researchers 
noted several reasons for the high attrition rate: a) some students did not take both the 
pre- and post-assessments, b) some data was lost due to inability to link Khan Academy 
data to MAP scores, c) some student MAP scores were disqualified due to invalid results 
caused by completing the assessment too quickly, and d) some students worked in 
missions that were outside grades 3-8, which was the focus of the study.  These attrition 
rates and rationale do raise some concern about integrity of the results.  I will address this 
issue by running a smaller scale experiment with tighter controls over attrition.   
As a third point, only grades 3-8 were included in the Idaho Study.  At the time of 
that study, Khan Academy did not offer full courses in high school level classes.  Since 
then, a full high school curriculum aligned to the Common Core State Standards has been 
rolled out.  Examining the effectiveness of Khan Academy’s course material on high 
school learning growth would broaden our knowledge of available resources for that age 
group.  
The Idaho study is an important step in furthering our knowledge of the 
effectiveness of Khan Academy.  As such, it provides a baseline for further 
research.  There are still large gaps to fill in our knowledge of Khan Academy’s effects 
on student learning in terms of proficiency growth. One area, in particular, is to focus on 
the effectiveness of particular implementation strategies.  For example, it could be 
hypothesized that, based on the Idaho Study, the deepest implementation strategy that 
results in the most exposure and engagement would yield the largest effect.   
The results of a study conducted by SRI Education generally support the findings 
of the Idaho Study.  Through funding from the Bill & Melinda Gates Foundation, 
Research on the Use of Khan Academy in Schools examined the implementation and the 
effectiveness of Khan Academy in mathematics classrooms (Murphy et al., 2014). Like 
the Idaho Study, the SRI Study found a positive relationship between Khan Academy use 
and test scores. The SRI Study also collected data on teacher perceptions of the 
effectiveness of Khan Academy and found that 80% of teachers reported that Khan 
Academy had a positive impact on students’ conceptual understanding of mathematics.  
The SRI Study was conducted during school years 2011-12 and 2012-2013. 
Researchers included seven sites for 2011-2012 and six sites for 2012-2013, with four 
sites repeating both years.   In the first year, 1,694 students participated, increasing to 
2,246 in the second year.  The study reported that most sites served students from low-
income communities and that several specifically used Khan Academy to address the 
needs of struggling students. The sites included students from three types of schools: 
regular public, independent, and charter. A majority (1,260 of 1,694) of the sample 
population attended regular public schools in 2011-12. In the second year, 47% of the 
participants attended regular public schools.  Student participants were in grades 6 
through 8. 
Similar to the Idaho Study, the SRI Study reported a variety of implementation 
strategies, identified as a) personalized learning tool, b) supplemental resource, c) flipped 
classroom model, and d) primary instructional resource. All sites were reported to use a 
blended learning model. It should be noted that only one school during the two-year 
period of study adopted Khan Academy as a primary instructional resource.  Reasons 
given for that were lack of adequate computer access, content gaps at specific grade 
levels, or both. The study reported that variations in implementation occurred within 
schools, not just across schools, as well as even within single classrooms over time. 
Many teachers adjusted their implementation strategies as they gained proficiency in 
using Khan Academy as an instructional tool.  
One site, identified as Site 2, provided an example of an implementation 
strategy.  Site 2 adopted a competency-based instructional model that focused on self-
pacing and self-directed learning. One of Site 2’s goals was to develop students’ self-
advocacy and independence in preparation for post-secondary educational opportunities. 
The sample size of Site 2 was 200 student participants, 45% of whom qualified for the 
federal free lunch program. Students at Site 2 participated in a daily two-hour math block 
divided evenly between teacher-led instruction and student-directed independent 
learning.  During the independent learning period, students followed Khan Academy 
playlists to access videos and problem sets for practice. Students were allowed to 
progress at their own pace. During “core time,” teacher-led instruction focused on 
deepening conceptual understanding and one-on-one support.  
The SRI Study, like the Idaho Study, found a statistically significant positive 
relationship between both independent variables (minutes spent and problem sets 
completed) and improved test scores.  In the SRI Study, test scores were measured with 
the California Standards Test (CST).  This independent finding, using a different 
measurement instrument, provides support for the Idaho Study’s similar findings.   
Research Questions 
 The discussion above has revealed some areas of further study.  First, there is 
little available data on the effects of Khan Academy on student outcomes.  Two studies 
cited in this review did provide some support for the claims that use of Khan Academy 
enhances outcomes, but neither study attempted to isolate a particular implementation 
strategy.  By isolating Khan Academy as a primary classroom resource, a fuller extent of 
effectiveness can be captured.  Secondly, participant assignment to control and treatment 
groups can allow for a comparison of two approaches, using Khan Academy’s available 
recommended course playlists versus the new beta feature that allows for greater 
personalization by using student MAP scores. Third, collection of learning analytics 
available through the teacher dashboard can shed more light on effective student use of 
the learning resources accessible on the Khan Academy platform.  This study, thus, 
employed the following research questions: 
RQ1: Is there a difference in effects between the control and treatment groups? (Control 
and treatment groups are defined in the Methods chapter.) 
RQ2: Is there a difference in the overall achievement growth between either or both study 
groups and normalized growth expectations? 
RQ3: Is participation in the program associated with positive growth for student learning, 
for this data set?  
RQ4:  Is there a relationship between beginning proficiency level and growth in 
proficiency at the end of the study period? 
CHAPTER II 
METHODS 
This study examined the outcomes of Khan Academy as a primary instructional 
tool for high school mathematics students participating in a hybrid summer program. 
Using the NWEA Measure of Academic Progress (MAP), I examined the differences in 
outcomes between the treatment and a control group assigned to the normal condition, 
and compared the outcomes to normed growth expectations.  Using statistical analyses, I 
examined the growth of students using Khan Academy, including differences between the 
comparison and treatment groups. I also conducted a correlational analysis to determine 
the effect of starting proficiency on the amount of growth achieved during the six-week 
study period. 
Study Design 
 This study employed a pre-test/post-test design to assess the effects of using Khan 
Academy as a learning resource. Participants were administered a pre-test using the 
NWEA MAP assessment before engaging with the treatment, the use of Khan Academy 
as a learning resource. After six weeks, participants were administered the MAP as a 
post-test in order to examine the effects of the treatment. Additionally, students were 
randomly assigned to one of two groups, a comparison group and a treatment group, in 
order to assess effects of a pilot program designed to personalize student course material 
based on MAP results (see Table 1).  
 
 
 
Table 1
Pre-Test/Post-test Study Design
             Pre-Test   Khan Academy   Khan Academy w/MAP recommendations   Post-Test
Comparison   X          X                                                   X
Treatment    X                         X                                    X
 
Participants assigned to the control group were assigned to a standard course of 
study available on Khan Academy. Khan Academy offers the traditional courses in 
Algebra 1, Algebra 2, Geometry, Pre-Calculus, and Trigonometry. In addition to 
traditional courses, the program also offers an integrated math approach with courses 
Mathematics I, Mathematics II, and Mathematics III. For the purposes of this study, each 
student was assigned a course based on the recommendation of their high school 
counselor. As it turned out, all participants in the control group were assigned to 
traditional sequences of Algebra 1, Geometry, and Algebra 2.  
The treatment group was assigned a course of study recommended by their 
counselors but adjusted according to Khan Academy’s MAP recommended practice tool.  
That tool is a beta feature on Khan Academy and is based on each individual MAP 
score. In some cases, the course was adjusted by the researcher to account for sequencing 
and to assure that the student received instruction meeting the required course content per 
state standards. In those cases, elements of the recommended course were combined with 
the required content from the traditional course of study to best scaffold the learning 
process of the student.   
An example of the difference between the comparison and treatment groups is 
as follows. The MAP Recommended Practice tool indicated that a participant whose 
course of study included sequences would benefit from the exercise Math Patterns 2. In 
this specific case, that participant would have Math Patterns 2 added to their course of 
study, which would deviate from the standard course of study (see Figure 1). 
 
Figure 1. Example of MAP Recommended Practice vs. Standard Course of Study 
(Sequences) 
 
A stratified randomized matched pair design was used to assign students to study 
groups. For the stratification, participants were first matched into pairs based on their pre-
test MAP RIT scores and their courses of study. After they were paired, one member of 
each pair was randomly assigned to either the comparison or the treatment group. Note 
that after the treatment was applied, differential attrition was addressed by examining the 
characteristics of the completion groups. More details on the randomization approach and 
the attrition plan are described below in the Population Sample section. 
To continue with the study, treatments were applied, data collected from students 
during the intervention process (which will be discussed below), and then the post-test 
administered and results analyzed. 
Population Sample 
Participants in this study were high school aged students who were referred to the 
summer math program by their school counselors.  Students came from a combination of 
rural schools in Lane County, Oregon and more urban schools located in the city of 
Eugene, and could be expected to be higher risk students for credit denial in mathematics 
due to referral to summer intervention by their counselors. All home schools were public 
with the exception of one private school. Sixty students participated in the summer 
program. Each student’s current level of mathematics achievement was assessed using 
the MAP, which generated a percentile rank range from 5 to 99, indicating a broad range 
of initial ability with bias toward lower achievement (see Figure 2). 
 
Figure 2. Frequency of Participant Percentile Ranks. Results of MAP pre-test by percentile rank.  
Random Assignment. I randomly assigned each participant to the treatment 
group or the control group. In order to control for initial differences, I matched 
participants into pairs of similar pre-test MAP scores.  All scores were sorted by 
percentile rank, then paired with a similar score in the order. Gender and course 
enrollment were considered in matching scores with the similarity in score given the most 
weight.   
In order to allocate students to groups, I conducted a matched pair examination of 
the results obtained in this study.  Participants were first matched into pairs based on their 
pre-test MAP RIT scores and course assignments. Note that these pairs were used only 
for stratification to allocate the sample, and not as a matched-pair design in the analysis. 
(For analysis, the groups were considered equivalent groups at pretest, see results below.)  
After students were paired up to stratify the sample, I randomly assigned one 
member of each pair to the treatment group. The treatment group was assigned a course 
of study based on the MAP Recommended Practice. The comparison group was assigned 
the normal course of study available on Khan Academy.  
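The pairing and assignment steps described above can be illustrated with a minimal sketch. It is not the procedure actually used in this study: it pairs participants on pre-test score alone, whereas gender and course enrollment were also considered here, and the function name, participant IDs, and scores are hypothetical.

```python
import random

def assign_matched_pairs(pretest_scores, seed=0):
    """Pair participants with adjacent pre-test scores, then randomly send
    one member of each pair to the treatment group and the other to the
    comparison group. `pretest_scores` maps a participant ID to a RIT score."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    ordered = sorted(pretest_scores, key=pretest_scores.get)
    assignment = {}
    # Walk the sorted list two at a time so each pair has similar scores.
    for i in range(0, len(ordered) - 1, 2):
        pair = [ordered[i], ordered[i + 1]]
        rng.shuffle(pair)
        assignment[pair[0]] = "treatment"
        assignment[pair[1]] = "comparison"
    # With an odd number of participants, the last one is assigned by coin flip.
    if len(ordered) % 2 == 1:
        assignment[ordered[-1]] = rng.choice(["treatment", "comparison"])
    return assignment

# Hypothetical participants and RIT scores, for illustration only.
scores = {"s01": 212, "s02": 241, "s03": 228, "s04": 230, "s05": 215, "s06": 244}
groups = assign_matched_pairs(scores)
```

Because assignment is random only within pairs, the two groups start with similar score distributions by construction, which is the purpose of the stratification.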
After assignments were completed, I compared the means of each group to 
ascertain whether there was an initial statistically significant difference of means between 
the two groups. The mean pre-test RIT scores for the treatment group (µ = 228.95, SD = 
13.98) and the comparison group (µ = 230.00, SD = 14.21) were not found to be 
statistically significantly different (p = 0.82, α < 0.05). Therefore, the groups were 
considered equivalent groups on the variables of interest following random assignment.  
Ensuring Participant Privacy. All data was de-identified and no individual 
scores are reported in this study. Also, no disaggregated data with a sample size less than 
six is reported.  I conducted this study in a school environment in which all the 
requirements of FERPA are rigorously observed. 
Potential Effects on Participants. Because all students were exposed to the 
state-required content standards, there was little potential harm to the students, either in 
the comparison or the treatment group. At the outset, it was unknown whether one 
instructional strategy was statistically significantly superior to the other, so there was no 
known advantage to being assigned to one group over the other. As the instructor, it was 
my responsibility to ensure that all students received high quality instruction and that 
consideration took precedence at all stages of this study. In keeping to that principle, 
however, there were no instances in which the needs of the study conflicted with my 
professional responsibilities in the carrying out of this study. 
It is possible that knowledge of participating in a study could have a positive 
effect on student motivation. While that might be a factor that impacts the 
generalizability of the results, it is a potential risk that I feel is worth taking. In observing 
students who participated, they did not seem to be concerned about how their work 
affected the study and, indeed, most seemed to forget that they were participants in a 
study and were not mindful of that fact on a day-to-day basis.  
Setting 
The Summer Math Academy (SMA), which is the context of the study here, 
operates out of a small public charter high school in rural Oregon. The SMA is a hybrid 
program combining elements of online distance learning with face-to-face instruction.  
During the course of the study, group instruction did not occur, but one-on-one 
instructional sessions were normal, particularly with students who required extra 
assistance. All students were required to spend at least one 3-hour session at the school in 
person each week; otherwise, they worked from a distant location. Students were able to 
access one-on-one instruction through internet applications such as Google products and 
online whiteboards, as well. One-on-one instruction was delivered as needed, either by 
request of the student or intervention by the instructor. The duration of the program was 
six weeks with an open computer lab Monday through Thursday from 8:30 to 11:30. The 
program used Khan Academy as its primary instructional tool throughout.  
Khan Academy Implementation Strategy 
Khan Academy’s pedagogy is mastery-based, student-centered instruction.  
Students learn through active engagement with the program, accessing videos and hints 
to assist in the learning process. Teachers, termed coaches, act as guides or mentors, 
intervening as needed. KA provides coaches with a variety of tools to monitor student 
progress. They can view whether students are achieving success or struggling, the time 
students spend overall as well as on each problem within an exercise, and which learning 
tools the student accesses. The coach can focus instructional time on students who are 
struggling while students doing well are able to proceed at their own pace. While this 
program has apparent success, it is important to use an outside tool to monitor student 
proficiency growth to corroborate anecdotal observations. 
For the purposes of this study, I employed Khan Academy as the primary 
classroom resource in a blended hybrid summer math program. Students were assigned 
their coursework through Khan Academy. All students received a tutorial on the best 
practices for success on Khan Academy. I instructed students to follow a best practices 
learning strategy following these steps: a) examine given problem, b) determine whether 
they have the background knowledge to attempt a solution, c) attempt a solution.  If the 
solution is correct, the student moves on to the next problem. Otherwise, they are 
encouraged to watch a video first, then attempt a solution. In either case, if the solution is 
incorrect, the student may consult the hints to determine what their mistake was. If, after 
following the recommended learning pathway, the student still requires assistance, they 
are encouraged to solicit assistance from the teacher (see Figure 3).  See Appendix A 
for a full description of the program implementation. 
 
Figure 3. Khan Academy Learning Pathway. 
Measuring Growth 
This study employed the Northwest Evaluation Association Measures of 
Academic Progress as the instrument to measure growth during the six-week period of 
this study. The Northwest Evaluation Association (NWEA) is a “global not-for-profit 
educational services organization.” NWEA developed the Measure of Academic Progress 
as an interim assessment to measure student academic growth over time. MAP 
assessments are computerized adaptive tests (CATs) that report scores based on a linear 
Rasch Unit (RIT) scale. The MAP RIT scale provides a valid and consistent measure of 
academic growth (Wang, McCall, Jiao, & Harris, 2013). I conducted pre- and post-MAP 
assessments with participants in this study to measure the degree of growth during the 
period of the study. 
To assess the appropriateness of utilizing MAP for this study, a search for the use 
of NWEA Measure of Academic Progress (MAP) as a measurement of student academic 
growth was conducted. That search found 31 articles that matched search parameters on 
the UO Library Search engine related to MAP as a measure of growth. Of the 31 
documents, eight were dissertations that used the MAP as a measure of growth. Only the 
abstracts of these eight studies were available for review. One study involved the use of 
MAP to measure the academic growth of students receiving instruction through an 
American Sign Language/English bilingual model (Lange, Lane-Outlaw, Lange, & 
Sherwood, 2013). Due to its use in several peer reviewed studies, I concluded from my 
review of the MAP that it is a suitable instrument for measurement of growth in this 
study. 
Data Collection 
Each participant completed a MAP mathematics pre-test at the start of the six-
week program and a post-test again at the end of the program. Testing conditions 
followed protocols established for the administering of state tests.  Those protocols 
include no electronic devices in the testing area, no discussion or helping on the assessment, 
and use of only materials provided within the computerized testing environment itself (for 
example, calculators). The treatment period lasted six weeks from the end of June 
through the first week of August. I administered the post-test to all participants at the end 
of the six-week period.  
Data Analysis 
In order to determine the effects of Khan Academy use as an instructional tool, 
participant pre-test and post-test scores were compared. The MAP was administered for 
both the pre-test and the post-test.  I examined the difference of means between the 
treatment and the comparison groups and conducted a t-test comparison as a measure of 
the statistical significance. Similarly, I compared the growth of all students on the MAP 
during the six-week period and conducted a t-test comparison of means as a test of 
statistical significance. Finally, I compared the actual growth found to the expected 
growth as defined in the NWEA 2015 MAP Norms for Student and School Achievement 
Status and Growth (Thum & Hauser, 2015).   
The analysis applied in most cases was a one-tailed t-test to examine the 
difference of means between the pre-test and the post-test results. A one-tailed test was 
used because only the amount of positive change was of interest. While negative change is 
possible in some instances, an assumption herein is that students will make either no gain 
or some gain but are not likely to make negative gains after an application of an 
instructional treatment. A Pearson’s correlational test to investigate the possibility that 
there is a negative relationship between initial RIT score and change after the application 
of the treatment was also applied to the data. 
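As an illustration of the correlational analysis described above, the following minimal sketch computes Pearson's r between initial RIT score and RIT change. The scores are hypothetical and chosen only to show a negative relationship; they are not data from this study, and the function name is an assumption for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical pre-test and post-test RIT scores, for illustration only.
pre = [210, 220, 230, 240, 250]
post = [222, 229, 237, 244, 251]
gain = [b - a for a, b in zip(pre, post)]  # change score for each participant
r = pearson_r(pre, gain)  # negative here: lower starters gained more
```

A negative r in such an analysis would indicate that participants with lower initial scores tended to show larger gains.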
 
 
 
 
 
CHAPTER III 
RESULTS  
 Quantitative analyses were conducted to address the research questions 
introduced above. Primarily, results were compared using one-tailed t-test analyses to 
determine whether the observed differences of means between pre-test and post-test 
results were statistically significant. A correlational analysis was performed to determine 
whether the observed change in pre- and post-test scores was related to 
the initial proficiency of the participants.    
Effect of MAP Recommended Practice Pilot.  
The first research question addressed was whether there was a difference in 
outcomes between participants in the comparison versus the treatment groups. A one-
tailed t-test was conducted to compare the mean growth achieved by each group. The 
comparison group achieved a higher mean RIT growth (µ=7.95) than the treatment group 
(µ=6.84). The t-test indicated that there was no statistically 
significant difference between the means (t = 0.401, p = 0.345, α < 0.05; see Table 2).  
Table 2 
Comparison vs. Treatment Pre-test/Post-test Change in Scores, Original Pairs 
 n Mean SD  Lower Upper df t Sig. 
Comparison 19 7.95 8.59  3.80 12.08 18 0.401 0.345 
Treatment 26 6.84 9.08  3.09 10.59 25   
*Statistically significant at α < .05   
 
As can be seen from Table 2 and noted in the Methods chapter, the comparison 
and treatment groups became unbalanced due to attrition.  
In order to complete the analysis on the post-test results as described in the 
Methods section, attrition in the sample was next addressed. Since this was a summer 
program for students at high risk of credit denial in mathematics, it was to be expected 
that the sample group would show substantial attrition and that a substantial number of 
students would not complete the program of study (non-finishers), although exact 
numbers could not be known prior to treatment. This was found to be the case.  
During the course of treatment, 44 students, or 73% of the original sample, 
completed the entire study, including taking both the pre-test and post-test and 
participating in the intervention.  Sixteen students (non-finishers) opted out of 
participation, dropped out of the summer math program, or did not take the post-test. A 
limitation of this study, addressed below and discussed in the Limitations sections, is 
therefore attrition (the study design threat of mortality within the program of study) and 
the degree to which attrition might have taken place differentially in the groups in some 
systematic way that was not random.  
Differential attrition favored the treatment group, with 11 participants leaving the 
comparison group and 5 leaving the treatment group. This difference is possibly of 
interest for future study; however, this study was too small to investigate and interpret it 
meaningfully, and examining it was not part of the study design. In order to examine 
the potential effects of attrition on the characteristics of the groups following treatment, 
finisher students who were paired with non-finishers for the comparison assignments 
were examined for similar characteristics between groups on the elements of 
stratification. Appropriate comparison students were identified for all students except for 
the differential attrition, which could not be addressed due to differences in final sample 
sizes between the two groups. While samples do not need to be equal for the t-tests 
applied in the analysis, this remains a limitation because the two groups could have been 
less equivalent at post-test than originally, introducing some bias in the results. Note that 
missingness at random or not at random was investigated to the extent possible with some 
external indicators described below. 
Examining Aspects of Attrition. The scores of students who opted out of 
participation were not included in any calculations.  However, in order to help estimate 
the scope of missingness not at random from attrition of the non-finishers on the final 
results, I compared initial and final results on the pre- and post-test for finishers and non-
finishers. The 13 non-finishers who had post-test scores available achieved an average 
RIT of 227.26 on the initial MAP assessment as compared to 229.47 for finishers. A t-test 
was conducted to determine the statistical significance of the difference between the two 
groups and a result of not statistically significant (p-value=.33, α<.05) was obtained. This 
helps to support a claim of missing at random between the two groups, but it remains 
weak evidence that should be interpreted cautiously because other factors between the 
two groups may have been different, and data from non-finishers who returned for the 
post-test could have been different from data from non-finishers who did not return for it.  
Analysis of Rematched Pairs. After accounting for attrition, a second t-test was 
conducted on the rematched pairs of data. Results of that analysis were similar to the 
initial test (see Table 3). Again, no statistically significant difference between the 
comparison group and the treatment group was found (p = 0.370, α < 0.05). 
 Table 3 
 Comparison vs. Treatment Pre-test/Post-test Change in Scores, Rematched Pairs 
 n Mean SD  Lower Upper df t Effect size Sig. 
Comparison 19 7.68 7.42  4.10 11.26 18 0.333 0.12 0.370 
Treatment 19 6.79 8.65  2.62 10.96 18    
 *Statistically significant at α < .05   
 
Overall Proficiency Growth.   
The second question addressed was the overall effect of the use of Khan Academy 
on the participants as a whole. To examine that question, the group means of the pre-test 
scores and post-test results of the entire group were compared. The mean of the 
scores obtained on the post-test was observed to be higher than that of the pre-test. A 
one-tailed t-test was conducted to determine whether the difference in means was 
statistically significant.  
The results of the t-test indicated that the difference of means was statistically 
significant. The post-test mean for the entire population of participants (µ = 237.53, SD = 
11.68) was observationally higher than the pre-test mean (µ = 230.42, SD = 15.82). The p-
value obtained was 0.008, indicating significance below an α < 0.05 (see Table 4). Due to 
the small population size, I also calculated a Cohen’s d statistic to determine the effect 
size of this result. I obtained a Cohen’s d of 0.51, which is typically interpreted as a 
moderate effect.  
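The reported effect size can be checked with a short sketch. The formula used to pool the standard deviations is not stated in this study, so the version below, which takes the root mean square of the two standard deviations, is an assumption; the means and standard deviations are those reported in the text above.

```python
import math

def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Standardized mean difference using the root mean square of the two
    SDs (a pooled SD that assumes equal group sizes; an assumption here)."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_b - mean_a) / pooled_sd

# Pre-test and post-test values as reported in the text.
d = cohens_d(mean_a=230.42, sd_a=15.82, mean_b=237.53, sd_b=11.68)
print(round(d, 2))  # 0.51
```

Under this assumed formula, the sketch returns the same value of 0.51 reported above.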
 Table 4 
 Overall Pre- and Post-test Means, t-test Results 
 n Mean SD  Lower Upper t df Effect size Sig. 
Pre-test 45 230.42 15.02  225.85 235.35 -2.48 44 .51 .008* 
Post-Test 45 237.53 11.68  234.09 241.55     
*Statistically significant at α < .05    
 
The possibility that the difference in scores could be statistically significant for 
either the comparison or treatment group was also tested. It was observed that the 
comparison group obtained a higher mean growth than the treatment group. To test the 
possibility that either group, particularly the treatment group, could have obtained 
non-statistically significant results, a separate t-test was conducted. As with the overall 
population, for both the comparison (p = .042, α < .05) and the treatment (p = .041, α < 
.05) groups there was a statistically significant difference in mean scores (see Table 5) 
between the pre-test and the post-test. 
Table 5 
Comparison and Treatment t-test Results, Pre- and Post-test Means 
 n Mean SD  t df Effect size Sig. 
Comparison         
     Pre-test 19 234.42 15.49  -1.78 18 .59 .042* 
     Post-Test 19 242.37 10.98      
Treatment         
     Pre-test 26 227.50 13.97  -1.78 25 .52 .041* 
     Post-Test 26 234.00 10.88      
*Significant at α < .05  
 
Comparison to Expected Growth Norms. To examine the question of whether 
participants achieved overall growth when compared to normal expectations, results from 
this study were compared to the normal expected growth as calculated by NWEA. 
NWEA publishes a norms study periodically that can be used to predict expected growth 
over varying time periods.  Statistics are published for 10th grade expected growth over 
three time periods, a) fall to winter, b) winter to spring, and c) fall to spring (Thum & 
Hauser, 2015).  Because the participants were high school aged students who entered the 
summer program immediately at the conclusion of the regular school year, the best 
comparison period is winter to spring. The rationale was to avoid comparison to a 
statistic that included recapturing losses from a long summer of no instruction. However, 
the comparisons to expected norms for all periods were calculated.   
Determining Normal Growth.  As mentioned above, the most appropriate 
statistic to use as a comparison for this study is the Winter to Spring expected growth, 
which NWEA publishes to be 0.85 RIT points. Initially, it was observed that most students 
(78%) achieved RIT growth higher than the 0.85 expected growth. Additionally, most 
students (69%) scored higher than expected growth for an entire school year from fall to 
spring semester (see Figure 4). 
 
Figure 4. Percent of participants exceeding normal expected growth for winter to spring 
semester (0.85) and for one school year (2.31). 
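The percentages summarized in Figure 4 amount to counting how many participants exceeded each expected-growth value. A minimal sketch follows; the list of gains is hypothetical and is not the study data, while the two thresholds are the NWEA values cited above.

```python
def percent_exceeding(gains, expected):
    """Percentage of participants whose RIT growth is above `expected`."""
    return 100 * sum(g > expected for g in gains) / len(gains)

# Hypothetical RIT gains for eight participants, for illustration only.
gains = [-3, 0, 1, 2, 5, 8, 12, 15]
winter_to_spring = percent_exceeding(gains, 0.85)  # 6 of 8 exceed 0.85
fall_to_spring = percent_exceeding(gains, 2.31)    # 4 of 8 exceed 2.31
```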
 
Observing the change in percentile rank frequency further suggests that use of 
Khan Academy has a positive effect on student learning outcomes (see Figure 5). The 
frequencies of percentile ranks have shifted to the right when compared to initial 
frequencies. 
 
To determine whether this observation indicated statistically significant growth 
for the six-week period, a single sample t-test was conducted comparing the mean growth 
of the overall population and the two assigned groups to the expected growth. The results 
are summarized in Table 6. Both the comparison group and the treatment group achieved 
statistically significantly more growth than expected (p ≤ 0.004 for each group, α < 0.05). 
This was the same result for all comparisons, including Fall to Winter (1.46) and Fall to 
Spring (2.31).  In each case, there was a statistically significant difference between the 
mean growth obtained and the expected value (p ≤ 0.022, α < 0.05). 
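The single-sample t statistic used in this comparison can be sketched as follows. The gains are hypothetical, so the output is not expected to reproduce the values in Table 6; the function name is an assumption for illustration, and the expected value of 0.85 is the Winter to Spring norm cited above.

```python
import math
import statistics

def one_sample_t(gains, expected):
    """t statistic and degrees of freedom for testing whether the mean
    gain differs from a fixed expected growth value."""
    n = len(gains)
    mean = statistics.mean(gains)
    sd = statistics.stdev(gains)  # sample SD (n - 1 denominator)
    t = (mean - expected) / (sd / math.sqrt(n))
    return t, n - 1

# Hypothetical RIT gains, for illustration only.
gains = [2, 4, 6, 8, 10]
t, df = one_sample_t(gains, expected=0.85)
```

A one-tailed p-value would then be read from the t distribution with the returned degrees of freedom, typically with a statistical package.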
 
 
 
 
 
 
Table 6 
Observed RIT Growth vs. Expected Growth 

                        Mean Growth 
                     Expected  Observed    n     SD      t    Effect size   Sig.* 
Winter to Spring       0.85 
     Overall                     7.48     45    8.71   4.97      .91       <0.001* 
     Comparison                  7.95     19    8.59   3.50      .99        0.002* 
     Treatment                   6.84     26    9.04   3.23      .80        0.004* 
Fall to Winter         1.46 
     Overall                     7.48     45    8.71   4.51      .80       <0.001* 
     Comparison                  7.95     19    8.59   3.20      .87        0.004* 
     Treatment                   6.84     26    9.04   2.90      .70        0.008* 
Fall to Spring         2.31 
     Overall                     7.48     45    8.71   3.86      .59       <0.001* 
     Comparison                  7.95     19    8.59   2.78      .64        0.012* 
     Treatment                   6.84     26    9.04   2.44      .50        0.022* 
 *Statistically significant at α < .05 
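
For readers who wish to check a comparison of this kind, a one-sample t statistic can be computed directly from a mean, a standard deviation, and a sample size. The sketch below is illustrative only. The function name is hypothetical, and the effect size is computed as Cohen's d, which is an assumption because the text does not state which effect-size measure was used. Because the inputs are rounded summary values, the output need not match the entries in Table 6.

```python
import math


def one_sample_t(mean, sd, n, expected):
    """Return (t, df, d) for a one-sample t-test from summary statistics.

    d is Cohen's d (mean difference divided by the sample SD); the choice of
    Cohen's d is an assumption, not something stated in the study.
    """
    if n < 2 or sd <= 0:
        raise ValueError("need n >= 2 and a positive standard deviation")
    standard_error = sd / math.sqrt(n)
    t = (mean - expected) / standard_error
    d = (mean - expected) / sd
    return t, n - 1, d


# Rounded summary values for the overall group (Table 6) against the
# Winter to Spring expected growth of 0.85.
t, df, d = one_sample_t(mean=7.48, sd=8.71, n=45, expected=0.85)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")
```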
 
Effect of Initial Proficiency on Learning Outcomes 
Another important question to consider was whether participants with different 
levels of math achievement benefited differently from the Khan Academy program. The 
MAP Recommended Practice was expected to affect students with lower levels of achievement 
more than students who were at or above proficiency for their level of math instruction. 
To examine this question, participants in the treatment group with a RIT score below the 
10th grade mean of 234 were compared with the corresponding participants in the 
comparison group. Again, the mean RIT change within the comparison group (µ = 13.78) was 
higher than the mean for the treatment group (µ = 8.8), but the difference was not 
statistically significant (p = 0.105, α = 0.05). 
Related to the previous question, another consideration is whether participants 
with lower initial proficiency would benefit more or less than those with higher initial 
proficiency. To examine that question, the data were divided into groups based on 
pre-test RIT scores. Two comparison analyses were conducted. The first compared the 
growth of students who initially scored below the average for 10th graders (RIT score of 
234) to the growth of those who scored at or above it. Second, the data were divided into 
quartiles and the mean growth of each group was again compared. The results of these 
comparisons are shown in Table 7.  
 The relationship between initial proficiency and amount of change in RIT score 
was examined by conducting independent one-tailed t-tests (see Table 7). The difference 
in means was statistically significant only for students who scored less than 234 on the 
pre-test. Although positive change in RIT scores was observed for all groups, only 
students with lower initial proficiency achieved statistically significant results.  
Table 7 
Comparison of RIT Growth Statistical Significance (p-value) by Quartile 

                 n   Pre-test  Post-test  Mean Change     t     df  Effect size    Sig. 
<234            26    220.62    231.15       10.54      -3.82   25     1.08      <0.0001* 
234 and more    19    244.79    247.47        2.68      -0.94   18     0.31       0.177 
Quartile 1      11    209.45    225.64       16.18      -5.10   10     2.28      <0.0001* 
Quartile 2      11    227.55    234.45        6.91      -2.76   10     1.24       0.006* 
Quartile 3      11    234.90    238.60        3.55      -1.64   10     0.89       0.573 
Quartile 4      12    250.27    253.55        3.27      -1.20   11     0.54       0.122 
 *Statistically significant at α < .05 
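
The grouping step described above can be sketched in a few lines. The example below splits participants at the 234 cutoff and computes a within-group t statistic on the pre-to-post change. Treating each comparison as a paired (pre/post) test is an assumption, suggested by the degrees of freedom in Table 7 being n − 1. All function names and score values are hypothetical and are not study data. The change is computed as post minus pre, so gains give positive t values, whereas Table 7 reports negative values.

```python
import math
import statistics


def paired_t(pre_scores, post_scores):
    """Return (mean change, t, df) for a within-group pre/post comparison."""
    changes = [post - pre for pre, post in zip(pre_scores, post_scores)]
    n = len(changes)
    if n < 2:
        raise ValueError("need at least two participants")
    mean_change = statistics.mean(changes)
    sd_change = statistics.stdev(changes)  # sample SD (n - 1 denominator)
    if sd_change == 0:
        raise ValueError("all changes are identical; t is undefined")
    t = mean_change / (sd_change / math.sqrt(n))
    return mean_change, t, n - 1


def split_by_cutoff(pre_scores, post_scores, cutoff=234):
    """Split paired scores into (below cutoff, at or above cutoff) groups."""
    below_pre, below_post, above_pre, above_post = [], [], [], []
    for pre, post in zip(pre_scores, post_scores):
        if pre < cutoff:
            below_pre.append(pre)
            below_post.append(post)
        else:
            above_pre.append(pre)
            above_post.append(post)
    return (below_pre, below_post), (above_pre, above_post)


# Hypothetical pre/post RIT scores for illustration only (NOT study data).
pre = [210, 220, 228, 232, 236, 245, 250, 255]
post = [226, 229, 236, 235, 240, 246, 254, 256]

lower, upper = split_by_cutoff(pre, post)
for label, (g_pre, g_post) in (("<234", lower), ("234 and more", upper)):
    mean_change, t, df = paired_t(g_pre, g_post)
    print(f"{label}: mean change = {mean_change:.2f}, t({df}) = {t:.2f}")
```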
 
The results of the comparison discussed above suggested that a relationship may exist 
between initial proficiency and the amount of change experienced by participants. A 
Pearson’s correlation was conducted to examine the relationship between initial RIT 
scores and RIT score growth (see Table 8). The analysis obtained a statistically 
significant negative correlation (r = -0.622), indicating a negative relationship between 
initial proficiency level and the amount of positive RIT score change achieved. In other 
words, students with lower initial proficiency tended to make larger gains while using 
the Khan Academy program. 
Table 8 
Relationship between RIT Score Change and Initial Score 

                                              Score Change 
Pre-test RIT Scores    Pearson Correlation      -0.622** 
                       Sig. (1-tailed)           0.000 
                       N                         44 
**Relationship is statistically significant at the 0.01 level (1-tailed) 
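
As an illustration of the statistic reported in Table 8, the sketch below computes a Pearson correlation from paired values and converts a correlation into the t statistic commonly used to test it, t = r·sqrt((N − 2) / (1 − r²)) with N − 2 degrees of freedom. The function names and the paired data are hypothetical and are not study data; only r = -0.622 and N = 44 come from Table 8.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient for paired values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """Return the t statistic (df = n - 2) for testing a correlation r."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical pre-test scores and score changes (NOT study data).
pre_scores = [210, 225, 235, 250]
score_changes = [16, 7, 4, 3]
print(f"r (hypothetical data) = {pearson_r(pre_scores, score_changes):.3f}")

# Values reported in Table 8.
print(f"t(42) for r = -0.622, N = 44: {t_from_r(-0.622, 44):.2f}")
```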
 
 In this section, I have analyzed the relationship between initial RIT score levels and 
learning outcomes. The results suggest that learning outcomes are related to 
initial proficiency levels for this data set, in that participants with lower proficiency 
achieved larger gains than those with higher proficiency.  
Participant Perceptions of Learning Progress 
 Each week students were provided an opportunity to complete a voluntary survey 
to provide feedback on their perceptions of their learning progress and what tools they 
utilized to help them acquire new concepts and skills. Generally, students reported 
satisfaction with their progress (see Figure 6). When asked, “How would you rate your 
learning progress since last reflection?”, nearly 90% of responses reported being 
satisfied (42%) or very satisfied (46%).  
 
Figure 6. Participant reported satisfaction with learning progress. 
 Another aspect of student learning examined was how participants utilized Khan 
Academy to learn the concepts and skills. As noted in Chapter 2, participants were 
provided an orientation on a specific learning pathway that included using Khan 
Academy tools before asking the instructor for assistance. In order to examine whether 
students utilized the prescribed pathway, the weekly survey asked participants to provide 
feedback on their own learning pathway. Participants reported that they were likely 
(27%) or very likely (57%) to watch a video when stuck on a problem (see Figure 7). 
A majority of students reported that they seldom (39%) or never (20%) asked the 
instructor for help (see Figure 8). 
 
Figure 7. Participant likeliness of watching video as a learning strategy. 
 
Figure 8. Participant responses, frequency of asking for help 
 An indicator of engagement with the learning tools on Khan Academy is whether 
participants were actively interacting with the materials. In order to gain some insight 
into that question, participants were asked if they regularly took notes from the videos or 
the textual explanations. Most respondents indicated that note-taking was a regular 
activity, reporting that they often (41%) or always (38%) took notes (see Figure 9). 
 
Figure 9. Participant response, frequency of note-taking 
 In open-ended reflections on their learning experiences, participants commented 
on obstacles to progress and strategies they could use or did use to overcome them. One 
participant commented that “taking more notes” was one way to improve their 
understanding of the math concepts. Several participants echoed that sentiment as well as 
“slowing down and taking my time.” Other students noted the issue of taking more time 
to learn the concept and being able to “stop relax and just work on my math” and “take 
just a little more time and ask for help when needed.” Several participants reflected on 
their perseverance as an important aspect of learning. For example, one respondent 
commented that one strategy used was to “stick to a problem until I have succeeded” and 
another added, “not give up and take a deep breath and ask for help.”  Overall, 
respondents indicated that taking notes, watching videos, using hints on Khan Academy, 
and asking the instructor for help were all important strategies. 
 
CHAPTER IV 
DISCUSSION 
Overall, the results of this study suggest that use of Khan Academy is associated 
with a positive gain in learning outcomes. Used exclusively as the primary classroom 
resource, most participants in this study showed positive gains that exceeded predicted 
growth. Participants achieved an average growth of 7.5 RIT points between the pre-test 
administration and the post-test.  Compared to a normal growth expectation, as calculated 
by NWEA, of 0.85 for a semester of work, this gain is approximately 8.8 times the expected 
growth. These results suggest further that use of the Khan Academy platform may be 
especially beneficial to students who are behind grade level in proficiency.  
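
The ratio of observed to expected growth can be checked with simple arithmetic. The short sketch below divides the overall observed mean from Table 6 (7.48) by each of the three NWEA expected-growth values cited in this study. The variable names are illustrative only.

```python
# Overall observed mean growth and NWEA expected growth, as reported in Table 6.
observed_mean_growth = 7.48
expected_growth = {
    "winter_to_spring": 0.85,
    "fall_to_winter": 1.46,
    "fall_to_spring": 2.31,
}

# Ratio of observed growth to each expected-growth norm.
ratios = {
    period: observed_mean_growth / norm
    for period, norm in expected_growth.items()
}

for period, ratio in ratios.items():
    print(f"{period}: {ratio:.1f} times expected growth")
```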
On the question of the effectiveness of alignment with the MAP Recommended Practice 
pilot, the results did not support the conclusion that use of the 
pilot benefited students beyond use of the regular Khan Academy program itself, when the 
program was employed under best practices with skilled teacher guidance and sufficient 
teacher time available to do the differentiated instruction manually. There was no 
statistically significant difference between outcomes of best practice teacher use of 
differentiated instruction and the automated program. However, since many students may 
not have access to best practice teacher use of differentiated instruction, or teachers may 
not have sufficient time to prep differentiation for all students, such as was done for the 
control group here, the association of the automation with the same level of gains was 
impressive. It points to use of the new automatically differentiated platform potentially as 
a support for teachers engaged in mathematics instruction, especially in remediation with 
high-needs students as in this study that employed a sample of students directed to the 
program for additional summer support to improve their limited school-year gains. 
However, due to the small sample size, no general conclusions should be drawn on the 
effectiveness of the MAP Recommended Practice pilot, and more examination of this 
question is recommended.  
In order to control for the possibility that some students might exert more effort 
on the post-test than the pre-test, a comparison of results of students who completed 
sufficient instructional work to earn credit with those who did not was conducted. 
Earning credit was a potential incentive for all participants.  Of the 44 finishers, 36 
earned credit and 8 did not. It should be noted that the pool of non-credit earners is very 
small, making conclusions difficult to draw. Nonetheless, a t-test on the difference in 
mean growth between the two groups, 9.0 for credit earners vs. 1.0 for non-credit earners, 
found a statistically significant difference.  It should be noted that even at an average of 
1.0 RIT growth, non-credit earners achieved the expected growth for a semester of 
instruction. The result should only be tentatively adopted, but it does provide an 
interesting point for further examination.  
Participants’ perceptions of their own progress were generally positive and mirrored 
the actual progress measured. For example, approximately 88% of participants reported 
being satisfied or very satisfied with their progress while 78% were found to have 
achieved at least one semester of growth. Perceptions were slightly more favorable than 
actual observed results, but a finding that high school students who have a history of 
struggling with mathematics reported satisfaction in their learning results is important. In 
terms of gaining some insight into the learning pathways of students, many respondents 
on the weekly feedback surveys expressed that having more time, taking more notes, 
watching videos and asking for help were helpful in making progress. Common themes 
expressed were that taking time, not giving up, and not being afraid to make mistakes 
were important to learning. 
Limitations 
There are a number of limitations that should be considered when interpreting the 
results of this study.  Most obviously, this study involved a small number of participants. 
With only 44 participants, the results are very tentative. In addition, 16 of the initial 60 
participants did not complete the study, introducing a possible “hardy survivor” effect.  
Another limitation is that while this study suggests that Khan Academy is a viable 
and effective resource for math instruction, there was no comparison to other resources.  
A possible area for further examination would be to compare Khan Academy use to other 
online and face-to-face resources and methods. 
Participants were in a focused program of primarily math instruction. Some 
participants were dual enrolled in a second class, but even so, two classes at a time is 
substantially fewer than what students normally take in a regular 
school year environment. This study did not attempt to compare results of students with 
two classes versus one class. The freedom to focus on just one class could impact these 
results and further investigation is needed to make any firm conclusions on this point. 
Differential attrition between the two study groups, as described in the Results 
section, was analyzed here to gauge the degree of missingness at random. While evidence 
of systematic attrition was not found in the approach used, the approach was limited and 
therefore caution should be exercised in interpreting results. This remains a limitation of 
the study that was not possible to address, given the sample and fidelity of outcomes, and 
would need to be studied in a larger intervention. 
Additionally, there was no formal control of the teacher impact on growth. The 
researcher was the instructor for all participants in the study. The teacher effect in this 
case was mostly controlled due to the online nature of the program. Instruction was 
provided primarily through Khan Academy itself and only secondarily and in a support 
role by the instructor. It should be noted, though, that the courses students engaged in as 
well as the design of the course including sequencing of activities was determined by the 
instructor.  Further study in this area is difficult due to the responsibility to do what is 
best for learners and not to subject students to less than optimal practices for the purposes 
of scientific inquiry. 
Another limitation is that no subgroup or other demographic data were collected 
or analyzed for this study.  A suggestion for future studies is to include such data. In 
particular, it is important to know if English language proficiency is a factor that could 
lead to statistically significant differences in student learning. Another factor to consider 
is gender differences and whether males or females respond differently to the treatment.  
Finally, it is important to mention that the researcher in this study is also the 
instructor. While the implementation strategy was purposefully designed to encourage 
student-directed learning and minimize the instructor role, it is important to keep in mind 
that the instructor is proficient in the use of Khan Academy as well as integration of web-
based learning platforms into classroom instruction. Further study should make attempts 
to control for instructor effects on learning outcomes.  
 
Implications for Practice 
 The results of this study suggest several implementation recommendations for 
schools and districts to consider: 
• Remedial programs to boost students currently below grade level. 
• Primary instructional resource in alternative education settings. 
• Allow sharing students’ Khan Academy progress between schools. 
An important consideration when implementing a program like this is the role of 
the instructor. Khan Academy is a resource and an instructional tool. It is not 
suggested here that this or any other computer or web-based program can replace the 
role of the instructor in a classroom. While the role for an instructor using Khan 
Academy as a primary resource may shift from lecturer to guide, it is essential that 
the instructor continually monitor progress, provide encouragement, and intervene as 
students navigate the program. As one participant stated when asked what helped 
them succeed, “by just being there when I need you.”  
Conclusions  
Khan Academy is a web-based computer application that allows users to learn and 
practice mathematics skills and concepts. It is a widely known program and is 
increasingly utilized in classrooms around the world. Despite this popularity, the 
evidence base for the effectiveness of the platform is lacking. This paper attempted to 
make a small contribution toward filling that gap in the research.  
 In addition to the lack of research addressing the effectiveness of Khan Academy, 
there is concern that teachers have not implemented it in the most effective manner. Sal 
Khan, the founder of Khan Academy, recommended implementation as part of a flipped 
classroom methodology in which students watch videos prior to class meetings and then 
use in-class time to practice the skill and receive assistance from the instructor. Many 
classrooms are unable to implement such a strategy fully because some students do not 
have access to computers or the internet outside of school hours. This study attempted to 
look at a more comprehensive implementation strategy: the use of Khan Academy as a 
primary resource in a student-centered, self-directed classroom. 
The previous large-scale studies found promising effects of using Khan Academy 
but in each case there was no control over implementation strategies. The platform can be 
used in many different ways and disentangling its effects from those strategies was not a 
focus of those studies. Also, previous studies found that even within individual 
classrooms, the implementation strategy was not consistent. In many cases, teachers 
began employing Khan Academy resources more extensively as they became more 
familiar with them.  
 In addition to the above issues, some of the strongest elements of Khan Academy 
implementation are not accounted for in those studies. The pedagogy of Khan Academy 
encourages student-centered and student-directed learning. It makes sense that the best 
use of the platform would be divorced from a regular classroom format and schedule, 
allowing students to proceed at their own pace. In most cases, the studies cited in this 
report studied the use of Khan Academy as an extra resource in a classroom, not as a 
primary instructional tool. 
 This study sought to address the issues described above. Khan Academy was 
introduced to participants as a stand-alone, primary resource. Participants were tutored on 
best practices for using the platform and then engaged in independent learning regularly 
over a six-week period with limited guidance, assistance, and direction from the 
instructor. The results were measured and are herein reported.  
This study also examined a beta tool available in Khan Academy called the MAP 
Recommended Practice. In order to study that question, participants were randomly 
assigned to a comparison group and a treatment group. The results of those groups were 
separately compared and analyzed. 
This study found an overall positive association between using Khan 
Academy as a primary instructional resource and learning outcomes for both groups. On 
average, participants in this study demonstrated statistically significant growth. 
Generally, students outperformed expected growth norms, even when comparing this 
six-week program to expected annual growth. That observation is especially true for 
students who initially assessed at less than a 10th grade achievement level in 
mathematics. Although students at all levels achieved a measured growth that averaged 
more than expected growth, the growth for students with higher initial levels was not 
found to be statistically significant. 
As to the question of the MAP Recommended Practice beta tool, there was no 
statistically significant difference in achievement results between the treatment group 
and the comparison group. In fact, overall, the treatment group achieved slightly lower 
average growth, though the difference was not statistically significant. The small sample 
size could have played a role in this lack of finding, but without further data no 
determination can be made.  
In addition to the above findings, this study lends support to previous large-scale 
studies that found students experienced positive achievement growth after using Khan 
Academy. Previous studies had found that a majority of students achieve positive results 
using Khan Academy in a variety of ways. The results of this study support those 
findings and expand them to a particular implementation strategy: the use of Khan 
Academy as a primary instructional resource. Additionally, this study suggests that lower 
achieving students may benefit the most from use of Khan Academy.  
 These results suggest potential implementation strategies for the educational 
setting. One use would be as a remedial program to raise students up to grade level. 
Participants with initial proficiencies in the lowest quartile benefited most from the 
summer program, regaining on average over 16 RIT points toward grade level. 
Participants in the second lowest quartile also made statistically significant gains. Of the 
11 students in that quartile, six of them went from scoring below grade level to achieving 
above grade level scores.  
 While participants in the upper two quartiles did not demonstrate a statistically 
significant result, it should be noted that change in pre- to post-test scores was generally 
positive. The mean growth, like that of the lower two quartiles, was higher for both 
groups than the expected annual growth. In this case, further study is recommended with 
larger sample sizes to strengthen any conclusions regarding the effects of Khan Academy 
on this population. I would not interpret these results as suggesting that use of Khan 
Academy does not benefit higher level students. In fact, the results optimistically suggest 
the opposite could be true, but further study is required.  
It should be noted that this study was carried out in a blended environment 
combining online learning with face-to-face, usually one-on-one, instruction. The role of 
the teacher could best be described as the guide on the side style as opposed to sage on 
the stage. The primary motivation for participants in this program was a desire to earn 
credit, which was directly tied to proficiency achieved in Khan Academy.  
 This study generated encouraging results for the use of Khan Academy as a 
mathematical instructional tool. Further study should focus on the effects of use by higher 
level students, use in different controlled settings, and controlled study focusing on 
various implementation strategies.  
 Khan Academy is currently a free web-based resource. If the effects found in this 
study are replicable, incorporating the use of the platform into mathematics instruction 
could yield positive results, in particular for students who are behind grade level. A 
limitation to implementation is the technological infrastructure required but it is possible 
that the use of expensive textbooks could be reduced or eliminated. The strategy 
implemented in this study was not a traditional classroom and replicating it might be 
difficult in traditional school structures.  
 Another consideration is using Khan Academy with non-traditional students who 
have difficulty attending school regularly. Khan Academy allows students to access 
instructional materials without missing lessons due to absences. Also, the instruction is 
individualized for each student, allowing students to advance at their own pace rather 
than the regular pacing of a traditional classroom. Access to computers and the internet is 
a limiting factor, but becoming less so. One possible beneficial use to address the specific 
needs of students who experience multiple school changes is to allow students to 
transition their Khan Academy progress from school to school. This option would 
prevent such students from losing progress or experiencing content discontinuities during 
transitions. 
The conclusion of this study is that Khan Academy is associated with learning 
gains for this sample that indicate it was an effective tool for learning mathematics, 
whether in the automated differentiation approach or in best-practice teacher 
differentiation for learning programs. Based on the results of this study, use of Khan 
Academy was found to be particularly useful for low-proficiency students and students at 
proficiency levels below the 10th grade level, although benefits were not limited to 
these categories but encompassed the span of students.  
 
APPENDIX A 
Implementation Guide 
The purpose of this guide is to provide a framework for implementing future replicative 
studies or using Khan Academy in a school setting. Critical to student success is that 
the instructor’s presence is felt by students daily in terms of feedback on progress and 
offers of assistance when needed. 
Monitoring Progress. It is advised that instructors use a learning management system as 
a side-by-side instructional support for the purposes of providing timely feedback and 
encouragement, as well as monitoring progress. Students can self-monitor progress through 
Khan Academy. Suggested LMS applications include free web-based applications such as 
Edmodo, Schoology, or Google Classroom. In this study, Google Classroom and school-based 
Gmail were used. The selection of an LMS is a matter of instructor preference.  
Orientation. The instructor orients all students either as a group or individually in the 
use of Khan Academy. Orientation includes technical matters and best practices for using 
Khan Academy as a learning resource. Technical matters include instruction in accessing 
lessons, turning in lessons, and tracking progress. Using Khan Academy as a learning 
resource includes explicitly outlining a procedure for lesson completion.  
Best Practices for Learning. Students are instructed to follow an explicit learning 
pathway (See Figure 10) that includes: 
• Examine the task. Students assess whether they have the background knowledge to 
attempt a problem. If they feel they do, then they make an attempt to solve the 
problem.  
• Watch a video or use hints. If a student decides they need more instruction in 
order to be successful, they are instructed to either watch a recommended video or 
use the hints, which provide the student with a step-by-step solution to the 
problem. After using the learning tools, the student is instructed to make an 
attempt to solve the problem. 
• Attempt a solution. If a student assesses that they have the skills or they have 
watched the videos and used the hints, then they attempt to solve a problem. If 
they achieve a successful result, they go on to the next problem and start the 
process again. 
• Study the solution. If the student makes an unsuccessful attempt, they are 
instructed to study the solution (using hints) and, if necessary, rewatch a video, 
and make another attempt. This process repeats through a problem set.  
• Seek assistance. If a student experiences continual failure, it is imperative that 
they receive support from the instructor. Students are encouraged to self-advocate, 
but it is essential that the instructor monitor each student and intervene when 
necessary even if a student has not requested assistance. Students should be 
allowed to complete one full problem set (typically between four and seven 
problems) before instructor intervention occurs.   
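
The learning pathway listed above can also be read as a simple decision procedure for a single problem. The sketch below is one possible rendering, offered only as an illustration. The function name, the step labels, and in particular the max_failed_attempts threshold are hypothetical, since the guide does not define how many unsuccessful attempts constitute continual failure.

```python
def work_problem(has_background, attempt_results, max_failed_attempts=3):
    """Return the ordered pathway steps a student takes on one problem.

    has_background: whether the student feels ready to attempt the problem.
    attempt_results: booleans, one per attempt (True means solved).
    max_failed_attempts: HYPOTHETICAL threshold for "continual failure";
        the guide itself does not specify a number.
    """
    steps = ["examine the task"]
    if not has_background:
        steps.append("watch a video or use hints")
    failures = 0
    for solved in attempt_results:
        steps.append("attempt a solution")
        if solved:
            steps.append("next problem")
            return steps
        failures += 1
        if failures >= max_failed_attempts:
            steps.append("seek assistance")
            return steps
        steps.append("study the solution")
    return steps


# A student without background knowledge who succeeds on the second attempt.
for step in work_problem(has_background=False, attempt_results=[False, True]):
    print(step)
```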
 
Figure 10. Student learning pathway. 
 
REFERENCES CITED 
Allen, I. E., Seaman, J., Poulin, R., & Straut, T. T. (2016). Online report card: Tracking 
online education in the United States. Sloan Consortium, 1–4. Retrieved from 
http://onlinelearningsurvey.com/reports/onlinereportcard.pdf 
 
Cargile, L. A., & Harkness, S. S. (2014). Flip or Flop: Are Math Teachers Using Khan 
Academy as Envisioned by Sal Khan? TechTrends, 59(6), 21–28. 
https://doi.org/10.1007/s11528-015-0900-8 
 
Kai, K. (2012). Khan Academy: The hype and the reality. American Educator, (Fall), 
23–25. 
 
Koedinger, K. R., McLaughlin, E. A., & Heffernan, N. T. (2010). A quasi-experimental 
evaluation of an on-line formative assessment and tutoring system. Journal of 
Educational Computing Research, 43(4), 489–510. 
https://doi.org/10.2190/EC.43.4.d 
 
Kulik, C.-L. C., Kulik, J. A., Bangert, R. L., & Bangert-Drowns, R. L. (1990). 
Effectiveness of Mastery Learning Programs: A Meta-Analysis. Review of 
Educational Research, 60(2), 265–299. 
Retrieved from http://www.jstor.org/stable/1170612 
 
Lange, C. M., Lane-Outlaw, S., Lange, W. E., & Sherwood, D. L. (2013). American Sign 
Language/English bilingual model: A longitudinal study of academic growth. Journal 
of Deaf Studies and Deaf Education, 18(4), 532–544. 
https://doi.org/10.1093/deafed/ent027 
 
Learning, D., & Subbarayan, S. (2012). Lacking Teachers and Textbooks, India’s 
Schools Turn to Khan Academy to Survive. New York Times, 10–12. Retrieved from 
https://india.blogs.nytimes.com/2012/10/15/lacking-teachers-and-textbooks-indias-schools-turn-to-khan-academy-to-survive/?_r=0 
 
Light, D., & Pierson, E. (2014). Increasing student engagement in math: The study of an 
Intel funded pilot program in Chile. 
 
Means, B., Toyama, Y., Murphy, R., & Baki, M. (2013). The Effectiveness of Online and 
Blended Learning: A Meta-Analysis of the Empirical Literature. Teachers College 
Record, 115(30303). 
 
Murphy, R., Gallagher, L., Krumm, A., Mislevy, J., & Hafter, A. (2014). Khan Academy 
in Schools. SRI Education. Retrieved from www.sri.com/education 
 
 
 
 
Ruipérez-Valiente, J. A., Muñoz-Merino, P. J., Leony, D., & Delgado Kloos, C. (2015). 
ALAS-KA: A learning analytics extension for better understanding the learning 
process in the Khan Academy platform. Computers in Human Behavior, 47, 139–148. 
https://doi.org/10.1016/j.chb.2014.07.002 
 
Strauss, V. (2012). Does the Khan Academy know how to teach? Retrieved January 1, 
2001, from 
https://www.washingtonpost.com/blogs/answer-sheet/post/how-well-does-khan-academy-teach/2012/07/27/gJQA9bWEAX_blog.html 
 
Thum, Y. M., & Hauser, C. H. (2015). NWEA 2015 MAP Norms for Student and School 
Achievement Status and Growth. Port. 
 
Timme, N., Baird, M., Bennett, J., Fry, J., Garrison, L., & Maltese, A. (2013). A summer 
math and physics program for high school students: Student performance and 
lessons learned in the second year. The Physics Teacher, 51(5), 280–285. 
https://doi.org/10.1119/1.4801354 
 
Wang, S., McCall, M., Jiao, H., & Harris, G. (2013). Construct Validity and Measurement 
Invariance of Computerized Adaptive Testing: Application to Measures of 
Academic Progress (MAP) Using Confirmatory Factor Analysis. Journal of 
Educational and Developmental Psychology, 3(1). 
https://doi.org/10.5539/jedp.v3n1p88