With a reliable internet connection comes access to the enormous World Wide Web. Because the Web is so large, we rely on tools like Google to search and filter all this information. Additional filters can be found in sites like Wikipedia, which offer library-style access to curated knowledge, but Wikipedia too is enormous. In more recent years, open online courses have rapidly become a highly popular way of gaining easy access to curated, high quality, pre-packaged knowledge. A particularly popular variety is the Massive Open Online Course, or MOOC, found on platforms like Coursera and edX. The promise of global and free access to high quality education has often been applauded. Some have heralded the age of the MOOC as the death of campus-based teaching. Others are more critical, often citing the high drop-out rates as a sign of failure, or arguing that MOOCs do not or cannot foster ‘real’ learning (e.g., Zemsky, 2014; Pope, 2015).

For those who are not aware of the MOOC phenomenon, I will first briefly introduce it. In the remainder of this post I will discuss how we can learn about open online courses, what the key challenges are, and how the field can move forward.

What’s all this buzz about?

John Daniel (2012) called MOOCs the official educational buzzword of 2012, and the New York Times called 2012 the Year of the MOOC. However, the movement started before that, somewhere around 2001, when the Massachusetts Institute of Technology (MIT) launched its OpenCourseWare (OCW) to share all its courses online. Individual teachers had been sharing digital content before (e.g., ‘Open Educational Resources’ or OER; Lane & McAndrew, 2010), but the scale and quality of OCW were pioneering. Today, MOOCs can be found on various platforms, such as the ones described in Table 1 below.

Table 1. Overview of several major platforms offering MOOCs

Platform	Free content	Paid certifications	For profit
Coursera	Partial	Yes	Yes
edX	Everything	Yes	No
Udacity	Everything	Yes	Yes
Udemy	Partial	Yes	Yes
P2PU	Yes	No	No

MOOCs, and open online courses in general, have the goal of making high quality education available to everyone, everywhere. MOOC participants indeed come from all over the world, although participants from Western countries are still overrepresented (Nesterko et al., 2013). Nevertheless, there are numerous inspiring stories from students all over the world, for whom taking one or more MOOCs has had dramatic effects on their lives. For example, Battushig Myanganbayar, a 15-year-old boy from Mongolia, took the Circuits and Electronics MOOC, a sophomore-level course from MIT. He was one of the 340 students out of 150,000 who obtained a perfect score, which led to his admission to MIT (New York Times, 2013).

Stories like these make it much clearer that MOOCs are not meant to replace contemporary forms of education, but are an amazing addition to them. Why? Because books, radios, and the computer also did not replace education, but enhanced it. In some cases, such as in the story of Battushig, MOOCs provide a variety and quality of education which would otherwise not be accessible at all, due to a lack of higher education institutes. Open online courses provide a new source of high quality education, which is not just accessible to a few students in a lecture hall but has the potential to reach almost everyone who is interested. Will MOOCs replace higher education institutes? Maybe, or maybe not; I think this question misses the point of MOOCs.

In the remainder of this article I will focus on MOOCs from my perspective as a researcher. From this perspective, open online education is in some ways a new approach to education and should thus be investigated on its own. On the other hand, key learning mechanisms (e.g., information processing, knowledge integration, long-term memory consolidation) of human learners are independent of societal changes such as the use of new technologies (e.g., Merrill, Drake, Lacy, & Pratt, 1996). The science of educational instruction has a firm knowledge base and could be used to further our understanding of these generic learning mechanisms, which are inherent to humans.

What are MOOCs anyway?

The typical MOOC is a series of educational videos, often interconnected by other study materials such as texts, and regularly followed up by quizzes. Usually these MOOCs are divided into approximately 5 to 8 weeks of content. In Figure 1 you see an example of Week 1 from the course ‘Improving your statistical inferences’ by Daniel Lakens.

Figure 1. Example content of a single week in a MOOC

What do students do in a MOOC? To be honest, most do next to nothing. That is, most students who register for a course do not even access it, or do so only very briefly. However, the thousands of students per course who are active show a wide variety of learning paths and behaviors. See Figure 2 for an example of a single student in a single course. It shows how this particular student engages very regularly with the course, but the duration and intensity of each session differ substantially. Lectures (shown in green) are often watched in long sessions, while visits to the forum are much more frequent but shorter. At the bottom you see a surprising spurt of quiz activity, which might reflect the student’s desire to see what type of questions will be asked later in the course.

Figure 2. Activities of a single user in a single course. Source: Jasper Ginn

Of all the activities which are common to most MOOCs, educational videos are most central to the student learning experience (Guo, Kim, & Rubin, 2014; Liu et al., 2013). The central position of educational videos is reflected in students’ behavior and their intentions: most students plan to watch all videos in a MOOC, and also spend the majority of their time watching these videos (Campbell, Gibbs, Najafi, & Severinski, 2014; Seaton, Bergner, Chuang, Mitros, & Pritchard, 2014). The focus on videos does come with various consequences. Video production is typically expensive and time-intensive. In addition, videos are not as easily translated to other languages, which runs counter to the aim of making the content accessible to students all around the world. There are many non-native English speakers in MOOCs, while the courses are almost exclusively presented in English. This raises the question to what extent non-native English speakers can benefit from these courses, compared to native speakers. While open online education may be available to most, the content might not be equally accessible to many, for example due to language barriers. It is important to design online education in such a way that it minimizes the detrimental effects of potential language barriers, to increase its accessibility for a wider audience. While subtitles are often provided, it is unclear whether they promote learning (Markham, Peter, & McCarthy, 2001), hamper learning (Kalyuga, Chandler, & Sweller, 1999), or have no impact at all (van der Zee et al., 2017).

How do we learn about (online) learning?

Research on online learning, and MOOCs in particular, is a highly interdisciplinary field where many perspectives are combined. While research on higher education is typically done primarily by educational scientists, MOOCs are also studied in fields such as computer science and machine learning. This has resulted in an interesting divide in the literature, as researchers from some disciplines are used to publishing only in journals (e.g., Computers & Education, Distance Education, International Journal of Computer-Supported Collaborative Learning) while other disciplines focus primarily on conference proceedings (e.g., Learning @ Scale, eMOOCs, Learning Analytics and Knowledge).

Learning at scale opens up a new frontier to learn about learning. MOOCs and similar large-scale online learning platforms give an unprecedented view of learners’ behavior, and potentially, learning. In online learning research, the setting in which the data is measured is not just an approximation of, but equals the world under examination, or at least comes very close to it. That is, measures of students’ behavior do not need to rely on self-reports, but can often be directly derived from log data (e.g., automated measurements of all activities inside an online environment). While this type of research has its advantages, it also comes with various risks and challenges, which I will attempt to outline.

Big data, meaningless data

Research on MOOCs is blessed and cursed with a wide variety of data. For example, it is possible to track every user’s mouse clicks. We also have detailed information about page views, forum data (posts, likes, reads), clickstream data, and interactions with videos. This is all very interesting, except that nobody really knows what it means if a student has clicked two times instead of three times. Nevertheless, the number of mouse clicks is a strong predictor of ‘study success’, because students who click more are more likely to finish the course and do so with higher grades. As can be seen in Figure 3, the correlations between various mouse click metrics and grade range from 0.50 to 0.65. However, it would be absurd to recommend that students click more and to believe that this will increase their grades. Mouse clicks, in isolation, are inherently ambiguous, if not outright meaningless.

Figure 3. Pairwise Spearman rank correlations between various metrics for all clickers (upper triangle, N = 108008) and certificate earners (lower triangle, N = 7157), from DeBoer, Ho, Stump and Breslow (2014)
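To make this concrete, here is a minimal Python sketch of how such a rank correlation can arise without clicks causing anything. It is not a reanalysis of the DeBoer et al. data: the latent variable effort, the effect sizes, and the 'nudge' that adds clicks are all hypothetical and invented for illustration. Effort drives both clicks and grades, so the two correlate strongly, while artificially inflating clicks leaves grades untouched.

```python
import random


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            result[order[position]] = average_rank
        start = end + 1
    return result


def pearson(x, y):
    """Plain Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def simulate_students(n_students, seed=1):
    """Hypothetical data: effort causes both clicks and grades (assumption)."""
    rng = random.Random(seed)
    clicks, grades, nudged = [], [], []
    for _ in range(n_students):
        effort = rng.random()                     # latent and unobserved
        clicks.append(200 * effort + rng.gauss(0, 30))
        grades.append(100 * effort + rng.gauss(0, 20))
        nudged.append(rng.random() < 0.5)         # told to "click more"; no effect on effort
    return clicks, grades, nudged


def nudge_effect_on_grade(grades, nudged):
    """Mean grade of nudged students minus mean grade of the others."""
    treated = [g for g, flag in zip(grades, nudged) if flag]
    control = [g for g, flag in zip(grades, nudged) if not flag]
    return sum(treated) / len(treated) - sum(control) / len(control)


if __name__ == "__main__":
    clicks, grades, nudged = simulate_students(5000)
    print("Spearman(clicks, grade):", round(spearman(clicks, grades), 2))
    print("Effect of extra clicks on grade:", round(nudge_effect_on_grade(grades, nudged), 2))
```

In this toy world clicks predict grades well, yet making students click more does nothing, because the prediction runs entirely through the unobserved effort variable.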

When there is smoke, but no fire

With mouse clicks, the problem is obvious and will be recognized by many. However, the same problem can secretly underlie many other measured variables, where it is not as easily recognized. For example, how can we interpret the finding that some students watch a video longer than other students? Findings like this are readily interpreted as being meaningful, for example as signifying that these students were more ‘engaged’, while you could just as well argue that they got distracted, were bored, etc. There is a classical reasoning fallacy which often underlies these arguments. Because it is reasonable to state that increased engagement will lead to longer video dwelling times, observing the latter is (incorrectly!) assumed to signify the former. In other words: if A leads to B, observing B does not allow you to conclude A. As there are many plausible explanations of differences in video dwelling times, such differences cannot be directly interpreted without additional data. This is an inherent problem with many types of big data: you have an enormous amount of granular data which often cannot be directly interpreted. For example, Guo et al. (2014) state that shorter videos and certain video production styles are “much more engaging” than their alternatives. While enormous amounts of data were used, it was in essence a correlational study, such that the claims about which video types are better are based on observational data which do not allow causal inference. More students stop watching a longer video than a shorter one, which is interpreted as meaning that the shorter videos are more engaging. While this might certainly be true, it is difficult to make these claims when confounding variables have not been accounted for.

As an example, shorter and longer videos do not differ just in length but might also differ in complexity, and the complexity of online educational videos is also strongly correlated with video dwelling time (Van der Sluis, Ginn, & Van der Zee, 2016). More importantly, Van der Sluis et al. showed that the relationship between a video’s complexity (insofar as that can be measured) and dwelling time appears to be non-linear, as shown in Figure 4. Non-linear relationships between variables which are typically measured observationally should make us very cautious about making confident claims. For example, in Figure 4 a relative dwelling time of 4 can be found both for an information rate of ~0.2 (below average complexity) and for ~1.7 (above average complexity). In other words, if all you know is the dwelling time, you cannot draw any conclusions about the complexity, because of the non-linear relationship.

Figure 4. The non-linear relationship between dwelling time and information rate (as a measure of complexity per second). Adapted from Van der Sluis, Ginn, & Van der Zee (2016).
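The inversion problem can be sketched in a few lines of Python. The curve below is a hypothetical inverted U, not the model fitted by Van der Sluis et al.: I only assume that it passes through the two points mentioned above (a relative dwelling time of 4 at information rates of about 0.2 and 1.7), and the peak height of 5 is invented. The point is that going from dwelling time back to complexity yields two answers.

```python
import math

# Hypothetical inverted-U curve; NOT fitted to the published data.
PEAK_RATE = 0.95    # midpoint of the two information rates mentioned in the text
PEAK_DWELL = 5.0    # assumed maximum relative dwelling time (invented)
CURVATURE = (PEAK_DWELL - 4.0) / (1.7 - PEAK_RATE) ** 2


def dwelling_time(information_rate):
    """Relative dwelling time predicted by the toy curve."""
    return PEAK_DWELL - CURVATURE * (information_rate - PEAK_RATE) ** 2


def information_rates_for(dwell):
    """Return every information rate on the toy curve with this dwelling time."""
    if dwell > PEAK_DWELL:
        return []                      # the curve never gets this high
    offset = math.sqrt((PEAK_DWELL - dwell) / CURVATURE)
    if offset == 0:
        return [PEAK_RATE]             # only the peak itself is unambiguous
    return [PEAK_RATE - offset, PEAK_RATE + offset]


if __name__ == "__main__":
    # One observed dwelling time, two candidate complexity levels.
    print(information_rates_for(4.0))
```

Knowing the complexity tells you the expected dwelling time, but the reverse mapping is ambiguous everywhere except at the peak.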

Ghost relationships

Big data and education are a powerful but dangerous combination. No matter the size of your data set, or the variety of variables, correlational data remain incredibly treacherous to interpret, especially when the data are granular and lack a 1-to-1 mapping to relevant behavior or cognitive constructs. Given that education is inherently about causality (that is, it aims to change learners’ behavior and/or knowledge), research on online learning should employ a wide array of study methodologies so as to properly gather the type of evidence which is required to make claims about causality. It does not require gigabytes of event log data to establish that there is some relationship between students’ video watching behavior and quiz results. It does require proper experimental designs to establish causal relationships and the effectiveness of interventions and course design. For example, Kovacs (2016) found that students watch videos with in-video questions more often, and are less likely to prematurely stop watching these videos. While this provides some evidence on the benefits of in-video questions, it was a correlational study comparing videos with and without in-video questions. There might have been more relevant differences between the videos, other than the presence of in-video questions. For example, it is reasonable to assume that teachers do not randomly select which videos will have in-video questions, but will choose to add questions to more difficult videos. Should this be the case, a correlational study comparing different videos with and without in-video questions might be confounded by other factors such as the complexity of the video content, to the extent that the true relationship might be the opposite of what is found in correlational studies. These types of correlational relationships can be ‘ghost relationships’, which appear real at first sight but have no bearing on reality.
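A small simulation can show how such a ghost relationship comes about. The sketch below is purely hypothetical and does not use Kovacs’ data: I assume that teachers are more likely to add questions to difficult videos, that difficult videos are watched more often, and that the questions themselves slightly reduce viewing. All effect sizes are invented.

```python
import random


def simulate_videos(n_videos, randomized, seed=0):
    """Return (has_questions, views_per_student) pairs for hypothetical videos."""
    rng = random.Random(seed)
    videos = []
    for _ in range(n_videos):
        difficulty = rng.random()                      # 0 = easy, 1 = hard
        if randomized:
            has_questions = rng.random() < 0.5         # coin flip, as in an experiment
        else:
            has_questions = rng.random() < difficulty  # teachers target hard videos
        views = (
            1.0
            + 0.5 * difficulty                         # hard videos get rewatched (assumed)
            + (-0.1 if has_questions else 0.0)         # assumed true effect of questions
            + rng.gauss(0, 0.05)
        )
        videos.append((has_questions, views))
    return videos


def difference_in_means(videos):
    """Mean views with questions minus mean views without questions."""
    with_q = [views for has_q, views in videos if has_q]
    without_q = [views for has_q, views in videos if not has_q]
    return sum(with_q) / len(with_q) - sum(without_q) / len(without_q)


if __name__ == "__main__":
    observational = simulate_videos(20000, randomized=False)
    experimental = simulate_videos(20000, randomized=True)
    print("Observational difference:", round(difference_in_means(observational), 3))
    print("Randomized difference:   ", round(difference_in_means(experimental), 3))
```

Under these assumptions the observational comparison suggests that questions increase viewing, while the randomized comparison recovers the (assumed) negative effect. Only the way questions were assigned differs between the two data sets.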

The way forward

The granularity of the data, and the various ways in which they can be interpreted, challenge the validity and generalizability of this type of research. With sufficiently large sample sizes, numbers of variables, and researchers’ degrees of freedom, you are guaranteed to find ‘potentially’ interesting relationships in these datasets. A key development in this area (and science in general) is pre-registering research methodology before a study is performed, in order to decrease ‘noise mining’ and increase the overall veracity of the literature. For more on the reasoning behind pre-registration, see also the JEPS Bulletin three-part series on the topic, starting with Dablander (2016). The Learning @ Scale conference, which is already at the center of research on online learning, is becoming a key player in this movement, as it explicitly recommends the use of pre-registered protocols for papers submitted to the 2017 conference.
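How easily noise mining produces ‘findings’ can be illustrated with a short simulation. In this sketch, which rests entirely on made-up data, the grade and every one of the 200 log metrics are independent random noise, so any ‘significant’ correlation is a false positive. The cut-off of 1.96 divided by the square root of the sample size is a common large-sample approximation of the two-sided 5% threshold for a correlation coefficient, not an exact test.

```python
import random


def pearson(x, y):
    """Plain Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def count_spurious_hits(n_students=500, n_metrics=200, seed=0):
    """Count noise metrics that 'significantly' correlate with a noise grade."""
    rng = random.Random(seed)
    grade = [rng.gauss(0, 1) for _ in range(n_students)]
    # Approximate two-sided 5% cut-off for |r| when there is no true relationship.
    threshold = 1.96 / n_students ** 0.5
    hits = 0
    for _ in range(n_metrics):
        metric = [rng.gauss(0, 1) for _ in range(n_students)]  # unrelated to grade
        if abs(pearson(metric, grade)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    print("Spurious 'significant' metrics:", count_spurious_hits(), "out of 200")
```

With 200 unrelated metrics you should expect roughly ten of them to pass the threshold by chance alone. Pre-registering which metric will be tested, and how, removes the freedom to report only those.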

A/B Testing

Experimental designs (often called “A/B tests” in this literature) are increasingly common in the research on online learning, but they too are not without dangers, and need to be carefully crafted (Reich, 2015). Data in open online education are not only different due to their scale, they require reconceptualization. There are new measures, such as the highly granular measurements described above, as well as existing educational variables which require different interpretations (DeBoer, Ho, Stump and Breslow, 2014). For example, in traditional higher education it would be considered dramatic if over 90% of the students do not finish a course, but this normative interpretation of drop-out rates cannot be uncritically applied to the context of open online education. While registration barriers are substantial for higher education, they are practically nonexistent in MOOCs. In effect, there is no filter which pre-selects the highly motivated students, resulting in many students who just want to take a peek and then stop participating. Secondly, in traditional education dropping out is interpreted as a loss for both the student and the institute. Again, this interpretation does not transfer to the context of MOOCs, as the students who drop out after watching only some videos might have successfully completed their personal learning goals.

Rebooting MOOC research

The next generation of MOOC research needs to adopt a wider range of research designs with greater attention to causal factors promoting student learning (Reich, 2015). To advance our understanding, it becomes essential to complement granular (big) data with other sources of information in an attempt to triangulate its meaning. Triangulation can be done in various ways, from multiple proxy measurements of the same latent construct within a single study, to repeated measurements across separate studies. A good example of triangulation in research on online learning is combining granular log data (such as video dwelling time), student output (such as essays), and subjective measures (such as self-reported behavior) in order to triangulate students’ behavior. Secondly, these models themselves require validation through repeated applications across courses and populations. Convergence between these different (types of) measurements strengthens singular interpretations of (granular) data, and is often a necessary exercise. Inherent to triangulation is increasing the variety within and between datasets, such that they become richer in meaning and usable for generalizable statements.

Replications (both direct and conceptual) are fundamental for this effort. I would like to end with this quote from Justin Reich (2015), which reads: “These challenges cannot be addressed solely by individual researchers. Improving MOOC research will require collective action from universities, funding agencies, journal editors, conference organizers, and course developers. At many universities that produce MOOCs, there are more faculty eager to teach courses than there are resources to support course production. Universities should prioritize courses that will be designed from the outset to address fundamental questions about teaching and learning in a field. Journal editors and conference organizers should prioritize publication of work conducted jointly across institutions, examining learning outcomes rather than engagement outcomes, and favoring design research and experimental designs over post hoc analyses. Funding agencies should share these priorities, while supporting initiatives—such as new technologies and policies for data sharing—that have potential to transform open science in education and beyond.”

References

Butler, A. C., & Roediger III, H. L. (2007). Testing improves long-term retention in a simulated classroom setting. European Journal of Cognitive Psychology, 19(4-5), 514-527.

Campbell, J., Gibbs, A. L., Najafi, H., & Severinski, C. (2014). A comparison of learner intent and behaviour in live and archived MOOCs. The International Review of Research in Open and Distributed Learning, 15(5).

Cepeda, N. J., Coburn, N., Rohrer, D., Wixted, J. T., Mozer, M. C., & Pashler, H. (2009). Optimizing distributed practice: Theoretical analysis and practical implications. Experimental Psychology, 56(4), 236-246.

Daniel, J. (2012). Making sense of MOOCs: Musings in a maze of myth, paradox and possibility. Journal of Interactive Media in Education, 2012(3).

DeBoer, J., Ho, A. D., Stump, G. S., & Breslow, L. (2014). Changing “course”: Reconceptualizing educational variables for massive open online courses. Educational Researcher.

Dunlosky, J., Rawson, K. A., Marsh, E. J., Nathan, M. J., & Willingham, D. T. (2013). Improving students’ learning with effective learning techniques: Promising directions from cognitive and educational psychology. Psychological Science in the Public Interest, 14(1), 4-58.

Guo, P. J., Kim, J., & Rubin, R. (2014, March). How video production affects student engagement: An empirical study of MOOC videos. In Proceedings of the first ACM conference on Learning @ Scale (pp. 41-50). ACM.

Johnson, C. I., & Mayer, R. E. (2009). A testing effect with multimedia learning. Journal of Educational Psychology, 101(3), 621.

Kalyuga, S., Chandler, P., & Sweller, J. (1999). Managing split-attention and redundancy in multimedia instruction. Applied Cognitive Psychology, 13(4), 351-371.

Karpicke, J. D., & Roediger, H. L. (2008). The critical importance of retrieval for learning. Science, 319(5865), 966-968.

Konstan, J. A., Walker, J. D., Brooks, D. C., Brown, K., & Ekstrand, M. D. (2015). Teaching recommender systems at large scale: evaluation and lessons learned from a hybrid MOOC. ACM Transactions on Computer-Human Interaction (TOCHI), 22(2), 10.

Lane, A., & McAndrew, P. (2010). Are open educational resources systematic or systemic change agents for teaching practice?. British Journal of Educational Technology, 41(6), 952-962.

Liu, Y., Liu, M., Kang, J., Cao, M., Lim, M., Ko, Y., … & Lin, J. (2013, October). Educational Paradigm Shift in the 21st Century E-Learning. In E-Learn: World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education (Vol. 2013, No. 1, pp. 373-379).

Markham, P., Peter, L. A., & McCarthy, T. J. (2001). The effects of native language vs. target language captions on foreign language students’ DVD video comprehension. Foreign Language Annals, 34(5), 439-445.

Mayer, R. E. (2003). The promise of multimedia learning: Using the same instructional design methods across different media. Learning and Instruction, 13(2), 125-139.

Mayer, R. E., Mathias, A., & Wetzell, K. (2002). Fostering understanding of multimedia messages through pre-training: Evidence for a two-stage theory of mental model construction. Journal of Experimental Psychology: Applied, 8(3), 147.

Merrill, M. D., Drake, L., Lacy, M. J., Pratt, J., & ID2 Research Group. (1996). Reclaiming instructional design. Educational Technology, 36(5), 5-7.

Nesterko, S. O., Dotsenko, S., Han, Q., Seaton, D., Reich, J., Chuang, I., & Ho, A. D. (2013, December). Evaluating the geographic data in MOOCs. In Neural information processing systems.

Ozcelik, E., Arslan-Ari, I., & Cagiltay, K. (2010). Why does signaling enhance multimedia learning? Evidence from eye movements. Computers in Human Behavior, 26(1), 110-117.

Plant, E. A., Ericsson, K. A., Hill, L., & Asberg, K. (2005). Why study time does not predict grade point average across college students: Implications of deliberate practice for academic performance. Contemporary Educational Psychology, 30(1), 96-116.

Pope, J. (2015). What are MOOCs good for?. Technology Review, 118(1), 69-71.

Reich, J. (2015). Rebooting MOOC research. Science, 347(6217), 34-35.

Roediger, H. L., & Butler, A. C. (2011). The critical role of retrieval practice in long-term retention. Trends in Cognitive Sciences, 15(1), 20-27.

Seaton, D. T., Bergner, Y., Chuang, I., Mitros, P., & Pritchard, D. E. (2014). Who does what in a massive open online course?. Communications of the ACM, 57(4), 58-65.

Van der Sluis, F., Ginn, J., & Van der Zee, T. (2016, April). Explaining Student Behavior at Scale: The Influence of Video Complexity on Student Dwelling Time. In Proceedings of the Third (2016) ACM Conference on Learning@ Scale (pp. 51-60). ACM.

Van der Zee, T., Admiraal, W., Paas, F., Saab, N., & Giesbers, B. (2017). Effects of Subtitles, Complexity, and Language Proficiency on Learning from Online Education Videos. Journal of Media Psychology, in press. Pre-print available at https://osf.io/n6zuf/.

Zemsky, R. (2014). With a MOOC MOOC here and a MOOC MOOC there, here a MOOC, there a MOOC, everywhere a MOOC MOOC. The Journal of General Education, 63(4), 237-243.

Tim van der Zee

Skeptical scientist. I study how people learn from educational videos in open online courses, and how we can help them learn better. PhD student at Leiden University (the Netherlands), but currently a visiting scholar at MIT and UMass Lowell. You can follow me on Twitter: @Research_Tim and read my blog at www.timvanderzee.com

Open online education: Research findings and methodological challenges