A conceptual introduction to mathematical modeling of cognition

Psychological researchers try to understand how the mind works. That is, they describe observable phenomena, try to induce explanatory theories, and use those theories to deduce predictions. The explanatory value of a theory is then assessed by comparing theoretical predictions to new observations.

A good theory ideally makes precise predictions about new observations (Roberts & Pashler, 2000). While this sounds trivial, it is important to consider what it means to make precise predictions. A theory that can, in principle, predict any outcome may seem appealing in the sense that it offers an explanation for every observation. At the same time, the theory is imprecise because it is unspecific: It cannot inform our expectations about future events because it makes no prediction about what will not happen. In this sense, the theory is useless. Nobody would hire a financial adviser who can always explain why their client’s past investments failed but can never tell them where to invest next. Thus, an ideal theory predicts only what we observe and declares every other possible outcome impossible.

The law of parsimony—Occam’s razor—famously demands that we should prefer the simplest complete explanation of a phenomenon. One rationale is that simpler explanations are easier to understand, test, and falsify. Moreover, unnecessarily complex explanations yield inaccurate predictions about future events because they tend to assume causal reasons for random events that are unlikely to repeat in the future—a concept that in statistics is referred to as overfitting.

One way to conceptualize the simplicity (or complexity) of a theory is to examine the range of observations it can, in principle, explain. Following this reasoning, theories that can explain many different observations are complex; theories that can explain only very few observations are simple. Psychological theories are often verbal descriptions of assumed social or mental processes. As I will illustrate, at this level of specificity, it is often difficult to assess what exactly a theory predicts and how simple the explanation is. Mathematical models can be used to address this problem.

The number of mathematical models of cognitive processes is growing exponentially (Palminteri, Wyart, & Koechlin, 2017). However, many students of psychology and a fair number of researchers have limited knowledge of this approach to understanding the mind. In this blog post I will try to illustrate how theoretical positions can be expressed in mathematical terms as measurement models [1]. I will argue that formalizing a theory mathematically helps to understand it and to derive testable predictions. Finally, I will show conceptually how the derived predictions can be tested. But first, I will try to explain some of the basic vocabulary of mathematical modeling by analogy to familiar statistical models.

What is a mathematical model?

To some, the terms “mathematical model” or “formal model” may sound daunting. Quite simply, a mathematical model is an expression of assumptions about how the observed data came about (i.e., about a data-generating process). For example, a simple bivariate linear regression model is a mathematical model that, among other things, assumes that the relationship between two variables follows a straight line with an intercept \(a\) and a slope \(b\),

\[
\hat y_i = a + b \times x_i,
\]

for every observation \(i\) [2]. The intercept \(a\) and the slope \(b\) are the parameters of the model that quantify components of the data-generating process.

To find the combination of parameter values that best describes a dataset, the model is fit to those data. For some models, such as this linear regression model, formulas are available to calculate the best-fitting parameters analytically. When this is not the case, the parameter values have to be cleverly guessed by an optimization algorithm that minimizes the discrepancy between model predictions \(\hat y_i\) and the observed data \(y_i\) (e.g., quantified by the root-mean-square error, \(\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum^{n}_{i = 1}{(\hat y_i - y_i)^2}}\)). The estimated parameter values can be used to visualize the model’s best description of the data. A visual comparison between the observed data and the model description may reveal gross deviations and helps to understand which aspects of the data can be explained by the model and which cannot.

To illustrate the process of fitting a linear regression model to data, consider the following example inspired by Kortt & Leigh (2010)—the data used here are simulated. The authors asked “Does Size Matter?”, that is, are (log-transformed) hourly wages linearly related to body height? The relationship is visualized in the top left panel of Figure 1.

When fitting a model to data the optimization algorithm starts with an arbitrary set of parameter values, which are then adjusted step-by-step until they converge on the best description of the data. This process is illustrated by the convergence of the grey line towards the blue line. The stepwise reduction of the discrepancy between model predictions and the observed data that guides the optimization algorithm is visualized in the top right panel and the corresponding parameter values in the bottom panels of Figure 1. The final model describes the linear relationship between hourly wages and body height quite well.


Figure 1: Iterative estimation of linear regression model parameters predicting a person’s (log-transformed) hourly wages from their body height. The linear function is iteratively adjusted to the data by repeatedly trying parameter values that minimize the discrepancy between model predictions and the observed data. The blue lines indicate the optimal values derived from the analytical solution. The data used here are simulated.
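To make the fitting procedure in Figure 1 more tangible, here is a minimal Python sketch (my own illustration, not the code behind the figure). It simulates height and log-wage data, lets a general-purpose optimizer adjust intercept and slope to minimize the RMSE, and compares the result to the analytical least-squares solution; all variable names and simulated values are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data loosely inspired by the height/log-wage example.
rng = np.random.default_rng(2017)
height = rng.normal(178, 8, size=200)                            # body height in cm
log_wage = 1.5 + 0.01 * height + rng.normal(0, 0.25, size=200)   # log hourly wages

def rmse(params, x, y):
    """Root-mean-square error between model predictions and observed data."""
    a, b = params
    y_hat = a + b * x                                            # prediction for every observation
    return np.sqrt(np.mean((y_hat - y) ** 2))

# Start from arbitrary parameter values and let the optimizer adjust them step by step.
fit = minimize(rmse, x0=[0.0, 0.0], args=(height, log_wage), method="Nelder-Mead")
a_hat, b_hat = fit.x
print(f"Optimized:  a = {a_hat:.3f}, b = {b_hat:.4f}")

# For comparison: the analytical least-squares solution (the 'blue line' in Figure 1).
b_ols = np.cov(height, log_wage)[0, 1] / np.var(height, ddof=1)
a_ols = np.mean(log_wage) - b_ols * np.mean(height)
print(f"Analytical: a = {a_ols:.3f}, b = {b_ols:.4f}")
```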

Just like linear regression models, the parameters of many cognitive models can be estimated by fitting these models to data. What makes cognitive models interesting is that their parameters quantify assumed unobservable (latent) cognitive processes. That is, the parameter values usually have psychologically meaningful interpretations. I will provide some examples after some further discussion of the advantages of expressing psychological theories in mathematical terms.

What are mathematical models of cognition good for?

Expressing a theory about cognitive processes mathematically has at least three advantages. First, translating a verbal theory into a set of formulas requires specification and explicates underlying assumptions. Second, mathematical models yield specific predictions that can inform experimental tests of theories and can be used to assess a model’s complexity. Third, if we accept the assumptions of a given model, we can use the model to decompose participant responses to focus on the psychological processes of interest.

In their introductory book on computational modeling, Lewandowsky & Farrell (2011) illustrate the benefit of explicating assumptions mathematically. They attempt to translate a component process of Baddeley’s theory of working memory (Baddeley, 1986), namely the phonological loop, into a mathematical model. In the process they track the decisions about technicalities that are necessary to implement the model’s mechanisms, such as the decay function or the decay rate. Lewandowsky & Farrell (2011) show that there are at least 144 mathematical models of the phonological loop and conclude that a “verbal theory of working memory actually constitutes an entire family of possible models” (Lewandowsky & Farrell, 2011, p. 39). This example clearly shows that verbal descriptions of theories are ambiguous.

The uncertainties about the specifics of a model that result in 144 candidate implementations of the theory entail uncertainty about the model’s predictions. A fully specified model, in contrast, allows the researcher to derive specific predictions for an experimental setup before she collects the data. These specific predictions are an important benefit of mathematical modeling.

Exploration of specific model predictions can inform the design of experiments that pit competing cognitive models against one another. Cognitive models can best be compared in conditions for which the models make diverging predictions. Once such diverging predictions have been identified, the researcher can explore the parameter settings that yield the largest disagreement between the models. Based on this exploration the researcher can design an experiment that constitutes a maximally informative comparison between the models. This approach can even be implemented in a continuous manner while the data are being collected (Cavagnaro, Myung, Pitt, & Kujala, 2009; Myung & Pitt, 2009; Myung, Cavagnaro, & Pitt, 2013): On every trial, the stimulus for which the models make the most diverging predictions (the response to which will therefore be most informative) is presented. Conversely, the researcher may learn that the models make very similar predictions for the planned experiment. In this case the study would not provide a strong test between the models, is unlikely to be informative, and should be revised.

Exploration of model predictions, moreover, reveals a model’s complexity—the range of observations the model can explain. As discussed above, researchers should prefer simple explanations; thus, model complexity should be penalized when researchers compare model predictions. This is difficult when a theory is expressed in words. For mathematical models, a variety of approaches to penalize model complexity in a principled manner are available (Myung & Pitt, 1997; Myung, Navarro, & Pitt, 2006; Pitt, Myung, & Zhang, 2002). Such statistical model comparisons instantiate comparisons of competing psychological theories.

Exploration of model predictions can also build an intuition for how a model works and what “makes it tick”: Which crucial assumptions allow the model to describe a specific pattern of results, and which are responsible for empirically unsupported predictions? Such a detailed understanding of the model mechanics facilitates model revision and theory development.

Finally, by fitting a cognitive model to data, researchers can decompose the observed responses into the assumed cognitive processes. If the model assumptions are sensible, the parameter estimates constitute a more direct measure of the cognitive process of interest than the observed variable. In this sense, the model acts as a measurement model, separating processes that researchers may be interested in from nuisance processes—measurement noise. This point will become clear when I introduce the example application in the next section.

How are predictions derived?

To illustrate some of the benefits of mathematically modeling cognitive processes, I draw on an example from research on episodic long-term recognition memory. Here researchers try to understand how we judge whether we have seen something before, that is, whether we perceive something to be ‘old’. A fundamental issue in the theoretical debate concerns the nature of the information on which we base such judgments [3].

Latent-strength theories of recognition memory postulate that retrieval from memory yields a mnemonic signal of varying strength (e.g., Eich, 1982; Hintzman, 1984; Kahana & Sekuler, 2002; Murdock, 1993; Nosofsky & Palmeri, 2014; Shiffrin & Steyvers, 1997). This unobservable signal is assumed to be what we experience as familiarity. Things that elicit a strong signal feel familiar; things that feel familiar are likely judged to be old. It is assumed that the memory system always produces a signal of continuously varying strength. Consequently, our judgments should always be informed by a memory signal; hence, there is no such thing as random guessing.

Discrete-state theories posit that memories are either retrieved or they are not—there are no intermediate states or nuanced mnemonic strength signals that factor into the decision process (e.g., Blackwell, 1963; Bröder & Schütz, 2009; Luce, 1963). If a memory is not retrieved it is assumed that we guess randomly.

It is not obvious from these verbal descriptions how to pit these theories against one another. Fortunately, both theoretical positions have been implemented in a variety of mathematical models. For this example, two variants of the well-known signal detection theory (Macmillan & Creelman, 2005; Swets, Tanner, & Birdsall, 1961) will stand in for the latent-strength perspective, and the high-threshold model (Blackwell, 1963) will represent the discrete-state perspective. I will introduce the latter model first.

The high-threshold model


Figure 2: Schematic depiction of the latent states in recognition memory decisions as assumed by the high-threshold model.

The high-threshold model (HTM; Figure 2; Blackwell, 1963) assumes that when participants judge whether they have seen something before, they attempt to retrieve a memory of that thing. If the thing has indeed been previously encountered, the retrieval of the corresponding memory succeeds with some probability \(p\). The model does not specify how this retrieval process proceeds. When no memory is retrieved, the participant is in a state of ignorance—no information is available that could sway the judgment one way or the other. Hence, the only way to make a judgment is to resort to guessing: ‘old’ with probability \(b\) or ‘new’ with probability \(1 - b\). In cases where participants are asked about something they have not encountered before, the probability of retrieving the corresponding memory is assumed to be \(p = 0\)—participants always guess. Because memory retrieval and guessing are assumed to be independent processes, the rates of ‘old’ responses can be calculated by combining the corresponding probabilities,

\[
\begin{align}
\text{Hits} & = & p(\text{‘Old’}|\text{Old}) & = p + (1-p) \times b \\
\text{False alarms} & = & p(\text{‘Old’}|\text{New}) & = b.
\end{align}
\]

If we are willing to accept the assumptions of HTM as a reasonably accurate description of the cognitive processes involved in old-new recognition, we can use this model to isolate memory performance from guessing. Because both memory retrieval and guessing factor into the correct recognition of previously encountered things, the rate of ‘old’ responses to old probes—the hit rate—is only a crude measure of memory performance: Observed changes in hit rates can result from changes in memory performance or from changes in guessing behavior. However, by rearranging the above formula we can subtract out the ‘old’ responses that are due to guessing. This gives us an estimate of the probability of successful memory retrieval \(\hat p\)—a more direct measure of memory performance,

\[
\hat p = \frac{\text{Hits} - \text{False alarms}}{1 - \text{False alarms}}.
\]
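As a small illustration (my own sketch, not code from the original post), the HTM equations and the estimator for \(\hat p\) translate directly into a few lines of Python; the function names are hypothetical.

```python
def htm_predicted_rates(p, b):
    """Predicted hit and false-alarm rates under the high-threshold model."""
    hits = p + (1 - p) * b   # retrieve the memory, or fail to retrieve and guess 'old'
    false_alarms = b         # new probes are never retrieved, so only guessing remains
    return hits, false_alarms

def htm_estimate_p(hits, false_alarms):
    """Estimate the retrieval probability by subtracting out 'old' responses due to guessing."""
    return (hits - false_alarms) / (1 - false_alarms)

# Example: a hit rate of .80 and a false-alarm rate of .20 imply p = .75 (and b = .20).
print(htm_estimate_p(hits=0.80, false_alarms=0.20))
```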

In this sense, HTM can be interpreted as a measurement model: a theory of the origin and effects of measurement error in old-new recognition. There are models with more detailed assumptions that attempt to specify how the retrieval of memories proceeds and why it may fail. Because such models specify larger portions of the involved cognitive processes, they are also referred to as process models. I will not cover process models in this blog post.

Signal detection theory


Figure 3: Schematic depiction of the latent mnemonic strength distributions for old and new probes in recognition memory judgments as assumed by equal- and unequal-variance signal detection theory.

The assumptions of signal detection theory (SDT; Figure 3; Swets et al., 1961) are slightly more involved. It is assumed that every memory probe elicits a mnemonic strength signal. Things that have previously been encountered elicit stronger signals than things that are new. If the mnemonic signal strength surpasses a response threshold \(c\), the participant endorses the probe as ‘old’. This threshold is an index of response bias and indicates how easily a person is convinced that they have encountered something before. However, the strength of the mnemonic signal for old and new memory probes is not fixed; it is assumed to be normally distributed. As a consequence, some new memory probes elicit a stronger signal than some old probes. Assuming variability in the mnemonic signal is not only plausible but also necessary: If the model assumed fixed signal strengths for old and new probes, it would predict that either all or none of the respective probes would be judged as old, depending on the location of the response threshold. It follows from these assumptions that the rates of ‘old’ responses can be calculated as the area under the respective normal distribution above the threshold \(c\),

\[
\begin{align}
\text{Hits} & = & p(\text{‘Old’}|\text{Old}) & = \Phi(\frac{\mu_{Old} - c}{\sigma_{Old}}), \\
\text{False alarms} & = & p(\text{‘Old’}|\text{New}) & = \Phi(\frac{\mu_{New} - c}{\sigma_{New}}),
\end{align}
\]
where \(\Phi\) is the cumulative distribution function of the normal distribution, \(\mu_{Old}\) and \(\mu_{New}\) are the mean mnemonic strengths for old and new probes, and \(\sigma_{Old}\) and \(\sigma_{New}\) are the standard deviations of the strength distributions.

In classic equal-variance signal detection theory (EVSDT), the dispersions of the distributions, \(\sigma_{Old}\) and \(\sigma_{New}\), are assumed to be equal. Unequal-variance signal detection theory (UVSDT) is more complex in that \(\sigma_{Old}\) is allowed to be greater than \(\sigma_{New}\).

The distance between the two distributions \(d_a\), that is, the average difference in mnemonic strength between old and new memory probes, is an index of discriminability or sensitivity and, thus, of memory performance,

\[
d_a = \frac{\mu_{Old} - \mu_{New}}{\sqrt{0.5(\sigma_{Old}^2 + \sigma_{New}^2)}}.
\]

In EVSDT, sensitivity is typically denoted as \(d'\). Without loss of generality, the variances are assumed to be \(\sigma_{Old}^2 = \sigma_{New}^2 = 1\). This is an arbitrary choice; they could, in principle, be fixed to other values without changing the model.
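The SDT equations can be sketched just as compactly. The following Python snippet (again my own illustration, with hypothetical function names) computes the predicted hit and false-alarm rates from assumed distribution parameters as well as the sensitivity \(d_a\); for \(\sigma_{Old} = \sigma_{New} = 1\) it reduces to EVSDT and \(d_a = d'\).

```python
from scipy.stats import norm

def sdt_predicted_rates(mu_old, mu_new, sigma_old, sigma_new, c):
    """Predicted hit and false-alarm rates under signal detection theory."""
    # Area of each strength distribution above the response threshold c.
    hits = norm.cdf((mu_old - c) / sigma_old)
    false_alarms = norm.cdf((mu_new - c) / sigma_new)
    return hits, false_alarms

def sensitivity_da(mu_old, mu_new, sigma_old, sigma_new):
    """Standardized distance between the old and new strength distributions (d_a)."""
    return (mu_old - mu_new) / (0.5 * (sigma_old**2 + sigma_new**2)) ** 0.5

# EVSDT example: mu_new = 0, unit variances, d' = 1.5, threshold halfway between the means.
print(sdt_predicted_rates(mu_old=1.5, mu_new=0.0, sigma_old=1.0, sigma_new=1.0, c=0.75))
print(sensitivity_da(1.5, 0.0, 1.0, 1.0))  # 1.5
```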

Again, if we are willing to accept the assumptions of SDT as a reasonably accurate description of the cognitive processes involved in old-new recognition, we can use this model to isolate memory performance from response bias. In the case of EVSDT, sensitivity \(d'\) and response threshold \(c\) can easily be calculated from the observed rates of ‘old’ responses,

\[
\begin{align}
\hat{d'} & = \Phi^{-1}(\text{Hits}) - \Phi^{-1}(\text{False alarms}), \\
\hat c & = -\frac{\Phi^{-1}(\text{Hits}) + \Phi^{-1}(\text{False alarms})}{2},
\end{align}
\]
where \(\Phi^{-1}\) is the inverse cumulative distribution function of the standard normal distribution, also known as probit transformation or \(z\) scores.
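In code, the EVSDT estimates are a one-liner each; here is a minimal Python sketch (not from the post) using the probit transformation via scipy.

```python
from scipy.stats import norm

def evsdt_estimates(hits, false_alarms):
    """Estimate EVSDT sensitivity (d') and response threshold (c) from observed response rates."""
    z_hits = norm.ppf(hits)            # probit (inverse normal CDF) of the hit rate
    z_fas = norm.ppf(false_alarms)     # probit of the false-alarm rate
    d_prime = z_hits - z_fas
    c = -(z_hits + z_fas) / 2
    return d_prime, c

# Example: 80% hits and 20% false alarms yield d' of about 1.68 and c = 0 (no response bias).
print(evsdt_estimates(hits=0.80, false_alarms=0.20))
```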

Comparison of predictions

The mathematical expressions of the three models can be used to derive specific predictions about the relationship between hits and false alarms. Consider the HTM. We can substitute the false-alarm rate for \(b\) and predict hits from false alarms,

\[
\text{Hits} = p + (1-p) \times \text{False alarms}.
\]
The resulting equation takes the same form as the linear regression function \(\hat y_i = a + b \times x_i\) discussed above, with intercept \(a = p\) and slope \(b = 1 - p\). Hence, HTM predicts a linear relationship between hits and false alarms. Intercept and slope of this linear relationship are determined by the probability of retrieving a memory (Figure 4). Moreover, intercept and slope are inversely related: As the intercept increases, the slope decreases.

The predicted linear relationship between hits and false alarms can be tested experimentally. Under conditions where the probability of retrieving a memory \(p\) can be assumed to be constant, manipulations that affect the probability of guessing ‘old’ \(b\) should yield a linear relationship between hits and false alarms; a nonlinear relationship would contradict HTM.


Figure 4: Predicted relationship between hits and false alarms according to the high-threshold model (HTM), equal-variance (EVSDT), and unequal-variance signal detection theory (UVSDT). The predictions for UVSDT assume a constant sensitivity of \(d_a = 2.00\) to illustrate the model’s additional flexibility relative to EVSDT. When \(\sigma_{\mathrm{Old}} = 1.00\), UVSDT and EVSDT make identical predictions. The dotted lines indicate chance performance.

Predictions can similarly be derived for EVSDT (Figure 4). Inspection of the predicted relationships reveals that HTM and EVSDT make distinct predictions. EVSDT predicts a curved relationship between hits and false alarms, where the curvature increases with the strength of the memory signal for old probes, that is, with the sensitivity \(d'\). Again, this constitutes an experimentally testable prediction. A comparison of the predictions of HTM and EVSDT further suggests that a paradigm that yields a medium probability of retrieving a memory, or a discriminability of around \(d' = 1.5\), would be most informative for the model comparison—the line and the curved function differ most in the medium range of hits and false alarms.

Finally, the predictions of UVSDT illustrate the effect of assuming increased variability in the mnemonic strength distribution of old probes (Figure 4). The relationship between hits and false alarms becomes more linear in the medium and high range of false alarms. Moreover, the predictions illustrate the increased complexity of the model. When the variability in the mnemonic signal for old probes equals that of new probes, UVSDT mimics EVSDT—both models make identical predictions. When the variability for old probes is large and the response threshold is low, the model can even predict false alarm rates that are higher than the hit rates. Such an observation would contradict both HTM and EVSDT.
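To convey how Figure 4 comes about, here is a short Python sketch (my own, with assumed parameter values) that traces the predicted hits as a function of false alarms for all three models: a straight line for HTM, a symmetric curve for EVSDT, and an asymmetric curve for UVSDT.

```python
import numpy as np
from scipy.stats import norm

false_alarms = np.linspace(0.001, 0.999, 200)

# HTM: a straight line with intercept p and slope (1 - p); here p = .5 is assumed.
p = 0.5
hits_htm = p + (1 - p) * false_alarms

# EVSDT: a symmetric curve for an assumed sensitivity d' = 1.5.
d_prime = 1.5
hits_evsdt = norm.cdf(d_prime + norm.ppf(false_alarms))

# UVSDT: an asymmetric curve; sigma_old > 1 with constant d_a = 2 (as in Figure 4).
sigma_old, d_a = 1.5, 2.0
mu_old = d_a * np.sqrt(0.5 * (sigma_old**2 + 1))   # with mu_new = 0 and sigma_new = 1
c = -norm.ppf(false_alarms)                        # thresholds implied by each false-alarm rate
hits_uvsdt = norm.cdf((mu_old - c) / sigma_old)
```

Plotting the three predicted hit-rate vectors against false_alarms reproduces the qualitative pattern shown in Figure 4.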

How can the predictions be empirically tested?

As previously discussed, HTM and SDT can be used to decompose participants’ responses and isolate memory processes from guessing or response criteria. However, the decomposition rests on the assumption that the measurement model provides a reasonably accurate description of the processes involved in recognition memory. If the assumptions of the model are violated, the results of the decomposition may be misleading—indices of memory performance may in part reflect processes unrelated to memory retrieval. This poses a problem: The cognitive processes involved in recognition memory cannot be observed directly. We can, however, compare the diverging model predictions to observed data. The model that provides the best description of the observed data—given its complexity—would be considered to provide the least implausible characterization of the latent processes. Such model comparisons do not prove that the favored model is the true model. Rather, they indicate that the favored model is the least implausible one. Given that it describes all relevant patterns in the data, it may provide a reasonably accurate description of the processes involved in recognition memory.

The predictions derived for HTM, EVSDT, and UVSDT suggest an experimental design to pit the models against one another. Consider the following hypothetical study inspired by Swets et al. (1961; cf. Kellen, Erdfelder, Malmberg, Dubé, & Criss, 2016). Four participants study a list of 150 words. They are instructed to memorize the words as they will be asked to remember them later. In the subsequent recognition test, another 150 new words are mixed with the study list. That is, the test list consists in equal parts of old and new memory probes. Participants receive compensation depending on their performance: They receive a bonus for every hit but a malus for every false alarm. The test list is randomly grouped into 10 sublists and the size of the malus is varied across the sublists. Because the incentive manipulation is introduced in the test phase—all memory probes are studied as part of the same list—we assume that it affects only processes unrelated to memory performance (i.e., guessing or the response threshold). With constant memory performance, HTM predicts a linear, EVSDT a symmetric curved, and UVSDT an asymmetric curved relationship between hits and false alarms.


Figure 5: Scatter plot of hits and false alarms for the hypothetical experiment. Lines indicate the best description of the data from high-threshold model, equal-variance, and unequal-variance signal detection theory. The dotted lines indicate chance performance. The data used here are simulated.

The results of the hypothetical study along with the best description from each model are shown in Figure 5. Visual inspection of the plots suggests that the linear function predicted by HTM may be a decent characterization of Participant 1’s responses. However, one condition with few false alarms and hits deviates from the linear prediction and is captured much better by the SDT models. The responses of Participant 3 appear to be best described by UVSDT. Again, there is one condition with few false alarms and hits that deviates from the linear prediction. Moreover, in another condition there are more false alarms than hits—a result that only UVSDT can explain. But are the observed deviations extreme enough to support one model over the other?

Firm conclusions require statistical model comparisons. For this example I will use two information criteria, AIC\(_c\) and BIC, that quantify the models’ predictive accuracy and penalize them, albeit crudely [4], for their respective complexity (see Aho, Derryberry, & Peterson, 2014, for an overview). BIC penalizes model complexity more strongly than AIC\(_c\). In both cases lower values indicate better model fit. Both information criteria can be used to calculate model weights (\(w\)AIC\(_c\) and \(w\)BIC) that indicate the probability that a given model is the best model among the tested set (Wagenmakers & Farrell, 2004).
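For readers who want to see the mechanics, here is a minimal Python sketch (my own, not the analysis code behind Figure 6) of the standard AIC\(_c\), BIC, and model-weight formulas; the information-criterion values passed in at the end are hypothetical.

```python
import numpy as np

def aicc(log_lik, k, n):
    """Akaike information criterion with small-sample correction (k parameters, n observations)."""
    return -2 * log_lik + 2 * k + (2 * k * (k + 1)) / (n - k - 1)

def bic(log_lik, k, n):
    """Bayesian information criterion."""
    return -2 * log_lik + k * np.log(n)

def model_weights(ics):
    """Model weights (e.g., wAICc or wBIC) from a set of information criteria."""
    delta = np.asarray(ics) - np.min(ics)   # differences to the best (lowest) value
    rel_lik = np.exp(-0.5 * delta)          # relative likelihoods of the models
    return rel_lik / rel_lik.sum()          # normalize so the weights sum to 1

# Hypothetical AICc values for HTM, EVSDT, and UVSDT for one participant.
print(model_weights([112.3, 108.9, 105.1]))
```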

In the context of nonlinear cognitive models, such as the three models under consideration here, it has been shown that aggregating responses across participants can bias parameter estimates and lead to incorrect conclusions (e.g., Curran & Hintzman, 1995; Estes, 1956). Hence, it is not appropriate to analyze all responses jointly as if they originated from a single participant. Instead, if enough data are available, the models can be compared individually for each participant (see Lewandowsky & Farrell, 2011) or jointly using advanced hierarchical modeling techniques (e.g., Rouder & Lu, 2005). For simplicity, I fit the models to each participant’s responses individually.

Figure 6 illustrates the results of the statistical model comparison. The AIC\(_c\) analysis indicates that UVSDT provides the best description of the responses of Participants 2, 3, and 4, whereas HTM provides the best description of Participant 1’s responses, as these models have the lowest AIC\(_c\) values. The results of the BIC analysis are similar, but the simpler models fare better due to the added penalty for the extra variance parameter in UVSDT. For example, in the case of Participant 2, BIC indicates that EVSDT is the best model. The extent to which each model is to be preferred is best reflected in the model weights.


Figure 6: Heat maps of by-participant model comparisons based on Akaike Information Criterion differences (\(\Delta\)AIC\(_c\)), Akaike weights (\(w\)AIC\(_c\)), Bayesian Information Criterion differences (\(\Delta\)BIC), and Schwarz weights (\(w\)BIC).

Beyond the comparison of the individual models, the model weights can be combined to jointly compare the latent-strength models to the discrete-state model, e.g.,

\[
\frac{w\text{AIC}_c^{(\text{HTM})}}{w\text{AIC}_c^{(\text{EVSDT})} + w\text{AIC}_c^{(\text{UVSDT})}}.
\]

The joint model comparison provides a direct test of the research question while taking into account the uncertainty about the implementation of the latent-strength hypothesis. According to the AIC\(_c\), the discrete-state model is favored 1.82-to-1 for Participant 1—barely informative. The latent-strength models are favored 2,440.99-, 11.98-, and 18.48-to-1 for Participants 2, 3, and 4. According to the BIC, the discrete-state model is favored 9.64-to-1 for Participant 1, whereas the latent-strength models are favored 746.36-, 2.05-, and 3.16-to-1 for Participants 2, 3, and 4.
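The joint comparison in the formula above amounts to a simple ratio of weights; a hypothetical Python example:

```python
def joint_comparison(w_htm, w_evsdt, w_uvsdt):
    """Odds for the discrete-state model relative to both latent-strength models combined."""
    return w_htm / (w_evsdt + w_uvsdt)

# Hypothetical weights for one participant (the three weights sum to 1).
print(joint_comparison(w_htm=0.10, w_evsdt=0.35, w_uvsdt=0.55))  # ~0.11, i.e., ~9-to-1 against HTM
```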

To conclude, the results are somewhat contingent on the employed information criterion but indicate that, overall, the latent-strength models tested here may provide a better description of the observed data.

Where can I learn more?

I hope this blog post has illustrated how theoretical positions can be expressed in mathematical terms and how mathematical models of cognition can help to test and compare psychological theories. If you want to learn more, I highly recommend the book by Lewandowsky & Farrell (2011) for a general introduction and the book by Lee & Wagenmakers (2014) for a detailed introduction into Bayesian estimation techniques for cognitive models, which I haven’t covered here. Also, I would like to encourage anyone to post further suggestions for introductory materials in the comments.

References

Aho, K., Derryberry, D., & Peterson, T. (2014). Model selection for ecologists: The worldviews of AIC and BIC. Ecology, 95(3), 631–636. doi:10.1890/13-1452.1

Baddeley, A. (1986). Working Memory. Oxford: Oxford University Press.

Blackwell, H. R. (1963). Neural Theories of Simple Visual Discriminations. Journal of the Optical Society of America, 53(1), 129–160. doi:10.1364/JOSA.53.000129

Bröder, A., & Schütz, J. (2009). Recognition ROCs are curvilinear – Or are they? On premature arguments against the two-high-threshold model of recognition. Journal of Experimental Psychology – Learning, Memory, and Cognition, 35(3), 587–606. doi:10.1037/a0015279

Cavagnaro, D. R., Myung, J. I., Pitt, M. A., & Kujala, J. V. (2009). Adaptive Design Optimization: A Mutual Information-Based Approach to Model Discrimination in Cognitive Science. Neural Computation, 22(4), 887–905. doi:10.1162/neco.2009.02-09-959

Curran, T., & Hintzman, D. L. (1995). Violations of the independence assumption in process dissociation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21(3), 531–547. doi:10.1037/0278-7393.21.3.531

Eich, J. M. (1982). A composite holographic associative recall model. Psychological Review, 89(6), 627–661. doi:10.1037/0033-295X.89.6.627

Estes, W. K. (1956). The problem of inference from curves based on group data. Psychological Bulletin, 53(2), 134–140. doi:10.1037/h0045156

Hintzman, D. L. (1984). MINERVA 2: A simulation model of human memory. Behavior Research Methods, Instruments, & Computers, 16(2), 96–101. doi:10.3758/BF03202365

Kahana, M. J., & Sekuler, R. (2002). Recognizing spatial patterns: A noisy exemplar approach. Vision Research, 42(18), 2177–2192. doi:10.1016/S0042-6989(02)00118-9

Kellen, D., Erdfelder, E., Malmberg, K. J., Dubé, C., & Criss, A. H. (2016). The ignored alternative: An application of Luce’s low-threshold model to recognition memory. Journal of Mathematical Psychology, 75, 86–95. doi:10.1016/j.jmp.2016.03.001

Kortt, M., & Leigh, A. (2010). Does size matter in Australia? Economic Record, 86(272), 71–83. doi:10.1111/j.1475-4932.2009.00566.x

Lee, M. D., & Wagenmakers, E.-J. (2014). Bayesian Cognitive Modeling: A Practical Course. Cambridge, NY: Cambridge University Press.

Lewandowsky, S., & Farrell, S. (2011). Computational Modeling in Cognition: Principles and Practice. Thousand Oaks, CA, US: SAGE.

Luce, R. D. (1963). A Threshold Theory for Simple Detection Experiments. Psychological Review, 70(1), 61–79. doi:10.1037/h0039723

Macmillan, N. A., & Creelman, C. D. (2005). Detection theory: A user’s guide (2nd ed.). Mahwah, NJ, US: Lawrence Erlbaum Associates Publishers.

Mandler, G. (1980). Recognizing: The judgment of previous occurrence. Psychological Review, 87(3), 252–271. doi:10.1037/0033-295X.87.3.252

Murdock, B. B. (1993). TODAM2: A model for the storage and retrieval of item, associative, and serial-order information. Psychological Review, 100(2), 183–203. doi:10.1037/0033-295X.100.2.183

Myung, J. I., & Pitt, M. A. (1997). Applying Occam’s razor in modeling cognition: A Bayesian approach. Psychonomic Bulletin & Review, 4(1), 79–95. doi:10.3758/BF03210778

Myung, J. I., & Pitt, M. A. (2009). Optimal experimental design for model discrimination. Psychological Review, 116(3), 499–518. doi:10.1037/a0016104

Myung, J. I., Cavagnaro, D. R., & Pitt, M. A. (2013). A tutorial on adaptive design optimization. Journal of Mathematical Psychology, 57(3), 53–67. doi:10.1016/j.jmp.2013.05.005

Myung, J. I., Navarro, D. J., & Pitt, M. A. (2006). Model selection by normalized maximum likelihood. Journal of Mathematical Psychology, 50(2), 167–179. doi:10.1016/j.jmp.2005.06.008

Nosofsky, R. M., & Palmeri, T. J. (2014). An Exemplar-Based Random-Walk Model of Categorization and Recognition. In J. Busemeyer, J. Townsend, Z. Wang, & A. Eidels (Eds.), Mathematical and Computational Models of Cognition. Oxford University Press. Retrieved from http://catlab.psy.vanderbilt.edu/wp-content/uploads/NP-Oxford2014.pdf

Palminteri, S., Wyart, V., & Koechlin, E. (2017). The Importance of Falsification in Computational Cognitive Modeling. Trends in Cognitive Sciences, 21(6), 425–433. doi:10.1016/j.tics.2017.03.011

Pitt, M. A., Myung, J. I., & Zhang, S. (2002). Toward a method of selecting among computational models of cognition. Psychological Review, 109(3), 472–491. doi:10.1037/0033-295X.109.3.472

Roberts, S., & Pashler, H. (2000). How persuasive is a good fit? A comment on theory testing. Psychological Review, 107(2), 358. doi:10.1037/0033-295X.107.2.358

Rouder, J. N., & Lu, J. (2005). An introduction to Bayesian hierarchical models with an application in the theory of signal detection. Psychonomic Bulletin & Review, 12(4), 573–604. doi:10.3758/BF03196750

Shiffrin, R. M., & Steyvers, M. (1997). A model for recognition memory: REM—retrieving effectively from memory. Psychonomic Bulletin & Review, 4(2), 145–166. doi:10.3758/BF03209391

Swets, J. A., Tanner, W. P. J., & Birdsall, T. G. (1961). Decision Processes In Perception. Psychological Review, 68(5), 301–340. doi:10.1037/h0040547

Wagenmakers, E.-J., & Farrell, S. (2004). AIC model selection using Akaike weights. Psychonomic Bulletin & Review, 11(1), 192–196. doi:10.3758/BF03206482

Yonelinas, A. P. (2002). The Nature of Recollection and Familiarity: A Review of 30 Years of Research. Journal of Memory and Language, 46(3), 441–517. doi:10.1006/jmla.2002.2864


  1. The models presented in the blog post make rather abstract assumptions about the involved cognitive processes. Some mathematical models commit to more specific assumptions and mechanisms. These models are referred to as process models although the distinction between measurement models and process models is continuous rather than dichotomous.
  2. Fitting this model to data requires several additional assumptions, such as independent and identically distributed prediction errors, that I will pass over in the interest of brevity as they are irrelevant to the model’s predictions. Also, note that linear regression models can be extended to predict nonlinear relationships, for example, by adding exponentiated predictor terms such as \(c \times x_i^2\).
  3. Another long-standing debate revolves around whether our episodic long-term memory is a unitary storage or whether it consists of multiple qualitatively different memory systems (e.g., Mandler, 1980; Yonelinas, 2002). For simplicity we will ignore this debate and focus on theories that assume episodic long-term memory to be a unitary storage.
  4. Both information criteria quantify model complexity by counting the number of free parameters. In nonlinear cognitive models not all parameters of each model grant the same flexibility. Modern model comparison methods allow the researcher to quantify model complexity in a more principled manner (Cavagnaro et al., 2009; Myung & Pitt, 2009; Myung et al., 2013).

Frederik Aust

Frederik Aust is pursuing a PhD in cognitive psychology at the University of Cologne. He is interested in mathematical models of memory and cognition, open science, and R programming.


How to stop being busy and become productive

With the rise of social media, potential distractions have risen to unseen levels; they dominate our daily lives. Do you check Facebook, Twitter, Snapchat, Instagram, or Email on a constant basis? Do you have an embarrassing relationship with your alarm clock’s snooze button? Do you pass on social invites, telling other people that you are too busy? As a generation, we have lost the ability to focus sharply on the task at hand; instead, we work on a multitude of things simultaneously, lamenting that we do not achieve what we seek to achieve.

Picture of a busy person on a computer (Lego).

In this post, we share useful tips, tricks, and tools for you to stay on top of your day and move quickly from task to task, accomplishing the things that matter. In addition to linking to further resources, we suggest a three stage actionable program for you to go through in order to stop being busy and start being productive. As we (Fabian and Lea, the authors of this post) have experienced first hand, making the jump from being busy to being productive — from workaholism to strictly separating work and play, from social exclusion to social inclusion — has the promising potential of increasing quality time spent with friends and family, accelerating the pace of skill development, avoiding burnouts, and leading to increased subjective well-being.

Challenges in the 21st Century

Why would one want to become more productive? In addition to personal reasons — leading a happier, more accomplished, more balanced life — there are societal reasons. The 21st century presents us with unique challenges, and the way we tackle them will define the future of our species. The three most important challenges are the exploitation of the Earth (including climate change), income inequality (including world poverty), and the “rise of robots”, which includes digitalisation and its impact on work. In this post, we want to focus on the latter and make the argument that, in order to stay lean, one needs to cultivate what Cal Newport calls Deep Work habits, enabling one to quickly adapt to changing work environments. Additionally, these habits also increase the effectiveness with which we can tackle the three challenges.

Take data science as an example. Few fields move as fast as data science. In its current form, it didn’t even exist fifteen years ago (for a very short history of the field, see this). Now “data scientist” has become the “sexiest job of the 21st century”.

The job market will change dramatically in the coming years. It is predicted that many jobs will fall out of existence, being taken over by machines, and that new jobs will be created (see this study and these books). Humanity is moving at an incredibly fast pace, and each individual’s challenge is to stay sharp amidst all those developments. To do so requires the ability to quickly learn new things, and to spend time productively — the two skills which make you most employable.

Being busy vs being productive

Every day, week, and month we have a number of tasks and obligations we need to address; the way we organize the time spent on getting these done differs strongly among individuals. It is here that the distinction between being busy and being productive becomes apparent.

When thinking of someone who is busy, we usually picture someone who tries to complete a task while at the same time thinking about some other task, checking social media or email, or conversing with other people. The splitting of attention across multiple things at once, while claiming to be working on a really important task, is a dead giveaway. This causes the task at hand to take forever to complete. Oddly enough, the extensive time this task takes to be completed need not bother a busy person. On the contrary, it provides an opportunity to talk a lot about being busy, having so much to do, having so many exams, etc. This leads to cancellations of social plans and less time for leisure activities. Too many things to do, not enough time. One gets more and more frustrated.

A productive person, on the other hand, is a responsible person who sets a few clear priorities and thinks of measurable steps to achieve her goals. While working, intense focus and undivided attention are directed at a single activity. Keeping track of progress gives a clear idea of what has been achieved during the day and what is left for tomorrow.

The distinction between being busy and being productive is at the core of this blog post. Table 1 below gives an overview of what distinguishes these two states.


Table 1. The difference between being busy and being productive.

Learning how to learn

In addition to personal productivity, which will be the focus of the remaining sections, being able to monitor one’s learning progress and to learn new things quickly is another very important skill. Barbara Oakley and Terrence Sejnowski have designed an online course over at Coursera called “Learning How To Learn” in which they discuss, among other things, the illusion of competence, memory techniques, and how to beat procrastination. It is the most popular free course on Coursera and we highly recommend it.

Tips, tricks, and tools

Note that these are personal recommendations. Most of them are backed by science or common sense, but they need not work for you. This is a disclaimer: your mileage may vary.

Manage your time. Time is your most important commodity. You can’t get it back, so consider spending it wisely. To facilitate that, we highly recommend the Bullet Journal. It is an “analog tool designed for the digital world”. All you need is a notebook — we use a Leuchtturm1917, but any other would do, too — and a pen. Here is a video explaining the basics. It combines the idea of keeping track of your time and obligations while providing a space for creativity.

Schedule tasks & eat your frog first. Write down what needs to get done the next day on the evening before. Pick out your most despised task — your frog — and tackle it first thing in the morning. If you eat your frog first, there is nothing more disgusting that can happen during the day. Doing this mitigates procrastination and provides a sense of accomplishment that keeps your energy levels up.

Avoid social media. Social media and email have operantly conditioned us; we get a kick out of every notification. Thousands of engineers are working on features that grab our attention and maximize the time we spend on the platforms they build (see also this fascinating interview). However, checking these platforms disrupts our workflow and thought process. They train us to despise boredom and instill in us the unfortunate need to have something occupy our attention at all times. Therefore, we recommend having fixed time points when you check email, and not spending too much time on social media before late in the afternoon or evening, when energy is low. More important tasks require attention during the day when your mind is still sharp.

We feel that quitting social media altogether is too extreme and would most likely be detrimental to our social life and productivity. However, we did remove social media apps from our phones and we limit the number of times we log onto these platforms per day. We recommend you do the same. You will very soon realize that they aren’t that important. Time is not well spent there.

Stop working. There is a time for work, and there is a time for play. We recommend setting yourself a fixed time when you stop working. This includes writing and responding to emails. Enjoy the rest of the day, read a book, learn a new skill, meet friends, rest your mind. This helps your mind wander from a focused into a diffuse mode of thinking which helps with insight problems such as “Thiss sentence contains threee errors.” If you do this, you will soon realize a boost in your overall creativity and subjective well-being. Cal Newport has structured his schedule according to this principle, calling it fixed-schedule productivity.

Build the right habits. Being productive is all about building the right habits. And building habits is hard; on average, it takes 66 days to build one, although there is great variability (see Lally et al., 2009, and here). In order to facilitate this process, we recommend Habitica, an app that gamifies destroying bad habits and building good habits; see Figure 1 below.

Figure 1. From left to right: the apps Habitica, Calm, and 7 Minute. The important thing is not to break the chain, as this creates a psychological need for continuation. Note the selection bias here: it took me over a month to get to level 3 in Habitica. Don’t expect miracles; take small, consistent steps every day.

Workout. In order to create high quality work, you need to take care of your body; you can’t really be productive when you are not physically fit. Staying fit by finding an exercise routine that one enjoys and can manage is one of the best things we do, and we can only recommend it. Being able to climb stairs without getting out of breath is just one of the many rewards.

Meditate or go for a run. In order to increase your ability to focus and avoid distractions, we recommend meditation. For this purpose, we are using Calm, but any other meditation app, for example Headspace, yields similar results. (Of course, nothing beats meditating in a Buddhist centre.) This also helps during the day when some stressful event happens. It provides you with a few minutes to recharge, and then start into the day afresh. Going for a run, for example, does the same trick.

Someone asked a Zen Master, “How do you practice Zen?”
The master said, “When you are hungry, eat; when you are tired, sleep.”
“Isn’t that what everybody does anyway?”
The master replied, “No, no. Most people entertain a thousand desires when they eat and scheme over a thousand thoughts when they sleep.”

Powernap. This is one of the more unconventional recommendations, but it has worked wonders for our productivity. In the middle of the day, take a short power nap. It provides a boost of energy that lasts until bedtime (for more, see this).

Process versus Product. For starting to work, focusing on process rather than product is crucial. Set yourself a timer for, say, 25 minutes and then fully concentrate on the task at hand. Take a short break, and start the process again. In this way, you will focus on bursts of concentrated, deep work that bring you step by step towards your final outcome, say a finished blog post.

This approach is reminiscent of the way Beppo, the road sweeper, works in Michael Ende’s book Momo. About his work, he says

“…it’s like this. Sometimes, when you’ve a very long street ahead of you, you think how terribly long it is and feel sure you’ll never get it swept. And then you start to hurry. You work faster and faster and every time you look up there seems to be just as much left to sweep as before, and you try even harder, and you panic, and in the end you’re out of breath and have to stop — and still the street stretches away in front of you. That’s not the way to do it.

You must never think of the whole street at once, understand? You must only concentrate on the next step, the next breath, the next stroke of the broom, and the next, and the next. Nothing else.

That way you enjoy your work, which is important, because then you make a good job of it. And that’s how it ought to be.

And all at once, before you know it, you find you’ve swept the whole street clean, bit by bit. What’s more, you aren’t out of breath. That’s important, too.”

This technique is sometimes called the “Pomodoro”, and apps that help you implement it abound. Although you need no app for this, apps are nice because they keep track of how many Pomodoros you have finished on a given day, providing you with a direct measure of your productivity. We can recommend the Productivity Challenge Timer.

Write down ten ideas. This recommendation comes from James Altucher, who wrote Reinvent Yourself, an entertaining book with chapters such as “Seven things Star Wars taught me about productivity” and “The twenty things I’ve learned from Larry Page”. The habit is simple: write down ten ideas every day, on any topic. The basic rationale is that creativity is a muscle, and like every other muscle, training it increases its strength. Most of the ideas will be rather useless, but that doesn’t matter. Now and then there will be a really good one. This habit probably has strong transfer effects, too, because creativity is required in many areas of life.

Read, Read, Read. There’s a saying that most people die by age 25 but aren’t put into a coffin until age 75. Reading allows your mind to continuously engage with novel ideas. We recommend Goodreads to organize and structure your reading.

Reflect on your day. Take a few minutes in the evening to reflect on your day. Keep a gratefulness journal in which you write down five things you are grateful for each day (this might also increase your overall happiness; see, e.g., here). Summarize your day in a few lines, pointing out the new things you have learned.

Does it work? Quantifying oneself

It is important to once in a while take a cold, hard look into the mirror and ask: What am I doing? Am I working on things that matter, am I helping other people? Am I progressing, or am I stagnating in the comfort zone? Am I enjoying my life?

A useful habit to build is to reflect, every evening, on one’s behaviour and the things that have happened during the day. To achieve this, I (Fabian) have created a Google Form that I fill out daily. It includes, among other things, questions about what I have eaten during the day, the quality of my social interactions, and the most important thing I have learned that day; see Figure 2 below. It also asks me to summarize my day in a few lines.

Figure 2. Quantified Self questions. Every evening I reflect on the day by answering these questions. You can create your own, adapting the questions to your needs.

I have not done much with the data yet, but I know that the mere process of answering the questions encourages reflection and is soothing. It is also valuable in the sense that, should there be too many days in which I feel bad, this will be directly reflected in the data and I can adjust my behaviour or my environment. I can wholeheartedly recommend this tiny bit of quantified self at the end of the day.

Incidentally, there is a whole community behind this idea of quantifying oneself. They go much further. As with most things, it is all about finding the right balance. It is easy to become overwhelmed when engaging with too many tools that measure your behaviour; you might end up being busy and chasing ghosts.

A 3-stage program

In order to succeed in whatever area of life, commitment is key. Reading a blog post on productivity is the first step in a long journey towards actual behaviour change. In order to help you take this journey, we suggest three “stages”. Note that they are not necessarily sequential; you can take ideas from Stage 3 and implement them before things listed in Stage 1. The main reason behind these stages is that you should avoid being overwhelmed. Take small steps and stick to them. The first two stages will probably take one or two months, while the third will take a bit longer.

Stage 1

Stage 1 is about getting started. It is about becoming clear about your motivation: why do you want to be productive? What are the issues that plague or annoy you in the way you currently work? We recommend that you

  • Figure out and write down your motivation for why you want to be productive
  • Become aware of your social media use
  • Enroll in and complete Learning How to Learn
  • Start using the Pomodoro technique
  • Create an account on Habitica, adding habits you want to build or destroy
  • Uninstall social media apps from your phone
  • Set yourself a time point after which you will not check email nor social media

Stage 2

Stage 2 is about staying committed and developing a healthier and more consistent lifestyle.

  • Stay committed to your habits and review your motivation
  • Review what you have accomplished during the last months
  • Develop a consistent sleep-wake cycle
  • Develop a morning ritual
  • Eat healthy food, not too much, mostly plants
  • Start to exercise regularly (at least 3x a week)
  • Start a Bullet Journal

Stage 3

Stage 3 is about exceeding what you have accomplished so far. It is about figuring out your goals and the skills you want to develop. It is about not staying in your comfort zone, about building a habit of reading a variety of books, and becoming more engaged with others. It is from other people that we can learn the most.

  • Stay committed to your habits and review your motivation
  • Review what you have accomplished during the last months
  • Figure out what skills you want to develop
  • Read Deep Work and figure out a Deep Work routine that suits you
  • Engage with others and exchange ideas and practices
  • Find mentors for the skills you want to develop (e.g., writing, programming)
  • Create an account on Goodreads and organize your reading
  • Read at least two books per month

Conclusion

We started this blog post by discussing the future of work. But it’s not really about work. Sure, applying the ideas we have sketched will make you more productive professionally; but it’s not about running in a hamster wheel, meeting every objective at work, or churning out one paper after another. Instead, it’s about finding the right balance of work and play, engaging in meaningful activities, and enjoying life.

If you take anything from this blog post, it should be the following three points.

If you work, work hard. If you’re done, be done. This means sharply separating work from play. It is important for avoiding burning out, for creating an atmosphere in which creativity and novel ideas flourish, for enhancing your life through spending time with friends and family, and, overall, for increasing the amount of play in your life. After all, play is what makes life joyful.

Never be the smartest person in the room. This is about learning from others. Identify the skills you want to develop, and seek out mentors for those skills; mentors will rapidly speed up your learning. Additionally, hang out with people with different backgrounds. This exposes you to ideas that you would not otherwise be exposed to. It is the people who we barely know that have the capacity to change our lives the most.

Be relevant. This is the culmination of the whole post. It is about helping others and having a lasting impact. This might entail donating to the world’s poorest; being there for a friend in dire times; pushing people to expand their horizons; helping them develop in the direction they want to develop in; working on projects that have a lasting positive impact. It is about doing the things that matter.

Recommended Resources

  • 80,000 Hours
  • Learning How To Learn
  • Deep Work (or How to Become a Straight-A Student)
  • Cal Newport’s fixed-schedule productivity

This post was written together with Lea Jakob and is based on a workshop we presented at the 31st EFPSA Congress in Qakh, Azerbaijan in April — twice. The feedback we got from participants was extremely positive, and so we decided to write up the main points. This post will also act as a reminder to ourselves should we ever be led astray and fall back into old habits.

Fabian Dablander

Fabian Dablander is currently finishing his thesis in Cognitive Science at the University of Tübingen and Daimler Research & Development on validating driving simulations. He is interested in innovative ways of data collection, Bayesian statistics, open science, and effective altruism. You can find him on Twitter @fdabl.


Are You Registering That? An Interview with Prof. Chris Chambers

There is no panacea for bad science, but if there were, it would certainly resemble Registered Reports. Registered Reports are a novel publishing format in which authors submit only the introduction, methods, and planned analyses without actually having collected the data. Thus, peer review focuses only on the soundness of the research proposal and is not contingent on the “significance” of the results (Chambers, 2013). In one stroke, this simple idea combats publication bias and researchers’ degrees of freedom, makes apparent the distinction between exploratory and confirmatory research, and calms the researcher’s mind. There are a number of journals offering Registered Reports, and this is arguably the most important step journals can take to push psychological science forward (see also King et al., 2016). For a detailed treatment of Registered Reports, see here, here, here, and Chambers (2015).

Picture of Chris Chambers

Chris Chambers is the initiator of the “Registration Revolution”, the man behind the movement. He has introduced Registered Reports into psychology, has written publicly about the issues we currently face in psychology, and has recently published a book, “The 7 Deadly Sins of Psychology”, in which he masterfully exposes the shortcomings of current academic customs and inspires change. He is somebody who cares deeply about the future of our field, and he is actively changing it for the better.

We are very excited to present you with an interview with Chris Chambers. How did he become a researcher? Where did he get the idea of Registered Reports from? What is his new book about, and what can we learn from hard sciences such as physics? Find out below!


Tell us a bit about your background. How did you get into Psychology and Cognitive Neuroscience? What is the focus of your research?

Since my teenage years I had been interested in psychology (the Star Trek Next Generation episode “Measure of a Man” left me pondering the mind and consciousness for ages!) but I never really imagined myself as a psychologist or a scientist – those seemed like remote and obscure professions, well out of reach. It wasn’t until the final year of my undergraduate degree that I developed a deep interest in the science of psychology and decided to make a run for it as a career. Applying to do a PhD felt like a very long shot. I have this distinct memory, back in 1999, of scrolling down the web page of accepted PhD entrants. I searched in vain for my name among the list of those who had been awarded various prestigious scholarships, and as I neared the bottom I began pondering alternative careers. But then, as if by miracle, there was my name at the end. I was last on the list, the entrant with the lowest successful mark out of the entire cohort. For the next two and a half years I tried in vain to replicate a famous US psychologist’s results, and then had to face having this famous psychologist as a negative reviewer of every paper we submitted. One day – about two years into my PhD – my supervisor told me about this grant he’d just been awarded to stimulate people’s brains with electromagnetic fields. He asked if I wanted a job and I jumped at the chance. Finally I could escape Famous Negative Reviewer Who Hated Me! Since then, a large part of my research has been in cognitive neuroscience, with specific interests in attention, consciousness and cognitive control.

You have published an intriguing piece on “physics envy” (here). What can psychology learn from physics, and what can psychologists learn from physicists?

Psychology can learn many lessons from physics and other physical sciences. The physics community hinges reputation on transparency and reproducibility – if your results can’t be repeated then they (and you) won’t be believed. They routinely publish their work in the form of pre-prints and have successfully shaped their journals to fit with their working culture. Replication studies are normal practice, and when conducted are seen as a compliment to the importance of the original work rather than (as in psychology) a threat or insult to the original researcher. Physicists I talk to are bemused by our obsession with impact factors, h-indices, and authorship order – they see these as shallow indicators for bureaucrats and the small minded. There are career pressures in physics, no doubt, but at the risk of over-simplifying, it seems to me that the incentives for individual scientists are in broad alignment with the scientific objectives of the community. In psychology, these incentives stand in opposition.

One of your areas of interest is in the public understanding of science. Can you provide a brief primer of the psychological ideas within this field of research?

The way scientists communicate with the public is crucial in so many ways and a large part of my work. In terms of outreach, one of my goals on the Guardian science blog network is to help bridge this gap. We’ve also been exploring science communication in our research. Through the Insciout project we’ve been investigating the extent to which press releases about science and health contribute to hype in news reporting, and the evidence suggests that most exaggeration we see in the news begins life in press releases issued by universities and academic journals. We’ve also been looking at how readers interpret common phrases used in science and health reporting, such as “X can cause Y” or “X increases risk of Y”, to determine whether the wording used in news headlines leads readers to conclude that results are more deterministic (i.e. causal) than the study methods allow. Our hope is that this work can lead to evidence-based guidelines for preparation of science and health PR material by universities and journals.

I’m also very interested in mechanisms for promoting evidence-based policy more generally. Here in the UK I’m working with several colleagues to establish a new Evidence Information Service for connecting research academics and policy makers, with the aim to provide parliamentarians with a rapid source of advice and consultation. We’re currently undertaking a large-scale survey of how the academic community feels about this concept – the survey can be completed here.

You have recently published a book titled “The 7 Deadly Sins of Psychology”. What are the sins and how can psychologists redeem themselves?

The sins, in order, are bias, hidden flexibility, unreliability, data hoarding, corruptibility, internment and bean counting. At the broadest level, the path to redemption will require wide adoption of open research practices such as study preregistration, open data and open materials, and wholesale revision of the systems we use to determine career progression, such as authorship rank, journal rank, and grant capture. We also need to establish robust provisions for detecting and deterring academic fraud while at the same time instituting genuine protections for whistleblowers.

How did you arrive at the idea of Registered Reports for Psychology? What was the initial response from journals that you have approached? How has the perception of Registered Reports changed over the years?

After many years of being trained in the current system, I basically just had enough of publication bias and the “academic game” in psychology – a game where publishing neat stories in prestigious journals and attracting large amounts of grant funding is more rewarded than being accurate and honest. I reached a breaking point (which I write about in the book) and decided that I was either going to do something else with my life or try to change my environment. I opted for the latter and journal-based preregistration – what later became known as Registered Reports – seemed like the best way to do it. The general concept behind Registered Reports had been suggested, on and off, for about 50 years but nobody had yet managed to implement it. I got extremely lucky in being able to push it into the mainstream at the journal Cortex, thanks in no small part to the support of chief editor Sergio Della Sala.

The initial response from journals was quite cautious. Many were – and still are – concerned about whether Registered Reports will somehow produce lower quality science or reduce their impact factors. In reality, they produce what in my view are among the highest quality empirical papers you will see in their respective fields – they are rigorously reviewed with transparent, high-powered methods, and the evidence also suggests that they are cited well above average. Over the last four years we’ve seen more than 50 journals adopt the format (including some prominent journals such as Nature Human Behaviour and BMC Biology), and the community has warmed up to them as published examples have begun appearing. Many journals now see them as a strength and a sign that they value reproducible open science. They are realising that adding Registered Reports to their arsenal is a small and simple step for attracting high-quality research, and that having them widely available is potentially a giant leap for science as a whole.

Max Planck, the famous German physicist, once said that science advances one funeral at a time. Let’s hope that is not true; we simply don’t have the time for that. What skills, ideas, and practices should the next generation of psychological researchers be familiar and competent with? What further resources can you recommend?

I agree – there is no time to wait for funerals, especially in our unstable political climate. The world is changing quickly and science needs to adapt. I believe young scientists can protect themselves in two ways: first, by learning open science and robust methods now. Journals and funders are becoming increasingly cognisant of the need to ensure greater reproducibility and many of the measures that are currently optional will inevitably become mandatory. So make sure you learn how to archive your data, or preregister your protocol. Learn R and become familiar with the underlying philosophy of frequentist and Bayesian hypothesis testing. Do you understand what a p value is? What power is and isn’t? What a Bayes factor tells you? My second recommendation is to recognise these tumultuous times in science for what they are: a political revolution. It’s easy for more vulnerable members of a community to be crushed during a revolution, especially if isolated, so young scientists need to unionise behind open science to ensure that their voices are heard. Form teams to help shape the reforms that you want to see in the years ahead, whether that’s Registered Reports or open data and materials in peer review, or becoming a COS Ambassador. One day, not long from now, all this will be yours so make sure the system works for you and your community.

Fabian Dablander


Introducing jamovi: Free and Open Statistical Software Combining Ease of Use with the Power of R

For too long, Psychology has had to put up with costly, bulky, and inflexible statistics software. Today, we’d like to introduce you to a breath of fresh air: jamovi, free statistics software available for all platforms that is intuitive and user-friendly, and developed at such a pace that its capabilities may soon outstrip those of SPSS.

Screenshot of jamovi

 

As can be seen above, jamovi has a clean user interface with some very handy features: it does real-time computation, presenting and updating results immediately with attractive figures and neat APA tables. These results can then be copy-pasted into your text editor, such as Word. Basic analyses (e.g., t-tests, ANOVAs, correlations, contingency tables, proportion tests) are already available, and more will arrive shortly. What’s more, packages from the powerful R software can easily be adapted for use within jamovi’s user interface. In this way, jamovi gives you access to the power of the R language without your having to learn R syntax. For those who do want to learn R, jamovi can help there too: with just one mouse click, jamovi delivers the R syntax underlying each analysis.
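To give a flavour of what that underlying R syntax can look like, here is a minimal sketch using the jmv package (jamovi’s companion R package, which comes up in the interview below). The exact call jamovi generates for a given analysis may differ; the arguments shown here are purely illustrative.

# A sketch of jamovi-style R syntax via the jmv package (illustrative arguments)
library(jmv)

descriptives(
  data = iris,                              # any data frame loaded in R
  vars = c("Sepal.Length", "Sepal.Width")   # variables to summarise
)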

Another handy feature of jamovi is live data management: you can edit your data directly in the software, and if you change something, all results that depend on that change are immediately updated in the output window. Imagine how this would work in SPSS: change a data point, click through all the menus again or re-activate the relevant syntax, and manually delete the old output, all in order to get figures and tables that need additional time investment to become presentable or APA-conformant. With jamovi, these strenuous days are over!

One particularly useful type of analysis is also already available in jamovi: the TOSTER module for equivalence testing. It allows you to test whether the data support a null hypothesis (e.g., the absence of a meaningful effect), which is often what we want to know but cannot test with most statistics packages.
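For readers who work in R directly, the same kind of equivalence test can be run with Lakens’ TOSTER package, on which the jamovi module is based. The summary statistics below are invented for illustration, and the TOSTtwo() call is a sketch of one of the package’s interfaces rather than the only way to use it.

# Equivalence test for two independent groups (made-up summary statistics)
library(TOSTER)

TOSTtwo(
  m1 = 5.1, sd1 = 1.2, n1 = 50,      # group 1: mean, SD, sample size
  m2 = 5.0, sd2 = 1.1, n2 = 50,      # group 2: mean, SD, sample size
  low_eqbound_d  = -0.4,             # lower equivalence bound (Cohen's d)
  high_eqbound_d =  0.4,             # upper equivalence bound (Cohen's d)
  alpha = 0.05
)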

Thus, there are many reasons to install and use jamovi right away, and if you want to help your peers, you can develop your own R-based jamovi modules and make them freely available for everyone in the jamovi store.


Interview with Jonathon Love, jamovi co-founder and developer

jamovi might remind you of another recently established free statistics software package: JASP. Indeed, Jonathon Love, Damian Dropmann, and Ravi Selker were all developers of JASP and now develop jamovi. The two software packages may at first seem similar, but they emphasize different functionality. This means both packages will continue to be developed, and users can enjoy the benefits of both. Let’s see what Jonathon, former lead developer and designer of JASP, and now one of the jamovi core developers, has to say about this and more:

During our last interview, you were lead developer of JASP. What was your motivation to start jamovi and what happened since then?

So developing JASP was really fabulous, and something we all really enjoyed doing. But we did find that our ambitions, hopes and dreams went beyond JASP’s original goals. JASP has always been heavily focused on Bayes, and we wanted the freedom to explore other statistical philosophies.

At the same time, a number of technologies had matured to the point where we could build a more advanced software architecture in a much shorter amount of time. When I began JASP, I had to choose between older, “tried and true” technologies (C++ and QWidgets), and the newer, up and coming HTML5+js technologies. At the time, I concluded the newer technologies just weren’t mature enough for a large project like JASP.

Fast-forward a few years, and everything has changed. HTML5+js have come on in leaps and bounds and have become a capable, mature framework. Similarly, other developments have made things that were previously very difficult much more straightforward. For example, the R6 R package has enabled us to create a much more elegant analysis framework, allowing rich graphical analyses to be developed in much less time, and to support data-editing. Similarly, it has made it feasible to provide one of the most requested features: R syntax for each analysis.

So the decision to begin jamovi was a combination of ambitions beyond the JASP project’s core goals, and seeing the opportunities that newer technologies provided.

You launched jamovi a couple of weeks ago, and so far only a few analyses are available. When will jamovi offer a scope of analyses comparable to SPSS?

So we actually think SPSS is overwhelming, making the user navigate a huge labyrinth of menus filled with analyses most people will never use. We do want to provide a lot of analyses, but we’ll do it in a different way. Our intention is to provide all the basic analyses used in undergraduate social science courses in the next few months, and we have the ambitious roadmap of being a viable (and compelling!) alternative to SPSS for the majority of social science researchers by August, providing all these analyses, and providing complete data-editing, cleaning, filtering and restructuring.

For additional, or more specialised analyses, we hope to build a community of developers providing analyses as “jamovi modules”. jamovi modules are R packages which have been augmented to run inside jamovi and provide analyses with a user interface. Importantly, these modules still function as R packages, making the analyses usable from both platforms. People are then able to publish jamovi modules they create through the “jamovi store” (and CRAN), making them available to anyone. We recently worked with Daniel Lakens to produce a jamovi module of his TOSTER package, and that’s come together very nicely. There are a few more modules in development that we know about, and you can expect further announcements in the coming weeks!

One of the neat things about the jamovi store is that it allows us to keep jamovi itself simple, and allows people to only install the analyses that are important to them. For those familiar with R, this is exactly how it works with CRAN, and we hope to duplicate its success, but for analyses with rich, accessible user interfaces.

jamovi is built on the idea that developers create jamovi modules for their R packages. Why should they do that?

There are two answers here: for science, and for themselves.

For science, because not everyone is, can be, or needs to be an R programmer. People have strengths in different areas. As long as new analyses are only available to people who can work with R, there are a lot of scientists who will be left behind. So I think it is imperative that we make new and advanced analyses available and accessible to everyone – that’s one of the core motivations for jamovi.

But creating jamovi modules can also be significant for the authors of analyses. One of the most significant metrics in science is how widely someone’s work is used, and a jamovi module ensures an analysis is accessible to the greatest number of people possible. So there are good career incentives for people to develop jamovi modules too.

Therefore, we encourage R developers to look into developing jamovi modules. The jamovi developer hub provides tutorials walking you through the process of writing a jamovi module: dev.jamovi.org, and if people would like help or advice, we can pair them up with a “dev mentor”. There are also forums where people can post questions. We’re keen to support the developer community in whatever way we can.

Readers of this interview will inevitably compare jamovi to JASP. What do you see as jamovi’s most distinctive features? Where can you borrow from JASP’s approach?

So our distinctive features are: data-editing, our R syntax mode, and the jmv R package.

Data-editing is one of my favourite features, because it takes something crazy complex, and makes it seem really easy! You’ll notice that if you run an analysis, say descriptives, and then start changing some values in the data view, the descriptives analysis updates in real-time. This in itself is cool, but you’ll also notice that only the columns in the descriptives analysis affected by the data changes are updated. Under the hood, jamovi is dynamically figuring out which values in the results need to change in response to the data changing – and only recalculating those. This is pretty neat.

R syntax mode is another favourite. jamovi can be placed in “syntax mode”, where the R code for producing each analysis in R is provided. This is super-cool, because it makes it easy for people to see and learn R code, and it also allows them to copy and paste the R code into an interactive R session. This allows people to make the jump to R, if that’s an area they are wanting to develop skills in.

Our jmv R package is the other half of “syntax mode”; an R package which provides all the analyses included in jamovi. This is awesome, because it means that a single R package will cover entire undergraduate social science programs. In the past, doing something like an ANOVA with all the contrasts, assumption checks, post-hoc corrections, etc. required on the order of 7 packages. So it’s been exciting to bring all of those elements together, and make them simpler for R users as well.

With respect to JASP’s approach, Eric-Jan Wagenmakers and the JASP guys have put a lot of effort, and continue to put a lot of effort into making new Bayesian analyses accessible to a broader audience. Their analyses represent a truly fabulous contribution and we’ll definitely be keeping a keen eye on what they get up to. You should too!

What are the biggest challenges ahead in developing and disseminating jamovi?

The chicken and egg problem. Always the chicken and egg problem!

People are reluctant to adopt a new platform when not all the supporting materials, videos, textbooks, etc. have been created yet. At the same time, the content creators are reluctant to provide supporting materials, because people seem reluctant to adopt it. In this way, markets tend to resist change, and overturning the status-quo often poses a frustrating challenge.

This phenomena isn’t just limited to software; you’ll find that it applies to many areas in science. Of course, change can, does, and must take place, and so the challenge is putting all the pieces in place so that new ideas, new paradigms, and new pieces of software can be adopted. In my view, this is almost always the biggest challenge, but it must be overcome — progress depends upon it!

So it’s been pretty exciting seeing the level of support coming from the community. We’ve had a surprising number of very promising talks with authors and publishers. I think we’ll have some pretty exciting announcements in the coming months, and it looks like we’re well on the way to hatching that chicken … or egg … or whatever.

How is jamovi being funded? How can users be sure of its continuing existence?

So at the moment jamovi is still in the early stages, and our emphasis has been on demonstrating that we have the sort of trajectory that people can get behind, and so we currently don’t have a lot of funding. I work for the University of Newcastle and volunteer my time on jamovi, and the same applies to the other core developers. However, people can still feel confident in the future of jamovi.

We expect to provide a complete and practical alternative to SPSS by August – with full data-editing, filtering, restructuring, the works. At that time, jamovi could be considered “complete”. We don’t intend to stop developing then, but if we did, jamovi would still be (in our view) one of the best tools available for social scientists, probably for years to come. It won’t require a lot of effort to maintain jamovi into the future, and people can feel confident that jamovi will be here for years to come. (There’s a persistent myth that software, once written, requires substantial resources to maintain. Indeed, in proprietary software it’s often a problem that old software “just keeps working”, and it’s hard to persuade customers to pay for newer versions!)

Having said all that, we are keen to develop funding and business models to support additional development of jamovi – and we have big plans going into the future. In the short-term, our efforts are concentrated on creating a viable alternative to SPSS, but longer term we want to provide a range of additional paid services that make the lives of researchers easier. jamovi itself will always (and must!) be free and open-source, but there’s a range of areas where we think we can provide services to make researchers more productive, and where it would be reasonable to charge a fee.

We’re also keen for benefactors, so if you or your institution benefit or stand to benefit from the work of the jamovi team, you could consider making a financial contribution to our work. Such a contribution would allow us to ramp up development, and provide a greater range of features. If there are particular features and analyses which are important to you or your institution, you could sponsor their development (e.g.,  reproducibility in a spreadsheet? We’d love to do that!). Do drop us a line.

How does the curious reader get started with jamovi?

jamovi is pretty straightforward to use, and it contains several example data sets that make it easy to get up and running. I’d recommend downloading and installing jamovi, and just playing around with it. We also have a user guide, complete with neat little videos demonstrating the basic features. If you’ve used SPSS before, you should find the user interface concepts quite familiar, such as dragging and dropping variables for an analysis. It’s designed to be easy and straightforward to use, and if you find this not to be the case, do drop us a line in the forums. We’re very keen for feedback, and to make jamovi the best it can be!

 

Peter Edelsbrunner


Peter is currently a doctoral student at the section for learning and instruction research of ETH Zurich in Switzerland. He graduated in Psychology from the University of Graz in Austria. Peter is interested in conceptual knowledge development and the application of flexible mixture models to developmental research. Since 2011 he has been active in the EFPSA European Summer School and related activities.


Magical 7±2 Tips for Psychologists Participating in a Hackathon

A hackathon is an event, typically lasting for 24-48 hours, in which a group of people with diverse backgrounds come together to solve a problem by building a first working prototype of a solution (usually a web app, program or a utility).

There is something inherently likable, or dare I say, smart, about hackathons. They have a specific goal, your progress and results are measurable, getting a first working prototype is both achievable and realistic, and it will all be over in 24-48 hours. I have come to appreciate hackathons a lot over the last five months, during which I’ve participated in five and won two with my teams. I would like to invite you to participate in one as well by giving you 7±2 tips to make your hackathon experience especially enjoyable.

#1 Just go

There’s more to a hackathon than just programming. Every team needs to tackle a wide variety of tasks ranging from totally non-technical to highly technical. Someone has to make nice visuals, look for evidence to back the product, write code to make it work and combine all the work into a meaningful proposal for the jury. The best teams in a hackathon have a diverse set of skills in a team (although always at least one developer).

The worst-case scenario is that you’ve grown your professional network, enjoyed some social meals, and gained invaluable experience taking something from the ideation phase to a first working prototype.

#2 Focus on your unique skillset

Your expertise in your favorite domain of psychology will be an important contribution to the team. In the Accenture Digital Hackathon, our BiteIn team worked on making language learning easier. As an experimental psychologist with a special interest in memory research, I could make sure that our product incorporated spaced repetition and self-testing, two of the most scientifically backed ways to enhance memory.

From psychological research we know that brainstorming sessions generate far more ideas when participants brainstorm on their own first and only then share their ideas with others. We can use our questionnaire-building skills to carry out decent market research, or design experiments (A/B tests) to make confident causal statements about the basis for our product. The more you develop your technical skills, the more you can be involved in implementing these ideas yourself.

#3 Focus on giving your best (not winning)

Winning is not under your control; doing your best is. Winning is a destination; doing your best is a process that optimizes your chances of getting there. Focusing on things under your control allows you to feel good about the progress you are making without making unfair comparisons with others. In every hackathon I’ve been to, there have been teams who silently left the event thinking their great idea was crap just because they didn’t win a prize.

Doing your best includes working with a goal in mind and with a clear understanding of the judging criteria, which also optimizes your chances of winning a sponsor prize. But judges make their decision based on the competition at that particular event. Moreover, you may end up in a team with people you didn’t know before, and your team might choose to pursue an idea in a field you are not familiar with (yet).

BiteIn team choosing a project at Accenture Digital Hackathon. From the right: Taavi Kivisik, Amine Rhord, Jedda Boyle, our mentor Nima Rahbari, Zhi Li and Paulina Siwak. (Photo courtesy of Marija Jankovic.)

#4 Prepare for the pitch, and practise!

One of the biggest mistakes teams can make in a hackathon is to underestimate the importance of an amazing pitch. In most cases, those two minutes are the only time the judges ever hear about your product (a technical check is done separately).

Hackathon organizer and pitch coach Brian Collins recommends that teams choose the pitcher early and start practising early. Also, at least the pitcher should get a good night’s sleep. This way, the pitcher knows in advance to start gathering punchlines, finding phrasing that carries the meaning seamlessly, and packaging it all in a unique manner. Three hours of pitch preparation has been the absolute minimum in my teams.

#5 Have a working prototype

There is a big difference between teams selling an idea, and teams that sell an idea with a working prototype. If your idea relies on translating parts of webpages, then demonstrate that you can do that and forget building the login screen. Get something ready, and then, don’t break it (or just use Git).

At IBC Hackfest, our Skipaclass team’s lead developer Itay Kinnrot categorically refused to make any changes to the code during the last hour before the deadline. Our prototype was working flawlessly and our pitcher could sell our future development plans on top of that solid basis. We won.

#6 Have fun

Hackathons are engaging, thrilling and intense. Most people even spend the night at the venue. It can quickly induce a state where the only thing you think of is your idea and your prototype. But, hackathons bring together an amazing bunch of people. Take time to learn more about your teammates, who they are as fellow human beings. Take time to talk to that mentor working in a company you would love to work for. One day, one of these mentors might offer you a job just like Stefan Hogendoorn (mentor at IBC Hackfest) offered me a job at Qlouder.

 

Participating in hackathons has been lots of fun and a great avenue for professional development. Just search for ‘hackathon’ and your current city to get hackathon experiences of your own.
PS! If you are still wondering why ‘magical’ and ‘7±2’, then click here.

Taavi Kivisik

Data scientist and developer at Qlouder. While at the University of Tartu and the University of Toronto, I was inspired to learn more about efficient learning and mnemonics. Midway through my studies I discovered my passion for research methodology and the technical side of research: statistics, programming, and machine learning. I volunteer as Lead Archivist for the Nordic Psychology Students’ Conference (NPSC). I’m a former President of the Estonian Psychology Students’ Association and a former Junior Editor at the Journal of European Psychology Students (JEPS). I sometimes tweet @tkivisik.


Open online education: Research findings and methodological challenges

With a reliable internet connection comes access to the enormous World Wide Web. Because the web is so large, we rely on tools like Google to search and filter all this information. Additional filters can be found in sites like Wikipedia, offering library-style access to curated knowledge, but it too is enormous. In recent years, open online courses have rapidly become a highly popular way of gaining easy access to curated, high-quality, pre-packaged knowledge. A particularly popular variety is the Massive Open Online Course, or MOOC, found on platforms like Coursera and edX. The promise – global and free access to high-quality education – has often been applauded. Some have heralded the age of the MOOC as the death of campus-based teaching. Others are more critical, often citing the high drop-out rates as a sign of failure, or argue that MOOCs do not or cannot foster ‘real’ learning (e.g., Zemsky, 2014; Pope, 2014).

For those who are not aware of the MOOC phenomenon I will first briefly introduce them. In the remainder of this post I will discuss how we can learn about open online courses, what the key challenges are, and how the field can move forward.

What’s all this buzz about?

John Daniel (2012) called MOOCs the official educational buzzword of 2012, and the New York Times called it the Year of the MOOC. However, the movement started before that, somewhere around 2001, when the Massachusetts Institute of Technology (MIT) launched its OpenCourseWare (OCW) initiative to share all its courses online. Individual teachers had been sharing digital content before that (e.g., ‘Open Educational Resources’ or OER; Lane & McAndrew, 2010), but the scale and quality of OCW were pioneering. Today, MOOCs can be found on various platforms, such as the ones described in Table 1 below.

Table 1. Overview of several major platforms offering MOOCs

Platform   Free content   Paid certifications   For profit
Coursera   Partial        Yes                   Yes
edX        Everything     Yes                   No
Udacity    Everything     Yes                   Yes
Udemy      Partial        Yes                   Yes
P2PU       Yes            No                    No

MOOCs, and open online courses in general, have the goal of making high-quality education available to everyone, everywhere. MOOC participants indeed come from all over the world, although participants from Western countries are still overrepresented (Nesterko et al., 2013). Nevertheless, there are numerous inspiring stories from students all over the world for whom taking one or more MOOCs has had dramatic effects on their lives. For example, Battushig Myanganbayar, a 15-year-old boy from Mongolia, took the Circuits and Electronics MOOC, a sophomore-level course from MIT. He was one of the 340 students out of 150,000 who obtained a perfect score, which led to his admission to MIT (New York Times, 2013).

Stories like these make it much clearer that MOOCs are not meant to replace contemporary forms of education but are an amazing addition to it. Why? Because books, radio, and the computer did not replace education either; they enhanced it. In some cases, such as Battushig’s, MOOCs provide a variety and quality of education that would otherwise not be accessible at all because of a lack of local higher education institutions. Open online courses provide a new source of high-quality education that is not accessible to just a few students in a lecture hall but has the potential to reach almost everyone who is interested. Will MOOCs replace higher education institutions? Maybe, or maybe not; I think this question misses the point of MOOCs.

In the remainder of this article I will focus on MOOCs from my perspective as a researcher. From this perspective, open online education is in some ways a new approach to education and should thus be investigated on its own. On the other hand, key learning mechanisms (e.g., information processing, knowledge integration, long-term memory consolidation) of human learners are independent of societal changes such as the use of new technologies (e.g., Merrill, Drake, Lacy, & Pratt, 1996). The science of educational instruction has a firm knowledge base and could be used to further our understanding of these generic learning mechanisms, which are inherent to humans.

What are MOOCs anyway?

The typical MOOC is a series of educational videos, often interconnected by other study materials such as texts, and regularly followed up by quizzes. Usually these MOOCs are divided into approximately 5 to 8 weeks of content. In Figure 1 you see an example of Week 1 from the course ‘Improving your statistical inferences’ by Daniel Lakens.

Figure 1. Example content of a single week in a MOOC

What do students do in a MOOC? To be honest, most do next to nothing. That is, most students who register for a course do not even access it, or do so only very briefly. However, the thousands of students per course who are active show a wide variety of learning paths and behaviors. See Figure 2 for an example of a single student in a single course. It shows how this particular student engages with the course very regularly, but the duration and intensity of each session differ substantially. Lectures (shown in green) are often watched in long sessions, while visits to the forum are much more frequent but shorter. At the bottom you see a surprising spurt of quiz activity, which might reflect the student’s desire to see what type of questions will be asked later in the course.

Figure 2. Activities of a single user in a single course. Source: Jasper Ginn

Of all the activities that are common to most MOOCs, educational videos are the most central to the student learning experience (Guo, Kim, & Rubin, 2014; Liu et al., 2013). The central position of educational videos is reflected in students’ behavior and their intentions: most students plan to watch all videos in a MOOC and also spend the majority of their time watching these videos (Campbell, Gibbs, Najafi, & Severinski, 2014; Seaton, Bergner, Chuang, Mitros, & Pritchard, 2014). The focus on videos does come with various consequences. Video production is typically expensive and time-intensive labor. In addition, videos are not easily translated into other languages, which runs counter to the aim of making the content accessible to students all around the world. Many MOOC participants are non-native English speakers, while the courses are almost exclusively presented in English. This raises the question to what extent non-native English speakers can benefit from these courses compared to native speakers. While open online education may be available to most, the content might not be as accessible to many, for example due to language barriers. It is important to design online education in such a way that it minimizes the detrimental effects of potential language barriers and thereby increases its accessibility for a wider audience. While subtitles are often provided, it is unclear whether they promote learning (Markham, Peter, & McCarthy, 2001), hamper learning (Kalyuga, Chandler, & Sweller, 1999), or have no impact at all (van der Zee et al., 2017).

How do we learn about (online) learning?

Research on online learning, and MOOCs in particular, is a highly interdisciplinary field in which many perspectives are combined. While research on higher education is typically done primarily by educational scientists, MOOCs are also studied in fields such as computer science and machine learning. This has resulted in an interesting divide in the literature, as researchers from some disciplines are used to publishing only in journals (e.g., Computers & Education, Distance Education, International Journal of Computer-Supported Collaborative Learning), while other disciplines focus primarily on conference proceedings (e.g., Learning @ Scale, eMOOCs, Learning Analytics and Knowledge).

Learning at scale opens up a new frontier to learn about learning. MOOCs and similar large-scale online learning platforms give an unprecedented view of learners’ behavior, and potentially, learning. In online learning research, the setting in which the data is measured is not just an approximation of, but equals the world under examination, or at least comes very close to it. That is, measures of students’ behavior do not need to rely on self-reports, but can often be directly derived from log data (e.g., automated measurements of all activities inside an online environment). While this type of research has its advantages, it also comes with various risks and challenges, which I will attempt to outline.

Big data, meaningless data

Research on MOOCs is blessed and cursed with a wide variety of data. For example, it is possible to track every user’s mouse clicks. We also have detailed information about page views, forum data (posts, likes, reads), clickstream data, and interactions with videos. This is all very interesting, except that nobody really knows what it means if a student has clicked two times instead of three. Nevertheless, the number of mouse clicks is a strong predictor of ‘study success’: students who click more are more likely to finish the course, and they do so with higher grades. As can be seen in Figure 3, the correlations between various mouse-click metrics and grade range from 0.50 to 0.65. However, it would be absurd to recommend that students click more and believe that this will increase their grades. Mouse clicks, in isolation, are inherently ambiguous, if not outright meaningless.

Figure 3. Pairwise Spearman rank correlations between various metrics for all clickers (upper triangle, N = 108008) and certificate earners (lower triangle, N = 7157), from DeBoer, Ho, Stump and Breslow (2014)
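As an aside, once the log data have been aggregated per student, computing such a correlation matrix is straightforward in R. The sketch below uses simulated data purely to show the mechanics; the column names are invented and the resulting values have nothing to do with those reported in Figure 3.

# Pairwise Spearman correlations between (hypothetical) per-student log metrics
set.seed(1)
logs <- data.frame(
  n_clicks = rpois(1000, lambda = 200),   # total mouse clicks
  n_videos = rpois(1000, lambda = 30),    # videos opened
  n_posts  = rpois(1000, lambda = 3),     # forum posts
  grade    = runif(1000, 0, 100)          # final grade
)

round(cor(logs, method = "spearman"), 2)  # Spearman rank correlation matrix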

When there is smoke, but no fire

With mouse clicks, it is obvious that this is a problem, and it will be recognized by many. However, the same problem can secretly underlie many other measured variables where it is not so easily recognized. For example, how can we interpret the finding that some students watch a video longer than other students? Findings like this are readily interpreted as meaningful, for example as signifying that these students were more ‘engaged’, while you could just as well argue that they got distracted, were bored, and so on. A classical reasoning fallacy often underlies these arguments. Because it is reasonable to state that increased engagement will lead to longer video dwelling times, observing the latter is (incorrectly!) assumed to signify the former. In other words: if A leads to B, observing B does not allow you to conclude A. As there are many plausible explanations for differences in video dwelling times, observing such differences cannot be directly interpreted without additional data.

This is an inherent problem with many types of big data: you have an enormous amount of granular data which often cannot be directly interpreted. For example, Guo et al. (2014) state that shorter videos and certain video production styles are “much more engaging” than their alternatives. While enormous amounts of data were used, it was in essence a correlational study, so the claims about which video types are better rest on observational data that do not allow causal inference. More students stop watching a longer video than a shorter one, which is interpreted as meaning that the shorter videos are more engaging. While this might certainly be true, it is difficult to make such claims when confounding variables have not been accounted for. For example, shorter and longer videos do not differ just in length but might also differ in complexity, and the complexity of online educational videos is also strongly correlated with video dwelling time (Van der Sluis, Ginn, & Van der Zee, 2016). More importantly, the same authors showed that the relationship between a video’s complexity (insofar as it can be measured) and dwelling time appears to be non-linear, as shown in Figure 4. Non-linear relationships between variables that are typically measured observationally should make us very cautious about making confident claims. For example, in Figure 4 a relative dwelling time of 4 can be found both for an information rate of ~0.2 (below-average complexity) and for one of ~1.7 (above-average complexity). In other words, if all you know is the dwelling time, the non-linear relationship means you cannot draw any conclusions about the complexity.

Figure 4. The non-linear relationship between dwelling time and information rate (as a measure of complexity per second). Adapted from Van der Sluis, Ginn, & Van der Zee (2016).

Ghost relationships

Big data and education is a powerful but dangerous combination. No matter the size of your data set or the variety of variables, correlational data remain incredibly treacherous to interpret, especially when the data are granular and lack a 1-to-1 mapping to relevant behavior or cognitive constructs. Given that education is inherently about causality (that is, it aims to change learners’ behavior and/or knowledge), research on online learning should employ a wide array of study methodologies so as to properly gather the type of evidence required to make claims about causality. It does not require gigabytes of event log data to establish that there is some relationship between students’ video-watching behavior and quiz results. It does require proper experimental designs to establish causal relationships and the effectiveness of interventions and course design. For example, Kovacs (2016) found that students watch videos with in-video questions more often and are less likely to stop watching these videos prematurely. While this provides some evidence for the benefits of in-video questions, it was a correlational study comparing videos with and without in-video questions. There might have been other relevant differences between the videos beyond the presence of in-video questions. For example, it is reasonable to assume that teachers do not randomly select which videos will have in-video questions, but choose to add questions to the more difficult videos. Should this be the case, a correlational study comparing different videos with and without in-video questions might be confounded by other factors, such as the complexity of the video content, to the extent that the true relationship might be the opposite of what the correlational study finds. Such correlational relationships can be ‘ghost relationships’: they appear real at first sight but have no bearing on reality.

The way forward

The granularity of the data, and the various ways in which they can be interpreted, challenge the validity and generalizability of this type of research. With sufficiently large sample sizes, numbers of variables, and researchers’ degrees of freedom, you are guaranteed to find ‘potentially’ interesting relationships in these datasets. A key development in this area (and in science in general) is pre-registering the research methodology before a study is performed, in order to decrease ‘noise mining’ and increase the overall veracity of the literature. For more on the reasoning behind pre-registration, see also the JEPS Bulletin three-part series on the topic, starting with Dablander (2016). The Learning @ Scale conference, which is already at the center of research on online learning, is becoming a key player in this movement, as it explicitly recommends the use of pre-registered protocols for papers submitted to the 2017 conference.

A/B Testing

Experimental designs (often called “A/B tests” in this literature) are increasingly common in research on online learning, but they too are not without dangers and need to be carefully crafted (Reich, 2015). Data in open online education are not only different due to their scale; they also require reconceptualization. There are new measures, such as the highly granular measurements described above, as well as existing educational variables that require different interpretations (DeBoer, Ho, Stump, & Breslow, 2014). For example, in traditional higher education it would be considered dramatic if over 90% of the students did not finish a course, but this normative interpretation of drop-out rates cannot be uncritically applied to the context of open online education. While registration barriers are substantial in higher education, they are practically nonexistent in MOOCs. In effect, there is no filter that pre-selects the highly motivated students, resulting in many students who just want to take a peek and then stop participating. Secondly, in traditional education dropping out is interpreted as a loss for both the student and the institution. Again, this interpretation does not transfer to the context of MOOCs, as students who drop out after watching only some videos might have successfully completed their personal learning goals.

Rebooting MOOC research

The next generation of MOOC research needs to adopt a wider range of research designs with greater attention to the causal factors promoting student learning (Reich, 2015). To advance our understanding, it becomes essential to complement granular (big) data with other sources of information in an attempt to triangulate its meaning. Triangulation can be done in various ways, from multiple proxy measurements of the same latent construct within a single study to repeated measurements across separate studies. A good example of triangulation in research on online learning is combining granular log data (such as video dwelling time), student output (such as essays), and subjective measures (such as self-reported behavior) in order to triangulate students’ behavior. Secondly, these models themselves require validation through repeated application across courses and populations. Convergence between these different (types of) measurements strengthens singular interpretations of (granular) data and is often a necessary exercise. Inherent to triangulation is increasing the variety within and between datasets, such that they become richer in meaning and usable for generalizable statements.

Replications (both direct and conceptual) are fundamental for this effort. I would like to end with this quote from Justin Reich (2015), which reads: “These challenges cannot be addressed solely by individual researchers. Improving MOOC research will require collective action from universities, funding agencies, journal editors, conference organizers, and course developers. At many universities that produce MOOCs, there are more faculty eager to teach courses than there are resources to support course production. Universities should prioritize courses that will be designed from the outset to address fundamental questions about teaching and learning in a field. Journal editors and conference organizers should prioritize publication of work conducted jointly across institutions, examining learning outcomes rather than engagement outcomes, and favoring design research and experimental designs over post hoc analyses. Funding agencies should share these priorities, while supporting initiatives—such as new technologies and policies for data sharing—that have potential to transform open science in education and beyond.”

Further reading

Here are three recommended readings related to this topic:

  1. Reich, J. (2015). Rebooting MOOC research. Science, 347(6217), 34-35.
  2. DeBoer, J., Ho, A. D., Stump, G. S., & Breslow, L. (2014). Changing “course”: Reconceptualizing educational variables for massive open online courses. Educational Researcher.
  3. Daniel, J. (2012). Making Sense of MOOCs: Musings in a Maze of Myth, Paradox and Possibility. Journal of Interactive Media in Education, 2012(3), Art. 18.

References

Butler, A. C., & Roediger III, H. L. (2007). Testing improves long-term retention in a simulated classroom setting. European Journal of Cognitive Psychology, 19(4-5), 514-527.

Campbell, J., Gibbs, A. L., Najafi, H., & Severinski, C. (2014). A comparison of learner intent and behaviour in live and archived MOOCs. The International Review of Research in Open and Distributed Learning, 15(5).

Cepeda, N. J., Coburn, N., Rohrer, D., Wixted, J. T., Mozer, M. C., & Pashler, H. (2009). Optimizing distributed practice: Theoretical analysis and practical implications. Experimental Psychology, 56(4), 236-246.

Daniel, J. (2012). Making sense of MOOCs: Musings in a maze of myth, paradox and possibility. Journal of Interactive Media in Education, 2012(3), Art. 18.

DeBoer, J., Ho, A. D., Stump, G. S., & Breslow, L. (2014). Changing “course”: Reconceptualizing educational variables for massive open online courses. Educational Researcher.

Dunlosky, J., Rawson, K. A., Marsh, E. J., Nathan, M. J., & Willingham, D. T. (2013). Improving students’ learning with effective learning techniques: Promising directions from cognitive and educational psychology. Psychological Science in the Public Interest, 14(1), 4-58.

Guo, P. J., Kim, J., & Rubin, R. (2014, March). How video production affects student engagement: An empirical study of MOOC videos. In Proceedings of the First ACM Conference on Learning @ Scale (pp. 41-50). ACM.

Johnson, C. I., & Mayer, R. E. (2009). A testing effect with multimedia learning. Journal of Educational Psychology, 101(3), 621.

Kalyuga, S., Chandler, P., & Sweller, J. (1999). Managing split-attention and redundancy in multimedia instruction. Applied Cognitive Psychology, 13(4), 351-371.

Karpicke, J. D., & Roediger, H. L. (2008). The critical importance of retrieval for learning. Science, 319(5865), 966-968.

Konstan, J. A., Walker, J. D., Brooks, D. C., Brown, K., & Ekstrand, M. D. (2015). Teaching recommender systems at large scale: Evaluation and lessons learned from a hybrid MOOC. ACM Transactions on Computer-Human Interaction (TOCHI), 22(2), 10.

Lane, A., & McAndrew, P. (2010). Are open educational resources systematic or systemic change agents for teaching practice? British Journal of Educational Technology, 41(6), 952-962.

Liu, Y., Liu, M., Kang, J., Cao, M., Lim, M., Ko, Y., … & Lin, J. (2013, October). Educational paradigm shift in the 21st century e-learning. In E-Learn: World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education (Vol. 2013, No. 1, pp. 373-379).

Markham, P., Peter, L. A., & McCarthy, T. J. (2001). The effects of native language vs. target language captions on foreign language students’ DVD video comprehension. Foreign Language Annals, 34(5), 439-445.

Mayer, R. E. (2003). The promise of multimedia learning: Using the same instructional design methods across different media. Learning and Instruction, 13(2), 125-139.

Mayer, R. E., Mathias, A., & Wetzell, K. (2002). Fostering understanding of multimedia messages through pre-training: Evidence for a two-stage theory of mental model construction. Journal of Experimental Psychology: Applied, 8(3), 147.

Merrill, M. D., Drake, L., Lacy, M. J., Pratt, J., & ID2 Research Group. (1996). Reclaiming instructional design. Educational Technology, 36(5), 5-7.

Nesterko, S. O., Dotsenko, S., Han, Q., Seaton, D., Reich, J., Chuang, I., & Ho, A. D. (2013, December). Evaluating the geographic data in MOOCs. In Neural Information Processing Systems.

Ozcelik, E., Arslan-Ari, I., & Cagiltay, K. (2010). Why does signaling enhance multimedia learning? Evidence from eye movements. Computers in Human Behavior, 26(1), 110-117.

Plant, E. A., Ericsson, K. A., Hill, L., & Asberg, K. (2005). Why study time does not predict grade point average across college students: Implications of deliberate practice for academic performance. Contemporary Educational Psychology, 30(1), 96-116.

Pope, J. (2015). What are MOOCs good for? Technology Review, 118(1), 69-71.

Reich, J. (2015). Rebooting MOOC research. Science, 347(6217), 34-35.

Roediger, H. L., & Butler, A. C. (2011). The critical role of retrieval practice in long-term retention. Trends in Cognitive Sciences, 15(1), 20-27.

Seaton, D. T., Bergner, Y., Chuang, I., Mitros, P., & Pritchard, D. E. (2014). Who does what in a massive open online course? Communications of the ACM, 57(4), 58-65.

Van der Sluis, F., Ginn, J., & Van der Zee, T. (2016, April). Explaining student behavior at scale: The influence of video complexity on student dwelling time. In Proceedings of the Third (2016) ACM Conference on Learning @ Scale (pp. 51-60). ACM.

Van der Zee, T., Admiraal, W., Paas, F., Saab, N., & Giesbers, B. (2017). Effects of subtitles, complexity, and language proficiency on learning from online education videos. Journal of Media Psychology, in press. Pre-print available at https://osf.io/n6zuf/.

Zemsky, R. (2014). With a MOOC MOOC here and a MOOC MOOC there, here a MOOC, there a MOOC, everywhere a MOOC MOOC. The Journal of General Education, 63(4), 237-243.

Tim van der Zee

Skeptical scientist. I study how people learn from educational videos in open online courses, and how we can help them learn better. PhD student at Leiden University (the Netherlands), but currently a visiting scholar at MIT and UMass Lowell. You can follow me on Twitter: @Research_Tim and read my blog at www.timvanderzee.com


Introduction to Data Analysis using R

R Logo

R is a statistical programming language whose popularity is quickly overtaking that of SPSS and other “traditional” point-and-click software packages (Muenchen, 2015). But why would anyone use a programming language, instead of a point-and-click application, for data analysis? An important reason is that data analysis rarely consists of simply running a statistical test. Instead, many small steps, such as cleaning and visualizing data, are usually repeated many times, and computers are much faster at doing repetitive tasks than humans are. Using a point-and-click interface for these “data cleaning” operations is laborious and unnecessarily slow:

“[T]he process of tidying my data took me around 10 minutes per participant as I would do it all manually through Excel. Even for a moderate sample size, this starts to take up a large chunk of time that could be spent doing other things like writing or having a beer” (Bartlett, 2016).

A programmed analysis would seamlessly apply the tidying steps to every participant in the blink of an eye, and would itself constitute an exact script of what operations were applied to the data, making it easier to repeat the steps later.

Learning to use a programming language for data analysis reduces human labor and saves time that could be better spent doing more important (or fun) things. In this post, I introduce the R programming language, and motivate its use in Psychological science. The introduction is aimed toward students and researchers with no programming experience, but is suitable for anyone with an interest in learning the basics of R.

The R project for statistical computing

“R is a free software environment for statistical computing and graphics.” (R Core Team, 2016)

Great, but what does that mean? R is a programming language that is designed and used mainly in the statistics, data science, and scientific communities. R has “become the de-facto standard for writing statistical software among statisticians and has made substantial inroads in the social and behavioural sciences” (Fox, 2010). This means that if we use R, we’ll be in good company (and that company will likely be even better and more numerous in the future; see Muenchen, 2015).

To understand what R is, and is not, it may be helpful to begin by contrasting R to its most common alternative, SPSS. Many psychologists are familiar with SPSS, which has a graphical user interface (GUI), allowing the user to look at the two-dimensional data table on screen, and click through various drop-down menus to conduct analyses on the data. In contrast, R is an object oriented programming language. Data is loaded into R as a “variable”, meaning that in order to view it, the user has to print it on the screen. The power of this approach is that the data is an object in a programming environment, and only your imagination limits what functions you can apply to the data. R also has no GUI to navigate with the mouse; instead, users interact with the data by typing commands.

SPSS is expensive to use: Universities have to pay real money to make it available to students and researchers. R and its supporting applications, on the other hand, are completely free—meaning that both users and developers have easier access to it. Because R is open source, many cutting-edge statistical methods are implemented in R much sooner than in SPSS. This is apparent, for example, in the recent uprising of Bayesian methods for data analysis (e.g., Buerkner, 2016).

Further, SPSS’s facilities for cleaning, organizing, formatting, and transforming data are limited—and not very user friendly, although this is a subjective judgment—so users often resort to a spreadsheet program (Microsoft Excel, say) for data manipulation. R has excellent capabilities for every step in the analysis pipeline, including data manipulation, so the analysis never has to spread across multiple applications. You can imagine how the possibility for mistakes, and the time needed, is reduced when the data files don’t need to be juggled between applications. Switching between applications, and repeatedly clicking through drop-down menus, means that for any small change the human using the computer must re-do every step of the analysis. With R, you can simply re-use your analysis script and just import different data to it.


Figure 1. Two workflows for statistical discovery in the empirical sciences. “Analysis” consists of multiple operations, and is spread over multiple applications in Workflow 2, but not in Workflow 1. Therefore, “analysis” is more easily documented and repeated in Workflow 1. This fact alone may work to reduce mistakes in data analysis. The dashed line from R to Word Processor indicates an optional step: You can even write manuscripts with RStudio, going directly from R to Communicating Results.

These considerations lead to contrasting the two different workflows in Figure 1. Workflow 1 uses a programming language, such as R. It is harder to learn at first, but beginners generally get started with real analyses in an hour or so. The payoff for the initial difficulty is great: The workflow is reproducible (users can save scripts and show their friends exactly what they did to create those beautiful violin plots); the workflow is flexible (want to do everything just the way you did it, but do the plots for males instead of females? Easy!); and most importantly, repetitive, boring, but important work is delegated to a computer.

The final point requires some reflection; after all, computer programs all work on computers, so it sounds like a tautology. What I mean is that repetitive tasks can be wrapped in a simple function (these are usually already available—you don’t have to create your own functions), which then performs the tasks as many times as you like. Many tasks in the data cleaning stage, for example, are fairly boring and repetitive (calculating summary statistics, aggregating data, combining spreadsheets or columns across spreadsheets), but much less so when one uses a programming language.
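As a toy illustration (the function name and the data below are made up for this post, not taken from the original), a repetitive per-participant summary can be wrapped in a function and applied to everyone at once:

  # Hypothetical example: one summary function, applied to every participant
  summarise_participant <- function(x) c(mean = mean(x), sd = sd(x), n = length(x))
  rts <- list(p1 = c(512, 430, 523), p2 = c(611, 598, 702))  # made-up reaction times
  sapply(rts, summarise_participant)  # one summary column per participant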

Workflow 2, on the other hand, is easy to learn because it has few well-defined and systematic parts—everything is improvised on a task-by-task basis and done manually by copy-pasting, pointing-and-clicking, and dragging-and-dropping. “Clean and organize” the data in Excel. “Analyze” in SPSS. In the optimal case, where the data is perfectly aligned with the format that SPSS expects, you can get a p-value in less than a minute (excluding SPSS start-up time, which is quickly approaching infinity) by clicking through the drop-down menus. That is truly great, if that is all you want. But that’s rarely all we want, and data is rarely in SPSS’s required format.

Workflow 2 is not reproducible (that is, it may be very difficult if not impossible to exactly retrace your steps through an analysis), so although you may know roughly that you “did an ANOVA”, you may not remember which cases were included, what data was used, how it was transformed, etc. Workflow 2 is not flexible: You’ve just done a statistical test on data from Experiment 1? Great! Can you now do it for Experiment 2, but log-transform the RTs? Sure, but then you would have to restart from the Excel step, and redo all that pointing and clicking. This leads to Workflow 2 requiring the human to do too much work, and spend time on the analysis that could be better spent “doing other things like writing or having a beer” (Bartlett, 2016).

So, what is R? It is a programming language especially suited for data analysis. It allows you to program (more on this below!) your analyses instead of pointing and clicking through menus. The point here is not that you can’t do analysis with a point-and-click SPSS style software package. You can, and you can do a pretty damn good job with it. The point is that you can work less and be more productive if you’re willing to spend some initial time and effort learning Workflow 1 instead of the common Workflow 2. And that requires getting started with R.

Getting started with R: From 0 to R in 100 seconds

If you haven’t already, go ahead and download R, and start it up on your computer. Like most programming languages, R is best understood through its console—the interface that lets you interact with the language.

Figure 2. The R console.

After opening R, you should see a similar window on your computer. The console allows us to type input, have R evaluate it, and return output. Just like a fancy calculator. Here, our first input assigned (R uses the left arrow, <-, for assignment) all the integers from 0 to 100 to a variable called numbers. Computer code can often be read from right to left; the first line here would read “integers 0 through 100, assign to numbers”. We then calculated the mean of those numbers by using R’s built-in function, mean(). Everything interesting in R is done by using functions: There are functions for drawing figures, transforming data, running statistical tests, and much, much more.
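In code, the two commands described above look something like this (a minimal sketch of what the console screenshot shows):

  numbers <- 0:100   # assign the integers 0 through 100 to 'numbers'
  mean(numbers)      # compute their mean with the built-in mean() function
  #> [1] 50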

Here’s another example: this time we’ll create some height data for kids and adults (in centimeters) and conduct a two-sample t-test (every line that begins with “#>” is R’s output):
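The original code chunk isn’t reproduced here; the following sketch matches the description (the kids’ heights appear later in the post, while the adults’ values are made up for illustration):

  kids   <- c(100, 98, 89, 111, 101)    # heights in centimeters
  adults <- c(170, 172, 181, 176, 179)  # illustrative values, not from the original post
  t.test(kids, adults)                  # Welch two-sample t-test
  #> (the t-test output is printed here)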

That’s it, a t-test in R in a hundred seconds! Note, c() stands for “combine”, so kids is now a numeric vector (collection of numbers) with 5 elements. The t-test results are printed in R’s console, and are straightforward to interpret.

Save your analysis scripts

At its most basic, data analysis in R consists of importing data to R, and then running functions to visualize and model the data. R has powerful functions for covering the entire process going from Raw Data to Communicating Results (or Word Processor) in Figure 1. That is, users don’t need to switch between applications at various steps of the analysis workflow. Users simply type in code, let R evaluate it, and receive output. As you can imagine, a full analysis from raw data to a report (or table of summary statistics, or whatever your goal is) may involve lots of small steps—transforming variables in the data, plotting, calculating summaries, modeling and testing—which are often done iteratively. Recognizing that there may be many steps involved, we realize that we better save our work so that we can investigate and redo it later, if needed. Therefore for each analysis, we should create a text file containing all those steps, which could then be run repeatedly with minor tweaks, if required.

To create these text files, or “R scripts”, we need a text editor. All computers have a text editor pre-installed, but programming is often easier if you use an integrated development environment (IDE), which has a text editor and console all in one place (often with additional capabilities). The best IDE for R, by far, is RStudio. Go ahead and download RStudio, and then start it. At this point you can close the other R console on your computer, because RStudio has the console available for you.

Getting started with RStudio


Figure 3. The RStudio IDE to R.

Figure 3 shows the main view of RStudio. There are four rectangular panels, each with a different purpose. The bottom left panel is the R console. We can type input in the console (on the empty line that begins with a “>”) and hit return to execute the code and obtain output. But a more efficient approach is to type the code into a script file, using the text editor panel, known as the source panel, in the top left corner. Here, we have a t-test-kids-grownups.R script open, which consists of three lines of code. You can write this script on your own computer by going to File -> New File -> R Script in RStudio, and then typing in the code you see in Figure 3. You can execute each line by hitting Control + Return, on Windows computers, or Command + Return on OS X computers. Scripts like this constitute the exact documentation of what you did in your analysis, and as you can imagine, are pretty important.

The two other panels are for viewing things, not so much for interacting with the data. Top right is the Environment panel, showing the variables that you have saved in R. That is, when you assign something into a variable (kids <- c(100, 98, 89, 111, 101)), that variable (kids) is visible in the Environment panel, along with its type (num for numeric), size (1:5, for 5), and contents (100, 98, 89, 111, 101). Finally, bottom right is the Viewer panel, where we can view plots, browse files on the computer, and do various other things.

With this knowledge in mind, let’s begin with a couple easy things. Don’t worry, we’ll get to actual data soon enough, once we have the absolute basics covered. I’ll show some code and evaluate it in R to show its output too. You can, and should, type in the commands yourself to help you understand what they do (type each line in an R script and execute the line by pressing Cmd + Enter. Save your work every now and then.)

Here’s how to create variables in R (try to figure out what’s saved in each variable):
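For example (these particular assignments are illustrative; they are not the original post’s exact examples):

  age    <- 28                          # a single number
  name   <- "Tim"                       # a character string
  kids   <- c(100, 98, 89, 111, 101)    # a numeric vector of five heights (cm)
  adults <- c(170, 172, 181, 176, 179)  # another numeric vector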

And here’s how to print those variables’ contents on the screen. (I’ll provide a comment for each line; comments begin with a # and are not evaluated by R. That is, comments are read by humans only.)
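A sketch, continuing the illustrative variables above:

  age   # prints 28
  name  # prints "Tim"
  kids  # prints the five heights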

Transforming data is easy: R automatically applies operations to entire vectors (variables containing multiple numbers) when needed. Let’s create z-scores of the kids’ heights.
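One way to do this (the name kids_z is my choice, not necessarily the original post’s):

  # Subtract the mean from every element, then divide every element by the SD
  kids_z <- (kids - mean(kids)) / sd(kids)
  kids_z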

I hope you followed along. You should now have a bunch of variables in your R Environment. If you typed all those lines into an R script, you can now execute them again, or modify them and then re-run the script, line-by-line. You can also execute the whole script at once by clicking “Run”, at the top of the screen. Congratulations, you’ve just programmed your first computer program!

User contributed packages

One of the best things about R is that it has a large user base and lots of user-contributed packages, which make using R easier. Packages are simply bundles of functions, and will enhance your R experience quite a bit. Whatever you want to do, there’s probably an R package for that. Here, we will install and load (make available in the current session) the tidyverse package (Wickham, 2016), which is designed to make tidying and manipulating data easier.
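Installing and loading looks like this (install.packages() downloads the package once; library() loads it at the start of each session):

  install.packages("tidyverse")  # do this once
  library(tidyverse)             # do this in every new session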

It’s important that you use the tidyverse package if you want to follow along with this tutorial. All of the tasks covered here are possible without it, but the functions from tidyverse make the tasks easier, and certainly easier to learn.

Using R with data

Let’s import some data to R. We’ll use example data from Chapter 4 of the Intensive Longitudinal Methods book (Bolger & Laurenceau, 2013). The data set is freely available on the book’s website. If you would like to follow along, please download the data set and place it in a folder (unpack the .zip file). Then, use RStudio’s Viewer panel, and its Files tab, to navigate to the directory on your computer that has the data set, and set it as the working directory by clicking “More”, then “Set As Working Directory”.


Figure 4. Setting the Working Directory

Setting the working directory properly is extremely important, because it is how R knows where to look for files on your computer. If you try to load files that are not in the working directory, you need to use the full path to the file; but if your working directory is properly set, you can just use the filename. The file is called “time.csv”, and we load it into a variable called d using the read_csv() function. (csv stands for comma-separated values, a common plain-text format for storing data.) You’ll want to type all these functions into an R script, so create a new R script and make sure you are typing the commands in the Source panel, not the Console panel. If you set your working directory correctly, when you save the R script file it will be saved in that directory, right next to the “time.csv” file.
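The import step described above, as a minimal sketch:

  # Read the comma-separated file into a variable called d
  d <- read_csv("time.csv")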

d is now a data frame (sometimes called a “tibble”, because why not), whose rows are observations, and columns the variables associated with those observations.

This data contains simulated daily intimacy reports of 50 individuals, who reported their intimacy every evening, for 16 days. Half of these simulated participants were in a treatment group, and the other half in a control group. To print the first few rows of the data frame to screen, simply type its name:
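Something like the following appears (output abbreviated; the column names are the ones described below):

  d
  #> # A tibble: 800 x 5
  #>      id  time time01 intimacy treatment
  #> ...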

The first column, id, specifies the id number of the person that observation belongs to. int means that the data in this column are integers. time indicates the day of the observation, and the authors coded the first day as 0 (this will make intercepts in regression models easier to interpret). time01 is just time, but recoded so that 1 is at the end of the study. dbl means that the values are floating point numbers. intimacy is the reported intimacy, and treatment indicates whether the person was in the control (0) or treatment (1) group. The first row of this output also tells us that there are 800 rows in total in this data set, and 5 variables (columns). Each row is also numbered in the output (the leftmost “column”), but those values are not in the data.

Data types

It’s important to verify that your variables (columns) are imported into R in the appropriate format. For example, you would not like to import time recorded in days as a character vector, nor would you like to import a character vector (country names, for example) as a numeric variable. Almost always, R (more specifically, read_csv()) automatically uses correct formats, which you can verify by looking at the row between the column names and the values.

There are five basic data types: int for integers, num (or dbl) for floating point numbers (1.12345…), chr for characters (also known as “strings”), factor (sometimes abbreviated as fctr) for categorical variables that have character labels (factors can be ordered if required), and logical (abbreviated as logi) for logical variables: TRUE or FALSE. Here’s a little data frame that illustrates the basic variable types in action:
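A sketch of such a data frame (the values and column names here are made up, apart from the matlab column with a missing value, which echoes the next paragraph):

  types_example <- tibble(
    name         = c("Alice", "Bob"),                  # chr: character strings
    age          = c(31L, 44L),                        # int: integers
    height       = c(1.68, 1.81),                      # dbl: floating point numbers
    group        = factor(c("control", "treatment")),  # fctr: labelled categories
    likes_r      = c(TRUE, TRUE),                      # logi: TRUE or FALSE
    likes_matlab = c(FALSE, NA)                        # NA marks a missing value
  )
  types_example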

Here we are also introduced to a very special value, NA. NA means that there is no value, and we should always pay special attention to data that has NAs, because it may indicate that some important data is missing. This sample data explicitly tells us that we don’t know whether this person likes matlab or not, because the variable is NA. OK, let’s get back to the daily intimacy reports data.

Quick overview of data

We can now use the variables in the data frame d and compute summaries just as we did above with the kids’ and adults’ heights. A useful operation might be to ask for a quick summary of each variable (column) in the data set:
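One way to do this is base R’s summary() function:

  summary(d)  # a quick summary of every column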

To get a single variable (column) from the data frame, we call it with the $ operator (“gimme”, for asking R to give you variables from within a data frame). To get all the intimacy values, we could just call d$intimacy. But we’d better not, because that would print all 800 intimacy values into the console. We can pass those values to functions instead:
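For instance (mean() and sd() are illustrative choices of functions here):

  mean(d$intimacy)  # average intimacy across all 800 observations
  sd(d$intimacy)    # and its standard deviation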

If you would like to see the first six values of a variable, you can use the head() function:
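For example:

  head(d$intimacy)  # the first six intimacy values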

head() works on data frames as well, and you can use an optional number argument to specify how many first values you’d like to see returned:
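For example, to see only the first two rows (this is the head(d, 2) call referred to below):

  head(d, 2)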

A look at R’s functions

Generally, this is how R functions work: you name the function and specify arguments to it inside the parentheses. Some of these arguments may be data or other input (d, above), and some of them change what the function does and how (2, above). To find out what arguments you can give to a function, you can just type the function’s name in the console with a question mark prepended to it:
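For example:

  ?head  # opens the help page for head()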

Importantly, calling the help page reveals that functions’ arguments are named. That is, arguments are of the form X = Y, where X is the name of the argument, and Y is the value you would like to set it to. If you look at the help page of head() (?head), you’ll see that it takes two arguments, x which should be an object (like our data frame d, (if you don’t know what “object” means in this context, don’t worry—nobody does)), and n, which is the number of elements you’d like to see returned. You don’t always have to type in the X = Y part for every argument, because R can match the arguments based on their position (whether they are the first, second, etc. argument in the parentheses). We can confirm this by typing out the full form of the previous call head(d, 2), but this time, naming the arguments:
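That is:

  head(x = d, n = 2)  # identical to head(d, 2)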

Now that you know how R’s functions work, you can find out how to do almost anything by typing into a search engine: “How to do almost anything in R”. The internet (and books, of course) is full of helpful tutorials (see Resources section, below) but you will need to know these basics about functions in order to follow those tutorials.

Creating new variables

Creating new variables is also easy. Let’s create a new variable that is the square root of the reported intimacy (because why not), by using the sqrt() function and assigning the values to a new variable (column) within our data frame:
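A sketch of this step (sqrt_int is the column name referred to later in the post):

  d$sqrt_int <- sqrt(d$intimacy)  # new column: square root of each intimacy value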

Recall that sqrt(d$intimacy) takes the square root of each of the 800 values in the intimacy vector, and returns a vector of 800 square roots. There’s no need to do this individually for each value.

We can also create variables using conditional logic, which is useful for creating verbal labels for numeric variables, for example. Let’s create a verbal label for each of the treatment groups:
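One way to do this is with ifelse(), although the original post may have used a different helper:

  # Group is "Control" where treatment is 0, and "Treatment" otherwise
  d$Group <- ifelse(d$treatment == 0, "Control", "Treatment")
  d  # print the updated data frame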

We created a new variable, Group in d, that is “Control” if the treatment variable on that row is 0, and “Treatment” otherwise.

Remember our discussion of data types above? d now contains integer, double, and character variables. Make sure you can identify these in the output, above.

Aggregating

Let’s focus on aggregating the data across individuals, and plotting the average time trends of intimacy, for the treatment and control groups.

In R, aggregating is easiest if you think of it as calculating summaries for “groups” in the data (and collapsing the data across other variables). “Groups” doesn’t refer to experimental groups (although it can), but instead any arbitrary groupings of your data based on variables in it, so the groups can be based on multiple things, like time points and individuals, or time points and experimental groups.

Here, our groups are the two treatment groups and the 16 time points, and we would like to obtain the mean for each group at each time point by collapsing across individuals:
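A sketch of this step with dplyr’s group_by() and summarize(); the name d_groups is the one used for this summary below:

  d_groups <- d %>%
    group_by(Group, time) %>%
    summarize(intimacy = mean(intimacy))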

The above code summarized our data frame d by calculating the mean intimacy for the groups specified by group_by(). We did this by first creating a data frame that is d, but is grouped on Group and time, and then summarizing those groups by taking the mean intimacy for each of them. This is what we got:
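Printing d_groups shows something like this (output abbreviated; two groups times 16 time points gives 32 rows):

  d_groups
  #> # A tibble: 32 x 3
  #>    Group  time intimacy
  #> ...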

A mean intimacy value for both groups, at each time point.

Plotting

We can now easily plot these data, for each individual, and each group. Let’s begin by plotting just the treatment and control groups’ mean intimacy ratings:
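A sketch of the plotting call described below:

  ggplot(d_groups, aes(x = time, y = intimacy, color = Group)) +
    geom_line()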


Figure 5. Example R plot, created with ggplot(), of two groups’ mean intimacy ratings across time.

For this plot, we used the ggplot() function, which takes as input a data frame (we used d_groups from above), and a set of aesthetic specifications (aes(), we mapped time to the x axis, intimacy to the y axis, and color to the different treatment Groups in the data). We then added a geometric object to display these data (geom_line() for a line.)

To illustrate how to add other geometric objects to display the data, let’s add some points to the graph:
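That is, the same call with geom_point() appended:

  ggplot(d_groups, aes(x = time, y = intimacy, color = Group)) +
    geom_line() +
    geom_point()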


Figure 6. Two groups’ mean intimacy ratings across time, with points.

We can easily do the same plot for every individual (a panel plot, but let’s drop the points for now):
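A sketch, using the raw data d and facet_wrap() as described below:

  ggplot(d, aes(x = time, y = intimacy, color = Group)) +
    geom_line() +
    facet_wrap(~id)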


Figure 7. Two groups’ mean intimacy ratings across time, plotted separately for each person.

The code is exactly the same, but now we used the non-aggregated raw data d, and added an extra function that wraps each id’s data into their own little subplot (facet_wrap(); remember, if you don’t know what a function does, look at the help page, i.e. ?facet_wrap). ggplot() is an extremely powerful function that allows you to do very complex and informative graphs with systematic, short and neat code. For example, we may add a linear trend (linear regression line) to each person’s panel. This time, let’s only look at the individuals in the experimental group, by using the filter() command (see below):
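A sketch of that filtered, faceted plot with a per-person linear trend:

  ggplot(filter(d, Group == "Treatment"), aes(x = time, y = intimacy)) +
    geom_line() +
    geom_smooth(method = "lm") +
    facet_wrap(~id)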


Figure 8. Treatment group’s mean intimacy ratings across time, plotted separately for each person, with linear trend lines.

Data manipulation

We already encountered an example of manipulating data when we aggregated intimacy over some groups (experimental groups and time points). Other common operations include, for example, trimming the data based on some criteria. All operations that drop observations can be conceptualized as subsetting, and can be done using the filter() command. Above, we filtered the data such that we plotted the data for the treatment group only. As another example, we can get the first week’s data (time is less than 7, that is, days 0-6), for the control group only, by specifying these logical operations in the filter() function:
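For example:

  filter(d, time < 7 & Group == "Control")  # days 0-6, control group only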

Try re-running the above line with small changes to the logical operations. Note that the two logical operations are combined with the AND command (&), you can also use OR (|). Try to imagine what replacing AND with OR would do in the above line of code. Then try and see what it does.

A quick detour to details

At this point it is useful to remember that computers do exactly what you ask them to do, nothing less, nothing more. So, for instance, pay attention to capital letters, symbols, and parentheses. The following three lines are faulty; try to figure out why:
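The original faulty lines aren’t reproduced here; the following reconstruction, based on the answers given below, produces the same three problems (each line is intentionally faulty):

  filter(d, Group == "control")   # 1. runs, but returns zero rows
  filter(d, Group == "Control"))  # 2. produces an error
  filter(d, Group = "Control")    # 3. produces an error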

Why does this data frame have zero rows?

Error? What’s the problem?

Error? What’s the problem?

(Answers: 1. Group is either “Control” or “Treatment”, not “control” or “treatment”. 2. There is an extra parenthesis at the end. 3. == is not the same as =: the double == is a logical comparison operator that asks whether two things are equal, whereas the single = is an assignment operator.)

Advanced data manipulation

Let’s move on. What if we’d like to detect extreme values? For example, let’s ask if there are people in the data who show extreme overall levels of intimacy (what if somebody feels too much intimacy!). How can we do that? Let’s start thinking like programmers and break every problem into the exact steps required to answer the problem:

  1. Calculate the mean intimacy for everybody
  2. Plot the mean intimacy values (because always, always visualize your data)
  3. Remove everybody whose mean intimacy is over 2 standard deviations above the overall mean intimacy (over-intimate people?) (note that this is a terrible exclusion criterion, used here for illustration purposes only)

As before, we’ll group the data by person, and calculate the mean intimacy (which we’ll call int):
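A sketch of this step (producing the d_grouped data frame used below):

  d_grouped <- d %>%
    group_by(id) %>%
    summarize(int = mean(intimacy))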

We now have everybody’s mean intimacy in a neat and tidy data frame. We could, for example, arrange the data such that we see the extreme values:
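For example, sorting from highest to lowest mean intimacy:

  arrange(d_grouped, desc(int))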

Nothing makes as much sense as a histogram:
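For example:

  ggplot(d_grouped, aes(x = int)) +
    geom_histogram()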


Figure 9. Histogram of everybody’s mean intimacy ratings.

It doesn’t look like anyone’s mean intimacy value is “off the charts”. Finally, let’s apply our artificial exclusion criterion and flag everybody whose mean intimacy is more than 2 standard deviations above the overall mean:
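A sketch of this step, creating the exclude column referred to below:

  d_grouped <- mutate(d_grouped,
                      exclude = int > mean(int) + 2 * sd(int))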

Then we could proceed to exclude these participants (don’t do this with real data!), by first joining the d_grouped data frame, which has the exclusion information, with the full data frame d:
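A sketch of the join (rows are matched on the shared id column):

  d <- left_join(d, d_grouped, by = "id")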

and then removing all rows where exclude is TRUE. We use the filter() command, and take only the rows where exclude is FALSE. So we want our logical operator for filtering rows to be “not-exclude”. “not”, in R language, is !:
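That is:

  d2 <- filter(d, !exclude)  # keep only the rows where exclude is FALSE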

I saved the included people in a new data set called d2, because I don’t actually want to remove those people; I just wanted to illustrate how to do this. In some situations we could also imagine applying the exclusion criteria to individual observations, instead of individual participants. This would be as easy as (think why):
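A sketch of the observation-level version (the cutoff mirrors the person-level one above):

  d2 <- filter(d, intimacy < mean(intimacy) + 2 * sd(intimacy))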

Selecting variables in data

After these artificial examples of removing extreme values (or people) from the data, we have a couple of extra variables in our data frame d that we would like to remove, because it’s good to work with clean data. Removing, and more generally selecting, variables (columns) in data frames is most easily done with the select() function. Let’s select() all variables in d except the square-root intimacy (sqrt_int), average intimacy (int), and exclusion (exclude) variables (that is, let’s drop those three columns from the data frame):
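A sketch of this step:

  d <- select(d, -sqrt_int, -int, -exclude)  # drop the three temporary columns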

Using select(), we can keep variables by naming them, or drop them by using -. If no variables are named for keeping, but some are dropped, all unnamed variables are kept, as in this example.

Regression

Let’s do an example linear regression by focusing on one participant’s data. The first step, then, is to create a subset containing only one person’s data. For instance, we may ask for a subset of d that consists of all rows where id is 30, by typing:
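That is (d_sub is the name used below):

  d_sub <- filter(d, id == 30)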

Linear regression is available using the lm() function, and R’s own formula syntax:
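A sketch of the model fit (the object name fit is my choice):

  fit <- lm(intimacy ~ time, data = d_sub)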

Generally, for regression in R, you specify the formula as outcome ~ predictors. If you have multiple predictors, you combine them with addition (“+”): outcome ~ IV1 + IV2. Interactions are specified with multiplication (“*”): outcome ~ IV1 * IV2, which automatically includes the main effects of IV1 and IV2; to get the interaction only, use “:”, as in outcome ~ IV1:IV2. We also specified that for the regression we’d like to use data in the d_sub data frame, which contains only person 30’s data.

Summary of a fitted model is easily obtained:
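That is, using the fit object from above:

  summary(fit)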

Visualizing the model fit is also easy. We’ll use the same code as for the figures above, but also add points (geom_point()), and a linear regression line with a 95% “Confidence” Ribbon (geom_smooth(method="lm")).
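A sketch of the plotting call described above:

  ggplot(d_sub, aes(x = time, y = intimacy)) +
    geom_point() +
    geom_line() +
    geom_smooth(method = "lm")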


Figure 10. Person 30’s intimacy ratings over time (points and black line), with a linear regression model (blue line and gray Confidence Ribbon).

Pretty cool, right? And there you have it. We’ve used R to do a sample of common data cleaning and visualization operations, and fitted a couple of regression models. Of course, we’ve only scratched the surface, and below I provide a short list of resources for learning more about R.

Conclusion

Programming your statistical analyses leads to a flexible, reproducible and time-saving workflow, in comparison to more traditional point-and-click focused applications. R is probably the best programming language around for applied statistics, because it has a large user base and many user-contributed packages that make your life easier. While it may take an hour or so to get acquainted with R, after initial difficulty it is easy to use, and provides a fast and reliable platform for data wrangling, visualization, modeling, and statistical testing.

Finally, learning to code is not about having a superhuman memory for function names, but instead it is about developing a programmer’s mindset: Think your problem through and decompose it to small chunks, then ask a computer to do those chunks for you. Do that a couple of times and you will magically have memorized, as a byproduct, the names of a few common functions. You learn to code not by reading and memorizing a tutorial, but by writing it out, examining the output, changing the input and figuring out what changed in the output. Even better, you’ll learn the most once you use code to examine your own data, data that you know and care about. Hopefully, you’ll be now able to begin doing just that.

Resources

The web is full of fantastic R resources, so here’s a sample of some materials I think would be useful to beginning R users.

Introduction to R

  • Data Camp’s Introduction to R is a free online course on R.

  • Code School’s R Course is an interactive web tutorial for R beginners.

  • YaRrr! The Pirate’s Guide to R is a free e-book, with accompanying YouTube lectures and witty writing (“it turns out that pirates were programming in R well before the earliest known advent of computers.”) YaRrr! is also an R package that helps you get started with some pretty cool R stuff (Phillips, 2016). Recommended!

  • The Personality Project’s Guide to R (Revelle, 2016b) is a great collection of introductory (and more advanced) R materials especially for Psychologists. The site’s author also maintains a popular and very useful R package called psych (Revelle, 2016a). Check it out!

  • Google Developers’ YouTube Crash Course to R is a collection of short videos. The first 11 videos are an excellent introduction to working with RStudio and R’s data types, and programming in general.

  • Quick-R is a helpful collection of R materials.

Data wrangling

These websites explain how to “wrangle” data with R.

  • R for Data Science (Wickham & Grolemund, 2016) is the definitive source on using R with real data for efficient data analysis. It starts off easy (and is suitable for beginners) but covers nearly everything in a data-analysis workflow apart from modeling.

  • Introduction to dplyr explains how to use the dplyr package (Wickham & Francois, 2016) to wrangle data.

  • Data Processing Workflow is a good resource on how to use common packages for data manipulation (Wickham, 2016), but the example data may not be especially helpful.

Visualizing data

Statistical modeling and testing

R provides many excellent packages for modeling data; my absolute favorite is the brms package (Buerkner, 2016) for Bayesian regression modeling.

References

Bartlett, J. (2016, November 22). Tidying and analysing response time data using r. Statistics and substance use. Retrieved November 23, 2016, from https://statsandsubstances.wordpress.com/2016/11/22/tidying-and-analysing-response-time-data-using-r/

Bolger, N., & Laurenceau, J.-P. (2013). Intensive longitudinal methods: An introduction to diary and experience sampling research. Guilford Press. Retrieved from http://www.intensivelongitudinal.com/

Buerkner, P.-C. (2016). Brms: Bayesian regression models using Stan. Retrieved from http://CRAN.R-project.org/package=brms

Fox, J. (2010). Introduction to statistical computing in R. Retrieved November 23, 2016, from http://socserv.socsci.mcmaster.ca/jfox/Courses/R-course/index.html

Muenchen, R. A. (2015). The popularity of data analysis software. R4stats.com. Retrieved November 22, 2016, from http://r4stats.com/articles/popularity/

Phillips, N. (2016). Yarrr: A companion to the e-book “YaRrr!: The pirate’s guide to R”. Retrieved from https://CRAN.R-project.org/package=yarrr

R Core Team. (2016). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from https://www.R-project.org/

Revelle, W. (2016a). Psych: Procedures for psychological, psychometric, and personality research. Evanston, Illinois: Northwestern University. Retrieved from https://CRAN.R-project.org/package=psych

Revelle, W. (2016b). The Personality Project’s guide to R. Retrieved November 22, 2016, from http://personality-project.org/r/

Wickham, H. (2016). Tidyverse: Easily install and load ’tidyverse’ packages. Retrieved from https://CRAN.R-project.org/package=tidyverse

Wickham, H., & Francois, R. (2016). Dplyr: A grammar of data manipulation. Retrieved from https://CRAN.R-project.org/package=dplyr

Wickham, H., & Grolemund, G. (2016). R for data science. Retrieved from http://r4ds.had.co.nz/

Matti Vuorre


Matti Vuorre is a PhD Student at Columbia University in New York City. He studies cognitive psychology and neuroscience, and focuses on understanding the mechanisms underlying humans' metacognitive capacities.


Not solely about that Bayes: Interview with Prof. Eric-Jan Wagenmakers

Last summer saw the publication of the most important work in psychology in decades: the Reproducibility Project (Open Science Collaboration, 2015; see here and here for context). It stirred up the community, resulting in many constructive discussions but also in verbally violent disagreement. What unites all parties, however, is the call for more transparency and openness in research.

Eric-Jan “EJ” Wagenmakers has argued for pre-registration of research (Wagenmakers et al., 2012; see also here) and direct replications (e.g., Boekel et al., 2015; Wagenmakers et al., 2015), for a clearer demarcation of exploratory and confirmatory research (de Groot, 1954/2013), and for a change in the way we analyze our data (Wagenmakers et al., 2011; Wagenmakers et al., in press).

Concerning the latter point, EJ is a staunch advocate of Bayesian statistics. With his many collaborators, he writes the clearest and wittiest expositions of the topic (e.g., Wagenmakers et al., 2016; Wagenmakers et al., 2010). Crucially, he is also a key player in opening Bayesian inference up to social and behavioral scientists more generally; in fact, the software JASP is EJ’s brainchild (see also our previous interview).


In sum, psychology is changing rapidly, both in how researchers communicate and do science, but increasingly also in how they analyze their data. This makes it nearly impossible for university curricula to keep up; courses in psychology are often years, if not decades, behind. Statistics classes in particular are usually boringly cookbook-oriented and often fraught with misconceptions (Wagenmakers, 2014). At the University of Amsterdam, Wagenmakers succeeds in doing things differently. He has previously taught a class called “Good Science, Bad Science”, discussing novel developments in methodology as well as supervising students in preparing and conducting direct replications of recent research findings (cf. Frank & Saxe, 2012).

Now, at the end of the day, testing undirected hypotheses using p values or Bayes factors only gets you so far – even if you preregister the heck out of it. To move the field forward, we need formal models that instantiate theories and make precise quantitative predictions. Together with Michael Lee, Eric-Jan Wagenmakers has written an amazing practical cognitive modeling book, harnessing the power of computational Bayesian methods to estimate arbitrarily complex models (for an overview, see Lee, submitted). More recently, he has co-edited a book on model-based cognitive neuroscience on how formal models can help bridge the gap between brain measurements and cognitive processes (Forstmann & Wagenmakers, 2015).

Long-term readers of the JEPS Bulletin will note that topics such as openness of research, pre-registration and replication, research methodology, and Bayesian statistics are recurring themes. It was thus only a matter of time before we interviewed Eric-Jan Wagenmakers to ask him questions concerning all of the areas above. In addition, we ask: how does he stay so immensely productive? What tips does he have for students interested in an academic career? And what can instructors learn from “Good Science, Bad Science”? Enjoy the ride!


Bobby Fischer, the famous chess player, once said that he does not believe in psychology. You actually switched from playing chess to pursuing a career in psychology; tell us how this came about. Was it a good move?

It was an excellent move, but I have to be painfully honest: I simply did not have the talent and the predisposition to make a living out of playing chess. Several of my close friends did have that talent and went on to become international grandmasters; they play chess professionally. But I was actually lucky. For players outside of the world top-50, professional chess is a career trap. The pay is poor, the work insanely competitive, and the life is lonely. And society has little appreciation for professional chess players. In terms of creativity, hard work, and intellectual effort, an international chess grandmaster easily outdoes the average tenured professor. People who do not play chess themselves do not realize this.

Your list of publications gets updated so frequently, it should have its own RSS feed! How do you grow and cultivate such an impressive network of collaborators? Do you have specific tips for early career researchers?

At the start of my career I did not publish much. For instance, when I finished my four years of grad studies I think I had two papers. My current publication rate is higher, and part of that is due to an increase in expertise. It is just easier to write papers when you know (or think you know) what you’re talking about. But the current productivity is mainly due to the quality of my collaborators. First, at the psychology department of the University of Amsterdam we have a fantastic research master program. Many of my graduate students come from this program, having been tried and tested in the lab as RAs. When you have, say, four excellent graduate students, and each publishes one article a year, that obviously helps productivity. Second, the field of Mathematical Psychology has several exceptional researchers that I have somehow managed to collaborate with. In the early stages I was a graduate student with Jeroen Raaijmakers, and this made it easy to start work with Rich Shiffrin and Roger Ratcliff. So I was privileged and I took the opportunities that were given. But I also work hard, of course.

There is a lot of advice that I could give to early career researchers but I will have to keep it short. First, in order to excel in whatever area of life, commitment is key. What this usually means is that you have to enjoy what you are doing. Your drive and your enthusiasm will act as a magnet for collaborators. Second, you have to take initiative. So read broadly, follow the latest articles (I remain up to date through Twitter and Google Scholar), get involved with scientific organizations, coordinate a colloquium series, set up a reading group, offer your advisor to review papers with him/her, attend summer schools, etc. For example, when I started my career I had seen a new book on memory and asked the editor of Acta Psychologica whether I could review it for them. Another example is Erik-Jan van Kesteren, an undergraduate student from a different university who had attended one of my talks about JASP. He later approached me and asked whether he could help out with JASP. He is now a valuable member of the JASP team. Third, it helps if you are methodologically strong. When you are methodologically strong –in statistics, mathematics, or programming– you have something concrete to offer in a collaboration.

Considering all projects you are involved in, JASP is probably the one that will have most impact on psychology, or the social and behavioral sciences in general. How did it all start?

In 2005 I had a conversation with Mark Steyvers. I had just shown him a first draft of a paper that summarized the statistical drawbacks of p-values. Mark told me “it is not enough to critique p-values. You should also offer a concrete alternative”. I agreed and added a section about BIC (the Bayesian Information Criterion). However, the BIC is only a rough approximation to the Bayesian hypothesis test. Later I became convinced that social scientists will only use Bayesian tests when these are readily available in a user-friendly software package. About 5 years ago I submitted an ERC grant proposal “Bayes or Bust! Sensible hypothesis tests for social scientists” that contained the development of JASP (or “Bayesian SPSS” as I called it in the proposal) as a core activity. I received the grant and then we were on our way.

I should acknowledge that much of the Bayesian computations in JASP depend on the R BayesFactor package developed by Richard Morey and Jeff Rouder. I should also emphasize the contribution by JASPs first software engineer, Jonathon Love, who suggested that JASP ought to feature classical statistics as well. In the end we agreed that by including classical statistics, JASP could act as a Trojan horse and boost the adoption of Bayesian procedures. So the project started as “Bayesian SPSS”, but the scope was quickly broadened to include p-values.

JASP is already game-changing software, but it is under continuous development and improvement. More concretely, what do you plan to add in the near future? What do you hope to achieve in the long-term?

In terms of the software, we will shortly include several standard procedures that are still missing, such as logistic regression and chi-square tests. We also want to upgrade the popular Bayesian procedures we have already implemented, and we are going to create new modules. Before too long we hope to offer a variable views menu and a data-editing facility. When all this is done it would be great if we could make it easier for other researchers to add their own modules to JASP.

One of my tasks in the next years is to write a JASP manual and JASP books. In the long run, the goal is to have JASP be financially independent of government grants and university support. I am grateful for the support that the psychology department at the University of Amsterdam offers now, and for the support they will continue to offer in the future. However, the aim of JASP is to conquer the world, and this requires that we continue to develop the program “at break-neck speed”. We will soon be exploring alternative sources of funding. JASP will remain free and open-source, of course.

You are a leading advocate of Bayesian statistics. What do researchers gain by changing the way they analyze their data?

They gain intellectual hygiene, and a coherent answer to questions that makes scientific sense. A more elaborate answer is outlined in a paper that is currently submitted to a special issue for Psychonomic Bulletin & Review: https://osf.io/m6bi8/ (Part I).

The Reproducibility Project used different metrics to quantify the success of a replication – none of them really satisfactory. How can a Bayesian perspective help illuminate the “crisis of replication”?

As a theory of knowledge updating, Bayesian statistics is ideally suited to address questions of replication. However, the question “did the effect replicate?” is underspecified. Are the effect sizes comparable? Does the replication provide independent support for the presence of the effect? Does the replication provide support for the position of the proponents versus the skeptics? All these questions are slightly different, but each receives the appropriate answer within the Bayesian framework. Together with Josine Verhagen, I have explored a method –the replication Bayes factor– in which the prior distribution for the replication test is the posterior distribution obtained from the original experiment (e.g., Verhagen & Wagenmakers, 2014). We have applied this intuitive procedure to a series of recent experiments, including the multi-lab Registered Replication Report of Fritz Strack’s Facial Feedback hypothesis. In Strack’s original experiment, participants who held a pen with their teeth (causing a smile) judged cartoons to be funnier than participants who held a pen with their lips (causing a pout). I am not allowed to tell you the result of this massive replication effort, but the paper will be out soon.

You have recently co-edited a book on model-based cognitive neuroscience. What is the main idea here, and what developments in this area are most exciting to you?

The main idea is that much of experimental psychology, mathematical psychology, and the neurosciences pursue a common goal: to learn more about human cognition. So ultimately the interest is in latent constructs such as intelligence, confidence, memory strength, inhibition, and attention. The models that have been developed in mathematical psychology are able to link these latent constructs to specific model parameters. These parameters may in turn be estimated by behavioral data, by neural data, or by both data sets jointly. Brandon Turner is one of the early career mathematical psychologists who has made great progress in this area. So the mathematical models are a vehicle to achieve an integration of data from different sources. Moreover, insights from neuroscience can provide important constraints that help inform mathematical modeling. The relation is therefore mutually beneficial. This is summarized in the following paper: http://www.ejwagenmakers.com/2011/ForstmannEtAl2011TICS.pdf

One thing that distinguishes science from sophistry is replication; yet it is not standard practice. In “Good Science, Bad Science”, you had students prepare a registered replication plan. What was your experience teaching this class? What did you learn from the students?

This was a great class to teach. The students were highly motivated and oftentimes it felt more like lab-meeting than like a class. The idea was to develop four Registered Report submissions. Some time has passed, but the students and I still intend to submit the proposals for publication.

The most important lesson this class has taught me is that our research master students want to learn relevant skills and conduct real research. In the next semester I will teach a related course, “Good Research Practices”, and I hope to attain the same high levels of student involvement. For the new course, I plan to have students read a classic methods paper that identifies a fallacy; next the students will conduct a literature search to assess the current prevalence of the fallacy. I have done several similar projects, but never with master students (e.g., http://www.ejwagenmakers.com/2011/NieuwenhuisEtAl2011.pdf and http://link.springer.com/article/10.3758/s13423-015-0913-5).

What tips and tricks can you share with instructors planning to teach a similar class?

The first tip is to set your aims high. For a research master class, the goal should be publication. Of course this may not always be realized, but it should be the goal. It helps if you can involve colleagues or graduate students. If you set your aims high, the students know that you take them seriously, and that their work matters. The second tip is to arrange the teaching so that the students do most of the work. The students need to develop a sense of ownership about their projects, and they need to learn. This will not happen if you treat the students as passive receptacles. I am reminded of a course that I took as an undergraduate. In this course I had to read chapters, deliver presentations, and prepare questions. It was one of the most enjoyable and inspiring courses I had ever taken, and it took me decades to realize that the professor who taught the course actually did not have to do much at all.

Many scholarly discussions these days take place on social media and blogs. You’ve joined twitter yourself over a year ago. How do you navigate the social media jungle, and what resources can you recommend to our readers?

I am completely addicted to Twitter, but I also feel it makes me a better scientist. When you are new to Twitter, I recommend that you start by following a few people that have interesting things to say. Coming from a Bayesian perspective, I recommend Alexander Etz (@AlxEtz) and Richard Morey (@richarddmorey). And of course it is essential to follow JASP (@JASPStats). As is the case for all social media, the most valuable resource you have is the “mute” option. Prevent yourself from being swamped by holiday pictures and exercise it ruthlessly.

Fabian Dablander

Fabian Dablander is currently finishing his thesis in Cognitive Science at the University of Tübingen and Daimler Research & Development on validating driving simulations. He is interested in innovative ways of data collection, Bayesian statistics, open science, and effective altruism. You can find him on Twitter @fdabl.


Publishing a Registered Report as a Postgraduate Researcher

Registered Reports (RRs) are a new publishing format pioneered by the journal Cortex (Chambers 2013). This publication format emphasises the process of rigorous research, rather than the results, in an attempt to avoid questionable research practices such as p-hacking and HARK-ing, which ultimately reduce the reproducibility of research and contribute to publication bias in cognitive science (Chambers et al. 2014). A recent JEPS post by Dablander (2016) and JEPS’ own editorial for adopting RRs (King et al. 2016) have given a detailed explanation of the RR process. However, you may have thought that publishing a RR is reserved for only senior scientists, and is not a viable option for a postgraduate student. In fact, 5 out of 6 of the first RRs published by Cortex have had post-graduate students as authors, and publishing by RR offers postgraduates and early career researchers many unique benefits.

In the following article you will hear about the experience of Dr. Hannah Hobson, who published a RR in the journal Cortex as a part of her PhD project. I spoke to Hannah about the planning that was involved, the useful reviewer comments she received, and asked her what tips she has for postgraduates interested in publishing a RR. Furthermore, there are some comments from Professor Chris Chambers who is a section editor for Cortex on how postgraduates can benefit from using this publishing format.

Interview with Dr. Hannah Hobson

Hannah completed her PhD project on children’s behavioural imitation skills, and potential neurophysiological measures of the brain systems underlying imitation. Her PhD was based at the University of Oxford, under the supervision of Professor Dorothy Bishop. During her studies, Hannah became interested in mu suppression, an EEG measure purported to reflect the activity of the human mirror neuron system. However, she was concerned that much of research on mu suppression suffered from methodological problems, despite this measure being widely used in social cognitive neuroscience. Hannah and Dorothy thought it would be appropriate to publish a RR to focus on some of these issues. This study was published in the journal Cortex, and investigated whether mu suppression is a good measure of the human mirror neuron system (Hobson and Bishop 2016). I spoke to Hannah about her project and what her experience of publishing a RR was like during her PhD.

As you can hear from Hannah’s experience, publishing a RR was beneficial in ways that would not be possible with standard publishing formats. However, they are not suitable for every study. Drawing from Hannah’s experience and Chris Chambers’ role in promoting RRs, the main strengths and concerns for postgraduate students publishing a RR are summarised below.

Strengths

Reproducible findings

It has been highlighted that the majority of psychological studies suffer from low power. As well as limiting the chances of finding an effect, low-powered studies are more likely to lack reproducibility because they tend to overestimate effect sizes (Button et al. 2013). As part of the stage one submission, a formal power analysis needs to be performed to identify the number of participants required for a high-powered study (>90% power). Therefore, PhD studies published as RRs will have greater power and reproducibility in comparison to the average unregistered study (Chambers et al. 2014).

More certainty over publications

The majority of published PhD studies only begin to emerge during the final year of the PhD or during your first post-doctoral position. As the academic job market becomes ever more competitive, publications are essential. As Professor Chambers notes, RRs “enable PhD students to list provisionally accepted papers on their CVs by the time they submit their PhDs”. Employers will see greater certainty in a RR with stage one approval than in the ‘in preparation’ listed next to innumerable papers following the standard publishing format.

Lower rejection rate at stage two submission

Although reaching stage one approval is more difficult due to the strict methodological rigour required, there is greater certainty about the eventual outcome of the paper once you have in-principle acceptance. In Cortex, approximately 90% of unregistered reports are rejected upon submission, but only 10% of RRs which reach stage one review have been rejected, and none have so far been rejected after in-principle acceptance.

“This means you are far more likely to get your paper accepted at the first journal you submit to, reducing the tedious and time-wasting exercise of submitting down a chain of journals after your work is finished and you may already be competing on the job market”. – Professor Chris Chambers

As Dorothy Bishop explains in her blog, once you have in-principle acceptance you are in control of the timing of the publication (Bishop 2016). This means that you will have a publication in print during your PhD, as opposed to starting to submit papers towards the end which may only be ‘in preparation’ by the time of your viva voce.

Constructive reviewer comments

As the rationale and methodology is peer-reviewed before the data-collection process, reviewers are able to make suggestions to improve the design of your study. In Hannah’s experience, a reviewer pointed out an issue with her control stimuli. If she had conducted the study following the standard format, reviewers would only be able to point this out retrospectively when there is no option to change it. This experience will also be invaluable during your viva voce. As you defend your work in front of the examiners, you know your study has already gone through several rounds of review, so you can be confident in how robust it is.

Things to consider

Time constraints

Recruiting and testing participants is a lengthy process, and you will often encounter a series of setbacks. If you are already in the middle of your PhD, then you may not have time to go through stage one submission before collecting your data. In Hannah’s case, publishing a RR was identified as a goal early in the project, which left sufficient time to complete it during her PhD. If you are interested in RRs, it is advisable to start the submission process as early in your PhD as possible. You may even want to start the discussion during the interview process.

Ethics merry-go-round

During stage one submission, you need to provide evidence that you already have ethical approval. If the reviewers want you to make changes to the methodology, this may necessitate amending your ethics application. In busy periods, this process of going back and forth between the reviewers and your ethics committee can become time-consuming. As time is the most pertinent concern for postgraduates publishing a RR, this is an additional hurdle that must be negotiated. Whilst there is no easy solution to this problem, the aim of publishing a RR must be identified early in your project to ensure you will have enough time, and a back-up plan should be prepared in case things do not work out.

RRs are not available in every journal

Although there has been a surge in journals offering RRs, they are not available in every one. Your research might be highly specialised, and the key journal in your area may not offer the option of a RR. If your research does not fit into the scope of a journal that offers RRs, you may not have the option to publish your study as a RR. Whilst there is no simple solution for this, a regularly updated list of journals offering RRs is maintained on the Open Science Framework (OSF).

Supervisor conflict

Although there are a number of prominent researchers behind the initiative (Guardian Open Letter 2013), there is not universal agreement, with some researchers voicing concerns (Scott 2013; although see Chambers et al. 2014 for a rebuttal to many common concerns). One of these vocal critics of RRs might end up being your supervisor. If you want to conduct a RR as part of your PhD and your supervisor is against it, there may be some conflict. Again, it is best to identify early on in your PhD whether you want to publish a RR, and make sure both you and your supervisor are on the same page.

Conclusion

Publishing a RR as a postgraduate researcher is a feasible option that provides several benefits, both to the individual student and to wider scientific progress. Research published as a RR is more likely to produce reproducible findings, due to the required high level of power, the reviewers’ critique before data collection, and the safeguards against questionable research practices such as p-hacking or HARKing. Providing the work is carried out as agreed, a study that has achieved stage one approval is likely to be published, allowing students the opportunity to publish their hard work, even if the findings are negative. Moreover, going through several rounds of peer review on the proposed methodology provides an additional layer of rigour (good for science) that aids your defence in your viva voce (good for you). Of course, it is not all plain sailing, and there are several considerations students will need to make before embarking on a RR. Nonetheless, despite these concerns, this publishing format is a step in the right direction for ensuring that robust research is being conducted right down to the level of postgraduate students.

If you like the idea but do not think formal pre-registration with a journal is suitable for your project, perhaps consider using the OSF. The OSF is a site where researchers can timestamp their hypotheses and planned analyses, allowing them to develop hypothesis-driven research habits. In one research group, it is necessary for all studies ranging from undergraduate projects to grant-funded projects to be registered on third-party websites such as the OSF (Munafò 2015). Some researchers such as Chris Chambers have even made it a requirement for applicants wanting to join their group to demonstrate a prior commitment to open science practices (Chambers 2016). Starting to pre-register your studies and publish RRs as a postgraduate student demonstrates this commitment, and will prove to be crucial as open science practices become an essential criterion in recruitment.

“To junior researchers I would say that pre-registration — especially as a Registered Report — is an ideal option for publishing high-quality, hypothesis-driven research that reflects an investment both in good science and your future career” – Professor Chris Chambers 

Pre-registration and RRs are both initiatives to improve the rigour and transparency of psychological science (Munafò et al. 2014). These initiatives are available to us as research students, and it is not just the responsibility of senior academics to fight against questionable research practices. We can join in too.

Acknowledgements

Thank you to Dr. Hannah Hobson who was happy to talk about her experience as a PhD student and for her expertise in recording the interview. Hannah also helped to write and revise the post. I would also like to thank Professor Chris Chambers for taking the time to provide some comments for the post.

James Bartlett


I'm James Bartlett, a PhD student at Coventry University, UK. The aim of my project is to create a profile of cognitive mechanisms associated with substance use in light and heavy smokers. I keep myself occupied outside of academia by playing hockey, or watching ice hockey. You can also find me on Twitter (@JamesEBartlett).


Do Smokers Consist of a Single Group?

When you think of a smoker, it is likely that you are imagining someone who goes through a pack of cigarettes per day and can often be found running to the nearest store to maintain their supply. Perhaps you amuse yourself watching your friend conspicuously leaving work to stand outside and huddle around their cigarette in the rain. Your assumption would often be correct, as the majority of smokers are dependent on nicotine and smoke throughout the day. These daily smokers account for approximately 89% of current smokers in the UK (Herbec, Brown and West 2014), and between 67% and 75% of smokers in the USA (Coggins, Murrelle and Carchman 2009). But what about the missing proportion of smokers?

This missing proportion consists of non-daily smokers, a sub-group who consume only a few cigarettes per day and often engage in voluntary days of abstinence without experiencing the effects of withdrawal (Shiffman, Ferguson and Dunbar 2012b). What makes these smokers interesting is that although they do not appear to be dependent on nicotine, 82% of them relapse within 90 days of attempting to quit (Tindle and Shiffman 2011). Compared to the 87% of daily smokers who relapse, these figures are remarkably close. Similar results were found in a UK sample, where 92% of daily smokers and 83% of non-daily smokers failed to remain abstinent beyond six months (Herbec et al. 2014). Despite this difficulty, smoking cessation therapies lack efficacy in non-daily smokers due to a reliance on nicotine replacement therapy (Jimenéz-Ruiz and Fagerström 2010). This is not surprising, as clinical trials commonly exclude light smokers (Shiffman 2009), and they rarely experience withdrawal symptoms due to a lack of nicotine dependence.

As smoking restrictions become more and more stringent, the proportion of light smokers is predicted to increase (Coggins et al. 2009; Shiffman 2009). Although light smoking is often perceived as less harmful, it is still associated with an increased risk of developing cardiovascular disease, lung cancer, and other types of cancer. For example, one prospective study found that male and female light smokers had a significantly increased risk of ischaemic heart disease and lung cancer in comparison to non-smokers (Bjartveit and Tverdal 2005). Furthermore, a systematic review found that light smokers show an intermediate risk between non-smokers and heavy smokers for most outcomes, but interestingly they share the same risk of heart disease as heavy smokers (Schane, Ling and Glantz 2009). Considering this, it is important to understand what the differences are between the groups, and how we can identify them.

What are the differences in smoking patterns?

Table 1 shows the number of cigarettes smoked per day by light and heavy smokers in a small range of studies that include figures for both groups. Although there is some fluctuation, smoking rates are approximately 15 and 4 cigarettes per day for heavy and light smokers respectively. Additionally, it is interesting that light smokers often engage in voluntary days of abstinence. Compared to heavy smokers, who consistently use cigarettes every day, one study found that light smokers tend to use cigarettes on only four days per week (Shiffman, Tindle and Li 2013). This suggests that light smokers are relatively free of nicotine dependence, as the half-life of nicotine in the body is approximately two hours (Advokat, Comaty and Julien 2014). This is usually the point at which heavy smokers start to crave their next cigarette, but it appears that light smokers are comfortable without smoking for hours and even days after all of the nicotine has been metabolised and left the body.
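To make the half-life argument concrete, a back-of-the-envelope calculation (assuming simple first-order elimination with the two-hour half-life cited above, an idealisation rather than a full pharmacokinetic model) shows how quickly nicotine levels fall after the last cigarette:

\[
N(t) = N_0 \left(\tfrac{1}{2}\right)^{t/2},
\]

where \(N_0\) is the initial nicotine level and \(t\) is the time in hours. After 12 hours only \(\left(\tfrac{1}{2}\right)^{6} \approx 1.6\%\) remains, so by the time a light smoker has abstained for a day or more, essentially no nicotine is left in the body.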

Table 1

Mean Number of Cigarettes Smoked Per Day in Light and Heavy Smokers

Study                                    Smoking Group          Cigarettes Per Day
Herbec et al. (2014)                     Daily                  13.90
                                         Non-Daily               5.20
Shiffman et al. (2012a)                  Daily                  15.00
                                         Intermittent            4.50
Shiffman, Dunbar and Benowitz (2014a)    Daily                  15.98
                                         Intermittent            3.24
Shiffman et al. (2014b)                  Daily                  15.01
                                         Intermittent            4.45
Scheuermann et al. (2015)                Moderate Daily         20.60
                                         Light Daily             7.41
                                         Converted Non-Daily     5.78
                                         Native Non-Daily        4.25

Note: Smoking group names are reproduced as used within each study.

The early dismissal of non-daily smokers was based on the belief that they consisted only of adolescents in a transitional state on the way to becoming heavy smokers (Shiffman 2009). Whilst this does not provide a full explanation, non-daily smoking as a young adult is indeed an important risk factor for becoming a daily smoker later in life. One cohort study found that non-daily smoking at age 21 was associated with an odds ratio of 3.60 of becoming a daily smoker at age 38 at follow-up (Robertson, Losua and McGee 2015). In terms of public health, this highlights the need for research to focus on non-daily adolescent smokers, as they could be targeted by interventions before they progress into heavier, daily smoking. However, non-daily smoking is not only a transient state on the road to becoming a heavy smoker. The non-daily smokers in Shiffman et al. (2012b) had been smoking for an average of 18 years, and those in Shiffman et al. (2013) had smoked an estimated 42,000 cigarettes. This suggests that light, non-daily smoking can also be a consistent behaviour pattern that lasts throughout adulthood.
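For readers unfamiliar with the metric, the odds ratio reported above compares the odds of an outcome between two groups. In general terms, writing \(p_1\) and \(p_2\) for the proportion of each group who go on to smoke daily (generic symbols, not figures from the study),

\[
\text{OR} = \frac{p_1 / (1 - p_1)}{p_2 / (1 - p_2)},
\]

so an odds ratio of 3.60 means that the odds of daily smoking at age 38 were 3.6 times higher for those who smoked non-daily at age 21 than for those who did not.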

What are the reasons people report for smoking?

Non-daily smokers appear to show markedly different smoking habits, but they also show large differences in their reported reasons for smoking. The dominant paradigm of addictive behaviour holds that smokers continue to use cigarettes to avoid experiencing the aversive effects of withdrawal (Shiffman 2009). This motive appears to be consistent with heavy smokers, as they cite cravings, tolerance, and a loss of control over cigarette availability as influences on their smoking (Shiffman et al. 2012a). The same pattern is seen in young heavy smokers, as higher nicotine dependence scores were associated with smoking due to craving and habit in a sample of college students (Piasecki, Richardson and Smith 2007).

On the other hand, non-daily smokers report smoking for radically different reasons. For example, exposure to smoking cues, weight control, the sensory experiences of smoking, and positive reinforcement have been cited as motives for non-daily smokers (Shiffman et al. 2012a). This is inconsistent with daily smokers: rather than smoking to avoid the negative experience of withdrawal, non-daily smokers appear to smoke for the positive experiences. This has led non-daily smokers to be labelled as ‘indulgent’, as they tend to smoke to enhance the experience of situations that are already positive, such as drinking alcohol in a bar with friends (Shiffman, Dunbar and Li 2014). As well as showing different habits and smoking patterns, non-daily smokers report being motivated to smoke for substantially different reasons from those normally proposed for daily smokers.

How can you measure cigarette consumption?

Definitions of light and heavy smoking

You may have noticed that a few different terms have been used, such as light smoker, non-daily smoker, and occasional smoker. This is mainly because no one can agree on a consistent definition, and several have been used across the studies investigating this group. Firstly, light and heavy smoking has been used to highlight the contrast between consumption levels. However, this classification is associated with the largest range of criteria between studies (Husten 2009). Secondly, daily and non-daily (or intermittent) smoking is associated with a much more consistent pattern of use in contrast to light and heavy smoking (Shiffman et al. 2012a; 2012b; 2014). This is because the number of cigarettes per day fluctuates, whereas smoking less than daily is a clear indicator of this consumption pattern. Finally, there is a dichotomy between low and high nicotine dependence. This also appears to be a valid characterisation, as non-daily/light smokers consistently exhibit significantly less nicotine dependence on every common measure (Shiffman et al. 2012b). However, it is important to note that in reality, dependence and smoking behaviour exist along a continuum. Even within these dichotomies, there is a large amount of variation across the supposedly homogeneous sub-groups.

Measuring light and heavy smokers

On a final note about measurement, it is crucial to ask the right questions when assessing light smokers. Many questionnaires simply ask ‘are you a smoker?’, which may not detect non-daily smokers as they commonly do not identify with being a smoker (Schane et al. 2009). For example, in one study approximately 50% of light smokers said they might not admit to being a smoker (Shiffman et al. 2013). This suggests that simply asking whether people smoke or not is unlikely to be the best strategy, as researchers may just get ‘no’ as an answer. Clearly, more nuanced approaches are necessary to detect the low number of cigarettes consumed by this group. Fortunately, there are some additional measures of cigarette consumption that can provide a more sensitive answer:

  • A diary measure of the number of cigarettes smoked over a period of time
  • Breath Carbon Monoxide (CO) in a single session
  • Average CO over a number of sessions
  • Hair cotinine (a metabolite of nicotine) or nicotine levels

However, which are the best measures to use? An intensive diary account is considered to be the most accurate, but it is also the most time-consuming for smokers, which may deter some participants (Wray, Gass and Miller 2015). When comparing this to the less motivationally intensive measures, it appears that a single daily report of cigarettes across a number of days is the measure most strongly correlated with the intensive diary. Furthermore, when the level of exhaled CO is averaged across multiple testing sessions, it provides a valid biomarker for measuring cigarette consumption in light smokers (Wray et al. 2015). As well as these accuracy benefits, using a handheld CO monitor is cheap and does not require the expertise associated with analysing hair cotinine and nicotine levels. Due to the heterogeneous nature of smokers, it is crucial that the complexities in identifying light smokers are fully appreciated.
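To illustrate how these comparisons might look in practice, here is a minimal sketch using invented numbers (the data, variable names, and values are hypothetical and are not taken from Wray et al. 2015): it averages exhaled CO across sessions for each participant and correlates a simple daily cigarette report with an intensive diary total.

```python
# Minimal sketch with invented data (not from Wray et al. 2015):
# average exhaled CO across sessions per participant, then correlate a
# simple daily cigarette report with an intensive diary total.
import pandas as pd
from scipy.stats import pearsonr

# Hypothetical per-session CO readings (ppm) for three light smokers
co_readings = pd.DataFrame({
    "participant": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "co_ppm":      [6, 8, 7, 4, 5, 3, 10, 9, 11],
})
mean_co = co_readings.groupby("participant")["co_ppm"].mean()
print(mean_co)  # average CO per participant across the three sessions

# Hypothetical weekly cigarette totals from the two self-report measures
daily_report = [21, 12, 35]   # single daily question, summed over a week
diary_total = [19, 14, 33]    # intensive diary, summed over the same week
r, p = pearsonr(daily_report, diary_total)
print(f"Correlation between the two self-report measures: r = {r:.2f}")
```

The point of the sketch is simply that both steps described above, averaging a biomarker over sessions and checking agreement between self-report measures, are straightforward once the data are collected; the difficult part is designing questions sensitive enough to detect light smokers in the first place.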

Conclusion

In summary, there is a clear distinction between different types of smoker, but it is often neglected in research. Despite the apparent lack of nicotine dependence in non-daily smokers, both types of smoker find it difficult to remain abstinent, with only a small difference between their cessation failure rates (Tindle and Shiffman 2011; Herbec et al. 2014). This is important for public health because, although light smokers form a minority of smokers, they share the same risk of heart disease as heavy smokers and have an elevated risk of lung cancer (Bjartveit and Tverdal 2005; Schane et al. 2009). Considering the number of light smokers is predicted to increase as smoking restrictions tighten (Coggins et al. 2009; Shiffman 2009), it is crucial that this group is understood better. Research should focus on individual differences in the determinants of smoking behaviour to better understand what motivates light and heavy smokers. This knowledge will hopefully translate into more effective smoking cessation treatments that cater to the individual needs of each smoker.

Reading List

Health implications: Schane, R. E., Ling, P. M. and Glantz, S. A. (2010) ‘Health Effects of Light and Intermittent Smoking: A Review’. Circulation 121, 1518-1522

Smoking Patterns: Shiffman, S., Tindle, H., Li, X., Scholl, S., Dunbar, M. and Mitchell-Miland, C. (2013) ‘Characteristics and Smoking Patterns of Intermittent Smokers’. Experimental and Clinical Psychopharmacology 20(4), 264-277

Smoking Motives: Shiffman, S., Dunbar, M. S., Scholl, S. M. and Tindle, H. A. (2012a) ‘Smoking Motives of Daily and Non-Daily Smokers: A Profile Analysis’. Drug and Alcohol Dependence 126, 362-368

Definitions: Husten, C. G. (2009) ‘How Should we Define Light or Intermittent Smoking? Does it Matter?’. Nicotine and Tobacco Research 11(2), 111-121

Measurement: Wray, J. M., Gass, J. C., Miller, E. I., Wilkins, D. G., Rollins, D. E. and Tiffany, S. T. (2015) ‘A Comparative Evaluation of Self-Report and Biological Measures of Cigarette Use in Non-Daily Smokers’. Psychological Assessment [online] available from http://www.ncbi.nlm.nih.gov/pubmed/26479132  [12/07/2016]

James Bartlett
