A conceptual introduction to mathematical modeling of cognition

Psychological researchers try to understand how the mind works. That is, they describe observable phenomena, try to induce explanatory theories, and use those theories to deduce predictions. The explanatory value of a theory is then assessed by comparing theoretical predictions to new observations.

A good theory ideally makes precise predictions about new observations (Roberts & Pashler, 2000). While this sounds trivial, it is important to consider what it means to make precise predictions. A theory that can, in principle, predict any outcome is precise in the sense that it offers an explanation for every observation. At the same time the theory is imprecise because it is unspecific: It cannot inform our expectations about future events because it makes no prediction about what will not happen. In this sense, the theory is useless. Nobody would hire a financial adviser who can always explain why their client’s past investments failed but can never tell them where to invest next. Thus, an ideal theory predicts only what we observe and declares every other possible outcome impossible.

The law of parsimony—Occam’s razor—famously demands that we should prefer the simplest complete explanation of a phenomenon. One rationale is that simpler explanations are easier to understand, test, and falsify. Moreover, unnecessarily complex explanations yield inaccurate predictions about future events because they tend to assume causal reasons for random events that are unlikely to repeat in the future—a concept that in statistics is referred to as overfitting.

One way to conceptualize the simplicity (or complexity) of a theory is to examine the range of observations it can, in principle, explain. Following this reasoning, theories that can explain many different observations are complex; theories that can explain only very few observations are simple. Psychological theories are often verbal descriptions of assumed social or mental processes. As I will illustrate, at this level of specificity, it is often difficult to assess what exactly a theory predicts and how simple the explanation is. Mathematical models can be used to address this problem.

The number of mathematical models of cognitive processes is growing exponentially (Palminteri, Wyart, & Koechlin, 2017). However, many students of psychology and a fair number of researchers have limited knowledge about this approach to understanding the mind. In this blog post I will try to illustrate how theoretical positions can be expressed in mathematical terms as measurement models. I will argue that formalizing a theory mathematically helps to understand it and to derive testable predictions. Finally, I will show conceptually how the derived predictions can be tested. But first, I will try to explain some of the basic vocabulary of mathematical modeling by analogy to familiar statistical models.

What is a mathematical model?

To some the terms “mathematical model” or “formal model” may be daunting. Quite simply, a mathematical model is an expression of assumptions about how the observed data came about (i.e., about a data generating process). For example, a simple bivariate linear regression model is a mathematical model that, among other things, assumes that the relationship between two variables follows a straight line with an intercept \(a\) and a slope \(b\),

\[\hat y_i = a + b \times x_i,\]

for every observation \(i\). The intercept \(a\) and the slope \(b\) are the parameters of the model that quantify components of the data generating process.

To find the combination of parameter values that best describes a dataset, the model is fit to those data. For some models, such as this linear regression model, formulas are available to calculate the most likely parameters. When this is not the case, the parameter values have to be cleverly guessed by an optimization algorithm that minimizes the discrepancy between model predictions \(\hat y_i\) and the observed data \(y_i\) (e.g., quantified by the root-mean-square error, \(\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum^{n}_{i = 1}{(\hat y_i - y_i)^2}}\)). The guessed parameter values can be used to visualize the model’s best description of the data. A visual comparison between observed data and the model description may reveal gross deviations and helps to understand what aspects of the data can be explained by the model and what aspects cannot.

To illustrate the process of fitting a linear regression model to data consider the following example inspired by Kortt & Leigh (2010)—the data used here are simulated. The authors asked “Does Size Matter?”, that is, are (logarithmized) hourly wages related linearly to body height? The relationship is visualized in the top left panel of Figure 1.

When fitting a model to data the optimization algorithm starts with an arbitrary set of parameter values, which are then adjusted step-by-step until they converge on the best description of the data. This process is illustrated by the convergence of the grey line towards the blue line. The stepwise reduction of the discrepancy between model predictions and the observed data that guides the optimization algorithm is visualized in the top right panel and the corresponding parameter values in the bottom panels of Figure 1. The final model describes the linear relationship between hourly wages and body height quite well.

Figure 1: Iterative estimation of linear regression model parameters predicting a person’s (logarithmized) hourly wages from their body height. The linear function is iteratively adjusted to the data by repeatedly trying parameter values that minimize the discrepancy between model predictions and the observed data. The blue lines indicate the optimal values derived from the analytical solution. The data used here are simulated.
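The iterative estimation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and not the procedure used to produce Figure 1: the data are simulated with hypothetical values (a standardized height predictor, an intercept of 3.0, and a slope of 0.1), the optimizer is plain gradient descent, and it minimizes the mean squared error, which has its minimum at the same parameter values as the RMSE.

```python
import random

def fit_closed_form(x, y):
    """Least-squares intercept and slope from the analytical solution."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def fit_iteratively(x, y, steps=500, rate=0.1):
    """Gradient descent on the mean squared error, starting at a = b = 0."""
    n = len(x)
    a, b = 0.0, 0.0
    for _ in range(steps):
        residuals = [a + b * xi - yi for xi, yi in zip(x, y)]
        grad_a = 2 / n * sum(residuals)
        grad_b = 2 / n * sum(r * xi for r, xi in zip(residuals, x))
        a -= rate * grad_a  # step against the gradient
        b -= rate * grad_b
    return a, b

# Simulated, hypothetical data: standardized height and log hourly wage.
random.seed(1)
height = [random.gauss(0, 1) for _ in range(200)]
log_wage = [3.0 + 0.1 * h + random.gauss(0, 0.3) for h in height]

a_exact, b_exact = fit_closed_form(height, log_wage)
a_iter, b_iter = fit_iteratively(height, log_wage)
print("analytical:", round(a_exact, 4), round(b_exact, 4))
print("iterative: ", round(a_iter, 4), round(b_iter, 4))
```

Both routes should arrive at practically the same intercept and slope; the iterative route is the one that carries over to models for which no analytical solution exists.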

Just like linear regression models, the parameters of many cognitive models can be estimated by fitting these models to data. What makes cognitive models interesting is that their parameters quantify assumed unobservable (latent) cognitive processes. That is, the parameter values usually have psychologically meaningful interpretations. I will provide some examples after some further discussion of the advantages of expressing psychological theories in mathematical terms.

What are mathematical models of cognition good for?

Expressing a theory about cognitive processes mathematically has at least three advantages. First, translating a verbal theory into a set of formulas requires specification and explicates underlying assumptions. Second, mathematical models yield specific predictions that can inform experimental tests of theories and can be used to assess a model’s complexity. Third, if we accept the assumptions of a given model, we can use the model to decompose participant responses to focus on the psychological processes of interest.

In their introductory book on computational modeling, Lewandowsky & Farrell (2011) illustrate the benefit of explicating assumptions mathematically. They attempt to translate a component process of Baddeley’s theory of working memory (Baddeley, 1986), namely the phonological loop, into a mathematical model. In the process they track the decisions about technicalities that are necessary for the implementation of the model’s mechanisms, such as the decay function or the decay rate. Lewandowsky & Farrell (2011) illustrate that there are at least 144 mathematical models of the phonological loop and conclude that a “verbal theory of working memory actually constitutes an entire family of possible models” (Lewandowsky & Farrell, 2011, p. 39). This example clearly shows that verbal descriptions of theories are ambiguous.

The uncertainties about the specifics of a model that result in 144 candidate implementations of the theory entail uncertainty about the model’s predictions. A fully specified model allows the researcher to derive specific predictions for an experimental setup before she collects the data. These specific predictions are an important benefit to mathematical modeling.

Exploration of specific model predictions can inform the design of experiments to pit competing cognitive models against one another. Cognitive models can best be compared in conditions for which the models make diverging predictions. When such diverging predictions have been identified the researcher can explore the models’ parameter settings that yield the largest disagreement between the models. Based on this exploration the researcher can design an experiment that constitutes a maximally informative comparison between the models. This approach can even be implemented in a continuous manner while the data are being collected (Cavagnaro, Myung, Pitt, & Kujala, 2009; Myung & Pitt, 2009; Myung, Cavagnaro, & Pitt, 2013). Here, on every trial the stimulus for which the models make the most diverging predictions (the response to which will be most informative) is presented. Conversely, the researcher may learn that the models make very similar predictions for the planned experiment. In this case the study would not provide a strong test between the models, is unlikely to be informative, and should be revised.

Exploration of model predictions, moreover, reveals a model’s complexity, that is, the range of observations a model can explain. As discussed above, researchers should prefer simple explanations and thus model complexity should be penalized when researchers compare model predictions. This is difficult when a theory is expressed in words. For mathematical models, a variety of approaches to penalize model complexity in a principled manner are available (Myung & Pitt, 1997; Myung, Navarro, & Pitt, 2006; Pitt, Myung, & Zhang, 2002). Such statistical model comparisons instantiate comparisons of competing psychological theories.

Exploration of model predictions can also build an intuition as to how a model works and what “makes it tick”. That is, it reveals which crucial assumptions allow the model to describe a specific pattern of results and which are responsible for empirically unsupported predictions. Such detailed understanding of the model mechanics facilitates model revision and theory development.

Finally, by fitting a cognitive model to data researchers can decompose the observed responses into the assumed cognitive processes. If the model assumptions are sensible, the parameter estimates constitute a more direct measure of the cognitive process of interest than the observed variable. In this sense, the model acts as a measurement model separating processes that researchers may be interested in from nuisance processes (measurement noise). This point will become clear when I introduce the example application in the next section.

How are predictions derived?

To illustrate some of the benefits of mathematically modeling cognitive processes I draw on an example from research in episodic long-term recognition memory. Here researchers try to understand how we judge whether we have seen something before, that is, whether we perceive something to be ‘old’. A fundamental issue in the theoretical debate concerns the nature of the information that we base such judgments on.

Latent-strength theories of recognition memory postulate that retrieval from memory yields a mnemonic signal of varying strength (e.g., Eich, 1982; Hintzman, 1984; Kahana & Sekuler, 2002; Murdock, 1993; Nosofsky & Palmeri, 2014; Shiffrin & Steyvers, 1997). This unobservable signal is assumed to be what we experience as familiarity. Things that elicit a strong signal feel familiar; things that feel familiar are likely judged to be old. It is assumed that the memory system always produces a signal of continuously varying strength. Conversely, our judgments should always be informed by a memory signal; hence, there is no such thing as random guessing.

Discrete-state theories posit that memories are either retrieved or they are not—there are no intermediate states or nuanced mnemonic strength signals that factor into the decision process (e.g., Blackwell, 1963; Bröder & Schütz, 2009; Luce, 1963). If a memory is not retrieved it is assumed that we guess randomly.

It is not obvious from these verbal descriptions how to pit these theories against one another. Fortunately, both theoretical positions have been implemented in a variety of mathematical models. For this example I will consider two variants of the well-known signal detection theory (Macmillan & Creelman, 2005; Swets, Tanner, & Birdsall, 1961) to stand in for the latent-strength perspective and the high-threshold model (Blackwell, 1963) will represent the discrete-state perspective. I will introduce the latter model first.

The high-threshold model

Figure 2: Schematic depiction of the latent states in recognition memory decisions as assumed by the high-threshold model.

The high-threshold model (HTM; Figure 2; Blackwell, 1963) assumes that when participants judge whether they have seen something before they attempt to retrieve a memory of that thing. If the thing has indeed been previously encountered, the retrieval of the corresponding memory succeeds with some probability \(p\). The model does not specify how this retrieval process proceeds. When no memory is retrieved the participant is in a state of ignorance: no information is available that could sway the judgment one way or the other. Hence, the only way to make a judgment is to guess ‘old’ with probability \(b\) or ‘new’ with probability \(1 - b\). In cases where participants are asked about something they have not encountered before, the probability of retrieving a corresponding memory is assumed to be \(p = 0\), so participants always guess. Because memory retrieval and guessing are assumed to be independent processes, the rates of ‘old’ responses can be calculated as conditional probabilities,

\[
\begin{aligned}
\text{Hits} &= p(\text{‘Old’}|\text{Old}) = p + (1-p) \times b, \\
\text{False alarms} &= p(\text{‘Old’}|\text{New}) = b.
\end{aligned}
\]

If we are willing to accept the assumptions of HTM as a reasonably accurate description of the cognitive processes involved in old-new recognition we can use this model to isolate memory performance from guessing. As both memory retrieval and guessing factor into the correct recognition of previously encountered things, the rate of ‘old’ responses to old probes (also known as the hit rate) is a crude measure of memory performance. Observed changes in hit rates can result from changes in memory performance or changes in guessing behavior. However, by rearranging the above formula we can subtract out the ‘old’ responses that are due to guessing. This gives us an estimate of the probability of successful memory retrieval \(\hat p\), a more direct measure of memory performance,

\[\hat p = \frac{\text{Hits} - \text{False alarms}}{1 - \text{False alarms}}.\]

In this sense, HTM can be interpreted as a measurement model, a theory of the origin and effects of measurement error in old-new recognition. There are models with more assumptions that attempt to specify how the retrieval of memories proceeds and why it may fail. As such models specify larger portions of the involved cognitive processes they are also referred to as process models. I will not cover process models in this blog post.
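The HTM equations and the guessing correction can be checked with a small sketch. The function names and the parameter values (\(p = 0.6\), two guessing probabilities) are hypothetical and chosen only for illustration.

```python
def htm_rates(p, b):
    """Predicted hit and false alarm rates under the high-threshold model."""
    hits = p + (1 - p) * b   # retrieve, or fail to retrieve and guess 'old'
    false_alarms = b         # new probes can only be guessed 'old'
    return hits, false_alarms

def htm_estimate_p(hits, false_alarms):
    """Retrieval probability after subtracting out guessed 'old' responses."""
    return (hits - false_alarms) / (1 - false_alarms)

# Same memory performance, different guessing behavior (hypothetical values).
conservative = htm_rates(p=0.6, b=0.2)
liberal = htm_rates(p=0.6, b=0.7)

print("hit rates:", conservative[0], liberal[0])
print("estimates of p:", htm_estimate_p(*conservative), htm_estimate_p(*liberal))
```

The two hit rates differ although memory performance is identical, whereas the corrected estimate recovers the same retrieval probability in both cases (up to floating point error).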

Signal detection theory

Figure 3: Schematic depiction of the latent mnemonic strength distributions for old and new probes in recognition memory judgments as assumed by equal- and unequal-variance signal detection theory.

The assumptions of signal detection theory (SDT; Figure 3; Swets et al., 1961) are slightly more involved. It is assumed that every memory probe elicits a mnemonic strength signal. Things that have previously been encountered elicit, on average, stronger signals than things that are new. If the mnemonic signal strength surpasses a response threshold \(c\) the participant endorses the probe as ‘old’. This threshold is an index of response bias and indicates how easily a person is convinced that they have encountered something before. However, the strength of the mnemonic signal for old and new memory probes is not fixed; it is assumed to be normally distributed. As a consequence, some new memory probes elicit a stronger signal than some old probes. Assuming variability in the mnemonic signal is not only plausible but also necessary. If the model assumed fixed signal strengths for either old or new probes it would predict that either all or none of the respective probes would be judged as old, depending on the location of the response threshold. It follows from these assumptions that the rate of old responses can be calculated as the area under the curve of the respective normal distributions above the threshold \(c\),

\[
\begin{aligned}
\text{Hits} &= p(\text{‘Old’}|\text{Old}) = \Phi\left(\frac{\mu_{Old} - c}{\sigma_{Old}}\right), \\
\text{False alarms} &= p(\text{‘Old’}|\text{New}) = \Phi\left(\frac{\mu_{New} - c}{\sigma_{New}}\right),
\end{aligned}
\]
where \(\Phi\) is the cumulative distribution function of the normal distribution. \(\mu_{Old}\) and \(\mu_{New}\) are the mean mnemonic strengths for old and new probes, \(\sigma_{Old}\) and \(\sigma_{New}\) are the standard deviations of the strength distributions.
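To make the link between the formulas and the area above the threshold concrete, here is a minimal sketch using `statistics.NormalDist` from the Python standard library. The parameter values are hypothetical; the check compares the \(\Phi\) expressions with the upper-tail area of each strength distribution computed directly.

```python
from statistics import NormalDist

def sdt_rates(c, mu_old, mu_new=0.0, sigma_old=1.0, sigma_new=1.0):
    """Hit and false alarm rates from the standard normal CDF (Phi)."""
    phi = NormalDist().cdf
    hits = phi((mu_old - c) / sigma_old)
    false_alarms = phi((mu_new - c) / sigma_new)
    return hits, false_alarms

def upper_tail(mu, sigma, c):
    """Area of a normal strength distribution above the threshold c."""
    return 1 - NormalDist(mu, sigma).cdf(c)

# Hypothetical parameter values for illustration.
c, mu_old, sigma_old = 0.5, 1.5, 1.25
hits, false_alarms = sdt_rates(c, mu_old, sigma_old=sigma_old)

print(round(hits, 4), round(upper_tail(mu_old, sigma_old, c), 4))
print(round(false_alarms, 4), round(upper_tail(0.0, 1.0, c), 4))
```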

In classic equal-variance signal detection theory (EVSDT) the standard deviations of the two distributions, \(\sigma_{Old}\) and \(\sigma_{New}\), are assumed to be equal. Unequal-variance signal detection theory (UVSDT) is more complex in that it is assumed that \(\sigma_{Old}\) can be greater than \(\sigma_{New}\).

The distance between the two distributions \(d_a\), that is, the average difference in mnemonic strength between old and new memory probes, is an index of discriminability or sensitivity and, thus, of memory performance,

\[d_a = \frac{\mu_{Old} - \mu_{New}}{\sqrt{0.5(\sigma_{Old}^2 + \sigma_{New}^2)}}.\]

In EVSDT, sensitivity is typically denoted as \(d'\). Without loss of generality it is assumed that \(\sigma_{Old}^2 = \sigma_{New}^2 = 1\). The value 1 is an arbitrary choice; the variances could, in principle, be fixed to other values without changing the model.
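As a quick consistency check, substituting unit variances into the definition of \(d_a\) shows that it reduces to the plain difference between the two means, which is the quantity that \(d'\) denotes in EVSDT:

\[
d_a = \frac{\mu_{Old} - \mu_{New}}{\sqrt{0.5(1 + 1)}} = \mu_{Old} - \mu_{New} = d'.
\]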

Again, if we are willing to accept the assumptions of SDT as a reasonably accurate description of the cognitive processes involved in old-new recognition, we can use this model to isolate memory performance from response bias. In the case of EVSDT, sensitivity \(d'\) and response threshold \(c\) can easily be calculated from the observed rates of old responses,

\[
\begin{aligned}
\hat{d'} &= \Phi^{-1}(\text{Hits}) - \Phi^{-1}(\text{False alarms}), \\
\hat c &= -\frac{\Phi^{-1}(\text{Hits}) + \Phi^{-1}(\text{False alarms})}{2},
\end{aligned}
\]
where \(\Phi^{-1}\) is the inverse cumulative distribution function of the standard normal distribution, also known as probit transformation or \(z\) scores.
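These two estimates are straightforward to compute with the inverse CDF from Python's standard library. The sketch below is illustrative: the rates are hypothetical, and the round trip assumes the common parameterization in which the threshold is measured from the midpoint between the two distributions (means at \(-d'/2\) and \(+d'/2\)), which is the convention the formula for \(\hat c\) implies.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def evsdt_estimates(hits, false_alarms):
    """Sensitivity d' and threshold c from observed rates (probit scale)."""
    z_hits = STD_NORMAL.inv_cdf(hits)
    z_fas = STD_NORMAL.inv_cdf(false_alarms)
    return z_hits - z_fas, -(z_hits + z_fas) / 2

def evsdt_rates(d_prime, c):
    """Assumed parameterization: means at -d'/2 and +d'/2, unit variances."""
    hits = STD_NORMAL.cdf(d_prime / 2 - c)
    false_alarms = STD_NORMAL.cdf(-d_prime / 2 - c)
    return hits, false_alarms

# Hypothetical observed rates.
d_hat, c_hat = evsdt_estimates(hits=0.8, false_alarms=0.2)
print(round(d_hat, 3), round(c_hat, 3))

# Round trip: rates generated from known parameters are recovered.
d_back, c_back = evsdt_estimates(*evsdt_rates(d_prime=1.2, c=0.3))
print(round(d_back, 3), round(c_back, 3))
```

Note that `inv_cdf` raises an error for rates of exactly 0 or 1, so such rates need to be handled before the transformation is applied.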

Comparison of predictions

The mathematical expression of the three models can be used to derive specific predictions about the relationship between hits and false alarms. Consider the HTM. We can substitute false alarms for \(b\) and predict hits from false alarms,

\[\text{Hits} = p + (1-p) \times \text{False alarms}.\]
The resulting equation takes the same form as the linear regression function \(\hat y_i = a + b \times x_i\) discussed above, with intercept \(p\) and slope \(1 - p\) (here \(a\) and \(b\) denote the regression parameters, not the guessing probability). Hence, HTM predicts a linear relationship between hits and false alarms. Intercept and slope of the linear relationship are determined by the probability of retrieving a memory (Figure 4). Moreover, intercept and slope are inversely related: As the intercept increases, the slope decreases.

The predicted linear relationship between hits and false alarms can be tested experimentally. Under conditions where the probability of retrieving a memory \(p\) can be assumed to be constant, manipulations that affect the probability of guessing ‘old’ \(b\) should yield a linear relationship between hits and false alarms; a nonlinear relationship would contradict HTM.

Figure 4: Predicted relationship between hits and false alarms according to the high-threshold model (HTM), equal-variance (EVSDT), and unequal-variance signal detection theory (UVSDT). The predictions for UVSDT assume a constant sensitivity of \(d_a = 2.00\) to illustrate the model’s additional flexibility relative to EVSDT. When \(\sigma_{\mathrm{Old}} = 1.00\), UVSDT and EVSDT make identical predictions. The dotted lines indicate chance performance.

Predictions can similarly be derived for EVSDT (Figure 4). Inspection of the predicted relationships reveals that HTM and EVSDT make distinct predictions. EVSDT predicts a curved relationship between hits and false alarms where the curvature increases with the strength of the memory signal for old probes, that is, the sensitivity \(d'\). Again, this constitutes an experimentally testable prediction. A comparison of the predictions of HTM and EVSDT further suggests that a paradigm that yields a medium probability of retrieving a memory or a discriminability of around \(d' = 1.5\) would be most informative for the model comparison: the line and the curved function are distinguishable in the medium ranges of hits and false alarms.

Finally, the predictions of UVSDT illustrate the effect of assuming increased variability in the mnemonic strength distribution of old probes, Figure 4. The relationship between hits and false alarms becomes more linear in the medium and high range of false alarms. Moreover, the predictions illustrate the increased complexity of the model. When the variability in the mnemonic signal for old probes equals that of new probes UVSDT mimics EVSDT—both models make identical predictions. When the variability for old probes is large and the response threshold is low the model can predict false alarm rates that are higher than the hit rates. This observation would contradict both HTM and EVSDT.
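The two properties of UVSDT described above can be reproduced with a short sketch. It assumes, for illustration only, that the distribution for new probes is fixed at \(\mu_{New} = 0\) and \(\sigma_{New} = 1\), so that the threshold follows from the false alarm rate as \(c = -\Phi^{-1}(\text{False alarms})\); the function names and the value \(\sigma_{Old} = 3\) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def evsdt_hits(false_alarms, d_prime):
    """EVSDT hit rate implied by a false alarm rate and sensitivity d'."""
    return STD_NORMAL.cdf(d_prime + STD_NORMAL.inv_cdf(false_alarms))

def uvsdt_hits(false_alarms, d_a, sigma_old):
    """UVSDT hit rate, assuming mu_new = 0 and sigma_new = 1."""
    c = -STD_NORMAL.inv_cdf(false_alarms)
    # Solve the definition of d_a for mu_old.
    mu_old = d_a * sqrt(0.5 * (sigma_old ** 2 + 1))
    return STD_NORMAL.cdf((mu_old - c) / sigma_old)

# With equal variances UVSDT mimics EVSDT.
print(uvsdt_hits(0.3, d_a=2.0, sigma_old=1.0), evsdt_hits(0.3, d_prime=2.0))

# With a wide old-probe distribution and a very low threshold,
# the predicted hit rate falls below the false alarm rate.
print(uvsdt_hits(0.999, d_a=2.0, sigma_old=3.0))
```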

How can the predictions be empirically tested?

As previously discussed, HTM and SDT can be used to decompose participants’ responses and isolate memory processes from guessing or response criteria. However, decomposition rests on the assumption that the measurement model provides a reasonably accurate description of the processes involved in recognition memory. If the assumptions of the model are violated the results of the decomposition may be misleading: indices of memory performance may in part reflect processes unrelated to memory retrieval. This poses a problem: The cognitive processes involved in recognition memory cannot be observed. We can, however, compare the diverging model predictions to observed data. The model that provides the best description of the observed data, given its complexity, would be considered to provide the least implausible characterization of the latent processes. Such model comparisons do not prove that the favored model is the true model. Rather they indicate that the favored model is the least implausible. Given that it describes all relevant patterns in the data, it may provide a reasonably accurate description of the processes involved in recognition memory.

The predictions derived for HTM, EVSDT, and UVSDT suggest an experimental design to pit the models against one another. Consider the following hypothetical study inspired by Swets et al. (1961; cf. Kellen, Erdfelder, Malmberg, Dubé, & Criss, 2016). Four participants study a list of 150 words. They are instructed to memorize the words because they will be asked to remember them later. In the subsequent recognition test, another 150 new words are mixed with the study list. That is, the test list consists in equal parts of old and new memory probes. Participants receive compensation depending on their performance: They receive a bonus for every hit but a malus for every false alarm. The test list is randomly grouped into 10 sublists and the extent of the malus is varied across the sublists. Because the incentive manipulation is introduced in the test phase (all memory probes are studied as parts of the same list) we assume that it only affects processes unrelated to memory performance (i.e., guessing or response threshold). With constant memory performance HTM predicts a linear, EVSDT a symmetric curved, and UVSDT an asymmetric curved relationship between hits and false alarms.

Figure 5: Scatter plot of hits and false alarms for the hypothetical experiment. Lines indicate the best description of the data from high-threshold model, equal-variance, and unequal-variance signal detection theory. The dotted lines indicate chance performance. The data used here are simulated.

The results of the hypothetical study along with the best descriptions from each model are shown in Figure 5. Visual inspection of the plots suggests that the linear function predicted by HTM may be a decent characterization of Participant 1’s responses. However, one condition with few false alarms and hits deviates from the linear prediction and is captured much better by the SDT models. The responses by Participant 3 appear to be best described by UVSDT. Again, there is one condition with few false alarms and hits that deviates from the linear prediction. Moreover, in another condition there are more false alarms than hits, a result that only UVSDT can explain. But are the observed deviations extreme enough to support one model over the other?

Firm conclusions require statistical model comparisons. For this example I will use two information criteria, AIC\(_c\) and BIC, that quantify the models’ predictive accuracy and penalize them for their respective complexity (see Aho, Derryberry, & Peterson, 2014 for an overview), albeit crudely. BIC penalizes model complexity more strongly than AIC\(_c\). In both cases lower values indicate better model fits. Both information criteria can be used to calculate model weights (\(w\)AIC\(_c\) and \(w\)BIC) that indicate the probability that a given model is the best model among the tested set (Wagenmakers & Farrell, 2004).
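The standard definitions of both criteria and of the model weights can be sketched as follows. The log-likelihoods, parameter counts, and number of observations are hypothetical values chosen for illustration; they are not taken from the analysis reported in this post.

```python
from math import exp, log

def aicc(log_lik, k, n):
    """AIC with small-sample correction (k parameters, n observations)."""
    return -2 * log_lik + 2 * k + (2 * k * (k + 1)) / (n - k - 1)

def bic(log_lik, k, n):
    """Bayesian (Schwarz) information criterion."""
    return -2 * log_lik + k * log(n)

def model_weights(criteria):
    """Akaike or Schwarz weights from a dict of criterion values."""
    best = min(criteria.values())
    relative = {m: exp(-0.5 * (v - best)) for m, v in criteria.items()}
    total = sum(relative.values())
    return {m: r / total for m, r in relative.items()}

# Hypothetical fits: (maximized log-likelihood, number of free parameters).
fits = {"HTM": (-160.0, 11), "EVSDT": (-158.5, 11), "UVSDT": (-156.0, 12)}
n_obs = 300  # hypothetical number of observations

aicc_values = {m: aicc(ll, k, n_obs) for m, (ll, k) in fits.items()}
bic_values = {m: bic(ll, k, n_obs) for m, (ll, k) in fits.items()}
w_aicc = model_weights(aicc_values)
w_bic = model_weights(bic_values)

for m in fits:
    print(m, round(aicc_values[m], 2), round(w_aicc[m], 3),
          round(bic_values[m], 2), round(w_bic[m], 3))
```

With these made-up numbers AIC\(_c\) prefers UVSDT, whereas BIC, with its stronger penalty for the extra parameter, prefers EVSDT.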

In the context of nonlinear cognitive models, such as the three models under consideration here, it has been shown that aggregating responses across participants can bias parameter estimates and lead to incorrect conclusions (e.g., Curran & Hintzman, 1995; Estes, 1956). Hence, it is not appropriate to analyse all responses jointly as if they originated from a single participant. Instead, if enough data are available, the models can be compared individually for each participant (see Lewandowsky & Farrell, 2011) or jointly using advanced hierarchical modeling techniques (e.g., Rouder & Lu, 2005). For simplicity, I fit the models to each participant’s responses individually.
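A toy calculation illustrates why aggregation can mislead. The sketch assumes two hypothetical participants who both follow EVSDT with an unbiased threshold but differ in sensitivity; averaging their hit and false alarm rates and then estimating \(d'\) does not recover the average of their individual sensitivities.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def evsdt_rates(d_prime, c=0.0):
    """Assumed parameterization: means at -d'/2 and +d'/2, unit variances."""
    hits = STD_NORMAL.cdf(d_prime / 2 - c)
    false_alarms = STD_NORMAL.cdf(-d_prime / 2 - c)
    return hits, false_alarms

def d_prime_from(hits, false_alarms):
    return STD_NORMAL.inv_cdf(hits) - STD_NORMAL.inv_cdf(false_alarms)

# Two hypothetical participants with different sensitivities.
true_d = [0.5, 2.5]
individual = [evsdt_rates(d) for d in true_d]

# Aggregate first, estimate second.
mean_hits = sum(h for h, _ in individual) / len(individual)
mean_fas = sum(f for _, f in individual) / len(individual)
d_aggregated = d_prime_from(mean_hits, mean_fas)

# Estimate first, aggregate second.
d_mean = sum(d_prime_from(h, f) for h, f in individual) / len(individual)

print(round(d_aggregated, 3), round(d_mean, 3))
```

The estimate from the aggregated rates is noticeably smaller than the mean of the individual estimates because the probit transformation is nonlinear.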

Figure 6 illustrates the results of the statistical model comparison. The AIC\(_c\) analysis indicates that UVSDT provides the best description of the responses of Participants 2, 3, and 4, whereas HTM provides the best description of Participant 1’s responses; in each case the named model has the lowest AIC\(_c\) value. The results of the BIC analysis are similar but the simpler models fare better due to the added penalty for the extra variance parameter in UVSDT. For example, in the case of Participant 2 BIC indicates that EVSDT is the best model. The extent to which each model is to be preferred is best reflected in the model weights.

Figure 6: Heat maps of by-participant model comparisons based on Akaike Information Criterion differences (\(\Delta\)AIC\(_c\)), Akaike weights (\(w\)AIC\(_c\)), Bayesian Information Criterion differences (\(\Delta\)BIC), and Schwarz weights (\(w\)BIC).

Beyond the comparison of the individual models, model weights can be combined to jointly compare the latent strength models to the discrete state model, e.g.,

\[\frac{w\text{AIC}_c^{(\text{HTM})}}{w\text{AIC}_c^{(\text{EVSDT})} + w\text{AIC}_c^{(\text{UVSDT})}}.\]

The joint model comparison provides a direct test of the research question while taking into account the uncertainty about the implementation of the latent strength hypothesis. According to the AIC\(_c\), the discrete-state model is favored 1.82-to-1 for Participant 1, which is barely informative. The latent strength models are favored 2,440.99-, 11.98-, and 18.48-to-1 for Participants 2, 3, and 4, respectively. According to the BIC, the discrete-state model is favored 9.64-to-1 for Participant 1, whereas the latent strength models are favored 746.36-, 2.05-, and 3.16-to-1 for Participants 2, 3, and 4, respectively.
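The ratio of combined weights is simple to compute once the weights are available. The sketch below uses hypothetical weights for a single participant, not the values underlying the ratios reported above.

```python
def discrete_vs_latent(weights):
    """Ratio of the HTM weight to the summed weights of the SDT models."""
    return weights["HTM"] / (weights["EVSDT"] + weights["UVSDT"])

# Hypothetical model weights for one participant (they sum to 1).
weights = {"HTM": 0.05, "EVSDT": 0.25, "UVSDT": 0.70}

ratio = discrete_vs_latent(weights)
print("discrete-state vs. latent strength:", round(ratio, 3))
print("latent strength vs. discrete-state:", round(1 / ratio, 1))
```

A ratio below 1 favors the latent strength models; taking the reciprocal expresses the same evidence as odds in their favor.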

To conclude, the results are somewhat contingent on the employed information criterion but indicate that overall the latent strength models tested here may provide a better description of the observed data.

Where can I learn more?

I hope this blog post has illustrated how theoretical positions can be expressed in mathematical terms and how mathematical models of cognition can help to test and compare psychological theories. If you want to learn more, I highly recommend the book by Lewandowsky & Farrell (2011) for a general introduction and the book by Lee & Wagenmakers (2014) for a detailed introduction to Bayesian estimation techniques for cognitive models, which I haven’t covered here. Also, I would like to encourage anyone to post further suggestions for introductory materials in the comments.


Aho, K., Derryberry, D., & Peterson, T. (2014). Model selection for ecologists: The worldviews of AIC and BIC. Ecology, 95(3), 631–636. doi:10.1890/13-1452.1

Baddeley, A. (1986). Working Memory. Oxford: Oxford University Press.

Blackwell, H. R. (1963). Neural Theories of Simple Visual Discriminations. Journal of the Optical Society of America, 53(1), 129–160. doi:10.1364/JOSA.53.000129

Bröder, A., & Schütz, J. (2009). Recognition ROCs are curvilinear – Or are they? On premature arguments against the two-high-threshold model of recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35(3), 587–606. doi:10.1037/a0015279

Cavagnaro, D. R., Myung, J. I., Pitt, M. A., & Kujala, J. V. (2009). Adaptive Design Optimization: A Mutual Information-Based Approach to Model Discrimination in Cognitive Science. Neural Computation, 22(4), 887–905. doi:10.1162/neco.2009.02-09-959

Curran, T., & Hintzman, D. L. (1995). Violations of the independence assumption in process dissociation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21(3), 531–547. doi:10.1037/0278-7393.21.3.531

Eich, J. M. (1982). A composite holographic associative recall model. Psychological Review, 89(6), 627–661. doi:10.1037/0033-295X.89.6.627

Estes, W. K. (1956). The problem of inference from curves based on group data. Psychological Bulletin, 53(2), 134–140. doi:10.1037/h0045156

Hintzman, D. L. (1984). MINERVA 2: A simulation model of human memory. Behavior Research Methods, Instruments, & Computers, 16(2), 96–101. doi:10.3758/BF03202365

Kahana, M. J., & Sekuler, R. (2002). Recognizing spatial patterns: A noisy exemplar approach. Vision Research, 42(18), 2177–2192. doi:10.1016/S0042-6989(02)00118-9

Kellen, D., Erdfelder, E., Malmberg, K. J., Dubé, C., & Criss, A. H. (2016). The ignored alternative: An application of Luce’s low-threshold model to recognition memory. Journal of Mathematical Psychology, 75, 86–95. doi:10.1016/j.jmp.2016.03.001

Kortt, M., & Leigh, A. (2010). Does size matter in Australia? Economic Record, 86(272), 71–83. doi:10.1111/j.1475-4932.2009.00566.x

Lee, M. D., & Wagenmakers, E.-J. (2014). Bayesian Cognitive Modeling: A Practical Course. Cambridge, NY: Cambridge University Press.

Lewandowsky, S., & Farrell, S. (2011). Computational Modeling in Cognition: Principles and Practice. Thousand Oaks, CA, US: SAGE.

Luce, R. D. (1963). A Threshold Theory for Simple Detection Experiments. Psychological Review, 70(1), 61–79. doi:10.1037/h0039723

Macmillan, N. A., & Creelman, D. C. (2005). Detection theory: A user’s guide (2nd ed.). Mahwah, NJ, US: Lawrence Erlbaum Associates Publishers.

Mandler, G. (1980). Recognizing: The judgment of previous occurrence. Psychological Review, 87(3), 252–271. doi:10.1037/0033-295X.87.3.252

Murdock, B. B. (1993). TODAM2: A model for the storage and retrieval of item, associative, and serial-order information. Psychological Review, 100(2), 183–203. doi:10.1037/0033-295X.100.2.183

Myung, J. I., & Pitt, M. A. (1997). Applying Occam’s razor in modeling cognition: A Bayesian approach. Psychonomic Bulletin & Review, 4(1), 79–95. doi:10.3758/BF03210778

Myung, J. I., & Pitt, M. A. (2009). Optimal experimental design for model discrimination. Psychological Review, 116(3), 499–518. doi:10.1037/a0016104

Myung, J. I., Cavagnaro, D. R., & Pitt, M. A. (2013). A tutorial on adaptive design optimization. Journal of Mathematical Psychology, 57(3), 53–67. doi:10.1016/j.jmp.2013.05.005

Myung, J. I., Navarro, D. J., & Pitt, M. A. (2006). Model selection by normalized maximum likelihood. Journal of Mathematical Psychology, 50(2), 167–179. doi:10.1016/j.jmp.2005.06.008

Nosofsky, R. M., & Palmeri, T. J. (2014). An exemplar-based random-walk model of categorization and recognition. In J. Busemeyer, J. Townsend, Z. Wang, & A. Eidels (Eds.), Mathematical and Computational Models of Cognition. Oxford University Press. Retrieved from http://catlab.psy.vanderbilt.edu/wp-content/uploads/NP-Oxford2014.pdf

Palminteri, S., Wyart, V., & Koechlin, E. (2017). The importance of falsification in computational cognitive modeling. Trends in Cognitive Sciences, 21(6), 425–433. doi:10.1016/j.tics.2017.03.011

Pitt, M. A., Myung, J. I., & Zhang, S. (2002). Toward a method of selecting among computational models of cognition. Psychological Review, 109(3), 472–491. doi:10.1037/0033-295X.109.3.472

Roberts, S., & Pashler, H. (2000). How persuasive is a good fit? A comment on theory testing. Psychological Review, 107(2), 358. doi:10.1037/0033-295X.107.2.358

Rouder, J. N., & Lu, J. (2005). An introduction to Bayesian hierarchical models with an application in the theory of signal detection. Psychonomic Bulletin & Review, 12(4), 573–604. doi:10.3758/BF03196750

Shiffrin, R. M., & Steyvers, M. (1997). A model for recognition memory: REM—retrieving effectively from memory. Psychonomic Bulletin & Review, 4(2), 145–166. doi:10.3758/BF03209391

Swets, J. A., Tanner, W. P. J., & Birdsall, T. G. (1961). Decision processes in perception. Psychological Review, 68(5), 301–340. doi:10.1037/h0040547

Wagenmakers, E.-J., & Farrell, S. (2004). AIC model selection using Akaike weights. Psychonomic Bulletin & Review, 11(1), 192–196. doi:10.3758/BF03206482

Yonelinas, A. P. (2002). The nature of recollection and familiarity: A review of 30 years of research. Journal of Memory and Language, 46(3), 441–517. doi:10.1006/jmla.2002.2864

  1. The models presented in the blog post make rather abstract assumptions about the involved cognitive processes. Some mathematical models commit to more specific assumptions and mechanisms. These models are referred to as process models although the distinction between measurement models and process models is continuous rather than dichotomous.
  2. Fitting this model to data requires several additional assumptions, such as independent and identically distributed prediction errors, that I will pass over in the interest of brevity as they are irrelevant to the models’ predictions. Also, note that linear regression models can be extended to predict nonlinear relationships, for example, by adding exponentiated predictor terms such as \(c \times x_i^2\).
  3. Another long-standing debate revolves around whether our episodic long-term memory is a unitary storage or whether it consists of multiple qualitatively different memory systems (e.g., Mandler, 1980; Yonelinas, 2002). For simplicity we will ignore this debate and focus on theories that assume episodic long-term memory to be a unitary storage.
  4. Both information criteria quantify model complexity by counting the number of free parameters. In nonlinear cognitive models not all parameters of each model grant the same flexibility. Modern model comparison methods allow the researcher to quantify model complexity in a more principled manner (Cavagnaro et al., 2009; Myung & Pitt, 2009; Myung et al., 2013).
Frederik Aust

Frederik Aust is pursuing a PhD in cognitive psychology at the University of Cologne. He is interested in mathematical models of memory and cognition, open science, and R programming.


Introduction to Data Analysis using R

R is a statistical programming language whose popularity is quickly overtaking SPSS and other “traditional” point-and-click software packages (Muenchen, 2015). But why would anyone use a programming language, instead of point-and-click applications, for data analysis? An important reason is that data analysis rarely consists of simply running a statistical test. Instead, many small steps, such as cleaning and visualizing data, are usually repeated many times, and computers are much faster at doing repetitive tasks than humans are. Using a point-and-click interface for these “data cleaning” operations is laborious and unnecessarily slow:

“[T]he process of tidying my data took me around 10 minutes per participant as I would do it all manually through Excel. Even for a moderate sample size, this starts to take up a large chunk of time that could be spent doing other things like writing or having a beer” (Bartlett, 2016).

A programmed analysis would seamlessly apply the tidying steps to every participant in the blink of an eye, and would itself constitute an exact script of what operations were applied to the data, making it easier to repeat the steps later.

Learning to use a programming language for data analysis reduces human labor and saves time that could be better spent doing more important (or fun) things. In this post, I introduce the R programming language and motivate its use in psychological science. The introduction is aimed at students and researchers with no programming experience, but is suitable for anyone with an interest in learning the basics of R.

The R project for statistical computing

“R is a free software environment for statistical computing and graphics.” (R Core Team, 2016)

Great, but what does that mean? R is a programming language that is designed and used mainly in the statistics, data science, and scientific communities. R has “become the de-facto standard for writing statistical software among statisticians and has made substantial inroads in the social and behavioural sciences” (Fox, 2010). This means that if we use R, we’ll be in good company (and that company will likely be even better and more numerous in the future; see Muenchen, 2015).

To understand what R is, and is not, it may be helpful to begin by contrasting R to its most common alternative, SPSS. Many psychologists are familiar with SPSS, which has a graphical user interface (GUI), allowing the user to look at the two-dimensional data table on screen, and click through various drop-down menus to conduct analyses on the data. In contrast, R is an object oriented programming language. Data is loaded into R as a “variable”, meaning that in order to view it, the user has to print it on the screen. The power of this approach is that the data is an object in a programming environment, and only your imagination limits what functions you can apply to the data. R also has no GUI to navigate with the mouse; instead, users interact with the data by typing commands.

SPSS is expensive to use: Universities have to pay real money to make it available to students and researchers. R and its supporting applications, on the other hand, are completely free, meaning that both users and developers have easier access to it. R is also open source software, which helps explain why many cutting edge statistical methods are implemented in R more quickly than in SPSS. This is apparent, for example, in the recent rise of Bayesian methods for data analysis (e.g. Buerkner, 2016).

Further, SPSS’s facilities for cleaning, organizing, formatting, and transforming data are limited—and not very user friendly, although this is a subjective judgment—so users often resort to a spreadsheet program (Microsoft Excel, say) for data manipulation. R has excellent capacities for all steps in the analysis pipeline, including data manipulation, and therefore the analysis never has to spread across multiple applications. You can imagine how the possibility for mistakes, and time needed, is reduced when the data file(s) doesn’t need to be juggled between applications. Switching between applications, and repeatedly clicking through drop-down menus means that, for any small change, the human using the computer must re-do every step of the analysis. With R, you can simply re-use your analysis script and just import different data to it.


Figure 1. Two workflows for statistical discovery in the empirical sciences. “Analysis” consists of multiple operations, and is spread over multiple applications in Workflow 2, but not in Workflow 1. Therefore, “analysis” is more easily documented and repeated in Workflow 1. This fact alone may work to reduce mistakes in data analysis. The dashed line from R to Word Processor indicates an optional step: You can even write manuscripts with RStudio, going directly from R to Communicating Results.

These considerations lead to contrasting the two different workflows in Figure 1. Workflow 1 uses a programming language, such as R. It is harder to learn at first, but beginners generally get started with real analysis in an hour or so. The payoff for the initial difficulty is great: The workflow is reproducible (users can save scripts and show their friends exactly what they did to create those beautiful violin plots); the workflow is flexible (want to do everything just the way you did it, but do the plots for males instead of females? Easy!); and most importantly, repetitive, boring, but important work is delegated to a computer.

The final point requires some reflecting; after all, computer programs all work on computers, so it sounds like a tautology. But what I mean is that repetitive tasks can be wrapped in a simple function (these are usually already available—you don’t have to create your own functions) which then performs the tasks as many times as you would like to. Many tasks in the data cleaning stage, for example, are fairly boring and repetitive (calculating summary statistics, aggregating data, combining spreadsheets or columns across spreadsheets), but less so when one uses a programming language.

Workflow 2, on the other hand, is easy to learn because there are few well-defined and systematic parts to it—everything is improvised on a task-by-task basis and done manually by copy-pasting, pointing-and-clicking and dragging and dropping. “Clean and organize” the data in Excel. “Analyze” in SPSS. In the optimal case where the data is perfectly aligned to the format that SPSS expects, you can get a p-value in less than a minute (excluding SPSS start-up time, which is quickly approaching infinity) by clicking through the drop-down menus. That is truly great, if that is all you want. But that’s rarely all that we want, and data is rarely in SPSS’s required format.

Workflow 2 is not reproducible (that is, it may be very difficult if not impossible to exactly retrace your steps through an analysis), so although you may know roughly that you “did an ANOVA”, you may not remember which cases were included, what data was used, how it was transformed, etc. Workflow 2 is not flexible: You’ve just done a statistical test on data from Experiment 1? Great! Can you now do it for Experiment 2, but log-transform the RTs? Sure, but then you would have to restart from the Excel step, and redo all that pointing and clicking. This leads to Workflow 2 requiring the human to do too much work, and spend time on the analysis that could be better spent “doing other things like writing or having a beer” (Bartlett, 2016).

So, what is R? It is a programming language especially suited for data analysis. It allows you to program (more on this below!) your analyses instead of pointing and clicking through menus. The point here is not that you can’t do analysis with a point-and-click SPSS style software package. You can, and you can do a pretty damn good job with it. The point is that you can work less and be more productive if you’re willing to spend some initial time and effort learning Workflow 1 instead of the common Workflow 2. And that requires getting started with R.

Getting started with R: From 0 to R in 100 seconds

If you haven’t already, go ahead and download R, and start it up on your computer. Like most programming languages, R is best understood through its console—the interface that lets you interact with the language.

Figure 2. The R console.

After opening R, you should see a window similar to the one in Figure 2. The console allows us to type input, have R evaluate it, and return output, just like a fancy calculator. Here, our first input assigned (R uses the left arrow, <-, for assignment) all the integers from 0 to 100 to a variable called numbers. Computer code can often be read from right to left; the first line here would say “integers 0 through to 100, assign to numbers”. We then calculated the mean of those numbers by using R’s built-in function, mean(). Everything interesting in R is done by using functions: There are functions for drawing figures, transforming data, running statistical tests, and much, much more.
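
Typed out, the two inputs described above would look roughly like this (a sketch based on the description; the line that begins with “#>” is R’s output):

```r
# Assign the integers 0 through 100 to a variable called numbers
numbers <- 0:100
# Calculate the mean of those numbers
mean(numbers)
#> [1] 50
```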

Here’s another example. This time we’ll create some height data for kids and adults (in centimeters) and conduct a two-sample t-test (every line that begins with a “#>” is R’s output):
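
A minimal sketch of such a script. The kids’ heights are the ones that appear later in this post; the adults’ heights are made up for illustration, so the printed results (not shown here) will depend on them.

```r
# Heights in centimeters; the adults' values are hypothetical
kids <- c(100, 98, 89, 111, 101)
adults <- c(170, 182, 165, 177, 190)
# Two-sample t-test comparing the two groups
t.test(kids, adults)
```

By default, t.test() does not assume equal variances in the two groups (Welch’s t-test).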

That’s it, a t-test in R in a hundred seconds! Note, c() stands for “combine”, so kids is now a numeric vector (collection of numbers) with 5 elements. The t-test results are printed in R’s console, and are straightforward to interpret.

Save your analysis scripts

At its most basic, data analysis in R consists of importing data to R, and then running functions to visualize and model the data. R has powerful functions for covering the entire process going from Raw Data to Communicating Results (or Word Processor) in Figure 1. That is, users don’t need to switch between applications at various steps of the analysis workflow. Users simply type in code, let R evaluate it, and receive output. As you can imagine, a full analysis from raw data to a report (or table of summary statistics, or whatever your goal is) may involve lots of small steps—transforming variables in the data, plotting, calculating summaries, modeling and testing—which are often done iteratively. Recognizing that there may be many steps involved, we realize that we better save our work so that we can investigate and redo it later, if needed. Therefore for each analysis, we should create a text file containing all those steps, which could then be run repeatedly with minor tweaks, if required.

To create these text files, or “R scripts”, we need a text editor. All computers have a text editor pre-installed, but programming is often easier if you use an integrated development environment (IDE), which has a text editor and console all in one place (often with additional capacities.) The best IDE for R, by far, is RStudio. Go ahead and download RStudio, and then start it. At this point you can close the other R console on your computer, because RStudio has the console available for you.

Getting started with RStudio


Figure 3. The RStudio IDE to R.

Figure 3 shows the main view of RStudio. There are four rectangular panels, each with a different purpose. The bottom left panel is the R console. We can type input in the console (on the empty line that begins with a “>”) and hit return to execute the code and obtain output. But a more efficient approach is to type the code into a script file, using the text editor panel, known as the source panel, in the top left corner. Here, we have a t-test-kids-grownups.R script open, which consists of three lines of code. You can write this script on your own computer by going to File -> New File -> R Script in RStudio, and then typing in the code you see in Figure 3. You can execute each line by hitting Control + Return, on Windows computers, or Command + Return on OS X computers. Scripts like this constitute the exact documentation of what you did in your analysis, and as you can imagine, are pretty important.

The two other panels are for viewing things, not so much for interacting with the data. Top right is the Environment panel, showing the variables that you have saved in R. That is, when you assign something into a variable (kids <- c(100, 98, 89, 111, 101)), that variable (kids) is visible in the Environment panel, along with its type (num for numeric), size (1:5, for 5), and contents (100, 98, 89, 111, 101). Finally, bottom right is the Viewer panel, where we can view plots, browse files on the computer, and do various other things.

With this knowledge in mind, let’s begin with a couple of easy things. Don’t worry, we’ll get to actual data soon enough, once we have the absolute basics covered. I’ll show some code and evaluate it in R to show its output too. You can, and should, type in the commands yourself to help you understand what they do (type each line in an R script and execute the line by pressing Command + Return, or Control + Return on Windows. Save your work every now and then.)

Here’s how to create variables in R (try to figure out what’s saved in each variable):
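
For instance (a sketch; all names and values below are made up for illustration):

```r
# A single number
x <- 5
# The result of a calculation that uses another variable
y <- x * 2
# A vector of several numbers, created with c()
heights <- c(100, 98, 89, 111, 101)
# The output of a function: how many elements heights has
n_heights <- length(heights)
# Text and logical values can be stored in variables, too
name <- "R"
is_fun <- TRUE
```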

And here’s how to print those variables’ contents on the screen. (I’ll provide a comment for each line. Comments begin with a # and are not evaluated by R; that is, comments are read by humans only.)
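
A short sketch, again with made-up variables:

```r
# Two hypothetical variables to print
x <- 5
heights <- c(100, 98, 89, 111, 101)
# Typing a variable's name prints its contents
x
#> [1] 5
# The same works for vectors
heights
#> [1] 100  98  89 111 101
# print() does the same thing explicitly
print(x)
#> [1] 5
```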

Transforming data is easy: R automatically applies operations to every element of a vector (a variable containing multiple numbers), if needed. Let’s create z-scores of the kids’ heights.
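
One way to do this, as a sketch (the name kids_z is my own choice):

```r
kids <- c(100, 98, 89, 111, 101)
# Subtract the mean from every height, then divide by the standard deviation
kids_z <- (kids - mean(kids)) / sd(kids)
# Print the five z-scores
kids_z
```

The subtraction and division are applied to all five heights at once; there is no need to loop over them.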

I hope you followed along. You should now have a bunch of variables in your R Environment. If you typed all those lines into an R script, you can now execute them again, or modify them and then re-run the script, line-by-line. You can also execute the whole script at once by clicking “Run”, at the top of the screen. Congratulations, you’ve just programmed your first computer program!

User contributed packages

One of the best things about R is that it has a large user base, and lots of user contributed packages, which make using R easier. Packages are simply bundles of functions, and will enhance your R experience quite a bit. Whatever you want to do, there’s probably an R package for that. Here, we will install and load (make available in the current session) the tidyverse package (Wickham, 2016), which is designed for making tidying data easier.
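
In code, installing and loading a package usually looks like this (installation needs an internet connection):

```r
# Install the package from CRAN (needed only once per computer)
install.packages("tidyverse")
# Load the package (needed in every new R session)
library(tidyverse)
```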

It’s important that you use the tidyverse package if you want to follow along with this tutorial. All of the tasks covered here are possible without it, but the functions from tidyverse make the tasks easier, and certainly easier to learn.

Using R with data

Let’s import some data to R. We’ll use example data from Chapter 4 of the Intensive Longitudinal Methods book (Bolger & Laurenceau, 2013). The data set is freely available on the book’s website. If you would like to follow along, please download the data set, and place it in a folder (unpack the .zip file). Then, use RStudio’s Viewer panel, and its Files tab, to navigate to the directory on your computer that has the data set, and set it as the working directory by clicking “More”, then “Set As Working Directory”.


Figure 4. Setting the Working Directory

Setting the working directory properly is extremely important, because it’s the only way R knows where to look for files on your computer. If you try to load files that are not in the working directory, you need to use the full path to the file. But if your working directory is properly set, you can just use the filename. The file is called “time.csv”, and we load it into a variable called d using the read_csv() function. (csv stands for comma separated values, a common plain text format for storing data.) You’ll want to type all these functions to an R script, so create a new R script and make sure you are typing the commands in the Source panel, not the Console panel. If you set your working directory correctly, once you save the R script file, it will be saved in the directory right next to the “time.csv” file.
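
A sketch of the import step. So that it runs on its own, it first writes a tiny stand-in file (two made-up rows, with the column names described below) to a temporary folder; with the real file in your working directory, the import is the single call shown in the comment.

```r
library(readr)  # read_csv() comes from readr, which is part of the tidyverse

# Hypothetical stand-in for the downloaded file
example_file <- file.path(tempdir(), "time.csv")
writeLines(c("id,time,time01,intimacy,treatment",
             "1,0,0,2.5,0",
             "1,1,0.0667,3.1,0"), example_file)

# With the real data: d <- read_csv("time.csv")
d <- read_csv(example_file)
```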

d is now a data frame (sometimes called a “tibble”, because why not), whose rows are observations, and columns the variables associated with those observations.

This data contains simulated daily intimacy reports of 50 individuals, who reported their intimacy every evening, for 16 days. Half of these simulated participants were in a treatment group, and the other half in a control group. To print the first few rows of the data frame to screen, simply type its name:

The first column, id, is a variable that specifies which person each observation belongs to. int means that the data in this column are integers. time indicates the day of the observation, and the authors coded the first day as 0 (this will make intercepts in regression models easier to interpret.) time01 is just time, but recoded so that 1 is at the end of the study. dbl means that the values are floating point numbers. intimacy is the reported intimacy, and treatment indicates whether the person was in the control (0) or treatment (1) group. The first row of this output also tells us that there are 800 rows in total in this data set, and 5 variables (columns). Each row is also numbered in the output (leftmost “column”), but those values are not in the data.

Data types

It’s important to verify that your variables (columns) are imported into R in the appropriate format. For example, you would not like to import time recorded in days as a character vector, nor would you like to import a character vector (country names, for example) as a numeric variable. Almost always, R (more specifically, read_csv()) automatically uses correct formats, which you can verify by looking at the row between the column names and the values.

There are five basic data types: int for integers, num (or dbl) for floating point numbers (1.12345…), chr for characters (also known as “strings”), factor (sometimes abbreviated as fctr) for categorical variables that have character labels (factors can be ordered if required), and logical (abbreviated as logi) for logical variables: TRUE or FALSE. Here’s a little data frame that illustrates the basic variable types in action:
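
A sketch of such a data frame, built with base R; every name and value in it is made up for illustration:

```r
# One column per basic data type
people <- data.frame(
  id = c(1L, 2L, 3L),                                   # integer
  height = c(171.5, 180.2, 165.0),                      # floating point
  name = c("Ann", "Bob", "Cem"),                        # character
  group = factor(c("control", "treatment", "control")), # factor
  likes_matlab = c(TRUE, FALSE, NA),                    # logical, one value missing
  stringsAsFactors = FALSE                              # keep name as character
)
# str() lists every column together with its type
str(people)
```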

Here we are also introduced to a very special value, NA. NA means that there is no value, and we should always pay special attention to data that has NAs, because it may indicate that some important data is missing. This sample data explicitly tells us that we don’t know whether this person likes matlab or not, because the variable is NA. OK, let’s get back to the daily intimacy reports data.

Quick overview of data

We can now use the variables in the data frame d and compute summaries just as we did above with the kids’ and adults’ heights. A useful operation might be to ask for a quick summary of each variable (column) in the data set:

To get a single variable (column) from the data frame, we call it with the $ operator (“gimme”, for asking R to give you variables from within a data frame). To get all the intimacy values, we could just call d$intimacy. But we better not, because that would print out all 800 intimacy values into the console. We can pass those values to functions instead:
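
For example (a sketch that uses a small made-up stand-in for d, so the numbers are not those of the real data):

```r
# Hypothetical stand-in for d; the real data frame has 800 rows
d <- data.frame(
  id = rep(1:2, each = 3),
  time = rep(0:2, times = 2),
  intimacy = c(2, 3, 4, 1, 3, 5)
)
# Mean of all intimacy values
mean(d$intimacy)
#> [1] 3
# Smallest and largest intimacy value
range(d$intimacy)
#> [1] 1 5
```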

If you would like to see the first six values of a variable, you can use the head() function:

head() works on data frames as well, and you can use an optional number argument to specify how many first values you’d like to see returned:

A look at R’s functions

Generally, this is how R functions work: You name the function, and specify arguments to the function inside the parentheses. Some of these arguments may be data or other input (d, above), and some of them change what the function does and how (2, above). To know what arguments you can give to a function, you can just type the function’s name in the console with a question mark prepended to it:

Importantly, calling the help page reveals that functions’ arguments are named. That is, arguments are of the form X = Y, where X is the name of the argument, and Y is the value you would like to set it to. If you look at the help page of head() (?head), you’ll see that it takes two arguments, x which should be an object (like our data frame d, (if you don’t know what “object” means in this context, don’t worry—nobody does)), and n, which is the number of elements you’d like to see returned. You don’t always have to type in the X = Y part for every argument, because R can match the arguments based on their position (whether they are the first, second, etc. argument in the parentheses). We can confirm this by typing out the full form of the previous call head(d, 2), but this time, naming the arguments:
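
A sketch of the two equivalent calls, using a small made-up stand-in for d:

```r
# Hypothetical stand-in for d
d <- data.frame(
  id = rep(1:2, each = 3),
  time = rep(0:2, times = 2),
  intimacy = c(2, 3, 4, 1, 3, 5)
)
# Arguments matched by their position
head(d, 2)
# The same call with both arguments named
head(x = d, n = 2)
```

Both calls return the first two rows of d.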

Now that you know how R’s functions work, you can find out how to do almost anything by typing into a search engine: “How to do almost anything in R”. The internet (and books, of course) is full of helpful tutorials (see Resources section, below) but you will need to know these basics about functions in order to follow those tutorials.

Creating new variables

Creating new variables is also easy. Let’s create a new variable that is the square root of the reported intimacy (because why not), by using the sqrt() function and assigning the values to a new variable (column) within our data frame:
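
As a sketch, with a made-up stand-in for d (the column name sqrt_int is the one used later in this post):

```r
# Hypothetical stand-in for d
d <- data.frame(id = c(1, 1, 2, 2), intimacy = c(1, 4, 9, 16))
# Take the square root of every intimacy value and store it as a new column
d$sqrt_int <- sqrt(d$intimacy)
d$sqrt_int
#> [1] 1 2 3 4
```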

Recall that sqrt(d$intimacy) will take the square root of each of the 800 values in the vector of intimacy values, and return a vector of 800 square roots. There’s no need to do this individually for each value.

We can also create variables using conditional logic, which is useful for creating verbal labels for numeric variables, for example. Let’s create a verbal label for each of the treatment groups:
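
One way to do this is base R’s ifelse() function; the sketch below uses a made-up stand-in for d and may differ from how the original code was written:

```r
# Hypothetical stand-in for d
d <- data.frame(id = 1:4, treatment = c(0, 0, 1, 1))
# ifelse(test, value if TRUE, value if FALSE), applied to every row
d$Group <- ifelse(d$treatment == 0, "Control", "Treatment")
# Print the new labels
d$Group
```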

We created a new variable, Group in d, that is “Control” if the treatment variable on that row is 0, and “Treatment” otherwise.

Remember our discussion of data types above? d now contains integer, double, and character variables. Make sure you can identify these in the output, above.


Let’s focus on aggregating the data across individuals, and plotting the average time trends of intimacy, for the treatment and control groups.

In R, aggregating is easiest if you think of it as calculating summaries for “groups” in the data (and collapsing the data across other variables). “Groups” doesn’t refer to experimental groups (although it can), but instead any arbitrary groupings of your data based on variables in it, so the groups can be based on multiple things, like time points and individuals, or time points and experimental groups.

Here, our groups are the two treatment groups and 16 time points, and we would like to obtain the mean for each group at each time point by collapsing across individuals:
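
A sketch of this aggregation with the group_by() and summarize() functions from dplyr (part of the tidyverse). The data are a made-up stand-in for d with four people and two time points; d_groups is the name used for the result later in this post.

```r
library(dplyr)

# Hypothetical stand-in for d
d <- data.frame(
  id = rep(1:4, each = 2),
  time = rep(0:1, times = 4),
  Group = rep(c("Control", "Treatment"), each = 4),
  intimacy = c(1, 2, 3, 4, 2, 4, 4, 6)
)
# Group the data by experimental group and time point ...
d_groups <- group_by(d, Group, time)
# ... and calculate the mean intimacy within each of those groups
d_groups <- summarize(d_groups, intimacy = mean(intimacy))
d_groups
```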

The above code summarized our data frame d by calculating the mean intimacy for the groups specified by group_by(). We did this by first creating a data frame that is d, but is grouped on Group and time, and then summarizing those groups by taking the mean intimacy for each of them. This is what we got:

A mean intimacy value for both groups, at each time point.


We can now easily plot these data, for each individual, and each group. Let’s begin by plotting just the treatment and control groups’ mean intimacy ratings:
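
A sketch of the plotting code, following the description of Figure 5 below. The means in d_groups are made up here, so the resulting figure will not match the original.

```r
library(ggplot2)

# Hypothetical group means standing in for d_groups
d_groups <- data.frame(
  Group = rep(c("Control", "Treatment"), each = 3),
  time = rep(0:2, times = 2),
  intimacy = c(2.0, 2.1, 2.0, 2.0, 2.6, 3.1)
)
# Map time to x, intimacy to y and Group to color, then draw one line per group
p <- ggplot(d_groups, aes(x = time, y = intimacy, color = Group)) +
  geom_line()
# Printing the plot object displays the figure
p
```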


Figure 5. Example R plot, created with ggplot(), of two groups’ mean intimacy ratings across time.

For this plot, we used the ggplot() function, which takes as input a data frame (we used d_groups from above), and a set of aesthetic specifications (aes(), we mapped time to the x axis, intimacy to the y axis, and color to the different treatment Groups in the data). We then added a geometric object to display these data (geom_line() for a line.)

To illustrate how to add other geometric objects to display the data, let’s add some points to the graph:


Figure 6. Two groups’ mean intimacy ratings across time, with points.

We can easily do the same plot for every individual (a panel plot, but let’s drop the points for now):


Figure 7. Two groups’ mean intimacy ratings across time, plotted separately for each person.

The code is exactly the same, but now we used the non-aggregated raw data d, and added an extra function that wraps each id’s data into their own little subplot (facet_wrap(); remember, if you don’t know what a function does, look at the help page, i.e. ?facet_wrap). ggplot() is an extremely powerful function that allows you to do very complex and informative graphs with systematic, short and neat code. For example, we may add a linear trend (linear regression line) to each person’s panel. This time, let’s only look at the individuals in the experimental group, by using the filter() command (see below):


Figure 8. Treatment group’s mean intimacy ratings across time, plotted separately for each person, with linear trend lines.

Data manipulation

We already encountered an example of manipulating data, when we aggregated intimacy over some groups (experimental groups and time points). Other common operations are, for example, trimming the data based on some criteria. All operations that drop observations are conceptualized as subsetting, and can be done using the filter() command. Above, we filtered the data such that we plotted the data for the treatment group only. As another example, we can get the first week’s data (time is less than 7, that is, days 0-6), for the control group only, by specifying these logical operations in the filter() function:
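
As a sketch, with a made-up stand-in for d (the name d_week1 is my own choice):

```r
library(dplyr)

# Hypothetical stand-in for d: one control and one treatment person, days 0-7
d <- data.frame(
  id = rep(1:2, each = 8),
  time = rep(0:7, times = 2),
  Group = rep(c("Control", "Treatment"), each = 8)
)
# Keep only the rows where both conditions are TRUE
d_week1 <- filter(d, time < 7 & Group == "Control")
d_week1
```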

Try re-running the above line with small changes to the logical operations. Note that the two logical operations are combined with the AND operator (&); you can also use OR (|). Try to imagine what replacing AND with OR would do in the above line of code. Then try and see what it does.

A quick detour to details

At this point it is useful to remember that computers do exactly what you ask them to do, nothing less, nothing more. So for instance, pay attention to capital letters, symbols, and parentheses. The following three lines are faulty; try to figure out why:

Why does this data frame have zero rows?

Error? What’s the problem?

Error? What’s the problem?

(Answers: 1. Group is either “Control” or “Treatment”, not “control” or “treatment”. 2. Extra parenthesis at the end. 3. == is not the same as =, the double == is a logical comparison operator, asking if two things are the same, the single = is an assignment operator.)

Advanced data manipulation

Let’s move on. What if we’d like to detect extreme values? For example, let’s ask if there are people in the data who show extreme overall levels of intimacy (what if somebody feels too much intimacy!). How can we do that? Let’s start thinking like programmers and break every problem into the exact steps required to answer the problem:

  1. Calculate the mean intimacy for everybody
  2. Plot the mean intimacy values (because always, always visualize your data)
  3. Remove everybody whose mean intimacy is over 2 standard deviations above the overall mean intimacy (over-intimate people?) (note that this is a terrible exclusion criterion here, and done for illustration purposes only)

As before, we’ll group the data by person, and calculate the mean (which we’ll call int).

We now have everybody’s mean intimacy in a neat and tidy data frame. We could, for example, arrange the data such that we see the extreme values:

Nothing makes as much sense as a histogram:


Figure 9. Histogram of everybody’s mean intimacy ratings.

It doesn’t look like anyone’s mean intimacy value is “off the charts”. Finally, let’s apply our artificial exclusion criterion: Mark for exclusion everybody whose mean intimacy is more than 2 standard deviations above the overall mean:
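
A sketch using mutate() from dplyr. The per-person means are made up, and the name of the new column, exclude, follows the text below.

```r
library(dplyr)

# Hypothetical per-person means: nine typical people and one extreme person
d_grouped <- data.frame(id = 1:10, int = c(rep(2, 9), 5))
# exclude is TRUE if a person's mean is more than 2 SDs above the overall mean
d_grouped <- mutate(d_grouped, exclude = int > mean(int) + 2 * sd(int))
d_grouped
```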

Then we could proceed to exclude these participants (don’t do this with real data!), by first joining the d_grouped data frame, which has the exclusion information, with the full data frame d:

and then removing all rows where exclude is TRUE. We use the filter() command, and take only the rows where exclude is FALSE. So we want our logical operator for filtering rows to be “not-exclude”. “not”, in R language, is !:
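
Both steps, as a sketch with made-up stand-ins for d and d_grouped; left_join() is one dplyr function that could do the joining:

```r
library(dplyr)

# Hypothetical stand-ins: three people with two observations each
d <- data.frame(id = rep(1:3, each = 2), intimacy = c(2, 3, 2, 2, 6, 7))
d_grouped <- data.frame(id = 1:3, int = c(2.5, 2, 6.5),
                        exclude = c(FALSE, FALSE, TRUE))
# Add each person's mean and exclusion flag to every one of their rows
d <- left_join(d, d_grouped, by = "id")
# Keep only the rows where exclude is not TRUE
d2 <- filter(d, !exclude)
d2
```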

I saved the included people in a new data set called d2, because I don’t actually want to remove those people; I only wanted to illustrate how to do this. In some situations, we could also imagine applying the exclusion criterion to individual observations instead of individual participants. This would be as easy as (think why):
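A possible version of the observation-level exclusion is sketched below, with made-up data and an assumed column name intimacy. It works in a single step because d is not grouped, so mean() and sd() inside filter() are computed over all rows at once.

```r
library(dplyr)

# Hypothetical observations; the last one is deliberately extreme
d <- data.frame(
  id = rep(1:5, each = 2),
  intimacy = c(4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3, 5.0, 9.9)
)

# Keep observations that are at most 2 SDs above the mean of all observations
d2 <- filter(d, intimacy <= mean(intimacy) + 2 * sd(intimacy))
```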

Selecting variables in data

After these artificial examples of removing extreme values (or people) from data, we have a couple of extra variables in our data frame d that we would like to remove, because it’s good to work with clean data. Removing, and more generally selecting variables (columns) in data frames is most easily done with the select() function. Let’s select() all variables in d except the squared intimacy (sqrt_int), average intimacy (int) and exclusion (exclude) variables (that is, let’s drop those three columns from the data frame):

Using select(), we can keep variables by naming them, or drop them by using -. If no variables are named for keeping, but some are dropped, all unnamed variables are kept, as in this example.
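Both uses of select() can be sketched on a small made-up data frame. The names d_clean and d_keep are hypothetical, as is the column intimacy.

```r
library(dplyr)

# Hypothetical data frame carrying the three helper columns named in the text
d <- data.frame(
  id = c(1, 1, 2),
  intimacy = c(4, 9, 6),
  sqrt_int = c(2, 3, 2.4),
  int = c(6.5, 6.5, 6),
  exclude = c(FALSE, FALSE, FALSE)
)

# Drop columns with a minus sign; every column not named is kept
d_clean <- select(d, -sqrt_int, -int, -exclude)

# Keep columns by naming them; every column not named is dropped
d_keep <- select(d, id, intimacy)
```

Here the two calls return the same columns, so which form to use depends on whether the keep list or the drop list is shorter.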


Let’s do an example linear regression by focusing on one participant’s data. The first step then is to create a subset containing only one person’s data. For instance, we may ask for a subset of d that consists of all rows where id is 30, by typing

Linear regression is available using the lm() function, and R’s own formula syntax:

Generally, for regression in R, you’d specify the formula as outcome ~ predictors. If you have multiple predictors, you combine them with addition (“+”): outcome ~ IV1 + IV2. Interactions are specified with multiplication (“*”): outcome ~ IV1 * IV2 (which automatically includes the main effects of IV1 and IV2; to get an interaction only, use “:”, as in outcome ~ IV1:IV2). We also specified that for the regression, we’d like to use data in the d_sub data frame, which contains only person 30’s data.

A summary of a fitted model is easily obtained:
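The subsetting, model fitting and summary steps described above could be sketched as follows. The data are simulated, and the column names time and intimacy and the object name fit are assumptions; d, d_sub and id come from the text.

```r
library(dplyr)

# Simulated data: two people with ten time points each (hypothetical values)
set.seed(30)
d <- data.frame(
  id = rep(c(29, 30), each = 10),
  time = rep(0:9, times = 2)
)
d$intimacy <- 3 + 0.5 * d$time + rnorm(nrow(d), sd = 0.5)

# Keep only person 30's rows; note the double == for comparison
d_sub <- filter(d, id == 30)

# Regress intimacy on time, using only the data in d_sub
fit <- lm(intimacy ~ time, data = d_sub)

# Coefficients, standard errors, t values and R-squared
summary(fit)
```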

Visualizing the model fit is also easy. We’ll use the same code as for the figures above, but also add points (geom_point()), and a linear regression line with a 95% “Confidence” Ribbon (geom_smooth(method="lm")).


Figure 10. Person 30’s intimacy ratings over time (points and black line), with a linear regression model (blue line and gray Confidence Ribbon).
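A plot in the spirit of Figure 10 might be built as in the sketch below. The data are simulated, and the column names time and intimacy and the object name p_fit are assumptions, so the result will not match the original figure.

```r
library(ggplot2)

# Simulated stand-in for person 30's data (hypothetical values)
set.seed(30)
d_sub <- data.frame(id = 30, time = 0:9)
d_sub$intimacy <- 3 + 0.5 * d_sub$time + rnorm(10, sd = 0.5)

# Raw ratings as a line and points, plus a fitted regression line;
# geom_smooth(method = "lm") adds a 95% confidence ribbon by default
p_fit <- ggplot(d_sub, aes(x = time, y = intimacy)) +
  geom_line() +
  geom_point() +
  geom_smooth(method = "lm")
```

Typing p_fit at the console draws the plot.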

Pretty cool, right? And there you have it. We’ve used R to do a sample of common data cleaning and visualization operations, and fitted a couple of regression models. Of course, we’ve only scratched the surface, and below I provide a short list of resources for learning more about R.


Programming your statistical analyses leads to a flexible, reproducible and time-saving workflow, in comparison to more traditional point-and-click focused applications. R is probably the best programming language around for applied statistics, because it has a large user base and many user-contributed packages that make your life easier. While it may take an hour or so to get acquainted with R, after initial difficulty it is easy to use, and provides a fast and reliable platform for data wrangling, visualization, modeling, and statistical testing.

Finally, learning to code is not about having a superhuman memory for function names, but instead it is about developing a programmer’s mindset: Think your problem through and decompose it to small chunks, then ask a computer to do those chunks for you. Do that a couple of times and you will magically have memorized, as a byproduct, the names of a few common functions. You learn to code not by reading and memorizing a tutorial, but by writing it out, examining the output, changing the input and figuring out what changed in the output. Even better, you’ll learn the most once you use code to examine your own data, data that you know and care about. Hopefully, you’ll now be able to begin doing just that.


The web is full of fantastic R resources, so here’s a sample of some materials I think would be useful to beginning R users.

Introduction to R

  • Data Camp’s Introduction to R is a free online course on R.

  • Code School’s R Course is an interactive web tutorial for R beginners.

  • YaRrr! The Pirate’s Guide to R is a free e-book, with accompanying YouTube lectures and witty writing (“it turns out that pirates were programming in R well before the earliest known advent of computers.”) YaRrr! is also an R package that helps you get started with some pretty cool R stuff (Phillips, 2016). Recommended!

  • The Personality Project’s Guide to R (Revelle, 2016b) is a great collection of introductory (and more advanced) R materials especially for Psychologists. The site’s author also maintains a popular and very useful R package called psych (Revelle, 2016a). Check it out!

  • Google Developers’ YouTube Crash Course to R is a collection of short videos. The first 11 videos are an excellent introduction to working with RStudio and R’s data types, and programming in general.

  • Quick-R is a helpful collection of R materials.

Data wrangling

These websites explain how to “wrangle” data with R.

  • R for Data Science (Wickham & Grolemund, 2016) is the definitive source on using R with real data for efficient data analysis. It starts off easy (and is suitable for beginners) but covers nearly everything in a data-analysis workflow apart from modeling.

  • Introduction to dplyr explains how to use the dplyr package (Wickham & Francois, 2016) to wrangle data.

  • Data Processing Workflow is a good resource on how to use common packages for data manipulation (Wickham, 2016), but the example data may not be especially helpful.

Visualizing data

Statistical modeling and testing

R provides many excellent packages for modeling data; my absolute favorite is the brms package (Buerkner, 2016) for Bayesian regression modeling.


Bartlett, J. (2016, November 22). Tidying and analysing response time data using R. Statistics and substance use. Retrieved November 23, 2016, from https://statsandsubstances.wordpress.com/2016/11/22/tidying-and-analysing-response-time-data-using-r/

Bolger, N., & Laurenceau, J.-P. (2013). Intensive longitudinal methods: An introduction to diary and experience sampling research. Guilford Press. Retrieved from http://www.intensivelongitudinal.com/

Buerkner, P.-C. (2016). brms: Bayesian regression models using Stan. Retrieved from http://CRAN.R-project.org/package=brms

Fox, J. (2010). Introduction to statistical computing in R. Retrieved November 23, 2016, from http://socserv.socsci.mcmaster.ca/jfox/Courses/R-course/index.html

Muenchen, R. A. (2015). The popularity of data analysis software. R4stats.com. Retrieved November 22, 2016, from http://r4stats.com/articles/popularity/

Phillips, N. (2016). yarrr: A companion to the e-book “YaRrr!: The pirate’s guide to R”. Retrieved from https://CRAN.R-project.org/package=yarrr

R Core Team. (2016). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from https://www.R-project.org/

Revelle, W. (2016a). psych: Procedures for psychological, psychometric, and personality research. Evanston, Illinois: Northwestern University. Retrieved from https://CRAN.R-project.org/package=psych

Revelle, W. (2016b). The personality project’s guide to R. Retrieved November 22, 2016, from http://personality-project.org/r/

Wickham, H. (2016). tidyverse: Easily install and load ’tidyverse’ packages. Retrieved from https://CRAN.R-project.org/package=tidyverse

Wickham, H., & Francois, R. (2016). dplyr: A grammar of data manipulation. Retrieved from https://CRAN.R-project.org/package=dplyr

Wickham, H., & Grolemund, G. (2016). R for data science. Retrieved from http://r4ds.had.co.nz/

Matti Vuorre

Matti Vuorre is a PhD Student at Columbia University in New York City. He studies cognitive psychology and neuroscience, and focuses on understanding the mechanisms underlying humans' metacognitive capacities.



Introducing JASP: A free and intuitive statistics software that might finally replace SPSS

Are you tired of SPSS’s confusing menus and of the ugly tables it generates? Are you annoyed by having statistical software only on university computers? Would you like to use advanced techniques such as Bayesian statistics, but you lack the time to learn a programming language (like R or Python) because you prefer to focus on your research?

While there was no real solution to this problem for a long time, there is now good news for you! A group of researchers at the University of Amsterdam are developing JASP, a free open-source statistics package that includes both standard and more advanced techniques and puts major emphasis on providing an intuitive user interface.

The current version already supports a large array of analyses, including the ones typically used by researchers in the field of psychology (e.g. ANOVA, t-tests, multiple regression).

In addition to being open source, freely available for all platforms, and providing a considerable number of analyses, JASP also comes with several neat, distinctive features, such as real-time computation and display of all results. For example, if you decide that you want not only the mean but also the median in the table, you can tick “Median” to have the medians appear immediately in the results table. For comparison, think how this works in SPSS: First, you must navigate a forest of menus (or edit the syntax); then you execute the new syntax. A new window appears and you get a new (ugly) table.


In JASP, you get better-looking tables in no time. Click here to see a short demonstration of this feature. But it gets even better—the tables are already in APA format and you can copy and paste them into Word. Sounds too good to be true, doesn’t it? It does, but it works!

Interview with lead developer Jonathon Love

Where is this software project coming from? Who pays for all of this? And what plans are there for the future? There is nobody who could answer these questions better than the lead developer of JASP, Jonathon Love, who was so kind as to answer a few questions about JASP.

How did development on JASP start? How did you get involved in the project?

All through my undergraduate program, we used SPSS, and it struck me just how suboptimal it was. As a software designer, I find poorly designed software somewhat distressing to use, and so SPSS was something of a thorn in my mind for four years. I was always thinking things like, “Oh, what? I have to completely re-run the analysis, because I forgot X?,” “Why can’t I just click on the output to see what options were used?,” “Why do I have to read this awful syntax?,” or “Why have they done this like this? Surely they should do this like that!”

At the same time, I was working for Andrew Heathcote, writing software for analyzing response time data. We were using the R programming language and so I was exposed to this vast trove of statistical packages that R provides. On one hand, as a programmer, I was excited to gain access to all these statistical techniques. On the other hand, as someone who wants to empower as many people as possible, I was disappointed by the difficulty of using R and by the very limited options to provide a good user interface with it.

So I saw that there was a real need for both of these things—software providing an attractive, free, and open statistics package to replace SPSS, and a platform for methodologists to publish their analyses with rich, accessible user interfaces. However, the project was far too ambitious to consider without funding, and so I couldn’t see any way to do it.

Then I met E.J. Wagenmakers, who had just received a European Research Council grant to develop an SPSS-like software package to provide Bayesian methods, and he offered me the position to develop it. I didn’t know a lot about Bayesian methods at the time, but I did see that our goals had a lot of overlap.

So I said, “Of course, we would have to implement classical statistics as well,” and E.J.’s immediate response was, “Nooooooooooo!” But he quickly saw how significant this would be. If we can liberate the underlying platform that scientists use, then scientists (including ourselves) can provide whatever analyses we like.

And so that was how the JASP project was born, and how the three goals came together:

  • to provide a liberated (free and open) alternative to SPSS
  • to provide Bayesian analyses in an accessible way
  • to provide a universal platform for publishing analyses with accessible user interfaces


What are the biggest challenges for you as a lead developer of JASP?

Remaining focused. There are hundreds of goals, and hundreds of features that we want to implement, but we must prioritize ruthlessly. When will we implement factor analysis? When will we finish the SEM module? When will data entry, editing, and restructuring arrive? Outlier exclusion? Computing of variables? These are all such excellent, necessary features; it can be really hard to decide what should come next. Sometimes it can feel a bit overwhelming too. There’s so much to do! I have to keep reminding myself how much progress we’re making.

Maintaining a consistent user experience is a big deal too. The JASP team is really large. To give you an idea, in addition to myself there’s:

  • Ravi Selker, developing the frequentist analyses
  • Maarten Marsman, developing the Bayesian ANOVAs and Bayesian linear regression
  • Tahira Jamil, developing the classical and Bayesian contingency tables
  • Damian Dropmann, developing the file save and load functionality and the annotation system
  • Alexander Ly, developing the Bayesian correlation
  • Quentin Gronau, developing the Bayesian plots and the classical linear regression
  • Dora Matzke, developing the help system
  • Patrick Knight, developing the SPSS importer
  • Eric-Jan Wagenmakers, coming up with new Bayesian techniques and visualizations

With such a large team, developing the software and all the analyses in a consistent and coherent way can be really challenging. It’s so easy for analyses to end up a mess of features, and for every subsequent analysis we add to look nothing like the last. Of course, providing as elegant and consistent a user experience as possible is one of our highest priorities, so we put a lot of effort into this.


How do you imagine JASP five years from now?

JASP will provide the same, silky, sexy user experience that it does now. However, by then it will have full data entering, editing, cleaning, and restructuring facilities. It will provide all the common analyses used through undergraduate and postgraduate psychology programs. It will provide comprehensive help documentation, an abundance of examples, and a number of online courses. There will be textbooks available. It will have a growing community of methodologists publishing the analyses they are developing as additional JASP modules, and applied researchers will have access to the latest cutting-edge analyses in a way that they can understand and master. More students will like statistics than ever before.


How can JASP stay up to date with state-of-the-art statistical methods? Even when borrowing implementations written in R and the like, these always have to be implemented by you in JASP. Is there a solution to this problem?

Well, if SPSS has taught us anything, you really don’t need to stay up to date to be a successful statistical product, ha-ha! The plan is to provide tools for methodologists to write add-on modules for JASP—tools for creating user interfaces and tools to connect these user interfaces to their underlying analyses. Once an add-on module is developed, it can appear in a directory, or a sort of “App Store,” and people will be able to rate the software for different things: stability, user-friendliness, attractiveness of output, and so forth. In this way, we hope to incentivize a good user experience as much as possible.

Some people think this will never work—that methodologists will never put in all that effort to create nice, useable software (because it does take substantial effort). But I think that once methodologists grasp the importance of making their work accessible to as wide an audience as possible, it will become a priority for them. For example, consider the following scenario: Alice provides a certain analysis with a nice user interface. Bob develops an analysis that is much better than Alice’s analysis, but everyone uses Alice’s, because hers is so easy and convenient to use. Bob is upset because everyone uses Alice’s instead of his. Bob then realizes that he has to provide a nice, accessible user experience for people to use his analysis.

I hope that we can create an arms race in which methodologists will strive to provide as good a user experience as possible. If you develop a new method and nobody can use it, have you really developed a new method? Of course, this sort of add-on facility isn’t ready yet, but I don’t think it will be too far away.


You mention on your website that many more methods will be included, such as structural equation modeling (SEM) or tools for data manipulation. How can you offer a large number of features without cluttering the user interface in the future?

Currently, JASP uses a ribbon arrangement; we have a “File” tab for file operations, and we have a “Common” tab that provides common analyses. As we add more analyses (and as other people begin providing additional modules), these will be provided as additional tabs. The user will be able to toggle on or off which tabs they are interested in. You can see this in the current version of JASP: we have a proof-of-concept SEM module that you can toggle on or off on the options page. JASP thus provides you only with what you actually need, and the user interface can be kept as simple as you like.


Students who are considering switching to JASP might want to know whether the future of JASP development is secured or dependent on getting new grants. What can you tell us about this?

JASP is currently funded by a European Research Council (ERC) grant, and we’ve also received some support from the Center for Open Science. Additionally, the University of Amsterdam has committed to providing us a software developer on an ongoing basis, and we’ve just run our first annual Bayesian Statistics in JASP workshop. The money we charge for these workshops is plowed straight back into JASP’s development.

We’re also developing a number of additional strategies to increase the funding that the JASP project receives. Firstly, we’re planning to provide technical support to universities and businesses that make use of JASP, for a fee. Additionally, we’re thinking of simply asking universities to contribute the cost of a single SPSS license to the JASP project. It would represent an excellent investment; it would allow us to accelerate development, achieve feature parity with SPSS sooner, and allow universities to abandon SPSS and its costs sooner. So I don’t worry about securing JASP’s future; I’m thinking about how we can expand JASP’s future.

Of course, all of this depends on people actually using JASP, and that will come down to the extent that the scientific community decides to use and get behind the JASP project. Indeed, the easiest way that people can support the JASP project is by simply using and citing it. The more users and the more citations we have, the easier it is for us to obtain funding.

Having said all that, I’m less worried about JASP’s future development than I’m worried about SPSS’s! There’s almost no evidence that any development work is being done on it at all! Perhaps we should pass the hat around for IBM.


What is the best way to get started with JASP? Are there tutorials and reproducible examples?

For classical statistics, if you’ve used SPSS, or if you have a book on statistics in SPSS, I don’t think you’ll have any difficulty using JASP. It’s designed to be familiar to users of SPSS, and our experience is that most people have no difficulty moving from SPSS to JASP. We also have a video on our website that demonstrates some basic analyses, and we’re planning to create a whole series of these.

As for the Bayesian statistics, that’s a little more challenging. Most of our effort has been going into getting the software ready, so we don’t have as many resources for learning Bayesian statistics ready as we would like. This is something we’ll be looking at addressing in the next six to twelve months. E.J. has at least one (maybe three) books planned.

That said, there are a number of resources available now, such as:

  • Alexander Etz’s blog
  • E.J.’s website provides a number of papers on Bayesian statistics (his website also serves as a reminder of what the internet looked like in the ’80s)
  • Zoltan Dienes’s book is great for Bayesian statistics as well

However, the best way to learn Bayesian statistics is to come to one of our Bayesian Statistics with JASP workshops. We’ve run two so far and they’ve been very well received. Some people have been reluctant to attend—because JASP is so easy to use, they didn’t see the point of coming and learning it. Of course, that’s the whole point! JASP is so easy to use, you don’t need to learn the software, and you can completely concentrate on learning the Bayesian concepts. So keep an eye out on the JASP website for the next workshop. Bayes is only going to get more important in the future. Don’t be left behind!


Jonas Haslbeck

Jonas is a Senior Editor at the Journal of European Psychology Students. He is currently a PhD student in psychological methods at the University of Amsterdam, The Netherlands. For further info see http://jmbh.github.io/.



Of Elephants and Effect Sizes – Interview with Geoff Cumming

We all know those crucial moments while analysing our hard-earned data, the moments of truth: is there a star above the small p? Maybe even two? Can you write a nice and simple paper, or do you have to bend over backwards to explain why people do not, surprisingly, behave the way you thought they would? It all depends on those little stars, below or above .05, significant or not, black or white.

Katharina Brecht

Aside from her role as Editor-in-Chief of the Journal of European Psychology Students, Katharina is currently pursuing her PhD at the University of Cambridge. Her research interests revolve around the evolution and development of social cognition.



Are the Methods of Psychology to Blame for its Unscientific Image? The Basis of Public Perceptions of ‘Scientific’ Research

Psychology is defined to students as the scientific study of human behaviour. However, when the American Psychological Association surveyed 1,000 adult members of the public, 70% did not agree with the statement, ‘psychology attempts to understand the way people behave through scientific research’ (Penn, Schoen and Berland Associates, 2008, p. 29). Lay people deny what is, to those within psychology, an undeniable fact: that psychology aims to test theory-grounded hypotheses in an objective, replicable and empirical manner, and is therefore scientific. Recently, psychologists have investigated the reasons for such a divide between expert and novice views of the field. In doing so, they have uncovered how lay people evaluate whether a subject deserves the scientific stamp of approval.

Robert Blakey

Robert Blakey is a third-year undergraduate student of Experimental Psychology at the University of Oxford and was a member of the 2012-2013 cohort of EFPSA's Junior Researcher Programme. He is currently carrying out a research project on the effect of interaction on estimation accuracy and writing a dissertation on consumer neuroscience. He is also interested in social cognition and specifically, public perceptions of influences on behaviour.



Is qualitative research still considered the poor relation?

It sometimes seems that the entire area of psychology is characterised by the friction between words and numbers. When I first considered a career in psychology, as a UK student, I was faced with the confusing choice of psychology as either a Bachelor of Arts or a Bachelor of Science. The former spoke to me of enticing social science research, such as interpersonal attraction, whilst the latter screamed scary statistics – avoid, avoid, avoid! However, in the years that have passed since I had this decision to make, psychology has increasingly come to be defined as a science and the presiding impression is that the discipline takes a distinct pride in its commitment to numbers. This is perhaps the natural outcome of living in a world which dictates that evidence counts for everything, a trend which is keenly reflected in the media’s thirst for statistics-based research stories. However, I hear you ask, what has happened to the fate of “words” during this numerical domination of psychology?

This is where the field of qualitative research enters into the equation, with a number of researchers having elected to favour data gathering in the form of words, pictures or objects rather than through the standard route of numbers and statistics. However, there has long been a sense of qualitative research as the “poor relation” of quantitative efforts. The question is whether qualitative research is still somehow perceived as being of lesser value than quantitative research, and how this affects publication possibilities?
