Not solely about that Bayes: Interview with Prof. Eric-Jan Wagenmakers

Last summer saw the publication of the most important work in psychology in decades: the Reproducibility Project (Open Science Collaboration, 2015; see here and here for context). It stirred up the community, resulting in many constructive discussions but also in verbally violent disagreement. What unites all parties, however, is the call for more transparency and openness in research.

Eric-Jan “EJ” Wagenmakers has argued for pre-registration of research (Wagenmakers et al., 2012; see also here) and direct replications (e.g., Boekel et al., 2015; Wagenmakers et al., 2015), for a clearer demarcation of exploratory and confirmatory research (de Groot, 1954/2013), and for a change in the way we analyze our data (Wagenmakers et al., 2011; Wagenmakers et al., in press).

Concerning the latter point, EJ is a staunch advocate of Bayesian statistics. With his many collaborators, he writes the clearest and wittiest expositions of the topic (e.g., Wagenmakers et al., 2016; Wagenmakers et al., 2010). Crucially, he is also a key player in opening Bayesian inference up to social and behavioral scientists more generally; in fact, the software JASP is EJ’s brainchild (see also our previous interview).


In sum, psychology is changing rapidly, both in how researchers communicate and do science, but increasingly also in how they analyze their data. This makes it nearly impossible for university curricula to keep up; courses in psychology are often years, if not decades, behind. Statistics classes in particular are usually boringly cookbook-oriented and often fraught with misconceptions (Wagenmakers, 2014). At the University of Amsterdam, Wagenmakers does things differently. He has previously taught a class called “Good Science, Bad Science”, discussing novel developments in methodology as well as supervising students in preparing and conducting direct replications of recent research findings (cf. Frank & Saxe, 2012).

Now, at the end of the day, testing undirected hypotheses using p values or Bayes factors only gets you so far – even if you preregister the heck out of it. To move the field forward, we need formal models that instantiate theories and make precise quantitative predictions. Together with Michael Lee, Eric-Jan Wagenmakers has written an amazing practical cognitive modeling book, harnessing the power of computational Bayesian methods to estimate arbitrarily complex models (for an overview, see Lee, submitted). More recently, he has co-edited a book on model-based cognitive neuroscience on how formal models can help bridge the gap between brain measurements and cognitive processes (Forstmann & Wagenmakers, 2015).

Long-term readers of the JEPS bulletin will note that topics ranging from openness of research, pre-registration and replication, to research methodology and Bayesian statistics are recurring themes. It was thus only a matter of time before we interviewed Eric-Jan Wagenmakers and asked him questions concerning all the areas above. In addition, we ask: how does he stay so immensely productive? What tips does he have for students interested in an academic career, and what can instructors learn from “Good Science, Bad Science”? Enjoy the ride!

Bobby Fischer, the famous chess player, once said that he does not believe in psychology. You actually switched from playing chess to pursuing a career in psychology; tell us how this came about. Was it a good move?

It was an excellent move, but I have to be painfully honest: I simply did not have the talent and the predisposition to make a living out of playing chess. Several of my close friends did have that talent and went on to become international grandmasters; they play chess professionally. But I was actually lucky. For players outside of the world top-50, professional chess is a career trap. The pay is poor, the work insanely competitive, and the life is lonely. And society has little appreciation for professional chess players. In terms of creativity, hard work, and intellectual effort, an international chess grandmaster easily outdoes the average tenured professor. People who do not play chess themselves do not realize this.

Your list of publications gets updated so frequently, it should have its own RSS feed! How do you grow and cultivate such an impressive network of collaborators? Do you have specific tips for early career researchers?

At the start of my career I did not publish much. For instance, when I finished my four years of grad studies I think I had two papers. My current publication rate is higher, and part of that is due to an increase in expertise. It is just easier to write papers when you know (or think you know) what you’re talking about. But the current productivity is mainly due to the quality of my collaborators. First, at the psychology department of the University of Amsterdam we have a fantastic research master program. Many of my graduate students come from this program, having been tried and tested in the lab as RAs. When you have, say, four excellent graduate students, and each publishes one article a year, that obviously helps productivity. Second, the field of Mathematical Psychology has several exceptional researchers that I have somehow managed to collaborate with. In the early stages I was a graduate student with Jeroen Raaijmakers, and this made it easy to start work with Rich Shiffrin and Roger Ratcliff. So I was privileged and I took the opportunities that were given. But I also work hard, of course.

There is a lot of advice that I could give to early career researchers but I will have to keep it short. First, in order to excel in whatever area of life, commitment is key. What this usually means is that you have to enjoy what you are doing. Your drive and your enthusiasm will act as a magnet for collaborators. Second, you have to take initiative. So read broadly, follow the latest articles (I remain up to date through Twitter and Google Scholar), get involved with scientific organizations, coordinate a colloquium series, set up a reading group, offer to review papers together with your advisor, attend summer schools, etc. For example, when I started my career I had seen a new book on memory and asked the editor of Acta Psychologica whether I could review it for them. Another example is Erik-Jan van Kesteren, an undergraduate student from a different university who had attended one of my talks about JASP. He later approached me and asked whether he could help out with JASP. He is now a valuable member of the JASP team. Third, it helps if you are methodologically strong. When you are methodologically strong – in statistics, mathematics, or programming – you have something concrete to offer in a collaboration.

Considering all projects you are involved in, JASP is probably the one that will have most impact on psychology, or the social and behavioral sciences in general. How did it all start?

In 2005 I had a conversation with Mark Steyvers. I had just shown him a first draft of a paper that summarized the statistical drawbacks of p-values. Mark told me “it is not enough to critique p-values. You should also offer a concrete alternative”. I agreed and added a section about BIC (the Bayesian Information Criterion). However, the BIC is only a rough approximation to the Bayesian hypothesis test. Later I became convinced that social scientists will only use Bayesian tests when these are readily available in a user-friendly software package. About 5 years ago I submitted an ERC grant proposal “Bayes or Bust! Sensible hypothesis tests for social scientists” that contained the development of JASP (or “Bayesian SPSS” as I called it in the proposal) as a core activity. I received the grant and then we were on our way.

I should acknowledge that many of the Bayesian computations in JASP depend on the R BayesFactor package developed by Richard Morey and Jeff Rouder. I should also emphasize the contribution of JASP’s first software engineer, Jonathon Love, who suggested that JASP ought to feature classical statistics as well. In the end we agreed that by including classical statistics, JASP could act as a Trojan horse and boost the adoption of Bayesian procedures. So the project started as “Bayesian SPSS”, but the scope was quickly broadened to include p-values.

JASP is already game-changing software, but it is under continuous development and improvement. More concretely, what do you plan to add in the near future? What do you hope to achieve in the long-term?

In terms of the software, we will shortly include several standard procedures that are still missing, such as logistic regression and chi-square tests. We also want to upgrade the popular Bayesian procedures we have already implemented, and we are going to create new modules. Before too long we hope to offer a variable views menu and a data-editing facility. When all this is done it would be great if we could make it easier for other researchers to add their own modules to JASP.

One of my tasks in the next years is to write a JASP manual and JASP books. In the long run, the goal is to have JASP be financially independent of government grants and university support. I am grateful for the support that the psychology department at the University of Amsterdam offers now, and for the support they will continue to offer in the future. However, the aim of JASP is to conquer the world, and this requires that we continue to develop the program “at break-neck speed”. We will soon be exploring alternative sources of funding. JASP will remain free and open-source, of course.

You are a leading advocate of Bayesian statistics. What do researchers gain by changing the way they analyze their data?

They gain intellectual hygiene, and a coherent answer to questions that makes scientific sense. A more elaborate answer is outlined in a paper that is currently submitted to a special issue of Psychonomic Bulletin & Review (Part I).

The Reproducibility Project used different metrics to quantify the success of a replication – none of them really satisfactory. How can a Bayesian perspective help illuminate the “crisis of replication”?

As a theory of knowledge updating, Bayesian statistics is ideally suited to address questions of replication. However, the question “did the effect replicate?” is underspecified. Are the effect sizes comparable? Does the replication provide independent support for the presence of the effect? Does the replication provide support for the position of the proponents versus the skeptics? All these questions are slightly different, but each receives the appropriate answer within the Bayesian framework. Together with Josine Verhagen, I have explored a method –the replication Bayes factor– in which the prior distribution for the replication test is the posterior distribution obtained from the original experiment (e.g., Verhagen & Wagenmakers, 2014). We have applied this intuitive procedure to a series of recent experiments, including the multi-lab Registered Replication Report of Fritz Strack’s Facial Feedback hypothesis. In Strack’s original experiment, participants who held a pen with their teeth (causing a smile) judged cartoons to be funnier than participants who held a pen with their lips (causing a pout). I am not allowed to tell you the result of this massive replication effort, but the paper will be out soon.

You have recently co-edited a book on model-based cognitive neuroscience. What is the main idea here, and what developments in this area are most exciting to you?

The main idea is that much of experimental psychology, mathematical psychology, and the neurosciences pursue a common goal: to learn more about human cognition. So ultimately the interest is in latent constructs such as intelligence, confidence, memory strength, inhibition, and attention. The models that have been developed in mathematical psychology are able to link these latent constructs to specific model parameters. These parameters may in turn be estimated by behavioral data, by neural data, or by both data sets jointly. Brandon Turner is one of the early career mathematical psychologists who has made great progress in this area. So the mathematical models are a vehicle to achieve an integration of data from different sources. Moreover, insights from neuroscience can provide important constraints that help inform mathematical modeling. The relation is therefore mutually beneficial. This is summarized in the following paper:

One thing that distinguishes science from sophistry is replication; yet it is not standard practice. In “Good Science, Bad Science”, you had students prepare a registered replication plan. What was your experience teaching this class? What did you learn from the students?

This was a great class to teach. The students were highly motivated and oftentimes it felt more like lab-meeting than like a class. The idea was to develop four Registered Report submissions. Some time has passed, but the students and I still intend to submit the proposals for publication.

The most important lesson this class has taught me is that our research master students want to learn relevant skills and conduct real research. In the next semester I will teach a related course, “Good Research Practices”, and I hope to attain the same high levels of student involvement. For the new course, I plan to have students read a classic methods paper that identifies a fallacy; next, the students will conduct a literature search to assess the current prevalence of the fallacy. I have done several similar projects before, but never with master students.

What tips and tricks can you share with instructors planning to teach a similar class?

The first tip is to set your aims high. For a research master class, the goal should be publication. Of course this may not always be realized, but it should be the goal. It helps if you can involve colleagues or graduate students. If you set your aims high, the students know that you take them seriously, and that their work matters. The second tip is to arrange the teaching so that the students do most of the work. The students need to develop a sense of ownership about their projects, and they need to learn. This will not happen if you treat the students as passive receptacles. I am reminded of a course that I took as an undergraduate. In this course I had to read chapters, deliver presentations, and prepare questions. It was one of the most enjoyable and inspiring courses I had ever taken, and it took me decades to realize that the professor who taught the course actually did not have to do much at all.

Many scholarly discussions these days take place on social media and blogs. You joined Twitter yourself over a year ago. How do you navigate the social media jungle, and what resources can you recommend to our readers?

I am completely addicted to Twitter, but I also feel it makes me a better scientist. When you are new to Twitter, I recommend that you start by following a few people who have interesting things to say. Coming from a Bayesian perspective, I recommend Alexander Etz (@AlxEtz) and Richard Morey (@richarddmorey). And of course it is essential to follow JASP (@JASPStats). As is the case for all social media, the most valuable resource you have is the “mute” option. Exercise it ruthlessly to prevent yourself from being swamped by holiday pictures.


Publishing a Registered Report as a Postgraduate Researcher

Registered Reports (RRs) are a new publishing format pioneered by the journal Cortex (Chambers 2013). This publication format emphasises the process of rigorous research, rather than the results, in an attempt to avoid questionable research practices such as p-hacking and HARK-ing, which ultimately reduce the reproducibility of research and contribute to publication bias in cognitive science (Chambers et al. 2014). A recent JEPS post by Dablander (2016) and JEPS’ own editorial for adopting RRs (King et al. 2016) have given a detailed explanation of the RR process. However, you may have thought that publishing a RR is reserved for only senior scientists, and is not a viable option for a postgraduate student. In fact, 5 out of 6 of the first RRs published by Cortex have had post-graduate students as authors, and publishing by RR offers postgraduates and early career researchers many unique benefits.

In the following article you will hear about the experience of Dr. Hannah Hobson, who published a RR in the journal Cortex as a part of her PhD project. I spoke to Hannah about the planning that was involved, the useful reviewer comments she received, and asked her what tips she has for postgraduates interested in publishing a RR. Furthermore, there are some comments from Professor Chris Chambers who is a section editor for Cortex on how postgraduates can benefit from using this publishing format.

Interview with Dr. Hannah Hobson

Hannah completed her PhD project on children’s behavioural imitation skills, and potential neurophysiological measures of the brain systems underlying imitation. Her PhD was based at the University of Oxford, under the supervision of Professor Dorothy Bishop. During her studies, Hannah became interested in mu suppression, an EEG measure purported to reflect the activity of the human mirror neuron system. However, she was concerned that much of research on mu suppression suffered from methodological problems, despite this measure being widely used in social cognitive neuroscience. Hannah and Dorothy thought it would be appropriate to publish a RR to focus on some of these issues. This study was published in the journal Cortex, and investigated whether mu suppression is a good measure of the human mirror neuron system (Hobson and Bishop 2016). I spoke to Hannah about her project and what her experience of publishing a RR was like during her PhD.


As you can hear from Hannah’s experience, publishing a RR was beneficial in ways that would not be possible with standard publishing formats. However, they are not suitable for every study. Drawing from Hannah’s experience and Chris Chambers’ role in promoting RRs, the main strengths and concerns for postgraduate students publishing a RR are summarised below.


Reproducible findings

It has been highlighted that the majority of psychological studies suffer from low power. As well as limiting the chances of finding an effect, low-powered studies are more likely to lack reproducibility, as they overestimate effect sizes (Button et al. 2013). As a part of the stage one submission, a formal power analysis needs to be performed to identify the number of participants required for a highly powered study (>90%). Therefore, PhD studies published as RRs will have greater power and reproducibility in comparison to the average unregistered study (Chambers et al. 2014).

More certainty over publications

The majority of published PhD studies begin to emerge during the final year or during your first post-doctoral position. As the academic job market becomes ever more competitive, publications are essential. As Professor Chambers notes, RRs “enable PhD students to list provisionally accepted papers on their CVs by the time they submit their PhDs”. Employers will see greater certainty in a RR with stage one approval than in the ‘in preparation’ listed next to innumerable papers following the standard publishing format.

Lower rejection rate at stage two submission

Although reaching stage one approval is more difficult due to the strict methodological rigour required, there is greater certainty in the eventual outcome of the paper once you have in-principle acceptance. In Cortex, approximately 90% of unregistered reports are rejected upon submission, but only 10% of RRs which reach stage one review have been rejected, with none rejected so far after in-principle acceptance.

“This means you are far more likely to get your paper accepted at the first journal you submit to, reducing the tedious and time-wasting exercise of submitting down a chain of journals after your work is finished and you may already be competing on the job market”. – Professor Chris Chambers

As Dorothy Bishop explains in her blog, once you have in-principle acceptance you are in control of the timing of the publication (Bishop 2016). This means that you will have a publication in print during your PhD, as opposed to starting to submit papers towards the end which may only be ‘in preparation’ by the time of your viva voce.

Constructive reviewer comments

As the rationale and methodology are peer-reviewed before the data-collection process, reviewers are able to make suggestions to improve the design of your study. In Hannah’s experience, a reviewer pointed out an issue with her control stimuli. If she had conducted the study following the standard format, reviewers would only have been able to point this out retrospectively, when there was no option to change it. This experience will also be invaluable during your viva voce. As you defend your work in front of the examiners, you know your study has already gone through several rounds of review, so you can be confident in how robust it is.

Things to consider

Time constraints

Recruiting and testing participants is a lengthy process, and you often encounter a series of setbacks. If you are already in the middle of your PhD, then you may not have time to go through stage one submission before collecting your data. In Hannah’s case, publishing a RR was identified early in the project which provided a sufficient amount of time to complete it during her PhD. If you are interested in RRs, it is advisable to start the submission process as early into your PhD as possible. You may even want to start the discussion during the interview process.

Ethics merry-go-round

During stage one submission, you need to provide evidence that you already have ethical approval. If the reviewers want you to make changes to the methodology, this may necessitate amending your ethics application. In busy periods, this process of going back and forth between the reviewers and your ethics committee can become time-consuming. As time constraints are the pertinent concern for postgraduates publishing a RR, this is an additional hurdle that must be negotiated. Whilst there is no easy solution to this problem, the aim to publish a RR must be identified early in your project to ensure you will have enough time, and a back-up plan should be prepared in case things do not work out.

RRs are not available in every journal

Although there has been a surge in journals offering RRs, they are not available in every one. Your research might be highly specialised and the key journal in your area may not offer the option of a RR. If your research does not fit into the scope of a journal that offers RRs, you may not have the option to publish your study as a RR. Whilst there is no simple solution for this, there is a regularly updated list of journals offering RRs on the Open Science Framework (OSF).

Supervisor conflict

Although there are a number of prominent researchers behind the initiative (Guardian Open Letter 2013), there is not universal agreement, with some researchers voicing concerns (Scott 2013; although see Chambers et al. 2014 for a rebuttal to many common concerns). There have been some vocal critics of RRs, and one of these critics might end up being your supervisor. If you want to conduct a RR as a part of your PhD and your supervisor is against it, there may be some conflict. Again, it is best to identify early on in your PhD whether you want to publish a RR, and make sure both you and your supervisor are on the same page.


Publishing a RR as a postgraduate researcher is a feasible option that provides several benefits, both to the individual student and to wider scientific progress. Research published as a RR is more likely to produce reproducible findings, due to the necessarily high level of power, the reviewers’ critique before data collection, and the safeguards against questionable research practices such as p-hacking or HARK-ing. Providing the work is carried out as agreed, a study that has achieved stage one approval is likely to be published, allowing students the opportunity to publish their hard work, even if the findings are negative. Moreover, going through several rounds of peer review on the proposed methodology provides an additional layer of rigour (good for science), which aids your defence in your viva voce (good for you). Of course, it is not all plain sailing and there are several considerations students will need to make before embarking on an RR. Nonetheless, despite these concerns, this publishing format is a step in the right direction for ensuring that robust research is being conducted right down to the level of postgraduate students.

If you like the idea but do not think formal pre-registration with a journal is suitable for your project, perhaps consider using the OSF. The OSF is a site where researchers can timestamp their hypotheses and planned analyses, allowing them to develop hypothesis-driven research habits. In one research group, it is necessary for all studies ranging from undergraduate projects to grant-funded projects to be registered on third-party websites such as the OSF (Munafò 2015). Some researchers such as Chris Chambers have even made it a requirement for applicants wanting to join their group to demonstrate a prior commitment to open science practices (Chambers 2016). Starting to pre-register your studies and publish RRs as a postgraduate student demonstrates this commitment, and will prove to be crucial as open science practices become an essential criterion in recruitment.

“To junior researchers I would say that pre-registration — especially as a Registered Report — is an ideal option for publishing high-quality, hypothesis-driven research that reflects an investment both in good science and your future career” – Professor Chris Chambers 

Pre-registration and RRs are both initiatives to improve the rigour and transparency of psychological science (Munafò et al. 2014). These initiatives are available to us as research students; it is not just the responsibility of senior academics to fight against questionable research practices. We can join in too.


Thank you to Dr. Hannah Hobson who was happy to talk about her experience as a PhD student and for her expertise in recording the interview. Hannah also helped to write and revise the post. I would also like to thank Professor Chris Chambers for taking the time to provide some comments for the post.


Do Smokers Consist of a Single Group?


When you think of a smoker, it is likely that you are imagining someone who goes through a pack of cigarettes per day and can often be found running to the nearest store to maintain their supply. Perhaps you amuse yourself watching your friend conspicuously leaving work to stand outside and huddle around their cigarette in the rain. Your assumption would often be correct, as the majority of smokers are dependent on nicotine and smoke throughout the day. These daily smokers account for approximately 89% of current smokers in the UK (Herbec, Brown and West 2014), and between 67% and 75% of smokers in the USA (Coggins, Murrelle and Carchman 2009). However, what about this missing proportion of smokers?

This missing proportion consists of non-daily smokers, a sub-group of smokers who consume only a few cigarettes per day and can often engage in voluntary days of abstinence without experiencing the effects of withdrawal (Shiffman, Ferguson and Dunbar 2012b). What makes these smokers interesting is that although they do not appear to be dependent on nicotine, 82% of them relapse within 90 days of attempting to quit (Tindle and Shiffman 2011). Compared to 87% of daily smokers, these figures are remarkably close. Similar results were found in a UK sample, as 92% of daily smokers and 83% of non-daily smokers failed to remain abstinent beyond six months (Herbec et al. 2014). Despite this difficulty, smoking cessation therapies lack efficacy in non-daily smokers due to a reliance on nicotine replacement therapy (Jimenéz-Ruiz and Fagerström 2010). This is not surprising as clinical trials commonly exclude light smokers (Shiffman 2009), and they rarely experience withdrawal symptoms due to a lack of nicotine dependence.

As smoking restrictions become more and more stringent, the proportion of light smokers is predicted to increase (Coggins et al. 2009; Shiffman 2009). Although light smoking is often perceived as being less harmful, it is associated with the same increased risk of developing cardiovascular disease, lung and other types of cancer as heavy smoking. For example, one prospective study found that male and female light smokers had a significantly increased risk of ischaemic heart disease and lung cancer in comparison to non-smokers (Bjartveit and Tverdal 2005). Furthermore, a systematic review found that light smokers show an intermediate risk between non-smokers and heavy smokers, but interestingly they share the same risk for heart disease as heavy smokers (Schane, Ling and Glantz 2009). Considering this, it is important to understand what the differences are between the groups, and how we can identify them.    

What are the differences in smoking patterns?

Table 1 shows the number of cigarettes smoked per day by light and heavy smokers in a small range of studies that include figures for both groups. Although there is some fluctuation, smoking rates are approximately 15 and 4 cigarettes per day for heavy and light smokers respectively. Additionally, it is interesting that light smokers often engage in voluntary days of abstinence. Compared to heavy smokers, who consistently use cigarettes every day, one study found that light smokers tend to use cigarettes on only four days per week (Shiffman, Tindle and Li 2013). This suggests that light smokers are relatively free of nicotine dependence, as the half-life of nicotine in the body is approximately two hours (Advokat, Comaty and Julien 2014). This is usually the time heavy smokers start to crave their next cigarette, but it appears that light smokers are comfortable without smoking for hours and even days after all of the nicotine has been metabolised and left the body.

Table 1

Mean Number of Cigarettes Smoked Per Day in Light and Heavy Smokers

Study                                      Smoking Group
Herbec et al. (2014)                       Daily; Non-Daily
Shiffman et al. (2012a)                    Daily; Non-Daily
Shiffman, Dunbar and Benowitz (2014a)      Daily; Non-Daily
Shiffman et al. (2014b)                    Daily; Non-Daily
Scheuermann et al. (2015)                  Moderate Daily; Light Daily; Converted Non-Daily; Native Non-Daily

Note: Smoking group names are reproduced as used within each study; the cigarettes-per-day values are not reproduced here.

The early dismissal of non-daily smokers was based on the belief that they consisted only of adolescents who were in a transitioning state on the way to being heavy smokers (Shiffman 2009). Whilst this does not provide a full explanation, non-daily smoking as a young adult is indeed an important risk factor for becoming a daily smoker later in life. One cohort study found that non-daily smoking at age 21 was associated with an odds ratio of 3.60 for becoming a daily smoker at age 38 at follow-up (Robertson, Losua and McGee 2015). In terms of public health, this highlights the need for research to focus on non-daily adolescent smokers, as they could be the target of interventions before they progress into heavier, daily smoking. However, non-daily smoking is not only a transient state on the road to becoming a heavy smoker. The non-daily smokers in Shiffman et al. (2012b) had been smoking for an average of 18 years, and those in Shiffman et al. (2013) had smoked an estimated 42,000 cigarettes. This suggests that light, non-daily smoking can also be a consistent behaviour pattern that can last throughout adulthood.

What are the reasons people report for smoking?

Non-daily smokers appear to show markedly different smoking habits, but they also show large differences in their reported reasons for smoking. The dominant paradigm of addictive behaviour holds that smokers continue to use cigarettes to avoid experiencing the aversive effects of withdrawal (Shiffman 2009). This motive appears to be consistent with heavy smokers, as they cite cravings, tolerance, and a loss of control over cigarette availability as influences to smoke (Shiffman et al. 2012a). This is also the case in young heavy smokers, as higher nicotine dependence scores were associated with smoking due to craving and habit in a sample of college students (Piasecki, Richardson and Smith 2007).

On the other hand, non-daily smokers report smoking for radically different reasons. For example, exposure to smoking cues, weight control, the sensory experiences of smoking, and positive reinforcement have been cited as motives for non-daily smokers (Shiffman et al. 2012a). Rather than smoking to avoid the negative experience of withdrawal, as daily smokers do, non-daily smokers appear to smoke for the positive experiences. This has led non-daily smokers to be labelled as ‘indulgent’, as they tend to smoke to enhance the experience of situations that are already positive, such as drinking alcohol in a bar with friends (Shiffman, Dunbar and Li 2014). As well as showing different habits and smoking patterns, non-daily smokers report being motivated to smoke by substantially different reasons from those normally proposed for daily smokers.


How can you measure cigarette consumption?

Definitions of light and heavy smoking

You may have noticed that a few different terms have been used, such as light smoker, non-daily smoker, and occasional smoker. This is mainly because no one can agree on a consistent definition, and several have been used across the studies investigating this group. Firstly, light and heavy smoking have been used to highlight the contrast between consumption levels. However, this classification is associated with the largest range of criteria between studies (Husten 2009). Secondly, daily and non-daily (or intermittent) smoking is associated with a much more consistent pattern of use in contrast to light and heavy smoking (Shiffman et al. 2012a; 2012b; 2014). This is because the number of cigarettes per day fluctuates, whereas smoking less than daily is a key indicator of this consumption pattern. Finally, there is a dichotomy between low and high nicotine dependence. This also appears to be a valid characterisation, as non-daily/light smokers consistently exhibit significantly less nicotine dependence on every common measure (Shiffman 2012b). However, it is important to note that in reality, dependence and smoking behaviour exist along a continuum. Even within the different dichotomies, there is a large amount of variation across the supposedly homogeneous sub-groups.

Measuring light and heavy smokers

On a final note about measurement, it is crucial to ask the right questions when assessing light smokers. Many questionnaires simply ask ‘are you a smoker?’, which may not detect non-daily smokers as they commonly do not identify with being a smoker (Schane et al. 2009). For example, in one study approximately 50% of light smokers said they might not admit to being a smoker (Shiffman et al. 2013). This suggests that simply asking whether people smoke or not might not be the best strategy, as researchers may just get ‘no’ as an answer. Clearly, more nuanced approaches are necessary to detect the low number of cigarettes consumed by this group. Fortunately, there are some additional measures of cigarette consumption that can provide a more sensitive answer:

  • A diary measure of the number of cigarettes smoked over a period of time
  • Breath Carbon Monoxide (CO) in a single session
  • Average CO over a number of sessions
  • Hair cotinine (a metabolite of nicotine) or nicotine levels

However, what are the best measures to use? An intensive diary account is considered to be the most accurate, but it is also the most time-consuming for smokers, which may deter some participants (Wray, Gass and Miller 2015). When comparing this to the less motivationally intensive measures, it appears that a single daily report of cigarettes across a number of days is the measure most strongly correlated with the intensive diary. Furthermore, when the level of exhaled CO is averaged across multiple testing sessions, it provides a valid biomarker for measuring cigarette consumption in light smokers (Wray et al. 2015). As well as these accuracy benefits, using a handheld CO monitor is cheap and does not require the expertise associated with analysing hair cotinine and nicotine levels. Due to the heterogeneous nature of smokers, it is crucial that the complexities in identifying light smokers are fully appreciated.


In summary, there is a clear distinction between different types of smoker, but it is often neglected in research. Despite light smokers’ apparent lack of nicotine dependence, both types of smoker find it difficult to remain abstinent, with only a small difference between the cessation failure rates (Tindle and Shiffman 2011; Herbec et al. 2014). This is important for public health as, although they form a minority of smokers, light smokers share the same risk of heart disease as heavy smokers and have an elevated risk of lung cancer (Bjartveit and Tverdal 2005; Schane et al. 2009). Considering the number of light smokers is predicted to increase as smoking restrictions tighten (Coggins et al. 2009; Shiffman 2009), it is crucial that this group is understood better. Research should focus on the individual differences in the determinants of smoking behaviour to better understand what is motivating light and heavy smokers. This knowledge will hopefully translate into more effective smoking cessation treatments that cater to the individual needs of each smoker.


Reading List

Health implications: Schane, R. E., Ling, P. M., Glantz, S. A. (2010) ‘Health Effect of Light and Intermittent Smoking: A Review’. Circulation 121, 1518-1522

Smoking Patterns: Shiffman, S., Tindle, H., Li, X., Scholl, S., Dunbar, M. and Mitchell-Miland, C. (2013) ‘Characteristics and Smoking Patterns of Intermittent Smokers’. Experimental and Clinical Psychopharmacology 20(4), 264-277

Smoking Motives: Shiffman, S., Dunbar, M. S., Scholl, S. M. and Tindle, H. A. (2012a) ‘Smoking Motives of Daily and Non-Daily Smokers: A Profile Analysis’. Drug and Alcohol Dependence 126, 362-368

Definitions: Husten, C. G. (2009) ‘How Should we Define Light or Intermittent Smoking? Does it Matter?’. Nicotine and Tobacco Research 11(2), 111-121

Measurement: Wray, J. M., Gass, J. C., Miller, E. I., Wilkins, D. G., Rollins, D. E. and Tiffany, S. T. (2015) ‘A Comparative Evaluation of Self-Report and Biological Measures of Cigarette Use in Non-Daily Smokers’. Psychological Assessment [online] available from  [12/07/2016]


Python Programming in Psychology – From Data Collection to Analysis

Why programming?

Programming is a skill that all psychology students should learn. I can think of many reasons why, including automating boring tasks and practising problem-solving skills while learning to code. In this post I will focus on two more immediate ways in which programming may be relevant for a psychology student: data collection and data analysis. For a more elaborate discussion on the topic, read the post on my personal blog: Every Psychologist Should Learn Programming.

Here is what we will do in this post:

  • Basic Python by example (i.e., a t-test for paired samples)
  • Program a Flanker task using the Python library Expyriment
  • Visualise and analyse data

Before going into how to use Python programming in Psychology I will briefly discuss why programming may be good for data collection and analysis.

Data collection

The data collection phase of psychological research has largely been computerised. Thus, many of the methods and tasks used to collect data are created using software. Many of these tools offer graphical user interfaces (GUIs) that may often cover your needs. For instance, E-Prime offers a GUI which enables you to, basically, drag and drop “objects” onto a timeline to create your experiment. However, in many tasks you may need to write some customised code on top of your built experiment. For instance, quasi-randomisation may be hard to implement in the GUI without some coding (e.g., by creating CSV files with trial order and such). At some point in your study of the human mind you will probably need to write code before collecting data.


Data Analysis

Most programming languages can of course offer both graphical and statistical analysis of data. For instance, the R statistical programming environment has recently gained more and more popularity in Psychology as well as in other disciplines. Python is also gaining popularity in other fields when it comes to analysing and visualising data. MATLAB has for many years also been used for quantitative methods in Psychology and cognitive science (e.g., for psychophysical analysis, cognitive modelling, and general statistics). Python also offers extensive support for both web scraping and the analysis of scraped data.

What language should one learn?

“Okay. Okay. Programming may be useful for psychologists! But there are so many languages! Where should I start?!” One very good start would be to learn Python. Python is a general-purpose and high-level language that was created by Guido van Rossum. Nowadays it is administered by the non-profit organisation Python Software Foundation. Python is open source. Among many things this means that Python is free, even for commercial use. Python is usually used and referred to as a scripting language. Thanks to its flexibility, Python is one of the most popular programming languages (e.g., 4th on the TIOBE Index for June 2016).

Programming in Psychology

One of the most important aspects, however, is that there is a variety of both general-purpose and specialised Python packages (unlike R, which focuses on statistical analysis). Good news for those of us interested in Psychology! This means that there are specialised libraries for creating experiments (e.g., Expyriment, PsychoPy and OpenSesame), fitting psychometric functions (e.g., pypsignifit 3.0), and analysing data (e.g., Pandas and Statsmodels). In fact, there are packages devoted solely to the analysis of EEG/ERP data (see my resources list for more examples). Python can be run interactively using the Python interpreter (hold on, I am going to show an example later). Note that Python comes in two major versions: 2.7 (legacy) and 3.5. Discussing them is really out of the scope of this post, but you can read more here.

Python from data collection to analysis

In this part of the post, you will learn how Python can be used from creating an experiment to visualising and analysing the data collected during that experiment. I have chosen a task that fits one of my research interests: attention and cognitive function. From doing research on distractors in the auditory and tactile modalities and how they impact visual tasks, I am, in general, interested in how some types of information cannot be blocked out. How is it that we are unable to suppress certain responses (i.e., response inhibition)? A well-used task to measure inhibition is the Flanker task (e.g., Colcombe, Kramer, Erickson, & Scalf, 2005; Eriksen & Eriksen, 1974). In the task we are going to create we will have two types of stimuli: congruent and incongruent. The task is to respond as quickly and accurately as possible to the direction an arrow is pointing. In congruent trials, the target arrow is surrounded by arrows pointing in the same direction (e.g., “<<<<<“) whereas on incongruent trials the surrounding arrows point in another direction (e.g., “<<><<“). Note, the target arrow is the one in the middle (i.e., the third).

For simplicity, we will examine whether the response time (RT) in congruent trials differs from the RT in incongruent trials. Since we will only have two means to compare (incongruent vs. congruent), we can use the paired sample t-test.

The following part is structured such that you first get information on how to install Python and the libraries used. After this is done, you will get some basic information on how to write a Python script and then how to write the t-test function. After that, you will be guided through writing the Flanker task using Expyriment and, finally, you will learn how to handle, visualise, and analyse the data from the Flanker task.

Installation of needed libraries

Before using Python you may need to install Python and the libraries that are used in the following examples. Python 2.7 can be downloaded here.

If you are running a Windows machine and have installed Python 2.7.11, your next step is to download and install Pygame. The second library needed is SciPy, which is a set of external libraries for scientific computing in Python. Installing SciPy on Windows machines is a bit complicated: first, download NumPy and SciPy, then open up the Windows command prompt (here is how) and use Pip to install NumPy and SciPy:

Open the command prompt, change directory to where the files were downloaded and install the packages using Pip.


Expyriment, seaborn, and pandas can be downloaded and installed using Pip:
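For example, from the command prompt (assuming pip is available on your PATH and that versions compatible with Python 2.7 are still offered):

```
pip install expyriment
pip install seaborn
pip install pandas
```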

Linux users can install the packages using Pip alone, and Mac users can see here how to install the SciPy stack. If you think that the installation procedure is cumbersome, I suggest that you install a scientific Python distribution (e.g., Anaconda) that will get you both Python and the libraries needed (except for Expyriment).

How to write Python scripts

Python scripts are typically written in a text editor. Windows computers come with one called Notepad:

Notepad text editor can be used to write Python scripts (.py).


OS X users can use TextEdit. Which text editor you end up using is not crucial, but you need to save your files with the file ending .py.

Writing a t-test function

Often a Python script uses modules/libraries, and these are imported at the beginning of the document. As previously mentioned, the t-test script is going to use SciPy, but we also need some maths functions (i.e., the square root). These modules are going to be imported first in our script, as will become clear later on.

Before we start defining our function, I am briefly going to touch on what a function is and describe one of the datatypes we are going to use. In Python, a function is a named block of organised code that can be reused later. The function we will create is going to be called paired_ttest and takes the arguments x and y. What this means is that we can send the scores from two different conditions (x and y) to the function. Our function requires the x and y variables to be of the datatype list. A list can store other values (e.g., in our case the RTs in the incongruent and congruent trials). Each value stored in a list gets an index (note, in Python indices start at 0). For instance, if we have a list containing 5 difference scores we can get each of them individually by using the index at which it is stored. If we start the Python interpreter we can type the following code (see here if you are unsure how to start the Python interpreter):

List indices
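Since the original interpreter screenshot is not reproduced here, a session along those lines might look like this (the numbers are invented purely for illustration):

```python
>>> d = [2.5, 3.0, 1.5, 4.0, 2.0]   # five made-up difference scores
>>> d[0]                            # indices start at 0
2.5
>>> d[4]                            # the fifth (and last) score
2.0
>>> len(d)
5
```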

Returning to the function we are going to write, I follow this formula for the paired sample t-test:

t = d̄ / (s_d / √n)

(the paired sample t statistic used for the Python function; df = n - 1)

Basically, d̄ (“d-bar”; the d with the line above) is the mean difference between the two scores, s_d is the standard deviation of the differences, and n is the sample size.

Creating our function

Now we go on with defining the function in our Python script (def is what tells Python that the code on the following lines is part of the function). Our function needs to calculate the difference score for each subject. Here we first create a list (i.e., di on line 5). We also need to know the sample size, and we can obtain that by getting the length of the list x (using the function len()). Note that here another datatype, int, is used. Int is short for integer and stores whole numbers. Also worth noting is that di and n are indented: in Python, indentation is used to mark where certain code blocks start and stop.

Next we use a Python loop (e.g., line 7 below). A loop is typically used when we want to repeat something n number of times. To calculate the difference score we need to take each subject’s score in the x condition and subtract the score in the y condition from it (line 8). Here we use the list indices (e.g., x[i]). That is, i is an integer starting at 0 and going up to n - 1, so the first repetition of the loop will get the first (i.e., index 0) subject’s scores. The average difference score is now easy to calculate: it is just the sum of all difference scores divided by the sample size (see line 10).

Note, here we use another datatype, float. The float type represents real numbers and is stored with a decimal point. In Python 2.7 we need to do this because dividing two integers performs integer division and truncates the result.

In the next part of our t-test function we are going to calculate the standard deviation. First, a float datatype is created (std_di) by using a dot after the digit (i.e., 0.). The script continues by looping through the difference scores and adding each score’s squared deviation from the average (i.e., di - d̄) to the std_di variable. In Python, squaring is done with the ** operator (see line 14). Finally, the standard deviation is obtained by dividing this sum by n - 1 and taking the square root (using sqrt() from NumPy).

The next statistic to be calculated is the standard error of the mean (line 16). Finally, on lines 17 and 18 we can calculate the t-value and the p-value. On line 20 we add all the information to a dictionary, a datatype that can store other objects. A dictionary stores objects linked to keys (e.g., “T-value” in our example below).
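Putting the steps above together, the function might look roughly like the sketch below. This is my reconstruction rather than the original listing, so the line numbers mentioned above will not match exactly; the two-tailed p-value is taken from SciPy’s t distribution.

```python
from numpy import sqrt
from scipy.stats import t as t_dist


def paired_ttest(x, y):
    di = []                                 # difference scores
    n = len(x)                              # sample size (an int)
    for i in range(n):
        di.append(x[i] - y[i])              # difference for subject i
    d_bar = sum(di) / float(n)              # mean difference (float division)

    std_di = 0.
    for d in di:
        std_di += (d - d_bar) ** 2          # squared deviation from the mean
    std_di = sqrt(std_di / (n - 1))         # standard deviation of the differences

    sem = std_di / sqrt(n)                  # standard error of the mean difference
    t_value = d_bar / sem
    df = n - 1
    p_value = 2 * t_dist.sf(abs(t_value), df)   # two-tailed p-value

    return {"T-value": t_value, "P-value": p_value, "Degrees of freedom": df}
```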

The complete script, with an example how to use it, can be found here.

Flanker task in Expyriment

In this part of the post we are going to create the Flanker task using a  Python library called Expyriment (Krause & Lindemann, 2014).

First, we import expyriment.

We continue with creating variables that contain basic settings of our Flanker task. As can be seen in the code below we are going to have 4 trials per block, 6 blocks, and durations of 2000ms. Our flanker stimuli are stored in a list and we have some task instructions (note “\n” is the newline character and “\” just tells the Python interpreter that the string continues on the next line).
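The original code block is not shown here; a sketch with my own (hypothetical) variable names could look like this:

```python
import expyriment

# Basic settings (names are my own; the original script may differ)
n_trials_block = 4        # trials per block
n_blocks = 6              # number of blocks
durations = 2000          # fixation/stimulus duration in ms

flanker_stimuli = ["<<<<<", ">>>>>", "<<><<", ">><>>"]

instructions = "Respond to the direction of the MIDDLE arrow.\n\
Press 'x' if it points left and 'm' if it points right.\n\
Press the spacebar to start."
```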

It may be worth pointing out that most Python libraries and modules have a set of classes, and the classes contain a set of methods. So what is a “class” and what is a “method”? Essentially, a class is a template for creating an object. An object can be said to be a “storage” of both variables and functions. Returning to our example, we now create the Experiment object. This object will, for now, contain only the task name (“Flanker Task”). The last line of the code block uses a method to initialise our object (i.e., our experiment).
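A minimal sketch of that step, assuming the object is called experiment (as it is in the prose further below):

```python
# Create the Experiment object and initialise it
experiment = expyriment.design.Experiment(name="Flanker Task")
expyriment.control.initialize(experiment)
```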

We now carry on with the design of our experiment. First, we start with a for loop. In the loop we go from the first block to the last. Each block is created and temporarily stored in the variable temp_block.

Next we are going to create our trials for each block. First, in the loop we create a stimulus. Here we use the list created previously (i.e., flanker_stimuli). We can obtain one object (e.g., “<<<<<“) from the list by using the trial number as the index (there are 4 stimuli in the list and 4 trials per block). Remember, in our loop each trial will be a number from 0 to n - 1 (where n is the number of trials). After a stimulus is created we create a trial and add the stimulus to the trial.

Since the flanker task can have both congruent (e.g., “<<<<<“) and incongruent trials (“<<><<“) we want to store this. The conditional statement (“if”) just checks whether there are as many of the first object in the list (e.g., “<“) as the length of the list. Note, count is a method of the list type object and counts the occurrences of something in the list. If the length and the number of arrows are the same the trial type is congruent:

Next we need to create the response mapping. In the tutorial example we are going to use the keys x and m as response keys. In Expyriment, all character keys are represented as numbers. At the end of the code block we add the congruent/incongruent and response mapping information to our trial, which, finally, is added to our block.

At the end of the block loop we use the method shuffle_trials to randomise our trials and the block is, finally, added to our experiment.
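Taken together, the design loop described in the last few paragraphs might be sketched as follows. The text size and the mapping of left-pointing targets to 'x' and right-pointing targets to 'm' are my assumptions, not necessarily the original choices.

```python
for block_nr in range(n_blocks):
    temp_block = expyriment.design.Block(name=str(block_nr + 1))

    for trial_nr in range(n_trials_block):
        # One arrow string per trial, using the trial number as the index
        curr_stim = flanker_stimuli[trial_nr]
        stim = expyriment.stimuli.TextLine(text=curr_stim, text_size=40)

        trial = expyriment.design.Trial()
        trial.add_stimulus(stim)

        # Congruent if every arrow equals the first one
        if curr_stim.count(curr_stim[0]) == len(curr_stim):
            trialtype = "congruent"
        else:
            trialtype = "incongruent"

        # Response mapping: 'x' for a left-pointing target, 'm' for right-pointing
        if curr_stim[2] == "<":
            correctresp = expyriment.misc.constants.K_x
        else:
            correctresp = expyriment.misc.constants.K_m

        trial.set_factor("trialtype", trialtype)
        trial.set_factor("correctresp", correctresp)
        temp_block.add_trial(trial)

    temp_block.shuffle_trials()
    experiment.add_block(temp_block)
```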

Our design is now finalised. Expyriment will also save our data (lucky us, right?!), so we need to specify the column names for the data files. Expyriment also has a stimulus class (FixCross) for creating a fixation cross, and we want one!
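A sketch of this step, using the column order given later in the post:

```python
# Column names for the data files, and a fixation cross
experiment.add_data_variable_names(["block", "correctresp", "response",
                                    "trial", "RT", "accuracy", "trialtype"])
fixation_cross = expyriment.stimuli.FixCross()
```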

We are now ready to start our experiment and present the task instructions on the screen. The last line makes the task wait for the spacebar to be pressed.
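Something along these lines:

```python
# Start the experiment, present the instructions, and wait for the spacebar
expyriment.control.start(experiment)
expyriment.stimuli.TextScreen("Flanker task", instructions).present()
experiment.keyboard.wait(expyriment.misc.constants.K_SPACE)
```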

The subjects will be prompted with this text:

Expyriment task instructions for the Flanker task


After the spacebar is pressed the task starts. It starts with the trials in the first block, of course. In each trial the stimulus is preloaded, a fixation cross is presented for 2000ms (experiment.clock.wait(durations)), and then the flanker stimuli are presented.

Fixation cross is first presented for 2000ms followed by flanker stimuli (2000ms).


The next line to be executed is line 52; the code on that line resets a timer so that we can use it later. On line 54 we get the response (key) and RT using the keyboard class and its wait method. We use the arguments keys (K_x and K_m are our keys, remember) and duration (2000 ms). We then use the clock and subtract the time elapsed since the reset from durations (line 57). This has to be done because the program continues as soon as the subject presses a key (i.e., “x” or “m”), and the next trial would otherwise start immediately after the key press.

Accuracy is determined using the if and else statements. That is, the actual response is compared to the correct response. After the accuracy has been determined, we add the variables in the order we previously created them (i.e., “block”, “correctresp”, “response”, “trial”, “RT”, “accuracy”, “trialtype”).
Finally, when the 4 trials of a block have been run, we implement a short break (i.e., 3000 ms) and present some text notifying the participant.
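A sketch of the presentation loop described above (the line numbers 52, 54, and 57 refer to the original script and will not match this reconstruction; the break text is my own):

```python
for block in experiment.blocks:
    for trial in block.trials:
        trial.stimuli[0].preload()

        # Fixation cross for 2000 ms, then the flanker stimulus
        fixation_cross.present()
        experiment.clock.wait(durations)
        trial.stimuli[0].present()

        # Reset the stopwatch, then wait (at most 2000 ms) for 'x' or 'm'
        experiment.clock.reset_stopwatch()
        key, rt = experiment.keyboard.wait(
            keys=[expyriment.misc.constants.K_x, expyriment.misc.constants.K_m],
            duration=durations)

        # Keep trial length constant: wait out whatever is left of the 2000 ms
        experiment.clock.wait(durations - experiment.clock.stopwatch_time)

        # Accuracy: compare the actual response to the correct response
        if key == trial.get_factor("correctresp"):
            accuracy = 1
        else:
            accuracy = 0

        # Save the data in the order the variable names were created
        experiment.data.add([block.name, trial.get_factor("correctresp"), key,
                             trial.id, rt, accuracy,
                             trial.get_factor("trialtype")])

    # Short break between blocks
    expyriment.stimuli.TextScreen("Break", "Take a short break.").present()
    experiment.clock.wait(3000)
```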

The experiment ends by thanking the participants for their contribution:
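For instance (the goodbye text is my own):

```python
# End the experiment and thank the participant
expyriment.control.end(goodbye_text="Thank you for participating!",
                       goodbye_delay=2000)
```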

A recording of the task can be seen in this video:

That was how to create a Flanker task using Expyriment. For a better overview of the script as a whole see this GitHub gist. Documentation of Expyriment can be found here: Expyriment docs. To run a Python script you can open up the command prompt and change to the directory where the script is (using the command cd):

Running command prompt to execute the Flanker Script.

Data processing and analysis

Assume that we have collected data using the Flanker task and now want to analyse it. Expyriment saves the data of each subject in a file with the file ending “.xpd”. Conveniently, the library also comes packed with methods that enable us to preprocess our data.

We are going to create a comma-separated values file (.csv) that we will later use to visualise and analyse our data. Let’s create a script called “”. First, we import a module called os, which lets us find the current directory (os.getcwd()); by using os.sep we make our script compatible with Windows, Linux, and OS X. The variable datafolder stores the path to the data. In the last line, we use data_preprocessing to write a .csv file (“flanker_data.csv”) from the files starting with the name “flanker” in our data folder. Note, the Python script needs to be run in the same directory as the ‘data’ folder. Another option is to change the datafolder variable (e.g., datafolder = ‘path_to_where_the_data_is’).
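A sketch of such a preprocessing script, assuming the write_concatenated_data helper in expyriment.misc.data_preprocessing:

```python
import os
from expyriment.misc import data_preprocessing

# Path to the 'data' folder that Expyriment created next to this script
datafolder = os.getcwd() + os.sep + "data" + os.sep

# Concatenate all data files whose names start with "flanker" into one .csv file
data_preprocessing.write_concatenated_data(datafolder, "flanker",
                                           output_file="flanker_data.csv")
```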

Descriptive statistics and visualising

Each subject’s data files are now put together in “flanker_data.csv” and we can start our analyses. Here we are going to use the libraries Pandas and Seaborn. Pandas is very handy for creating data structures; it makes working with our data much easier. In the code block below, we import Pandas as pd and Seaborn as sns, which makes using them a bit easier. The third line makes our plots white and without a grid.
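For example:

```python
import pandas as pd
import seaborn as sns

# White plot background without a grid
sns.set_style("white")
```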

Now we can read our csv file (‘flanker_data.csv’). When reading in our data we need to skip the first row (“# -*- coding: UTF-8 -*-” is of no use to us!):

Concatenated data file (.csv)

Reading in data from the data file and skipping the first row:
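A sketch of that call, assuming we store the data in a DataFrame called df (the name is illustrative):

```python
# Skip the first row (the "# -*- coding: UTF-8 -*-" comment) of the .csv file
df = pd.read_csv('flanker_data.csv', skiprows=1)
```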

Pandas makes descriptive statistics quite easy as well. Since we are interested in the two types of trials, we group them. For this example, we are only going to look at the RTs:
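Assuming the columns are named 'trialtype' and 'RT', as in the task script above, the grouping and description might look like this:

```python
# Descriptive statistics of RT per trial type
df.groupby('trialtype')['RT'].describe()
```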

trialtype      count  mean        std        min    25%     50%    75%    max
congruent      360    560.525000  36.765310  451.0  534.75  561.0  584.0  658.0
incongruent    360    642.088889  55.847114  488.0  606.75  639.5  680.0  820.0

One way to obtain quite a lot of information on our two trial types and RTs is to do a violin plot:
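A sketch of the plotting call (again assuming the df, 'trialtype', and 'RT' names from above):

```python
import matplotlib.pyplot as plt

sns.violinplot(x='trialtype', y='RT', data=df)
plt.show()
```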

Violin plot of RT in the incongruent and congruent trials.

Testing our hypothesis

Just a brief reminder: we are interested in whether people can suppress irrelevant information (i.e., the flankers pointing in a different direction than the target). We use a paired-samples t-test to see whether the difference in RT between incongruent and congruent trials is different from zero.

First, we need to aggregate the data, and we start by grouping our data by trial type and subject number. We can then get the mean RT for the two trial types:
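A sketch of the aggregation step; the name of the subject column ('subject_id') is an assumption about what Expyriment writes to the data file:

```python
# Mean RT per trial type and subject
agg = df.groupby(['trialtype', 'subject_id'])['RT'].mean()
```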

Next, we are going to take the RTs (values, in the script) and assign them to x and y. Remember, the t-test function we started off with takes two lists containing the data. We then call the function, which returns the statistics we need (i.e., t-value, p-value, and degrees of freedom).
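The original post calls its own t-test function, defined at the start of the tutorial and not reproduced here; as a stand-in, the sketch below computes the same paired-samples t-test with scipy.stats.ttest_rel:

```python
from scipy import stats

x = agg.loc['incongruent'].values   # mean RT per subject, incongruent trials
y = agg.loc['congruent'].values     # mean RT per subject, congruent trials

t, p = stats.ttest_rel(x, y)
t_value = {'t': t, 'p': p, 'df': len(x) - 1}
```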

Finally, before printing the results we may want to round the values. We use a for loop over each key and value in our dictionary (i.e., t_value) and, on line 7, round the numbers.
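A sketch of that loop, assuming t_value is a dictionary holding the t-value, p-value, and degrees of freedom as above:

```python
for key, value in t_value.items():
    t_value[key] = round(value, 3)

print(t_value)
```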

Printing the variable t_value (line 8 above) renders the following output:

We can conclude that there was a significant difference in the RT for incongruent (M =  642.08, SD = 55.85) and congruent (M = 560.53, SD = 36.52) trials; t(29) = 27.358, p < .001.

That was how to use Python all the way from data collection to data analysis. If you want to play around with the scripts, data files for 30 simulated subjects can be downloaded here. All the scripts described above, as well as the script to simulate the subjects (i.e., run the task automatically), can be found in this GitHub Gist. Feel free to use the Flanker task above. If you do, I would suggest that you add a couple of practice trials.


As previously mentioned, the Python community is large and helpful, so there are many resources to turn to, both for learning Python and for finding help. That also means it can be hard to know where to start. Therefore, the end of this post contains a few of the Python resources I have found useful or interesting. All resources listed below are free.

Learning Python

Python in Psychology:

Python distributions

If you think that installing Python packages seems complicated and time-consuming, there are a number of distributions that aim to simplify package management. That is, when you install one of them you get many of the packages that you would otherwise have to install one by one. There are many distributions (see here), but I have personally used Anaconda and Python(x, y).

Data Collection

  • PsychoPy (Peirce, 2007) – offers both a GUI and an API for programming your experiments. You will find some learning/teaching resources on its homepage.
  • Expyriment (Krause & Lindemann, 2014) – the library used in the tutorial above.
  • OpenSesame (Mathôt, Schreij, & Theeuwes, 2012) – offers both Python scripting (mainly inline scripts) and a GUI for building your experiments. You will find examples and tutorials on OpenSesame’s homepage.
  • PyGaze (Dalmaijer, Mathôt, & Van der Stigchel, 2014) – a toolbox for eye-tracking data and experiments.

Data analysis

  • Pandas – Python data analysis (descriptive, mainly) toolkit
  • Statsmodels – Python library enabling many common statistical methods
  • pypsignifit – Python toolbox for fitting psychometric functions (Psychophysics)
  • MNE – For processing and analysis of electroencephalography (EEG) and magnetoencephalography (MEG) data

Getting help

  • Stackoverflow – On Stackoverflow you can ask and answer questions about virtually any programming language; questions are tagged with the relevant language. Some of the developers of PsychoPy are also active there, and you can tag your questions with PsychoPy.
  • User groups for PsychoPy and Expyriment can be found on Google Groups.
  • OpenSesame Forum – e.g., the subforums for PyGaze and, most importantly, Expyriment.

That was it; I hope you have found my post valuable. If you have any questions you can leave a comment here or on my homepage, or email me.


Colcombe, S. J., Kramer, A. F., Erickson, K. I., & Scalf, P. (2005). The implications of cortical recruitment and brain morphology for individual differences in inhibitory function in aging humans. Psychology and Aging, 20(3), 363–375.

Dalmaijer, E. S., Mathôt, S., & Van der Stigchel, S. (2014). PyGaze: An open-source, cross-platform toolbox for minimal-effort programming of eyetracking experiments. Behavior Research Methods, 46(4), 913–921. doi:10.3758/s13428-013-0422-2

Eriksen, B. A., & Eriksen, C. W. (1974). Effects of noise letters upon the identification of a target letter in a nonsearch task. Perception & Psychophysics, 16(1), 143–149. doi:10.3758/BF03203267

Krause, F., & Lindemann, O. (2014). Expyriment: A Python library for cognitive and neuroscientific experiments. Behavior Research Methods, 46(2), 416-428.

Mathôt, S., Schreij, D., & Theeuwes, J. (2012). OpenSesame: An open-source, graphical experiment builder for the social sciences. Behavior Research Methods, 44(2), 314–324.

Peirce, J. W. (2007). PsychoPy-Psychophysics software in Python. Journal of Neuroscience Methods, 162(1-2), 8–13.


The Statistics Hell has expanded: An interview with Prof. Andy Field

Does the mention of the word “statistics” strike fear into your heart and send shivers down your spine? Does the results section of your thesis seem like that dark place one should avoid at all costs? Heteroscedasticity gives you nightmares? You dread having to explain to someone what degrees of freedom are? What is the point of using ANOVA if we can do a series of t-tests? If any of these remind you of the pain of understanding statistics, or the dread of how much more lies ahead during your studies, when all you really want is someone to explain it in a humanly understandable way—look no further. Quite a few fellow students might tell you: “You should go and look at Andy Field’s books. Now, at least, I understand stats.” His “Discovering statistics using …” series is a gentle, student-friendly introduction to statistics. Principles are introduced at a slow pace, with plenty of worked examples, so that anyone with basic maths skills will be able to digest it. Now add a lens of humor and sarcasm that will have you giggling about statistics in no time!

There is a new book!

As JEPS has been excited about introducing Bayesian statistics into the lives of more psychology students (see here, here, and here for introductions, and here for software to play around with the Bayesian approach), the idea of a new book by Andy Field—whose work many of us love and wholeheartedly recommend—which incorporates this amazing approach was thrilling news.

We used this occasion to talk to Andy Field—who is he, what motivates him, and what are his thoughts on the future of psychology?

With your new book, you expand the Statistics hell with Bayesian statistics. Why is this good news for students?


There has, for a long time, been an awareness that the traditional method of testing hypotheses (null hypothesis significance testing, NHST) has its limitations. Some of these limitations are fundamental, whereas others are more about how people apply the method rather too blindly. Bayesian approaches offer an alternative, and arguably, more logical way to look at estimation and hypothesis testing. It is not without its own critics though, and it has its own set of different issues to consider. However, it is clear that there is a groundswell of support for Bayesian approaches, and that people are going to see these methods applied more and more in scientific papers. The problem is that Bayesian methods can be quite technical, and a lot of books and papers are fairly impenetrable. It can be quite hard to make the switch (or even understand what switch you would be making).

My new book essentially tries to lay some very basic foundations. It’s not a book about Bayesian statistics, it’s a book about analysing data and fitting models, and I explain both the widely used classical methods and also some basic Bayesian alternatives (primarily Bayes factors). The world is not going to go Bayesian overnight, so what I’m trying to do is to provide a book that covers the material that lecturers and undergraduates want covered, but also encourages them to think about the limitations of those approaches and the alternatives available to them. Hopefully, readers will have their interest piqued enough to develop their understanding by reading more specifically Bayesian books. To answer the question then, there are two reasons why introducing Bayesian approaches is a good thing for students: (1) it will help them to understand more what options are available to them when they analyse data; and (2) published research will increasingly use Bayesian methods so it will help them to make sense of what other scientists are doing with their data.

Your books are the savior for many not-so-technical psychology students. How did you first come up with writing your classic ‘Discovering Statistics with ….’ book?

Like many PhD students I was teaching statistics and SPSS to fund my PhD. I used to enjoy the challenge of trying to come up with engaging examples, and generally being a bit silly/off the wall. The student feedback was always good, and at the time I had a lot of freedom to produce my own teaching materials. At around that time, a friend-of-a-friend, Dan Wright (a cognitive psychologist who was at the time doing a postdoc at City University in London), was good friends with Ziyad Marar, who now heads the SAGE Publications London office but at the time was a commissioning editor. Dan had just published a stats book with SAGE and Ziyad had commissioned him to help SAGE to find new authors. I was chatting to Dan during a visit to City University, and got onto the subject of me teaching SPSS and my teaching materials and whatever, and he said ‘Have you ever thought of turning those into a book?’ Of course I hadn’t, because books seemed like things that ‘proper’ academics did, not me. Subsequently Dan introduced me to Ziyad, who wanted to sign me up to do the book. I was in such a state of disbelief that anyone would want to publish a book written by me that I blindly agreed. The rest is history!

As an aside, I started writing it before completing my PhD although most of it was done afterwards, and I went so over the word limit that SAGE requested that I do the typesetting myself because (1) they didn’t think it would sell much (a reasonable assumption given I was a first-time author); and (2) this would save a lot of production costs. Essentially they were trying to cut their losses (and on the flip side, this also allowed me to keep the book as it was and not have to edit it to half the size!). It is a constant source of amusement to us all how much we thought the book would be a massive failure! I guess the summary is, it happened through a lot of serendipitous events. There was no master plan. I just wrote from the heart and hoped for the best, which is pretty much what I’ve done ever since.

Questionable research practices and specifically misuse of statistical methods has been a hot topic in the last years. In your opinion, what are the critical measures that have to be taken in order to improve the situation?

Three things spring immediately to mind: (1) taking the analysis away from the researcher; (2) changing the incentive structures; (3) a shift towards estimation. I’ll elaborate on these in turn.

Psychology is a very peculiar science. It’s hard to think of many other disciplines where you are expected to be an expert theoretician in a research area and also a high-level data analyst with a detailed understanding of complex statistical models. It’s bizarre really. The average medic, for example, when doing a piece of research will get expert advice from a trials unit on planning, measurement, randomization and once the data are in they’ll be sent to the biostats unit to fit the models. In other words, they are not expected to be an expert in everything: expertise is pooled. One thing, then, that I think would help is if psychologists didn’t analyse their own data but instead they were sent to a stats expert with no vested interest in the results. That way data processing and analysis could be entirely objective.

The other thing I would immediately change in academia is the incentive structures. They are completely ****** up. The whole ‘publish or perish’ mentality does nothing but harm science and waste public money. The first thing it does is create massive incentives to publish anything regardless of how interesting it is, but it also incentivises ‘significance’ because journals are far more likely to publish significant results. It also encourages (especially in junior scientists) quantity over quality, and it fosters individual rather than collective motivations. For example, promotions are all about the individual demonstrating excellence rather than them demonstrating a contribution to a collective excellence. To give an example, in my research area of child anxiety I frequently have the experience that I disappear for a while to write a stats book and completely ignore child anxiety research for, say, 6 months. When I come back and try to catch up on the state of the art, hundreds, possibly thousands, of new papers have come out, mostly small variations on a theme, often spread across multiple publications. The signal-to-noise ratio is absolutely suffocating. My feeling on whether anything profound has changed in my 6 months out of the loop is ‘absolutely not’, despite several hundred new papers. Think of the collective waste of time, money and effort to achieve ‘absolutely not’. It’s good science done by extremely clever people, but everything is so piecemeal that you can’t see the wood for the trees. The meaningful contributions are lost. Of course I understand that science progresses in small steps, but it has become ridiculous, and I believe that the incentive structures mean that many researchers prioritise personal gain over science. Researchers are, of course, doing what their universities expect them to do, but I can’t help but feel that psychological science would benefit from people doing fewer studies in bigger teams to address larger questions. Even at a very basic level this would mean that sample sizes would increase dramatically in psychology (which would be a wholly good thing). For this to happen, the incentive structures need to change. Value should be maximised for working in large teams, on big problems, and for saving up results to publish in more substantial papers; contributions to grants and papers should also become more balanced regardless of whether you’re first author, last author or part of a team of 30 authors.

From a statistical point of view we have to shift away from ‘all or nothing thinking’ towards estimation. From the point of view of publishing science, a reviewer should ask three questions: (1) is the research answering an interesting question that genuinely advances our knowledge?; (2) was it well conducted to address the question being asked – i.e. does it meet the necessary methodological standards?; and (3) what do the estimates of the effects in the model tell us about the question being asked? If we strive to answer bigger questions in larger samples then p-values really become completely irrelevant (I actually think they’re almost irrelevant anyway, but …). Pre-registration of studies helps a lot because it forces journals to address the first two questions when deciding whether to publish, but it also helps with question 3 because, by making the significance of the estimates irrelevant to the decision to publish, it frees the authors to focus on estimation rather than p-values. There are differing views of course on how to estimate (classical vs. Bayes, confidence intervals vs. credibility intervals, etc.) but at heart, I think a shift from p-values to estimation can only be a good thing.

At JEPS we are offering students experience in scientific publishing at an early stage of their career. What could be done at universities to make students acquainted with the scientific community already during their bachelor- or master studies?

I think that psychology, as a discipline, embeds training in academic publishing within degree and PhD programs through research dissertations and the like (although note my earlier comments about the proliferation of research papers!). Nowadays though scientists are expected to engage with many different audiences through blogs, the media and so on, we could probably do more to prepare students for that by incorporating assignments into degrees that are based on public engagement. (In fact, at Sussex – and I’m sure elsewhere –  we do have these sorts of assignments).

Statistics is the predominant modeling language in almost any science, and therefore sufficient knowledge about it is a prerequisite for doing any empirical work. Despite this fact, why do you think so many psychology students are reluctant to learn statistics? What could be done in education to change this attitude? How to keep it entertaining while still getting stuff done?

This really goes back to my earlier question of whether we should expect researchers to be data analysis experts. Perhaps we shouldn’t, although if we went down the route of outsourcing data analysis then a basic understanding of processing data and the types of models that can be fit would help statisticians to communicate what they have done and why.

There are lots of barriers to learning statistics. Of course anxiety is a big one, but it’s also just a very different thing to psychology. It’s a bit like putting a geography module in an English literature degree and then asking ‘why aren’t the students interested in geography?’. The answer is simple: it’s not English literature, it’s not what they want to study. It’s the same deal. People doing a psychology degree are interested in psychology; if they were interested in data they’d have chosen a maths or stats degree. The challenge is trying to help students to realize that statistical knowledge gives you the power to answer interesting questions. It’s a tool, not just in research, but in making sense of an increasingly data-driven world. Numeracy, and statistics in particular, have never been more important than they are now because of the ease with which data can be collected and, therefore, the proliferation of contexts in which data is used to communicate a message to the public.

In terms of breaking down those barriers I feel strongly that teaching should be about making your own mark. What I do is not ‘correct’ (and some students hate my teaching) it’s just what works for me and my personality. In my previous books I’ve tried to use memorable examples, use humour, and I tend to have a naturally chatty writing style. In the new book I have embedded all of the academic content into a fictional story. I’m hoping that the story will be good enough to hook people in and they’ll learn statistics almost as a by-product of reading the story. Essentially they share a journey with the main character in which he keeps having to learn about statistics. I’m hoping that if the reader invests emotionally in that character then it will help them to stay invested in his journey and invested in learning. The whole enterprise is a massive gamble, I have no idea whether it will work, but as I said before I write from my heart and hope for the best!

Incidentally if you want to know more about the book and the process of creating it, see

What was your inspiration for the examples in the book? How did you come up with Satan’s little SPSS helper and other characters? How did you become the gatekeeper of the statistics hell?


The statistics hell thing comes from the fact that I listen to a lot of heavy metal music and many bands have satanic imagery. Of course, in most cases it’s just shock tactics rather than reflecting a real philosophical position, but I guess I have become a bit habituated to it. Anyway, when I designed my website (which desperately needs an overhaul incidentally) I just thought it would be amusing to poke fun at the common notion that ‘statistics is hell’. It’s supposed to be tongue-in-cheek.

As for characters in the SPSS/R/SAS book, they come from random places really. Mostly the reasons are silly and not very interesting. A few examples: the cat is simply there to look like my own cat (who is 20 now!); the Satan’s slave was because I wanted to have something with the acronym SPSS (Satan’s Personal Statistics Slave); and Oliver Twisted flags additional content, so I wanted to use the phrase ‘Please sir! Can I have some more …’ like the character Oliver Twist in the Dickens novel. Once I knew that, it was just a matter of making him unhinged.

The new book, of course, is much more complicated because it is a fictional story with numerous characters with different appearances and personalities. I have basically written a novel and a statistics textbook and merged the two. Therefore, each character is a lot deeper than the faces in the SPSS book – they have personalities, histories, emotions. Consequently, they have very different influences. Then, as well as the characters, the storyline and the fictional world in which the story is set were influenced by all sorts of things. I could write you a thesis on it! In fact, I have a file on my hard drive of ‘bits of trivia’ about the new book where I kept notes on why I did certain things, where names or personalities came from, who influenced the appearance of characters or objects, and so on. If the book becomes a hit then come back to me and ask what influenced specific things in the book and I can probably tell you! I also think it’s nice to have some mystery and not give away too much about why the book turned out the way it did!

If you could answer any research question, what would it be?

I’d like to discover some way to make humans more tolerant of each other and of different points of view, but possibly even more than that I’d like to discover a way that people could remain at a certain age until they felt it was time to die. Mortality is the cloud over everyone’s head, but I think immortality would probably be a curse because I think you get worn down by the changing world around you. I like to think that there’s a point where you feel that you’ve done what you wanted to do and you’re ready to go. I’d invent something that allows you to do that – just stay physically at an age you liked being, and go on until you’ve had enough. There is nothing more tragic than a life ended early, so I’d stop that.

Thank you for taking the time for this interview and sharing your insights with us. We have one last question: On a 7-point Likert scale, how much do you like 7-point Likert scale?

It depends which way around the extremes are labelled …. ;-)


For more information on ‘An adventure in statistics: the reality enigma’ see:





Editor’s Pick: Our favorite MOOCs

There used to be a time when students could attend classes at their university or in their vicinity – and that was it. Lately, this geospatial restriction has vanished with the introduction of massive open online courses (MOOCs). This format of online course is part of the “open education” idea, offering everyone with an internet connection an opportunity to participate in various courses, presented by more and less well-known institutions and universities. The concept is more or less the same for all courses: anyone can join, and lectures are available as videos and lecture notes. During the course, whether it is fixed-date or self-paced (meaning you decide when to complete tasks), you will need to take quizzes and exams and/or complete written projects if you wish to finish the course. In less than 10 years, this idea has grown to include millions of users, hundreds of countries, and more than a dozen universities around the world, and it continues to grow.

A few years back, most courses were free and offered certificates as a reward for course completion. Nowadays, you can participate in most courses offered, but if you wish to get a certificate, there is a fee. As with university courses, professors or assistants are available for your questions and there is a forum for interacting with other people enrolled. In case you aren’t confident you will be able to fully understand a course in English, some of the popular courses come with subtitles. If you fall in love with the format and would like to contribute, Coursera offers you the possibility of becoming a translator.

Lifelong learning is the norm nowadays. By taking a MOOC, you can gain new skills and knowledge in any area of interest or keep up with the latest trends in your field. In case you are considering a change in your career or are going to start university soon, it is a nice way to sneak a peek into what the topic entails, with all the time flexibility you’d like to have and from the comfort of wherever you are.

The following courses are grouped into categories, from general introductions to specific topics that enhance your methodological toolbox. Apart from the courses the JEPS team can personally recommend, you can find a list of currently available MOOCs on
Introduction to psychology – University of Toronto
If you are considering studying psychology or are just interested in psychology in general and are looking for a nice and comprehensive introduction, this course is yours. It covers all topics and gives you a good overview of how psychology came to be, what fields it covers, and a student favorite—mental illness. The lectures are easy to follow, cover the main topics any good textbook would cover in a more interactive and interesting way, and include the most famous experiments in psychology.

Writing in the Sciences – Stanford University
A truly excellent course that starts by explaining how to improve punctuation, sentences, and paragraphs to communicate ideas as clearly as possible. It also offers incredibly helpful models for how to structure your research paper. The course makes extensive use of examples so that you can apply the techniques immediately to your own work. This course will change how you write your thesis!

Understanding the Brain: The Neurobiology of Everyday Life – University of Chicago
The brain is a complex system and its neurobiology is no exception. This course takes you through all the important parts of the nervous system (beyond the brain itself) involved in our everyday functioning. Each lecture includes well-explained theory and physiology behind the topic at hand, accompanied by very interesting examples and real-life cases to give you a better understanding. Highly recommended is the lecture on strokes – from their origins and what happens to the brain during one, to the consequences for a person’s functioning.

The Brain and Space – Duke University
If you have ever wondered how our brain perceives the space around us and integrates the input we get from our senses into a bigger picture, this course will give you a very detailed image of this complex phenomenon. Even though a general understanding of neuroscience and perception is recommended, the material can be understood with some help from Wikipedia for any unknown concepts. Everything you wanted to know about vision, spatial orientation, and perception in general is here.

Programming for Everybody (Getting started with Python) – University of Michigan
The first part of a five-part course on Python programming, this is a very nice and slow-paced introduction to the world of programming. As no previous knowledge is required, everything is explained in an easily understandable manner with a lot of examples. The shining star of this course is the professor himself, whose funny remarks make the daunting task of writing code a fun experience. In case of any doubts, there is a big and very active community on the forum ready to help at any moment.

Machine learning – Stanford University
A great introductory course in machine learning. It starts with linear regression and moves on to more advanced topics such as model selection, neural networks, support vector machines, and large-scale machine learning. The course gives both a first overview of the field and teaches you hands-on machine learning skills you can immediately apply to your research!

Calculus single variable (Five-part course) – University of Pennsylvania
Most probably the best calculus course in the world. It only requires high-school math knowledge and from there builds up a deep understanding of calculus using fantastic graphics and many intuitive examples. A challenging course that is worth every minute spent on it!

Introduction to Neuroeconomics: How the Brain Makes Decisions – Higher School of Economics
As neuroeconomics and psychology have been gaining a lot of attention recently, this course gives a comprehensive overview of the foundations of this hot new field and its research. As the course is highly interdisciplinary, expect to learn how neuroanatomy, psychological processes, and principles of economics merge into one theory of decision-making. From bees, monkeys, and game theory, to why we dislike losing above all, and group dynamics – this course covers it all.

Statistical Learning – Stanford University
An outstanding statistics course taught by two of the world’s most famous statisticians, Trevor Hastie and Rob Tibshirani. They present tough statistical concepts in an incredibly intuitive manner and provide an R lab after each topic to make sure that you are able to apply your new knowledge immediately. They provide both of their textbooks for free download, one heavier on the math, the other more applied.

The Addicted Brain – Emory University
Navigating the modern world includes being exposed to (mis)information about various psychoactive substances. Since information backed by scientific research is more solid and less biased, this is a good place to learn about the topic. The course goes through all major addictive substances: from legal ones like alcohol, nicotine, and caffeine, through medication, to illegal substances, along with the ways in which they change the brain and affect behavior. Lastly, two lectures cover the risks of addiction along with treatments and recent policy developments.

Drugs and the Brain – CALTECH
Building on the basics of “The Addicted Brain” (I suggest taking that one prior to this one), this course goes more in depth into what happens on a molecular level in the brain the moment a drug is taken. A big part of the course requires learning the principles of psychopharmacology, which I would wholeheartedly recommend for anyone who either wants to be a clinical psychologist or is interested in how drugs for various psychiatric diagnoses work. The course goes beyond the scope of the more basic course mentioned above by covering neurodegenerative diseases we often hear about but aren’t really sure what they entail, along with serious headaches and migraines.

Let us know if you found this helpful or if you have any tips. Maybe you’ll find some inspiration to take a course yourself while browsing the ones we have mentioned. If you have a suggestion or previous experience with this, feel free to comment below!


JEPS introduces Registered Reports: Here is how it works

For more than six years, JEPS has been publishing student research, both in the form of classic Research Articles and Literature Reviews. As of April 2016, JEPS offers another publishing format: Registered Reports. In this blog post we explain what Registered Reports are, why they could be interesting for you as a student, and how the review process works.

What are Registered Reports?

Registered Reports are a new form of research article in which the editorial decision is based on peer review that takes place before data collection. The review process is thereby divided into two stages: first, your research question and methodology are evaluated while the data are yet to be collected. If your Registered Report receives in-principle acceptance, you are guaranteed to get your final manuscript published once the data are collected – irrespective of your findings. The second stage of the review process then only consists of checking whether you stuck to the methodology you proposed in the Registered Report.

The format of Registered Reports alleviates many problems associated with the current publishing culture, such as publication bias (see also our previous post): for instance, the decision whether the manuscript gets published is independent of the outcome of statistical tests, and therefore publication bias is ruled out. Also, you have to stick to the hypotheses and methodology in your Registered Report, and therefore a clear line between exploratory and confirmatory research is maintained.

How does the review process work exactly?

You submit a manuscript consisting of the motivation (introduction) of your research and a detailed description of your hypotheses and of the methodology and analysis you intend to use to investigate them. Your research plan will then be reviewed by at least two researchers who are experts in your field of psychology.

Registered Reports Pipeline

Reviewers might ask for revisions of your proposed methodology or analysis. Once all reviewer concerns have been sufficiently addressed, the Registered Report is accepted. This means that you can now collect your data and, if you don’t make important changes to your hypotheses and methodology, you are guaranteed publication of your final manuscript, in a format very similar to our Research Articles. Any changes have to be clearly indicated as such and will be examined in the second stage of the review process.


Why are Registered Reports interesting for you as a student?

First, you get feedback about your project from experts in your field of psychology. It is very likely that this feedback will make your research stronger and improve your design. This avoids the situation where you have collected your data but then realize during the review process that your methodology is not watertight. Registered Reports therefore offer you the chance to rule out methodological problems before collecting the data, possibly saving a lot of headaches afterwards, while also having your publication assured.

Second, it takes away the pressure to get “good results”, as your work is published regardless of the outcome of your analysis. Further, the fact that your methodology was reviewed before data collection allows null results to be given more weight. Normally, Registered Reports also include control conditions that help in interpreting any (null) results.

Lastly, Registered Reports enable you to be open and transparent about your scientific practices. When your work is published as a Registered Report, there is a clear separation between confirmatory and exploratory data analysis. While you can change your analysis after your data collection is completed, you have to declare and explain the changes. This adds credibility to the conclusions of your paper and increases the likelihood that future research can build on your work.

And lastly, some practical points

Before you submit, you therefore need to think about, in detail, the research question you want to investigate and how you plan to analyse your data. This includes a description of your procedures in sufficient detail that others can replicate them, a description of your proposed sample, a definition of exclusion criteria, a plan of your analysis (incl. pre-processing steps), and, if you want to do null hypothesis significance testing, a power analysis.

Further, you can withdraw your study at any point – however, when this happens after in-principle acceptance, many journals will publish your work in a special section of the journal called “Withdrawn Reports”. The great thing is that a null result need not dishearten you – if you received in-principle acceptance (IPA), your study will still be published – and given that it was pre-registered and pre-peer-reviewed, chances are high that others can build on your null result.

Lastly, you should note that you need not register your work with a journal – you can also register it on the Open Science Framework, for example. In this case, however, your work won’t be reviewed.

Are you as excited about Registered Reports as we are? Are you considering submitting your next project as a Registered Report? Check out our Submission guidelines for further info. Also, please do not hesitate to contact us in case you have any questions!

Suggested Reading

Chambers et al., (2013): Open letter to the Guardian

Gelman & Loken (2013): Garden of forking paths


Replicability and Registered Reports

Last summer saw the publication of a monumental piece of work: the reproducibility project (Open Science Collaboration, 2015). In a huge community effort, over 250 researchers directly replicated 100 experiments initially conducted in 2008. Only 39% of the replications were significant at the 5% level. Average effect size estimates were halved. The study design itself—conducting direct replications on a large scale—as well as its outcome are game-changing to the way we view our discipline, but students might wonder: what game were we playing before, and how did we get here?

In this blog post, I provide a selective account of what has been dubbed the “reproducibility crisis”, discussing its potential causes and possible remedies. Concretely, I will argue that adopting Registered Reports, a new publishing format recently also implemented in JEPS (King et al., 2016; see also here), increases scientific rigor, transparency, and thus replicability of research. Wherever possible, I have linked to additional resources and further reading, which should help you contextualize current developments within psychological science and the social and behavioral sciences more general.

How did we get here?

In 2005, Ioannidis made an intriguing argument: because the prior probability of any hypothesis being true is low, because researchers continuously run low-powered experiments, and because the current publishing system is biased toward significant results, most published research findings are false. Within this context, spectacular fraud cases like Diederik Stapel (see here) and the publication of a curious paper about people “feeling the future” (Bem, 2011) made 2011 a “year of horrors” (Wagenmakers, 2012), and toppled psychology into a “crisis of confidence” (Pashler & Wagenmakers, 2012). As argued below, Stapel and Bem are emblematic of two highly interconnected problems of scientific research in general.

Publication bias

Stapel, who faked results of more than 55 papers, is the reductio ad absurdum of the current “publish or perish” culture[1]. Still, the gold standard to merit publication, certainly in a high impact journal, is p < .05, which results in publication bias (Sterling, 1959) and file-drawers full of nonsignificant results (Rosenthal, 1979; see Lane et al., 2016, for a brave opening; and #BringOutYerNulls). This leads to a biased view of nature, distorting any conclusion we draw from the published literature. In combination with low-powered studies (Cohen, 1962; Button et al., 2013; Fraley & Vazire, 2014), effect size estimates are seriously inflated and can easily point in the wrong direction (Yarkoni, 2009; Gelman & Carlin, 2014). A curious consequence is what Lehrer has titled “the truth wears off” (Lehrer, 2010). Initially high estimates of effect size attenuate over time, until nothing is left of them. Just recently, Kaplan and Irvin reported that the proportion of positive effects in large clinical trials shrank from 57% before 2000 to 8% after 2000 (Kaplan & Irvin, 2015). Even a powerful tool like meta-analysis cannot clear the view of a landscape filled with inflated and biased results (van Elk et al., 2015). For example, while meta-analyses concluded that there is a strong ego-depletion effect of Cohen’s d = .63, recent replications failed to find an effect (Lurquin et al., 2016; Sripada et al., in press)[2].
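To see how publication bias and low power conspire to inflate effect sizes, consider the following small simulation (not from the original post; the numbers are purely illustrative). Only the "significant" studies are kept, and their average effect size overestimates the true effect substantially:

```python
# Illustrative simulation: a small true effect, low-powered studies, and a
# filter that only keeps significant results ("publication").
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_d, n, sims = 0.2, 20, 10000
published = []

for _ in range(sims):
    control = rng.normal(0, 1, n)
    treatment = rng.normal(true_d, 1, n)
    t, p = stats.ttest_ind(treatment, control)
    if p < .05:                     # only significant studies get "published"
        pooled_sd = np.sqrt((control.var(ddof=1) + treatment.var(ddof=1)) / 2)
        published.append((treatment.mean() - control.mean()) / pooled_sd)

print(f"true d = {true_d}, mean published d = {np.mean(published):.2f}")
```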

Garden of forking paths

In 2011, Daryl Bem reported nine experiments on people being able to “feel the future” in the Journal of Personality and Social Psychology, the flagship journal of its field (Bem, 2011). Eight of them yielded statistical significance, p < .05. We could dismissively say that extraordinary claims require extraordinary evidence, and try to sail away as quickly as possible from this research area, but Bem would be quick to steal our thunder.

A recent meta-analysis of 90 experiments on precognition yielded overwhelming evidence in favor of an effect (Bem et al., 2015). Alan Turing, discussing research on psi related phenomena, famously stated that

“These disturbing phenomena seem to deny all our usual scientific ideas. How we should like to discredit them! Unfortunately, the statistical evidence, at least of telepathy, is overwhelming.” (Turing, 1950, p. 453; cf. Wagenmakers et al., 2015)

How is this possible? It’s simple: Not all evidence is created equal. Research on psi provides us with a mirror of “questionable research practices” (John, Loewenstein, & Prelec, 2012) and researchers’ degrees of freedom (Simmons, Nelson, & Simonsohn, 2011), obscuring the evidential value of individual experiments as well as whole research areas[3]. However, it would be foolish to dismiss this as being a unique property of obscure research areas like psi. The problem is much more subtle.

The main issue is that there is a one-to-many mapping from scientific to statistical hypotheses[4]. When doing research, there are many parameters one must set; for example, should observations be excluded? Which control variables should be measured? How to code participants’ responses? What dependent variables should be analyzed? By varying only a small number of these, Simmons et al. (2011) found that the nominal false positive rate of 5% skyrocketed to over 60%. They conclude that the “increased flexibility allows researchers to present anything as significant.” These issues are elevated by providing insufficient methodological detail in research articles, by a low percentage of researchers sharing their data (Wicherts et al., 2006; Wicherts, Bakker, & Molenaar, 2011), and in fields that require complicated preprocessing steps like neuroimaging (Carp, 2012; Cohen, 2016; Luck and Gaspelin, in press).
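The following toy simulation (not from Simmons et al., just an illustration in their spirit) shows how a little flexibility, here choosing among two correlated dependent variables and their average, pushes the false positive rate well above the nominal 5% even though there is no true effect:

```python
# Toy illustration of researcher degrees of freedom inflating false positives.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, sims, false_positives = 20, 10000, 0

for _ in range(sims):
    group = np.repeat([0, 1], n)                 # two groups, no true effect
    dv1 = rng.normal(0, 1, 2 * n)
    dv2 = 0.5 * dv1 + rng.normal(0, 1, 2 * n)    # a second, correlated DV
    candidates = [dv1, dv2, (dv1 + dv2) / 2]     # three ways to "measure" the effect
    ps = [stats.ttest_ind(dv[group == 0], dv[group == 1])[1] for dv in candidates]
    if min(ps) < .05:                            # report whichever test "worked"
        false_positives += 1

print(f"false positive rate: {false_positives / sims:.3f}")   # well above .05
```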

An important amendment is that researchers need not be aware of this flexibility; a p value might be misleading even when there is no “p-hacking”, and the hypothesis was posited ahead of time (i.e. was not changed after the fact—HARKing; Kerr, 1998). When decisions that are contingent on the data are made in an environment in which different data would lead to different decisions, even when these decisions “just make sense,” there is a hidden multiple comparison problem lurking (Gelman & Loken, 2014). Usually, when conducting N statistical tests, we control for the number of tests in order to keep the false positive rate at, say, 5%. However, in the aforementioned setting, it is not clear what N should be exactly. Thus, results of statistical tests lose their meaning and carry little evidential value in such exploratory settings; they only do so in confirmatory settings (de Groot, 1954/2014; Wagenmakers et al., 2012). This distinction is at the heart of the problem, and gets obscured because many results in the literature are reported as confirmatory, when in fact they may very well be exploratory—most frequently, because of the way scientific reporting is currently done, there is no way for us to tell the difference.

To get a feeling for the many choices possible in statistical analysis, consider a recent paper in which data analysis was crowdsourced from 29 teams (Silberzahn et al., submitted). The question posed to them was whether dark-skinned soccer players are red-carded more frequently. The estimated effect size across teams ranged from .83 to 2.93 (odds ratios). Nineteen different analysis strategies were used in total, with 21 unique combinations of covariates; 69% found a significant relationship, while 31% did not.

A reanalysis of Berkowitz et al. (2016) by Michael Frank (2016; blog here) is another, more subtle example. Berkowitz and colleagues report a randomized controlled trial, claiming that solving short numerical problems increases children’s math achievement across the school year. The intervention was well designed and well conducted, but still, Frank found that, as he put it, “the results differ by analytic strategy, suggesting the importance of preregistration.”

Frequently, the issue is with measurement. Malte Elson—whose twitter is highly germane to our topic—has created a daunting website that lists how researchers use the Competitive Reaction Time Task (CRTT), one of the most commonly used tools to measure aggressive behavior. It states that there are 120 publications using the CRTT, which in total analyze the data in 147 different ways!

This increased awareness of researchers’ degrees of freedom and the garden of forking paths is mostly a product of this century, although some authors have expressed this much earlier (e.g., de Groot, 1954/2014; Meehl, 1985; see also Gelman’s comments here). The next point considers an issue much older (e.g., Berkson, 1938), but which nonetheless bears repeating.

Statistical inference

In psychology and much of the social and behavioral sciences in general, researchers overly rely on null hypothesis significance testing and p values to draw inferences from data. However, the statistical community has long known that p values overestimate the evidence against H0 (Berger & Delampady, 1987; Wagenmakers, 2007; Nuzzo, 2014). Just recently, the American Statistical Association released a statement drawing attention to this fact (Wasserstein & Lazar, 2016); that is, in addition to it being easy to obtain p < .05 (Simmons, Nelson, & Simonsohn, 2011), it is also quite a weak standard of evidence overall.

The last point is quite pertinent because the statement that 39% of replications in the reproducibility project were “successful” is misleading. A recent Bayesian reanalysis concluded that the original studies themselves found weak evidence in support of an effect (Etz & Vandekerckhove, 2016), reinforcing all points I have made so far.

Notwithstanding the above, p < .05 is still the gold standard in psychology, and is so for intricate historical reasons (cf., Gigerenzer, 1993). At JEPS, we certainly do not want to echo calls nor actions to ban p values (Trafimow & Marks, 2015), but we urge students and their instructors to bring more nuance to their use (cf., Gigerenzer, 2004).

Procedures based on classical statistics provide different answers from what most researchers and students expect (Oakes, 1986; Haller & Krauss, 2002; Hoekstra et al., 2014). To be sure, p values have their place in model checking (e.g., Gelman, 2006—are the data consistent with the null hypothesis?), but they are poorly equipped to measure the relative evidence for H1 or H0 brought about by the data; for this, researchers need to use Bayesian inference (Wagenmakers et al., in press). Because university curricula often lag behind current developments, students reading this are encouraged to advance their methodological toolbox by browsing through Etz et al. (submitted) and playing with JASP[5].

Teaching the exciting history of statistics (cf. Gigerenzer et al., 1989; McGrayne, 2012), or at least contextualizing the developments of currently dominating statistical ideas, is a first step away from their cookbook oriented application.

Registered reports to the rescue

While we can only point to the latter, statistical issue, we can actually eradicate the issue of publication bias and the garden of forking paths by introducing a new publishing format called Registered Reports. This format was initially introduced to the journal Cortex by Chris Chambers (Chambers, 2013), and it is now offered by more than two dozen journals in the fields of psychology, neuroscience, psychiatry, and medicine (link). Recently, we have also introduced this publishing format at JEPS (see King et al., 2016).

Specifically, researchers submit a document including the introduction, theoretical motivation, experimental design, data preprocessing steps (e.g., outlier removal criteria), and the planned statistical analyses prior to data collection. Peer review only focuses on the merit of the proposed study and the adequacy of the statistical analyses[6]. If there is sufficient merit to the planned study, the authors are guaranteed in-principle acceptance (Nosek & Lakens, 2014). Upon receiving this acceptance, researchers subsequently carry out the experiment, and submit the final manuscript. Deviations from the first submission must be discussed, and additional statistical analyses are labeled exploratory.

In sum, by publishing regardless of the outcome of the statistical analysis, registered reports eliminate publication bias; by specifying the hypotheses and analysis plan beforehand, they make apparent the distinction between exploratory and confirmatory studies (de Groot 1954/2014), avoid the garden of forking paths (Gelman & Loken, 2014), and guard against post-hoc theorizing (Kerr, 1998).

Even though registered reports are commonly associated with high power (80-95%), this is unfeasible for student research. However, note that a single study cannot be decisive in any case. Reporting sound, hypothesis-driven, not-cherry-picked research can be important fuel for future meta-analysis (for an example, see Scheibehenne, Jamil, & Wagenmakers, in press).

To avoid possible confusion, note that preregistration is different from Registered Reports: The former is the act of specifying the methodology before data collection, while the latter is a publishing format. You can preregister your study on several platforms such as the Open Science Framework or AsPredicted. Registered reports include preregistration but go further and have the additional benefits such as peer review prior to data collection and in-principle acceptance.


In sum, there are several issues impeding progress in psychological science, most pressingly the failure to distinguish between exploratory and confirmatory research, and publication bias. A new publishing format, Registered Reports, provides a powerful means to address them both, and, to borrow a phrase from Daniel Lakens, enable us to “sail away from the seas of chaos into a corridor of stability” (Lakens & Evers, 2014).

Suggested Readings

  • Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.
  • Wagenmakers, E. J., Wetzels, R., Borsboom, D., van der Maas, H. L., & Kievit, R. A. (2012). An agenda for purely confirmatory research. Perspectives on Psychological Science, 7(6), 632-638.
  • Gelman, A., & Loken, E. (2014). The Statistical Crisis in Science. American Scientist, 102(6), 460-465.
  • King, M., Dablander, F., Jakob, L., Agan, M., Huber, F., Haslbeck, J., & Brecht, K. (2016). Registered Reports for Student Research. Journal of European Psychology Students, 7(1), 20-23
  • Twitter (or you might miss out)


[1] Incidentally, Diederik Stapel published a book about his fraud. See here for more.

[2] Baumeister (2016) is a perfect example of how not to respond to such a result. Michael Inzlicht shows how to respond adequately here.

[3] For a discussion of these issues with respect to the precognition meta-analysis, see Lakens (2015) and Gelman (2014).

[4] Another related, crucial point is the lack of theory in psychology. However, as this depends on whether you read the Journal of Mathematical Psychology or, say, Psychological Science, it is not addressed further. For more on this point, see for example Meehl (1978), Gigerenzer (1998), and a class by Paul Meehl which has been kindly converted to mp3 by Uri Simonsohn.

[5] However, it would be premature to put too much blame on p. More pressingly, the misunderstandings and misuse of this little fellow point towards a catastrophic failure in undergraduate teaching of statistics and methods classes (for the latter, see Richard Morey’s recent blog post). Statistics classes in psychology are often boringly cookbook oriented, and so students just learn the cookbook. If you are an instructor, I urge you to have a look at “Statistical Rethinking” by Richard McElreath. In general, however, statistics is hard, and there are many issues transcending the frequentist versus Bayesian debate (for examples, see Judd, Westfall, and Kenny, 2012; Westfall & Yarkoni, 2016).

[6] Note that JEPS already publishes research regardless of whether p < .05. However, this does not discourage us from drawing attention to this benefit of Registered Reports, especially because most other journals have a different policy.

This post was edited by Altan Orhon.


Meet the Authors

Do you wish to publish your work but don’t know how to get started? We asked some of our student authors, Janne Hellerup Nielsen, Dimitar Karadzhov, and Noelle Sammon, to share their experience of getting published.

Janne Hellerup Nielsen is a psychology graduate from Copenhagen University. Currently, she works in the field of selection and recruitment within the Danish Defence. She is the first author of the research article “Posttraumatic Stress Disorder among Danish Soldiers 2.5 Years after Military Deployment in Afghanistan: The Role of Personality Traits as Predisposing Risk Factors”. Prior to this publication, she had no experience with publishing or peer review but she decided to submit her research to JEPS because “it is a peer reviewed journal and the staff at JEPS are very helpful, which was a great help during the editing and publishing process.”

Dimitar Karadzhov moved to Glasgow, United Kingdom to study psychology (bachelor of science) at the University of Glasgow. He completed his undergraduate degree in 2014 and he is currently completing a part-time master of science in global mental health at the University of Glasgow. He is the author of “Assessing Resilience in War-Affected Children and Adolescents: A Critical Review”. Prior to this publication, he had no experience with publishing or peer review. Now having gone through the publication process, he recommends fellow students to submit their work because “it is a great research and networking experience.”

Noelle Sammon has an honors degree in business studies. She returned to study in university in 2010 and completed a higher diploma in psychology in the National University of Ireland, Galway. She is currently completing a master’s degree in applied psychology at the University of Ulster, Northern Ireland. She plans to pursue a career in clinical psychology. She is the first author of the research article “The Impact of Attention on Eyewitness Identification and Change Blindness”. Noelle had some experience with the publication process while previously working as a research assistant. She describes her experience with JEPS as follows: “[It was] very professional and a nice introduction to publishing research. I found the editors that I was in contact with to be really helpful in offering guidance and support. Overall, the publication process took approximately 10 months from start to finish but having had the opportunity to experience this process, I would encourage other students to publish their research.”

How did the research you published come about?

Janne: “During my psychology studies, I had an internship at a research center in the Danish Defence. Here I was a part of a big prospective study regarding deployed soldiers and their psychological well-being after homecoming. I was so lucky to get to use the data from the research project to conduct my own studies regarding personality traits and the development of PTSD. I’ve always been interested in differential psychology—for example, why people manage the same traumatic experiences differently. Therefore, it was a great opportunity to do research within the field of personality traits and the development of PTSD, and even to do so with some greatly experienced supervisors, Annie and Søren.”

Dimitar: “In my final year of the bachelor of science degree in psychology, I undertook a critical review module. My assigned supervisor was liberal enough and gave me complete freedom to choose the topic I would like to write about. I then browsed a few editions of The Psychologist that I had for inspiration and was particularly interested in the area of resilience from a social justice perspective. Resilience is a controversial and fluid concept, and it is key to recovery from traumatic events such as natural disasters, personal trauma, war, terrorism, etc. It originates from biomedical sciences and it was fascinating to explore how such a concept had been adopted and researched by the social and humanitarian sciences. I was intrigued to research the similarities between biological resilience of human and non-human animals and psychological resilience in the face of extremely traumatic experiences such as war. To add an extra layer of complexity, I was fascinated by how the most vulnerable of all, children and adolescents, conceptualize, build, maintain, and experience resilience. From a researcher’s perspective, one of the biggest challenges is to devise and apply methods of inquiry in order to investigate the concept of resilience in the most valid, reliable, and culturally appropriate manner. The quantitative–qualitative dyad was a useful organizing framework for my work and it was interesting to see how it would fit within the resilience discourse.”

Noelle: “The research piece was my thesis project for the higher diploma (HDIP). I have always had an interest in forensic psychology. Moreover, while attending the National University of Ireland, Galway as part of my HDIP, I studied forensic psychology. This got me really interested in eyewitness testimony and the overwhelming amount of research highlighting its problematic reliability.”

What did you enjoy most in your research and what did you find difficult?

Janne: “There is a lot of editing and so forth when you publish your research, but then again it really makes sense because you have to be able to communicate the results of your research to the public. To me, that is one of the main purposes of research: to be able to share the knowledge that comes out of it.”

Dimitar: “[I enjoyed] my familiarization with conflicting models of resilience (including biological models), with the origins and evolution of the concept, and with the qualitative framework for investigation of coping mechanisms in vulnerable, deprived populations. In the research process, the most difficult part was creating a coherent piece of work that was very informative and also interesting and readable, and relevant to current affairs and sociopolitical processes in low- and middle-income countries. In the publication process, the most difficult bit was ensuring my work adhered to the publication standards of the journal and addressing the feedback provided at each stage of the review process within the time scale requested.”

Noelle: “I enjoyed developing the methodology to test the research hypothesis and then getting the opportunity to test it. [What I found difficult was] ensuring the methodology would manipulate the variables required.”

How did you overcome these difficulties?

Janne: “[By] staying focused on the goal of publishing my research.”

Dimitar: “With persistence, motivation, belief, and a love for science! And, of course, with the fantastic support from the JEPS publication staff.”

Noelle: “I conducted a pilot using a sample of students asking them to identify any problems with materials or methodology that may need to be altered.”

What did you find helpful when you were doing your research and writing your paper?

Janne: “It was very important for me to get competent feedback from experienced supervisors.”

Dimitar: “Particularly helpful was reading systematic reviews, meta-analyses, conceptual papers, and methodological critique.”

Noelle: “I found my supervisor to be very helpful when conducting my research. In relation to the write-up of the paper, I found that having peers and non-psychology friends read and review my paper helped ensure that it was understandable, especially for lay people.”

Finally, here are some words of wisdom from our authors.

Janne: “Don’t think you can’t do it. It requires some hard work, but the effort is worth it when you see your research published in a journal.”

Dimitar: “Choose a topic you are truly passionate about and be prepared to explore the problem from multiple perspectives, and don’t forget about the ethical dimension of every scientific inquiry. Do not be afraid to share your work with others, look for feedback, and be ready to receive feedback constructively.”

Noelle: “When conducting research, it is important to pick an area of research that you are interested in and really refine the research question being asked. Also, if you are able to get a colleague or peer to review it for you, do so.”

We hope our authors have inspired you to go ahead and make that first step towards publishing your research. We welcome your submissions anytime! Our publication guidelines can be viewed here. We also prepared a manual for authors that we hope will make your life easier. If you do have questions, feel free to get in touch at

This post was edited by Altan Orhon.


The Mind-the-Mind Campaign: Battling the Stigma of Mental Disorders

People suffering from mental disorders face great difficulties in their daily lives and deserve all possible support from their social environment. However, their social milieus are often host to stigmatizing behaviors that actually serve to increase the severity of their mental disorders: People diagnosed with a mental disorder are often believed to be dangerous and are excluded from social activities. Individuals who receive treatment are seen as being “taken care of”, and social support is reduced. Concerned friends, with the best of intentions, might show apprehensiveness when it comes to approaching someone with a diagnosis, and end up doing nothing (Corrigan & Watson, 2002). These are not exceptional, sporadic situations: according to the World Health Organisation, nine out of ten people with a diagnosis report suffering from stigmatisation (WHO, 2016).