Of Elephants and Effect Sizes – Interview with Geoff Cumming

We all know that crucial moment while analysing our hard-earned data, the moment of truth: is there a star above the small p? Maybe even two? Can you write a nice and simple paper, or do you have to bend over backwards to explain why people, surprisingly, do not behave the way you thought they would? It all depends on those little stars, below or above .05, significant or not, black or white.

This famous “Null-Hypothesis Significance Testing” (NHST) approach is hugely appealing – after all, it gives us a cut-off, and we do not have to think much further: it simply decides (for us) whether or not the effect in question exists. It should not surprise us that life is rarely as simple as this. Indeed, contrary to what students are often taught, there is considerable controversy about this “cook-book style” of statistics (see e.g. Dienes, 2008; Gigerenzer, 2004). In his book, “Understanding the New Statistics” (Cumming, 2012), Geoff Cumming explains why our usual approach to statistical thinking is hugely flawed (see also Ioannidis, 2005, and this post in the JEPS Bulletin), which in turn means that psychology as a science is in deep trouble.

Focusing on arbitrary p-value thresholds is problematic because it motivates us to torture the data until the stars appear: selective reporting, dropping certain items, or testing a few more participants (also called p-hacking; Simonsohn, Nelson, & Simmons, 2014). But even without that, p-values should not be trusted. P-values “dance”: if you draw a new sample from your population and test your hypothesis again and again, the value of p differs greatly from sample to sample, even when H1 is true. P is therefore a highly unreliable measure (Cumming, 2014).
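To get a feeling for this “dance”, here is a minimal simulation sketch in Python. It is not Cumming's own demonstration; the design (two independent groups, 32 participants each, a true standardised effect of 0.5, and a z-test that treats the population standard deviation as known) and the function name `simulate_p_values` are illustrative assumptions chosen to keep the code short and dependency-free.

```python
import random
from statistics import NormalDist, mean


def simulate_p_values(n_experiments=25, n_per_group=32,
                      true_effect=0.5, sd=1.0, seed=1):
    """Repeat the same two-group experiment and return one p-value per replication.

    Assumption for simplicity: the population SD is known, so a two-sided
    z-test is used instead of the more common t-test.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    # Standard error of the difference between two independent group means
    se = sd * (2 / n_per_group) ** 0.5
    p_values = []
    for _ in range(n_experiments):
        control = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        treatment = [rng.gauss(true_effect, sd) for _ in range(n_per_group)]
        z = (mean(treatment) - mean(control)) / se
        # Two-sided p-value under H0: no difference between the groups
        p_values.append(2 * (1 - std_normal.cdf(abs(z))))
    return p_values


ps = simulate_p_values()
for i, p in enumerate(ps, start=1):
    stars = "***" if p < .001 else "**" if p < .01 else "*" if p < .05 else ""
    print(f"replication {i:2d}: p = {p:.3f} {stars}")
print(f"smallest p = {min(ps):.4f}, largest p = {max(ps):.4f}")
```

Every replication samples from exactly the same population with the same true effect, so any variation in the printed p-values is due to sampling alone. Changing `seed`, `n_per_group`, or `true_effect` shows how the spread depends on the design.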

Thus, Geoff Cumming proposes a “New Statistics” (also Cumming, 2013, 2014), a new way of thinking about statistics and psychological research, in which reporting data focuses on effect sizes and confidence intervals (but see Morey, Rouder, Verhagen, & Wagenmakers, 2014). But the New Statistics is not only about p-values: Cumming argues that we also need to change the way we do research in the first place. For example, all details of a study (procedure, selection of participants, sample sizes, measures, and statistical analyses) should be specified before we see any results, ideally pre-registered, and we also need to report the finished study in full detail. Crucially, when reporting our results, we should think about future meta-analysis – Cumming argues that we need meta-analyses to build a cumulative science of psychology that includes all relevant results, and not just the significant ones, thus avoiding the file drawer effect.
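As a rough illustration of what reporting with meta-analysis in mind can look like, the sketch below computes a 95% confidence interval for each study's effect estimate and then pools the studies with inverse-variance (fixed-effect) weighting. This is one common textbook approach, not necessarily the procedure Cumming recommends; the three studies and their numbers are entirely hypothetical, the function names are made up for this example, and the intervals rely on a normal approximation.

```python
from statistics import NormalDist

# Critical value for a two-sided 95% interval (about 1.96)
Z_95 = NormalDist().inv_cdf(0.975)


def confidence_interval(estimate, se, z=Z_95):
    """Normal-approximation confidence interval around an effect estimate."""
    return estimate - z * se, estimate + z * se


def fixed_effect_meta(studies):
    """Pool (estimate, standard error) pairs with inverse-variance weights."""
    weights = [1 / se ** 2 for _, se in studies]
    pooled = sum(w * est for w, (est, _) in zip(weights, studies)) / sum(weights)
    pooled_se = (1 / sum(weights)) ** 0.5
    return pooled, pooled_se


# HYPOTHETICAL studies: (standardised effect estimate, standard error)
studies = [(0.10, 0.20), (0.45, 0.25), (0.30, 0.15)]

for est, se in studies:
    lo, hi = confidence_interval(est, se)
    print(f"study:  d = {est:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")

pooled, pooled_se = fixed_effect_meta(studies)
lo, hi = confidence_interval(pooled, pooled_se)
print(f"pooled: d = {pooled:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Note that all three hypothetical studies enter the pooled estimate, whether or not their individual intervals exclude zero, and that the pooled interval is narrower than any single study's interval. This is the sense in which including every relevant result makes the evidence cumulative.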

Just recently, the journal Basic and Applied Social Psychology announced that it will “ban p-values” from all its publications (Trafimow & Marks, 2015) – reason enough to ask Geoff Cumming a few questions about his work and psychology. Geoff Cumming is an emeritus professor at La Trobe University, Melbourne, and his main research focuses on statistical cognition: how people understand (if they understand) statistical concepts.

I gather that you come from visual / spatial attention – how did you start to become interested in statistical cognition?

As a kid growing up in Melbourne I was fascinated by physics, especially nuclear physics. I was of the generation who believed that splitting (or fusing) the atom would give us abundant clean energy, and solve the world’s problems. How things change!

But, how could they make the teaching of physics so incredibly tedious?! Even first and second year physics at university was so rote, so mechanical–not thinking and imagination. So in my third year I took Psychology 1 as an extra subject, and was hooked. They expected us to think and write and propose solutions to problems! Science as process, not mere established fact and theory. I dropped physics, finished my degree in maths and stats, then went to Oxford for a DPhil in experimental psychology.

My thesis, with Anne Treisman, was on visual attention, but I also taught lots of stats. From the start I found NHST frustrating, and took ages trying to convince myself how it works. Then I tried numerous ways to explain it to others. Do things really need to be this complicated? I taught it for 40+ years and am still asking that same question.

As an experimental psychologist I also wondered why the tortuous logic underlying NHST had been so little studied. Statistics naturally struck me as being about communication–formulate a message, ideally with pictures, to convey a story to your readers. So it’s as much a question of perception and cognition as statistical models. Statistical cognition should be one of psychology’s gifts to all of science, by providing evidence to guide better statistical practices.

When and how did you realize that something was wrong with the way we conduct science?

I regarded it as a problem with the way we drew conclusions from data, rather than any broader issue. Only in the last few years have I seen how it’s so central. John Ioannidis did us a wonderful service by writing his 2005 article ‘Why most published research findings are false’, and explaining that over-reliance on .05 was the key underlying problem. It was so good the message came from medicine, not psychology–we might actually take it seriously.

In your paper as well as on your homepage you call for a change in attitude towards NHST. I was wondering how we can join the crusade. Where do you see the role of students?

Hey students–you are our future, you are our discipline, you are the big hope. Bottom line is that any scientist should strive for best research practice, whether in choosing measures, paradigms, techniques–or statistical methods. We’re all responsible for weighing the possibilities in all those aspects of our research, making our choices, and being prepared to explain and defend what we think is best.

OK, but back on planet Earth… I’m delighted to say it’s getting less difficult day by day. The APA Publication Manual recommends basing interpretation on point and interval estimates, Psychological Science has revised its guidelines, and others are making moves in similar directions. There are more resources available. Even so, it’s a challenge. But be strong; it’s worth persisting. Take a sentence or two, or a page or two, to explain and justify what you think is the best statistical approach in your particular situation, then go for it. If your name is on it, it should be what you believe to be best practice.

What do you consider to be the biggest challenge in psychological science in the next 10 years? 

The elephant (not to mention the blue whale, the dinosaur, and the echidna) in every room is climate change. Our children and grandchildren will ask us only one question about this and the previous decade–why did we do so little, when the writing is so devastatingly clear on the wall? We could say that the climate science is now relatively settled and the only real uncertainty is the psychology of why we are so easily persuaded en masse into torpor by vested interests and short term self-interest. It’s all about attitudes, decision making, and behaviour change–core business for psychology, surely. There’s tons of scope in our discipline for imaginative research that can make a difference–for our species.

Is there something you wish someone had told you at the beginning of your career?

Find your passion, find good mentors. Everyone knows that, but most of us still insist on learning it for ourselves.

P.S. I’m also very interested in Bayesian techniques. They may become much more widely used as the tools and materials improve. I’m advocating estimation because at present it seems to me more achievable, with better materials available now, even if we still need more and better. Show me how Bayesian techniques can be made readily accessible to beginning students and I’ll be extremely interested. We also need serious statistical cognition investigation of Bayesian techniques and how they can best be taught.

References

Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York, NY: Routledge.

Cumming, G. (2014). The New Statistics: Why and How. Psychological Science, 25(1), 7–29. doi:10.1177/0956797613504966

Cumming, G. (2013). Replication and p intervals: p values predict the future only vaguely, but confidence intervals do much better. Perspectives on Psychological Science. doi:10.1111/j.1745-6924.2008.00079.x

Dienes, Z. (2008). Understanding Psychology as a Science: An Introduction to Scientific and Statistical Inference. Basingstoke, UK: Palgrave Macmillan.

Gigerenzer, G. (2004). Mindless statistics. The Journal of Socio-Economics, 33(5), 587–606. doi:10.1016/j.socec.2004.09.033

Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124. doi:10.1371/journal.pmed.0020124

Morey, R. D., Rouder, J. N., Verhagen, J., & Wagenmakers, E. J. (2014). Why Hypothesis Tests Are Essential for Psychological Science: A Comment on Cumming (2014). Psychological Science, 25(6), 1289–1290.

Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2014). P-curve: A key to the file-drawer. Journal of Experimental Psychology: General, 143(2), 534–547. doi:10.1037/a0033242

Trafimow, D., & Marks, M. (2015). Editorial. Basic and Applied Social Psychology, 37(March), 1–2. doi:10.1080/01973533.2015.101299

 

 

About the author

Katharina Brecht

Aside from her role as Editor-in-Chief of the Journal of European Psychology Students, Katharina is currently pursuing her PhD at the University of Cambridge. Her research interests revolve around the evolution and development of social cognition.

  • Fabian Dablander

    This is an excellent interview; I wholeheartedly agree with Prof. Cumming that we need to make psychological research more transparent (e.g. through https://osf.io/), as well as abolish null hypothesis significance testing, a method of inference that – on top of being incoherent, irrational, and addressing the wrong question – is widely misunderstood by students AND their teachers (http://goo.gl/1haq4).

    However, focusing on estimation – especially when the procedures are still based on classical statistics – and abolishing testing just won’t cut it (for a little polemic against the “new statistics”, see http://goo.gl/RJnEQt). We need hypothesis testing / model comparison based e.g. on Bayes factors, because estimation is an ill-suited means to tackle these problems (see http://goo.gl/1HSwKG). Estimation answers a different question (http://goo.gl/JdOwx6).

  • Fabian Dablander

    since not everything fit into one comment, here is the rest:
    As alluded to in the interview by Prof. Cumming, we need tools and software to make Bayesian inference available to students and researchers. Fortunately, excellent tools are already available, and under heavy development!

    For a nice, slick SPSS alternative that also does Bayesian tests, just head over to https://jasp-stats.org/.

    If you are an avid R user (cheers!), check out Richard Morey’s BayesFactor package; it has great documentation: http://bayesfactorpcl.r-forge.r-project.org/

    kind regards,
    Fabian

  • http://psychnstatstutor.com Psych Stats Tutor

    Effect sizes are so much more real world; fuzzy, messy and dependent on a bundle of things

  • http://www.centropsicologicobosques.com Brian

    Your article is really interesting! I didn’t even imagine that so many things can affect us.