Why statistics is an art

Data is part of our lives; therefore, we need higher levels of data literacy.
David Spiegelhalter

Chair of the Winton Centre

01 Jun 2025
Key Points
  • Statistics is an art; the choices about what data we collect, what we do to it and how we present those conclusions require huge amounts of judgement.
  • The aim of statistics is to inform people, enabling them to make better choices.
  • Data is part of our lives; therefore, we need higher levels of data literacy.

Why don’t we appreciate statistics?

I’m a statistician, so obviously I think statistics is really important, but admitting that you’re a statistician at a party can go down very badly because people think, oh, that’s so boring; statisticians just count things. But in this world of data that’s overwhelming us, and we’ve seen so much of this during the coronavirus crisis, statistics has become incredibly important.

Unfortunately, statistics is not very well taught. When I was in front of an audience, I’d ask how many people have done statistics courses and lots of people would put their hands up. Then I’d ask, how many people actually enjoyed them? Most of the hands would go down again. I’m very sad about this, but I can understand why. I started off as a mathematician. I started with mathematical statistics. My thesis had essentially no data in it at all. And unfortunately, that is how a lot of people have learnt statistics, as a series of mathematical operations, tricks, tests – a bag of tools to apply to data in some circumstances. I think that puts them off.

But there’s a new generation, a new insight into statistics teaching which has taken root in schools, and it’s starting to penetrate much more widely. That’s the way I wrote my book. It’s based around the idea of problem solving. You don’t start with the data; you start with a problem that you want to solve.

Solving mass murder through statistics

In my book, I use the example of Harold Shipman, the mass murderer. He was a doctor in the UK who killed hundreds of his patients. I was on the public inquiry, investigating whether he could have been caught earlier. So there’s the problem to solve: to understand more about his murders.

Then you make a plan, and that can be to collect data. Now, maybe there’s no data that would help. In this case, there was data, from the death certificates of all the patients who had died over 20 years, both his and his colleagues’. Some poor soul had to go through all of those and record the details, the times of death and so on. That’s a big exercise in itself. Then there’s sorting out the data; there’s cleaning it and looking at what to do about missing data.

Eventually you get to the analysis, which has two parts. You’ve got an exploratory analysis when you make graphs and just look at the data, all those lovely visualisations that people play with. Then there can be a confirmatory analysis. And that’s what people generally have to suffer in statistics courses: the regressions, the p-values, the hypothesis tests and all that stuff. But it’s only a small segment of this entire data cycle. The next step in the cycle is the conclusions and communication, and this is so underplayed, but it is absolutely vital. What claims can I make on the basis of this data? How can I communicate that through infographics or through telling a story? What sort of narrative are my conclusions going to be embedded in?

Numbers don’t speak for themselves

Here’s a wonderful insight from my book. It’s not from me; it’s from Nate Silver, who wrote The Signal and the Noise, and he’s got this fantastic quote that says: "The numbers do not speak for themselves. We imbue them with meaning." This idea that just because we’ve got masses of data they’ll offer up their secrets by some magic alchemy is complete nonsense. We have to apply great skill in order to draw the appropriate conclusions from those data and communicate them in a clear way.

We go right around this cycle. It’s sometimes called PPDAC: problem, plan, data, analysis, conclusions. In this case with Shipman, when we just plotted the data, it turned out that most of his patients’ deaths were between two and three in the afternoon. There was a great big spike in the graph. It didn’t need any fancy analysis. This was because he used to do home visits to his patients, inject a high dose of diamorphine, and they would die a quiet and peaceful death in front of him. Why? We just don’t know. He never spoke. So that’s the whole cycle. Then it starts again, because in this case, the relatives asked, OK, could he have been caught earlier? Off we went again around that cycle. This is very important for understanding how statistics actually works.
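As a rough illustration of that exploratory step, here is a minimal Python sketch that tallies deaths by hour of day and prints a crude text histogram. It is only a sketch under stated assumptions: the hours listed are invented for the example and are not the inquiry’s data, and the names hours_of_death, tally_by_hour and text_histogram are hypothetical.

```python
from collections import Counter

# Hypothetical hours of death (24-hour clock), invented purely for illustration.
# These are NOT the data examined by the public inquiry.
hours_of_death = [14, 3, 14, 9, 14, 22, 14, 13, 14, 15, 14, 7, 14, 14, 18, 14]


def tally_by_hour(hours):
    """Count how many deaths fall in each hour of the day (0 to 23)."""
    counts = Counter(hours)
    return [counts.get(h, 0) for h in range(24)]


def text_histogram(counts):
    """Render one line per hour, with one '#' per recorded death."""
    return "\n".join(f"{hour:02d}:00 {'#' * n}" for hour, n in enumerate(counts))


counts = tally_by_hour(hours_of_death)
# The hour with the largest count; no modelling or testing is involved.
peak_hour = max(range(24), key=lambda h: counts[h])

print(text_histogram(counts))
print(f"Peak hour: {peak_hour}:00 to {peak_hour + 1}:00")
```

With made-up numbers like these, a spike in one hour is visible from the plot alone, which is the point of looking at the data before reaching for a confirmatory analysis.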

Why statistics is an art

Statistics is about the collection of data, analysis and drawing conclusions. By data, we generally mean measurements of some form or another, like how big things are or what films people prefer on Netflix. That’s all data. It’s all information, but it’s information that is put in a particular format, generally in some sort of structured data table. Then we can do things with it: the visualisations, the analyses. But, as I said, it doesn’t speak for itself. We are always making choices about what data we collect, what we do to it and how we present those conclusions.

That’s why I call my book The Art of Statistics. It is statistical science, but the application of it requires huge amounts of judgement. When we’re dealing with claims made on the basis of data, we should be very aware of those judgements that have been made. Why did people collect this data? Is this the appropriate data – is it actually measuring what we think it’s measuring? Are their conclusions justified? Crucially, why are they telling us this? What’s in their mind? Are they trying to manipulate us? Are they trying to change our emotions, what we feel about something, or are they just trying to inform us? There is an art to all of this: in doing the data analysis, in presenting it and in receiving it as an audience.

Using statistics for public good

Statisticians do have the reputation of being rather dull people, just collecting numbers and drawing up tables. But I don’t think this has ever been the case. Actually, statisticians are people of great integrity and great passion about their subject. Florence Nightingale, who’s a staggeringly important statistician, was known as the “passionate statistician” because she was so concerned with using the statistics she collected to improve the health of the general public. She was absolutely dedicated to using those statistics for a purpose.

Statisticians are like that. They can be a bit pedantic; they can be a bit nerdy – but that’s just fine because they tend not to have very strong agendas one way or another. They tend not to be on one side or another of an argument. What they want is for the argument to be better informed. What really upsets them is when data is being abused, when claims are being made that are not based on good evidence. Then it’s just a bad argument that’s going on.

Certainly, from my perspective, and I think that of most of my colleagues, what we want is to see better information, and for disputes to have a much stronger foundation in basic evidence. This is a very worthy perspective. It’s full of integrity and it is more needed than ever in our current world of fake news and false claims and misinformation.

Why we all need data literacy

Every time you use your phone, you’re providing data. Every time you buy anything, that’s more data. You’re also using that data all the time: in the recommendations that come back to you, or when you click and get an insurance quote, you are the receiver of data analyses.

Data is incredibly important in our lives. Therefore, we need a much higher level of what I call data literacy. And that’s got two perspectives. First of all, it means that as many people as possible should be able to actually do something with data. In particular, it should be an integral part of education in schools, not just in maths. This is not part of maths. I started off as a mathematician. I’m no longer a mathematician; I do statistics. It’s essential that this be part of the education system: to be able to do the rudiments of data analysis and presentation.

Data literacy as a consumer

But even more important is data literacy as a consumer. We’re all consumers of analyses, every time we get a message coming up on our phone saying, we think you might be interested in this. Every profession is having to use data in more and more subtle ways, from even the most basic of jobs right up to civil servants, lawyers, doctors and even journalists and politicians. All of them could do with a much greater level of data literacy in order not to be manipulated by the claims that are made.

In this coronavirus crisis, we’ve seen this so much: people making diametrically opposite claims, both saying the data reinforces what they say. How do you take those claims apart? They’re all using numbers – the death rate in Sweden or something like that. How can we deconstruct, for want of a better phrase, those claims? These are techniques that can be taught to a certain extent. As I said, it is an art as well. You need practice. But they are essential skills in the modern age.

To inform, not persuade

At the Winton Centre for Risk and Evidence Communication – where I’m the chair and I’m really proud of working with these wonderful people, psychologists and communication experts – our little tagline is: to inform and not persuade. We put that in there because so much communication is persuasion. Advertising is obviously persuasion. But most political statements, especially if they’ve got numbers in them, are trying to persuade us of something, whether the numbers are big or the numbers are small. Even the news and huge amounts of social media are trying to persuade us to be either frightened or reassured.

And we say, no, hang on. What we should be trying to do with good statistics is inform people, not to push them in one direction or another, but to allow them to make better choices, more informed choices – whether it’s about their medical treatments, who to vote for or how much risk to take in this current atmosphere. People need better information so they can make better choices. While none of us are rational human beings who carefully weigh up everything in a numerical way, that doesn’t mean we should say, oh, we just have to go with our gut. That’s a recipe for disaster.

Discover more about why we need statistics

Spiegelhalter, D. (2019). The Art of Statistics: How to Learn from Data. Basic Books.

Spiegelhalter, D. (2020). Use of “normal” risk to improve understanding of dangers of covid-19. BMJ, 370, m3259.