That’s a surprisingly big number! Descriptive statistics

Econ Talk
Feb 20, 2015

You’re on the bus. It’s early in the morning and difficult to remember what number comes after one. (Twelfty, isn’t it?). There seem to be some other people on the bus, people with coats, and hair, and bags, and mobile phones. When the early morning coffee starts to kick in, you start counting the colours of the coats and the hair and the bags. “Wow!” you think, “That’s amazing! There are seventeen people on this bus with blonde coats! That’s millions!” “So what?” says the bus driver when you try to tell him about your exciting discovery. Counting the things you see is relatively easy. Descriptive statistics is about giving some context to your observations, and starting you on the way to a genuine discovery.

Raw meat

There is a trend in sports analytics to count everything that moves. The Six Nations (that’s the rugby tournament that marks the start of spring) has an official analytics partner. The “analytics” involves lots of counting, and helpful numbers pop up in the television coverage as “statistics”, such as lineouts won on own throw. We can assume that they’re selling something more sophisticated to the teams, but what’s presented is probably better described as “metrics”. The closest they get to statistics is percentages of play in action areas, but even then we’re not told what percentages one would ordinarily expect. The real breakthrough in sports analytics has been in measurement and coding, and there are piles of raw numbers generated every week. Players wear GPS trackers that measure distance covered and the intensity of impacts. We now know, for example, that scrums are about 6G and tackles can be over 30G (thanks,!). Video analysis has also progressed to classifying the contribution of each player arriving, but that’s data transformation, not analysis.

The first real analytical steps in building on the masses of data becoming available have to include looking at distributions and central tendency. Distribution has to do with positioning people on a scale, say from no tackles in a match to lots of tackles. There will be some players in a match who do lots of tackling, and some who do very little, and most who are somewhere in the middle, the group tending towards the centre, you see. There are three kinds of centre too: the mean is the arithmetic average, the median is the middle number of tackles if they’re arranged from least to most, and the mode is the most common number. The trend in some sports stats has been towards reporting the extremes, but with no clear rationale.
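The three kinds of centre are easy to compute once the tackles have been counted. Here is a minimal sketch in Python using the standard library; the tackle counts are invented for illustration and are not taken from any real match.

```python
from statistics import mean, median, mode

# Hypothetical tackle counts for 15 players in one match (made-up numbers).
tackles = [2, 3, 5, 5, 6, 7, 7, 7, 8, 9, 10, 11, 12, 14, 21]

centre = {
    "mean": mean(tackles),      # arithmetic average
    "median": median(tackles),  # middle value once sorted
    "mode": mode(tackles),      # most common value
}

for name, value in centre.items():
    print(f"{name}: {value:.2f}")
```

With these invented numbers the mean sits above the median and the mode, because one player with 21 tackles drags the average upwards. That gap between the centres is itself a hint about the shape of the distribution.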


Out-liers (not outliers, which, if it were a word, would refer to things being more outly) are data points that look really big but do not appear to fit in a dataset; they might be genuine freaks, they might be measurement errors, or they might just indicate that the sample is too small. Hampel (1974. Thanks, Hampel!) helpfully came up with the concept of the influence curve of a data point, the degree to which one observation affects the pattern in a dataset: The influence of a single data point is approximately inversely proportional to the sample size, that is, the smaller the sample the greater the risk of influence. In a rugby match featuring a maximum sample of 46, the influence of out-liers is tight-head prop-esque.
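The sample-size point can be illustrated with a rough sketch. It is not Hampel's formal influence curve, just the simpler question of how far one extreme value moves the mean when it joins samples of different sizes. The values 8 and 30 and the helper name mean_shift are assumptions made up for the example.

```python
from statistics import mean

def mean_shift(sample, extra):
    """Change in the sample mean caused by adding one observation."""
    return mean(sample + [extra]) - mean(sample)

typical = 8   # hypothetical ordinary tackle count
freak = 30    # hypothetical extreme observation

# Add the same extreme value to samples of 10, 46 and 1000 ordinary values.
shifts = {n: mean_shift([typical] * n, freak) for n in (10, 46, 1000)}

for n, shift in shifts.items():
    print(f"n = {n:4d}: mean moves by {shift:.3f}")
```

For this toy case the shift is exactly (30 - 8) / (n + 1), so it shrinks roughly in proportion to 1/n: large in a sample of 10, still noticeable at 46, and negligible at 1000.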

The number of lineouts won on own throw is just as useless as the number of blonde coats if it comes without any context or expectation. If we were also told (and the data are certainly there) the average number of lineouts won in matches over the last 15 years, and whether today’s match was significantly above or below that average, that would be something to shout about. There’s a difference between saying, “Wow! There’s a really big number!” based on a single number and “Wow! There’s a surprisingly big number!” based on a comparison. The next step is to link differences in what can easily be counted to what actually counts, that is, the result of the match. It is possible, for a start, to correlate any performance metric of a team with the outcome of the match. It is possible to control for the influence of any other performance metric, and soon you’re on the way to a statistical model of how to win a rugby match.
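Both steps, comparing today's number with what is usual and correlating a metric with the result, can be sketched in a few lines of Python. Everything here is assumed for illustration: the lineout counts, the points margins, and the helper names z_score and pearson. A z-score beyond about 2 is only a common rule of thumb for "surprising", not a formal significance test.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical lineouts won on own throw in ten past matches (made-up).
history = [9, 11, 10, 12, 8, 10, 11, 9, 10, 10]
# Hypothetical points margin in those same matches (positive = win).
margin = [-3, 7, 0, 10, -8, 2, 5, -4, 1, 0]
today = 14  # hypothetical count from today's match

def z_score(value, baseline):
    """How many standard deviations value lies from the baseline mean."""
    return (value - mean(baseline)) / stdev(baseline)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

z = z_score(today, history)
r = pearson(history, margin)

print(f"today's z-score against history: {z:.2f}")
print(f"correlation of lineouts won with points margin: {r:.2f}")
```

The z-score turns “a really big number” into “a surprisingly big number” by measuring it against the usual spread. The correlation asks whether the thing being counted moves with the result at all. Controlling for other metrics would need a regression model, which this sketch does not attempt.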

Analysis of sport has taken several steps in a more numerically literate direction, but it’s still a long way from the very appearance of terms such as “kurtosis” or even “predictor”. There’s probably still too much reliance on raw, de-contextualised data and on impressive-looking outliers, but there’s also enormous potential to find out what really counts. The first step is descriptive statistics, but that’s still several steps away from master-minding a Six Nations win, so don’t mention anything to the bus driver quite yet.
