How quartiles are calculated in Sniffie
Written by Niko Naakka
Updated over a week ago

A quartile is one of four equal groups into which a data set can be divided according to the distribution of values of a particular variable. As Wikipedia explains, there are three accepted methods for calculating quartiles. If there is an even number of data points in the data set, all three methods give the same quartile values. If there is an odd number of data points, the methods can give different values depending on the data set. In certain scenarios, these differences may be significant for your decision making, and they can cause confusion if you are trying to verify the statistics behind Sniffie's analytics yourself.

Here at Sniffie, we pride ourselves on the quality and statistical correctness of all our data and analytics. Thus, I felt it was necessary to write this article to explain how the quartiles (and quantiles) are calculated and why we selected one method out of the three outlined in the Wikipedia article mentioned above.

Furthermore, let's all hope that one day there will be a standard method for calculating these values. As early as 1996, some statisticians argued that the statistics community should choose such a standard for discrete distributions, but alas, it has not yet happened.

Quartile calculation methods outlined in Wikipedia (quoted directly from this article)

The text below describing the quartile calculation methods was quoted directly from this article on Wikipedia on 31st July, 2019.

Method 1:

  1. Use the median to divide the ordered data set into two halves. If there is an odd number of data points in the original ordered data set, do not include the median (the central value in the ordered list) in either half. If there is an even number of data points in the original ordered data set, split this data set exactly in half.

  2. The lower quartile value is the median of the lower half of the data. The upper quartile value is the median of the upper half of the data.

Method 2:

  1. Use the median to divide the ordered data set into two halves. If there are an odd number of data points in the original ordered data set, include the median (the central value in the ordered list) in both halves. If there are an even number of data points in the original ordered data set, split this data set exactly in half.

  2. The lower quartile value is the median of the lower half of the data. The upper quartile value is the median of the upper half of the data.
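Methods 1 and 2 differ only in how the median is treated when the number of data points is odd, so both can be sketched with one small function. This is an illustration only, not Sniffie's production code; the function name `quartiles_by_halves`, the `include_median` flag and the sample values are hypothetical.

```python
from statistics import median


def quartiles_by_halves(values, include_median):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    include_median=False follows Method 1, include_median=True follows
    Method 2. The flag only matters for an odd number of data points.
    Assumes at least two data points.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 0:
        # Even count: split exactly in half.
        lower, upper = data[:half], data[half:]
    elif include_median:
        # Odd count, Method 2: the median belongs to both halves.
        lower, upper = data[:half + 1], data[half:]
    else:
        # Odd count, Method 1: the median belongs to neither half.
        lower, upper = data[:half], data[half + 1:]
    return median(lower), median(upper)


# Hypothetical sample with an odd number of points (its median is 5).
sample = [1, 3, 5, 7, 9]
method_1 = quartiles_by_halves(sample, include_median=False)
method_2 = quartiles_by_halves(sample, include_median=True)
print(method_1, method_2)  # prints: (2.0, 8.0) (3, 7)
```

With an even number of points the flag has no effect, which is why the two methods agree in that case.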

Method 3:

  1. If there are even numbers of data points, then Method 3 is the same as either method above

  2. If there are (4n+1) data points, then the lower quartile is 25% of the nth data value plus 75% of the (n+1)th data value; the upper quartile is 75% of the (3n+1)th data point plus 25% of the (3n+2)th data point.

  3. If there are (4n+3) data points, then the lower quartile is 75% of the (n+1)th data value plus 25% of the (n+2)th data value; the upper quartile is 25% of the (3n+2)th data point plus 75% of the (3n+3)th data point.
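The weighting rules of Method 3 translate fairly directly into code. The sketch below is one possible reading of the quoted text, not an official implementation; the function name `quartiles_method_3` and the sample values are hypothetical.

```python
from statistics import median


def quartiles_method_3(values):
    """Return (Q1, Q3) following the quoted rules of Method 3."""
    data = sorted(values)
    size = len(data)

    if size % 2 == 0:
        # Even count: same as the median-of-halves methods.
        half = size // 2
        return median(data[:half]), median(data[half:])

    n, remainder = divmod(size, 4)
    if n == 0 and remainder == 1:
        raise ValueError("need at least three data points")

    def nth(k):
        # The quoted text counts data values from 1; Python counts from 0.
        return data[k - 1]

    if remainder == 1:  # 4n + 1 data points
        q1 = 0.25 * nth(n) + 0.75 * nth(n + 1)
        q3 = 0.75 * nth(3 * n + 1) + 0.25 * nth(3 * n + 2)
    else:  # 4n + 3 data points
        q1 = 0.75 * nth(n + 1) + 0.25 * nth(n + 2)
        q3 = 0.25 * nth(3 * n + 2) + 0.75 * nth(3 * n + 3)
    return q1, q3


# Hypothetical samples: 5 points (4n+1 with n=1) and 7 points (4n+3 with n=1).
five_points = quartiles_method_3([1, 3, 5, 7, 9])
seven_points = quartiles_method_3([1, 2, 3, 4, 5, 6, 7])
print(five_points, seven_points)  # prints: (2.5, 7.5) (2.25, 5.75)
```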

Mathematical explanation of how quartiles are calculated in Sniffie's analytics

The 1st quartile (Q1) is, by mathematical definition, the 25th percentile and falls exactly at the 1/4 point of the data set. In a data set with an even number of points, that 1/4 point falls exactly between two points, so Q1 has, by definition, only one single value.

Let's consider the following set of numbers (Dataset A): 1, 1, 8, 12, 13, 13, 14, 16, 19, 22, 27, 28, 31.

There are 13 items in Dataset A. Here the 1/4 point of the data set does not fall between two points. Rather, it falls nearer to one point than to any other, but it does not fall exactly on any point either. This is what causes the calculation methods to give different values.

For Dataset A, the methods from the Wikipedia article above give the following values for Q1:

  • Method 1 gives a value of 10 (average of the 3rd and the 4th value in Dataset A)

  • Method 2 gives a value of 12 (4th value in Dataset A)

  • Method 3 gives a value of 11 (0.75 of the way from the 3rd to the 4th value in Dataset A)
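If you want to check these three figures yourself, the arithmetic only involves the 3rd and 4th values of Dataset A. A minimal sketch (the variable names are just for illustration):

```python
dataset_a = [1, 1, 8, 12, 13, 13, 14, 16, 19, 22, 27, 28, 31]

# Python indexes from 0, so the 3rd and 4th values sit at indexes 2 and 3.
third, fourth = dataset_a[2], dataset_a[3]  # 8 and 12

q1_method_1 = (third + fourth) / 2             # average of the two values
q1_method_2 = fourth                           # the 4th value itself
q1_method_3 = third + 0.75 * (fourth - third)  # 0.75 of the way from 8 to 12

print(q1_method_1, q1_method_2, q1_method_3)  # prints: 10.0 12 11.0
```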

As you can see, the difference is minor, but there is still a difference. Method 2 is used in quartile calculation by e.g. Microsoft Excel, Google Sheets and Python's NumPy library. Method 1 is used by many online quartile calculators. Method 3 is used by e.g. the Wolfram Alpha computational engine.
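One way to double check the Q1 of Dataset A without any third-party tools is Python's standard library. The sketch below uses `statistics.quantiles`, which is not mentioned in the original article: on Dataset A its "inclusive" method returns the same Q1 as Method 2 and its "exclusive" method returns the same Q1 as Method 1. Note that this function defaults to "exclusive", so the method has to be named explicitly to reproduce the value 12.

```python
from statistics import quantiles

dataset_a = [1, 1, 8, 12, 13, 13, 14, 16, 19, 22, 27, 28, 31]

# n=4 asks for quartiles; the result is the list [Q1, Q2, Q3].
inclusive = quantiles(dataset_a, n=4, method="inclusive")
exclusive = quantiles(dataset_a, n=4, method="exclusive")

print(inclusive[0], exclusive[0])  # prints: 12.0 10.0
```

The agreement shown here is for Dataset A only; it is not a claim that these library methods match Methods 1 and 2 on every data set.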

In total, there are actually 9 different ways of calculating quantiles (quartiles are 4-quantiles), which means that there may be up to 9 different outcomes for the Q1 value. For Dataset A, these are: 12, 12, 8, 9, 11, 10, 12, 10.667 and 10.75.
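The article does not name the nine definitions. The sketch below assumes they are the nine sample quantile types catalogued by Hyndman and Fan in 1996 (the numbering also used by R's quantile function), because those reproduce the nine figures above in the same order. The function name `quantile_by_type` is hypothetical, and edge-case handling is kept deliberately simple.

```python
import math


def quantile_by_type(values, p, kind):
    """Sample quantile at probability p for types 1-9 (assumed numbering)."""
    data = sorted(values)
    n = len(data)

    def at(k):
        # 1-based lookup, clamped to the first and last data value.
        return data[min(max(k, 1), n) - 1]

    if kind in (1, 2, 3):
        # Discontinuous types: pick a data value, or average two (type 2).
        pos = n * p + (-0.5 if kind == 3 else 0.0)
        j = math.floor(pos)
        g = pos - j
        if kind == 1:
            gamma = 1.0 if g > 0 else 0.0
        elif kind == 2:
            gamma = 1.0 if g > 0 else 0.5
        else:
            gamma = 0.0 if (g == 0 and j % 2 == 0) else 1.0
        return (1 - gamma) * at(j) + gamma * at(j + 1)

    # Continuous types: linear interpolation at a type-specific position.
    offsets = {4: 0.0, 5: 0.5, 6: p, 7: 1 - p, 8: (p + 1) / 3, 9: p / 4 + 3 / 8}
    pos = n * p + offsets[kind]
    j = math.floor(pos)
    g = pos - j
    return at(j) + g * (at(j + 1) - at(j))


dataset_a = [1, 1, 8, 12, 13, 13, 14, 16, 19, 22, 27, 28, 31]
q1_by_type = [round(quantile_by_type(dataset_a, 0.25, k), 3) for k in range(1, 10)]
print(q1_by_type)
# prints: [12.0, 12.0, 8.0, 9.0, 11.0, 10.0, 12.0, 10.667, 10.75]
```

Under this assumed numbering, types 1, 2 and 7 are the three that return 12 for Dataset A.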

Here at Sniffie, we use Method 2 for calculating quartiles (and quantiles). We chose this method for the following reasons:

  1. Dataset A is a population and the number 12 is part of that population

  2. 3 out of the 9 methods for calculating quantiles give 12 as the value for Q1

  3. Many popular tools used by our customers (e.g. Microsoft Excel, Google Sheets, Python's NumPy) treat this as the default method (i.e. the method that is called without specifying any extra parameters) for calculating quantiles.

  4. If you use our automatic pricing rules that change positions to specific quantiles in the market, we don't want the value of your products to be set "too low" if you decide to double check the automatic pricing actions from the pricing logs.
