When you find the upper and lower quartiles, what happens if the median is not in the data set?

For example:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12

The median is 6.5

To find the upper and lower quartiles, you find the median of the data to the left of the median and the median of the data to the right of it. But what happens when the median is not in the data set? How do you find the upper and lower quartiles?

1 Answer
May 22, 2018

Use #0.25(n+1)# and #0.75(n+1)# to calculate the positions that the quartiles would be in. Compute the appropriate weighted average of the elements on either side of these positions.

#Q_1 = 3.25#.
#Q_3 = 9.75#.

Explanation:

The median is often introduced as "the middle term of an ordered set". Of course, there is a catch: only sets with an odd number of elements have a middle term. Sets with an even number of elements have none.

At this point, we use a formula to estimate what the middle term would be, if it existed. In words, the formula is "the average of the two middle terms."

For a set of size #n#, the median is the term that would be in position #0.5(n+1)#. Think of this as "halfway through a set of size #n+1#". For example, if #n=11#, we get #0.5(11+1) = 6#, and if #n=12#, we get #0.5(12+1) = 6.5#. The element that would be at position 6.5 is halfway between the elements in positions 6 and 7. This agrees with what we know about medians. So far, so good.
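As a quick check of the position idea, here is a minimal Python sketch. The function name `median_by_position` is made up for illustration, and it assumes a non-empty list of numbers.

```python
def median_by_position(data):
    """Median as the value at position 0.5 * (n + 1), counting from 1."""
    xs = sorted(data)
    n = len(xs)
    pos = 0.5 * (n + 1)      # 1-based position; ends in .5 when n is even
    lower = int(pos)         # whole part of the position
    if pos == lower:         # odd n: the middle element exists
        return xs[lower - 1]
    # even n: halfway between the elements at positions lower and lower + 1
    return (xs[lower - 1] + xs[lower]) / 2


print(median_by_position(range(1, 12)))  # n = 11, position 6
print(median_by_position(range(1, 13)))  # n = 12, position 6.5
```

For the 12-element set in the question, this returns 6.5, the value halfway between the 6th and 7th elements.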


The lower and upper quartiles (#Q_1# and #Q_3#) may be similarly introduced as "the medians of the subsets of data to the left/right of the median". What we really mean is, they are the elements that are 1/4 and 3/4 of the way through the set. But when the original median is not an element of the set, we need to resort to a formula that gives us what the 1/4 and 3/4 terms would be, if they existed.

To find the quartiles, we modify the formula for the median's position to give us the quartiles' positions. For the lower quartile, its position is #0.25(n+1)#. Likewise, the upper quartile's position is #0.75(n+1)#.

These position numbers could be integers (i.e. when #n+1# is a multiple of 4). For example, if #n=11#, then we get #0.25(11+1) = 3# and #0.75(11+1) = 9#, giving us quartiles that are elements of the set.

If #n=12#, then we get #0.25(12+1)= 3.25# and #0.75(12+1) = 9.75#. In this case, #Q_1# is 25% of the way between elements 3 and 4, while #Q_3# is 75% of the way between elements 9 and 10. How do we calculate these "elements"?

We use #Q_1 = x_3 + 0.25(x_4-x_3)# (i.e. the 3rd element, plus 25% of the distance to the 4th element). Likewise, #Q_3 = x_9 + 0.75(x_10-x_9)#.

For this question, #Q_1 = 3 + 0.25(4-3) = 3 + 0.25(1) = 3.25#, and #Q_3 = 9 + 0.75(10-9) = 9+0.75(1) = 9.75#. The fact that these quartiles match their positions is a coincidence of this data set: the data are the consecutive integers 1 through 12, so each element equals its own position. Usually, the positions and the "elements" will not match.
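The same calculation can be sketched in Python. The helper names `value_at_position` and `quartiles` are hypothetical, the second data set is invented purely to show a case where positions and values differ, and the sketch assumes at least 3 data points so that both positions fall inside the set.

```python
def value_at_position(xs, pos):
    """Value at a 1-based, possibly fractional, position in the sorted list xs."""
    w = int(pos)         # whole part of the position
    frac = pos - w       # decimal part of the position
    if frac == 0:        # the position lands exactly on an element
        return xs[w - 1]
    # element w, plus the fraction of the distance to element w + 1
    return xs[w - 1] + frac * (xs[w] - xs[w - 1])


def quartiles(data):
    """(Q1, Q3) using positions 0.25(n+1) and 0.75(n+1); assumes n >= 3."""
    xs = sorted(data)
    n = len(xs)
    q1 = value_at_position(xs, 0.25 * (n + 1))
    q3 = value_at_position(xs, 0.75 * (n + 1))
    return q1, q3


print(quartiles(range(1, 13)))  # the data from the question
print(quartiles([2, 4, 4, 5, 7, 8, 10, 11, 13, 14, 18, 20]))  # made-up data
```

For the question's data this gives 3.25 and 9.75. For the made-up set, the positions are still 3.25 and 9.75, but the values are #4 + 0.25(5-4) = 4.25# and #13 + 0.75(14-13) = 13.75#.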

Summary:

The #p^"th"# percentile is the element in position #p%(n+1)#. Writing this position as #w + d#, where #w# is the whole part and #d# is the decimal part, the percentile's value is #x_w+d(x_(w+1)-x_w)#.
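The summary rule generalizes to any percentile. Below is a rough sketch, not a definitive implementation. The function name `percentile` is my own, and raising an error when the position falls outside the data is an assumption, since the rule above does not say what to do in that case.

```python
def percentile(data, p):
    """p-th percentile as the value at position p% of (n + 1), counting from 1."""
    xs = sorted(data)
    n = len(xs)
    pos = p * (n + 1) / 100
    if not 1 <= pos <= n:
        # Assumption: positions before the first or after the last element
        # are rejected rather than clamped.
        raise ValueError("position falls outside the data set")
    w = int(pos)         # whole part of the position
    frac = pos - w       # decimal part of the position
    if frac == 0:
        return xs[w - 1]
    return xs[w - 1] + frac * (xs[w] - xs[w - 1])


data = list(range(1, 13))
print(percentile(data, 25), percentile(data, 50), percentile(data, 75))
```

With #p = 25#, #50# and #75# this reproduces #Q_1#, the median and #Q_3# for the question's data. Other quartile conventions exist, so software may give slightly different answers unless it is set to use the #(n+1)# positions.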