Why are lower and upper fences important?
I know the formulas are #Q_1-1.5(IQR)# and #Q_3+1.5(IQR)#, but other than flagging extreme values to cut from the data set, why do they matter? Do they affect the data set's minimum and maximum?
1 Answer
Well, I haven't done statistics in a long time, but the lower/upper fences are basically cutoffs for what data points to include (regardless of how far the data points themselves extend).
Technically, the minimum and maximum do change once you drop the points outside the fences. But in principle, outliers aren't supposed to be physically reasonable data points (for example, they may come from an error in the data-taking technique), so the true minimum and maximum should still be among the points you keep.
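To make the cutoff idea concrete, here is a minimal Python sketch. The data set is made up, the function name `fences` is just something I picked for illustration, and the quartiles use the standard library's "inclusive" method (textbooks differ on how quartiles are computed, so your fences may come out slightly different).

```python
import statistics


def fences(values, k=1.5):
    """Return (lower_fence, upper_fence) as Q1 - k*IQR and Q3 + k*IQR."""
    # Assumption: "inclusive" quartiles; other conventions shift Q1 and Q3 a bit.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Made-up sample with one suspicious reading (40).
data = [10, 12, 13, 14, 15, 16, 17, 18, 40]

lower, upper = fences(data)
inside = [x for x in data if lower <= x <= upper]
outliers = [x for x in data if x < lower or x > upper]

print("fences:", lower, upper)
print("raw min/max:", min(data), max(data))
print("min/max inside fences:", min(inside), max(inside))
print("flagged as outliers:", outliers)
```

In this sample only the maximum moves, because 40 is the only point outside the fences; the minimum is already inside them.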
See the below Gaussian distribution, with an interquartile range
The population standard deviation is
#sigma = sqrt((sum_(i=1)^(N) (x_i - barx_"pop")^2)/(N))#,
where #barx_"pop"# is the population mean and #N# is the number of data points #x_i#.
If you have too many extreme outliers, your data may look less reliable, since the standard deviation will be larger than you'd expect.
To get a standard deviation that isn't inflated by outliers, the upper and lower fences narrow your data spread so that it includes only the most reliable data points.
That makes it look like you are more sure of your data.
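If you want to see the effect on the standard deviation, here is a rough Python sketch that applies the population formula above to a made-up data set, once with every point and once with only the points inside the fences. The helper name `population_sd` and the sample values are my own, and the quartiles again assume the standard library's "inclusive" method.

```python
import math
import statistics


def population_sd(values):
    """Population standard deviation: sqrt(sum((x_i - mean)^2) / N)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


# Made-up sample with one suspicious reading (40).
data = [10, 12, 13, 14, 15, 16, 17, 18, 40]

# Fences from Q1 - 1.5*IQR and Q3 + 1.5*IQR (assumed "inclusive" quartiles).
q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
inside = [x for x in data if lower <= x <= upper]

sd_all = population_sd(data)
sd_inside = population_sd(inside)

print("standard deviation, all points:", round(sd_all, 2))
print("standard deviation, inside fences:", round(sd_inside, 2))
```

The second number comes out smaller because the single far-away point contributes a large squared deviation to the sum.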