How can you determine if a difference between the means of two samples is significant?
Form a new statistic: the difference between the two sample means. This difference is itself a random variable with its own sampling distribution, so you can ask significance questions about it directly.
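In this question we want to know about the difference of two means. A difference of two random variables is itself a function of random variables, i.e.

$$D = f(X_1, X_2) = X_1 - X_2,$$

or, in our case, the function is of the sample means:

$$D = \bar{X}_1 - \bar{X}_2.$$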
Many assumptions go into the next steps (see Stat Trek: difference between means for details). For now, let's assume three things. The two samples are independent. The distributions we are sampling are approximately normal. We have relatively few points in each sample; otherwise the means, and therefore their difference, would be known almost exactly.
Given this, we can estimate the variance of the difference of means from the sample variances $s_1^2$ and $s_2^2$ of the two samples, which have sizes $n_1$ and $n_2$:

$$s_D^2 = \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}.$$
Each term in this calculation is a sample variance divided by its number of points, $s_i^2/n_i$. That quotient is the estimated variance of the corresponding sample mean and has the $\sigma^2/n$ form expected from the central limit theorem (Variance of sample mean). The two terms simply add because the samples are independent.
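The following minimal Python sketch uses only the standard library and assumes independent normal samples. The helper names `var_of_mean_difference` and `simulate_mean_differences` are hypothetical, and so are the population parameters. The code estimates the variance of the difference of means from data. It also checks by simulation that the difference of sample means is centered on $\mu_1 - \mu_2$ and has variance close to $\sigma_1^2/n_1 + \sigma_2^2/n_2$.

```python
import random
import statistics


def var_of_mean_difference(sample1, sample2):
    """Estimated variance of mean(sample1) - mean(sample2) for independent samples.

    Each term is s_i^2 / n_i (statistics.variance uses the n - 1 denominator);
    the terms add because the two samples are independent.
    """
    return (statistics.variance(sample1) / len(sample1)
            + statistics.variance(sample2) / len(sample2))


def simulate_mean_differences(mu1, sigma1, n1, mu2, sigma2, n2, trials, seed=0):
    """Draw `trials` pairs of normal samples and return their mean differences."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(trials):
        a = [rng.gauss(mu1, sigma1) for _ in range(n1)]
        b = [rng.gauss(mu2, sigma2) for _ in range(n2)]
        diffs.append(statistics.fmean(a) - statistics.fmean(b))
    return diffs


if __name__ == "__main__":
    # Hypothetical population parameters, chosen only for illustration.
    mu1, sigma1, n1 = 10.0, 2.0, 8
    mu2, sigma2, n2 = 9.0, 3.0, 12
    diffs = simulate_mean_differences(mu1, sigma1, n1, mu2, sigma2, n2, trials=20000)
    theory = sigma1**2 / n1 + sigma2**2 / n2  # 4/8 + 9/12 = 1.25
    print("mean of D:", statistics.fmean(diffs), "expected:", mu1 - mu2)
    print("variance of D:", statistics.variance(diffs), "expected:", theory)
```

The simulated variance should land near 1.25. On real data, where the population variances are unknown, `var_of_mean_difference` supplies the estimate instead.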
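Before we ask questions about the new distribution using t-statistics, we need its degrees of freedom. These can be approximated with the Welch–Satterthwaite equation:

$$\nu \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}.$$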
This equation weights each sample by its estimated variance and allows a different number of points in each sample. Suppose the two samples have the same variance, $s_1^2 = s_2^2 = s^2$, and the same size, $n_1 = n_2 = n$. Then the numerator becomes $(2s^2/n)^2$ and the denominator becomes $2(s^2/n)^2/(n-1)$, so

$$\nu = 2(n - 1).$$
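Here is a minimal Python sketch of the Welch–Satterthwaite approximation. It assumes each sample is a plain list of observations, and the function name `welch_satterthwaite_df` is hypothetical. The usage example passes two samples with identical spread and identical size. In that case the approximation reduces to $2(n - 1)$.

```python
import statistics


def welch_satterthwaite_df(sample1, sample2):
    """Approximate degrees of freedom for the difference of two sample means.

    With v_i = s_i^2 / n_i:
        nu = (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
    """
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1) / n1
    v2 = statistics.variance(sample2) / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


if __name__ == "__main__":
    # Same spread and same size (n = 5): the result reduces to 2 * (n - 1).
    a = [1.0, 2.0, 3.0, 4.0, 5.0]
    b = [11.0, 12.0, 13.0, 14.0, 15.0]  # `a` shifted by 10
    print(welch_satterthwaite_df(a, b))  # 8.0
```

When the variances or sizes differ, the result is generally not an integer. It always lies between $\min(n_1, n_2) - 1$ and $n_1 + n_2 - 2$.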
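Given all of these, we can use Student's t distribution with $\nu$ degrees of freedom to ask questions about the probability of the statistic

$$t = \frac{\bar{X}_1 - \bar{X}_2 - d}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}},$$

where $d$ is the hypothesized difference between the two population means ($d = 0$ tests whether the means are equal).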
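Putting the pieces together gives a minimal standard-library Python sketch of the whole procedure (Welch's t-test). The names `welch_t`, `t_pdf`, and `two_sided_p` are hypothetical, and the sample data in the usage example is made up. The sketch obtains the two-sided p-value by integrating the t density numerically with Simpson's rule. This is a simple stand-in for the closed-form CDF that statistics libraries use. In practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test for $d = 0$.

```python
import math
import statistics


def welch_t(sample1, sample2, d=0.0):
    """Return (t, nu) for the hypothesis mu1 - mu2 = d, using unpooled variances."""
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1) / n1
    v2 = statistics.variance(sample2) / n2
    t = (statistics.fmean(sample1) - statistics.fmean(sample2) - d) / math.sqrt(v1 + v2)
    nu = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))  # Welch–Satterthwaite
    return t, nu


def t_pdf(x, nu):
    """Density of Student's t distribution; nu need not be an integer."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))


def two_sided_p(t, nu, steps=2000):
    """P(|T| >= |t|) = 1 - 2 * integral of the density from 0 to |t| (Simpson's rule)."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps  # `steps` must be even for Simpson's rule
    total = t_pdf(0.0, nu) + t_pdf(b, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3]
    b = [4.2, 4.8, 4.5, 5.0, 4.1, 4.6, 4.4]
    t, nu = welch_t(a, b)
    print(f"t = {t:.3f}, nu = {nu:.2f}, two-sided p = {two_sided_p(t, nu):.4f}")
```

For very large $|t|$, the subtraction $1 - 2\cdot(\text{integral})$ loses relative precision. Very small p-values from this sketch are therefore only accurate in absolute terms.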