Can someone explain, in a few sentences, what these terms in statistics mean? Descriptive statistics, Pareto diagram, outliers, contingency table, hypothesis testing, and multiple regression analysis. What is the purpose of each?

1 Answer
Apr 10, 2018

A descriptive statistic is a number that describes the existing data. It gives a quick summary of the information we already have. Examples would be:

  • sample mean #barx#, median #tildex#, and mode
  • sample variance #s^2#
  • range
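As a quick illustration, the statistics listed above can be computed with Python's built-in `statistics` module. This is only a sketch: the `sample` values are made up for the example.

```python
import statistics

# Hypothetical sample, made up for illustration
sample = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(sample)          # sample mean
median = statistics.median(sample)      # middle value of the sorted data
mode = statistics.mode(sample)          # most frequent value
variance = statistics.variance(sample)  # sample variance s^2 (divides by n - 1)
data_range = max(sample) - min(sample)  # range = largest minus smallest value

print(mean, median, mode, variance, data_range)
```

Note that `statistics.variance` is the sample variance; `statistics.pvariance` would divide by n instead of n - 1.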

In contrast, an inferential statistic uses the sample data we know to estimate something about the population (or a future observation). Examples would be the population mean estimate #hatmu#, the population variance estimate #hatsigma^2#, and a future observation estimate #haty# based on existing regression data.

A Pareto diagram is a two-part diagram. It displays the categories of data as vertical bars in descending order of frequency, and overlays an increasing line graph that shows the cumulative percentage of the sample.
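The numbers behind a Pareto diagram can be sketched in a few lines of Python. The `defects` list is a hypothetical data set made up for this example, and the bars are drawn as text rather than with a plotting library.

```python
from collections import Counter

# Hypothetical defect log, made up for illustration
defects = ["scratch"] * 5 + ["dent"] * 3 + ["crack"] * 2

# Categories sorted from most to least frequent (the bars)
counts = Counter(defects).most_common()
total = sum(count for _, count in counts)

# Running total converted to a cumulative percentage (the line)
pareto = []
running = 0
for category, count in counts:
    running += count
    pareto.append((category, count, 100 * running / total))

for category, count, cumulative_pct in pareto:
    print(f"{category:10s} {'#' * count:6s} {cumulative_pct:5.1f}%")
```

The purpose of the diagram is to show at a glance which few categories account for most of the observations.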

An outlier is a data point that appears far removed from the rest of the data. An outlier has much more influence on a calculated statistic (such as the mean or variance) than a typical data point does, so it is worth identifying outliers before drawing conclusions.
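To see that influence, here is a small sketch comparing the mean and median of a made-up data set with and without one extreme value. The numbers are hypothetical.

```python
import statistics

typical = [10, 12, 11, 13, 12]   # hypothetical data, made up for illustration
with_outlier = typical + [100]   # the same data plus one outlier

mean_before = statistics.mean(typical)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(typical)
median_after = statistics.median(with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

The mean more than doubles, while the median does not move, which is why the median is often preferred when outliers are present.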

A contingency table is a two-way table filled with (usually) count observations for two different categorical/discrete variables. It is used to determine whether there is dependence between the two variables, typically by computing the chi-squared statistic.
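As a rough sketch, the chi-squared statistic for a contingency table can be computed by hand: for each cell, compare the observed count with the count expected if the two variables were independent. The `observed` table below is hypothetical.

```python
# Hypothetical 2x2 table of counts, made up for illustration
# (rows: one categorical variable, columns: the other)
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_squared = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Count expected in this cell if the variables were independent
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_squared += (obs - expected) ** 2 / expected

# Degrees of freedom: (rows - 1) * (columns - 1)
dof = (len(observed) - 1) * (len(observed[0]) - 1)

print(chi_squared, dof)
```

A large value of the statistic relative to the chi-squared distribution with `dof` degrees of freedom suggests the two variables are dependent.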

Hypothesis testing is the process of forming a null hypothesis #H_0# and a complementary alternative hypothesis #H_1# or #H_A# (usually before collecting data), and then, assuming #H_0# is true, computing a test statistic that should come from a certain distribution. If that statistic's value is unlikely to have come from that distribution (judged against a chosen significance level #alpha#), then we reject #H_0# in favour of #H_1#. Otherwise, we do not reject #H_0#.
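Here is a minimal sketch of that process using a two-sided one-sample z-test, which assumes the population standard deviation is known. All of the numbers (`mu_0`, `sigma`, `n`, `sample_mean`, `alpha`) are hypothetical values chosen for the example.

```python
from math import sqrt
from statistics import NormalDist

# H_0: population mean equals mu_0;  H_1: it does not
mu_0 = 100          # hypothesized mean (assumed for illustration)
sigma = 15          # population standard deviation, assumed known
n = 36              # sample size
sample_mean = 106   # observed sample mean
alpha = 0.05        # chosen significance level

# Test statistic: follows a standard normal distribution if H_0 is true
z = (sample_mean - mu_0) / (sigma / sqrt(n))

# Two-sided p-value: probability of a value at least this extreme under H_0
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject_h0 = p_value < alpha
print(z, p_value, reject_h0)
```

When the population standard deviation is not known, a t-test would normally be used instead; the logic of comparing the p-value with #alpha# stays the same.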

Multiple regression analysis is the process of modelling one variable (the dependent variable) as a combination of several other variables (the independent variables) to see how well the independent variables can explain the dependent one. For instance, we can use multiple regression analysis to test the idea that monthly food bills can be explained by family size, living location, household income, etc.
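For the curious, here is a sketch of an ordinary least squares fit with two independent variables, written in plain Python by solving the normal equations. The helper names `solve` and `fit_ols` are my own, and the data are made up and noise-free (each bill is exactly 50 + 100 × family size + 2 × income) so the recovered coefficients are easy to check. Real data would not fit this perfectly, and in practice you would use a statistics package.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest entry in this column to the diagonal
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    x = [0.0] * n
    for r in reversed(range(n)):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_ols(rows, y):
    """Return [intercept, b1, b2, ...] minimizing the sum of squared errors."""
    X = [[1.0] + list(row) for row in rows]  # prepend a column of ones
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)  # normal equations: (X^T X) b = X^T y


# Hypothetical households: (family size, income in thousands)
households = [(1, 30), (2, 45), (3, 40), (4, 70), (5, 60), (2, 80)]
# Hypothetical monthly food bills, generated without noise
food_bills = [210, 340, 430, 590, 670, 410]

coefficients = fit_ols(households, food_bills)
print(coefficients)  # intercept, family-size coefficient, income coefficient
```

Each coefficient estimates how much the food bill changes when that variable increases by one unit while the others are held fixed.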