**Fundamental Sampling Distributions and Data Descriptions**

**Random Sampling**

The outcome of a statistical experiment may be recorded either as a numerical value or as a descriptive representation. When a pair of dice is tossed and the total is the outcome of interest, we record a numerical value. However, if the students of a certain school are given blood tests and the type of blood is of interest, then a descriptive representation might be the most useful. A person’s blood can be classified in 8 ways: AB, A, B, or O, each with a plus or minus sign, depending on the presence or absence of the Rh antigen.

In this chapter we focus on sampling from distributions or populations and study such important quantities as the sample mean and sample variance, which are of vital importance in future chapters. In addition, we attempt to give the reader an introduction to the role that the sample mean and variance will play in later chapters in statistical inference. The use of the modern high-speed computer allows the scientist or engineer to greatly enhance his or her use of formal statistical inference with graphical techniques. Much of the time formal inference appears quite dry and perhaps even abstract to the practitioner or the manager who wishes to let statistical analysis be a guide to decision-making.

**Populations and Samples**

We begin this section by discussing the notions of *populations* and *samples*. Both are mentioned in a broad fashion in Chapter 1. However, much more needs to be discussed about them here, particularly in the context of the concept of random variables. The totality of observations with which we are concerned, whether their number be finite or infinite, constitutes what we call a population. There was a time when the word population referred to observations obtained from statistical studies about people. Today, the statistician uses the term to refer to observations relevant to anything of interest, whether it be groups of people, animals, or all possible outcomes from some complicated biological or engineering system.

**Definition 8.1:**

A population consists of the totality of the observations with which we are concerned.

The number of observations in the population is defined to be the size of the population. If there are 600 students in the school whom we classified according to blood type, we say that we have a population of size 600. The numbers on the cards in a deck, the heights of residents in a certain city, and the lengths of fish in a particular lake are examples of populations with finite size. In each case the total number of observations is a finite number. The observations obtained by measuring the atmospheric pressure every day, from the past on into the future, or all measurements of the depth of a lake from any conceivable position, are examples of populations whose sizes are infinite. Some finite populations are so large that in theory we assume them to be infinite. This is true if you consider the population of lifetimes of a certain type of storage battery being manufactured for mass distribution throughout the country.

Each observation in a population is a value of a random variable $X$ having some probability distribution $f(x)$. If one is inspecting items coming off an assembly line for defects, then each observation in the population might be a value 0 or 1 of the Bernoulli random variable $X$ with probability distribution

$$b(x; 1, p) = p^x q^{1-x}, \quad x = 0, 1,$$

where 0 indicates a nondefective item, 1 indicates a defective item, and $q = 1 - p$. Of course, it is assumed that $p$, the probability of any item being defective, remains constant from trial to trial. In the blood-type experiment, the random variable $X$ represents the type of blood by assuming a value from 1 to 8; each student is given one of the 8 values of the discrete random variable. The lives of the storage batteries are values assumed by a continuous random variable having perhaps a normal distribution. When we refer hereafter to a “binomial population,” a “normal population,” or, in general, the “population $f(x)$,” we shall mean a population whose observations are values of a random variable having a binomial distribution, a normal distribution, or the probability distribution $f(x)$. Hence the mean and variance of a random variable or probability distribution are also referred to as the mean and variance of the corresponding population.
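To make the idea of a “Bernoulli population” concrete, the following minimal sketch evaluates $b(x; 1, p) = p^x q^{1-x}$ and computes the population mean and variance directly from that distribution, which should come out as $p$ and $pq$. The defect probability $p = 0.1$ is a hypothetical value chosen only for illustration.

```python
def bernoulli_pmf(x, p):
    """b(x; 1, p) = p**x * (1 - p)**(1 - x) for x in {0, 1}; zero elsewhere."""
    if x not in (0, 1):
        return 0.0
    return p**x * (1 - p) ** (1 - x)


def population_mean_and_variance(p):
    """Mean and variance of the Bernoulli population, computed from its pmf."""
    support = (0, 1)
    mean = sum(x * bernoulli_pmf(x, p) for x in support)
    variance = sum((x - mean) ** 2 * bernoulli_pmf(x, p) for x in support)
    return mean, variance


# Hypothetical probability that an item is defective (illustration only).
p = 0.1
mu, sigma2 = population_mean_and_variance(p)
print(mu, sigma2)  # the mean equals p and the variance equals p * (1 - p)
```

The same two sums, taken over the support of any discrete $f(x)$, give the mean and variance of the corresponding population.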

In the field of statistical inference, the statistician is interested in arriving at conclusions concerning a population when it is impossible or impractical to observe the entire set of observations that make up the population. For example, in attempting to determine the average length of life of a certain brand of light bulb, it would be impossible to test all such bulbs if we are to have any left to sell. Exorbitant costs can also be a prohibitive factor in studying the entire population. Therefore, we must depend on a subset of observations from the population to help us make inferences concerning that same population. This brings us to consider the notion of sampling.

**Definition 8.2:** A sample is a subset of a population.

If our inferences from the sample to the population are to be valid, we must obtain samples that are representative of the population. All too often we are tempted to choose a sample by selecting the most convenient members of the population. Such a procedure may lead to erroneous inferences concerning the population. Any sampling procedure that produces inferences that consistently overestimate or consistently underestimate some characteristic of the population is said to be biased. To eliminate any possibility of bias in the sampling procedure, it is desirable to choose a random sample in the sense that the observations are made independently and at random.

In selecting a random sample of size $n$ from a population $f(x)$, let us define the random variable $X_i$, $i = 1, 2, \ldots, n$, to represent the $i$th measurement or sample value that we observe. The random variables $X_1, X_2, \ldots, X_n$ will then constitute a random sample from the population $f(x)$ with numerical values $x_1, x_2, \ldots, x_n$ if the measurements are obtained by repeating the experiment $n$ independent times under essentially the same conditions. Because of the identical conditions under which the elements of the sample are selected, it is reasonable to assume that the $n$ random variables $X_1, X_2, \ldots, X_n$ are independent and that each has the same probability distribution $f(x)$. That is, the probability distributions of $X_1, X_2, \ldots, X_n$ are, respectively, $f(x_1), f(x_2), \ldots, f(x_n)$, and their joint probability distribution is

$$f(x_1, x_2, \ldots, x_n) = f(x_1) f(x_2) \cdots f(x_n).$$

The concept of a random sample is described formally by the following definition.

**Definition 8.3:**

Let $X_1, X_2, \ldots, X_n$ be $n$ independent random variables, each having the same probability distribution $f(x)$. Define $X_1, X_2, \ldots, X_n$ to be a random sample of size $n$ from the population $f(x)$ and write its joint probability distribution as

$$f(x_1, x_2, \ldots, x_n) = f(x_1) f(x_2) \cdots f(x_n).$$

If one makes a random selection of $n = 8$ storage batteries from a manufacturing process that has maintained the same specification, and records the length of life for each battery, with the first measurement $x_1$ being a value of $X_1$, the second measurement $x_2$ a value of $X_2$, and so forth, then $x_1, x_2, \ldots, x_8$ are the values of the random sample $X_1, X_2, \ldots, X_8$. If we assume the population of battery lives to be normal, the possible values of any $X_i$, $i = 1, 2, \ldots, 8$, will be precisely the same as those in the original population, and hence $X_i$ has the same normal distribution as $X$.
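Because the $X_i$ are independent with a common distribution $f(x)$, the joint distribution of a random sample is a product of identical factors. The sketch below evaluates that product for a normal population using Python’s `statistics.NormalDist`. The population mean of 3.0 years, the standard deviation of 0.5 years, and the eight lifetimes are hypothetical values chosen for illustration, not data from the text.

```python
import math
from statistics import NormalDist


def joint_density(sample, pdf):
    """f(x1, ..., xn) = f(x1) f(x2) ... f(xn) for an independent, identically distributed sample."""
    return math.prod(pdf(x) for x in sample)


# Hypothetical normal population of battery lives, in years.
population = NormalDist(mu=3.0, sigma=0.5)

# Hypothetical observed lifetimes x1, ..., x8 for a sample of n = 8 batteries.
lifetimes = [2.6, 3.1, 3.4, 2.9, 3.0, 3.7, 2.5, 3.2]

print(joint_density(lifetimes, population.pdf))
```

The helper accepts any single-observation density or pmf, so the same function serves the Bernoulli population above by passing a function that returns $p$ for 1 and $1 - p$ for 0.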

**Some Important Statistics**

Our main purpose in selecting random samples is to elicit information about the unknown population parameters. Suppose, for example, that we wish to arrive at a conclusion concerning the proportion of coffee-drinking people in the United States who prefer a certain brand of coffee. It would be impossible to question every coffee-drinking American in order to compute the value of the parameter $p$ representing the population proportion. Instead, a large random sample is selected and the proportion $\hat{p}$ of people in this sample favoring the brand of coffee in question is calculated. The value $\hat{p}$ is now used to make an inference concerning the true proportion $p$. Now, $\hat{p}$ is a function of the observed values in the random sample; since many random samples are possible from the same population, we would expect $\hat{p}$ to vary somewhat from sample to sample. That is, $\hat{p}$ is a value of a random variable that we represent by $\hat{P}$. Such a random variable is called a statistic.
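The claim that $\hat{p}$ changes from sample to sample can be seen by simulation. This sketch draws several random samples from a hypothetical population in which a proportion $p = 0.3$ prefers the brand and computes $\hat{p}$ for each one. The value of $p$, the sample size, and the random seed are all assumptions made for illustration.

```python
import random


def sample_proportion(n, p, rng):
    """Draw n independent yes/no responses with P(yes) = p and return p-hat."""
    favoring = sum(1 for _ in range(n) if rng.random() < p)
    return favoring / n


rng = random.Random(2024)  # fixed seed so the run is reproducible
true_p = 0.3               # hypothetical population proportion
n = 500                    # hypothetical sample size

p_hats = [sample_proportion(n, true_p, rng) for _ in range(5)]
print(p_hats)  # five realized values of the statistic P-hat
```

Each entry of `p_hats` is one value of the random variable $\hat{P}$; the spread among them is exactly the sample-to-sample variation described above.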

**Definition 8.4:**

Any function of the random variables constituting a random sample is called a statistic.

**Central Tendency in the Sample; the Sample Mean**

In Chapter 4 we introduced the two parameters $\mu$ and $\sigma^2$, which measure the center of location and the variability of a probability distribution. These are constant population parameters and are in no way affected or influenced by the observations of a random sample. We shall, however, define some important statistics that describe corresponding measures of a random sample. The most commonly used statistics for measuring the center of a set of data, arranged in order of magnitude, are the mean, median, and mode. All of these statistics are defined in Chapter 1. The mean will be defined again here.

**Definition 8.5:**

If $X_1, X_2, \ldots, X_n$ represent a random sample of size $n$, then the sample mean is defined by the statistic

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i.$$

Note that the statistic $\bar{X}$ assumes the value $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ when $X_1$ assumes the value $x_1$, $X_2$ assumes the value $x_2$, and so forth. In practice the value of a statistic is usually given the same name as the statistic. For instance, the term *sample mean* is applied to both the statistic $\bar{X}$ and its computed value $\bar{x}$. There is an earlier reference made to the sample mean in Chapter 1, where examples were given that illustrated the computation of a sample mean.

As we suggested in Chapter 1, a measure of central tendency in the sample does not by itself give a clear indication of the nature of the sample. Thus a measure of variability in the sample must also be considered.
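The computed value $\bar{x} = \frac{1}{n}\sum x_i$ translates directly into code. The sketch below computes it for one realized sample and compares the result with Python’s `statistics.mean`; the three observations are hypothetical.

```python
from statistics import mean


def sample_mean(xs):
    """x-bar = (x1 + x2 + ... + xn) / n for an observed sample."""
    if not xs:
        raise ValueError("sample must contain at least one observation")
    return sum(xs) / len(xs)


observed = [2.5, 3.0, 3.5]  # hypothetical observed values x1, x2, x3
print(sample_mean(observed), mean(observed))  # both print 3.0
```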

**The Sample Variance**

The variability in the sample should display how the observations spread out from the average. The reader is referred to Chapter 1 for more discussion. It is possible to have two sets of observations with the same mean or median that differ considerably in the variability of their measurements about the average.

Consider the following measurements, in liters, for two samples of orange juice bottled by companies *A* and *B*:

**Sample A:** 0.97 1.00 0.94 1.03 1.06

**Sample B:** 1.06 1.01 0.88 0.91 1.14

Both samples have the same mean, 1.00 liter. It is obvious that company *A* bottles orange juice with a more uniform content than company *B*. We say that the variability, or the dispersion, of the observations from the average is less for sample *A* than for sample *B*. Therefore, in buying orange juice, we would feel more confident that the bottle we select will be closer to the advertised average if we buy from company *A*.
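The comparison can be checked numerically. The sketch below uses Python’s `statistics` module, whose `variance` function is the sample variance with divisor $n - 1$ (the statistic formalized in Definition 8.6). The first value of sample *B* is taken as 1.06, the reading that gives the stated mean of 1.00 liter.

```python
from statistics import mean, variance

# Orange juice contents, in liters.
sample_a = [0.97, 1.00, 0.94, 1.03, 1.06]
sample_b = [1.06, 1.01, 0.88, 0.91, 1.14]

for name, data in (("A", sample_a), ("B", sample_b)):
    # variance() divides the sum of squared deviations by n - 1
    print(f"{name}: mean = {mean(data):.2f}, variance = {variance(data):.5f}")
```

Both means are 1.00, while the sample variances are about 0.00225 for *A* and 0.01145 for *B*, roughly a fivefold difference in spread.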

In Chapter 1 we introduced several measures of sample variability, including the sample variance and sample range. In this chapter we will focus on the sample variance.

**Definition 8.6:**

If $X_1, X_2, \ldots, X_n$ represent a random sample of size $n$, then the sample variance is defined by the statistic

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2.$$

The computed value of $S^2$ for a given sample is denoted by $s^2$. Note that $S^2$ is essentially defined to be the average of the squares of the deviations of the observations from their mean. The reason for using $n - 1$ as a divisor rather than the more obvious choice $n$ will become apparent in a later chapter.
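A direct transcription of the definition is shown below as a minimal sketch: it computes the deviations from $\bar{x}$, squares them, and divides by $n - 1$. For comparison it also computes the divisor-$n$ average, which is always the smaller of the two. The data values are hypothetical.

```python
def sample_variance(xs):
    """S^2 = sum((x_i - x_bar)^2) / (n - 1), transcribed from the definition."""
    n = len(xs)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    x_bar = sum(xs) / n
    squared_deviations = [(x - x_bar) ** 2 for x in xs]
    return sum(squared_deviations) / (n - 1)


def divisor_n_variance(xs):
    """The 'more obvious' average of squared deviations, dividing by n instead."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / n


data = [2.0, 4.0, 6.0]  # hypothetical observations
print(sample_variance(data))     # 8 / 2 = 4.0
print(divisor_n_variance(data))  # 8 / 3, about 2.667
```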

**Example 8.1**:

A comparison of coffee prices at 4 randomly selected grocery stores in San Diego showed increases from the previous month of 12, 15, 17, and 20 cents for a 1-pound bag. Find the variance of this random sample of price increases.

*Solution:*

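Calculating the sample mean, we get

$$\bar{x} = \frac{12 + 15 + 17 + 20}{4} = 16 \text{ cents}.$$

Therefore,

$$s^2 = \frac{1}{3}\sum_{i=1}^{4}(x_i - 16)^2 = \frac{(12-16)^2 + (15-16)^2 + (17-16)^2 + (20-16)^2}{3} = \frac{16 + 1 + 1 + 16}{3} = \frac{34}{3}.$$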
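The arithmetic of this example can be verified exactly with rational numbers. The sketch below uses Python’s `fractions.Fraction` so that the result appears as the exact fraction $34/3$ rather than a rounded decimal.

```python
from fractions import Fraction

increases = [12, 15, 17, 20]  # price increases in cents, from Example 8.1
n = len(increases)

x_bar = Fraction(sum(increases), n)                      # 64/4 = 16
s2 = sum((x - x_bar) ** 2 for x in increases) / (n - 1)  # 34/3

print(x_bar, s2, float(s2))
```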

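Although the expression in Definition 8.6 best illustrates that $S^2$ is a measure of variability, an alternative expression is often more convenient for computation.

**Theorem 8.1:** If $S^2$ is the variance of a random sample of size $n$, we may write

$$S^2 = \frac{1}{n(n-1)}\left[n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2\right].$$

*Proof:* By definition,

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} X_i^2 - 2\bar{X}\sum_{i=1}^{n} X_i + n\bar{X}^2\right] = \frac{1}{n-1}\left[\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right].$$

Replacing $\bar{X}$ by $\sum_{i=1}^{n} X_i/n$ and multiplying numerator and denominator by $n$, we obtain the more useful computational formula of Theorem 8.1.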
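The computational formula needs only the running totals $\sum x_i$ and $\sum x_i^2$, so it can be evaluated in a single pass over the data. The sketch below implements it and checks it against `statistics.variance`, which also uses the $n - 1$ divisor, on randomly generated data; the seed and the data range are arbitrary choices. One caveat: with floating-point data whose mean is large relative to its spread, the subtraction $n\sum x_i^2 - (\sum x_i)^2$ can lose precision, which is why numerical libraries generally avoid this form.

```python
import random
from statistics import variance


def sample_variance_computational(xs):
    """S^2 = [n * sum(x^2) - (sum x)^2] / (n * (n - 1)), the computational form."""
    n = len(xs)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    total = 0.0
    total_sq = 0.0
    for x in xs:  # one pass accumulates both sums
        total += x
        total_sq += x * x
    return (n * total_sq - total**2) / (n * (n - 1))


rng = random.Random(8)  # arbitrary seed
data = [rng.uniform(0, 10) for _ in range(50)]

print(sample_variance_computational(data), variance(data))
```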

**Definition 8.7:**

The sample standard deviation, denoted by $S$, is the positive square root of the sample variance.

**Example 8.2**:

Find the variance of the data 3, 4, 5, 6, 6, and 7, representing the number of trout caught by a random sample of 6 fishermen on June 19, 1996, at Lake Muskoka.

*Solution:*

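We find that $\sum_{i=1}^{6} x_i^2 = 171$, $\sum_{i=1}^{6} x_i = 31$, and $n = 6$. Hence

$$s^2 = \frac{(6)(171) - (31)^2}{(6)(5)} = \frac{1026 - 961}{30} = \frac{13}{6}.$$

Thus the sample standard deviation is $s = \sqrt{13/6} \approx 1.47$.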
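The solution can be reproduced from the two running sums with the computational formula, and cross-checked against `statistics.stdev`, the sample standard deviation with divisor $n - 1$. A short sketch:

```python
import math
from statistics import stdev

catches = [3, 4, 5, 6, 6, 7]  # trout caught by each of the 6 fishermen (Example 8.2)
n = len(catches)
sum_x = sum(catches)                    # 31
sum_x_sq = sum(x * x for x in catches)  # 171

s2 = (n * sum_x_sq - sum_x**2) / (n * (n - 1))  # (1026 - 961) / 30 = 13/6
s = math.sqrt(s2)

print(round(s2, 4), round(s, 2), round(stdev(catches), 2))
```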