The mean number of seeds in a watermelon is 176 and the standard deviation is 40. What percentage of melons have less than 100 or greater than 225 seeds?

Apr 2, 2017

Approximately #14%# of melons have less than #100# or more than #225# seeds.

Explanation:

We need to make many assumptions in order to answer this question. I address one here, and some others at the end of the answer.

You imply in the topic, but not in the problem description, that the number of seeds in the melons of some (idealised) set follows a Normal distribution with known mean and variance. This may or may not be a good approximation in the real world.

Let us model the number of seeds in a melon as a random variable #X# distributed as #N(mu, sigma^2) = N(176, 40^2)#. Then the probability of the number of seeds being less than #100# or more than #225# is
#P(X < 100) + P(X>225)#.

Now we can rewrite this in terms of a standard Normal variable #Z# distributed as #N(0,1)#, as follows:
#P((X - mu)/sigma < (100 - mu)/sigma) + P((X-mu)/sigma>(225-mu)/sigma)#
#= P(Z < (100-176)/40) + P(Z > (225-176)/40)#
#= P(Z < -1.9) + P(Z > 1.225)#,
where #Z = (X-mu)/sigma#.
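As a quick check of the standardisation step, here is a minimal Python sketch. It assumes Python 3.8+ for `statistics.NormalDist` and uses the model #N(176, 40^2)# from above; it computes the two z-scores and confirms that the original and standardised probabilities agree.

```python
from statistics import NormalDist

mu, sigma = 176, 40

# z-scores for the two cut-offs: (x - mu) / sigma
z_low = (100 - mu) / sigma
z_high = (225 - mu) / sigma

X = NormalDist(mu, sigma)  # assumed model for the seed count, N(176, 40^2)
Z = NormalDist()           # standard Normal, N(0, 1)

# Standardising does not change the probabilities
print(z_low, z_high)
print(X.cdf(100), Z.cdf(z_low))
print(X.cdf(225), Z.cdf(z_high))
```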

At this point, to find the probabilities explicitly we need to consult tables or use software. Many tables only give probabilities in the form #P(Z < z)#, where #z# is a real number. In this case we need to rewrite our second probability term as
#P(Z>1.225) = 1 - P(Z<1.225)#.
The total probability we seek can then be written as
#P(Z<-1.9) + 1 - P(Z<1.225)#.
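If software is used instead of a table, the same expression can be evaluated directly. This is a sketch using Python's standard-library `statistics.NormalDist` (Python 3.8+), whose `cdf` method plays the role of the table lookup #P(Z < z)#; the complement rule gives the upper tail.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal N(0, 1)

p_low = Z.cdf(-1.9)         # P(Z < -1.9)
p_high = 1 - Z.cdf(1.225)   # P(Z > 1.225) = 1 - P(Z < 1.225)
total = p_low + p_high

print(f"P(Z < -1.9)  = {p_low:.5f}")
print(f"P(Z > 1.225) = {p_high:.5f}")
print(f"total        = {total:.4f}")
```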

Looking up a table, for instance the one at http://math.arizona.edu/~rsims/ma464/standardnormaltable.pdf, we find that
#P(Z<-1.9) ~~ 0.02872#
#P(Z<1.225) ~~ P(Z<1.23) ~~ 0.89065#,
rounding #1.225# up to the nearest table entry, so that the probability of a melon having less than #100# or more than #225# seeds is
#P(Z<-1.9) + 1 - P(Z<1.225) ~~ 0.13807 ~~ 14%#.
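Tables list #z# only to two decimals, so #1.225# falls between the #1.22# and #1.23# rows. The sketch below, using the five-decimal entries #P(Z<1.22) ~~ 0.88877# and #P(Z<1.23) ~~ 0.89065# from a standard Normal table, compares rounding up with linear interpolation between the two rows. Either way the answer rounds to #14%#.

```python
# Two adjacent rows of a standard Normal table (five decimal places)
z_a, p_a = 1.22, 0.88877
z_b, p_b = 1.23, 0.89065

# Linear interpolation between the rows for z = 1.225
z = 1.225
p_interp = p_a + (p_b - p_a) * (z - z_a) / (z_b - z_a)

p_low = 0.02872  # P(Z < -1.9) read from the table

print(f"rounded up to 1.23: {p_low + 1 - p_b:.5f}")
print(f"interpolated:       {p_low + 1 - p_interp:.5f}")
```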

If we now assume that the proportion of melons in the real world is well approximated by the probabilities of this random variable, then #14%# corresponds to the percentage of melons with fewer than #100# or more than #225# seeds.
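To illustrate the gap between a probability and a proportion in a finite batch of melons, here is a small simulation sketch. It assumes, hypothetically, that each melon's seed count is an independent draw from #N(176, 40^2)# (left unrounded for simplicity); the function name `simulated_proportion` is illustrative only. Small batches scatter widely, while large ones settle near #0.139#.

```python
import random


def simulated_proportion(n, mu=176, sigma=40, seed=0):
    """Fraction of n simulated melons with < 100 or > 225 seeds.

    Hypothetical model: each seed count is an independent draw from
    N(mu, sigma^2). This is not real melon data.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    count = 0
    for _ in range(n):
        seeds = rng.gauss(mu, sigma)
        if seeds < 100 or seeds > 225:
            count += 1
    return count / n


for n in (10, 100, 1_000, 100_000):
    print(n, simulated_proportion(n))
```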

Some issues:

  1. The number of seeds is a discrete quantity, while the Normal distribution describes continuous quantities, and we must keep this in mind. Since the seed counts of two melons can differ by as little as #1#, which is small compared to the typical number of seeds in a melon (#176# on average here), we may be able to treat the number of seeds as continuous without much error.

  2. The Normal distribution describes probabilities, not proportions. The number of melons in the world is finite, so even if the number of seeds in each melon is an outcome of a mathematically idealised random variable, the seed counts in a finite sample of melons will only approximately follow the Normal distribution. The larger the sample, the more likely it is that the observed proportions are close to the probabilities of the Normal random variable that generated them.

  3. The Normal distribution allows for arbitrarily many seeds, as well as a negative number of seeds (!?). Therefore, the assumption of a Normal distribution is at best approximate here. Hopefully, the tails of the distribution that correspond to impossible counts make up only a small part of the total probability. In this case they do: for instance, #P(X < 0) = P(Z < -4.4) ~~ 5 xx 10^(-6)#.

  4. In the real world we should check whether the number of seeds in melons actually is approximately Normally distributed. Typically, we also have to estimate the mean and standard deviation from data, and then we have to use slightly more advanced methods.
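Issues 1 and 3 can be quantified under the same assumed #N(176, 40^2)# model. The sketch below (Python 3.8+, `statistics.NormalDist`) applies the standard continuity correction, a technique not used in the answer above: "fewer than #100#" means at most #99# seeds and "more than #225#" means at least #226#, so the cut-offs move to #99.5# and #225.5#. It also computes the probability the model assigns to a negative seed count. The corrected answer is about #13.6%#, still roughly #14%#.

```python
from statistics import NormalDist

X = NormalDist(176, 40)  # assumed model for the seed count

# Issue 1: treat the count as continuous (as in the answer) vs. applying
# a continuity correction at 99.5 and 225.5
p_plain = X.cdf(100) + (1 - X.cdf(225))
p_corrected = X.cdf(99.5) + (1 - X.cdf(225.5))

# Issue 3: probability mass the model puts on impossible negative counts
p_negative = X.cdf(0)

print(f"without correction: {p_plain:.4f}")
print(f"with correction:    {p_corrected:.4f}")
print(f"P(X < 0):           {p_negative:.1e}")
```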