If a wholesaler sells to 500 stores and one store shows a 50% uptick in sales, how can the wholesaler determine if this uptick is significant or if it is expected for a few stores to randomly see an uptick of 50%?

1 Answer
Mar 20, 2016

There is no single simple answer. It depends on additional parameters that are not given.
See explanation below.

Explanation:

Two important parameters are not given in this problem: the distribution of goods among the stores and the number of customers buying in these stores.

Let's try to address the problem in general terms first, and then we will make certain reasonable assumptions.

The distribution of goods among stores is related to the probability that a customer buys an item in each specific store.
Assume that the probability that a single item is bought at store #S_1# is #p_1#, at store #S_2# is #p_2#, ... at store #S_i# is #p_i#,... at store #S_500# is #p_500#.

Assume further that the total number of items purchased is #n#.

Consider now a store #S_i#. Introduce a random variable #xi_i# that is equal to #1# when an item is bought at store #S_i# (with probability #p_i#) and is equal to #0# otherwise (with probability #1-p_i#).
This is a Bernoulli random variable.
Its mathematical expectation is
#E(xi_i)=1*p_i+0*(1-p_i)=p_i#,
its variance is
#Var(xi_i)=(1-p_i)^2*p_i+(0-p_i)^2*(1-p_i)=p_i(1-p_i)#,
its standard deviation is
#sigma(xi_i)=sqrt(p_i(1-p_i))#
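As a quick numerical check of these three formulas, here is a minimal Python sketch. The function name `bernoulli_moments` and the value `p = 0.002` are illustrative assumptions, not part of the problem statement.

```python
import math

def bernoulli_moments(p):
    """Mean, variance and standard deviation of a 0/1 variable with P(1) = p."""
    mean = 1 * p + 0 * (1 - p)
    # Variance straight from the definition: sum of (value - mean)^2 * probability
    variance = (1 - mean) ** 2 * p + (0 - mean) ** 2 * (1 - p)
    return mean, variance, math.sqrt(variance)

p = 0.002  # illustrative value only (assumption)
mean, variance, sigma = bernoulli_moments(p)
print(f"E = {mean}, Var = {variance:.6f}, sigma = {sigma:.4f}")
```

The variance computed from the definition agrees with the closed form #p_i(1-p_i)#.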

The wholesaler has a certain number #n# of items of his goods that he distributes among #500# stores. It's reasonable to assume that #n# is large enough to cover all stores and is significantly higher than the number of stores.
For instance, if we are talking about bottles of soda, it must be thousands per store.

Consider now #n# random variables independent of each other and each distributed identically with #xi_i#:
#xi_(i1)#, #xi_(i2)#,...#xi_(i n)#
Here random variable #xi_(ij)# indicates whether #j#th item was bought at #i#th store.
Obviously, the sum of the above random variables is a random variable equal to the number of items bought at #i#th store:
#eta_i=xi_(i1)+xi_(i2)+...+xi_(i n)#

Let's analyse the distribution of probabilities of #eta_i#.
First of all, #eta_i# is exactly binomial with parameters #n# and #p_i#, and according to the Central Limit Theorem its distribution is very close to Normal when #n*p_i# is reasonably large.
Since it's a sum of independent identically distributed random variables, its expectation is a sum of expectations of its components and its variance is a sum of variances:
#E(eta_i)=p_i*n#
#Var(eta_i)=p_i*(1-p_i)*n#
#sigma(eta_i)=sqrt(p_i*(1-p_i)*n)#
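These formulas, and the quality of the Normal approximation, can be checked against the exact binomial distribution of #eta_i#. The sketch below is only an illustration: the values `n = 10_000` and `p = 0.002`, the helper `binom_pmf`, and the comparison point 25 are assumptions chosen for the example.

```python
import math
from statistics import NormalDist

def binom_pmf(k, n, p):
    """Exact binomial probability P(eta = k), computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

n, p = 10_000, 0.002  # illustrative values (assumptions)

# Moments computed from the exact distribution
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
exact_mean = sum(k * q for k, q in enumerate(pmf))
exact_var = sum((k - exact_mean) ** 2 * q for k, q in enumerate(pmf))

print("mean:    ", exact_mean, "formula:", n * p)
print("variance:", exact_var, "formula:", n * p * (1 - p))

# Normal approximation of P(eta <= 25), with a continuity correction of 0.5
normal = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
exact_cdf = sum(pmf[:26])
approx_cdf = normal.cdf(25.5)
print("P(eta <= 25): exact", exact_cdf, "Normal", approx_cdf)
```

The two moments match the formulas, and the Normal value is close to the exact one near the middle of the distribution. Far out in the tails the approximation is less reliable.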

It's time to make an additional assumption. To simplify the problem, let's assume that all stores are approximately equal in the number of customers who buy there. Then the probability that a single item is bought in store #S_i# does not depend on the store and is equal to #1/500=0.002#.
In that case all #eta_i# have the same distribution of probabilities, approximately Normal with expectation #E(eta_i)=0.002*n# and standard deviation #sigma(eta_i)~=0.0447*sqrt(n)#

Let's say we want to determine the probability that purchases in store #S_1# (or any other fixed store, for that matter) stay within reasonable limits around the average when the total number of items distributed among all stores is #n=10,000#.
In this case
#E(eta_1)=0.002*10000=20#,
#sigma(eta_1)~=0.0447*sqrt(10000)=4.47#

According to the "rule of 2#sigma#", with 95% certainty we can say that deviation of the value of our random variable #eta_1# from its mathematical expectation #E(eta_1)# should not exceed #2*sigma(eta_1)~=9#, which is slightly less than 50% of its average value #20#.
So, under the condition of equal probabilities of purchase in different stores #p_i=1/500# and about #10,000# items purchased in all stores combined, the probability that the number of items purchased in store #S_1# (or any other fixed store) deviates from its average by no more than 50% is greater than 95%.
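The single-store claim can be checked without the Normal approximation by summing exact binomial probabilities. This is a sketch under the same assumptions as the text (500 equal stores, #n=10,000#); the helper `binom_pmf` and the variable names are made up for the example.

```python
import math

def binom_pmf(k, n, p):
    """Exact binomial probability P(eta = k); used here only for small k."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

stores = 500
n = 10_000
p = 1 / stores
expected = n // stores                  # 20 items per store on average
sigma = math.sqrt(n * p * (1 - p))      # about 4.47
half_width = round(2 * sigma)           # the "2 sigma" band, about 9 items

# Probability that sales in one fixed store stay within 2 sigma of the average
p_within = sum(binom_pmf(k, n, p)
               for k in range(expected - half_width, expected + half_width + 1))

# Probability that one fixed store sells more than 50% above its average
threshold = expected + expected // 2    # 30
p_uptick = 1 - sum(binom_pmf(k, n, p) for k in range(threshold + 1))

print("P(within 2 sigma)      =", p_within)
print("P(uptick above 50%)    =", p_uptick)
print("expected such stores   =", stores * p_uptick)
```

The last line multiplies the single-store uptick probability by the number of stores. At this volume it suggests that several stores out of 500 would show such an uptick by chance alone.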

The second part of this problem concerns the probability that NO store deviates from its average by more than 50%. With a certain degree of precision (treating the stores as approximately independent) it can be calculated as the product of the corresponding probabilities for EACH store.
To achieve 95% certainty that the number of purchases in no store deviates from its average by more than 50%, we need the probability for each store to be
#0.95^(1/500)~=0.9999=99.99%#

To achieve this probability for each store, the number of purchases must be very high. The "rule of 3#sigma#" states that a Normal random variable takes values no further than 3#sigma# from its average with probability 99.7%. For 99.99% certainty the interval around the average must be widened to about 4#sigma#; to stay on the safe side (the Normal approximation is less accurate far out in the tails), let's take 6#sigma#.
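The per-store target and the interval width that goes with it can be computed directly. The sketch below assumes an exactly Normal distribution and uses Python's standard `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

stores = 500
overall = 0.95                           # required certainty for all stores together

# Per-store probability, assuming the stores behave independently
per_store = overall ** (1 / stores)

# Two-sided interval: split the leftover probability between both tails
tail = (1 - per_store) / 2
z = NormalDist().inv_cdf(1 - tail)       # required half-width in units of sigma

std_normal = NormalDist()
coverage_3 = 2 * std_normal.cdf(3) - 1   # the "rule of 3 sigma"
coverage_6 = 2 * std_normal.cdf(6) - 1   # coverage of a 6 sigma interval

print("per-store probability:", per_store)
print("required width in sigmas:", z)
print("3 sigma coverage:", coverage_3)
print("6 sigma coverage:", coverage_6)
```

Under these assumptions a little under 4#sigma# already meets the per-store target, so the 6#sigma# interval used here is on the safe side.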

Thus, with #n=100,000# we have
#E(eta_1)=0.002*100000=200#
#sigma(eta_1)~=0.0447*sqrt(100000)=14.14#
#6sigma(eta_1)~=85#,
which is about 43% of the average, so 100,000 items are sufficient to be sure, with a certainty of at least 95%, that none of the stores has more than 50% extra purchases.
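For comparison, the all-stores probability can be estimated from exact binomial tails instead of the #sigma# rules. This is a sketch, not a definitive calculation: it keeps the assumptions of equal stores and of independence between stores, and the function names `upper_tail` and `prob_no_store_exceeds` are invented for the example.

```python
import math

def upper_tail(n, p, threshold):
    """P(eta > threshold) for eta ~ Binomial(n, p), summed term by term in log space."""
    total = 0.0
    for k in range(threshold + 1, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log1p(-p))
        total += math.exp(log_pmf)
    return total

def prob_no_store_exceeds(n, stores=500, uptick=0.5):
    """Probability that no store sells more than (1 + uptick) times its average."""
    p = 1 / stores
    expected = n / stores
    threshold = math.floor(expected * (1 + uptick))
    per_store = 1 - upper_tail(n, p, threshold)
    return per_store ** stores           # product over stores, assuming independence

results = {n: prob_no_store_exceeds(n) for n in (10_000, 100_000)}
for n, prob in results.items():
    print(f"n = {n:>7}: P(no store more than 50% above average) = {prob}")
```

With #n=100,000# the probability is well above 95%, while with #n=10,000# it is far below it, which is why the larger volume is needed for the conclusion about all 500 stores.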

If 100,000 items are distributed evenly among 500 relatively equivalent (in average number of purchases) stores and at least one store exceeds its expected sales by more than 50%, then something abnormal and unexpected has happened.
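The same conclusion can be illustrated by simulation: assign each item to a random store and count the stores that end up more than 50% above average. The sketch below is only an experiment under the equal-stores assumption; the seed, the number of trials and the function name `stores_with_uptick` are arbitrary choices.

```python
import random
from collections import Counter

def stores_with_uptick(n, rng, stores=500, uptick=0.5):
    """Distribute n items uniformly at random and count stores above (1 + uptick) * average."""
    counts = Counter(rng.choices(range(stores), k=n))
    threshold = (n / stores) * (1 + uptick)
    return sum(1 for c in counts.values() if c > threshold)

rng = random.Random(2016)   # fixed seed so the experiment is repeatable
trials = 10

simulated = {}
for n in (10_000, 100_000):
    simulated[n] = [stores_with_uptick(n, rng) for _ in range(trials)]
    print(f"n = {n:>7}: stores above 150% of average in each trial: {simulated[n]}")
```

With 10,000 items some stores cross the 50% mark in typical trials purely by chance, while with 100,000 items this essentially never happens.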

Please refer to Unizor for details on probabilities and statistics.