| I don't think this is a similar problem to the exit polls. The
technique I learnt at school involves the assumption (or proof) that the
distribution of the sample results is normal. Your sample results will
have a binomial distribution, but the test would be against the
hypothesis that there is 1 bad address, giving a very skewed
distribution for which the normal distribution is not a good
approximation. If there is one bad address out of 2^64 then the chance
of a sample of size n having no bad addresses is (1-2^-64)^n. I would
say that you can be (1-(1-2^-64)^n)*100% confident that
there is not one error if you find no errors with a sample of size n.
So if you want to be p% confident you need (1-2^-64)^n to equal 1-p/100.
So n=log(1-p/100)/log(1-2^-64).
Peter
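
A quick numeric check of Peter's formula, as a Python sketch (the 2^64
population and the 95% target are just example figures, and each draw is
treated as independent; math.log1p is needed because 1 - 2^-64 rounds to
exactly 1.0 in floating point):

    import math

    def samples_for_confidence(p_percent, population=2**64):
        # Peter's formula: n = log(1 - p/100) / log(1 - 1/population).
        # log1p(-1/population) computes log(1 - 1/population) without the
        # argument rounding to 1.0.
        return math.log(1.0 - p_percent / 100.0) / math.log1p(-1.0 / population)

    print(samples_for_confidence(95))   # about 5.5e19, roughly 3 * 2^64 draws

Even at 95% confidence the required sample is about three times the population
size, which already hints at the conclusion of the next reply.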
|
| There is a fairly simple-minded answer you can find in any statistics book.
Take M randomly chosen items from the population of size N. If just one
of the N is bad, you have a probability of
P(M,N) = ( 1 - 1/N )^M
of getting a completely good sample. Your goal is to drive this probability
below some suitable threshold X:
X > P(M,N)
The calculations will be easier if you take logs
log X > log P(M,N) = M * log ( 1 - 1/N )
and solve for M (dividing by log ( 1 - 1/N ), which is negative, flips the inequality)
M > ( log X ) / ( log ( 1 - 1/N ) )
Unfortunately, I think you will find that M is comparable to N, if not larger.
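To see the scale: log ( 1 - 1/N ) is about -1/N for large N, so M > N * log(1/X).
A small Python check with illustrative values (N = 2^64 and a few thresholds X
chosen just for the example):

    import math

    N = 2**64
    for X in (0.5, 0.05, 0.001):
        # M > log(X) / log(1 - 1/N); log1p keeps 1 - 1/N from rounding to 1.0.
        M = math.log(X) / math.log1p(-1.0 / N)
        print(f"X = {X}: M > {M:.3e}  (about {M / N:.2f} * N)")

For X = 0.05 this gives M on the order of 3 * N, so sampling buys nothing over
testing everything.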
Why is your answer different from the exit polls, which can usually use small
samples? There are two reasons.
First, exit polls use a lot of prior knowledge about voters to sample by
age, income, race and so forth. Then they use complex models to combine
these samples into a total prediction. You could use similar techniques if
you can learn something about the distribution of GOOD and BAD.
Second, exit polls usually aim for an error of a few percent or so. You
are actually aiming for an error of 1 in N. If this is really your aim, you
might as well test all N and be done with it. If your aim is really to show
that the BAD count is less than some number B, fairly far from both 1 and N,
then you would find that M is a much smaller number.
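The same algebra covers that relaxed goal: with B bad items out of N, the chance
of an all-good sample of size M is roughly ( 1 - B/N )^M (treating draws as
independent), so M > ( log X ) / ( log ( 1 - B/N ) ), i.e. about (N/B) * log(1/X).
A Python sketch with made-up numbers (B = 10^9 bad items out of N = 2^64, a 5%
threshold):

    import math

    def sample_size(threshold, bad, population):
        # Drive the all-good-sample probability (1 - bad/population)**M below threshold.
        return math.log(threshold) / math.log1p(-bad / population)

    N = 2**64
    print(sample_size(0.05, 1, N))      # one bad item: ~5.5e19, close to testing all N
    print(sample_size(0.05, 10**9, N))  # a billion bad items: ~5.5e10, far smaller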
|