T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
1042.1 | Use Bayes' Theorem | NIZIAK::YARBROUGH | I PREFER PI | Tue Mar 21 1989 09:28 | 6 |
| I believe the best approach to this problem is to apply Bayes' Theorem,
which I think has been discussed elsewhere in the conference, and is
certainly discussed in any probability text. Your first calculation is, I
think, not far wrong, if overly precise (.000001 persons???).
Lynn Yarbrough
|
1042.2 | Doesn't LSQ imply equal error bands? | POOL::HALLYB | The Smart Money was on Goliath | Tue Mar 21 1989 11:20 | 18 |
| Probably you mean how many TICKETS were purchased, not how many people
participated. (Unless Colorado has unusually restrictive laws...)
I think any time you use the higher-paying results you are going to
introduce more error than you correct.
.0> One answer: since 21,824 people guessed at least 3 of the 6 and
.0> the probability of this is 0.0290646625692, then the number who
.0> played is about 21824/0.0290646625692 or 750,877.459803.
Suppose instead we look at tickets where all 6 were guessed correctly.
There were about 0/.00000019 or 0 tickets played, using the logic
above from .0. Obviously more than 0 tickets were sold (21,824 of
them were winners). So by including 6-out-of-6 you are probably introducing
error, not correcting it. Similar logic applies to 5-out-of-6 and
4-out-of-6, though to lesser extents.
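For the curious, here is the per-tier arithmetic as a small Python
sketch. It uses only the two figures quoted above from .0 (the other
tiers' probabilities would have to come from .0, which isn't reproduced
here); the point is how unstable the sparse tiers are:

    # Estimate tickets sold as winners / Pr(tier), one tier at a time.
    tiers = {
        "3-of-6": (21824, 0.0290646625692),  # winners, per-ticket probability
        "6-of-6": (0,     0.00000019),       # no jackpot winners this drawing
    }
    for name, (winners, p) in tiers.items():
        print(name, "estimate:", winners / p)
    # 3-of-6 estimate: 750877.459...
    # 6-of-6 estimate: 0.0 -- clearly wrong, as argued above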
John
|
1042.3 | | DEC25::ROBERTS | Reason, Purpose, Self-esteem | Tue Mar 21 1989 11:21 | 8 |
| Thanks, Lynn. I appreciate the reference to Bayes' Theorem.
I did a DIR/TITLE=BAYE to see if I could locate the note you refer to,
but drew a blank. Could someone point me to the proper note to read,
or apply Bayes' Theorem to the problem in 1042.0 with an explanation?
/Dwayne
|
1042.4 | | DEC25::ROBERTS | Reason, Purpose, Self-esteem | Tue Mar 21 1989 12:21 | 26 |
| RE: .2 by John
I don't want to get into a discussion here about how restrictive
Colorado's laws are. There's a place for everything, and MATH isn't for
politics. But you're right, of course, about it being tickets sold
rather than people participating. Believe it or not, it was actually
reported in the paper as people playing. I guess it makes it sound like
more of a popular game than it really is.
I appreciate your argument about trying to predict based solely on the
information of the number of 6-out-of-6 winners. In general, the more
information available, the less the error. This is why one could argue
that the Least Squares Method is more accurate than dividing total
winners by the probability of winning: it doesn't lump 8 pieces of
information into 2 sums.
But Least Squares seems to introduce an arbitrary manipulation into the
approximation. I.e., why squares? Why not power 1.9? Or power 3.14? Why
is the power constant for each term? Maybe it should be power 1.0 at
the extremes and power 2.0 at the modal value, with some distribution
in between.
Just some meandering thoughts.
/Dwayne
|
1042.5 | Why not to use least-squares. | CADSYS::COOPER | Topher Cooper | Tue Mar 21 1989 13:59 | 20 |
| RE: .4
Least squares is justified when certain conditions are met, which is
not the case here. Basically the errors on each point must be
approximately normally distributed and the variance for each must be
the same. The first might be met (except that the proportions are
so small that I wouldn't bet (so to speak) on it). My intuition says
that the second condition is *not* met, so you would have to use
an appropriate weighted least-squares. I think Bayes' theorem is
the way to go. (Bayes' theorem is a theorem in probability theory
which can be interpreted as giving the probability that some
hypothesis (H) is true given a piece of evidence (E), in terms of
the probability that E will occur if H is true and the probability
that H is true before you have taken account of evidence E. It thus
allows you to build up evidence incrementally about something. If I
get a chance, and no one beats me to it, I'll give more detail. The
interpretation of Bayes' theorem stands at the center of the largest,
longest-running controversy in statistics and probability.)
Topher
|
1042.6 | Just some meandering answers | POOL::HALLYB | The Smart Money was on Goliath | Tue Mar 21 1989 15:27 | 30 |
| .4> But Least Squares seems to introduce an arbitrary manipulation into the
.4> approximation. I.e., why squares? Why not power 1.9? Or power 3.14? Why
.4> is the power constant for each term? Maybe it should be power 1.0 at
The motivation behind squaring is that ultimately you want to build up
as powerful a toolkit as you can. Measuring dispersion "d" by squaring
differences, i.e., d(x,y) = (x-y)^2, does that. There are lots of ways
of measuring dispersion, and you are wondering why one should square
the difference instead of performing other operations.
Consider the "metric" d(x,y) = 0 iff x=y, d(x,y) = 1 iff x ^= y.
(Or if you have several points, d(xN,yN) = N iff xN ^= yN :-)
Such measures tend not to produce any useful theories.
You suggest why not d(x,y) = (x-y)^1.9. Consider the similar exponent
1.5: (4-8)^1.5 = SQRT(-64), which is not a real number. You get the
same problem with most fractional exponents; squaring always gives
you a nonnegative real number and that is a good thing. You could ask about
simply taking absolute value, d(x,y) = |x - y|, and that would also
give you nonnegative values, but remember that behind all this is a
mathematical foundation. Theories based on absolute value require
using functions that are not everywhere differentiable, which then
introduces unnecessary complications into life. Squaring differences
results in smooth, well-behaved functions that tend to cooperate in
development of a mathematical theory. And of course it is less work
than raising to higher powers.
Squares of differences, then, result in arguably the cleanest approach
to dispersion, and that is why they always show up; it isn't arbitrary.
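A concrete illustration, as a quick Python sketch (my own, with the
numbers from the example above): a fractional power of a negative
difference leaves the real numbers, absolute value stays real but has
a kink, and the square is real and smooth everywhere.

    # Compare three candidate dispersion measures where x - y < 0.
    x, y = 4, 8
    print((x - y) ** 2)     # 16 -- nonnegative real, smooth everywhere
    print(abs(x - y))       # 4  -- real, but |x-y| has a kink at x == y
    print((x - y) ** 1.5)   # (-4)**1.5 comes back as a complex number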
John
|
1042.7 | Why all the estimates are bogus. | CADSYS::COOPER | Topher Cooper | Wed Mar 22 1989 12:38 | 30 |
| Forgot to mention yesterday. All these attempts at estimation are
predicated on one very bad assumption -- that each ticket represents a
random, uniform, independent sample from the set of 6-tuples-without-
replacement. People actually cluster quite heavily due to various
psychological reasons -- essentially that people's intuitions about
statistics and probabilities are grossly wrong. People for example
tend to feel that even numbers are in some sense "less random" than
odd numbers and so are less likely to be drawn in a random sample;
they therefore choose many more odd numbers than even. People also
tend to believe that "obvious" arithmetic progressions are less likely
to occur than something more patternless and so tend to steer clear of
those.
To see why this throws the estimates off -- imagine that 97% of the
tickets were for the same 6 numbers, none of which happened to be
drawn. In that case your estimation effort would only be estimating
the 3% of tickets that were randomly chosen.
If you're approaching this as simply an interesting abstract puzzle
inspired by the lottery, and are thus willing to arbitrarily specify
uniform betting, then we can continue.
If on the other hand you're actually curious about the answer, then
I suggest you call the State Lotto Commission (or whatever it's called
there) for the answer. Alternatively, if you know the structure of
the payoff system (i.e., how much gets skimmed by the state and how
the remainder is distributed among the various classes of winners),
then you should be able to get a precise answer from the payoffs.
Topher
|
1042.8 | 1 2 3 4 5 6 | DEC25::ROBERTS | Reason, Purpose, Self-esteem | Wed Mar 22 1989 13:38 | 34 |
| Thanks, Topher.
Actually, my interest is both abstract and practical. The math is
fun, but I sometimes play the game, myself.
Your point is well taken. I doubt that 97% were for the same 6 numbers
(as I'm sure you do, too), and wonder what the average Joe's "random"
distribution really looks like.
You said in -.1, "People also tend to believe that "obvious" arithmetic
progressions are less likely to occur than something more patternless
and so tend to steer clear of those." This is evidently true. I
observed a man choosing his "random" numbers by asking his
pre-school-aged son for numbers.
"Give me a number, son."
"One."
"OK. Now give me another."
"One."
"No, no, no. It's got to be different."
"Two."
"Well, OK. Give me another."
"Three."
"Now look, son. The odds of getting three numbers in a row are almost
zero. Try another number other than `three'."
"Four."
At this point, the man started yelling at his kid, picked him up and
virtually threw him into his empty shopping cart.
For what it's worth.
/Dwayne
|
1042.9 | Is it a good bogus or a bad bogus? | POOL::HALLYB | The Smart Money was on Goliath | Wed Mar 22 1989 13:50 | 16 |
| .7> replacement. People actually cluster quite heavily due to various
.7> psychological reasons -- essentially that people's intuitions about
.7> statistics and probabilities are grossly wrong. People for example
.7> tend to feel that even numbers are in some sense "less random" than
Would it be possible to account for this by looking at historical
records of how often each number has been selected by players?
If we assume, for the sake of argument, that numbers are selected in
inverse proportion to their value (1 most often, 2 next, ... 42 last),
and we know the 6 winning numbers and the outcomes as provided in .0 by
Dwayne, can we then make an estimate of the number of tickets sold?
It would be interesting to see how it compares with the "non-adjusted"
values already guesstimated.
John
|
1042.10 | | AITG::DERAMO | Daniel V. {AITG,ZFC}:: D'Eramo | Wed Mar 22 1989 16:14 | 7 |
| There's another "minor" skewing from the fact that one person
buying two tickets most likely chooses different combinations
on them [unless he doesn't like sharing]. If they were drawn
randomly the two would be the same combination with probability
equal to the probability of winning.
Dan
|
1042.11 | Guaranteed Winner! | DEC25::ROBERTS | Reason, Purpose, Self-esteem | Wed Mar 22 1989 17:45 | 10 |
| A related question:
What's the minimum number of tickets I must buy to guarantee I'll win
at least one 3-out-of-6 prize? 4-out-of-6? 5-out-of-6?
6-out-of-6 is easy: C(42,6)=5245786; i.e., the number of combinations
of 6 items out of 42.
/Dwayne
|
1042.12 | | BEING::POSTPISCHIL | Always mount a scratch monkey. | Thu Mar 23 1989 08:16 | 11 |
| Re .7:
> People also tend to believe that "obvious" arithmetic progressions
> are less likely to occur than something more patternless and so tend to
> steer clear of those.
Actually, arithmetic progressions are the most frequently chosen
tickets.
-- edp
|
1042.13 | okay, so suppose it is ideal... | KOBAL::GILBERT | Ownership Obligates | Thu Mar 23 1989 09:12 | 23 |
| Let's restate the problem.
A number of independent tests are done. The result of each test is
a non-negative number t; t occurs with probability P[t].
After the tests, the number of tests that resulted in t is S[t].
Given the P[t] values and a subset of the S[t] values, determine
the probability that there were exactly N tests.
For example, let P[0] + P[1] = 1, and suppose we are given S[0] = s0.
Then:
Prob( N=s0+k | S[0]=s0 ) = Prob( N=S[0]+k & S[0]=s0 ) / Prob( S[0]=s0 )

                   C( s0+k, s0 ) * P[0]^s0 * P[1]^k
      = ---------------------------------------------- ;
        Sum{i=0..inf} C( s0+i, s0 ) * P[0]^s0 * P[1]^i
where C(a,b) is the binomial coefficient: 'a choose b'.
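A numeric check of the above, as a Python sketch with made-up values
for P[0] and s0. The infinite sum in the denominator is a
negative-binomial series, Sum{i} C(s0+i,s0) * P[1]^i = P[0]^-(s0+1),
so the posterior reduces to C(s0+k,s0) * P[0]^(s0+1) * P[1]^k:

    from math import comb

    P0, P1 = 0.3, 0.7   # made-up outcome probabilities, P0 + P1 = 1
    s0 = 10             # observed count of outcome 0

    def posterior(k):
        # Prob( N = s0+k | S[0] = s0 ), using the closed-form denominator.
        return comb(s0 + k, s0) * P0 ** (s0 + 1) * P1 ** k

    print(sum(posterior(k) for k in range(3000)))   # ~1.0, as it must be
    print("most likely N:", s0 + max(range(3000), key=posterior))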
|
1042.14 | $4M split 19,412 ways is, um, | POOL::HALLYB | The Smart Money was on Goliath | Thu Mar 23 1989 09:14 | 33 |
| > Actually, arithmetic progressions are the most frequently chosen
> tickets.
Well, yes, but it isn't the A.P. property that causes that. It's
the visual pattern that sometimes is also an A.P.
<<< TIXEL::DUA2:[NOTES$LIBRARY]LOTTERIES.NOTE;1 >>>
-< Lotteries Discussions >-
================================================================================
Note 30.1 Mass Megabucks Statistics 1 of 1
TIXEL::ARNOLD "Never repeat yourself. Never." 263 lines 23-AUG-1988 11:02
-< More statistics from the Lottery Commission >-
--------------------------------------------------------------------------------
[...]
Based on $4 million revenue drawing, only 70.24% of the total
1,947,792 (6/36) possible number combinations are bet.
Most popular combinations: Number of potential winners:
------------------------- ----------------------------
01 - 08 - 15 - 22 - 29 - 36 19,412 tickets with this pattern
06 - 11 - 16 - 21 - 26 - 31 19,333 " " " "
03 - 09 - 15 - 21 - 27 - 33 9,452
06 - 12 - 18 - 24 - 30 - 36 8,015
04 - 10 - 16 - 22 - 28 - 34 6,723
05 - 11 - 17 - 23 - 29 - 35 6,075
01 - 07 - 13 - 19 - 25 - 31 5,254
05 - 10 - 15 - 20 - 25 - 30 4,276
02 - 08 - 14 - 20 - 26 - 32 4,124
01 - 02 - 03 - 04 - 05 - 06 3,257 <=== "Daddy" was right*
05 - 11 - 12 - 23 - 28 - 31 2,558
----------
*In a manner of speaking
|
1042.15 | Didn't mean to overgeneralize my examples. | CADSYS::COOPER | Topher Cooper | Thu Mar 23 1989 11:42 | 26 |
| .12 (edp) .14 (HALLYB):
Interesting.
I reread my note .7 and found that I gave an impression of being more
specific than I meant to. I should have made clear that my examples
of people's tendency to "cluster" was taken from other contexts and
could not be blindly applied to this kind of lottery. They were meant
only as an example of the type of clustering that can occur when people
try to be random. Complicating things when you are talking about
a lottery like this is that people use different strategies -- some
people try to guess a "most random" number, some people use dice or
some other device to get a number (I know from the ads that in Mass.
there is now a service for this -- you can request a random number be
chosen for you rather than you supplying one), while other people use
various systems which can produce highly patterned results (e.g.,
betting columns on the sheet).
A friend of mine told me about when he worked on the Hong Kong lottery
(they used DEC computers). The Hong Kong lottery was *not* parimutuel.
One day one of the major newspapers displayed a picture of a car wreck
on the front page with a prominently displayed license plate number.
Thousands bet on it, and it came in -- the lottery commission went
bankrupt.
Topher
|
1042.16 | | DEC25::ROBERTS | Reason, Purpose, Self-esteem | Fri Mar 24 1989 10:32 | 21 |
| RE: my own note 1042.11 (What's the minimum number of tickets I
must buy to guarantee I'll win at least one n-out-of-6 prize?)
    n    minimum
    =    =======
    0          0
    1          7    (  1  2  3  4  5  6 )
                    (  7  8  9 10 11 12 )
                    ( 13 14 15 16 17 18 )
                    ( 19 20 21 22 23 24 )
                    ( 25 26 27 28 29 30 )
                    ( 31 32 33 34 35 36 )
                    ( 37 38 39 40 41 42 )
    2         91 ?
    3       1330 ?
    4          ?
    5          ?
    6    5245786
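For what it's worth, the n=1 entry can be machine-checked. A Python
sketch, exhaustive over all C(42,6) = 5,245,786 possible draws (so it
grinds for a while):

    from itertools import combinations

    # The seven disjoint tickets above partition 1..42, so by pigeonhole
    # every draw of 6 numbers must share a number with some ticket.
    tickets = [set(range(i, i + 6)) for i in range(1, 43, 6)]
    assert all(any(t & set(draw) for t in tickets)
               for draw in combinations(range(1, 43), 6))
    print("7 tickets always yield at least a 1-of-6 match")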
/Dwayne
|
1042.17 | | KOBAL::GILBERT | Ownership Obligates | Fri Mar 24 1989 12:49 | 4 |
| > What's the minimum number of tickets I must buy to guarantee
> I'll win at least one n-out-of-m prize?
See note 746.* for this particular subproblem.
|
1042.18 | 50 cent tour of Bayesian Statistics | CADSYS::COOPER | Topher Cooper | Tue Mar 28 1989 15:19 | 152 |
| Bayes' Law or Bayes' Theorem says:
                        Pr(E | Hx) * Pr(Hx)
    Pr(Hx | E) = ---------------------------------
                  Sum{i} [ Pr(E | Hi) * Pr(Hi) ]
Where
Hx is a hypothesis.
E is an event to be used as evidence about that hypothesis.
Hi is any one of an exhaustive (i.e., one of them has to be true)
and mutually exclusive (i.e., if any one of them is true the others
are all false) set of hypotheses which includes Hx.
Pr(X) is the probability that X is true, and
Pr(X|Y) is the probability that X is true given that Y is true.
The summation is, of course, over all the Hi.
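In code, one application of the formula above is a single
normalization step. A minimal Python sketch (the hypotheses and all
the numbers are made up):

    # Posterior over a complete, mutually exclusive set of hypotheses.
    # priors[i] = Pr(Hi); likelihoods[i] = Pr(E | Hi).
    def bayes_update(priors, likelihoods):
        joint = [p * l for p, l in zip(priors, likelihoods)]
        total = sum(joint)                  # the summation over all Hi
        return [j / total for j in joint]   # Pr(Hi | E)

    priors = [1/3, 1/3, 1/3]       # three hypotheses, equally likely a priori
    likelihoods = [0.9, 0.3, 0.1]  # Pr(E | Hi) for the observed evidence E
    print(bayes_update(priors, likelihoods))   # -> about [0.692, 0.231, 0.077]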
If the description of the H's and E's had been that they were outcomes
of experimental trials, then there would have been absolutely no
controversy about the above. It is simply an elementary, frequently
useful, theorem of probability theory. But with the descriptions
I used, Bayes' Law is the basis of the controversial discipline called
Bayesian Statistics.
The root of the controversy has to do with the interpretation given
to the concept of "probability". In the traditional school of
statistics -- generally called the "frequentist" school --
probability refers to the frequency of undistinguished events in
a large number of identical repetitions of the same circumstances.
It is meaningless, according to this interpretation, to talk about
the probability of a general hypothesis, since each hypothesis is
unique and thus either simply true or simply false. At best one could give
them probability values of 0 or 1 but no value between can be
justified. Similarly, if E is in some sense a unique event -- the
outcome of a specific experiment, for example -- then it also cannot
meaningfully have a probability value associated with it. It either
occurs or it doesn't occur.
Bayesians, however, take a much broader view of probability.
Essentially their view is that anything which has the proper
mathematical form is "proper." This allows Bayesians to use
probability theory and statistics to model "rational uncertainty."
Especially controversial is the common use of "subjective
probabilities", i.e., expert judgments as to the probability of
something. Generally these are used for setting values for the
initial prior probabilities (the probabilities of the form Pr(Hi)),
but may also be used in estimating the conditional probabilities
(Pr(E|Hi)). Care must be taken that the subjective probabilities
are "rational" (obey the mathematical rules of probabilities), e.g.,
they must be normalized so that the sum of all the exclusive
probabilities equals 1.
You will often see "likelihood" used instead of "probability" by
Bayesians; this is essentially just a dodge to avoid some of the heat
from the mainstream about what a "probability value" is.
The Bayesians' justification for all this is: 1) it works, 2) it
corresponds better to what people actually mean when they talk about
probabilities -- e.g., scientists talk all the time about the
probability that a particular theory is true, and 3) since subjective
judgments are going to be made *anyway* they may as well be made
explicit, formalized and made part of the process rather than done
implicitly after the "statistics" are "finished".
In practice, Bayesian statistics usually is concerned with evaluating
not a single piece of evidence but a set of pieces of evidence.
This is done by the simple expedient of applying the above formula
to one of the pieces of evidence, using whatever prior probabilities
seem justified. The result of that process is then used as the
prior probability in a second application of the formula using a
second piece of evidence. It's fairly easy to show that the order
in which one uses the evidence is irrelevant to the final outcome. Less
easy to show but true is that the values of the initial prior
probabilities quickly become irrelevant as long as they aren't too
extreme (e.g., if one of your prior probabilities is 0 it will remain
0 no matter what -- if your mind is made up, no amount of evidence can
be expected to change it).
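Both of those claims are easy to check numerically; a Python sketch
(made-up likelihoods), using the same update step as the sketch above:

    def update(prior, like):
        joint = [p * l for p, l in zip(prior, like)]
        return [j / sum(joint) for j in joint]

    priors  = [1/3, 1/3, 1/3]
    like_E1 = [0.9, 0.3, 0.1]   # Pr(E1 | Hi), made-up numbers
    like_E2 = [0.2, 0.5, 0.5]   # Pr(E2 | Hi)

    print(update(update(priors, like_E1), like_E2))  # E1 then E2
    print(update(update(priors, like_E2), like_E1))  # E2 then E1 -- identical
    print(update([0.0, 0.5, 0.5], like_E1))          # a prior of 0 stays 0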
One important "cultural" difference between frequentist and Bayesian
statistics is what is taken as a "point estimate" of a quantity.
In traditional statistics, the most common value used is the
"expectation", mean or average, though occasionally the median is used.
Bayesians, on the other hand, tend to use the "mode", the point of
highest probability.
For example, in note 1042.0 estimates of the number of lottery
tickets bought were given with a fraction, even though only an integer
number of tickets could have been purchased. From a traditional
statistical viewpoint, this is a reasonable thing to do, since the
result sought is a sort of "summary" of the entire distribution of
possible values. Typically a Bayesian, however, would have given an
integer value -- that which had the highest probability associated with
it.
Frequently one will see quite different-seeming formulas described as
Bayes' Law. These are either equivalent to or derived from the above.
One particularly useful version is obtained by dividing the formulas
for two different hypotheses by each other, and using some defined
quantities:
                     Pr(E | H1)
    R(H1, H2 | E) = ------------ * R(H1, H2)
                     Pr(E | H2)
Where:
Pr(X | Y) is as before,
R(Hx, Hy) is the relative likelihood (probability) of Hx to Hy
(e.g., Hx is four times more likely than Hy).
R(Hx, Hy | E) is the relative likelihood of Hx to Hy given that
E occurred.
Note that in this form one can completely ignore the probabilities, or
even the existence, of hypotheses other than the two you are examining.
Two identities (actually two forms of the same identity) which are
useful with this formula in extending it to more than two hypotheses
are:
R(H1, H2) = R(H1, H3) / R(H2, H3)
and
R(H1, H3) = R(H1, H2) * R(H2, H3)
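The ratio form is equally simple in code; a Python sketch with the
same made-up likelihoods as before:

    # R(H1, H2 | E): update a relative likelihood without normalizing or
    # even enumerating the other hypotheses.
    def update_ratio(prior_ratio, pr_E_given_H1, pr_E_given_H2):
        return (pr_E_given_H1 / pr_E_given_H2) * prior_ratio

    r12 = update_ratio(1.0, 0.9, 0.3)   # H1 vs H2, equal priors -> 3.0
    r23 = update_ratio(1.0, 0.3, 0.1)   # H2 vs H3 -> 3.0
    print(r12 * r23)                    # R(H1,H3) = R(H1,H2) * R(H2,H3) = 9.0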
Why are Bayesian statistics not more widely used despite their obvious
intuitive appeal? Obviously the rather academic objections of the
traditional school (even if valid) are unlikely to interfere with
people "in the trenches" using them if they know about them and they
wish to use them. Bayesian statistics *are* widely known about --
though they do not represent the mainstream, they are not the province
of only a few isolated "nuts". Many elementary and most general
intermediate statistics books contain a chapter on Bayesian statistics.
So they seem desirable, and they are known about, so how come they are
not used? The main answer is that they are a pain in the butt.
Bayesian statistics require careful, explicit judgments to be made at
each stage. They are not easily captured into cookbook procedures such
as are the mainstay of traditional statistics. It is much harder to
calculate the posterior probability of a hypothesis properly by using
Bayesian procedures than to calculate a "p-value" using traditional
procedures and then to effectively treat it (improperly) as the
probability of a hypothesis.
Topher
|
1042.19 | a Bayesian calculation | PULSAR::WALLY | Wally Neilsen-Steinhardt | Tue Apr 18 1989 15:02 | 76 |
| At least one other discussion of applying Bayes' Law appears in
note 831. I think you would have had to do DIR/TITLE=BAYES *.*
to find it. There may be others, but I don't remember where.
Reply 1042.18 gave the form of Bayes' law that we need, so I will
just carry out two calculations as an example. I will make the
(unrealistic) assumption that ticket numbers were uniformly
distributed.
An easy way to rephrase this question is as follows: I know the
probability with which some condition is satisfied, and I know the
actual number of times that the condition was satisfied in a sample,
so what is my best estimate of the number of events in the sample?
The probabilities of the conditions are calculated as fractions
in .0, and the actual numbers are given there. I will use the form
of Bayes' law in .18, but I will ignore the denominator, since I
know that it is just a normalizing factor that I can calculate when
I need it. So my form will be
Pr( Hx | E ) = Pr( E | Hx ) * Pr( Hx ) / D
The Pr( E | Hx ) on the right is the probability that the evidence
will be seen, given the hypothesis, or in this case, the probability
of seeing r conditions satisfied in n events, given that the
probability of seeing the condition in one event is f. This is
given by the binomial probability distribution or
Pr( r | n f ) = C( n, r ) * f^r * ( 1-f )^( n-r )
[ note the similarity to .13 ]
Consider first the special case of the 6-number match, where r = 0, and
f = 1/5245786. Here Pr( r | n f ) simplifies to
Pr( 0 | n f ) = ( 1 - f )^n
To put this into Bayes' law, we need a value for Pr( Hx ), the
probability we assign to n before we have any results. Conventionally
we pick some large upper limit nmax, say 10^8, which is surely more
tickets than the Colorado lottery could sell, and assume that in our
prior state of knowledge all n <= nmax are equally
likely, so Pr( n ) = 1/nmax. (Note that those who do not like the
Bayesian approach usually start jumping up and down and shouting
about here). Substituting this into Bayes' Law gives
Pr( n | E ) = ( 1 - f )^n / (D*nmax)
Because f is so small, this probability goes slowly to zero as n
increases. In other words, the 6-number match does not provide
you with much information, as suggested in .2. About all it tells
you is that n is likely to be less than about 10^7.
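Numerically, a quick Python sketch of that unnormalized posterior
(the constant D*nmax is ignored, since it doesn't affect comparisons):

    # (1-f)^n after seeing zero 6-number winners, uniform prior on 1..nmax.
    f = 1 / 5245786.0
    for n in (10**5, 10**6, 10**7, 10**8):
        print(n, round((1 - f) ** n, 4))
    # 100000    0.9811   -- barely ruled out
    # 1000000   0.8264
    # 10000000  0.1486
    # 100000000 0.0      -- only n near 10^8 is effectively excluded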
So let's go to the other end and look at the 3-number match.
Here it is useful to note, before we get all tied up in computing
factorials, that for large n and r, the binomial distribution is
well approximated by a normal distribution with mean = n * f and
variance = n * f * (1-f). Substituting in the values in .0 tells
us that the 3-number match gives us a most likely value of 754761
with a standard deviation of 141.
Similarly, the 4-number match gives us a most likely value of 698346
with a standard deviation of 35. The fact that these two most likely
values are so many standard deviations apart confirms what we
suspected: lotto numbers are not randomly chosen. So we don't get
to use the cascading process which is traditional in Bayesian analysis:
use the 6-number matches to get a first estimate, then refine it
with the 5-number matches and so forth.
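For reference, the normal-approximation arithmetic as a Python sketch.
It uses only the 3-number figures quoted in .2, so it will not
reproduce the exact 754761 and 141 above, which came from the full
data in .0:

    from math import sqrt

    # With r observed matches at per-ticket probability f, the most
    # likely n is near r / f, and r spreads around n*f with standard
    # deviation sqrt(n * f * (1-f)).
    r, f = 21824, 0.0290646625692   # 3-of-6 count and probability (from .2)
    n_hat = r / f
    print("most likely n:", round(n_hat))
    print("sd of the match count:", round(sqrt(n_hat * f * (1 - f))))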
Note finally that the least squares approach in .0 has been shown
to be inapplicable. Because the variances differ, we should not
combine the values in a simple least squares. If our assumption
of randomly chosen numbers had been verified, we could have tried
a weighted least squares approach as the 'classical' solution. Since
our assumption was not verified, we must confess that we know almost
nothing about the number of tickets sold.
|