T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
1042.1 | Use Bayes' Theorem | NIZIAK::YARBROUGH | I PREFER PI | Tue Mar 21 1989 09:28 | 6 |
| I believe the best approach to this problem is to apply Bayes' Theorem,
which I think has been discussed elsewhere in the conference, and is
certainly discussed in any probability text. Your first calculation is, I
think, not far wrong, if overly precise (.000001 persons???).
Lynn Yarbrough
|
1042.2 | Doesn't LSQ imply equal error bands? | POOL::HALLYB | The Smart Money was on Goliath | Tue Mar 21 1989 11:20 | 18 |
| Probably you mean how many TICKETS were purchased, not how many people
participated. (Unless Colorado has unusually restrictive laws...)
I think any time you use the higher-paying results you are going to
introduce more error than you correct.
.0> One answer: since 21,824 people guessed at least 3 of the 6 and
.0> the probability of this is 0.0290646625692, then the number who
.0> played is about 21824/0.0290646625692 or 750,877.459803.
Suppose instead we look at tickets where all 6 were guessed correctly.
There were about 0/.00000019 or 0 tickets played, using the logic
above from .0. Obviously more than 0 tickets were sold (21,824 of
them were winners). So by including 6-out-of-6 you are probably introducing
error, not correcting it. Similar logic applies to 5-out-of-6 and
4-out-of-6, though to lesser extents.
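For the curious, here is the per-tier arithmetic as a small Python
sketch. It uses only the two figures quoted above from .0 (the other
tiers' probabilities would have to come from .0, which isn't reproduced
here); the point is how unstable the sparse tiers are:

    # Estimate tickets sold as winners / Pr(tier), one tier at a time.
    tiers = {
        "3-of-6": (21824, 0.0290646625692),  # winners, per-ticket probability
        "6-of-6": (0,     0.00000019),       # no jackpot winners this drawing
    }
    for name, (winners, p) in tiers.items():
        print(name, "estimate:", winners / p)
    # 3-of-6 estimate: 750877.459...
    # 6-of-6 estimate: 0.0 -- clearly wrong, as argued above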
John
|
1042.3 | | DEC25::ROBERTS | Reason, Purpose, Self-esteem | Tue Mar 21 1989 11:21 | 8 |
| Thanks, Lynn. I appreciate the reference to Bayes' Theorem.
I did a DIR/TITLE=BAYE to see if I could locate the note you refer to,
but drew a blank. Could someone point me to the proper note to read,
or apply Bayes' Theorem to the problem in 1042.0 with an explanation?
/Dwayne
|
1042.4 | | DEC25::ROBERTS | Reason, Purpose, Self-esteem | Tue Mar 21 1989 12:21 | 26 |
| RE: .2 by John
I don't want to get into a discussion here about how restrictive
Colorado's laws are. There's a place for everything, and MATH isn't for
politics. But you're right, of course, about it being tickets sold
rather than people participating. Believe it or not, it was actually
reported in the paper as people playing. I guess it makes it sound like
more of a popular game than it really is.
I appreciate your argument about trying to predict based solely on the
information of the number of 6-out-of-6 winners. In general, the more
information available, the less the error. This is why one could argue
that the Least Squares Method is more accurate than dividing total
winners by the probability of winning: it doesn't lump 8 pieces of
information into 2 sums.
But Least Squares seems to introduce an arbitrary manipulation into the
approximation. I.e., why squares? Why not power 1.9? Or power 3.14? Why
is the power constant for each term? Maybe it should be power 1.0 at
the extremes and power 2.0 at the modal value, with some distribution
in between.
Just some meandering thoughts.
/Dwayne
|
1042.5 | Why not to use least-squares. | CADSYS::COOPER | Topher Cooper | Tue Mar 21 1989 13:59 | 20 |
| RE: .4
Least squares is justified when certain conditions are met, which is
not the case here. Basically the errors on each point must be
approximately normally distributed and the variance for each must be
the same. The first might be met (except that the proportions are
so small that I wouldn't bet (so to speak) on it). My intuition says
that the second condition is *not* met, so you would have to use
an appropriate weighted least-squares. I think Bayes' theorem is
the way to go. (Bayes' theorem is a theorem in probability theory
which can be interpreted as giving the probability that some
hypothesis (H) is true given a piece of evidence (E), in terms of
the probability that E will occur if H is true and the probability
that H is true before you have taken account of evidence E. It thus
allows you to build up evidence incrementally about something. If I
get a chance, and no one beats me to it, I'll give more detail. The
interpretation of Bayes' theorem stands at the center of the largest,
longest-running controversy in statistics and probability.)
Topher
|
1042.6 | Just some meandering answers | POOL::HALLYB | The Smart Money was on Goliath | Tue Mar 21 1989 15:27 | 30 |
| .4> But Least Squares seems to introduce an arbitrary manipulation into the
.4> approximation. I.e., why squares? Why not power 1.9? Or power 3.14? Why
.4> is the power constant for each term? Maybe it should be power 1.0 at
The motivation behind squaring is that ultimately you want to build up
as powerful a toolkit as you can. Measuring dispersion "d" by squaring
differences, i.e., d(x,y) = (x-y)^2, does that. There are lots of ways
of measuring dispersion, and you are wondering why one should square
the difference instead of performing other operations.
Consider the "metric" d(x,y) = 0 iff x=y, d(x,y) = 1 iff x ^= y.
(Or if you have several points, d(xN,yN) = N iff xN ^= yN :-)
Such measures tend not to produce any useful theories.
You suggest why not d(x,y) = (x-y)^1.9. Consider the similar exponent
1.5: (4-8)^1.5 = SQRT(-64), which is not a real number. You get the
same problem with most fractional exponents; squaring always gives
you a nonnegative real number and that is a good thing. You could ask about
simply taking absolute value, d(x,y) = |x - y|, and that would also
give you nonnegative values, but remember that behind all this is a
mathematical foundation. Theories based on absolute value require
using functions that are not everywhere differentiable, which then
introduces unnecessary complications into life. Squaring differences
results in smooth, well-behaved functions that tend to cooperate in
development of a mathematical theory. And of course it is less work
than raising to higher powers.
Squares of differences, then, result in arguably the cleanest approach
to dispersion, and that is why they always show up; it isn't arbitrary.
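A concrete illustration, as a quick Python sketch (my own, with the
numbers from the example above): a fractional power of a negative
difference leaves the real numbers, absolute value stays real but has
a kink, and the square is real and smooth everywhere.

    # Compare three candidate dispersion measures where x - y < 0.
    x, y = 4, 8
    print((x - y) ** 2)     # 16 -- nonnegative real, smooth everywhere
    print(abs(x - y))       # 4  -- real, but |x-y| has a kink at x == y
    print((x - y) ** 1.5)   # (-4)**1.5 comes back as a complex number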
John
|
1042.7 | Why all the estimates are bogus. | CADSYS::COOPER | Topher Cooper | Wed Mar 22 1989 12:38 | 30 |
| Forgot to mention yesterday. All these attempts at estimation are
predicated on one very bad assumption -- that each ticket represents a
random, uniform, independent sample from the set of 6-tuples-without-
replacement. People actually cluster quite heavily due to various
psychological reasons -- essentially that people's intuitions about
statistics and probabilities are grossly wrong. People for example
tend to feel that even numbers are in some sense "less random" than
odd numbers and so are less likely to be drawn in a random sample;
they therefore choose many more odd numbers than even. People also
tend to believe that "obvious" arithmetic progressions are less likely
to occur than something more patternless and so tend to steer clear of
those.
To see why this throws the estimates off -- imagine that 97% of the
tickets were for the same 6 numbers, none of which happened to be
drawn. In that case your estimation effort would only be estimating
the 3% of tickets that were randomly chosen.
If you're approaching this as simply an interesting abstract puzzle
inspired by the lottery, and are thus willing to arbitrarily specify
uniform betting, then we can continue.
If on the other hand you're actually curious about the answer, then
I suggest you call the State Lotto Commission (or whatever it's called
there) for the answer. Alternatively, if you know the structure of
the payoff system (i.e., how much gets skimmed by the state and how
the remainder is distributed among the various classes of winners),
then you should be able to get a precise answer from the payoffs.
Topher
|
1042.8 | 1 2 3 4 5 6 | DEC25::ROBERTS | Reason, Purpose, Self-esteem | Wed Mar 22 1989 13:38 | 34 |
| Thanks, Topher.
Actually, my interest is both abstract and practical. The math is
fun, but I sometimes play the game, myself.
Your point is well taken. I doubt that 97% were for the same 6 numbers
(as I'm sure you do, too), and wonder what the average Joe's "random"
distribution really looks like.
You said in -.1, "People also tend to believe that "obvious" arithmetic
progressions are less likely to occur than something more patternless
and so tend to steer clear of those." This is evidently true. I
observed a man choosing his "random" numbers by asking his
pre-school-aged son for numbers.
"Give me a number, son."
"One."
"OK. Now give me another."
"One."
"No, no, no. It's got to be different."
"Two."
"Well, OK. Give me another."
"Three."
"Now look, son. The odds of getting three numbers in a row are almost
zero. Try another number other than `three'."
"Four."
At this point, the man started yelling at his kid, picked him up and
virtually threw him into his empty shopping cart.
For what it's worth.
/Dwayne
|
1042.9 | Is it a good bogus or a bad bogus? | POOL::HALLYB | The Smart Money was on Goliath | Wed Mar 22 1989 13:50 | 16 |
| .7> replacement. People actually cluster quite heavily due to various
.7> psychological reasons -- essentially that people's intuitions about
.7> statistics and probabilities are grossly wrong. People for example
.7> tend to feel that even numbers are in some sense "less random" than
Would it be possible to account for this by looking at historical
records of how often each number has been selected by players?
If we assume, for the sake of argument, that numbers are selected in
inverse proportion to their value (1 most often, 2 next, ... 42 last),
and we know the 6 winning numbers and the outcomes as provided in .0 by
Dwayne, can we then make an estimate of the number of tickets sold?
It would be interesting to see how it compares with the "non-adjusted"
values already guesstimated.
John
|
1042.10 | | AITG::DERAMO | Daniel V. {AITG,ZFC}:: D'Eramo | Wed Mar 22 1989 16:14 | 7 |
| There's another "minor" skewing from the fact that one person
buying two tickets most likely chooses different combinations
on them [unless he doesn't like sharing]. If they were drawn
randomly the two would be the same combination with probability
equal to the probability of winning.
Dan
|
1042.11 | Guaranteed Winner! | DEC25::ROBERTS | Reason, Purpose, Self-esteem | Wed Mar 22 1989 17:45 | 10 |
| A related question:
What's the minimum number of tickets I must buy to guarantee I'll win
at least one 3-out-of-6 prize? 4-out-of-6? 5-out-of-6?
6-out-of-6 is easy: C(42,6)=5245786; i.e., the number of combinations
of 6 items out of 42.
/Dwayne
|
1042.12 | | BEING::POSTPISCHIL | Always mount a scratch monkey. | Thu Mar 23 1989 08:16 | 11 |
| Re .7:
> People also tend to believe that "obvious" arithmetic progressions
> are less likely to occur than something more patternless and so tend to
> steer clear of those.
Actually, arithmetic progressions are the most frequently chosen
tickets.
-- edp
|
1042.13 | okay, so suppose it is ideal... | KOBAL::GILBERT | Ownership Obligates | Thu Mar 23 1989 09:12 | 23 |
| Let's restate the problem.
A number of independent tests are done. The result of each test is
a non-negative number t; t occurs with probability P[t].
After the tests, the number of tests that resulted in t is S[t].
Given the P[t] values and a subset of the S[t] values, determine
the probability that there were exactly N tests.
For example, let P[0] + P[1] = 1, and suppose we are given S[0] = s0.
Then:
Prob( N=s0+k | S[0]=s0 ) = Prob( N=S[0]+k & S[0]=s0 ) / Prob( S[0]=s0 )

                   C( s0+k, s0 ) * P[0]^s0 * P[1]^k
      = ---------------------------------------------- ;
        Sum{i=0..inf} C( s0+i, s0 ) * P[0]^s0 * P[1]^i
where C(a,b) is the binomial coefficient: 'a choose b'.
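A numeric check of the above, as a Python sketch with made-up values
for P[0] and s0. The infinite sum in the denominator is a
negative-binomial series, Sum{i} C(s0+i,s0) * P[1]^i = P[0]^-(s0+1),
so the posterior reduces to C(s0+k,s0) * P[0]^(s0+1) * P[1]^k:

    from math import comb

    P0, P1 = 0.3, 0.7   # made-up outcome probabilities, P0 + P1 = 1
    s0 = 10             # observed count of outcome 0

    def posterior(k):
        # Prob( N = s0+k | S[0] = s0 ), using the closed-form denominator.
        return comb(s0 + k, s0) * P0 ** (s0 + 1) * P1 ** k

    print(sum(posterior(k) for k in range(3000)))   # ~1.0, as it must be
    print("most likely N:", s0 + max(range(3000), key=posterior))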
|
1042.14 | $4M split 19,412 ways is, um, | POOL::HALLYB | The Smart Money was on Goliath | Thu Mar 23 1989 09:14 | 33 |
| > Actually, arithmetic progressions are the most frequently chosen
> tickets.
Well, yes, but it isn't the A.P. property that causes that. It's
the visual pattern that sometimes is also an A.P.
<<< TIXEL::DUA2:[NOTES$LIBRARY]LOTTERIES.NOTE;1 >>>
-< Lotteries Discussions >-
================================================================================
Note 30.1 Mass Megabucks Statistics 1 of 1
TIXEL::ARNOLD "Never repeat yourself. Never." 263 lines 23-AUG-1988 11:02
-< More statistics from the Lottery Commission >-
--------------------------------------------------------------------------------
[...]
Based on $4 million revenue drawing, only 70.24% of the total
1,947,792 (6/36) possible number combinations are bet.
Most popular combinations: Number of potential winners:
------------------------- ----------------------------
01 - 08 - 15 - 22 - 29 - 36 19,412 tickets with this pattern
06 - 11 - 16 - 21 - 26 - 31 19,333 " " " "
03 - 09 - 15 - 21 - 27 - 33 9,452
06 - 12 - 18 - 24 - 30 - 36 8,015
04 - 10 - 16 - 22 - 28 - 34 6,723
05 - 11 - 17 - 23 - 29 - 35 6,075
01 - 07 - 13 - 19 - 25 - 31 5,254
05 - 10 - 15 - 20 - 25 - 30 4,276
02 - 08 - 14 - 20 - 26 - 32 4,124
01 - 02 - 03 - 04 - 05 - 06 3,257 <=== "Daddy" was right*
05 - 11 - 12 - 23 - 28 - 31 2,558
----------
*In a manner of speaking
|
1042.15 | Didn't mean to overgeneralize my examples. | CADSYS::COOPER | Topher Cooper | Thu Mar 23 1989 11:42 | 26 |
| .12 (edp) .14 (HALLYB):
Interesting.
I reread my note .7 and found that I gave an impression of being more
specific than I meant to. I should have made clear that my examples
of people's tendency to "cluster" was taken from other contexts and
could not be blindly applied to this kind of lottery. They were meant
only as an example of the type of clustering that can occur when people
try to be random. Complicating things when you are talking about
a lottery like this is that people use different strategies -- some
people try to guess a "most random" number, some people use dice or
some other device to get a number (I know from the ads that in Mass.
there is now a service for this -- you can request a random number be
chosen for you rather than you supplying one), while other people use
various systems which can produce highly patterned results (e.g.,
betting columns on the sheet).
A friend of mine told me about when he worked on the Hong Kong lottery
(they used DEC computers). The Hong Kong lottery was *not* parimutuel.
One day one of the major newspapers displayed a picture of a car wreck
on the front page with a prominently displayed license plate number.
Thousands bet on it, and it came in -- the lottery commission went
bankrupt.
Topher
|
1042.16 | | DEC25::ROBERTS | Reason, Purpose, Self-esteem | Fri Mar 24 1989 10:32 | 21 |
| RE: my own note 1042.11 (What's the minimum number of tickets I
must buy to guarantee I'll win at least one n-out-of-6 prize?)
    n    minimum
    =    =======
    0          0
    1          7    (  1  2  3  4  5  6 )
                    (  7  8  9 10 11 12 )
                    ( 13 14 15 16 17 18 )
                    ( 19 20 21 22 23 24 )
                    ( 25 26 27 28 29 30 )
                    ( 31 32 33 34 35 36 )
                    ( 37 38 39 40 41 42 )
    2         91 ?
    3       1330 ?
    4          ?
    5          ?
    6    5245786
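For what it's worth, the n=1 entry can be machine-checked. A Python
sketch, exhaustive over all C(42,6) = 5,245,786 possible draws (so it
grinds for a while):

    from itertools import combinations

    # The seven disjoint tickets above partition 1..42, so by pigeonhole
    # every draw of 6 numbers must share a number with some ticket.
    tickets = [set(range(i, i + 6)) for i in range(1, 43, 6)]
    assert all(any(t & set(draw) for t in tickets)
               for draw in combinations(range(1, 43), 6))
    print("7 tickets always yield at least a 1-of-6 match")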
/Dwayne
|
1042.17 | | KOBAL::GILBERT | Ownership Obligates | Fri Mar 24 1989 12:49 | 4 |
| > What's the minimum number of tickets I must buy to guarantee
> I'll win at least one n-out-of-m prize?
See note 746.* for this particular subproblem.
|
1042.18 | 50 cent tour of Bayesian Statistics | CADSYS::COOPER | Topher Cooper | Tue Mar 28 1989 15:19 | 152 |
| Bayes' Law or Bayes' Theorem says:
                        Pr(E | Hx) * Pr(Hx)
    Pr(Hx | E) = ---------------------------------
                  Sum{i} [ Pr(E | Hi) * Pr(Hi) ]
Where
Hx is a hypothesis.
E is an event to be used as evidence about that hypothesis.
Hi is any one of an exhaustive (i.e., one of them has to be true)
and mutually exclusive (i.e., if any one of them is true the others
are all false) set of hypotheses which includes Hx.
Pr(X) is the probability that X is true, and
Pr(X|Y) is the probability that X is true given that Y is true.
The summation is, of course, over all the Hi.
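In code, one application of the formula above is a single
normalization step. A minimal Python sketch (the hypotheses and all
the numbers are made up):

    # Posterior over a complete, mutually exclusive set of hypotheses.
    # priors[i] = Pr(Hi); likelihoods[i] = Pr(E | Hi).
    def bayes_update(priors, likelihoods):
        joint = [p * l for p, l in zip(priors, likelihoods)]
        total = sum(joint)                  # the summation over all Hi
        return [j / total for j in joint]   # Pr(Hi | E)

    priors = [1/3, 1/3, 1/3]       # three hypotheses, equally likely a priori
    likelihoods = [0.9, 0.3, 0.1]  # Pr(E | Hi) for the observed evidence E
    print(bayes_update(priors, likelihoods))   # -> about [0.692, 0.231, 0.077]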
If the description of the H's and E's had been that they were outcomes
of experimental trials, then there would have been absolutely no
controversy about the above. It is simply an elementary, frequently
useful, theorem of probability theory. But with the descriptions
I used, Bayes' Law is the basis of the controversial discipline called
Bayesian Statistics.
The root of the controversy has to do with the interpretation given
to the concept of "probability". In the traditional school of
statistics -- generally called the "frequentist" school --
probability refers to the frequency of undistinguished events in
a large number of identical repetitions of the same circumstances.
It is meaningless, according to this interpretation, to talk about
the probability of a general hypothesis, since each hypothesis is
unique and thus either simply true or simply false. At best one could give
them probability values of 0 or 1 but no value between can be
justified. Similarly, if E is in some sense a unique event -- the
outcome of a specific experiment, for example -- then it also cannot
meaningfully have a probability value associated with it. It either
occurs or it doesn't occur.
Bayesians, however, take a much broader view of probability.
Essentially their view is that anything which has the proper
mathematical form is "proper." This allows Bayesians to use
probability theory and statistics to model "rational uncertainty."
Especially controversial is the common use of "subjective
probabilities", i.e., expert judgments as to the probability of
something. Generally these are used for setting values for the
initial prior probabilities (the probabilities of the form Pr(Hi)),
but may also be used in estimating the conditional probabilities
(Pr(E|Hi)). Care must be taken that the subjective probabilities
are "rational" (obey the mathematical rules of probabilities), e.g.,
they must be normalized so that the sum of all the exclusive
probabilities equals 1.
You will often see "likelihood" used instead of "probability" by
Bayesians; this is essentially just a dodge to avoid some of the heat
from the mainstream about what a "probability value" is.
The Bayesians' justification for all this is: 1) it works, 2) it
corresponds better to what people actually mean when they talk about
probabilities -- e.g., scientists talk all the time about the
probability that a particular theory is true, and 3) since subjective
judgments are going to be made *anyway* they may as well be made
explicit, formalized and made part of the process rather than done
implicitly after the "statistics" are "finished".
In practice, Bayesian statistics usually is concerned with evaluating
not a single piece of evidence but a set of pieces of evidence.
This is done by the simple expedient of applying the above formula
to one of the pieces of evidence, using whatever prior probabilities
seem justified. The result of that process is then used as the
prior probability in a second application of the formula using a
second piece of evidence. It's fairly easy to show that the order
in which one uses the evidence is irrelevant to the final outcome. Less
easy to show but true is that the values of the initial prior
probabilities quickly become irrelevant as long as they aren't too
extreme (e.g., if one of your prior probabilities is 0 it will remain
0 no matter what -- if your mind is made up, no amount of evidence can
be expected to change it).
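Both of those claims are easy to check numerically; a Python sketch
(made-up likelihoods), using the same update step as the sketch above:

    def update(prior, like):
        joint = [p * l for p, l in zip(prior, like)]
        return [j / sum(joint) for j in joint]

    priors  = [1/3, 1/3, 1/3]
    like_E1 = [0.9, 0.3, 0.1]   # Pr(E1 | Hi), made-up numbers
    like_E2 = [0.2, 0.5, 0.5]   # Pr(E2 | Hi)

    print(update(update(priors, like_E1), like_E2))  # E1 then E2
    print(update(update(priors, like_E2), like_E1))  # E2 then E1 -- identical
    print(update([0.0, 0.5, 0.5], like_E1))          # a prior of 0 stays 0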
One important "cultural" difference between frequentist and Bayesian
statistics is what is taken as a "point estimate" of a quantity.
In traditional statistics, the most common value used is the
"expectation", mean or average, though occasionally the median is used.
Bayesians, on the other hand, tend to use the "mode", the point of
highest probability.
For example, in note 1042.0 estimates of the number of lottery
tickets bought were given with a fraction, even though only an integer
number of tickets could have been purchased. From a traditional
statistical viewpoint, this is a reasonable thing to do, since the
result sought is a sort of "summary" of the entire distribution of
possible values. Typically a Bayesian, however, would have given an
integer value -- that which had the highest probability associated with
it.
Frequently one will see quite different-seeming formulas described as
Bayes' Law. These are either equivalent to or derived from the above.
One particularly useful version is obtained by dividing the formulas
for two different hypotheses by each other, and using some defined
quantities:
                     Pr(E | H1)
    R(H1, H2 | E) = ------------ * R(H1, H2)
                     Pr(E | H2)
Where:
Pr(X | Y) is as before,
R(Hx, Hy) is the relative likelihood (probability) of Hx to Hy
(e.g., Hx is four times more likely than Hy).
R(Hx, Hy | E) is the relative likelihood of Hx to Hy given that
E occurred.
Note that in this form one can completely ignore the probabilities, or
even the existence, of hypotheses other than the two you are examining.
Two identities (actually two forms of the same identity) which are
useful with this formula in extending it to more than two hypotheses
are:
R(H1, H2) = R(H1, H3) / R(H2, H3)
and
R(H1, H3) = R(H1, H2) * R(H2, H3)
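The ratio form is equally simple in code; a Python sketch with the
same made-up likelihoods as before:

    # R(H1, H2 | E): update a relative likelihood without normalizing or
    # even enumerating the other hypotheses.
    def update_ratio(prior_ratio, pr_E_given_H1, pr_E_given_H2):
        return (pr_E_given_H1 / pr_E_given_H2) * prior_ratio

    r12 = update_ratio(1.0, 0.9, 0.3)   # H1 vs H2, equal priors -> 3.0
    r23 = update_ratio(1.0, 0.3, 0.1)   # H2 vs H3 -> 3.0
    print(r12 * r23)                    # R(H1,H3) = R(H1,H2) * R(H2,H3) = 9.0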
Why are Bayesian statistics not more widely used despite their obvious
intuitive appeal? Obviously the rather academic objections of the
traditional school (even if valid) are unlikely to interfere with
people "in the trenches" using them if they know about them and they
wish to use them. Bayesian statistics *are* widely known about --
though they do not represent the mainstream, they are not the province
of only a few isolated "nuts". Many elementary and most general
intermediate statistics books contain a chapter on Bayesian statistics.
So they seem desirable, and they are known about, so how come they are
not used? The main answer is that they are a pain in the butt.
Bayesian statistics require careful, explicit judgments to be made at
each stage. They are not easily captured into cookbook procedures such
as are the mainstay of traditional statistics. It is much harder to
calculate the posterior probability of a hypothesis properly by using
Bayesian procedures than to calculate a "p-value" using traditional
procedures and then to effectively treat it (improperly) as the
probability of a hypothesis.
Topher
|
1042.19 | a Bayesian calculation | PULSAR::WALLY | Wally Neilsen-Steinhardt | Tue Apr 18 1989 15:02 | 76 |
| At least one other discussion of applying Bayes' Law appears in
note 831. I think you would have had to do DIR/TITLE=BAYES *.*
to find it. There may be others, but I don't remember where.
Reply 1042.18 gave the form of Bayes' law that we need, so I will
just carry out two calculations as an example. I will make the
(unrealistic) assumption that ticket numbers were uniformly
distributed.
An easy way to rephrase this question is as follows: I know the
probability with which some condition is satisfied, and I know the
actual number of times that the condition was satisfied in a sample,
so what is my best estimate of the number of events in the sample?
The probabilities of the conditions are calculated as fractions
in .0, and the actual numbers are given there. I will use the form
of Bayes' law in .18, but I will ignore the denominator, since I
know that it is just a normalizing factor that I can calculate when
I need it. So my form will be
Pr( Hx | E ) = Pr( E | Hx ) * Pr( Hx ) / D
The Pr( E | Hx ) on the right is the probability that the evidence
will be seen, given the hypothesis, or in this case, the probability
of seeing r conditions satisfied in n events, given that the
probability of seeing the condition in one event is f. This is
given by the binomial probability distribution or
Pr( r | n f ) = C( n, r ) * f^r * ( 1-f )^( n-r )
[ note the similarity to .13 ]
Consider first the special case of the 6-number match, where r = 0, and
f = 1/5245786. Here Pr( r | n f ) simplifies to
Pr( 0 | n f ) = ( 1 - f )^n
To put this into Bayes' law, we need a value for Pr( Hx ), the
probability we assign to n before we have any results. Conventionally
we pick some large upper limit nmax, say 10^8, which is surely more
tickets than the Colorado lottery could sell, and assume that in our
prior state of knowledge all n <= nmax are equally
likely, so Pr( n ) = 1/nmax. (Note that those who do not like the
Bayesian approach usually start jumping up and down and shouting
about here). Substituting this into Bayes' Law gives
Pr( n | E ) = ( 1 - f )^n / (D*nmax)
Because f is so small, this probability goes slowly to zero as n
increases. In other words, the 6-number match does not provide
you with much information, as suggested in .2. About all it tells
you is that n is likely to be less than about 10^7.
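Numerically, a quick Python sketch of that unnormalized posterior
(the constant D*nmax is ignored, since it doesn't affect comparisons):

    # (1-f)^n after seeing zero 6-number winners, uniform prior on 1..nmax.
    f = 1 / 5245786.0
    for n in (10**5, 10**6, 10**7, 10**8):
        print(n, round((1 - f) ** n, 4))
    # 100000    0.9811   -- barely ruled out
    # 1000000   0.8264
    # 10000000  0.1486
    # 100000000 0.0      -- only n near 10^8 is effectively excluded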
So let's go to the other end and look at the 3-number match.
Here it is useful to note, before we get all tied up in computing
factorials, that for large n and r, the binomial distribution is
well approximated by a normal distribution with mean = n * f and
variance = n * f * (1-f). Substituting in the values in .0 tells
us that the 3-number match gives us a most likely value of 754761
with a standard deviation of 141.
Similarly, the 4-number match gives us a most likely value of 698346
with a standard deviation of 35. The fact that these two most likely
values are so many standard deviations apart confirms what we
suspected: lotto numbers are not randomly chosen. So we don't get
to use the cascading process which is traditional in Bayesian analysis:
use the 6-number matches to get a first estimate, then refine it
with the 5-number matches and so forth.
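For reference, the normal-approximation arithmetic as a Python sketch.
It uses only the 3-number figures quoted in .2, so it will not
reproduce the exact 754761 and 141 above, which came from the full
data in .0:

    from math import sqrt

    # With r observed matches at per-ticket probability f, the most
    # likely n is near r / f, and r spreads around n*f with standard
    # deviation sqrt(n * f * (1-f)).
    r, f = 21824, 0.0290646625692   # 3-of-6 count and probability (from .2)
    n_hat = r / f
    print("most likely n:", round(n_hat))
    print("sd of the match count:", round(sqrt(n_hat * f * (1 - f))))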
Note finally that the least squares approach in .0 has been shown
to be inapplicable. Because the variances differ, we should not
combine the values in a simple least squares. If our assumption
of randomly chosen numbers had been verified, we could have tried
a weighted least squares approach as the 'classical' solution. Since
our assumption was not verified, we must confess that we know almost
nothing about the number of tickets sold.
|