Conference rusure::math

Title:	Mathematics at DEC

Moderator:	RUSURE::EDP

Created:	Mon Feb 03 1986
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	2083
Total number of notes:	14613

1291.0. "On Mathematical Illiteracy" by CIVAGE::LYNN (Lynn Yarbrough @WNP DTN 427-5663) Tue Sep 04 1990 10:11

In John Allen Paulos' recent (and very valuable) book "INNUMERACY - 
Mathematical Illiteracy and its Consequences" (Published 1988 by Hill &
Wang) a section on conditional probability contains the following: 

"A confusion between the probability of A given B and the probability of B 
given A is also quite common. A simple example: the conditional probability
of having chosen a king card when it's known that the card is a face card -
a king, queen, or jack - is 1/3. However, the conditional probability that
the card is a face card given that it's a king is 1, 100 percent. The
conditional probability that someone is an American citizen, given that he
or she speaks English, is, let's assume, about 1/5. The conditional
probability that someone speaks English, given that he or she is an
American, is probably about 19/20 or .95. 

"Consider now some randomly selected family of four. Given that Myrtle has 
a sibling, what is the conditional probability that her sibling is a 
brother? Given that Myrtle has a *younger* sibling, what is the conditional
probability that her sibling is a brother? The answers are, respectively,
2/3 and 1/2. 

"In general, there are four equally likely possibilities for a family with 
two children - BB, BG, GB, GG - where the order of the letters B (boy) and 
G (girl) indicates birth order. In the first case, the possibility BB is 
ruled out since Myrtle is a girl, and in two of the three other equally 
likely possibilities, there is a boy, Myrtle's brother. In the second case,
the possibilities BB and BG are ruled out since Myrtle, a girl, is the
older sibling, and in one of the remaining two equally likely
possibilities, there is a boy, Myrtle's brother. In the second case, we
know more, accounting for the differing conditional probabilities." 

Any comments B^) ?
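
Paulos' case analysis quoted above can be checked by brute-force enumeration. The following is only a sketch under his stated assumptions (four equally likely two-child families, first letter = older child); the helper names `conditional`, `has_girl` and so on are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# The four equally likely two-child families; the first letter is the older child.
families = ["".join(p) for p in product("BG", repeat=2)]  # BB, BG, GB, GG


def conditional(event, given):
    """P(event | given) over the equally likely families."""
    kept = [f for f in families if given(f)]
    return Fraction(sum(1 for f in kept if event(f)), len(kept))


def has_girl(f):
    return "G" in f


def has_boy(f):
    return "B" in f


def older_is_girl(f):
    return f[0] == "G"


def younger_is_boy(f):
    return f[1] == "B"


# First question: the family contains at least one girl.
p_first = conditional(has_boy, has_girl)
# Second question: the older child is a girl.
p_second = conditional(younger_is_boy, older_is_girl)

print(p_first, p_second)  # 2/3 1/2
```

This reproduces 2/3 and 1/2, but only because the conditioning is done on families rather than on a chosen child, which is exactly the point the replies argue about.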

T.R	Title	User	Personal Name	Date	Lines
1291.1	Mass testing.	CADSYS::COOPER	Topher Cooper	`Tue Sep 04 1990 17:17`	26
	Here is another, rather topical, "paradox" of conditional probability (paradox in the sense that its results are contrary to "common sense") that's been making the rounds in various forms. The numbers used here are my own -- they produce rather "tidy" calculations. I'm casting this in the form of employee drug testing, but it is applicable to many forms of "group testing". An obvious example is mass testing for HIV virus antibodies, but there are other examples, not all of them medical.

	--------------------------------------------------------------------------

	A particular company has decided to institute a program of testing for drug use among its employees. In the general population from which the employees are randomly selected, 1% are drug users. The test has a 1% failure rate, both in the sense that out of all non-drug users tested 1% will get (false) positives, and in the sense that out of all drug users tested 1% will get (false) negatives.

	A randomly chosen employee is tested and is found to test positive, with unpleasant consequences to the employee. The question is: what is the probability that the employee is, in fact, a drug user?

	Try first to guess the answer without calculation -- preferably without taking into account my presumed motivations for posting this. The calculated answer will be in the next reply.

	Topher
1291.2	Mass testing answer.	CADSYS::COOPER	Topher Cooper	`Tue Sep 04 1990 17:29`	22
	Probability of an incorrect accusation = 1/2.

	Imagine that there are 10000 employees tested. By hypothesis, the expected number of drug users is 100. Among actual drug users we expect 1%, or 1 drug user, to get a false negative. Therefore we expect 100-1 = 99 drug users to get a (true) positive.

	We would expect that there will be about 10000 - 100 or 9900 non-drug-users. Among them we expect 1%, or 99 non-drug-users, to get a false positive.

	We expect that there will be the same number (99) of false positives as true positives overall, and thus that any particular positive is as likely to be false as true.

	Think about it.

	Topher
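
The head count in .2 is an application of Bayes' rule. As a minimal sketch, using the 1% figures from .1 and an illustrative function name that is not from the thread, the same result can be computed exactly:

```python
from fractions import Fraction


def posterior_user_given_positive(prevalence, false_positive_rate, false_negative_rate):
    """P(drug user | positive test) by Bayes' rule."""
    true_positives = prevalence * (1 - false_negative_rate)
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


one_percent = Fraction(1, 100)
p_user = posterior_user_given_positive(one_percent, one_percent, one_percent)
print(p_user)  # 1/2
```

The answer is exactly 1/2 only because the prevalence happens to equal the error rate; a rarer condition with the same test would make a positive result even less trustworthy.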
1291.3	But is this the problem?	CIVAGE::LYNN	Lynn Yarbrough @WNP DTN 427-5663	`Wed Sep 05 1990 10:44`	5
	While your point is well taken - Paulos discusses it as well - you have missed the point of the base note. The author screwed up, either intentionally or not (in the following section of the chapter he discusses the very trap that he fell into). It's worth thinking through exactly where he went off the rails, so I will try to refrain from saying more....
1291.4		GUESS::DERAMO	Dan D'Eramo	`Wed Sep 05 1990 11:52`	14
	>> .0 "Consider now some randomly selected family of four. Given that Myrtle has
	>> a sibling, what is the conditional probability that her sibling is a
	>> brother? Given that Myrtle has a younger sibling, what is the conditional
	>> probability that her sibling is a brother?"

	>> .3 The author screwed up

	Aha! The author wrongly assumes that the randomly selected family of four is made up of a father, a mother, and two children, and that Myrtle is one of the two children and is a girl. It could be that Myrtle is the mom, and she happens to have a brother, who is not a member of the family of four.

	Dan
1291.5	...but did I?	CADSYS::COOPER	Topher Cooper	`Wed Sep 05 1990 13:31`	41
	RE: .3 (Dan)

	Whether this could be considered an error or not is a matter of philosophy. Given any finite description of a problem we must make certain assumptions in order to solve that problem. If we don't make assumptions then: why assume that Myrtle is female rather than a male with a weird name? Why assume that Myrtle is a member of the family at all? Why assume that the probability of a girl vs a boy is reasonably approximated by 1/2 -- maybe this takes place in a small rural Chinese village where girl-babies are frequently killed at birth? Why assume that the informant is telling the truth -- maybe it's really a family of 12? Maybe the informant is speaking in code?

	Word problems differ from first hand real world problems in one important way. In a first hand real-world problem we would deal with these issues probabilistically, and, if we feel that alternatives have slightly too high a probability, then we could conditionalize our answer by our assumptions. In a word problem there is a convention that routine assumptions which are required for the solution of the problem are correct unless contradicted -- directly or subtly (in which case it is a "trick" question) -- by information provided. This is doubly true in problems provided for expository purpose rather than for puzzles.

	It would be legitimate for an author to announce for later expository purposes that they had "screwed up" in a problem -- that they had through error or connivance not followed the expository rules and stated all necessary information. It is legitimate to do this to make the point that in the "real world" reasonable assumptions necessary for solving the problem are not always correct. It is then an amusing exercise to see what assumptions may most plausibly be modified. But it is the author who "screwed up" (or pretends to have), not the reader who accepted a chain of reasoning and assumed that the author had "played fair" in presenting the problem.

	If this was the "point" of .0 (rather than some other error which continues to elude me) then I am quite content to have missed it. I would be embarrassed to be such a nit-picker as to have caught it (and I am, as those who know me will acknowledge, a world-class nit-picker :-)).

	Topher
1291.6	As I see it...	CIVAGE::LYNN	Lynn Yarbrough @WNP DTN 427-5663	`Wed Sep 05 1990 15:14`	38
	My point follows, after a <ff> and suitable spaces to push a DECWindows image off the screen:

	The author's conclusion is nonsense: the birth order of children is irrelevant to their sex, and the 2/3 probability he obtains is completely off-the-wall.

	The author's analysis of cases is wrong. In general, when the sexes of both siblings are unknown, the four relevant cases are BB, BG, GB, GG. But when the sex of one, Myrtle, is known - and it doesn't matter what sex Myrtle is! - the four relevant cases are BM, MB, GM, and MG, where M is Myrtle; as reason and instinct would indicate, the odds of B vs G for the sibling are (as roughly as you like) 50-50.

	While knowing that the sibling is younger does add information, it is irrelevant information - like knowing that it snowed the day Myrtle was born - and has no effect on the conditional probabilities.
1291.7	...and more detailed	CIVAGE::LYNN	Lynn Yarbrough @WNP DTN 427-5663	`Wed Sep 05 1990 15:37`	32
	Further explanation follows, after a <ff> and suitable spaces to push a DECWindows image off the screen: According to Paulos, the four relevant cases are BB, BG, GB, GG. He correctly dismisses the BB case, but then states that the remaining cases - BG, GB, and GG - are equally likely. That's false - it fails to take into account that Myrtle may be either older or younger, so the GG case is actually Gg plus gG, where g is the sister, thus twice as likely as either of the other two, and as likely as both.
1291.8	Beg to disagree...	CADSYS::COOPER	Topher Cooper	`Wed Sep 05 1990 17:51`	170
	RE: .6, .7

	Further discussion follows, after a <ff> and suitable spaces to push a DECWindows image off the screen:

	I think that this is exactly the type of thing I was talking about in my previous note, albeit a more subtle case of it. The "answer" we get depends on the assumptions made about the procedure by which the expositor came to make the statement which (s)he did. It is my opinion that the most direct assumption for purposes of a word problem leads to the reasoning in .0, while the reasoning in .7 would have to be directly stated to be accepted as a "fair" problem statement.

	The assumption in .0 comes from the following presumed selection procedure. Given a population of families of four (two children), for the first subproblem:

	    1) A random family is sampled.
	    2) If both children are boys repeat 1.
	    3) If only one of the children is a girl, ask her name and announce
	       it. If both children are girls, choose one of them at random
	       and announce her name.

	That is, we select a random family in which at least one of the children is a girl, and we provide that (or one of the) girl's name.

	The process for the second subproblem is the same except that steps 2 and 3 become:

	    2') If the older child is a boy repeat 1.
	    3') Announce the name of the older child (a girl).

	In this case we select a family at random in which the older child is a girl and we provide her name.

	Under these assumptions, the analysis in .0 is correct. If anyone is in doubt, here is the main routine of a simulation I wrote (apologies to BLISSaphobes):

	    ROUTINE mainroutine =
	        BEGIN
	        LOCAL
	            oldercount, girlcount, yngrbrocount, brocount;

	        oldercount = girlcount = yngrbrocount = brocount = 0;
	        randinit();

	        INCR simulations FROM 1 TO number_simulations DO
	            BEGIN
	            LOCAL
	                family;     ! Encoded as two bits, high bit older sibling, low bit
	                            ! the younger sibling.  Bit=1 a boy, Bit=0 a girl.

	            family = randfamily();
	            SELECT .family OF
	                SET
	                [0, 1]:     ! An older girl
	                    oldercount = .oldercount + 1;
	                [0, 1, 2]:  ! A girl.
	                    girlcount = .girlcount + 1;
	                [1]:        ! A younger brother to a girl.
	                    yngrbrocount = .yngrbrocount + 1;
	                [1,2]:      ! A brother to a girl.
	                    brocount = .brocount + 1;
	                TES;
	            END;

	        display(.yngrbrocount, .oldercount, .brocount, .girlcount);
	        SS$_NORMAL
	        END;

	I ran the simulation 10000 times. The result was that:

	    proportion of times that an older sister had a younger brother:
	        2494/4973 = .5015081 ~= 1/2
	    proportion of times that there was at least one sister that she had a brother:
	        5042/7521 = .6703895 ~= 2/3

	I believe that this is the most natural interpretation of the problem, and the one which is most reasonable for a reader to assume was intended. Other interpretations are possible, however, and these lead to other results. We can, for example, assume that the following process was used for the first subproblem:

	    1) A random family is sampled.
	    2) If both of the children are named Myrtle then repeat from 1 (an
	       unnatural naming situation).
	    3) If neither of the children are named Myrtle then repeat from 1.
	       We assume here that a child named Myrtle will be a girl.
	    4) Announce that one of the children is named Myrtle.

	The second subproblem is similar except 3 and 4 become:

	    3') If the older child is not named Myrtle then repeat from 1.
	    4') Announce that the older child is named Myrtle.

	Here the probabilities are always equal, but dependent on the probability that a randomly selected girl will be named Myrtle. The probability in either case of Myrtle having a brother is:

	       1
	    -------
	     2 - p

	where p is the probability of a girl being named Myrtle.

	When I simulated this situation (still running 10000 simulations) with a probability of 1/2 that any particular girl would be named Myrtle (program available upon request):

	    proportion of times that an older Myrtle had a younger brother:
	        2502/3743 = .6684477 ~= 2/3
	    proportion of times that a Myrtle had a brother:
	        1275/1923 = .6630265 ~= 2/3

	Here knowing that Myrtle was the older sibling added no additional information, but unlike the analysis in .7 the probability of a (younger) brother is not 1/2, and cannot be 1/2.

	The analysis in .7 seems to be based on the rather poorly motivated selection procedure:

	    1) Select a (girl) child named Myrtle, and examine her family.
	    2) Announce that the family has a child named Myrtle.

	The second subproblem uses:

	    1') Select an eldest (girl) child named Myrtle, and examine her family.
	    2') Announce that the family has an eldest child named Myrtle.

	I would say that this is a rather unnatural interpretation for the problem stated in .0, although possible.

	Topher
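
For readers without a BLISS compiler, the two sampling procedures described in .8 can be sketched in Python. This is not a translation of Topher's program: the function name, the dictionary keys, the seed and the trial count are all assumptions made for illustration, and the second procedure uses p = 1/2 as in the note.

```python
import random


def simulate(trials=200_000, p_myrtle=0.5, seed=1990):
    """Estimate P(brother) under the two selection procedures of .8."""
    rng = random.Random(seed)
    # Each entry is [brother count, case count].
    family_any, family_older = [0, 0], [0, 0]   # procedure 1: a family with a girl
    myrtle_any, myrtle_older = [0, 0], [0, 0]   # procedure 2: exactly one Myrtle

    for _ in range(trials):
        older, younger = rng.choice("BG"), rng.choice("BG")

        # Procedure 1: keep families with at least one girl / with an older girl.
        if "G" in (older, younger):
            family_any[1] += 1
            family_any[0] += "B" in (older, younger)
        if older == "G":
            family_older[1] += 1
            family_older[0] += younger == "B"

        # Procedure 2: each girl is independently named Myrtle with
        # probability p_myrtle; keep families with exactly one Myrtle.
        older_is_myrtle = older == "G" and rng.random() < p_myrtle
        younger_is_myrtle = younger == "G" and rng.random() < p_myrtle
        if older_is_myrtle != younger_is_myrtle:
            sibling = younger if older_is_myrtle else older
            myrtle_any[1] += 1
            myrtle_any[0] += sibling == "B"
            if older_is_myrtle:
                myrtle_older[1] += 1
                myrtle_older[0] += sibling == "B"

    def ratio(pair):
        return pair[0] / pair[1]

    return {
        "family_any": ratio(family_any),
        "family_older": ratio(family_older),
        "myrtle_any": ratio(myrtle_any),
        "myrtle_older": ratio(myrtle_older),
    }


results = simulate()
for name, value in results.items():
    print(f"{name}: {value:.4f}")
```

With enough trials the first procedure should settle near 2/3 and 1/2, and the second near 1/(2 - p) = 2/3 for both sub-questions, matching the figures reported in the note.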
1291.9		GUESS::DERAMO	Dan D'Eramo	`Wed Sep 05 1990 18:26`	34
	And here is my simulation:

	    Lisp> (pprint-definition 'trial)
	    ;;; The function TRIAL has a compiled definition.
	    (DEFUN TRIAL ()
	      (LET ((K1 (SVREF '#(JOE FRED MYRTLE SUSAN) (RANDOM 4)))
	            (K2 (SVREF '#(JOE FRED MYRTLE SUSAN) (RANDOM 4))))
	        (IF (EQ K1 K2)
	            (TRIAL)
	            (IF (OR (EQ K1 'MYRTLE) (EQ K2 'MYRTLE))
	                (LIST K1 K2)
	                (TRIAL)))))

	Two kids are generated at random. Cases where the two have the same name, or where neither is named Myrtle, are rejected. Of the remaining cases, in 10,000 trials I ended up with

	    (JOE MYRTLE)      1698 times
	    (MYRTLE FRED)     1651 times
	    (FRED MYRTLE)     1655 times
	    (MYRTLE SUSAN)    1621 times
	    (SUSAN MYRTLE)    1748 times
	    (MYRTLE JOE)      1627 times
	    --------------------------
	                     10000 times

	(The TRIAL function implements that each family is one in which there is one Myrtle and that she has a sibling.)

	Of the 10,000 cases, in 6,631 of them Myrtle has a brother (66%). Of the 4899 cases in which Myrtle has a younger sibling (i.e., in which Myrtle is older), in 3278 of them (67%) Myrtle has a brother.

	Dan
1291.10		CADSYS::COOPER	Topher Cooper	`Thu Sep 06 1990 12:38`	18
	RE: .9 (Dan)

	For those not following real close, Dan's LISP simulation is a simulation of a situation analytically equivalent to my second pair of assumptions about how sampling is done. If he expanded the number of names to six (three boys' and three girls' names) he would get results of around 60% for both cases.

	Dan's outer IF expression could be rewritten as:

	    (IF (XOR (EQ K1 'MYRTLE) (EQ K2 'MYRTLE))
	        (LIST K1 K2)
	        (TRIAL))

	... if Common LISP included an XOR predicate, which it does not -- an odd lack in "The language for those who want everything.".

	Topher
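
The 60% figure for six names can be checked exactly by enumerating every name pair that Dan's TRIAL function would accept. This is a sketch only: the function `brother_probability` is illustrative, and the extra names BOB and ALICE are invented here to bring the list up to six.

```python
from fractions import Fraction
from itertools import product


def brother_probability(boy_names, girl_names, target="MYRTLE"):
    """Exact P(brother) and P(younger brother | target is older).

    Pairs are (older, younger); pairs with a repeated name or without
    the target name are rejected, as in the TRIAL function of .9.
    """
    names = list(boy_names) + list(girl_names)
    kept = [(k1, k2) for k1, k2 in product(names, repeat=2)
            if k1 != k2 and target in (k1, k2)]

    def sibling(pair):
        return pair[1] if pair[0] == target else pair[0]

    with_brother = [pair for pair in kept if sibling(pair) in boy_names]
    older = [pair for pair in kept if pair[0] == target]
    older_with_brother = [pair for pair in older if pair[1] in boy_names]
    return (Fraction(len(with_brother), len(kept)),
            Fraction(len(older_with_brother), len(older)))


four_names = brother_probability(["JOE", "FRED"], ["MYRTLE", "SUSAN"])
# BOB and ALICE are hypothetical additions, not names used in the thread.
six_names = brother_probability(["JOE", "FRED", "BOB"], ["MYRTLE", "SUSAN", "ALICE"])

print(four_names)  # (Fraction(2, 3), Fraction(2, 3))
print(six_names)   # (Fraction(3, 5), Fraction(3, 5))
```

Both results agree with the 1/(2 - p) formula from .8, with p = 1/2 for two girls' names and p = 1/3 for three.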
1291.11	determining what's routine is not routine :-)	EAGLE1::BEST	R D Best, sys arch, I/O	`Thu Sep 06 1990 15:46`	50
	re .5

	>RE: .3 (Dan)
	>
	> Whether this could be considered an error or not is a matter of
	> philosophy. Given any finite description of a problem we must make
	> certain assumptions in order to solve that problem. If we don't
	> make assumptions then: why assume that Myrtle is female rather than
	> a male with a weird name? Why assume that Myrtle is a member of the
	> family at all? Why assume that the probability of a girl vs a boy
	> is reasonably approximated by 1/2 -- maybe this takes place in a small
	> rural Chinese village where girl-babies are frequently killed at birth?
	> Why assume that the informant is telling the truth -- maybe it's really
	> a family of 12? Maybe the informant is speaking in code?
	>
	> Word problems differ from first hand real world problems in one
	> important way. In a first hand real-world problem we would deal with
	> these issues probabilistically, and, if we feel that alternatives have
	> slightly too high a probability, then we could conditionalize our answer
	> by our assumptions. In a word problem there is a convention that
	> routine assumptions which are required for the solution of the problem
	> are correct unless contradicted -- directly or subtly (in which case it
	> is a "trick" question) -- by information provided. This is doubly true
	> in problems provided for expository purpose rather than for puzzles.

	I agree with your statements about having to make some assumptions. However, I often have trouble determining what the 'routine assumptions' are.

	I have noticed an interesting (and annoying) psychological phenomenon. As I've grown older and more experienced(?), I find I have to take more time solving word problems. At first, I wondered whether I might be dimming intellectually (still a possibility :-), but have since come to the conclusion that the difficulty principally arises from being able to produce more problem interpretations and self-consistent assumption sets than I would have been able to in the past. I now more frequently conclude that problem statements are 'underspecified'.

	Some related comments on proofs:

	When reviewing proofs, I find that I have become highly sensitive to unstated assumptions, and frequently find that the proofs are not as convincing as they used to be.

	When I try proving things myself, I frequently wind up with a set of relatively low level subproofs that can prove maddeningly difficult to crack, because they consist of things I previously thought of as 'immediately obvious to the most casual observer', but now feel the need to justify. I got a good dose of this in independently working on MATH 313 (the integer rectangle problem).

	(Does anyone else experience this ??)
1291.12	You look good in a uniform - distribution	CIVAGE::LYNN	Lynn Yarbrough @WNP DTN 427-5663	`Thu Sep 06 1990 16:26`	52
	> The assumption in .0 comes from the following presumed selection procedure.
	> Given a population of families of four (two children), for the first
	> subproblem:
	>     1) A random family is sampled.
	>     2) If both children are boys repeat 1.
	>     3) If only one of the children is a girl, ask her name and announce
	>        it. If both children are girls, choose one of them at random
	>        and announce her name.
	> That is, we select a random family in which at least one of the
	> children is a girl, and we provide that (or one of the) girl's name.

	While that is a random way of choosing a girl for the 'experiment', it does not appear to me to be based on a uniform distribution. Assuming 'Myrtle' is available, in the cases BG and GB she is 100% likely to be the girl of interest, but in the GG case only 50%; the process is biased toward choosing BG siblings. If we select one child from a pair from a uniform distribution and find that it is a girl, the odds are 2-1 that the choice was from GG as opposed to either BG combination; given that a girl was selected the complete set of probabilities is:

	    BB    0
	    BG    .25
	    GB    .25
	    GG    .50

	and again in half the cases the sibling is a boy, in half a girl.

	To my mind a better algorithm to model the situation is:

	    Choose a pair p randomly from <00,01,10,11>.
	    Choose a member m from <0,1>.
	    Examine the mth bit of p.
	    If it's a 1 (boy), reject the case; that can't be Myrtle.
	    If it's a 0, it's Myrtle, so increment the count of girls if the
	    other bit is 0, else increment the count of boys.

	If we restate the problem in purely non-sexist terms it becomes a bit clearer, I think. (My name is unisex, so I will use it in what follows.)

	Consider a random pair of siblings. One of them happens to be named Lynn. What are the chances that Lynn's sibling is of the opposite sex?

	Under the assumption that the sexes are equally common and uniformly distributed, it is very hard to come up with any answer other than 50-50. To model this case, the algorithm would be:

	    Choose a pair p randomly from <00,01,10,11>.
	    If the bits are the same, increment the count of like siblings,
	    else increment the count of different siblings.

	Even if males and females are not equally common, the answer is very close to 50-50. Assume that boys occur 40% of the time, girls 60%. Then BB+GG will occur (.4*.4+.6*.6)=.52 of the time, and (BG+GB) will occur (.4*.6+.6*.4)=.48. The sum of the (BB+GG) frequencies tends to be the same as the sum of the (BG+GB) frequencies.
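
The algorithm in .12 is small enough to enumerate exhaustively rather than sample. The sketch below follows the steps as stated there (bit 0 = girl, bit 1 = boy); the function names and the use of exact fractions are choices made here, not part of the note.

```python
from fractions import Fraction
from itertools import product


def lynn_counts():
    """Enumerate every (pair, member) choice from the algorithm in .12."""
    boys = girls = 0
    for pair, m in product(product((0, 1), repeat=2), (0, 1)):
        if pair[m] == 1:
            continue            # the selected child is a boy: cannot be Myrtle
        if pair[1 - m] == 0:
            girls += 1          # Myrtle's sibling is a girl
        else:
            boys += 1           # Myrtle's sibling is a boy
    return boys, girls


def same_sex_frequency(p_boy):
    """P(BB or GG) when each child is independently a boy with probability p_boy."""
    return p_boy * p_boy + (1 - p_boy) * (1 - p_boy)


boys, girls = lynn_counts()
print(boys, girls)                          # 2 2
print(same_sex_frequency(Fraction(2, 5)))   # 13/25
```

Selecting a child uniformly gives two brother cases and two sister cases, i.e. 50-50, and the 40/60 example gives 13/25 = .52 for like-sexed pairs, as computed in the note.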
1291.13	Uniform distributions -- multiform problems.	CADSYS::COOPER	Topher Cooper	`Thu Sep 06 1990 17:49`	38
	RE: .12 (Lynn)

	Lynn, you are absolutely correct, but you are solving a different problem than the one stated -- unless you stretch the assumptions about interpretation quite far.

	If you uniformly select a girl (or a girl named Myrtle) and ask what the a posteriori probability is that (assuming the girl has a single sibling) Myrtle's family is a gg family, then the answer is, without question, 1/2. Why? Because gg families are twice as likely to be sampled as bg and gb families -- the families are not being sampled uniformly.

	But the problem statement is:

	    Consider now some randomly selected family of four.

	Using the "default" definition of "randomly selected" (i.e., uniformly sampled) this implies strongly that the family is selected uniformly rather than the girl.

	I do not consider this a legitimate interpretation of the problem: the propounder was not playing fair if this was the intention. I would accept as perhaps a legitimate difference in "default reasoning" a different interpretation of how the actor came to announce that the family has a child named Myrtle, but not shifting the "random selection" to uniform selection of a child. For this to be a reasonable statement, we would have had to have a statement to the effect:

	    Consider now a randomly selected child with a single sibling.

	The analysis in .0 is an accurate analysis of a reasonable interpretation of the problem statement (and I think the most reasonable interpretation). The analysis in .12 is an accurate analysis of another problem or of a rather unreasonable interpretation of the same problem -- by no stretch of the imagination can it be taken as THE uniquely correct interpretation/analysis of the problem stated in .0.

	Topher
1291.14	What? There is still something left to say?	CSSE::NEILSEN	I used to be PULSAR::WALLY	`Fri Sep 28 1990 12:48`	58
	I've been puzzling over the example in .0 for six months, off and on. The one thing I am sure of is that it is not a good example of mathematical illiteracy, since literate people can give different answers.

	My thanks to Lynn and Topher, because their comments have helped me to clarify my thinking greatly. What follows is to some extent just a clarification of their ideas, although I reach a different conclusion.

	Look carefully at the first two sentences in Paulos' statement of the problem.

	>"Consider now some randomly selected family of four. Given that Myrtle has
	>a sibling, what is the conditional probability that her sibling is a
	>brother?

	Ignoring the many enticing bypaths suggested by Dan in .4, it is clear that there is something missing between these two sentences. In a randomly selected family of four, the probability is very small that there will be a child named Myrtle. There is only a 75% probability that there will be a girl child at all.

	The question is what is missing. Topher suggests that first sentence should be "Consider a randomly selected family of four with at least one girl child", and Paulos seems to have had something like this in mind, judging by his analysis. A sentence must also be inserted, something like "Announce the name of a girl child, chosen randomly from the set of girl children in the family."

	There is another quite reasonable insertion. Add the second sentence: "Select a child at random and announce its name." This is close to, but I think not identical to, Lynn's suggestion in .12. This gives 50% as the answer to both questions, as the following table shows:

	    Case   Pair of    Child      Sex of           Sex of
	    #      Children   Selected   Selected Child   Other Child
	    1      BB         1          B                B
	    2      BB         2          B                B
	    3      BG         1          B                G
	    4      BG         2          G                B
	    5      GB         1          G                B
	    6      GB         2          B                G
	    7      GG         1          G                G
	    8      GG         2          G                G

	All cases are equally probable, assuming that the selection is random and independent. Cases 4,5,7 & 8 apply to Paulos' first question; there are two B and two G cases for the other child. Cases 4 & 8 apply to Paulos' second question; there is a B case and a G case. So the answer is 50% in both cases.

	Both ways of interpreting the text seem to me acceptable, although there are a few reasons for preferring the second. The first interpretation requires changing the meaning of the first sentence. Under it the family of four is not randomly selected; it is selected according to an implicit condition. The second interpretation seems to make less of a change to the actual words of the problem.

	I guess this is another illustration of how hard it is to write unambiguously.
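
The eight-case table in .14 can be regenerated mechanically, which also makes it easy to check the 50% claim. This is a sketch with illustrative field names; since the table does not say which child number is the older one, the second question is evaluated for both positions.

```python
from fractions import Fraction
from itertools import product

# Rebuild the table: every pair of children crossed with the child selected.
rows = []
for pair, selected in product(["BB", "BG", "GB", "GG"], (1, 2)):
    rows.append({
        "case": len(rows) + 1,
        "pair": pair,
        "selected": selected,
        "sex_selected": pair[selected - 1],
        "sex_other": pair[2 - selected],
    })


def p_other_is_boy(cases):
    """Fraction of the given (equally likely) cases where the other child is a boy."""
    return Fraction(sum(1 for r in cases if r["sex_other"] == "B"), len(cases))


# First question: the selected child is a girl.
girl_selected = [r for r in rows if r["sex_selected"] == "G"]
# Second question: the selected girl occupies a given birth position.
girl_in_position = {
    position: [r for r in girl_selected if r["selected"] == position]
    for position in (1, 2)
}

print([r["case"] for r in girl_selected], p_other_is_boy(girl_selected))
for position, cases in girl_in_position.items():
    print(position, [r["case"] for r in cases], p_other_is_boy(cases))
```

Whichever position is taken to be the older child, the selected girl's sibling is a boy in exactly half of the remaining cases, so the 50% answer to both questions does not depend on that convention.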