
Conference rusure::math

Title:	Mathematics at DEC

Moderator:	RUSURE::EDP

Created:	Mon Feb 03 1986
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	2083
Total number of notes:	14613

1009.0. "anonymous sampling" by HERON::BUCHANAN (Andrew @vbo/dtn8285805/ARES,HERON) Fri Jan 06 1989 09:00

SCENE:	McDonalds (yes, even in Antibes, France)
TIME:   22:00 5 Jan

Me:  [ranting about doubling cube in backgammon].   [pause for breath]
Steve:	Well, I was never any good at statistics.
Me:  No? [munch on burger]
Steve:  But I remember my first statistics class:  the teacher showed us all
how statistics could be useful.   The guy asked the entire class some 
mildly incriminating yes/no question, like "have you ever smoked a funny
cigarette?" and of course people don't want to answer aloud.
	But he explained that if everyone tossed a coin secretly, and then
told the truth if it was heads, and lied if it was tails, then he could
work out the percentage that *had* Indulged, without any *individual's*
history becoming public knowledge.
	Because all the lies cancel out you see.
Me: [munch]
Steve: [munch]
Me:  Hang on, there's something not quite right here.   Whatever the
audience, with your coin scheme, you'd expect half of them to say "YES".
So how can you tell anything at all?
Steve: Well it was something to do with a coin.
Me: Now if you had a dice, and on a 5 or 6 you lie, otherwise you tell
the truth, then you're going to be able to get an estimate.   But now the
guy is revealing something about himself by his remark.
Steve:  No it definitely wasn't a dice.
Me:  Can I steal one of your french-fries?
Steve:  Did he divide the room in half, or something?
Me:  I need more ketchup.
Steve:  Gee, for ten years I'd thought that that was how statistics worked.
Me:  These French fries are pretty soggy.
Steve:  Well, I said I was no good at statistics.

	Any ideas, anyone?

Andrew.

T.R	Title	User	Personal Name	Date	Lines
1009.1	some variants work	NIZIAK::YARBROUGH		`Fri Jan 06 1989 10:26`	8
	If the coins are unbiased you can't get any information out of a single flip and Head=true, Tail=false; you will tend to get exactly 50% yes-no responses. If each sample is based on truth=TWO heads in 2 throws, false otherwise, you can begin to get something out. Alternatively, if only the guilty lie when their coin is tails you can get a significant result: the number of guilty will tend to twice the number of 'YES' responses.
1009.2	Slight misunderstanding, I suspect.	RDVAX::COOPER	Topher Cooper	`Fri Jan 06 1989 11:41`	22
	This sounds like a minor misunderstanding of a previously obscure statistical survey technique which has in the last few years received a lot of attention because it has been used in surveying AIDS victims (and non-victims for control purposes).

	The technique is as your friend Steve described it, but instead of lying if tails came up, the "subject" is instructed to then always give the "incriminating" answer (in this case, "Yes, I have smoked a 'funny' cigarette."). The surveyor cannot tell if any incriminating answer is true or not, and what's more, unless the incidence of incriminating behavior is near 100%, it is much more likely that a specific incriminating response is due to the coin flip than to sanctioned behavior.

	If you survey 100 people and 72 of them give the incriminating answer, then about 50 of those answers were forced by the coin, and your estimate for the population proportion is (72-50)/50 = 44%. The cost is that you have to sample 100 people to get the same accuracy you would get from 50 people in a straightforward survey. The benefit is that your 50 "effective people" are likely to be much more honest.

	Topher
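
The arithmetic in .2 can be checked with a short sketch. This is only an illustration of the forced-answer scheme as described there, assuming a fair coin; the function names `estimate_forced_yes` and `simulate_forced_yes`, the seed, and the simulated incidence of 30% are made up for the example and do not come from the thread.

```python
import random


def estimate_forced_yes(yes_count, n, p_forced=0.5):
    """Estimate the true incidence from forced-answer survey counts.

    Model: P(yes) = p_forced + (1 - p_forced) * p, solved here for p.
    """
    observed = yes_count / n
    return (observed - p_forced) / (1 - p_forced)


def simulate_forced_yes(true_p, n, rng):
    """Count 'yes' answers from n simulated subjects using a fair coin."""
    yes = 0
    for _ in range(n):
        if rng.random() < 0.5:
            yes += 1          # tails: forced incriminating answer
        elif rng.random() < true_p:
            yes += 1          # heads: truthful 'yes'
    return yes


if __name__ == "__main__":
    # Figures from the note: 72 'yes' answers out of 100 subjects.
    print(round(estimate_forced_yes(72, 100), 2))
    # Hypothetical population with 30% true incidence (assumed value).
    rng = random.Random(1989)
    yes = simulate_forced_yes(0.30, 100_000, rng)
    print(round(estimate_forced_yes(yes, 100_000), 3))
```

With small samples the estimate can fall below 0 or above 1, since the observed count of forced answers will rarely be exactly half; the sketch does not clip it.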
1009.3	I think that these are blind herrings	HERON::BUCHANAN	Andrew @vbo/dtn8285805/ARES,HERON	`Fri Jan 06 1989 11:51`	50
	> Alternatively, if only the guilty lie when their coin is tails you
	> can get a significant result: the number of guilty will tend to
	> twice the number of 'YES' responses.

	But then you know that any individual who says 'yes' must be guilty, contrary to the principle of protecting any individual's privacy.

	> If the coins are unbiased you can't get any information out of a
	> single flip and Head=true, Tail=false; you will tend to get exactly
	> 50% yes-no responses. If each sample is based on truth=TWO heads
	> in 2 throws, false otherwise, you can begin to get something out.

	This is a compromise. It breaks the symmetry which is very important, but still privacy is not protected here. If someone says 'yes', then what do we conclude?

	                             P(yes|guilty)P(guilty)
	P(guilty|yes) = -----------------------------------------------------
	                 P(yes|guilty)P(guilty) + P(yes|innocent)P(innocent)

	By Bayes' Theorem. Assuming that P(guilty) = p, then this expression equals:

	      p/4
	--------------- = p/(3-2p)
	p/4 + 3(1-p)/4

	while

	P(guilty|no) = 3p/(1+2p)

	See that (assuming p =/= 0 or 1, which is reasonable):

	P(guilty|no) >= P(guilty|yes).

	Which isn't keeping privacy. The guy has revealed something about his probable history by what he said. What we want is P(guilty|no) = P(guilty|yes), but to still have some estimate of p which is consistent (i.e. as n increases, our estimate of p converges on the real p).

	So instead of a fixed fraction, let's say we tell the truth with probability q. Then we want, for all p in (0,1):

	pq / ( pq + (1-p)(1-q) ) = p(1-q) / ( p(1-q) + (1-p)q )

	=> q^2 = (1-q)^2
	=> q = 1/2.

	Which is exactly what robs us of our consistency.

	So, we need something a little more subtle than *just* tossing a coin. Any ideas?
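
The Bayes calculation in .3 can be reproduced numerically. The sketch below assumes the same model as the note (each person tells the truth with probability q and lies otherwise, with 0 < p < 1 and 0 < q < 1); the name `posterior_guilty` and the sample value p = 0.3 are invented for illustration.

```python
def posterior_guilty(p, q, answer_yes):
    """P(guilty | answer) when each person tells the truth with probability q.

    p is the prior P(guilty); requires 0 < p < 1 and 0 < q < 1.
    """
    if answer_yes:
        num = p * q                      # guilty and truthful
        den = num + (1 - p) * (1 - q)    # plus innocent and lying
    else:
        num = p * (1 - q)                # guilty and lying
        den = num + (1 - p) * q          # plus innocent and truthful
    return num / den


if __name__ == "__main__":
    p = 0.3  # assumed prior, for illustration only
    # Truth only on two heads out of two throws: q = 1/4.
    print(posterior_guilty(p, 0.25, True), p / (3 - 2 * p))
    print(posterior_guilty(p, 0.25, False), 3 * p / (1 + 2 * p))
    # Fair coin: both posteriors collapse to the prior p.
    print(posterior_guilty(p, 0.5, True), posterior_guilty(p, 0.5, False))
```

For q = 1/4 the two posteriors differ, so an answer leaks something about the individual; for q = 1/2 both equal the prior, which is the case in which the aggregate carries no information either.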
1009.4	Where does the apostrophe go in "Bayes Theorem"?	AITG::DERAMO	Daniel V. {AITG,ZFC}:: D'Eramo	`Fri Jan 06 1989 11:55`	31
	Let p be the probability that the true answer is yes. Suppose everyone flips a coin and tells the truth on heads, and lies on tails. Suppose the probability of heads is q. Then the probability of a yes answer is:

		pq + (1-p)(1-q) = pq + 1 - p - q + pq = 1 - (p + q) + 2pq

	The probability of a no answer is:

		(1-p)q + p(1-q) = q - pq + p - pq = (p + q) - 2pq

	Do these add to one? Yes. :-)

	For a fair coin q = 1/2, and the probability of a yes answer becomes

		1 - (p + q) + 2pq = 1/2 - p + p = 1/2.

	So using q=1/2 gives no information about p.

	Suppose however one uses dice, and say, q = 1/3. Then the probability of a yes answer is now

		1 - (p + q) + 2pq = 2/3 - p + (2/3)p = 2/3 - p/3

	or (2 - p)/3. Thus one can now get some information from the proportion of yes answers.

	However, I bet that using Bayes Theorem will show that in either case (i.e., q = 1/2 or q not= 1/2) an individual's answer does reveal information about the individual (unless p = 1/2). More later.

	Dan
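
The formula in .4 can be inverted to recover p from the observed fraction of yes answers, provided q is not 1/2. A minimal sketch under that model follows; the function names `yes_probability` and `estimate_p` and the sample value p = 0.3 are hypothetical.

```python
def yes_probability(p, q):
    """P(yes) when the truth is told with probability q, a lie otherwise."""
    return p * q + (1 - p) * (1 - q)


def estimate_p(yes_fraction, q):
    """Invert P(yes) = (1 - q) - p * (1 - 2q) to recover p."""
    if abs(1 - 2 * q) < 1e-12:
        raise ValueError("q = 1/2 carries no information about p")
    return ((1 - q) - yes_fraction) / (1 - 2 * q)


if __name__ == "__main__":
    p, q = 0.3, 1 / 3                    # assumed values for illustration
    y = yes_probability(p, q)
    print(y, (2 - p) / 3)                # the two should agree
    print(estimate_p(y, q))              # recovers p up to rounding
    print(yes_probability(p, 0.5))       # 1/2 whatever p is
```

The closer q is to 1/2, the smaller the divisor 1 - 2q, so sampling noise in the observed fraction is magnified more in the estimate of p.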
1009.5	oops	AITG::DERAMO	Daniel V. {AITG,ZFC}:: D'Eramo	`Fri Jan 06 1989 12:00`	4
	.2 and .3 came in while I was replying; .3 already contains the "follow up" and the answer to the title of .4. Dan
1009.6	.2 & .3 are malordered	HERON::BUCHANAN	Andrew @vbo/dtn8285805/ARES,HERON	`Fri Jan 06 1989 12:44`	45
	> The technique is as your friend Steve described it but instead of
	> lying if tails came up, the "subject" is instructed to then always
	> give the "incriminating" answer (in this case, "Yes, I have smoked
	> a 'funny' cigarette."). The surveyor cannot tell if any incriminating
	> answer is true or not, and what's more, unless the incidence of
	> incriminating behavior is near 100%, it is much more likely that
	> a specific incriminating response is due to the coin flip than to
	> sanctioned behavior.

	Yes, this has to be a valuable technique in practice. But still the guy who says 'yes' may be a smoker, whilst the guy who says 'no' cannot be. With the figures you used above, 44 out of 72 are smokers. If one is a libertarian or paranoid person, one could imagine that this would enable a government to 'home in' on a particular subset.

	The question is: does there exist a technique where we can extract general information, without any loss of privacy for the individual?

	I had an idea... It's a slightly flippant idea, but it might be that it has a serious application, in some different domain.

	(1) Divide the individuals into two classes, A & B.
	(2) Explain the question to those in class A, and ask them to reply. Each can lie or tell the truth, as they please.
	(3) Ask those in class B to toss a coin each. If heads, goto (4); if tails, goto (5).
	(4) Ask that person to tell the truth.
	(5) Ask that person to lie or tell the truth, as they please.

	Suppose that x of class A say "Yes" and y of class B. Then how about 2y-x as an estimate of the total number of smokers.

	This assumes that the members of class A and the members of class B would behave the same if asked to say yes or no, as they please. There may be a little care in experimental design required to ensure that the members of class A are in exactly the same state as class B. E.g. get *everyone* to toss a coin, and open one of two envelopes on that basis (both envelopes contain the same message for class A) then make a decision.

	Is this valid?
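
One way to read the 2y-x proposal in .6: if people answering "as they please" say yes at some unknown rate r, class A says yes at rate r and class B at rate p/2 + r/2, so twice the class B rate minus the class A rate gives p. With equal class sizes this is 2y-x smokers per class. The sketch below works in rates so that the classes need not be the same size; that generalisation, the name `estimate_two_class`, and the example figures are assumptions added here, not part of the note.

```python
def estimate_two_class(x_yes_a, size_a, y_yes_b, size_b):
    """Estimate the true incidence p from the two-class scheme.

    Class A answers freely:               rate_a = r
    Class B is half truthful, half free:  rate_b = p/2 + r/2
    Hence p = 2 * rate_b - rate_a.
    """
    rate_a = x_yes_a / size_a
    rate_b = y_yes_b / size_b
    return 2 * rate_b - rate_a


if __name__ == "__main__":
    # Made-up figures: p = 0.4, free answers come out 'yes' 20% of the time.
    # Class A (100 people): 20 say yes.
    # Class B (100 people): 50 truthful -> 20 yes, 50 free -> 10 yes.
    print(round(estimate_two_class(20, 100, 30, 100), 3))
```

The estimate depends on both classes sharing the same rate r, which is the experimental-design caveat raised in the note.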
1009.7	Not completely	RDVAX::COOPER	Topher Cooper	`Fri Jan 06 1989 15:58`	66
	RE: .6 (Andrew)

	> ... does there exist a technique where we can extract general
	> information, without any loss of privacy for the individual?

	In a word: no. General information about any sampled group about sensitive subjects can be used to stigmatize members of that group. If we discover that 80% (to make up a figure) of AIDS patients engage in socially unacceptable behavior, then we can conclude that any particular AIDS patient (whether or not they participated in the survey) probably engages in the unacceptable behavior. And even if the survey results cannot be generalized, then it can still be used to stigmatize the individuals who participated in the survey (if 80% of the people who participated in the survey beat their spouses, then the survey can be used to label the people who took part as spouse-beaters).

	However, this does not rule out decreasing or eliminating the specifically personal risk of someone in the "tell the truth" group answering honestly with a truthfully "stigmatizable" response, or, for that matter, the risk to someone in the "always answer stigmatizable" group being lumped in (statistically) with the truthfully stigmatizable group.

	The method I described can be adjusted quite simply to reduce the risk to the individual to any desired degree. Simply increase the relative size of the "always answer stigmatizable" group to the desired level. If the instructions are to answer truthfully only if two coin flips both come up heads, then a stigmatizable answer is even less likely to indicate stigmatizable behavior. The cost is, of course, that larger and larger groups are needed for the same level of accuracy.

	The method assumes that one response is stigmatizable while the other would always be considered safe. The method will not work if either response might be stigmatizable. In that case the population should be divided by the initial coin-toss (more likely: die roll) into three groups: always answer A, always answer B and tell the truth. The first two groups would be ideally equally proportioned, or, more sophisticatedly, proportioned according to the relative risk of the two answers (the original method is a specialization of this sophisticated proportioning). Note that if there is no risk associated with one of the answers then the 50% proportioning provides no additional protection to the honestly stigmatizable, but increases the risk of stigmatization to the honestly non-stigmatizable group.

	A variant of this new method would be to use one coin flip to determine whether someone is in the "random answer" or "honest answer" groups, then a second coin flip to determine in the former case what the random answer should be.

	Unless I have missed something, this is essentially the method you have proposed, except the second coin flip is replaced with the subject's impulse as a randomizer, and group A has been added to estimate the characteristics of that randomizer. I see no benefit to this, since it requires a much larger sample (to include group A), is less reliable (since our estimate of the proportions of each random answer is subject to sampling variation), and may deviate from the ideal proportions (to see this note that if all "random responders" are moved to make the same response then the method is the same as the original, except we have added group A).

	Topher
1009.8		KOBAL::GILBERT	Ownership Obligates	`Sun Jan 08 1989 23:42`	12
	Suppose we spin a roulette wheel, and use the table:

		Black -> Answer "Yes"
		Red   -> Answer "No"
		00    -> Tell the truth

	Then with a large enough sample, we should get a significant result, and knowing an individual's answer doesn't give enough information to stigmatize him.

	P.S. We could just use secret ballots. :^)
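
The roulette scheme in .8 is an instance of the three-group design from .7: a known fraction is forced to say yes, a known fraction is forced to say no, and the rest tell the truth. A rough sketch of the resulting estimator follows. The note does not say how the wheel is laid out, so the 38-pocket wheel with 18 black, 18 red and 2 green pockets, with both green pockets treated as "tell the truth", is purely an assumption for this example, as is the name `estimate_three_group`.

```python
from fractions import Fraction


def estimate_three_group(yes_fraction, p_forced_yes, p_truth):
    """Invert P(yes) = p_forced_yes + p_truth * p to recover p."""
    return (yes_fraction - p_forced_yes) / p_truth


if __name__ == "__main__":
    # Assumed wheel: 18 black (forced yes), 18 red (forced no), 2 green (truth).
    forced_yes = Fraction(18, 38)
    truth = Fraction(2, 38)
    true_p = Fraction(3, 10)                    # hypothetical incidence
    expected_yes = forced_yes + truth * true_p  # long-run fraction of 'yes'
    print(estimate_three_group(expected_yes, forced_yes, truth))
```

Because only about 1 answer in 19 is truthful under this assumed layout, the division by `p_truth` magnifies sampling error roughly 19-fold, which is why the note asks for "a large enough sample".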
1009.9	I guess they could wear a mask :-)	RDVAX::COOPER	Topher Cooper	`Mon Jan 09 1989 15:21`	18
	RE: .8

	An excellent example of a device such as I was trying to describe (I probably should have included a concrete example such as this to clarify what I was saying. Thanks).

	> P.S. We could just use secret ballots. :^)

	Despite the smiley face it may be worthwhile mentioning the context which makes this technique a useful one. Written questionnaires tend to be biased in response and accuracy with respect to people who are partially or wholly illiterate in English. A verbal interview allows the interviewer to assess "interactively" that the interviewee understands what is being asked and to take corrective action if not. Complete anonymity then rests on trust of the interviewer and hence the problem.

	Topher
1009.10	there are three privacy concerns here	PULSAR::WALLY	Wally Neilsen-Steinhardt	`Wed Jan 18 1989 13:51`	32
	Note that .3 and .7 are raising privacy concerns that the test method, correctly described in .2, was not intended to address.

	The single concern motivating the test method as described was: suppose that I as a subject give an incriminating answer. Could this be traced back to me as an individual and used to incriminate me? The method described in .2 removes this concern, since there is no proof that the incriminating answer is true.

	.3 raises a second concern: that an incriminating answer may raise the subjective probability that the incriminating answer is true for the individual. As discussed elsewhere, particularly in .3 and .4, this change to subjective probability can be minimized but not eliminated. I personally would argue that this second concern is less significant than the first, since this subjective probability is never admissible as evidence in a criminal case and seldom in a civil case. But other people have other standards of privacy.

	.7 raises a third concern: that survey results may be used to stigmatize a particular group. However, the connection between group characteristics is often what is being sought by the survey. This is a conflict in goals which no test design can eliminate.

	For (a controversial) example: suppose a public health agency wants to test the hypothesis that promiscuous homosexuals have an increased risk of having AIDS. The agency says it needs the information to design a prevention campaign. An advocacy group says it will be used to inflame public opinion against promiscuous homosexuals. Any test design which satisfies the agency will be objectionable to the group, and vice versa. The real issue here is the relative merits of the two arguments, not the test design. There is no general answer here, since for most of us, changing the details of the situation will change the side we favor.