[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference rusure::math

Title:	Mathematics at DEC

Moderator:	RUSURE::EDP

Created:	Mon Feb 03 1986
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	2083
Total number of notes:	14613

1487.0. "Probability problem" by ULTRA::ELLIS (David Ellis) Mon Aug 26 1991 15:32

Let K be a small finite set {1,...,k}, and let E be a large finite set of 
elements with a uniformly distributed mapping f: E -> K.  That is, if e is a 
randomly selected element of E and 1 <= j <= k, then the probability that 
f(e) = j is 1/k.

Consider the collection S of subsets P of E such that
card(P) = p and card(f(P)) = p.  The intent is that each P in S maps under f
into p distinct elements of K.

If we select a collection C consisting of n randomly chosen sets from S,
what is the probability that C covers K, i.e. card(f(C)) = k?

How many sets should we expect to have to select from S before our collection
covers K?

T.R	Title	User	Personal Name	Date	Lines
1487.1	Before we get started,...	CORREO::BELDIN_R	Pull us together, not apart	`Tue Aug 27 1991 07:35`	16
	>Consider the collection S of subsets P of E such that >card(P) = p and card(f(P)) = p. How would you go about generating S and its members? First, obviously p <= k, since card(f(P)) = p. But it seems that since card(P) = P, S has a very special character. It is certainly much smaller then P(E), the power set of E. I understand how to check that P is in S, but to make a random selection from S, I will need either an algorithm for generating successive elements of S or a list of its members. Maybe I'm missing something, but before I invest much time in your question, I need to understand the intermediate problems. Dick
1487.2	clarification (I hope)	ULTRA::ELLIS	David Ellis	`Tue Aug 27 1991 11:33`	12
	> How would you go about generating S and its members? I understand how > to check that P is in S, but to make a random selection from S, I will > need either an algorithm for generating successive elements of S or a > list of its members. The collection S is not meant to be accessible or constructible as a whole. What needs to be "generated" (I prefer the term "selected") are randomly chosen P that satisfy the conditions of being in S. This is intended to mean subsets of E containing p elements whose images under f are distinct but otherwise randomly distributed over K. Please see the next reply for some context behind this problem.
1487.3	context	ULTRA::ELLIS	David Ellis	`Tue Aug 27 1991 11:48`	16
	This problem is a generalization of something that might be viewed as an inverse to the Birthday Paradox. Let E be a large set of people and K be the set {1, ..., 365} of their birthdays (excluding leap year day). If we let p=1 for the moment, then S should be equivalent to E itself. Given n randomly chosen people, the probability that there is a duplication of birthdays rises with n, crossing the 50% level at about n=24. The question I am asking in this case is how much duplication is encountered before we exhaust the space? In other words, how many people should we expect to gather together before their birthdays cover the entire year? To generalize, consider sets P of p people with distinct birthdays. To some extent, this "quantizes" the distribution. Is there a significant effect of this quantization on the expected number of people needed to exhaust the space of birthdays?
1487.4		ALLVAX::JROTH	I know he moves along the piers	`Tue Aug 27 1991 15:17`	6
	That sounds somewhat like the coupon collectors problem... there are some notes in here somewhere about it (under the teabag proberbs note?) No, I don't know the distribution off the top of my head though. - Jim
1487.5	Approximate answer.	CADSYS::COOPER	Topher Cooper	`Tue Aug 27 1991 15:39`	26
	Well, here is a rough estimate: Let q be the proportion of the k "birthdays" not selected at each round. At each round an expected proportion q of the those birthdays which have not been "taken" at the beginning of the round are still not taken at the end of the round. Therefore the expected proportion of the k birthdays which are not taken after i rounds is q^i, and the expected number of birthdays which are not taken after i rounds is N*q^i. A rough estimate of the expected number of rounds before all the birthdays are taken can be gotten by finding the i such that the expected number of birthdays untaken after i rounds is less than 1. Some math tells us that the expected number of birthdays untaken after i rounds will be less than one when: i >= -ln(k)/ln(q) So the expected number of rounds before all the birthdays are taken is approximately: ceiling(-ln(k)/ln(q)) Topher
1487.6		CLT::TRACE::GILBERT	Ownership Obligates	`Tue Aug 27 1991 18:16`	10
	We can almost convert this problem to the birthday problem. The rephrased question from .0 is: What is the probability that p*n people completely cover K, the space of birthdays? However, this doesn't quite work, because the elements in a subset P cover K better than would be expected by p independent elements from E; e.g., we know that card(P) = card(f(P)) = p (so these p people never share a birthday).
1487.7		ALIEN::EDP	Always mount a scratch monkey.	`Wed Aug 28 1991 08:15`	8
	The disc-changer problem in topic 1144 bears some resemblance to this problem; they both have aspects of repeated selection without replacement and expected number of repetitions to cover a set. That one wasn't solved. This one may be easier; solving it may help the other. -- edp
1487.8		ULTRA::ELLIS	David Ellis	`Wed Aug 28 1991 10:58`	8
	Re: .4 > That sounds somewhat like the coupon collectors problem... there are > some notes in here somewhere about it (under the teabag proberbs note?) No topics in this conference with titles containing "coup", "teabag" or "proverb". There are a couple of topics matching "collect", but they seem to be unrelated to this problem.
1487.9	psst -- it's note 831	VMSDEV::HALLYB	The Smart Money was on Goliath	`Wed Aug 28 1991 14:01`	7
	> No topics in this conference with titles containing "coup", "teabag" or > "proverb". There are a couple of topics matching "collect", but they seem > to be unrelated to this problem. Try "tea bag" (2 words). John
1487.10		CLT::TRACE::GILBERT	Ownership Obligates	`Thu Aug 29 1991 20:05`	7
	I'm making some progress on this. Here's a subproblem, ... Given p distict values 'uniformly' distributed in the range 1..k, what is the probability that exactly h of them are in the range 1..r? Where 0 <= h <= p <= k, and 0 <= h <= r <= k.
1487.11		ZFC::deramo	a pleasant distraction	`Fri Aug 30 1991 11:46`	13
	>Given p distict values 'uniformly' distributed in the range 1..k, >what is the probability that exactly h of them are in the range 1..r? > >Where 0 <= h <= p <= k, and > 0 <= h <= r <= k. Isn't this just a counting problem? Without replacement, h values can be selected from 1..r in C(h,r) ways. That leaves p-h values which are selected from r+1...k in C(p-h,k-r) ways. The total count of ways to combine these is the product, and the probability of one of these is that count divided by C(p,k). Dan
1487.12	A Pascal program for the problem in .0	CLT::TRACE::GILBERT	Ownership Obligates	`Tue Sep 03 1991 15:59`	89
	program x1487 (input, output); const nmax = 1000; kmax = 40; type reel = quadruple; var comb : array[0..kmax,0..kmax] of reel; fact : array[0..kmax] of reel; function flt(b: boolean): reel; begin if b then flt := 1 else flt := 0; end; function prob_1(p, k, h, r: integer): reel; { } { Given p distinct values in the range 1..k, } { return the probability that h of them are in the range 1..r. } { } { Given: } { 0 <= h <= p <= k } { 0 <= h <= r <= k } { } begin prob_1 := comb[h,r] * comb[p-h,k-r] / comb[p,k]; end; procedure init; var i,j: integer; begin fact[0] := 1; for i := 1 to kmax do fact[i] := fact[i-1] * i; for i := 0 to kmax do for j := i to kmax do comb[i,j] := fact[j]/(fact[i]fact[j-i]); end; procedure main; var h, k, m, n, p, r: integer; curr_n, prev_n: boolean; sum: reel; pr: array[boolean,0..kmax] of reel; { Prob that n sets of p cover k values } label 99; begin write('Enter p and k: '); readln(p, k); if p <= k then else goto 99; if 0 <= p then else goto 99; if k <= kmax then else goto 99; n := 0; curr_n := false; prev_n := not curr_n; for r := 1 to k do pr[curr_n,r] := 0; pr[curr_n,0] := 1; for n := 1 to nmax do begin prev_n := curr_n; { prev_n is old value of curr_n } curr_n := not curr_n; { Advance to next n } pr[curr_n,0] := 1; for r := 1 to k do { r of them remain to be covered } begin sum := 0; for h := 0 to min(p,r) do sum := sum + prob_1(p, k, h, r) pr[prev_n,r-h]; pr[curr_n,r] := sum; end; writeln('pr[', n:0, ',', k:0, '] = ', pr[curr_n,k]); if pr[curr_n,k] > 0.999999 then goto 99; end; 99: end; begin init; while true do main; end.
1487.13		ZFC::deramo	echoes down the canyon	`Tue Sep 03 1991 17:10`	3
	But there's no Monte Carlo random number generator in .-1. :-) Dan
1487.14		CLT::4GL::GILBERT	Ownership Obligates	`Wed Sep 04 1991 15:33`	5
	.0> If we select a collection C consisting of n randomly chosen sets from S, .0> what is the probability that C covers K, i.e. card(f(C)) = k? Given p and k, the program displays the above probability as pr[n,k], for n >= 0 until pr[n,k] >= 0.999999.