[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference rusure::math

Title:Mathematics at DEC
Moderator:RUSURE::EDP
Created:Mon Feb 03 1986
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2083
Total number of notes:14613

1487.0. "Probability problem" by ULTRA::ELLIS (David Ellis) Mon Aug 26 1991 16:32

Let K be a small finite set {1,...,k}, and let E be a large finite set of 
elements with a uniformly distributed mapping f: E -> K.  That is, if e is a 
randomly selected element of E and 1 <= j <= k, then the probability that 
f(e) = j is 1/k.

Consider the collection S of subsets P of E such that
card(P) = p and card(f(P)) = p.  The intent is that each P in S maps under f
into p distinct elements of K.

If we select a collection C consisting of n randomly chosen sets from S,
what is the probability that C covers K, i.e. card(f(C)) = k?

How many sets should we expect to have to select from S before our collection
covers K?
T.RTitleUserPersonal
Name
DateLines
1487.1Before we get started,...CORREO::BELDIN_RPull us together, not apartTue Aug 27 1991 08:3516
>Consider the collection S of subsets P of E such that
>card(P) = p and card(f(P)) = p.  
    
    How would you go about generating S and its members?  First, obviously
    p <= k, since card(f(P)) = p.  But it seems that since card(P) = P, S
    has a very special character.  It is certainly much smaller then P(E),
    the power set of E.  I understand how to check that P is in S, but to
    make a random selection from S, I will need either an algorithm for
    generating successive elements of S or a list of its members.
    
    Maybe I'm missing something, but before I invest much time in your
    question, I need to understand the intermediate problems.
    
    Dick

    
1487.2clarification (I hope)ULTRA::ELLISDavid EllisTue Aug 27 1991 12:3312
>    How would you go about generating S and its members?  I understand how
>    to check that P is in S, but to make a random selection from S, I will
>    need either an algorithm for generating successive elements of S or a
>    list of its members.

The collection S is not meant to be accessible or constructible as a whole.
What needs to be "generated" (I prefer the term "selected") are randomly chosen
P that satisfy the conditions of being in S.  This is intended to mean subsets
of E containing p elements whose images under f are distinct but otherwise
randomly distributed over K.

Please see the next reply for some context behind this problem.
1487.3contextULTRA::ELLISDavid EllisTue Aug 27 1991 12:4816
This problem is a generalization of something that might be viewed as
an inverse to the Birthday Paradox.

Let E be a large set of people and K be the set {1, ..., 365} of their 
birthdays (excluding leap year day).  If we let p=1 for the moment, then
S should be equivalent to E itself.  Given n randomly chosen people, the
probability that there is a duplication of birthdays rises with n, crossing
the 50% level at about n=24.  The question I am asking in this case is
how much duplication is encountered before we exhaust the space?
In other words, how many people should we expect to gather together before 
their birthdays cover the entire year?

To generalize, consider sets P of p people with distinct birthdays.  To
some extent, this "quantizes" the distribution.  Is there a significant 
effect of this quantization on the expected number of people needed to exhaust
the space of birthdays?
1487.4ALLVAX::JROTHI know he moves along the piersTue Aug 27 1991 16:176
    That sounds somewhat like the coupon collectors problem... there are
    some notes in here somewhere about it (under the teabag proberbs note?)

    No, I don't know the distribution off the top of my head though.

    - Jim
1487.5Approximate answer.CADSYS::COOPERTopher CooperTue Aug 27 1991 16:3926
    Well, here is a rough estimate:

    Let q be the proportion of the k "birthdays" not selected at each
    round.  At each round an expected proportion q of the those birthdays
    which have not been "taken" at the beginning of the round are *still*
    not taken at the end of the round.

    Therefore the expected proportion of the k birthdays which are not
    taken after i rounds is q^i, and the expected number of birthdays which
    are not taken after i rounds is N*q^i.

    A rough estimate of the expected number of rounds before all the
    birthdays are taken can be gotten by finding the i such that the
    expected number of birthdays untaken after i rounds is less than 1.

    Some math tells us that the expected number of birthdays untaken after
    i rounds will be less than one when:

	i >= -ln(k)/ln(q)

    So the expected number of rounds before all the birthdays are taken is
    approximately:

	ceiling(-ln(k)/ln(q))

					Topher
1487.6CLT::TRACE::GILBERTOwnership ObligatesTue Aug 27 1991 19:1610
We can *almost* convert this problem to the birthday problem.  The rephrased
question from .0 is:

	What is the probability that p*n people completely cover K,
	the space of birthdays?
	
However, this doesn't quite work, because the elements in a subset P
cover K better than would be expected by p independent elements from E;
e.g., we know that card(P) = card(f(P)) = p (so these p people never
share a birthday).
1487.7ALIEN::EDPAlways mount a scratch monkey.Wed Aug 28 1991 09:158
    The disc-changer problem in topic 1144 bears some resemblance to this
    problem; they both have aspects of repeated selection without
    replacement and expected number of repetitions to cover a set.  That
    one wasn't solved.  This one may be easier; solving it may help the
    other.
    
    
    				-- edp
1487.8ULTRA::ELLISDavid EllisWed Aug 28 1991 11:588
Re: .4 

>    That sounds somewhat like the coupon collectors problem... there are
>    some notes in here somewhere about it (under the teabag proberbs note?)

No topics in this conference with titles containing "coup", "teabag" or
"proverb".  There are a couple of topics matching "collect", but they seem
to be unrelated to this problem.
1487.9psst -- it's note 831VMSDEV::HALLYBThe Smart Money was on GoliathWed Aug 28 1991 15:017
> No topics in this conference with titles containing "coup", "teabag" or
> "proverb".  There are a couple of topics matching "collect", but they seem
> to be unrelated to this problem.
    
    Try "tea bag" (2 words).
    
      John
1487.10CLT::TRACE::GILBERTOwnership ObligatesThu Aug 29 1991 21:057
I'm making some progress on this.  Here's a subproblem, ...

Given p distict values 'uniformly' distributed in the range 1..k,
what is the probability that exactly h of them are in the range 1..r?

Where	0 <= h <= p <= k, and
	0 <= h <= r <= k.
1487.11ZFC::deramoa pleasant distractionFri Aug 30 1991 12:4613
>Given p distict values 'uniformly' distributed in the range 1..k,
>what is the probability that exactly h of them are in the range 1..r?
>
>Where	0 <= h <= p <= k, and
>	0 <= h <= r <= k.

Isn't this just a counting problem?  Without replacement, h values
can be selected from 1..r in C(h,r) ways.  That leaves p-h values
which are selected from r+1...k in C(p-h,k-r) ways.  The total count
of ways to combine these is the product, and the probability of one
of these is that count divided by C(p,k).

Dan
1487.12A Pascal program for the problem in .0CLT::TRACE::GILBERTOwnership ObligatesTue Sep 03 1991 16:5989
program x1487 (input, output);

const
    nmax = 1000;
    kmax = 40;
type
    reel = quadruple;
var
    comb : array[0..kmax,0..kmax] of reel;
    fact : array[0..kmax] of reel;

function flt(b: boolean): reel;
begin
if b then flt := 1 else flt := 0;
end;

function prob_1(p, k, h, r: integer): reel;
{									}
{ Given p distinct values in the range 1..k,				}
{ return the probability that h of them are in the range 1..r.		}
{									}
{ Given:								}
{	0 <= h <= p <= k						}
{	0 <= h <= r <= k						}
{ }
begin
prob_1 := comb[h,r] * comb[p-h,k-r] / comb[p,k];
end;

procedure init;
var
    i,j: integer;
begin

fact[0] := 1;
for i := 1 to kmax do fact[i] := fact[i-1] * i;

for i := 0 to kmax do
for j := i to kmax do
    comb[i,j] := fact[j]/(fact[i]*fact[j-i]);

end;

procedure main;
var
    h, k, m, n, p, r: integer;
    curr_n, prev_n: boolean;
    sum: reel;
    pr: array[boolean,0..kmax] of reel;	{ Prob that n sets of p cover k values }
label
    99;

begin
write('Enter p and k: ');
readln(p, k);
if p <= k then else goto 99;
if 0 <= p then else goto 99;
if k <= kmax then else goto 99;

n := 0;
curr_n := false;
prev_n := not curr_n;

for r := 1 to k do
    pr[curr_n,r] := 0;
pr[curr_n,0] := 1;

for n := 1 to nmax do
    begin
    prev_n := curr_n;			{ prev_n is old value of curr_n }
    curr_n := not curr_n;		{ Advance to next n }
    pr[curr_n,0] := 1;
    for r := 1 to k do			{ r of them remain to be covered }
	begin
	sum := 0;
	for h := 0 to min(p,r) do
	    sum := sum + prob_1(p, k, h, r) * pr[prev_n,r-h];
	pr[curr_n,r] := sum;
	end;
    writeln('pr[', n:0, ',', k:0, '] = ', pr[curr_n,k]);
    if pr[curr_n,k] > 0.999999 then goto 99;
    end;
99:
end;

begin
init;
while true do main;
end.
1487.13ZFC::deramoechoes down the canyonTue Sep 03 1991 18:103
But there's no Monte Carlo random number generator in .-1. :-)

Dan
1487.14CLT::4GL::GILBERTOwnership ObligatesWed Sep 04 1991 16:335
.0> If we select a collection C consisting of n randomly chosen sets from S,
.0> what is the probability that C covers K, i.e. card(f(C)) = k?

Given p and k, the program displays the above probability as pr[n,k],
for n >= 0 until pr[n,k] >= 0.999999.