T.R | Title | User | Personal Name | Date | Lines |
---|
1487.1 | Before we get started,... | CORREO::BELDIN_R | Pull us together, not apart | Tue Aug 27 1991 08:35 | 16 |
| >Consider the collection S of subsets P of E such that
>card(P) = p and card(f(P)) = p.
How would you go about generating S and its members? First, obviously
p <= k, since card(f(P)) = p. But it seems that since card(P) = P, S
has a very special character. It is certainly much smaller then P(E),
the power set of E. I understand how to check that P is in S, but to
make a random selection from S, I will need either an algorithm for
generating successive elements of S or a list of its members.
Maybe I'm missing something, but before I invest much time in your
question, I need to understand the intermediate problems.
Dick
|
1487.2 | clarification (I hope) | ULTRA::ELLIS | David Ellis | Tue Aug 27 1991 12:33 | 12 |
| > How would you go about generating S and its members? I understand how
> to check that P is in S, but to make a random selection from S, I will
> need either an algorithm for generating successive elements of S or a
> list of its members.
The collection S is not meant to be accessible or constructible as a whole.
What needs to be "generated" (I prefer the term "selected") are randomly chosen
P that satisfy the conditions of being in S. This is intended to mean subsets
of E containing p elements whose images under f are distinct but otherwise
randomly distributed over K.
Please see the next reply for some context behind this problem.
|
1487.3 | context | ULTRA::ELLIS | David Ellis | Tue Aug 27 1991 12:48 | 16 |
| This problem is a generalization of something that might be viewed as
an inverse to the Birthday Paradox.
Let E be a large set of people and K be the set {1, ..., 365} of their
birthdays (excluding leap year day). If we let p=1 for the moment, then
S should be equivalent to E itself. Given n randomly chosen people, the
probability that there is a duplication of birthdays rises with n, crossing
the 50% level at about n=24. The question I am asking in this case is
how much duplication is encountered before we exhaust the space?
In other words, how many people should we expect to gather together before
their birthdays cover the entire year?
To generalize, consider sets P of p people with distinct birthdays. To
some extent, this "quantizes" the distribution. Is there a significant
effect of this quantization on the expected number of people needed to exhaust
the space of birthdays?
|
1487.4 | | ALLVAX::JROTH | I know he moves along the piers | Tue Aug 27 1991 16:17 | 6 |
| That sounds somewhat like the coupon collectors problem... there are
some notes in here somewhere about it (under the teabag proberbs note?)
No, I don't know the distribution off the top of my head though.
- Jim
|
1487.5 | Approximate answer. | CADSYS::COOPER | Topher Cooper | Tue Aug 27 1991 16:39 | 26 |
| Well, here is a rough estimate:
Let q be the proportion of the k "birthdays" not selected at each
round. At each round an expected proportion q of the those birthdays
which have not been "taken" at the beginning of the round are *still*
not taken at the end of the round.
Therefore the expected proportion of the k birthdays which are not
taken after i rounds is q^i, and the expected number of birthdays which
are not taken after i rounds is N*q^i.
A rough estimate of the expected number of rounds before all the
birthdays are taken can be gotten by finding the i such that the
expected number of birthdays untaken after i rounds is less than 1.
Some math tells us that the expected number of birthdays untaken after
i rounds will be less than one when:
i >= -ln(k)/ln(q)
So the expected number of rounds before all the birthdays are taken is
approximately:
ceiling(-ln(k)/ln(q))
Topher
|
1487.6 | | CLT::TRACE::GILBERT | Ownership Obligates | Tue Aug 27 1991 19:16 | 10 |
| We can *almost* convert this problem to the birthday problem. The rephrased
question from .0 is:
What is the probability that p*n people completely cover K,
the space of birthdays?
However, this doesn't quite work, because the elements in a subset P
cover K better than would be expected by p independent elements from E;
e.g., we know that card(P) = card(f(P)) = p (so these p people never
share a birthday).
|
1487.7 | | ALIEN::EDP | Always mount a scratch monkey. | Wed Aug 28 1991 09:15 | 8 |
| The disc-changer problem in topic 1144 bears some resemblance to this
problem; they both have aspects of repeated selection without
replacement and expected number of repetitions to cover a set. That
one wasn't solved. This one may be easier; solving it may help the
other.
-- edp
|
1487.8 | | ULTRA::ELLIS | David Ellis | Wed Aug 28 1991 11:58 | 8 |
| Re: .4
> That sounds somewhat like the coupon collectors problem... there are
> some notes in here somewhere about it (under the teabag proberbs note?)
No topics in this conference with titles containing "coup", "teabag" or
"proverb". There are a couple of topics matching "collect", but they seem
to be unrelated to this problem.
|
1487.9 | psst -- it's note 831 | VMSDEV::HALLYB | The Smart Money was on Goliath | Wed Aug 28 1991 15:01 | 7 |
| > No topics in this conference with titles containing "coup", "teabag" or
> "proverb". There are a couple of topics matching "collect", but they seem
> to be unrelated to this problem.
Try "tea bag" (2 words).
John
|
1487.10 | | CLT::TRACE::GILBERT | Ownership Obligates | Thu Aug 29 1991 21:05 | 7 |
| I'm making some progress on this. Here's a subproblem, ...
Given p distict values 'uniformly' distributed in the range 1..k,
what is the probability that exactly h of them are in the range 1..r?
Where 0 <= h <= p <= k, and
0 <= h <= r <= k.
|
1487.11 | | ZFC::deramo | a pleasant distraction | Fri Aug 30 1991 12:46 | 13 |
| >Given p distict values 'uniformly' distributed in the range 1..k,
>what is the probability that exactly h of them are in the range 1..r?
>
>Where 0 <= h <= p <= k, and
> 0 <= h <= r <= k.
Isn't this just a counting problem? Without replacement, h values
can be selected from 1..r in C(h,r) ways. That leaves p-h values
which are selected from r+1...k in C(p-h,k-r) ways. The total count
of ways to combine these is the product, and the probability of one
of these is that count divided by C(p,k).
Dan
|
1487.12 | A Pascal program for the problem in .0 | CLT::TRACE::GILBERT | Ownership Obligates | Tue Sep 03 1991 16:59 | 89 |
| program x1487 (input, output);
const
nmax = 1000;
kmax = 40;
type
reel = quadruple;
var
comb : array[0..kmax,0..kmax] of reel;
fact : array[0..kmax] of reel;
function flt(b: boolean): reel;
begin
if b then flt := 1 else flt := 0;
end;
function prob_1(p, k, h, r: integer): reel;
{ }
{ Given p distinct values in the range 1..k, }
{ return the probability that h of them are in the range 1..r. }
{ }
{ Given: }
{ 0 <= h <= p <= k }
{ 0 <= h <= r <= k }
{ }
begin
prob_1 := comb[h,r] * comb[p-h,k-r] / comb[p,k];
end;
procedure init;
var
i,j: integer;
begin
fact[0] := 1;
for i := 1 to kmax do fact[i] := fact[i-1] * i;
for i := 0 to kmax do
for j := i to kmax do
comb[i,j] := fact[j]/(fact[i]*fact[j-i]);
end;
procedure main;
var
h, k, m, n, p, r: integer;
curr_n, prev_n: boolean;
sum: reel;
pr: array[boolean,0..kmax] of reel; { Prob that n sets of p cover k values }
label
99;
begin
write('Enter p and k: ');
readln(p, k);
if p <= k then else goto 99;
if 0 <= p then else goto 99;
if k <= kmax then else goto 99;
n := 0;
curr_n := false;
prev_n := not curr_n;
for r := 1 to k do
pr[curr_n,r] := 0;
pr[curr_n,0] := 1;
for n := 1 to nmax do
begin
prev_n := curr_n; { prev_n is old value of curr_n }
curr_n := not curr_n; { Advance to next n }
pr[curr_n,0] := 1;
for r := 1 to k do { r of them remain to be covered }
begin
sum := 0;
for h := 0 to min(p,r) do
sum := sum + prob_1(p, k, h, r) * pr[prev_n,r-h];
pr[curr_n,r] := sum;
end;
writeln('pr[', n:0, ',', k:0, '] = ', pr[curr_n,k]);
if pr[curr_n,k] > 0.999999 then goto 99;
end;
99:
end;
begin
init;
while true do main;
end.
|
1487.13 | | ZFC::deramo | echoes down the canyon | Tue Sep 03 1991 18:10 | 3 |
| But there's no Monte Carlo random number generator in .-1. :-)
Dan
|
1487.14 | | CLT::4GL::GILBERT | Ownership Obligates | Wed Sep 04 1991 16:33 | 5 |
| .0> If we select a collection C consisting of n randomly chosen sets from S,
.0> what is the probability that C covers K, i.e. card(f(C)) = k?
Given p and k, the program displays the above probability as pr[n,k],
for n >= 0 until pr[n,k] >= 0.999999.
|