T.R | Title | User | Personal Name | Date | Lines |
---|
750.1 | | CLT::GILBERT | Builder | Sat Aug 15 1987 22:36 | 16 |
| > I'm looking for an exact solution, a function of F, M, and N.
I think that this has no closed-form solution. If you have some
rough idea of the values of F, M, and N, perhaps an approximate
solution would suffice?
I did find that
p(M,F,N) = ((M+1-F) * choose(M-N,F-N)) / choose(M,F), if F < 2*N,
where choose(N,K) is a binomial coefficient, i.e., N!/( (N-K)! K! ).
The problem with the above is that if P >= 2*N, it counts some things
too many times -- the probability is too high. However, if the
probability is known to be low, the overcounting can be approximated
and removed.
|
750.2 | | VMSINT::THIEL | Dave Thiel -- VMS Development | Sun Aug 16 1987 22:34 | 19 |
| The limited domain solution of .1 seems to fail a basic sanity test,
e.g.
P(M,F,1) = 1 for F > 0
The offered limted solution is:
((M+1-F) * choose(M-N,F-N)) / choose(M,F), if F < 2*N,
When N = 1, this is:
((M+1-F) * choose(M-1,F-1)) / choose(M,F)
= ((M+1-F) * (M-1)! * F! * (M-F)!) / ((F-1)! * (M-F)! * M!)
= (M+1-F) * F / M
|
750.3 | | BEING::POSTPISCHIL | Always mount a scratch monkey. | Mon Aug 17 1987 10:23 | 25 |
| Re .2:
> = (M+1-F) * F / M
Since N <= F < 2N and N=1, then F=1, so the expression is
= (M+1-1)*1/M = 1, which is expected.
However, I think the expression in .1 needs adjustment for another
reason. It basically enumerates the ways to pick a sequence and then
pick the F-N remaining numbers. This counts some sequences twice for
some selections of the remaining numbers. (E.g., picking 3, 4 and then
2 is the same as picking 2, 3, and then 4.)
There are C(M-N,F-N) selections which contain a sequence starting with
1. For the other numbers 2 through M+1-F, there are C(M-N-1,F-N)
selections, obtained by picking a sequence and then selecting the rest
of the numbers except for the number right before the sequence, which
would give us a sequence starting with a lower number, which we already
counted. So we have:
p(M,F,N) = (C(M-N,F-N) + (M-F) C(M-N-1,F-N)) / C(M,F) if F <= 2N
-- edp
|
750.4 | | CLT::GILBERT | Builder | Mon Aug 17 1987 18:14 | 8 |
| > For the other numbers 2 through M+1-F, there are [...]
Don't you mean "For the other numbers 2 through M-N+1, there are ..."?
You'll find that this simplifies to the expression given in .1.
> p(M,F,N) = (C(M-N,F-N) + (M-F) C(M-N-1,F-N)) / C(M,F) if F <= 2N
Here, you must intend "if F < 2N".
|
750.6 | | CLT::GILBERT | Builder | Tue Aug 18 1987 13:34 | 75 |
| The approach taken in .3 is useful, but in general it overaccounts for
things. The following takes the corrections into consideration, and
yields an answer which, while not a closed expression, is the sum of
only floor(f/n) terms.
Let z(m,n,p) be the number of ways to place p distinct runs of length n
over a range of m. With a little work we discover that:
z(m,n,p) = C(m-p*n+1, p)
Note .3 had
w(m,n,f,1) = C(m-n,f-n) + (m-n)*C(m-n-1,f-n)
More generally,
w(m,n,f,p) = z(m-(n+1),n,p-1) * C(m-p*(n+1)+1,f-p*n)
+ (z(m,n,p) - z(m-(n+1),n,p-1)) * C(m-p*(n+1),f-p*n)
Note that this is roughly z(m,n,p) * C(m-p*(n+1),f-p*n); that is,
the number of ways to place p runs of length n, times the number
of ways to place the rest of the numbers. The exact expression
accounts for combinations that have a run starting at position 1.
We can tediously expand the equation for w(m,n,f,p), to get
(m-n*p)!
w(m,n,f,p) = (m-f+1) * ----------------------
(m-p-f+1)! p! (f-n*p)!
But w is not what we want, because it counts arrangements having multiple
runs too many times. For example, w(m,n,f,1) counts arrangements having
two runs twice, arrangements having three runs three times, and so on.
More generally, w(m,n,f,p) counts arrangements having k runs C(k,p) times.
Letting x(m,n,f,p) be the exact count, we have:
inf
---
w(m,n,f,p) = > C(k,p) * x(m,n,f,k)
---
k=p
Of course, the upper bound on the summation need only go to floor(f/n).
What we want is the exact number arrangements having at least one run:
inf
---
> x(m,n,f,k)
---
k=1
But this is equal (!) to the following:
inf inf
--- --- k+1
> x(m,n,f,k) = > (-1) w(m,n,f,k)
--- ---
k=1 k=1
And so the number of ways to choose f numbers from a set of m such
that there is at least one run of length n is:
floor(f/n)
--- p+1 (m-f+1) (m-n*p)!
> (-1) ----------------------
--- (m-p-f+1)! p! (f-n*p)!
p=1
Of course, the probability is this divided by C(m,f).
- Gilbert
|
750.7 | | CLT::GILBERT | Builder | Tue Aug 18 1987 13:54 | 7 |
| Or to make it a tad prettier,
floor(f/n) floor(f/n)
--- p+1 (m-f+1) (m-n*p)! --- p+1
> (-1) ---------------------- = > (-1) C(m-f+1,p) C(m-n*p,m-f)
--- (m-p-f+1)! p! (f-n*p)! ---
p=1 p=1
|
750.8 | | SSDEVO::LARY | | Thu Aug 20 1987 15:41 | 20 |
|
In the summation
floor(f/n)
--- p+1 (m-f+1) (m-n*p)!
> (-1) ----------------------
--- (m-p-f+1)! p! (f-n*p)!
p=1
The quantity (m-p-f+1) can become negative if f is greater than ~ mn/(n+1).
Computationally, I don't think this is a problem, as by the time this happens
in the summation the terms have become vanishingly small.
I believe the problem arises from the fact that the domain of w(m,n,f,p) in p
is smaller than the domain of the final summation in p; i.e. sometimes there
aren't any "rest of the numbers" to place. It can be fixed by changing the
upper limit of the sum to min(floor(f/n),m-f+1).
Nice derivation! especially the combinatorial sum trick....
|
750.9 | | CLT::GILBERT | Builder | Thu Aug 20 1987 18:48 | 2 |
| Thanks for pointing out the problem. I think your explanation doesn't
quite suffice, but agree that w(m,n,f,p) is the likely culprit.
|
750.10 | | VMSINT::THIEL | Dave Thiel -- VMS Development | Fri Aug 21 1987 01:06 | 70 |
| An urn contains M balls uniquely numbered 1 through M. A set of
F balls is drawn at random from the urn. M >= F >= N > 0.
What is the probability that the set of F balls contains at least
one set of N sequentially numbered balls?
There are C(M,F) ways to choose a set of F balls from the M balls.
Let Q (M,F,N), M >= F >= N > 0, be the number of ways to choose F balls
from the M balls such that there are at least N consecutively numbered
balls among the F.
Q (M,F,N)
P (M,F,N) = ---------
C(M,F)
Determine Q by enumerating all possibilities (stay with me, it's
not as tedious as it may sound).
Use the notation 1 (2) 3 ... to indicate a set of balls containing
the balls numbered 1 and 3, omitting the ball numbered 2, and and
arbitrary collection of balls with numbers greater than 3.
Q (M,F,N) =
if M = F, then 1 (there is only one way to choose F,
and F must contain N consecutively
numbered balls)
otherwise, M > F.
(1) ... then Q(M-1,F,N)
1 (2) ... if N > 1 then Q(M-2,F-1,N) if F-1 >= N else 0
1 2 (3) ... if N > 2 then Q(M-3,F-2,N) if F-2 >= N else 0
...
1 2 ... N-1 (N) then Q(M-N,F-N+1,N) if F-N+1 >= N else 0
1 2 ... N-1 N ... then C(M-N,F-N)
Adding all of these up:
Q(M,F,N) (M >= F >= N > 0) =
if M = F then 1
min(N,F+1-N)
---
otherwise C(M-N,F-N) + > Q(M-i,F-i+1,N)
---
i=1
and therefore:
P(M,F,N) (M >= F >= N > 0) =
if M = F then 1 / C(M,F)
min(N,F+1-N)
---
C(M-N,F-N) + > Q(M-i,F-i+1,N)
---
i=1
otherwise -----------------------------------
C(M,F)
|
750.11 | Ayup, that counts all the possibilities | CLT::GILBERT | Builder | Fri Aug 21 1987 14:53 | 30 |
| re .10
How many evaluations of distinct Qs are needed? Assuming
you store each computed Q in a two-dimensional array (the
N parameter is the same for all of them), as reuse these
saved values as needed, then roughly 2*N*(M-N)*(M-F)/(N-1)
different values of the Q function must be evaluated.
And, if F > 2N, most of these Qs require N additions; this
approach is never much better than that given in .6, and
may be considerably worse.
Note that the calculations can be done without recursion:
f := N;
for m := f to M - ((F-f)*N) div (N-1) do
Q[m,f] := C(m,f);
for f := N+1 to F do
begin
ibound := min(N,f+1-N);
for m := f to M - ((F-f)*N) div (N-1) do
begin
temp := C(m-N,f-N);
for i := 1 to ibound do
temp := temp + Q[m-i,f-i+1];
Q[m,f] := temp;
end;
end;
Note that m, M, f, and F are distinct variables. For simplicity, and
only slightly more computation, the expression "M - ((F-f)*N) div (N-1)"
could be replaced with "M - (F-f)".
|
750.12 | Demonstration of equivalent results | CLT::GILBERT | Builder | Sun Aug 23 1987 21:03 | 130 |
| The following Pascal program computes Q(m,f,n) by three different
techniques, compares the results (to ensure they are equal), and
displays the amount of CPU time consumed by each calculation.
- Gilbert
program y (input, output);
function C(n,k: integer): integer;
var
i,t: integer;
begin
if n < k
then
t := 0
else
begin
t := 1;
for i := 1 to k do t := (t*(n+1-i)) div i;
end;
C := t;
end;
function Q1 (m,f,n: integer): integer;
var
p,t,u: integer;
begin
t := 0;
for p := 1 to f div n do
begin
u := C(m-f+1,p) * C(m-n*p,m-f);
if not odd(p) then u := -u;
t := t + u;
end;
Q1 := t;
end;
{ }
{floor(f/n) floor(f/n)
{ --- p+1 (m-f+1) (m-n*p)! --- p+1
{ > (-1) ---------------------- = > (-1) C(m-f+1,p) C(m-n*p,m-f)
{ --- (m-p-f+1)! p! (f-n*p)! ---
{ p=1 p=1
{ }
function Q2 (m,f,n: integer): integer;
var
i,t: integer;
begin
if m = f
then
t := 1
else
begin
t := C(m-n,f-n);
for i := 1 to min(n,f+1-n) do
t := t + Q2(m-i,f-i+1,n);
end;
Q2 := t;
end;
function Q3 (M,F,N: integer): integer;
var
mm,ff,i,ibound,t: integer;
Q: array [1..100,1..100] of integer;
begin
ff := N;
for mm := ff to M - ((F-ff)*N) div (N-1) do
Q[mm,ff] := mm - ff + 1;
for ff := N+1 to F do
begin
ibound := min(N,ff+1-N);
for mm := ff to M - ((F-ff)*N) div (N-1) do
begin
t := C(mm-N,ff-N);
if mm > ff then
for i := 1 to ibound do
t := t + Q[mm-i,ff-i+1];
Q[mm,ff] := t;
end;
end;
Q3 := Q[M,F];
end;
procedure check (m,f,n: integer);
var
E1,E2,E3: integer;
t0,t1,t2,t3: integer;
begin
t0 := CLOCK;
E1 := Q1(m,f,n);
t1 := CLOCK;
E2 := Q2(m,f,n);
t2 := CLOCK;
E3 := Q3(m,f,n);
t3 := CLOCK;
if (E1 = E2) and (E2 = E3) then else write ('*** ');
write ('Q(',m:0,',',f:0,',',n:0,') = ',E1:0);
if (E1 = E2) and (E2 = E3) then else write (' vs ',E2:0,' vs ',E3:0);
writeln;
writeln (t1-t0:0,' ',t2-t1:0,' ',t3-t2:0);
end;
{ }
{ Q(M,F,N) (M >= F >= N > 0) =
{ if M = F then 1
{ min(N,F+1-N)
{ ---
{ otherwise C(M-N,F-N) + > Q(M-i,F-i+1,N)
{ ---
{ i=1
{ }
procedure main;
var
m,f,n: integer;
begin
while true do
begin
readln (m,f,n);
check (m,f,n);
end;
end;
begin
main;
end.
|