[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference rusure::math

Title:	Mathematics at DEC

Moderator:	RUSURE::EDP

Created:	Mon Feb 03 1986
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	2083
Total number of notes:	14613

750.0. "Urn probability problem; with application" by VMSINT::THIEL (Dave Thiel -- VMS Development) Fri Aug 14 1987 20:47

    An urn contains M balls uniquely numbered 1 through M.  A set of
    F balls is drawn at random from the urn.  N <= F, F <= M.
    
    What is the probability that the set of F balls contains at least
    one set of N sequentially numbered balls?
    
    I'm looking for an exact solution, a function of F, M, and N.
    

    This problem has a practical aspect.  Allow me to restate the problem.
    A VAX is configured with M pages of physical memory.  VMS has
    placed F of those pages onto the free list.  Let us assume that the
    pages on the free list are selected at random from the total set of M
    pages.  What is the probability that the free list contains at least
    one set of N physically contiguous pages?

T.R	Title	User	Personal Name	Date	Lines
750.1		CLT::GILBERT	Builder	`Sat Aug 15 1987 22:36`	16
	> I'm looking for an exact solution, a function of F, M, and N. I think that this has no closed-form solution. If you have some rough idea of the values of F, M, and N, perhaps an approximate solution would suffice? I did find that p(M,F,N) = ((M+1-F) * choose(M-N,F-N)) / choose(M,F), if F < 2N, where choose(N,K) is a binomial coefficient, i.e., N!/( (N-K)! K! ). The problem with the above is that if P >= 2N, it counts some things too many times -- the probability is too high. However, if the probability is known to be low, the overcounting can be approximated and removed.
750.2		VMSINT::THIEL	Dave Thiel -- VMS Development	`Sun Aug 16 1987 22:34`	19
	The limited domain solution of .1 seems to fail a basic sanity test, e.g. P(M,F,1) = 1 for F > 0 The offered limted solution is: ((M+1-F) * choose(M-N,F-N)) / choose(M,F), if F < 2N, When N = 1, this is: ((M+1-F) choose(M-1,F-1)) / choose(M,F) = ((M+1-F) * (M-1)! * F! * (M-F)!) / ((F-1)! * (M-F)! * M!) = (M+1-F) * F / M
750.3		BEING::POSTPISCHIL	Always mount a scratch monkey.	`Mon Aug 17 1987 10:23`	25
	Re .2: > = (M+1-F) * F / M Since N <= F < 2N and N=1, then F=1, so the expression is = (M+1-1)*1/M = 1, which is expected. However, I think the expression in .1 needs adjustment for another reason. It basically enumerates the ways to pick a sequence and then pick the F-N remaining numbers. This counts some sequences twice for some selections of the remaining numbers. (E.g., picking 3, 4 and then 2 is the same as picking 2, 3, and then 4.) There are C(M-N,F-N) selections which contain a sequence starting with 1. For the other numbers 2 through M+1-F, there are C(M-N-1,F-N) selections, obtained by picking a sequence and then selecting the rest of the numbers except for the number right before the sequence, which would give us a sequence starting with a lower number, which we already counted. So we have: p(M,F,N) = (C(M-N,F-N) + (M-F) C(M-N-1,F-N)) / C(M,F) if F <= 2N -- edp
750.4		CLT::GILBERT	Builder	`Mon Aug 17 1987 18:14`	8
	> For the other numbers 2 through M+1-F, there are [...] Don't you mean "For the other numbers 2 through M-N+1, there are ..."? You'll find that this simplifies to the expression given in .1. > p(M,F,N) = (C(M-N,F-N) + (M-F) C(M-N-1,F-N)) / C(M,F) if F <= 2N Here, you must intend "if F < 2N".
750.6		CLT::GILBERT	Builder	`Tue Aug 18 1987 13:34`	75
	The approach taken in .3 is useful, but in general it overaccounts for things. The following takes the corrections into consideration, and yields an answer which, while not a closed expression, is the sum of only floor(f/n) terms. Let z(m,n,p) be the number of ways to place p distinct runs of length n over a range of m. With a little work we discover that: z(m,n,p) = C(m-pn+1, p) Note .3 had w(m,n,f,1) = C(m-n,f-n) + (m-n)C(m-n-1,f-n) More generally, w(m,n,f,p) = z(m-(n+1),n,p-1) * C(m-p(n+1)+1,f-pn) + (z(m,n,p) - z(m-(n+1),n,p-1)) * C(m-p(n+1),f-pn) Note that this is roughly z(m,n,p) * C(m-p(n+1),f-pn); that is, the number of ways to place p runs of length n, times the number of ways to place the rest of the numbers. The exact expression accounts for combinations that have a run starting at position 1. We can tediously expand the equation for w(m,n,f,p), to get (m-np)! w(m,n,f,p) = (m-f+1) ---------------------- (m-p-f+1)! p! (f-np)! But w is not what we want, because it counts arrangements having multiple runs too many times. For example, w(m,n,f,1) counts arrangements having two runs twice, arrangements having three runs three times, and so on. More generally, w(m,n,f,p) counts arrangements having k runs C(k,p) times. Letting x(m,n,f,p) be the exact count, we have: inf --- w(m,n,f,p) = > C(k,p) x(m,n,f,k) --- k=p Of course, the upper bound on the summation need only go to floor(f/n). What we want is the exact number arrangements having at least one run: inf --- > x(m,n,f,k) --- k=1 But this is equal (!) to the following: inf inf --- --- k+1 > x(m,n,f,k) = > (-1) w(m,n,f,k) --- --- k=1 k=1 And so the number of ways to choose f numbers from a set of m such that there is at least one run of length n is: floor(f/n) --- p+1 (m-f+1) (m-np)! > (-1) ---------------------- --- (m-p-f+1)! p! (f-np)! p=1 Of course, the probability is this divided by C(m,f). - Gilbert
750.7		CLT::GILBERT	Builder	`Tue Aug 18 1987 13:54`	7
	Or to make it a tad prettier, floor(f/n) floor(f/n) --- p+1 (m-f+1) (m-np)! --- p+1 > (-1) ---------------------- = > (-1) C(m-f+1,p) C(m-np,m-f) --- (m-p-f+1)! p! (f-n*p)! --- p=1 p=1
750.8		SSDEVO::LARY		`Thu Aug 20 1987 15:41`	20
	In the summation floor(f/n) --- p+1 (m-f+1) (m-np)! > (-1) ---------------------- --- (m-p-f+1)! p! (f-np)! p=1 The quantity (m-p-f+1) can become negative if f is greater than ~ mn/(n+1). Computationally, I don't think this is a problem, as by the time this happens in the summation the terms have become vanishingly small. I believe the problem arises from the fact that the domain of w(m,n,f,p) in p is smaller than the domain of the final summation in p; i.e. sometimes there aren't any "rest of the numbers" to place. It can be fixed by changing the upper limit of the sum to min(floor(f/n),m-f+1). Nice derivation! especially the combinatorial sum trick....
750.9		CLT::GILBERT	Builder	`Thu Aug 20 1987 18:48`	2
	Thanks for pointing out the problem. I think your explanation doesn't quite suffice, but agree that w(m,n,f,p) is the likely culprit.
750.10		VMSINT::THIEL	Dave Thiel -- VMS Development	`Fri Aug 21 1987 01:06`	70
	An urn contains M balls uniquely numbered 1 through M. A set of F balls is drawn at random from the urn. M >= F >= N > 0. What is the probability that the set of F balls contains at least one set of N sequentially numbered balls? There are C(M,F) ways to choose a set of F balls from the M balls. Let Q (M,F,N), M >= F >= N > 0, be the number of ways to choose F balls from the M balls such that there are at least N consecutively numbered balls among the F. Q (M,F,N) P (M,F,N) = --------- C(M,F) Determine Q by enumerating all possibilities (stay with me, it's not as tedious as it may sound). Use the notation 1 (2) 3 ... to indicate a set of balls containing the balls numbered 1 and 3, omitting the ball numbered 2, and and arbitrary collection of balls with numbers greater than 3. Q (M,F,N) = if M = F, then 1 (there is only one way to choose F, and F must contain N consecutively numbered balls) otherwise, M > F. (1) ... then Q(M-1,F,N) 1 (2) ... if N > 1 then Q(M-2,F-1,N) if F-1 >= N else 0 1 2 (3) ... if N > 2 then Q(M-3,F-2,N) if F-2 >= N else 0 ... 1 2 ... N-1 (N) then Q(M-N,F-N+1,N) if F-N+1 >= N else 0 1 2 ... N-1 N ... then C(M-N,F-N) Adding all of these up: Q(M,F,N) (M >= F >= N > 0) = if M = F then 1 min(N,F+1-N) --- otherwise C(M-N,F-N) + > Q(M-i,F-i+1,N) --- i=1 and therefore: P(M,F,N) (M >= F >= N > 0) = if M = F then 1 / C(M,F) min(N,F+1-N) --- C(M-N,F-N) + > Q(M-i,F-i+1,N) --- i=1 otherwise ----------------------------------- C(M,F)
750.11	Ayup, that counts all the possibilities	CLT::GILBERT	Builder	`Fri Aug 21 1987 14:53`	30
	re .10 How many evaluations of distinct Qs are needed? Assuming you store each computed Q in a two-dimensional array (the N parameter is the same for all of them), as reuse these saved values as needed, then roughly 2N(M-N)(M-F)/(N-1) different values of the Q function must be evaluated. And, if F > 2N, most of these Qs require N additions; this approach is never much better than that given in .6, and may be considerably worse. Note that the calculations can be done without recursion: f := N; for m := f to M - ((F-f)N) div (N-1) do Q[m,f] := C(m,f); for f := N+1 to F do begin ibound := min(N,f+1-N); for m := f to M - ((F-f)N) div (N-1) do begin temp := C(m-N,f-N); for i := 1 to ibound do temp := temp + Q[m-i,f-i+1]; Q[m,f] := temp; end; end; Note that m, M, f, and F are distinct variables. For simplicity, and only slightly more computation, the expression "M - ((F-f)N) div (N-1)" could be replaced with "M - (F-f)".
750.12	Demonstration of equivalent results	CLT::GILBERT	Builder	`Sun Aug 23 1987 21:03`	130
	The following Pascal program computes Q(m,f,n) by three different techniques, compares the results (to ensure they are equal), and displays the amount of CPU time consumed by each calculation. - Gilbert program y (input, output); function C(n,k: integer): integer; var i,t: integer; begin if n < k then t := 0 else begin t := 1; for i := 1 to k do t := (t(n+1-i)) div i; end; C := t; end; function Q1 (m,f,n: integer): integer; var p,t,u: integer; begin t := 0; for p := 1 to f div n do begin u := C(m-f+1,p) C(m-np,m-f); if not odd(p) then u := -u; t := t + u; end; Q1 := t; end; { } {floor(f/n) floor(f/n) { --- p+1 (m-f+1) (m-np)! --- p+1 { > (-1) ---------------------- = > (-1) C(m-f+1,p) C(m-np,m-f) { --- (m-p-f+1)! p! (f-np)! --- { p=1 p=1 { } function Q2 (m,f,n: integer): integer; var i,t: integer; begin if m = f then t := 1 else begin t := C(m-n,f-n); for i := 1 to min(n,f+1-n) do t := t + Q2(m-i,f-i+1,n); end; Q2 := t; end; function Q3 (M,F,N: integer): integer; var mm,ff,i,ibound,t: integer; Q: array [1..100,1..100] of integer; begin ff := N; for mm := ff to M - ((F-ff)N) div (N-1) do Q[mm,ff] := mm - ff + 1; for ff := N+1 to F do begin ibound := min(N,ff+1-N); for mm := ff to M - ((F-ff)N) div (N-1) do begin t := C(mm-N,ff-N); if mm > ff then for i := 1 to ibound do t := t + Q[mm-i,ff-i+1]; Q[mm,ff] := t; end; end; Q3 := Q[M,F]; end; procedure check (m,f,n: integer); var E1,E2,E3: integer; t0,t1,t2,t3: integer; begin t0 := CLOCK; E1 := Q1(m,f,n); t1 := CLOCK; E2 := Q2(m,f,n); t2 := CLOCK; E3 := Q3(m,f,n); t3 := CLOCK; if (E1 = E2) and (E2 = E3) then else write ('*** '); write ('Q(',m:0,',',f:0,',',n:0,') = ',E1:0); if (E1 = E2) and (E2 = E3) then else write (' vs ',E2:0,' vs ',E3:0); writeln; writeln (t1-t0:0,' ',t2-t1:0,' ',t3-t2:0); end; { } { Q(M,F,N) (M >= F >= N > 0) = { if M = F then 1 { min(N,F+1-N) { --- { otherwise C(M-N,F-N) + > Q(M-i,F-i+1,N) { --- { i=1 { } procedure main; var m,f,n: integer; begin while true do begin readln (m,f,n); check (m,f,n); end; end; begin main; end.