Title: | Mathematics at DEC |
Moderator: | RUSURE::EDP |
Created: | Mon Feb 03 1986 |
Last Modified: | Fri Jun 06 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 2083 |
Total number of notes: | 14613 |
I liked this problem from USENET, and thought others might, too. - Gilbert Here's a probability question whose answer should be known, but I don't know it. Suppose that A and B are two random binary strings of length n. What is the expected size of the longest common substring, where the subtring doesn't have to appear contiguously in A and B? For example, if A = 001011 and B = 101101, an example of a longest common substring is 0101, which appears as 0-101-- in A and -01-01 in B. A good asymptotic estimate would do if an exact answer isn't available. Experiment suggests a value close to 0.82*n for large n. Please e-mail any suggestions (even if you post them, as news often gets lost on the way here). Thanks. Brendan McKay Paper: Computer Science Department, Australian National University, GPO Box 4, Canberra, ACT 2601, Australia. Telephone: + 61 62 49 3845 Telex: AA 62760 NATUNI ACSnet: [email protected] ARPA: bdm%[email protected] CSNET: [email protected]@csnet-relay.csnet JANET: anucsd.oz!bdm@ukc UUCP: {uunet,ubc-vision,ukc,mcvax,prlb2,hplabs,enea,mulga}!munnari!anucsd!bdm BITNET: [email protected] (or similar)
T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
803.1 | what if they must be contiguous ? | VIDEO::OSMAN | type video::user$7:[osman]eric.vt240 | Mon Dec 14 1987 15:45 | 13 |
What is the largest expected common substring when comparing two length-n strings if the substring must be contiguous ? For instance, consider two random strings of length 8, which I make up as we speak, by letting my fingers randomly hit the 0 and 1 keys: 01100110 10101010 What do we have here ? Largest is 10, length 2. /Eric | |||||
803.2 | CLT::GILBERT | Builder | Wed Dec 16 1987 19:07 | 41 | |
Let E(a,b) be the expectation of the length of the longest discontiguous match between random binary strings of length a and b, respectively. Then E(a,b) = E(b,a), E(a,0) = 0, and E(1,1) = 1/2 E(1,2) = 3/4 E(1,3) = 7/8 E(1,b) = ( 2^b - 1 )/( 2^b ) E(2,2) = 9/8 E(2,3) = 23/16 E(2,b) = ( 4*2^b - 2*b - 3 )/( 2*2^b ) E(3,3) = 58/32 E(3,4) = 137/64 E(3,5) = 308/128 E(3,6) = 667/256 E(3,7) = 1406/512 E(3,8) = 2909/1024 E(4,4) = 323/128 E(4,5) = 732/256 E(4,6) = 1612/512 E(4,7) = 3467/1024 E(4,8) = 7313/2048 E(5,5) = 1662/512 E(5,6) = 3677/1024 E(5,7) = 7975/2048 E(5,8) = 17019/4096 E(6,6) = 8151/2048 E(6,7) = 17739/4096 E(6,8) = 38049/8192 E(7,7) = 38678/8192 E(7,8) = 83186/16384 E(8,8) = 178212/32768 | |||||
803.3 | SSDEVO::LARY | Wed Dec 16 1987 21:44 | 7 | ||
E(3,b) = ( 12*2^b - 2*b^2 - 3*b - 11 )/( 4*2^b ) E(4,b) = ( 32*2^b - 4*(b^3)/3 - (b^2)/2 - 103*b/6 - 27 ) / ( 8*2^b ) Note: 4*(b^3)/3 + (b^2)/2 + 103*b/6 is always an integer, as it equals b^3 + 17*b + b*(b+1)*(2*b+1)/6 and the rightmost term is always an integer. | |||||
803.4 | SSDEVO::LARY | Wed Dec 16 1987 22:04 | 22 | ||
Another way of looking at E(a,b) is E(a,b) = a - Pa(b)/(2^b), where: P1(b) = 1 3 P2(b) = b + - 2 1 2 3 11 P3(b) = - b + - b + -- 2 4 4 1 3 1 2 103 27 P4(b) = - b + -- b + --- b + -- 6 16 48 8 Do these polynomials look familiar?? (I'd feel a lot better if P3's 11/4 were 9/4.....) | |||||
803.5 | Another look at it | SQM::HALLYB | Khan or bust! | Thu Dec 17 1987 10:54 | 19 |
N E(N,N) from Peter equals: 1 E(1,1) = 1/2 2 / 4^1 2 E(2,2) = 9/8 18 / 4^2 3 E(3,3) = 58/32 116 / 4^3 4 E(4,4) = 323/128 646 / 4^4 5 E(5,5) = 1662/512 3324 / 4^5 6 E(6,6) = 8151/2048 16302 / 4^6 7 E(7,7) = 38678/8192 77356 / 4^7 8 E(8,8) = 178212/32768 356424 / 4^8 Anybody recognize the sequence above ^^^ ? Factoring the elements seems to lead nowhere, and after 5 they no longer grow with N-factorial. Maybe some clever reader will recognize some direct or recursive relation. Maybe Peter will furnish us with higher order E(N,N). John |