Title: | Mathematics at DEC |
Moderator: | RUSURE::EDP |
Created: | Mon Feb 03 1986 |
Last Modified: | Fri Jun 06 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 2083 |
Total number of notes: | 14613 |
Here's a problem about substring matching. I'm looking for the shortest possible string of letters of the alphabet such that a search for any substring of 8 (arbitrary choice) or less letters can be found. Here's one string of letters: AAABACADAE... In this string, we can clearly find AA or AB or AC or AD etc. We quickly realize that we can let the AA and AB overlap to reveal a shorter string, such that AA, AB, AC, AD etc. are still to be found, namely: AABACADAE... How long a string is needed, so all substrings up to 8 letters long can be found ? ------------------------------- A little history: I thought of this problem a few years ago while designing an EMACS macro. The macro had internal need for the "search" command. However, I didn't want the macro to leave the default search string altered when it was through executing, lest the user's attempt to issue a "search next" command look for the macro's internal string rather than the last string the user searched for. EMACS didn't (does it now ??) have a command for revealing the current default search string. If it did, I'd use it at the beginning of my macro in order to save it, then restore it at the end. So, I was thinking about how I might reveal the current default search string in lieu of a builtin function to do it. My theoretical solution was to have a buffer called ALLCOMBOS which would contain an appropriate huge string, constructed so that ALL possible search strings could be found ! That way, I could figure out the default search string by putting the cursor at the beginning of that buffer, issuing "search next", marking the spot, "search previous", after which point and mark would surround sought default search string. Of course, this was just theoretical, since such an ALLCOMBOS buffer would probably be larger than the universe. However, it does lead to the aforehereto mentioned substring puzzle. /Eric
T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
277.1 | GOLLY::GILBERT | Fri May 03 1985 00:27 | 4 | ||
Oddly, there is some *old* work on this problem. It's similar to the problem of ringing 'cycles' (I think that's what they're called) on a carillion, or other sets of chimes. The Encyclopaedia Brittanica had a pretty interesting article about this. | |||||
277.2 | METOO::YARBROUGH | Wed May 08 1985 09:17 | 8 | ||
Here's another related problem: find a sequence of 1's and 0's with the following property: given a boolean function B, the leftmost ocurrence in the string of bits x and y is immediately followed by B(x,y). (I have used this to implement Boolean algebra in a SNOBOL program, for example.) Some interesting relationships fall out of this: for example, AND(x,y) can be represented by '11100010' and OR(x,y) is represented by its complement, '00011101'. The symetric function XOR(x,y) can be represented either by '1011000' or by its reversal, '0001101'. And so forth. - Lynn Yarbrough | |||||
277.3 | R2ME2::GILBERT | Wed May 08 1985 23:03 | 18 | ||
Problem: Given a boolean function B(x,y) construct a sequence of 1s and 0s such that the leftmost occurence of x,y in the sequence is immediately followed by B(x,y). Solution. We construct the sequence by appending digits to a partial solution. (1) Initially, set S (the sequence being constructed) to 00. (2) If the 2**2 triples x,y,B(x,y) occur in S, then we are done. Otherwise, let x,y be the last two digits of S. (3) If this is the leftmost occurence of x,y in S, then append B(x,y), and go to step (2). (4) If y,z does not appear in S (for some z), append z to S, and go to step (2). (5) Let x',y' be some pair that doesn't appear in S; append x',y' to S, and go to step (2). Note that a similar procedure can be used to generate a finite solution for any function B(x,y,...z) over a finite alphabet. | |||||
277.4 | RANI::LEICHTERJ | Mon May 13 1985 23:09 | 10 | ||
"A string over an alphabet of k characters is said to be an n-meander if it contains every possible substring of length n over the alphabet. For example, 0001111011001010000 is a 4-meander for the alphabet 01." This is quoted from the "Reference Manual for the Icon Programming Language", which proceeds to give an example Icon program that generates meanders. It referes to the following article: "Minimal Meandering Strings", by James F. Gimpel and William Keister. Technical report, Bell Labs, Holmdel, NJ July 1970. The minimal meandering string has length k^n+n-1. -- Jerry | |||||
277.5 | R2ME2::STAN | Fri May 24 1985 02:40 | 20 | ||
Re: .0 What you want is very closely related to De Bruijn Sequences. Given m+1 symbols (0,1,2,...,m), a De Bruijn sequence is the minimal sequence which when arranged in a circle, contains as subsequences of consecutive symbols, every sequence of length n of the symbols. [you don't want the sequence arranged in a circle] Anyhow, there is much literature on this and related problems. I suggest you start with Anthony Ralston, De Bruijn Sequences, Mathematics Magazine, 55(1982)131-143. See also S. W. Golomb, Shift Register Sequences, Holden Day, 1967, pp. 118-122 and see also Knuth, volume 2. |