[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference rusure::math

Title:Mathematics at DEC
Moderator:RUSURE::EDP
Created:Mon Feb 03 1986
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2083
Total number of notes:14613

571.0. "Probability analysis of nearest neighbor alg." by EVE::R_LI () Fri Aug 22 1986 15:22

    Hello!
    
    I am new to Math Notes, and I have a math problem.  Any
    solutions or help would be appreciated.  
    
    Briefly, I am trying to do a probabilistic analysis of the nearest
    neighbor algorithm which is an algorithm that can store and recall
    stored strings.  This algorithm can do 'error correction' in that
    it has the ability to return the original stored string given a
    partial or incorrect string.  The recall process is very simple:
    you merely take the recall string and compare it against every stored
    string and return the string that is closest to the recall string,
    using a measure termed the HAMMING DISTANCE.  
    
    The hamming distance is merely the number of bits between any two
    strings that are different.  Examples, the following two strings
    have a hamming distance of 2 (2nd and 4th bits different)
    		
    			1100
    			1001
    
    In this example, if we had a recall string of 1101 (which is the
    first stored string with 1-bit error in the fourth position), the
    nearest neighbor algorithm would correctly return the first string.
    However, if we tried to recall using 1000, the algorithm would not
    be able to tell whether it was originally the first string with
    an error in the second bit, or the second stored string with an
    error in the fourth bit  (This happens whenever the hamming distance
    between the original stored string and the recall string equals
    the hamming distance between the recall string and any other stored
    string).  And of course if we tried to correct more bits of error 
    (e.g. 1001 from 1100), it is clear the algorithm could fail.  
    
    *** Now, the question is...  What is the probability of correct recall
    given 'b'-bit totally random strings, 'n' stored strings and 'x'
    bits of error in the recall string?
    
    Some general info to get you started:
                                   
    	*  Totally random means that it is equally likely for any
    particular bit to be a '1' as it is to be a '0'. 
    
    	*  If 'b' = number of bits in a string, then there are 2**b unique
    strings that can be stored.
            
    	*  Correct recall occurs whenever the hamming distance between
    the recall string and the original stored string is less than the
    hamming distance between the recall key and all other stored strings.
    In other words, let
    		X = recall key with 'x'-bits error
    		N_x = original stored string we want to recall
    		N_s = all other stored strings 
    then,
    		HD(X,N_x) < HD(X,N_s)  for all 's' 
    or
    		x < HD(X,N_s) for all 's'.
                                        
    	* When only 1 string is stored, correct recall is guaranteed
    regardless of the string length, 'b' or the number of bits in error,
    'x'.  Obviously, the probability decreases as more strings are stored.
                          
    ----------------------------------------------------------------------
    
    I started doing analysis of this problem, but have discovered it
    is much more complicated than I had thought.  A solution would be
    fantastic!  Hints or references to a solution would also be great!
    Please write if you have questions.  Thanks for all your time!
    
    Ruby Li                              
    EVE::R_LI
    3-6606
    
T.RTitleUserPersonal
Name
DateLines
571.1LATOUR::JMUNZERFri Aug 22 1986 17:5515
Ruby:

>    			1100
>    			1001
>    
>    In this example, if we had a recall string of 1101 (which is the
>    first stored string with 1-bit error in the fourth position), the
>    nearest neighbor algorithm would correctly return the first string.

If	HD (1101, 1100) = 1	and	HD (1101, 1001) = 1,
why does the algorithm return 1100?

Isn't this a tie (i.e. a failure)?

John
571.2CLT::GILBERTeager like a childFri Aug 22 1986 18:045
    Are you looking for the probability that a particular string will be
    correctly recalled, or that any/all strings will be correctly recalled
    when they have 'x' bits of error?  Also, is the error pattern (the 'x'
    bits) chosen randomly, or is the question whether *any* x-bit error
    pattern will cause an incorrect recall?
571.3Problem clarification (RE: 571.1 and .2)EVE::R_LIMon Aug 25 1986 09:4331
    RE:  Note 571.1
    
    	Yes, you are correct, the example I gave for recalling string
    number 1 is actually a tie (which equals failure).  Here is a correct
    example (sorry about that...)
    	
    		String 1 = 1100
    		String 2 = 1001
    		Recall   = 0100
    
    	HD(1100, 0100) = 1
    	HD(1001, 0100) = 3
    
    Therefore, algorithm returns string #1 (which is what we want).
    
    -----------------------------------------------------------------------
    
    RE:  Note 571.2
                   
    	I would like to find the probability that *any* string will
    be correctly recalled with x-bits of error.  Also, the bit errors
    are chosen randomly, though no bit may be flipped twice (this merely
    cancels the error causing no change).  I.e.  if we pick a
    random stored string and put x-bits of error in it, what is the
    probability we can correctly recall it with 'n' b-bit length strings
    stored.  It would also be useful to know the probability that *all*
    stored strings will be correctly recalled with any combination of
    'x'-bits of error.    
    
    Hope this clarifies the problem a little bit... good luck (and thanks!)
    
571.4A tryTAV02::NITSANNitsan Duvdevani, Digital IsraelThu Aug 28 1986 09:3432
           b
There are ( ) different strings with a distance of "d" from the "recall string".
           d

           2^b
There are (   ) different options to store n strings of b bits.
            n

The number of options in which ONE of the stored strings has a distance of "d"
from the "recall string" and the rest have distance >d is:

            b        S                     d   b     b   b
N(d)  =    ( )  *  (   )  , for S = 2^b - SUM ( ) = SUM ( )
            d       n-1                   i=1  i   i=d+1 i

so it looks that the number of options in which there is ONE stored string
with minimal distance (from the recall string) is:

       b
M  =  SUM N(d)
      d=0

                             2^b
and the probability is  M / (   )
                              n

All this is if you know nothing about "x" (the number of error bits). For
example, if you know x=0 (no errors in the recall string), then ofcourse
the probability to choose the right stored string is 1.

or maybe I made a mistake (it's late in the day...)
Nitsan