[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference rusure::math

Title:	Mathematics at DEC

Moderator:	RUSURE::EDP

Created:	Mon Feb 03 1986
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	2083
Total number of notes:	14613

571.0. "Probability analysis of nearest neighbor alg." by EVE::R_LI () Fri Aug 22 1986 14:22

    Hello!
    
    I am new to Math Notes, and I have a math problem.  Any
    solutions or help would be appreciated.  
    
    Briefly, I am trying to do a probabilistic analysis of the nearest
    neighbor algorithm which is an algorithm that can store and recall
    stored strings.  This algorithm can do 'error correction' in that
    it has the ability to return the original stored string given a
    partial or incorrect string.  The recall process is very simple:
    you merely take the recall string and compare it against every stored
    string and return the string that is closest to the recall string,
    using a measure termed the HAMMING DISTANCE.  
    
    The hamming distance is merely the number of bits between any two
    strings that are different.  Examples, the following two strings
    have a hamming distance of 2 (2nd and 4th bits different)
    		
    			1100
    			1001
    
    In this example, if we had a recall string of 1101 (which is the
    first stored string with 1-bit error in the fourth position), the
    nearest neighbor algorithm would correctly return the first string.
    However, if we tried to recall using 1000, the algorithm would not
    be able to tell whether it was originally the first string with
    an error in the second bit, or the second stored string with an
    error in the fourth bit  (This happens whenever the hamming distance
    between the original stored string and the recall string equals
    the hamming distance between the recall string and any other stored
    string).  And of course if we tried to correct more bits of error 
    (e.g. 1001 from 1100), it is clear the algorithm could fail.  
    
    *** Now, the question is...  What is the probability of correct recall
    given 'b'-bit totally random strings, 'n' stored strings and 'x'
    bits of error in the recall string?
    
    Some general info to get you started:
                                   
    	*  Totally random means that it is equally likely for any
    particular bit to be a '1' as it is to be a '0'. 
    
    	*  If 'b' = number of bits in a string, then there are 2**b unique
    strings that can be stored.
            
    	*  Correct recall occurs whenever the hamming distance between
    the recall string and the original stored string is less than the
    hamming distance between the recall key and all other stored strings.
    In other words, let
    		X = recall key with 'x'-bits error
    		N_x = original stored string we want to recall
    		N_s = all other stored strings 
    then,
    		HD(X,N_x) < HD(X,N_s)  for all 's' 
    or
    		x < HD(X,N_s) for all 's'.
                                        
    	* When only 1 string is stored, correct recall is guaranteed
    regardless of the string length, 'b' or the number of bits in error,
    'x'.  Obviously, the probability decreases as more strings are stored.
                          
    ----------------------------------------------------------------------
    
    I started doing analysis of this problem, but have discovered it
    is much more complicated than I had thought.  A solution would be
    fantastic!  Hints or references to a solution would also be great!
    Please write if you have questions.  Thanks for all your time!
    
    Ruby Li                              
    EVE::R_LI
    3-6606

T.R	Title	User	Personal Name	Date	Lines
571.1		LATOUR::JMUNZER		`Fri Aug 22 1986 16:55`	15
	Ruby: > 1100 > 1001 > > In this example, if we had a recall string of 1101 (which is the > first stored string with 1-bit error in the fourth position), the > nearest neighbor algorithm would correctly return the first string. If HD (1101, 1100) = 1 and HD (1101, 1001) = 1, why does the algorithm return 1100? Isn't this a tie (i.e. a failure)? John
571.2		CLT::GILBERT	eager like a child	`Fri Aug 22 1986 17:04`	5
	Are you looking for the probability that a particular string will be correctly recalled, or that any/all strings will be correctly recalled when they have 'x' bits of error? Also, is the error pattern (the 'x' bits) chosen randomly, or is the question whether any x-bit error pattern will cause an incorrect recall?
571.3	Problem clarification (RE: 571.1 and .2)	EVE::R_LI		`Mon Aug 25 1986 08:43`	31
	RE: Note 571.1 Yes, you are correct, the example I gave for recalling string number 1 is actually a tie (which equals failure). Here is a correct example (sorry about that...) String 1 = 1100 String 2 = 1001 Recall = 0100 HD(1100, 0100) = 1 HD(1001, 0100) = 3 Therefore, algorithm returns string #1 (which is what we want). ----------------------------------------------------------------------- RE: Note 571.2 I would like to find the probability that any string will be correctly recalled with x-bits of error. Also, the bit errors are chosen randomly, though no bit may be flipped twice (this merely cancels the error causing no change). I.e. if we pick a random stored string and put x-bits of error in it, what is the probability we can correctly recall it with 'n' b-bit length strings stored. It would also be useful to know the probability that all stored strings will be correctly recalled with any combination of 'x'-bits of error. Hope this clarifies the problem a little bit... good luck (and thanks!)
571.4	A try	TAV02::NITSAN	Nitsan Duvdevani, Digital Israel	`Thu Aug 28 1986 08:34`	32
	b There are ( ) different strings with a distance of "d" from the "recall string". d 2^b There are ( ) different options to store n strings of b bits. n The number of options in which ONE of the stored strings has a distance of "d" from the "recall string" and the rest have distance >d is: b S d b b b N(d) = ( ) * ( ) , for S = 2^b - SUM ( ) = SUM ( ) d n-1 i=1 i i=d+1 i so it looks that the number of options in which there is ONE stored string with minimal distance (from the recall string) is: b M = SUM N(d) d=0 2^b and the probability is M / ( ) n All this is if you know nothing about "x" (the number of error bits). For example, if you know x=0 (no errors in the recall string), then ofcourse the probability to choose the right stored string is 1. or maybe I made a mistake (it's late in the day...) Nitsan