[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference rusure::math

Title:	Mathematics at DEC

Moderator:	RUSURE::EDP

Created:	Mon Feb 03 1986
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	2083
Total number of notes:	14613

858.0. "hashing it out, go linear" by DPDMAI::FRAMELI () Fri Apr 08 1988 06:08

        would it be practical to hash out values for real & character
        data, so that i could apply an algorithm like the one below
        or would the overhead involved in hashing unique equivalents
        defeat the purpose. this microsoft basic implementation of the
        ac sort will easily out perform the quicksort, (integers only). 
        would anyone be interested in converting this to turbo pascal?
    
        dale
    
    
      ' address calculation sort - 'for integer data sets only'
      ' byte, november 1983, pg 496        
        
        defint cndx, jndx, number, timezero, totaltime, x
        input"enter the number of elements to be ac sorted: ";number
        dim array1%(number)

      ' generate random numbers between -32k and 32k
        print"generating random data set..."
        for jndx = 1 to number
            array1%(jndx) = int(65535! * rnd(1)) - 32767
            print jndx, array1%(jndx)  
        next jndx 

      ' ac sort algorithm
        print"address calculation sort."
        timezero = timer
        i = 2.36 * number
        bp = i / 65535!
        dim array2%(i + number)

      ' main loop
        for x = 1 to number
           xa = x
            v = (32767 + array1%(x)) * bp
   b:
            if array2%(v) = 0 then array2%(v) = xa: goto a:
            if array1%(array2%(v)) > array1%(xa) then swap array1%(array2%(v)), 
               array1%(xa)
            v = v + 1
            goto b:
   a:
        next x
        totaltime = (timer - timezero)
        print"finished...elapsed time: "; totaltime; "seconds."

      ' print out
        cndx = 0
        for jndx = 0 to i + number
           if array2%(jndx) then print c, array1%(array2%(jndx)): c = c + 1
           if c <= number then next j
 
        input"",dummy$
        end

T.R Title User Personal
Name Date Lines

858.3 if this is what you want ... :-( ZFC::DERAMO Daniel V. D'Eramo Fri Apr 08 1988 19:40 73

T.R	Title	User	Personal Name	Date	Lines
858.3	if this is what you want ... :-(	ZFC::DERAMO	Daniel V. D'Eramo	`Fri Apr 08 1988 19:40`	73
	Re .0 Let me see if I understand what you are asking. You want to sort some data. It has multiple keys, some numeric (floating point, i.e., real numbers) and some character. So the comparison is something like record a < record b means: (numeric) field 1 of record a < field 1 of record b if equal, then break ties by (character) field 2 of record a < field 2 of record b The (quicksort?) sort that you are using has some problems due to the initial ordering of your records. So you want to: hash all of the fields of each record sort based on the value of the hash function (an integer) use a very fast sort for integers to do this and have this sort result in the same order as doing it the hard way does. I suppose that you can do the first three fairly easily, but I doubt very much that you will be able to come up with a hash function to "small" integers [say, four bytes] such that record a < record b if and only if hash(a) < hash(b) The only hash function that I could think of that would work is: Take the first field, we I have assumed is numeric. Can you compare floating point numbers on a VAX as though they were integers? If so, the "sub hash" of this field is the number itself. If not, the "sub hash" is, say, the sign of the number (one bit) followed by the exponent followed by the "mantissa". Either way, this sub hash is at least as many bits as the field itself. Take the next field, a character field. It's sub hash is as long as the longest possible string. Each byte of the sub hash is the "collating index" of that byte (character) of the field. Do this for each field. Lay the bytes all together, like a thousand bit integer :-( Sort these using your integer sort algorithm, which must be rewritten to use such large "integer"'s. There are lots of hash functions that will scrunch many long fields into a small integer, but those will not be "order preserving". To do that practically forces the length of the hash to be roughly the length of all of the fields. It would be wonderful if it worked, but I can only envision it working if you have a FIXED set of n items KNOWN IN ADVANCE, and you write a program to search for a hash function that takes the items and yields a "hash" of 1 for the first item, 2 for the second, etc. It may run for a long time, but if it works you have a valuable hash function. If you hash n items into the numbers 1 to 100n in an order preserving way then you still have a valuable hash function. It would probably make up for the time spent searching for it ... ...but if tomorrow you add another item, ... ... you get to start searching all over again. )-: Dan

    Re .0
    
    Let me see if I understand what you are asking.
    
    You want to sort some data.  It has multiple keys, some numeric
    (floating point, i.e., real numbers) and some character.  So the
    comparison is something like record a < record b means:
    
        (numeric) field 1 of record a < field 1 of record b
        if equal, then break ties by
        (character) field 2 of record a < field 2 of record b
    
    The (quicksort?) sort that you are using has some problems due to
    the initial ordering of your records.
    
    So you want to:
    
          hash all of the fields of each record
          sort based on the value of the hash function (an integer)
          use a very fast sort for integers to do this
    
    and
    
          have this sort result in the same order as doing it
          the hard way does.
    
    I suppose that you can do the first three fairly easily, but I
    doubt very much that you will be able to come up with a hash function
    to "small" integers [say, four bytes] such that
    
        record a < record b    if and only if    hash(a) < hash(b)
    
    The only hash function that I could think of that would work is:
    
         Take the first field, we I have assumed is numeric.
         Can you compare floating point numbers on a VAX as though
         they were integers?  If so, the "sub hash" of this field
         is the number itself.  If not, the "sub hash" is, say,
         the sign of the number (one bit) followed by the exponent
         followed by the "mantissa".  Either way, this sub hash is
         at least as many bits as the field itself.
    
         Take the next field, a character field.  It's sub hash is
         as long as the longest possible string.  Each byte of the
         sub hash is the "collating index" of that byte (character)
         of the field.
    
         Do this for each field.
    
         Lay the bytes all together, like a thousand bit integer :-(
    
         Sort these using your integer sort algorithm, which must
         be rewritten to use such large "integer"'s.
    
    There are lots of hash functions that will scrunch many long fields
    into a small integer, but those will not be "order preserving".
    To do that practically forces the length of the hash to be roughly
    the length of all of the fields.
    
    It would be wonderful if it worked, but I can only envision it
    working if you have a FIXED set of n items KNOWN IN ADVANCE, and
    you write a program to search for a hash function that takes the
    items and yields a "hash" of 1 for the first item, 2 for the second,
    etc.  It may run for a long time, but if it works you have a valuable
    hash function.  If you hash n items into the numbers 1 to 100n in an
    order preserving way then you still have a valuable hash function.
    It would probably make up for the time spent searching for it ...
    
    ...but if tomorrow you add another item, ...
    
    ... you get to start searching all over again. )-:
    
    Dan