| Re .0
Let me see if I understand what you are asking.
You want to sort some data. It has multiple keys, some numeric
(floating point, i.e., real numbers) and some character. So the
comparison is something like record a < record b means:
(numeric) field 1 of record a < field 1 of record b
if equal, then break ties by
(character) field 2 of record a < field 2 of record b
The (quicksort?) sort that you are using has some problems due to
the initial ordering of your records.
So you want to:
hash all of the fields of each record
sort based on the value of the hash function (an integer)
use a very fast sort for integers to do this
and
have this sort result in the same order as doing it
the hard way does.
I suppose that you can do the first three fairly easily, but I
doubt very much that you will be able to come up with a hash function
to "small" integers [say, four bytes] such that
record a < record b if and only if hash(a) < hash(b)
The only hash function that I could think of that would work is:
Take the first field, we I have assumed is numeric.
Can you compare floating point numbers on a VAX as though
they were integers? If so, the "sub hash" of this field
is the number itself. If not, the "sub hash" is, say,
the sign of the number (one bit) followed by the exponent
followed by the "mantissa". Either way, this sub hash is
at least as many bits as the field itself.
Take the next field, a character field. It's sub hash is
as long as the longest possible string. Each byte of the
sub hash is the "collating index" of that byte (character)
of the field.
Do this for each field.
Lay the bytes all together, like a thousand bit integer :-(
Sort these using your integer sort algorithm, which must
be rewritten to use such large "integer"'s.
There are lots of hash functions that will scrunch many long fields
into a small integer, but those will not be "order preserving".
To do that practically forces the length of the hash to be roughly
the length of all of the fields.
It would be wonderful if it worked, but I can only envision it
working if you have a FIXED set of n items KNOWN IN ADVANCE, and
you write a program to search for a hash function that takes the
items and yields a "hash" of 1 for the first item, 2 for the second,
etc. It may run for a long time, but if it works you have a valuable
hash function. If you hash n items into the numbers 1 to 100n in an
order preserving way then you still have a valuable hash function.
It would probably make up for the time spent searching for it ...
...but if tomorrow you add another item, ...
... you get to start searching all over again. )-:
Dan
|