<<< Note 1120.0 by TRACE::GILBERT "Ownership Obligates" >>>
-< Rating a Hashing function >-
>> Is there a formula for rating a hashing function? Hashing a set of values
>> should give a set of numbers uniformly distributed in the range 0..N-1.
>> Given the distribution, the number of numbers that fall into each of these
>> N buckets, how may the uniformity of the distribution be rated?
The performance of any hashing function is crucially dependent on the
distribution of the input numbers; if the distribution is such that you
get plenty of collisions with your given hashing algorithm, you will
get very bad performance (long chains or probe sequences) even though
the table may be nowhere near full.
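A minimal sketch of that caveat, assuming a hypothetical division-method hash `mod_hash(key) = key mod N` with N = 101 (neither is from the note). Twenty keys in a 101-slot table leave it far from full, yet keys that all happen to be multiples of N pile into a single bucket:

```python
N = 101  # hypothetical table size

def mod_hash(key, n=N):
    """Hypothetical hash function: division method, key mod n."""
    return key % n

def occupied_buckets(keys, n=N):
    """Count the distinct buckets that the keys land in."""
    return len({mod_hash(k, n) for k in keys})

spread_keys = list(range(20))              # consecutive keys: 0..19
stride_keys = [k * N for k in range(20)]   # every key is a multiple of N

print(occupied_buckets(spread_keys))  # 20: no collisions at all
print(occupied_buckets(stride_keys))  # 1: every key lands in bucket 0
```

Same function, same table size, same number of keys; only the input distribution differs.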
Having stated the standard caveat, I would suggest the following:
1. Get a representative sample of the input distribution.
2. Feed the representative sample through the hash function and note
the number of elements that fall in each bucket.
3. See if the number of collisions is excessive (it's up to you to
decide what "excessive" means). If it is, try a different hashing
algorithm.
4. Run a chi-square goodness-of-fit test of the bucket counts against
a uniform distribution. (There are some tests that don't depend on an
underlying distribution, but (a) these tests tend to be less powerful
than the "standard" ones (because they make fewer assumptions) and
(b) the standard tests tend to give fairly good results even when some
of their underlying assumptions are violated.)
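Steps 2 through 4 can be sketched roughly as follows. This is only an illustration under a few assumptions not in the note: "collisions" is counted as every key that lands in an already-occupied bucket, the statistic is Pearson's chi-square, sum over buckets of (O_i - E_i)^2 / E_i with E_i = n/N and N - 1 degrees of freedom, and the critical value comes from the Wilson-Hilferty approximation rather than an exact table. The function names are hypothetical.

```python
import math

def bucket_counts(keys, hash_fn, n_buckets):
    """Step 2: count how many sample keys fall into each bucket 0..n_buckets-1."""
    counts = [0] * n_buckets
    for key in keys:
        counts[hash_fn(key) % n_buckets] += 1
    return counts

def collisions(counts):
    """Step 3: keys that landed in a bucket already holding another key."""
    return sum(c - 1 for c in counts if c > 1)

def chi_square(counts):
    """Step 4: Pearson chi-square statistic against a uniform distribution."""
    n = sum(counts)
    expected = n / len(counts)  # E_i = n / N for every bucket
    return sum((c - expected) ** 2 / expected for c in counts)

def chi_square_critical(df, z=1.6449):
    """Approximate upper critical value via Wilson-Hilferty.

    z = 1.6449 is the standard normal 95th percentile, i.e. a ~5% test level.
    """
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * math.sqrt(a)) ** 3

# Usage with a stand-in sample and the identity hash (both hypothetical).
sample = [k * 10 for k in range(1000)]  # step 1: every key is a multiple of 10
counts = bucket_counts(sample, lambda k: k, 10)
stat = chi_square(counts)
limit = chi_square_critical(len(counts) - 1)
print(collisions(counts), stat, stat > limit)  # 999 9000.0 True: not uniform
```

A statistic well above the critical value means the bucket counts are unlikely to have come from a uniform distribution; for a real evaluation you would feed in your representative sample and your actual hash function, and an exact chi-square table (or a statistics library) gives a sharper threshold than the approximation used here.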