SUHA (computer science)

In computer science, SUHA (Simple Uniform Hashing Assumption) is a basic assumption that facilitates the mathematical analysis of hash tables. The assumption states that a hypothetical hashing function will evenly distribute items into the slots of a hash table. Moreover, each item to be hashed has an equal probability of being placed into a slot, regardless of the other elements already placed. This assumption generalizes the details of the hash function and allows for certain assumptions about the stochastic system.

Applications

SUHA is most commonly used as a foundation for mathematical proofs describing the properties and behavior of hash tables in theoretical computer science. Minimizing hashing collisions can be achieved with a uniform hashing function. These functions often rely on the specific input data set and can be quite difficult to implement. Assuming uniform hashing allows hash table analysis to be made without exact knowledge of the input or the hash function used.

Mathematical implications

Certain properties of hash tables can be derived once uniform hashing is assumed.

Uniform distribution

Under the assumption of uniform hashing, given a hash function h, and a hash table of size m, the probability that two non-equal elements will hash to the same slot is

P(h(a)=h(b))={\frac {1}{m}}.

Collision chain length

Under the assumption of uniform hashing, the load factor $\alpha$ and the average chain length of a hash table of size m with n elements will be

\alpha ={\tfrac {n}{m}}

Successful lookup

Under the assumption of uniform hashing, the average time (in big-O notation) to successfully find an element in a hash table using chaining is

O(\alpha +1)\,

Unsuccessful lookup

Under the assumption of uniform hashing, the average time (in big-O notation) to unsuccessfully find an element in a hash table using chaining is

O(\alpha +1)\,

Example

A simple example of using SUHA can be seen while observing an arbitrary hash table of size 10 and a data set of 30 unique elements. If chaining is used to deal with collisions, the average chain length of this hash table may be a desirable value. Without any assumptions and with no more additional information about the data or hash function, the chain length cannot be estimated. With SUHA however, we can state that because of an assumed uniform hashing, each element has an equal probability of mapping to a slot. Since no particular slot should be favored over another, the 30 elements should hash into the 10 slots uniformly. This will produce a hash table with, on average, 10 chains each of length 3

\alpha ={\tfrac {n}{m}}

\alpha ={\tfrac {30}{10}}

\alpha =3\,

References

General

Collins, William (2004). "Section 14.3.2: The Uniform Hashing Assumption". Data Structures and the Java Collections Framework. McGraw-Hill. p. 608. ISBN 0-07-282379-8.
Cormen, Thomas H.; Charles E. Leiserson; Ronald L. Rivest; Clifford Stein (2001). "Section 11.2: Hash Tables". Introduction to Algorithms. MIT Press and McGraw-Hill. pp. 226–228. ISBN 0-262-03293-7.