Hash Function and Hash Tables

Hash function transforms a key into a table index. Some hash functions are division method, multiplication method, mid square method, folding method etc. A Perfect hash function brings no hash clash occurence. A hash table is a data structure used to implement an associative array (a structure that can map keys to values). The hash table becomes slower, as the load factor grows larger.

Summary

Hash function transforms a key into a table index. Some hash functions are division method, multiplication method, mid square method, folding method etc. A Perfect hash function brings no hash clash occurence. A hash table is a data structure used to implement an associative array (a structure that can map keys to values). The hash table becomes slower, as the load factor grows larger.

Things to Remember

  1. Hash function transforms a key into a table index
  2.  Some hash functions are division method, multiplication method, mid square method, folding method etc.
  3.  A Perfect hash function brings no hash clash occurence.
  4. A hash table is a data structure used to implement an associative array (a structure that can map keys to values). 
  5. The hash table becomes slower, as the load factor grows larger

MCQs

No MCQs found.

Subjective Questions

No subjective questions found.

Videos

No videos found.

Hash Function and Hash Tables

Hash Function and Hash Tables

Hash Function

A hash function transforms a key into a table index.It takes keys or other data as input and produces a hash(key) as the output result.A hash function can be used to map data of arbitrary size to data of fixed size. The values returned by a hash function are called hash values, hash codes simply hashes.

There are different hash functions used for hashing. Some of them are described below:

1. Division Method

Algorithm:

* Receive TableSize (Automatically or from User)
* Receive Key
*Return key % table size
-The algorithm can be implemented via resolution techniques such as rehashing
-When TableSize is prime it is better

2. Multiplication Method

Algorithm:

-Receive key
-hash = floor(m*frac(c*key))
frac -> fractional part
c -> real number between 0 and 1
-Return hash

3. Mid Square Method

Algorithm:

-Receive Key
-inter = key * key
-hash = mid(inter,i,j)
-Return hash

  • Number Multiplied by itself and some middle digits are extracted depending upon the size
  • If the square is considered as a decimal number table size must
    be the power of 10 and if binary it should be a power of 2.
  • Problem: Doesn’t yield uniform hash values and not better
    than before mentioned methods.

4. Folding Method

Algorithm:

-Receive Key
-while( a xoredcomplete number is processed)
x[i ]= fixed no. of digits
-hash=XOR or add all x[i]
-return hash

Breaks the key into several parts that are addeXORedtogether to form hash values.
Example:
Let us take : 010111001010110
Let us divide them into:
01011 10010 10110
Let us XOR them
01111 (15)
So we have a small hash

*Specially useful for generating file hashes.

Perfect Hash Functions

Given a set of keys, a perfect hash function is a hash function h such that h(k)!=h(j) for all distinct k and j, i.e no hash clash occur under any condition.
This is an ideal definition and it is very difficult task to derive a hash function fulfilling above requirements for the set of keys.
As it fails to work for expanded sets in general, sofor n positions in a hash table for n keys if we have a perfect hash function it is called Minimal.
There are such algorithms for determining Minimal eg. By Sprungli Algorithms which are complex in nature.
eg. Sprungli Quotient Reduction PHF
Sprungli Remainder Reduction PHF,etc.
Jaeschke Reciprocal Hashing: (c / (d * key + e) ) % tablesize

Hash Table

A hash table is a data structure used to implement an associative array (a structure that can map keys to values). It uses the hash function to compute an index into an array of buckets or slots, from which the desired value can be found.

Ideally, the hash function used in the hash table will assign each key to a unique bucket, but it is possible that two keys will generate an identical hash causing both keys to point to the same bucket. This needs to be taken care of in a hash table.

Fig: A small phone book as a hash table


Fig: A small phone book as a hash table

The time complexity in big O notation in a hash table is given below:

Operation Average Case Worst Case
Space O(n) O(n)
Search O(1) O(n)
Insert O(1) O(n)
Delete O(1) O(n)

In many situations, hash tables turn out to be more efficient than search trees or other table lookup structures. For this reason, they are widely used in many kinds of computer software, mainly for associative arrays, database indexing, caches and sets.

A critical statistic for a hash table is the load factor, that is the number of entries divided by the number of buckets:

Load factor  = \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\frac {n}{k}

where:

n = number of entries k = number of buckets

The hash table becomes slower, as the load factor grows larger, and finally even may fail to work. The expected constant time property of a hash table assumes that the load factor is kept below some bound.

Advantages of Hash Tables

  • The main advantage of hash tables over other table data structures is speed. This advantage is more apparent when the number of entries is large. Hash tables are particularly efficient when the maximum number of entries can be predicted in advance, so that the array can be allocated once with the optimum size and never resized.
  • If the set of key-value pairs is fixed and known ahead of time (so insertions and deletions are not allowed), one may reduce the average lookup cost by a careful choice of the hash function, table size, and internal data structures. In particular, one may be able to devise a hash function that is collision-free, or even perfect. In this case the keys need not be stored in the table.

Drawbacks of Hash Tables

  • Although operations on a hash table take constant time on average, the cost of a good hash function can be significantly higher than the inner loop of the lookup algorithm for a sequential list or search tree. Thus hash tables are not effective when the number of entries is very small. (However, in some cases the high cost of computing the hash function can be mitigated by saving the hash value together with the key.)
  • For certain string processing applications, such as spell-checking, hash tables may be less efficient than trees, finite automata. Also, if each key is represented by a small enough number of bits, then, instead of a hash table, one may use the key directly as the index into an array of values. Note that there are no collisions in this case.
  • If the keys are not stored (because the hash function is collision-free), there may be no easy way to enumerate the keys that are present in the table at any given moment.
  • Although theaveragecost per operation is constant and fairly small, the cost of a single operation may be quite high. In particular, if the hash table usesdynamic resizing, an insertion or deletion operation may occasionally take time proportional to the number of entries. This may be a serious drawback in real-time or interactive applications.
  • Hash tables in general exhibit poorlocality of reference—that is, the data to be accessed is distributed seemingly at random in memory. Because hash tables cause access patterns that jump around, this can triggermicroprocessor cachemisses that cause long delays. Compact data structures such as arrays searched withlinear searchmay be faster, if the table is relatively small and keys are integers or other short strings. According toMoore's Law, cache sizes are growing exponentially and so what is considered "small" may be increasing. The optimal performance point varies from system to system.

  • Hash tables become quite inefficient when there are many collisions. While extremely uneven hash distributions are extremely unlikely to arise by chance, amalicious adversarywith knowledge of the hash function may be able to supply information to a hash that creates worst-case behavior by causing excessive collisions, resulting in very poor performance, e.g. adenial of service attack.In critical applications,universal hashingcan be used; a data structure with better worst-case guarantees may be preferable.

References

1.https://en.wikipedia.org/wiki/Hash_function

2.https://en.wikipedia.org/wiki/Hash_table

3.Karumanchi, N. "Data Structures and Algorithms Made Easy"

Lesson

Searching

Subject

Computer Engineering

Grade

Engineering

Recent Notes

No recent notes.

Related Notes

No related notes.