Collision Resolution Technique

Collision is the state in which two different keys produce the same address. Collisions need to be resolved; chaining and open addressing are the basic methods of resolving them. In chaining, a linked list is maintained at every hash index for collided elements. In open addressing, the new address after a collision is found by looking at other empty slots of the table.

Things to Remember

  1. Collision is the state in which two different keys produce the same address.
  2. Collisions need to be resolved. Chaining and open addressing are the basic methods of resolving them.
  3. In chaining, linked lists are maintained at every hash index for collided elements.
  4. In open addressing, the new address after collision is found by looking at other empty slots of the table.


Collision Resolution Technique

When two different keys produce the same address, there is a collision. The keys involved are called synonyms. Coming up with a hash function that avoids collisions entirely is extremely difficult, so it is best to find ways to deal with them when they occur. The possible solutions are:

  • Spread out the records
  • Use extra memory
  • Put more than one record at a single address.

An example of Collision

Hash table size: 11
Hash function: h(k) = key mod 11

Fig: Positions of the keys in the hash table (key mod 11)

Some collisions occur with this hash function, as shown in the above figure.

Another example (in a phonebook record):

Fig: Hashing of phonebook keys into buckets

Here, the buckets for the keys 'John Smith' and 'Sandra Dee' are the same, so this is a collision.

Collision Resolution

Collision occurs when h(k1) = h(k2), i.e. the hash function gives the same result for more than one key. The strategies used for collision resolution are:

  • Chaining
    • Store colliding keys in a linked list at the same hash table index
  • Open Addressing
    • Store colliding keys elsewhere in the table

Chaining

Fig: Separate Chaining



Strategy:
Maintain a linked list at every hash index for collided elements.

Let's take the example of an insertion sequence: {0, 1, 4, 9, 16, 25, 36, 49, 64, 81}

Here, h(k) = k mod tablesize = k mod 10 (tablesize = 10)

  • Hash table T is a vector of linked lists
    Insert element at the head (as shown here) or at the tail
  • Key k is stored in list at T[h(k)]

So, the problem is: "Insert the first 10 perfect squares in a hash table of size 10"

The hash table looks like:

Fig: A hash table
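The chaining strategy above can be sketched in a few lines (a minimal Python illustration of our own, using a plain list per slot as the chain and inserting at the head; the `insert`/`find` names are illustrative):

```python
# Separate chaining sketch: table of size 10, h(k) = k mod 10.
# Each slot holds a list that acts as the chain of collided keys.
TABLE_SIZE = 10
table = [[] for _ in range(TABLE_SIZE)]

def insert(key):
    index = key % TABLE_SIZE      # h(k) = k mod tablesize
    table[index].insert(0, key)   # insert at the head of the chain

def find(key):
    return key in table[key % TABLE_SIZE]

for k in [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]:
    insert(k)

print(table[6])   # 16 and 36 both hash to 6 -> [36, 16]
```

Pairs such as 16 and 36, 4 and 64, 9 and 49, and 1 and 81 collide, so their slots end up holding two-element chains.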



Collision Resolution by Chaining: Analysis

  • Load factor λ of a hash table T is defined as follows:
    N = number of elements in T (“current size”)
    M = size of T (“table size”)
    λ = N/M (“load factor”)
    i.e., λ is the average length of a chain
  • Unsuccessful search time: O(λ)
    Same for insert time
  • Successful search time: O(λ/2)
  • Ideally, want λ ≤ 1 (not a function of N)
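Plugging the perfect-squares example above into these definitions gives a quick worked check:

```python
# Load factor for the example: N = 10 keys stored in a table of size M = 10.
N = 10
M = 10
load_factor = N / M
print(load_factor)   # 1.0 -> the chains have average length 1
```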

Potential Disadvantages of Chaining

  • Linked lists could get long
    Especially when N approaches M
    Longer linked lists could negatively impact performance
  • More memory because of pointers
  • Absolute worst-case (even if N << M):
    All N elements in one linked list!
    Typically the result of a bad hash function

Open Addressing

Fig: Open Addressing



As shown in the above figure, in open addressing, when a collision is encountered, the key is inserted into another empty slot of the table. So, it is an 'in-place' approach.

Advantages over chaining

  • No need for list structures
  • No need to allocate/deallocate memory during insertion/deletion (slow)

Disadvantages

  • Slower insertion – May need several attempts to find an empty slot
  • Table needs to be bigger (than chaining-based table) to achieve average-case constant-time performance
    Load factor λ ≈ 0.5

Probing

In this method, the next slot for the collided key is found using a technique called "probing". Probing generates a sequence of slots in the hash table, from which we choose the proper slot for the key x.

  • h0(x), h1(x), h2(x), …
  • Needs to visit each slot exactly once
  • Needs to be repeatable (so we can find/delete what we’ve inserted)
  • Hash function
    • hi(x) = (h(x) + f(i)) mod TableSize
    • f(0) = 0 ==> position for the 0th probe
    • f(i) is “the distance to be traveled relative to the 0th probe position, during the ith probe”.

Some of the common methods of probing are:

1. Linear Probing:

Suppose that a key hashes to a position that is already occupied. The simplest strategy is to look for the next available position to place the item. Suppose we have a set of hash codes consisting of {89, 18, 49, 58, 9} and we need to place them into a table of size 10. The following table demonstrates this process.

Fig: Linear probing of {89, 18, 49, 58, 9} in a table of size 10

The first collision occurs when 49 hashes to the same location, index 9. Since 89 already occupies A[9], we need to place 49 in the next available position. Considering the array as circular, the next available position is (9+1) mod 10 = 0, so we place 49 in A[0].

Several more collisions occur in this simple example and in each case we keep looking to find the next available location in the array to place the element. Now if we need to find the element, say for example, 49, we first compute the hash code (9), and look in A[9]. Since we do not find it there, we look in A[(9+1) % 10] = A[0], we find it there and we are done.

So what if we are looking for 79? First we compute the hash code of 79, which is 9. We probe A[9], A[(9+1) % 10] = A[0], A[(9+2) % 10] = A[1], A[(9+3) % 10] = A[2], A[(9+4) % 10] = A[3], and so on. Since A[3] is null, we know that 79 cannot exist in the set.
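The walkthrough above can be written as a short sketch (a minimal Python version of our own; `insert` and `find` are illustrative names):

```python
# Linear probing sketch: h(k) = k mod 10; on collision, try the next
# slot, treating the array as circular.
TABLE_SIZE = 10
table = [None] * TABLE_SIZE

def insert(key):
    index = key % TABLE_SIZE
    while table[index] is not None:       # keep probing until a free slot
        index = (index + 1) % TABLE_SIZE
    table[index] = key

def find(key):
    index = key % TABLE_SIZE
    while table[index] is not None:
        if table[index] == key:
            return index
        index = (index + 1) % TABLE_SIZE
    return -1                             # reached an empty slot: absent

for k in [89, 18, 49, 58, 9]:
    insert(k)

print(find(49))   # 0: 49 collided with 89 at index 9 and wrapped to slot 0
print(find(79))   # -1: probing stops at the empty slot A[3]
```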

Issues with Linear Probing:

  • Probe sequences can get longer with time
  • Primary clustering
    • Keys tend to cluster in one part of table
    • Keys that hash into cluster will be added to the end of the cluster (making it even bigger)
    • Side effect: Other keys could also get affected if mapping to a crowded neighborhood





2. Quadratic Probing:

Although linear probing is a simple process where it is easy to compute the next available location, it leads to clustering when keys hash to nearby values. Quadratic probing provides a better distribution of keys when collisions occur. In quadratic probing, if the hash value is K, then the next locations probed are K + 1, K + 4, K + 9, and so on.

The following table shows the collision resolution using quadratic probing.

Fig: Collision resolution using quadratic probing

  • Avoids primary clustering
  • f(i) is quadratic in i, e.g. f(i) = i²
  • hi(x) = (h(x) + i²) mod TableSize
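Applied to the same example set, the quadratic rule can be sketched as follows (a minimal Python sketch; note that the table size 10 is kept only to match the running example and is not prime, so a free slot is not guaranteed once the table fills up):

```python
# Quadratic probing sketch: the i-th probe inspects (h(k) + i*i) mod 10.
TABLE_SIZE = 10
table = [None] * TABLE_SIZE

def insert(key):
    home = key % TABLE_SIZE
    for i in range(TABLE_SIZE):
        index = (home + i * i) % TABLE_SIZE
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("no free slot found; table too full")

for k in [89, 18, 49, 58, 9]:
    insert(k)

print(table)   # [49, None, 58, 9, None, None, None, None, 18, 89]
```

Here 49 lands in slot 0 after one probe (9 + 1), 58 in slot 2 after two probes (8 + 4), and 9 in slot 3 after two probes (9 + 4), all mod 10.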

Quadratic Probing: Analysis

  • Difficult to analyze
  • Theorem
    New element can always be inserted into a table that is at least half empty and TableSize is prime
  • Otherwise, it may never find an empty slot, even if one exists
  • Ensure table never gets half full
    If close, then expand it
  • May cause “secondary clustering”
  • Deletion
    Emptying slots can break the probe sequence and could cause find to stop prematurely
  • Lazy deletion: Differentiate between empty and deleted slots
    When finding, skip and continue beyond deleted slots
    If you hit a non-deleted empty slot, stop the find procedure and return “not found”
  • May need compaction at some point
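The lazy-deletion idea can be sketched with a tombstone marker on top of linear probing (a minimal Python sketch of our own; `EMPTY` and `DELETED` are illustrative sentinel objects):

```python
# Lazy deletion sketch: deleted slots get a DELETED tombstone instead of
# becoming EMPTY, so find() can probe past them without stopping early.
TABLE_SIZE = 10
EMPTY, DELETED = object(), object()
table = [EMPTY] * TABLE_SIZE

def insert(key):
    index = key % TABLE_SIZE
    while table[index] is not EMPTY and table[index] is not DELETED:
        index = (index + 1) % TABLE_SIZE
    table[index] = key

def find(key):
    index = key % TABLE_SIZE
    while table[index] is not EMPTY:      # tombstones do NOT stop the scan
        if table[index] == key:
            return index
        index = (index + 1) % TABLE_SIZE
    return -1

def delete(key):
    index = find(key)
    if index != -1:
        table[index] = DELETED            # mark the slot, don't empty it

insert(89)
insert(9)        # collides with 89 at index 9, lands in slot 0
delete(89)
print(find(9))   # 0: the probe skips the tombstone left at index 9
```

If slot 9 were simply emptied instead, find(9) would stop there and wrongly report "not found".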

3. Double Hashing

Double hashing uses the idea of applying a second hash function to the key when a collision occurs. The result of the second hash function gives the number of positions from the point of collision at which to insert.

There are a couple of requirements for the second function:

  • it must never evaluate to 0
  • it must ensure that all cells can be probed

A popular second hash function is: Hash2(key) = R - ( key % R ) where R is a prime number that is smaller than the size of the table.

Fig: Collision resolution using double hashing
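With the suggested Hash2, the scheme can be sketched as follows (a minimal Python sketch, assuming R = 7, a prime smaller than the table size 10; in practice a prime table size would also be preferred so every stride visits all slots):

```python
# Double hashing sketch: the probe stride is Hash2(key) = R - (key % R),
# which is always between 1 and R, so it never evaluates to 0.
TABLE_SIZE = 10
R = 7
table = [None] * TABLE_SIZE

def insert(key):
    step = R - (key % R)          # second hash decides the stride
    index = key % TABLE_SIZE      # first hash decides the start
    while table[index] is not None:
        index = (index + step) % TABLE_SIZE
    table[index] = key

for k in [89, 18, 49, 58, 9]:
    insert(k)

print(table)   # [None, None, None, 58, 9, None, 49, None, 18, 89]
```

For example, 49 collides at index 9; its stride is 7 - (49 % 7) = 7, so it lands at (9 + 7) mod 10 = 6.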

4. Hashing with Rehashing

Once the hash table gets too full, the running time of the operations will start to take too long, and insertions may fail. To solve this problem, a table at least twice the size of the original is built and the elements are transferred to the new table.

The new size of the hash table:

  • should also be prime
  • will be used to calculate the new insertion spot (hence the name rehashing)

This is a very expensive operation: O(N), since there are N elements to rehash and the new table size is roughly 2N. This is acceptable, though, since it doesn't happen often.

The question becomes when should the rehashing be applied?

Some possible answers:

  • once the table becomes half full
  • once an insertion fails
  • once a specific load factor has been reached, where load factor is the ratio of the number of elements in the hash table to the table size
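Putting the pieces together, rehashing can be sketched like this (a minimal Python sketch of our own, using linear probing, a doubled prime table size, and a load factor of 0.5 as the trigger; the class and helper names are illustrative):

```python
# Rehashing sketch: when an insert would push the load factor past 0.5,
# grow the table to the next prime above twice its size and re-insert
# every element under the new table size.
def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def next_prime(n):
    while not is_prime(n):
        n += 1
    return n

class LinearProbeTable:
    def __init__(self, size=11):
        self.size = size
        self.count = 0
        self.slots = [None] * size

    def insert(self, key):
        if (self.count + 1) / self.size > 0.5:    # rehash before half full
            self._rehash()
        index = key % self.size
        while self.slots[index] is not None:      # linear probing
            index = (index + 1) % self.size
        self.slots[index] = key
        self.count += 1

    def _rehash(self):
        old = [k for k in self.slots if k is not None]
        self.size = next_prime(2 * self.size)     # new prime size
        self.slots = [None] * self.size
        self.count = 0
        for k in old:                             # re-insert everything
            self.insert(k)

t = LinearProbeTable()
for k in range(6):
    t.insert(k)
print(t.size)   # 23: the table grew from 11 when λ would exceed 0.5
```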



Lesson

Searching

Subject

Computer Engineering

Grade

Engineering
