Hashing Concepts, Static and Dynamic Hashing

A hash function 'h' is a function from the set of all search-key values k to the set of all bucket addresses B. A hash function is used to locate records for access, insertion as well as deletion. The records with different search-key values may be mapped to the same bucket thus entire bucket has to be searched sequentially to locate a record. A bucket is a unit of storage containing one or more records. A bucket can typically be understood as a disk block. In a hash file organization we obtain the bucket of a record directly from the search-key value using a hash function. Bucket overflow can occur because of the reasons such as insufficient buckets and skew. Overflow chaining is referred to as the overflow buckets of a given bucket that are chained together in a linked list called closed hashing. If a record must be insert into a new bucket b where b is already full then the system provides an overflow bucket for b and inserts the record into the overflow bucket. If the overflow is also full then the system provides another overflow bucket and so on. Open addressing are of two types and they are linear probing and quadratic probing. Hashing can be used not only for organizing files but also for creating index-structure. A hash index organizes the search keys with their associated record pointers into a hash file structure. Precisely speaking, hash indices are always secondary indices. If the file itself is organized using hashing then a separate primary hash index on it using the same search-key is unnecessary. However we use the term hash index to refer as both secondary index structures and hash organized files. In static hashing, function h maps search-key which values to a fixed set of B of bucket addresses. Databases happen to grow or shrink with time. In this hashing scheme the set of keys can be varied and the address space is allocated dynamically. Extendable hashing is good for database that grows and shrink in size. It allows the hash function to be modified dynamically. Extendable hashing is one form of dynamic hashing. Extendable hashing accesses data stored in buckets indirectly through an index that is dynamically adjusted to reflect changes in the file. The characteristic feature of extendable hashing is the organization of the index which is an expendable table.

Summary

Things to Remember

A hash function 'h' is a function from the set of all search-key values k to the set of all bucket addresses B. A hash function is used to locate records for access, insertion as well as deletion.
The records with different search-key values may be mapped to the same bucket thus entire bucket has to be searched sequentially to locate a record.
A bucket is a unit of storage containing one or more records. A bucket can typically be understood as a disk block.
In a hash file organization we obtain the bucket of a record directly from the search-key value using a hash function.
Bucket overflow can occur because of the reasons such as insufficient buckets and skew.
Overflow chaining is referred to as the overflow buckets of a given bucket that are chained together in a linked list called closed hashing.
If a record must be insert into a new bucket b where b is already full then the system provides an overflow bucket for b and inserts the record into the overflow bucket.
If the overflow is also full then the system provides another overflow bucket and so on.
Open addressing are of two types and they are linear probing and quadratic probing.
Hashing can be used not only for organizing files but also for creating index-structure. A hash index organizes the search keys with their associated record pointers into a hash file structure. Precisely speaking, hash indices are always secondary indices.
If the file itself is organized using hashing then a separate primary hash index on it using the same search-key is unnecessary. However we use the term hash index to refer as both secondary index structures and hash organized files.
Extendable hashing is good for database that grows and shrink in size. It allows the hash function to be modified dynamically. Extendable hashing is one form of dynamic hashing. Extendable hashing accesses data stored in buckets indirectly through an index that is dynamically adjusted to reflect changes in the file.

MCQs

No MCQs found.

Subjective Questions

No subjective questions found.

Videos

No videos found.

Hashing Concepts, Static and Dynamic Hashing

A bucket is a unit of storage containing one or more records. A bucket can typically be understood as a disk block. In a hash file organization we obtain the bucket of a record directly from the search-key value using a hash function. A hash function 'h' is a function from the set of all search-key values k to the set of all bucket addresses B. A hash function is used to locate records for access, insertion as well as deletion. The records with different search-key values may be mapped to the same bucket thus entire bucket has to be searched sequentially to locate a record.

Given below is a hash file organization of account file, using branch_name as key

There are 10 buckets.
The binary representation of the i^th character is assumed to be the integer i.
The hash function returns the sum of the binary representation of the characters modulo 10. For example: h(Perryridge) = 5 h(Round Hill) = 3 h(Brighton) = 3

The worst hash function maps all search-key values to the same bucket and this makes access-time proportional to the number of search-key values in the file. Ideal hash function is uniform that is each bucket is assigned same number of search-key values from the set of all possible values. Ideal hash function is random so each bucket will have same number of records assigned to it irrespective of the actual distribution of search-key values in the file. A typical hash functions perform computation on the internal binary representation of the search-key. For example, for a string search-key, the binary representations of all the characters in the string could be added and the sum modulo the number of buckets could be returned.

Bucket overflow can occur because of the following reasons:

Insufficient buckets
nB must be chosen such that nB > nr/fr; where nB = number of bucket, n = number of record, fr= number of records that will fit in a bucket.
Skew
A bucket may overflow even when buckets still have space, this situation is called skew. Skew can occur due to two reasons and they are:
Multiple records have same search-key value.
Chosen hash function produces non-uniform distribution of key values.

Although the probability of bucket overflow can be reduced it cannot be eliminated and this is handled by using overflow buckets.

Overflow chainingis referred to as the overflow buckets of a given bucket that are chained together in a linked list called closed hashing.

Closed hashing: If a record must be insert into a new bucket b where b is already full then the system provides an overflow bucket for b and inserts the record into the overflow bucket. If the overflow is also full then the system provides another overflow bucket and so on.

An alternative method called open hashing which does not use overflow buckets is not suitable for database applications.

Open addressing are of two types and they are:

Linear probing
Quadratic probing

Hash Indices

Hashing can be used not only for organizing files but also for creating index-structure. Ahash index organizes the search keys with their associated record pointers into a hash file structure. Precisely speaking, hash indices are always secondary indices. If the file itself is organized using hashing then a separate primary hash index on it using the same search-key is unnecessary. However we use the term hash index to refer as both secondary index structures and hash organized files.

Deficiencies of Static Hashing

In static hashing, function h maps search-key which values to a fixed set of B of bucket addresses. Databases happen to grow or shrink with time.

If initial number of buckets is too small and file grows performance will degrade due to excessive overflow.
If a space is allocated for anticipated growth then a significant amount of space will be wasted initially (and buckets will be under full).
If database shrinks, again the space will be wasted.

There is one solution and that is, periodic re-organization of the file with a new hash function which is expensive and disrupts normal operations. However the better solution will be to allow the number of buckets to be modified dynamically.

Extendable Hashing

In this hashing scheme the set of keys can be varied and the address space is allocated dynamically. Extendable hashing is good for database that grows and shrink in size. It allows the hash function to be modified dynamically. Extendable hashing is one form of dynamic hashing. The keys are stored in buckets. Each bucket can only hold a fixed size of items. Index is an extendable table where h(x) hashes a key value to x yo a bit map and only a portion of a bit map is used to build a directory. Once the bucket is full, split the bucket into two. Two situations will be possible:

Directory remains of the same size adjust pointer to a bucket.
Size grows from 2^k to 2^k+1 i.e. directory size can be 1, 2, 4, 8, 16 etc. Number of buckets will remain the same that is some references will point to the same bucket.

Finally, one can use a bitmap to build the index but store an actual key in the bucket! Assume that a hashing technique is applied to dynamically changing file composed of buckets and each bucket can hold only a fixed number of items. Extendable hashing accesses data stored in buckets indirectly through an index that is dynamically adjusted to reflect changes in the file. The characteristic feature of extendable hashing is the organization of the index which is an expendable table. A hash function applied to a certain key indicates a position in the index and not in the file ( or table or keys). Values returned by such a hash function are called pseudo keys. The files require no re-organization when data are added to or deleted from it since these changes are indicated in the index. Only one hash function h can be used but depending on the size of the index, only the portion of the added h(K) is utilized. A simple way to achieve this effect is by looking at the address into the string of bits from which only the i leftmost bits can be used.

References:

H.F.Korth and A. Silberschatz,"Database system concepts",McGraw Hill,2010
A.K.Majumdar and p, Bhatt acharya,"Database Management Systems",Tata McGraw Hill,India,2004
F.Korth, Henry. Database System Concepts. 6th edition.

Lesson

File Structure and Hashing

Subject

Computer Engineering

Grade

Engineering

Recent Notes

No recent notes.

Related Notes

No related notes.