Query Operations

Query Operation Select operation File scan: It search algorithms that locate and retrieve records that fulfill a selection condition. Algorithm A1 (linear search): This algorithm scan each file block and test all records to see whether they satisfy the selection condition. Index scan - This search algorithms that use an index. The selection condition must be on search-key of index. Algorithm A3 (primary index on candidate key, equality): This algorithm retrieves a single record which satisfies the corresponding equality condition. Algorithm A4 ( primary index on non-key, equality): This algorithm retrieves multiple records. The records will be on consecutive blocks. Algorithm A5 (equality on search-key of secondary index): This algorithm retrieves a single record if the search-key is a candidate key whose cost =( hi + 1) * (tT + tS). It retrieves multiple keys if the search-key is not a candidate key where each of n matching records may be on a different block whose cost =( hi + n) * (tT + tS). This can be expensive. It can implement selections of the form σA≤ V(r) or σA≥ V(r) by using a linear file scan or binary search or by using indices. Algorithm A6 (primary index, comparison): For σA≤ V(r), use index to find first tuple≥ v and scan relation sequentially from there. For σA≥ V(r) just scan the relation sequentially till first tuple >v and do not use index. Algorithm A7 (secondary index, comparison): For σA≤ V(r), use index to find first index entry ≥ v and scan index sequentially from there. For σA≥ V(r) just scan leaf pages of index finding pointers to records, till first entry >v. Sorting may be requested by the query for example order by. It is an important preprocessing steps for other queries such as those that involve a join. If records are completely in main memory then standard sorting algorithms such as quick sort apply. Otherwise some records are still on disk resulting in what is called external sorting. Common external sorting techniques is an external short - merge which cumulatively sorts multiple runs of the data based on amount that fits in memory at one time. There are several different algorithms to implement joins and they are nested-loop join, merge-join and hash-join.

Summary

Query Operation Select operation File scan: It search algorithms that locate and retrieve records that fulfill a selection condition. Algorithm A1 (linear search): This algorithm scan each file block and test all records to see whether they satisfy the selection condition. Index scan - This search algorithms that use an index. The selection condition must be on search-key of index. Algorithm A3 (primary index on candidate key, equality): This algorithm retrieves a single record which satisfies the corresponding equality condition. Algorithm A4 ( primary index on non-key, equality): This algorithm retrieves multiple records. The records will be on consecutive blocks. Algorithm A5 (equality on search-key of secondary index): This algorithm retrieves a single record if the search-key is a candidate key whose cost =( hi + 1) * (tT + tS). It retrieves multiple keys if the search-key is not a candidate key where each of n matching records may be on a different block whose cost =( hi + n) * (tT + tS). This can be expensive. It can implement selections of the form σA≤ V(r) or σA≥ V(r) by using a linear file scan or binary search or by using indices. Algorithm A6 (primary index, comparison): For σA≤ V(r), use index to find first tuple≥ v and scan relation sequentially from there. For σA≥ V(r) just scan the relation sequentially till first tuple >v and do not use index. Algorithm A7 (secondary index, comparison): For σA≤ V(r), use index to find first index entry ≥ v and scan index sequentially from there. For σA≥ V(r) just scan leaf pages of index finding pointers to records, till first entry >v. Sorting may be requested by the query for example order by. It is an important preprocessing steps for other queries such as those that involve a join. If records are completely in main memory then standard sorting algorithms such as quick sort apply. Otherwise some records are still on disk resulting in what is called external sorting. Common external sorting techniques is an external short - merge which cumulatively sorts multiple runs of the data based on amount that fits in memory at one time. There are several different algorithms to implement joins and they are nested-loop join, merge-join and hash-join.

Things to Remember

  • In select operation, file scan search algorithms that locate and retrieve records that fulfill a selection condition.
  • Algorithm A1 (linear search): This algorithm scan each file block and test all records to see whether they satisfy the selection condition.
  • Algorithm A2 (binary search): This algorithm is applicable if selection is an equality comparison on the attribute on which file is ordered.
  • Index scan search algorithms that use an index. The selection condition must be on search-key of index.
  • Algorithm A3 (primary index on candidate key, equality): This algorithm retrieves a single record which satisfies the corresponding equality condition. 
  • Algorithm A4 ( primary index on non-key, equality): This algorithm retrieves multiple records. The records will be on consecutive blocks. 
  • Algorithm A5 (equality on search-key of secondary index): This algorithm retrieves a single record if the search-key is a candidate key whose cost =( hi + 1) * (tT + tS). It retrieves multiple keys if the search-key is not a candidate key where each of n matching records may be on a different block whose cost =( hi + n) * (tT + tS). This can be expensive. It can implement selections of the form σA≤ V(r) or σA≥ V(r) by using a linear file scan or binary search or by using indices.
  • Algorithm A6 (primary index, comparison): For σA≤ V(r), use index to find first tuple≥ v and scan relation sequentially from there. For σA≥ V(r) just scan the relation sequentially till first tuple >v and do not use index.
  • Algorithm A7 (secondary index, comparison): For σA≤ V(r), use index to find first index entry ≥ v and scan index sequentially from there. For σA≥ V(r) just scan leaf pages of index finding pointers to records, till first entry >v. 
  • Sorting may be requested by the query for example order by. It is an important preprocessing steps for other queries such as those that involve a join.
  • If records are completely in main memory then standard sorting algorithms such as quick sort apply. Otherwise some records are still on disk resulting in what is called external sorting.
  • Common external sorting techniques is an external short - merge which cumulatively sorts multiple runs of the data based on amount that fits in memory at one time.
  • There are several different algorithms to implement joins and they are nested-loop join, merge-join and hash-join. 

MCQs

No MCQs found.

Subjective Questions

No subjective questions found.

Videos

No videos found.

Query Operations

Query Operations

Query Operation

Select operation

File scan: It search algorithms that locate and retrieve records that fulfill a selection condition.

Algorithm A1 (linear search): This algorithm scan each file block and test all records to see whether they satisfy the selection condition.

  • Cost estimate = brblock transfers + 1 seek
    br denotes number of blocks containing records from relation r.
  • If the selection is on a key attribute then it can stop on finding record
    cost = (br/ 2) block transfers + 1 seek
  • Linear search can be applied regardless of
    selection condition or
    ordering of records in the file, or
    availability of indices

Algorithm A2 (binary search): This algorithm is applicable if selection is an equality comparison on the attribute on which file is ordered.

  • Assume that the blocks of a relation are stored contiguously.
  • Cost estimate or in other words we can say that the number of disk blocks to be scanned:
    cost of locating the first tuple by a binary search on the blocks
    ⌈log2 (br)⌉ * (tr + ts)
    If there are multiple records satisfying selection then add transfer cost of the number of blocks containing records.

Index scan - This search algorithms that use an index. The selection condition must be on search-key of index.

Algorithm A3 (primary index on candidate key, equality): This algorithm retrieves a single record which satisfies the corresponding equality condition.

  • Cost = ( hi + 1) * (tT + tS)

Algorithm A4 ( primary index on non-key, equality): This algorithm retrieves multiple records. The records will be on consecutive blocks.

  • Let b = number of blocks containing matching records.
  • hi * (tT + tS) +tS +tT* b

Algorithm A5 (equality on search-key of secondary index): This algorithm retrieves a single record if the search-key is a candidate key whose cost =( hi + 1) * (tT + tS). It retrieves multiple keys if the search-key is not a candidate key where each of n matching records may be on a different block whose cost =( hi + n) * (tT + tS). This can be expensive. It can implement selections of the form σA≤ V(r) or σA≥ V(r) by using a linear file scan or binary search or by using indices.

Algorithm A6 (primary index, comparison): For σA≤ V(r), use index to find first tuple≥ v and scan relation sequentially from there. For σA≥ V(r) just scan the relation sequentially till first tuple >v and do not use index.

Algorithm A7 (secondary index, comparison): For σA≤ V(r), use index to find first index entry ≥ v and scan index sequentially from there. For σA≥ V(r) just scan leaf pages of index finding pointers to records, till first entry >v. In either case, retrieve records that are pointed to

  • requires an I/O for each record
  • Linear file scan may be cheaper

Sorting

Sorting may be requested by the query for example order by. It is an important preprocessing steps for other queries such as those that involve a join. If records are completely in main memory then standard sorting algorithms such as quick sort apply. Otherwise some records are still on disk resulting in what is called external sorting. Common external sorting techniques is an external short - merge which cumulatively sorts multiple runs of the data based on amount that fits in memory at one time.

Join Operation

There are several different algorithms to implement joins and they are as follows:

  • Nested-loop join
  • Merge-join
  • Hash-join

Choices to implement joins are taken based on the cost estimations.

Nested-Loop Join

To compute the theta join, r⋈θ S

for each tuple trin r do begin

for each tuple tsin s do begin

test pair (tr,ts) to see if they satisfy the join condition θ

if they do add tr* tsto the result.

end

end

Here, r is called the outer relation and s is called the inner relation of the join. The nested-loop join does not require any indices and can be used with any kind of join condition. It is expensive since it examines every pair of tuples in the two relations. In the worst case, if there is enough memory only to hold one block of each relation, the estimated cost is nr * bs+ br block transfers, plus ns + brseeks. If the smaller relation fits entirely in memory, it is used as the inner relation. Doing this, the cost is reduced to br+ bsblock transfers and 2 seeks. Assuming the availability of worst case memory the cost estimate is

  • with depositor as outer relation
    5000 * 400 + 100 = 2,000,100 block transfers,
    5000 + 100 = 5100 seeks
  • with customer as outer relation
    10000 * 100 + 400 = 1,000,400 block transfers and 10,400 seeks

If smaller relation (depositor) fits entirely in memory, the cost estimate will be 500 block transfers.

Examples of Nested-loop join costs

Compute depositor customer, with depositor as the outer relation. Let a customer have a primary B+-tree index on the join attribute customer-name, which contains 20 entries in each index node. Since the customer has 10,000 tuples, the height of the tree s 4, and one more access is needed to find the actual data. The depositor has 500 tuples.

  • Cost of block nested loops join
    400 * 100 + 100 = 40,100 block transfers + 2 * 100 = 200 seeks. Hence this can be considered as the worst case memory and it can be significantly less with more memory.
  • Cost of indexed nested loops join
    + 5000 * 5 = 25,100 block transfers and seeks. The cost of CPU is likely to be less than that for block nested loop joins.

Merge-Join

This type of join sorts both relations on their join attributes (if not already sorted on the join attributes). Then it merges the sorted relations to join them

  • Join step is similar to the merge stage of the sort- merge algorithm.
  • The main difference is to handle duplicate values in join attribute where every pair with same value on join attribute must be matched.

.

This join can be used only for equi-joins and natural joins. Each block needs to be read only once (assuming all tuples for any given value of the join attribute fits in memory). Thus the cost of merge join is:

br + bsblock transfers + ⌈br / bb⌉+ ⌈bs / bb⌉seeks. + The costs of sorting if relations are unsorted.

If one relation is sorted and the other relation has a secondary B+- tree index on the join attributes then such join is referred to as hybrid merge-join.

  • Merge the sorted relation with the leaf entries of the B+-tree
  • Sort the result on the addresses of the unsorted relation's tuples.
  • Scan the unsorted relation in physical address order and merge with previous result, to replace addresses by the actual tuples. Sequential scan is more efficient than the random lookup

Hash-Join

Hash-Join is applicable for equi-joins and natural joins. A hash function h is used to partition tuples of both relations. h maps JoinAttrs values to {0, 1, ......, n}, where JoinAttrs denotes the common attributes of r and s used in the natural join. r0,r1,....., rndenotes partitions of r tuples. Each tuple tr∈ r is put in partition riwhere i = h (tr[JoinAttrs]).

r0, r1,....., rndenotes partitions of s tuples. Each tuple ts∈ s is put in partition siwhere i = h (ts[JoinAttrs]).

r tuples in rineed only to be compared with s tuples in si and it does not need to be compared with s tuples in any other partition, since an r tuple and an s tuple that satisfy the join condition will have the same value for the join attributes and ifthis value is hashed to some value i then the r tuple has to be in ri and the s tuple in si.

Example of Cost of Hash-Join: customer⋈depositor,

  • Assume that memory size is 20 blocks
  • bdepositor = 100 and bcustomer = 400.
  • Depositor is to be used as built input. Partition it into five partitions, each of size 20 blocks. This partitioning can be done in one pass.
  • Similarly partition the customer into five partitions, each of size 80. This is also done in one past.
  • Therefore if we ignore the cost of writing partially filled blocks then the total cost is:
    3 (100 + 400) = 1500 block transfers + 2 (⌈ 100/3⌉ +⌈ 400/3⌉) = 336 seeks.

References:

  1. H.F.Korth and A. Silberschatz,"Database system concepts",McGraw Hill,2010
  2. A.K.Majumdar and p, Bhatt acharya,"Database Management Systems",Tata McGraw Hill,India,2004
  3. F.Korth, Henry. Database System Concepts. 6th edition.

Lesson

Query Processing and Optimization

Subject

Computer Engineering

Grade

Engineering

Recent Notes

No recent notes.

Related Notes

No related notes.