Query Operations

Query Operation Select operation File scan: It search algorithms that locate and retrieve records that fulfill a selection condition. Algorithm A1 (linear search): This algorithm scan each file block and test all records to see whether they satisfy the selection condition. Index scan - This search algorithms that use an index. The selection condition must be on search-key of index. Algorithm A3 (primary index on candidate key, equality): This algorithm retrieves a single record which satisfies the corresponding equality condition. Algorithm A4 ( primary index on non-key, equality): This algorithm retrieves multiple records. The records will be on consecutive blocks. Algorithm A5 (equality on search-key of secondary index): This algorithm retrieves a single record if the search-key is a candidate key whose cost =( hi + 1) * (tT + tS). It retrieves multiple keys if the search-key is not a candidate key where each of n matching records may be on a different block whose cost =( hi + n) * (tT + tS). This can be expensive. It can implement selections of the form σA≤ V(r) or σA≥ V(r) by using a linear file scan or binary search or by using indices. Algorithm A6 (primary index, comparison): For σA≤ V(r), use index to find first tuple≥ v and scan relation sequentially from there. For σA≥ V(r) just scan the relation sequentially till first tuple >v and do not use index. Algorithm A7 (secondary index, comparison): For σA≤ V(r), use index to find first index entry ≥ v and scan index sequentially from there. For σA≥ V(r) just scan leaf pages of index finding pointers to records, till first entry >v. Sorting may be requested by the query for example order by. It is an important preprocessing steps for other queries such as those that involve a join. If records are completely in main memory then standard sorting algorithms such as quick sort apply. Otherwise some records are still on disk resulting in what is called external sorting. Common external sorting techniques is an external short - merge which cumulatively sorts multiple runs of the data based on amount that fits in memory at one time. There are several different algorithms to implement joins and they are nested-loop join, merge-join and hash-join.

Summary

Things to Remember

In select operation, file scan search algorithms that locate and retrieve records that fulfill a selection condition.
Algorithm A1 (linear search): This algorithm scan each file block and test all records to see whether they satisfy the selection condition.
Algorithm A2 (binary search): This algorithm is applicable if selection is an equality comparison on the attribute on which file is ordered.
Index scan search algorithms that use an index. The selection condition must be on search-key of index.
Algorithm A3 (primary index on candidate key, equality): This algorithm retrieves a single record which satisfies the corresponding equality condition.
Algorithm A4 ( primary index on non-key, equality): This algorithm retrieves multiple records. The records will be on consecutive blocks.
Algorithm A5 (equality on search-key of secondary index): This algorithm retrieves a single record if the search-key is a candidate key whose cost =( hi + 1) * (tT + tS). It retrieves multiple keys if the search-key is not a candidate key where each of n matching records may be on a different block whose cost =( hi + n) * (tT + tS). This can be expensive. It can implement selections of the form σA≤ V(r) or σA≥ V(r) by using a linear file scan or binary search or by using indices.
Algorithm A6 (primary index, comparison): For σA≤ V(r), use index to find first tuple≥ v and scan relation sequentially from there. For σA≥ V(r) just scan the relation sequentially till first tuple >v and do not use index.
Algorithm A7 (secondary index, comparison): For σA≤ V(r), use index to find first index entry ≥ v and scan index sequentially from there. For σA≥ V(r) just scan leaf pages of index finding pointers to records, till first entry >v.
Sorting may be requested by the query for example order by. It is an important preprocessing steps for other queries such as those that involve a join.
If records are completely in main memory then standard sorting algorithms such as quick sort apply. Otherwise some records are still on disk resulting in what is called external sorting.
Common external sorting techniques is an external short - merge which cumulatively sorts multiple runs of the data based on amount that fits in memory at one time.
There are several different algorithms to implement joins and they are nested-loop join, merge-join and hash-join.

MCQs

No MCQs found.

Subjective Questions

No subjective questions found.

Videos

No videos found.

Query Operations

Query Operation

Select operation

File scan: It search algorithms that locate and retrieve records that fulfill a selection condition.

Algorithm A1 (linear search): This algorithm scan each file block and test all records to see whether they satisfy the selection condition.

Cost estimate = b_rblock transfers + 1 seek
b_r denotes number of blocks containing records from relation r.
If the selection is on a key attribute then it can stop on finding record
cost = (b_r/ 2) block transfers + 1 seek
Linear search can be applied regardless of
selection condition or
ordering of records in the file, or
availability of indices

Algorithm A2 (binary search): This algorithm is applicable if selection is an equality comparison on the attribute on which file is ordered.

Assume that the blocks of a relation are stored contiguously.
Cost estimate or in other words we can say that the number of disk blocks to be scanned:
cost of locating the first tuple by a binary search on the blocks
⌈log₂ (b_r)⌉ * (t_r + t_s)
If there are multiple records satisfying selection then add transfer cost of the number of blocks containing records.

Index scan - This search algorithms that use an index. The selection condition must be on search-key of index.

Algorithm A3 (primary index on candidate key, equality): This algorithm retrieves a single record which satisfies the corresponding equality condition.

Cost = ( h_i + 1) * (t_T + t_S)

Algorithm A4 ( primary index on non-key, equality): This algorithm retrieves multiple records. The records will be on consecutive blocks.

Let b = number of blocks containing matching records.
h_i * (t_T + t_S) +t_S +t_T* b

Algorithm A5 (equality on search-key of secondary index): This algorithm retrieves a single record if the search-key is a candidate key whose cost =( h_i + 1) * (t_T + t_S). It retrieves multiple keys if the search-key is not a candidate key where each of n matching records may be on a different block whose cost =( h_i + n) * (t_T + t_S). This can be expensive. It can implement selections of the form σ_{A≤ V}(r) or σ_A_{≥ V}(r) by using a linear file scan or binary search or by using indices.

Algorithm A6 (primary index, comparison): For σ_{A≤ V}(r), use index to find first tuple≥ v and scan relation sequentially from there. For σ_A_{≥ V}(r) just scan the relation sequentially till first tuple >v and do not use index.

Algorithm A7 (secondary index, comparison): For σ_{A≤ V}(r), use index to find first index entry ≥ v and scan index sequentially from there. For σ_A_{≥ V}(r) just scan leaf pages of index finding pointers to records, till first entry >v. In either case, retrieve records that are pointed to

requires an I/O for each record
Linear file scan may be cheaper

Sorting

Sorting may be requested by the query for example order by. It is an important preprocessing steps for other queries such as those that involve a join. If records are completely in main memory then standard sorting algorithms such as quick sort apply. Otherwise some records are still on disk resulting in what is called external sorting. Common external sorting techniques is an external short - merge which cumulatively sorts multiple runs of the data based on amount that fits in memory at one time.

Join Operation

There are several different algorithms to implement joins and they are as follows:

Nested-loop join
Merge-join
Hash-join

Choices to implement joins are taken based on the cost estimations.

Nested-Loop Join

To compute the theta join, râ‹ˆθ S

for each tuple t_rin r do begin

for each tuple t_sin s do begin

test pair (t_r,t_s) to see if they satisfy the join condition θ

if they do add t_r* t_sto the result.

end

Here, r is called the outer relation and s is called the inner relation of the join. The nested-loop join does not require any indices and can be used with any kind of join condition. It is expensive since it examines every pair of tuples in the two relations. In the worst case, if there is enough memory only to hold one block of each relation, the estimated cost is n_r * b_s+ b_r block transfers, plus n_s + b_rseeks. If the smaller relation fits entirely in memory, it is used as the inner relation. Doing this, the cost is reduced to b_r+ b_sblock transfers and 2 seeks. Assuming the availability of worst case memory the cost estimate is

with depositor as outer relation
5000 * 400 + 100 = 2,000,100 block transfers,
5000 + 100 = 5100 seeks
with customer as outer relation
10000 * 100 + 400 = 1,000,400 block transfers and 10,400 seeks

If smaller relation (depositor) fits entirely in memory, the cost estimate will be 500 block transfers.

Examples of Nested-loop join costs

Compute depositor customer, with depositor as the outer relation. Let a customer have a primary B⁺-tree index on the join attribute customer-name, which contains 20 entries in each index node. Since the customer has 10,000 tuples, the height of the tree s 4, and one more access is needed to find the actual data. The depositor has 500 tuples.

Cost of block nested loops join
400 * 100 + 100 = 40,100 block transfers + 2 * 100 = 200 seeks. Hence this can be considered as the worst case memory and it can be significantly less with more memory.
Cost of indexed nested loops join
+ 5000 * 5 = 25,100 block transfers and seeks. The cost of CPU is likely to be less than that for block nested loop joins.

Merge-Join

This type of join sorts both relations on their join attributes (if not already sorted on the join attributes). Then it merges the sorted relations to join them

Join step is similar to the merge stage of the sort- merge algorithm.
The main difference is to handle duplicate values in join attribute where every pair with same value on join attribute must be matched.

This join can be used only for equi-joins and natural joins. Each block needs to be read only once (assuming all tuples for any given value of the join attribute fits in memory). Thus the cost of merge join is:

b_r + b_sblock transfers + ⌈b_r / b_b⌉+ ⌈b_s / b_b⌉seeks. + The costs of sorting if relations are unsorted.

If one relation is sorted and the other relation has a secondary B⁺- tree index on the join attributes then such join is referred to as hybrid merge-join.

Merge the sorted relation with the leaf entries of the B⁺-tree
Sort the result on the addresses of the unsorted relation's tuples.
Scan the unsorted relation in physical address order and merge with previous result, to replace addresses by the actual tuples. Sequential scan is more efficient than the random lookup

Hash-Join

Hash-Join is applicable for equi-joins and natural joins. A hash function h is used to partition tuples of both relations. h maps JoinAttrs values to {0, 1, ......, n}, where JoinAttrs denotes the common attributes of r and s used in the natural join. r₀,r₁,....., r_ndenotes partitions of r tuples. Each tuple t_r∈ r is put in partition r_iwhere i = h (t_r[JoinAttrs]).

r₀, r₁,....., r_ndenotes partitions of s tuples. Each tuple t_s∈ s is put in partition s_iwhere i = h (t_s[JoinAttrs]).

r tuples in r_ineed only to be compared with s tuples in s_i and it does not need to be compared with s tuples in any other partition, since an r tuple and an s tuple that satisfy the join condition will have the same value for the join attributes and ifthis value is hashed to some value i then the r tuple has to be in r_i and the s tuple in s_i.

Example of Cost of Hash-Join: customerâ‹ˆdepositor,

Assume that memory size is 20 blocks
b_depositor = 100 and b_customer = 400.
Depositor is to be used as built input. Partition it into five partitions, each of size 20 blocks. This partitioning can be done in one pass.
Similarly partition the customer into five partitions, each of size 80. This is also done in one past.
Therefore if we ignore the cost of writing partially filled blocks then the total cost is:
3 (100 + 400) = 1500 block transfers + 2 (⌈ 100/3⌉ +⌈ 400/3⌉) = 336 seeks.

References:

H.F.Korth and A. Silberschatz,"Database system concepts",McGraw Hill,2010
A.K.Majumdar and p, Bhatt acharya,"Database Management Systems",Tata McGraw Hill,India,2004
F.Korth, Henry. Database System Concepts. 6th edition.

Lesson

Query Processing and Optimization

Subject

Computer Engineering

Grade

Engineering

Recent Notes

No recent notes.

Related Notes

No related notes.