Evaluation of Expressions

The algorithms that are shown so far pertains to individual relational algebra operations. In reality a typical query combines them. For example: select customer_name from account natural inner join customer where balance < 2500 consists of a selection, a natural join and a projection. There are two general approaches for this and they are materialization and pipelining. Materialization executes a single operation at a time which generates a temporary file that will be used as an input to the next operation. This is also considered as a time consuming approach because it generates and sorts many temporary files. The materialized evaluation walks the parse or expression tree of the relational algebra operation and performs the innermost or leaf-level operations first. The intermediate result of each operation is materialized which is an actual but temporary relation. Hence it becomes an input for subsequent operations. The cost of the materialization is the sum of the individual operations plus the cost of writing the intermediate results to disk. With the use of pipelined evaluation, operations form a queue and results are passed from one operation to another as they are calculated. Hence this technique’s name is pipelining. This method avoids write-outs of entire intermediate relations. The general approach of pipelining is to restructure the individual operation algorithms so that they take streams of tuples as both input and output. As the result tuples from one operation are produced, they are provided as input for parent operations. So there is no need to store temporary files to disk which makes pipelining much cheaper than materialization. For pipelining to be effective, we need to use evaluation algorithms that generate output tuples even as tuples are received for inputs to the operation. Pipelines can be executed in two ways: demand driven and producer driven. In demand driven or lazy evaluation, systems repeatedly requests next tuple from top level operation. Each operation requests next tuple from children operations as required in order giving out its next tuple. In between calls, operation has to maintain “state” so it knows what to return next. In produce-driven or eager pipelining, operations produce tuples eagerly and pass them up to their parents. While the buffer is maintained between operators child puts tuples in buffer and parent removes tuples from buffer. If the buffer is full, child waits till there is space in the buffer and then generates more tuples. System schedules operations that have space in output buffer and can process more tuples.

Summary

The algorithms that are shown so far pertains to individual relational algebra operations. In reality a typical query combines them. For example: select customer_name from account natural inner join customer where balance < 2500 consists of a selection, a natural join and a projection. There are two general approaches for this and they are materialization and pipelining. Materialization executes a single operation at a time which generates a temporary file that will be used as an input to the next operation. This is also considered as a time consuming approach because it generates and sorts many temporary files. The materialized evaluation walks the parse or expression tree of the relational algebra operation and performs the innermost or leaf-level operations first. The intermediate result of each operation is materialized which is an actual but temporary relation. Hence it becomes an input for subsequent operations. The cost of the materialization is the sum of the individual operations plus the cost of writing the intermediate results to disk. With the use of pipelined evaluation, operations form a queue and results are passed from one operation to another as they are calculated. Hence this technique’s name is pipelining. This method avoids write-outs of entire intermediate relations. The general approach of pipelining is to restructure the individual operation algorithms so that they take streams of tuples as both input and output. As the result tuples from one operation are produced, they are provided as input for parent operations. So there is no need to store temporary files to disk which makes pipelining much cheaper than materialization. For pipelining to be effective, we need to use evaluation algorithms that generate output tuples even as tuples are received for inputs to the operation. Pipelines can be executed in two ways: demand driven and producer driven. In demand driven or lazy evaluation, systems repeatedly requests next tuple from top level operation. Each operation requests next tuple from children operations as required in order giving out its next tuple. In between calls, operation has to maintain “state” so it knows what to return next. In produce-driven or eager pipelining, operations produce tuples eagerly and pass them up to their parents. While the buffer is maintained between operators child puts tuples in buffer and parent removes tuples from buffer. If the buffer is full, child waits till there is space in the buffer and then generates more tuples. System schedules operations that have space in output buffer and can process more tuples.

Things to Remember

  • The algorithms that are shown so far pertains to individual relational algebra operations. In reality a typical query combines them
  • There are two general approaches for this and they are materialization and pipelining.
  • Materialization executes a single operation at a time which generates a temporary file that will be used as an input to the next operation. This is also considered as a time consuming approach because it generates and sorts many temporary files.
  •  With the use of pipelined evaluation, operations form a queue and results are passed from one operation to another as they are calculated. Hence this technique’s name is pipelining. 
  • For pipelining to be effective, we need to use evaluation algorithms that generate output tuples even as tuples are received for inputs to the operation. 
  • Pipelines can be executed in two ways: demand driven and producer driven.
  • In demand driven or lazy evaluation, systems repeatedly requests next tuple from top level operation. Each operation requests next tuple from children operations as required in order giving out its next tuple. In between calls, operation has to maintain “state” so it knows what to return next.
  • In produce-driven or eager pipelining, operations produce tuples eagerly and pass them up to their parents. While the buffer is maintained between operators child puts tuples in buffer and parent removes tuples from buffer. If the buffer is full, child waits till there is space in the buffer and then generates more tuples. System schedules operations that have space in output buffer and can process more tuples.

MCQs

No MCQs found.

Subjective Questions

No subjective questions found.

Videos

No videos found.

Evaluation of Expressions

Evaluation of Expressions

Evaluation of Expressions

The algorithms that are shown so far pertains to individual relational algebra operations. In reality, a typical query combines them. For example: select customer_name from account natural inner join customer where balance < 2500 consists of a selection, a natural join, and a projection. There are two general approaches for this:

  • Materialization
  • Pipelining

As with the other algorithms, one is general but expensive while the other is more efficient but doesn't apply to all cases.

  • Materialization: This process executes a single operation at a time which generates a temporary file that will be used as an input to the next operation. This is also considered as a time-consuming approach because it generates and sorts many temporary files. The materialized evaluation walks the parse or expression tree of the relational algebra operation and performs the innermost or leaf-level operations first. The intermediate result of each operation is materialized which is an actual but temporary relation. Hence it becomes an input for subsequent operations. The cost of the materialization is the sum of the individual operations plus the cost of writing the intermediate results to disk. Here the disk is a function of the blocking factor or the number of records per block of the temporaries. For example: In the figure below, compute and storeσbalance<2500 (account); then compute and store its join with the customer, and finally compute the projecttion on customer-name.


    .

    Materialized application is always applicable. The cost of writing results to disk and reading them back can be quite high. The cost formulas for operations ignore cost of writing results to disk, so

    Overall cost = Sum of costs of individual operations + cost of writing intermediate results to disk

    Double buffering can be understood as the process of using two buffers for each operation when one is full write it to disk while the other is getting filled. It allows overlap of disk writes with computation and reduces execution time.

  • Pipelining

    The problem with materialization is that there are lots of temporary files and inputs/outputs. With the use of pipelined evaluation, operations form a queue and results are passed from one operation to another as they are calculated. Hence this technique’s name is pipelining. This method avoids write-outs of entire intermediate relations. The general approach of pipelining is to restructure the individual operation algorithms so that they take streams of tuples as both input and output. As the result tuples from one operation are produced, they are provided as input for parent operations. So there is no need to store temporary files to disk which makes pipelining much cheaper than materialization. However pipelining may not always be possible – e.g., sort, hash-join. For pipelining to be effective, we need to use evaluation algorithms that generate output tuples even as tuples are received for inputs to the operation. Pipelines can be executed in two ways: demand driven and producer driven.

    In demand driven or lazy evaluation, systems repeatedly request next tuple from the top level operation. Each operation requests next tuple from children operations as required in order giving out its next tuple. In between calls, operation has to maintain “state” so it knows what to return next.

    In produce-driven or eager pipelining, operations produce tuples eagerly and pass them up to their parents. While the buffer is maintained between operators child puts tuples in buffer and parent removes tuples from the buffer. If the buffer is full, a child waits till there is space in the buffer and then generates more tuples. The system schedules operations that have space in the output buffer and can process more tuples.

References:

  1. H.F.Korth and A. Silberschatz,"Database system concepts",McGraw Hill,2010
  2. A.K.Majumdar and p, Bhatt acharya,"Database Management Systems",Tata McGraw Hill,India,2004
  3. F.Korth, Henry. Database System Concepts. 6th edition.

Lesson

Query Processing and Optimization

Subject

Computer Engineering

Grade

Engineering

Recent Notes

No recent notes.

Related Notes

No related notes.