The relational engine is also called the query processor. It includes the components of SQL
Server that determine exactly what your query needs to do and the best way to
do it. The relational engine consists of two primary components: Query Optimization and Query
Execution.
By far the most complex component of the query processor, and maybe
even of the entire SQL Server product, is the Query Optimizer, which determines
the best execution plan for the queries in the batch. The
relational engine also manages the execution of queries as it requests data
from the storage engine and processes the results returned. Communication
between the relational engine and the storage engine is generally in terms of
OLE DB row sets. (Row set is the OLE DB term for a result
set.) The storage engine comprises the
components needed to actually access and modify data on disk.
The Query Optimizer
The
Query Optimizer takes the query tree from the command parser and prepares it
for execution. Statements that can’t be optimized, such as flow-of-control and
Data Definition Language (DDL) commands, are compiled into an internal form.
The statements that are optimizable are marked as such and then passed to the
Query Optimizer. The Query Optimizer is mainly concerned with the Data
Manipulation Language (DML) statements SELECT,
INSERT, UPDATE, and
DELETE, which can be processed in more than one
way; it is the Query Optimizer’s job to determine which of the many
possible ways is the best. It compiles an entire command batch, optimizes
queries that are optimizable, and checks security. The query optimization and
compilation result in an execution plan. The first step in producing such a plan
is to normalize each query, which potentially breaks down a single query
into multiple, fine-grained queries. After the Query Optimizer normalizes a
query, it optimizes it, which means that it determines a plan for executing that
query. Query optimization is cost-based; the Query Optimizer chooses the plan
that it determines would cost the least based on internal metrics that include
estimated memory requirements, CPU utilization, and number of required I/Os.
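The cost-based selection described above can be sketched in a few lines. This is a toy model only: the weights, the cost formula, and the candidate plans are illustrative assumptions, not SQL Server's actual costing.

```python
# Toy sketch of cost-based plan selection. The weights and the cost
# formula are illustrative assumptions, not SQL Server's real cost model.
from dataclasses import dataclass

@dataclass
class CandidatePlan:
    name: str
    estimated_ios: int       # page reads the plan is expected to need
    estimated_cpu: float     # relative CPU work (e.g., rows touched)
    estimated_memory: float  # memory required, in pages

def plan_cost(plan: CandidatePlan) -> float:
    # Hypothetical weighting: I/O dominates, CPU and memory contribute less.
    return plan.estimated_ios * 1.0 + plan.estimated_cpu * 0.1 + plan.estimated_memory * 0.05

def choose_plan(candidates: list) -> CandidatePlan:
    # The optimizer keeps the plan with the lowest estimated cost.
    return min(candidates, key=plan_cost)

candidates = [
    CandidatePlan("table scan", estimated_ios=1000, estimated_cpu=5000.0, estimated_memory=8.0),
    CandidatePlan("index seek", estimated_ios=12, estimated_cpu=40.0, estimated_memory=2.0),
]
print(choose_plan(candidates).name)  # index seek
```

In this model the index seek wins because its estimated I/O count is orders of magnitude lower; the real optimizer's estimates come from the distribution statistics discussed next.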
The
Query Optimizer considers the type of
statement requested, checks the amount of data in the various tables
affected, looks at the indexes available for each table, and then looks at a
sampling of the data values kept for each index or column referenced in the
query. The sampling of the data values is called distribution statistics.
Based on the available information, the Query Optimizer considers the various
access methods and processing strategies that it could use to resolve a query
and chooses the most cost-effective plan. The Query Optimizer also uses pruning
heuristics to ensure that optimizing a query doesn’t take longer than it would
take to simply choose a plan and execute it. The Query Optimizer doesn’t
necessarily perform exhaustive optimization. Some products consider every
possible plan and then choose the most cost-effective one.
The
advantage of this exhaustive optimization is that the syntax chosen for a query
theoretically never causes a performance difference, no matter what syntax the
user employed. But with a complex query, it could take much longer to estimate
the cost of every conceivable plan than it would to accept a good plan, even if
it is not the best one, and execute it. After normalization and optimization
are completed, the normalized tree produced by those processes is compiled into
the execution plan, which is actually a data structure. Each command included
in it specifies exactly which table will be affected, which indexes will be used
(if any), which security checks must be made, and which criteria (such as
equality to a specified value) must evaluate to TRUE for selection. This
execution plan might be considerably more complex than is immediately apparent.
In addition to the actual commands, the execution plan includes all the steps
necessary to ensure that constraints are checked. Steps for calling a trigger
are slightly different from those for verifying constraints. If a trigger is
included for the action being taken, a call to the procedure that comprises the
trigger is appended. If the trigger is an instead-of trigger, the call to the trigger’s plan replaces the actual
data modification command. For after triggers, the trigger’s plan is branched to right after the
plan for the modification statement that fired the trigger, before that
modification is committed. The specific steps for the trigger are not compiled
into the execution plan, unlike those for constraint verification.
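The plan-as-data-structure idea above can be made concrete with a small sketch. The node types, field names, and example steps below are illustrative assumptions; SQL Server's internal plan format is far richer.

```python
# Toy model of an execution plan as a data structure. Node types and
# fields are illustrative assumptions, not SQL Server's internal format.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PlanStep:
    action: str                       # e.g., "INSERT", "CHECK CONSTRAINT"
    table: Optional[str] = None       # table affected, if any
    index: Optional[str] = None       # index used, if any
    predicate: Optional[str] = None   # criteria that must evaluate to TRUE

@dataclass
class ExecutionPlan:
    steps: list = field(default_factory=list)

# A simple INSERT against a table with a CHECK constraint and an AFTER
# trigger: the modification step is only one part of the whole plan.
plan = ExecutionPlan(steps=[
    PlanStep(action="SECURITY CHECK", table="Orders"),
    PlanStep(action="INSERT", table="Orders"),
    PlanStep(action="CHECK CONSTRAINT", table="Orders", predicate="Quantity > 0"),
    # For an AFTER trigger, execution branches to the trigger's plan after
    # the modification step, before the modification is committed:
    PlanStep(action="EXEC TRIGGER PLAN", table="Orders"),
])
print(len(plan.steps))  # 4
```

Note how the single INSERT step is surrounded by security, constraint, and trigger steps, matching the point that the actual modification can be a small part of the total plan.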
A
simple request to insert one row into a table with multiple constraints can
result in an execution plan that requires many other tables to be accessed or
expressions to be evaluated as well. In addition, the existence of a trigger
can cause many more steps to be executed. The step that carries out the actual INSERT statement might
be just a small part of the total execution plan necessary to ensure that all
actions and constraints associated with adding a row are carried out.
The Query Executor
The query
executor runs the execution plan that the Query Optimizer produced, acting as a
dispatcher for all the commands in the execution plan. This module steps
through each command of the execution plan until the batch is complete. Most of
the commands require interaction with the storage engine to modify or retrieve
data and to manage transactions and
locking.
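The dispatcher role of the query executor can be sketched as a simple loop over plan steps, handing each one to a storage-engine handler. The handler names and plan representation are assumptions for illustration only.

```python
# Minimal sketch of the query executor as a dispatcher. Handler names and
# the plan representation are illustrative assumptions.
def storage_engine_read(step):
    return f"read {step['table']}"

def storage_engine_write(step):
    return f"write {step['table']}"

HANDLERS = {
    "SELECT": storage_engine_read,
    "INSERT": storage_engine_write,
    "UPDATE": storage_engine_write,
    "DELETE": storage_engine_write,
}

def execute(plan):
    results = []
    # Step through each command of the execution plan until the batch is complete.
    for step in plan:
        handler = HANDLERS[step["action"]]
        results.append(handler(step))
    return results

plan = [{"action": "SELECT", "table": "Customers"},
        {"action": "UPDATE", "table": "Orders"}]
print(execute(plan))  # ['read Customers', 'write Orders']
```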
The Storage Engine
The SQL Server storage engine includes all the components involved with the accessing and managing of data in your database. In SQL Server 2008, the storage engine is composed of three main areas: access methods, locking and transaction services, and utility commands.
Access Methods
When SQL Server needs to locate data, it calls the access methods code. The access methods code sets up and requests scans of data pages and index pages and prepares the OLE DB row sets to return to the relational engine. Similarly, when data is to be inserted, the access methods code can receive an OLE DB row set from the client. The access methods code contains components to open a table, retrieve qualified data, and update data. The access methods code doesn't actually retrieve the pages; it makes the request to the buffer manager, which ultimately serves up the page in its cache or reads it to cache from disk. When the scan starts, a look-ahead mechanism qualifies the rows or index entries on a page. The retrieving of rows that meet specified criteria is known as a qualified retrieval. The access methods code is employed not only for SELECT statements but also for qualified UPDATE and DELETE statements (for example, UPDATE with a WHERE clause) and for any data modification operations that need to modify index entries. Some types of access methods are listed below.
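The division of labor described above can be sketched as follows: the access methods code asks the buffer manager for each page (served from cache or fetched from disk), then qualifies rows against the predicate. All class and function names here are assumptions for illustration.

```python
# Sketch: access methods never read pages directly; they request them from
# the buffer manager, then qualify rows. Names are illustrative assumptions.
class BufferManager:
    def __init__(self, disk):
        self.disk = disk      # page_id -> list of rows ("on disk")
        self.cache = {}       # buffer cache

    def get_page(self, page_id):
        if page_id not in self.cache:          # cache miss: read from disk
            self.cache[page_id] = self.disk[page_id]
        return self.cache[page_id]             # serve the page from cache

def qualified_scan(buffer_mgr, page_ids, predicate):
    # Qualified retrieval: return only rows that meet the specified criteria.
    result = []
    for page_id in page_ids:
        for row in buffer_mgr.get_page(page_id):
            if predicate(row):
                result.append(row)
    return result

disk = {1: [{"id": 1, "qty": 5}, {"id": 2, "qty": 0}],
        2: [{"id": 3, "qty": 7}]}
bm = BufferManager(disk)
rows = qualified_scan(bm, [1, 2], lambda r: r["qty"] > 0)
print([r["id"] for r in rows])  # [1, 3]
```

The predicate here plays the role of a WHERE clause: rows failing it never leave the scan, which is what "qualified retrieval" means above.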
Row and Index Operations
You can consider row and index operations to be components of the access methods code because they carry out the actual method of access. Each component is responsible for manipulating and maintaining its respective on-disk data structures—namely, rows of data or B-tree indexes, respectively. They understand and manipulate information on data and index pages.
The row operations code retrieves, modifies, and performs operations on individual rows. It performs an operation within a row, such as “retrieve column 2” or “write this value to column 3.” As a result of the work performed by the access methods code, as well as by the lock and transaction management components (discussed shortly), the row is found and appropriately locked as part of a transaction. After formatting or modifying a row in memory, the row operations code inserts or deletes a row. There are special operations that the row operations code needs to handle if the data is a Large Object (LOB) data type—text, image, or ntext—or if the row is too large to fit on a single page and needs to be stored as overflow data.
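The column-level operations quoted above ("retrieve column 2", "write this value to column 3") can be sketched with a row modeled as a simple list of column values; the real on-disk row format is, of course, far more involved.

```python
# Sketch of column-level row operations. A row is modeled as a plain list
# of column values; the real on-disk row format is much more complex.
def read_column(row, n):
    return row[n - 1]          # columns are 1-based in the text above

def write_column(row, n, value):
    row[n - 1] = value

row = ["ALFKI", "Alfreds Futterkiste", "Berlin"]
print(read_column(row, 2))     # Alfreds Futterkiste
write_column(row, 3, "Hamburg")
print(row[2])                  # Hamburg
```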
The index operations code maintains and supports searches on B-trees, which are used for SQL Server indexes. An index is structured as a tree, with a root page and intermediate-level and lower-level pages. (If the tree is very small, there might not be intermediate-level pages.) A B-tree groups records that have similar index keys, thereby allowing fast access to data by searching on a key value. The B-tree’s core feature is its ability to balance the index tree. (B stands for balanced.) Branches of the index tree are spliced together or split apart as necessary so that the search for any given record always traverses the same number of levels and therefore requires the same number of page accesses.
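The balance property described above, where every search traverses the same number of levels, can be shown with a tiny hand-built tree. This is a toy structure, not SQL Server's on-disk B-tree format.

```python
# Toy B-tree search: because the tree is balanced, every lookup traverses
# the same number of levels (and thus the same number of page accesses).
def search(node, key, depth=0):
    if node["leaf"]:
        return key in node["keys"], depth
    # Pick the child whose key range covers the search key.
    for separator, child in zip(node["seps"], node["children"]):
        if key < separator:
            return search(child, key, depth + 1)
    return search(node["children"][-1], key, depth + 1)

# Root page with one separator key and two leaf pages beneath it.
root = {"leaf": False, "seps": [50],
        "children": [
            {"leaf": True, "keys": [10, 20, 30]},
            {"leaf": True, "keys": [50, 60, 70]},
        ]}

found, levels = search(root, 20)
print(found, levels)    # True 1
found2, levels2 = search(root, 60)
print(found2, levels2)  # True 1  (same number of levels for any key)
```

Whichever key is sought, the traversal reaches a leaf in the same number of steps, which is exactly the property the splicing and splitting of branches preserves.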