SQL Server supports a mechanism called read-ahead,
whereby the need for data and index pages can be anticipated and pages can be
brought into the buffer pool before they’re actually needed.
This performance optimization allows large amounts of data to be processed effectively. Read-ahead is managed completely internally, and no configuration adjustments are necessary.
There are two
kinds of read-ahead: one for table scans on heaps and one for index ranges. For
table scans, the table’s allocation structures are consulted to read the table
in disk order. Up to 32 extents (32 * 8 pages/extent * 8,192 bytes/page = 2 MB)
of read-ahead may be outstanding at a time. Four extents (32 pages) at a time
are read with a single 256-KB scatter read. If the table is spread across
multiple files in a file group, SQL Server attempts to distribute the read-ahead
activity across the files evenly.
For index ranges, the scan uses level 1
of the index structure (the level immediately above the leaf) to determine
which pages to read ahead. When the index scan starts, read-ahead is invoked on
the initial descent of the index to minimize the number of reads performed. For
instance, for a scan of WHERE state = ‘WA’,
read-ahead searches the index for key = ‘WA’,
and it can tell from the level-1 nodes how many pages must be examined to
satisfy the scan. If the anticipated number of pages is small, all the pages
are requested by the initial read-ahead;
if the pages are noncontiguous, they’re fetched in scatter reads. If the range
contains a large number of pages, the initial read-ahead is performed and
thereafter, every time another 16 pages are consumed by the scan, the index is
consulted to read in another 16 pages. This has several interesting effects:
Small ranges can be processed in a single read at the data
page level whenever the index is contiguous.
The scan range (for example, state = ‘WA’) can be used to prevent reading ahead of
pages that won’t be used because this information is available in the index.
Read-ahead is not slowed by having to follow
page linkages at the data page level. (Read-ahead can be done on both clustered
indexes and nonclustered indexes.)