Clustered Index Scan vs Clustered Index Seek in SQL Server

Introductions 

Writing a select statement in the SQL Server may do 2 types of operations
  •     Table Scan
  •     Index Scan
If the table is a small one, you are writing the select query upon, then table scan may not be a problem. But when your select query is operating upon a table that contain some millions of records then it might be a problem as each time you are trying to select something then the table scan is performed in the background.

Just consider a simple case of GMAIL. There are approx. 50 millions of user ( lets say ) of Google, so in each and every time when a user tries to log in to his/her account then you think what will be happening in the background ? If we assume 50,000 users try to log in to their webmail at a single time then what will be the count that the [User] table will be executed ??  Is it be 50 millions X 50,000 ?

Here is the problem with table scan, so we need to decrease the scan count and for which the Indexing functionality comes to action.

Here in this article I will explain the difference between Clustered Index Scan and Clustered Index Seek. And their usefulness depending on their execution plan and execution cost.

Before we proceed you need the AdventureWorks2008R2 Database installed in your SQL Server 2008 R2. Download the database here from Codeplex Download

And download the query presenting all the things I am going to explain here from the below download link.



Behind The Scene 

Clustered Index Scan:

      As scan means touching through all the records present, hence while you are performing any clustered index scan then it touches all the rows present in the table. 
      Here the cost is  proportional to the total no of rows present in the table. Let you have written a select statement

---------------------------------
-- Culstered Index Scan and Culstered Index Seek
---------------------------------
-- CTRL + M
-- Build Sample Example

SET STATISTICS IO ON 
GO 
SELECT * FROM MySalesOrdreDetails
-- logical reads 1495
GO

This select statement will select all the rows present in the MySalesOrderDetails table. and to see the execution plan please press CTRL + M while executing the above query. Here I have written the STATISTICS IO ON  in order to see the IO cost.

While executing the above query I found a logical read of 1495.

Consider another case ;

 Now, just change the above query and add a WHERE clause there

---------------------------------
-- Culstered Index Scan and Culstered Index Seek
---------------------------------
-- CTRL + M
-- Build Sample Example

SET STATISTICS IO ON 
GO 
SELECT * FROM MySalesOrdreDetails
WHERE SalesOrderID = 60726 AND SalesOrderDetailID = 74616
-- logical reads 1495
GO

Here if you look into the message tab after executing this then also you will find that the logical read is the same 1495. Now the question arises why after giving a condition why the logical read comes to 1495.
   Well, the answer is that in both cases if you look into the execution plan then you will find in both cases a "Table Scan" opertaion is running in the back end. that's why the logical read counts remains the same in both cases.

Clustered Index Seek

  As seek means selecting a particular( qualifying ) record form a group of records. Hence, when you perform a clustered index seek in sql server the compiler will touch only the qualifying rows.
      The clustered index seek happens in a table when the table got a primary key and upon which the table is re structured in to a B-Tree while execution, returning a lower logical reads.

Now lets add the clustered key to the table you have created before ( MySalesOrderDetails )

---------------------------------
-- Culstered Index Scan and Culstered Index Seek
---------------------------------
-- CTRL + M
-- Build Sample Example
-- Create clustered Index
ALTER TABLE MySalesOrdreDetails
ADD CONSTRAINT [PK_MySalesOrderDetail_SalesOrderID_SalesOrderDetailID]
PRIMARY KEY CLUSTERED
(
[SalesOrderID] ASC,
[SalesOrderDetailID] ASC
)

 After altering the table now go ahead and do the same thing we did previously, Run the First Script
 ( select * from MySalesOrdreDetails )  here you will see the logical reads as 1501. A bit more than the previous one the more 6 steps are added during the creation of the initial node and the transition between the nodes of the B-Tree.

If you notice the execution plan over there then you will find that there is no table scan is done here instead a clustered index scan is done here.

Now run the second script ( with where cluse )  you will find a logical read of 3 only. and if you mark in the execution plan then  there will find this as a clustered index seek operation.

Note :

 

While using SQL Server we should think about the query optimization, I mean the efficiency of the SQL query in terms of CPU cost, IO cost and execution time.

So It is a better idea to implement the clustered indexing ( Primary Key ) in the tables, so that in each case a request for selecting any document goes to the server, it does not need to scan all the records present instead the qualifying ones.


Happy Coding...

No comments:

Post a Comment