Wednesday, August 27, 2014

Indexes in SQL Server

In this article, I am giving a quick overview about INDEX in SQL Server. After completing this article you will understand:

What is meant by INDEX?
What are different types of INDEX in SQL Server?
How to create INDEX?
How to DROP INDEX?
How to Enable-Disable INDEX?
What is difference between clustered and non-clustered indexes?
What are the disadvantages of indexes?

Please give your valuable suggestions and feedback to improve this article.

What is meant by INDEX?

Index is a database object which is used to improve query performance for data retrieval but it also adds some burden on data modification queries such as INSERT, UPDATE & DELETE. So it is advised to use correct Index type. If your table gets many data modification request then it is advised not to have Clustered index on such table, because it will slow down you DML operations. Index can be created on one or more column (max 16 columns) on existing table or views. When we create the index, it reads the column’s data and forms a relevant data structure to minimize the number of data comparison. Indexes are automatically maintained for a table or view whenever the table data is modified.

Let’s understand Index with some real world examples.

Example 1

Suppose A table CUSTOMERS has around 20 Million records. When we try to retrieve records for two or three customers based on their customer id, all 20 Million records are taken and comparison is made to get a match on the supplied customer id. Just think about now how much time that will take if it is a web based application and there are 30-50 customers that want to access their data through internet.
Does the data server do 20 Million X 30 searches?  The answer is NO because all modern database use the concept of Index.

Example 2

Suppose you have a book which has thousands of pages, Thousands of pages are divided into many chapters. Each chapter has many pages and each page has related topic written, if you want to read a particular topic then how do you find it in your book? Answer is you just look into the Table of Content which is always present in the beginning from Chapter 1.This Table of contents helps you search your topics in all topics and then you jump to a particular page out of those thousands pages. So this Table of Content is nothing but Index for your book. You better know how helpful are those table of content 10-20 pages, if it was not there then it would have been very difficult to search needed topic in those thousands of pages. Table of content will always be created after completing the book.

Let’s relate this understanding with Database, now you assume your Book as a Database Table and Table of Content as Index. After creating and populating a Table or view, you create index on it. Once you create an index on a field, automatically another data structure will be created which holds the field value and pointer to the record it relates to. This index structure is then sort and stores the data rows in the database which allows Binary search on it, which means these Indexes require additional space on the disk, and get stored with a table using MYISAM engine.

Summary

1. Like an index in a book, an index in a database lets you quickly find specific information in a table or indexed view.
2. An Index contains keys built from one or more columns in a Table, or view and pointers that map to the storage location of the specified data.
3. These keys are stored in structure (B-Tree) that enables SQL Server to find the rows associated with the key values quickly and efficiently.
4. We can significantly improve the performance of database queries and applications by creating well designed indexes to support your queries.
5. Indexes can reduce the amount of data that must be read to return the query result set.
6. Indexes can also enforce uniqueness on the rows in a table, ensuring the data integrity of the table data.

What are different types of INDEX in SQL Server?

1. Clustered Index
2. Non Clustered Index
3. Unique
4. Filtered
5. Xml
6. Full Text
7. Spatial
8. Columnstore
9. Index with Included Columns
10. Index on Computed Columns

In this article we will focus only on Clustered and Non Clustered Index.

How to create INDEX?

We can create Index on an empty table or one that contains data, creating an index on an empty table has no performance implications at the time of Index is creation, however performance will be affected when data is added to the table.

Creating Index on large table should be planned carefully so that database performance is not hindered. The best way to create Index on large table is to first create Clustered Index and then create any Non Clustered Index.

When a table has no Clustered Index then its data will be stored in unsorted way and it will be called as Heap. When you insert a new record in Heap, it will be inserted in any random position in the table and position depends on the page free space. If A Clustered index is created on a heap with several existing nonclustered indexes, all the nonclustered indexes must be rebuilt so that they contain the clustering key value instead of the row identifier (RID).

Syntax to Create Index

CREATE [UNIQUE] CLUSTERED |NONCLUSTERED INDEX index_name ON table_name|view_name (Column ASC|DESC [,..n])

Uniqueness can be a property of both clustered and nonclustered indexes. If UNIQUE and CLUSTERED keywords are missing in CREATE INDEX command then it will create Non-Unique Non Clustered Index.

Clustered Index

1. When Clustered Index is created it first sorts the data rows and then stores the data rows in the table or view based on their key values.
2. Whenever new records come into the table, it gets fit to its position based on the key values that is the reason INSERT, UPDATE and DELETE are slow on INDEXED table because it will take some time to rearrange the data Pages.
3. When we create Primary Key constraint in a Table and if Clustered Index is not present on the table then it will automatically create Clustered Index on primary key column with the same name as the Primary Key constraint has.
4. We can have only 1 Clustered index on a Table.
5. Reason for having only 1 Clustered Index on a table is the data rows themselves can be sorted in only one order either ASC or DESC.
6. When a table has Clustered Index then it will be called as Clustered Table.
7. When creating the Clustered Index, SQL Server reads the Primary Key column values and forms a Binary Tree on it. This Binary Tree information is then stored separately in the disc, you can find Index created on Primary Key in database Index Folder.
8. Clustered Index can be created on Non - Primary Key columns. When we create Primary Key later in that case it will not create Clustered index because it is already available, it will create Unique Non Cluster Index.
9. Uniqueness is the property of Clustered Index and Non Clustered Index.
10. When we drop the Index, It leaves the data in sorted order and if new records come then it will be inserted at the random position in the table.
11. Clustered Index leaf nodes contains the Data Pages means row’s data.

Let’s go through with some example to understand Clustered Index

Let’s create a table name as EMP in our tutorial database TEACHMESQLSERVER

USE TEACHMESQLSERVER
IF OBJECT_ID('EMP','U') IS NOT NULL
DROP TABLE EMP
CREATE TABLE EMP(EMPID INT NOT NULL)

As you can see from the below image right now EMP table doesn’t has Index. It is because we have not created the INDEX or we have not defined any Unique or Primary Key constraint on EMPID column at the time of table creation.
Let’s insert few records in the EMP table.

INSERT INTO EMP VALUES(1),(3),(2),(0)
GO
SELECT * FROM EMP


As you can see EMPID was not sorted and stored because it doesn’t have Clustered Index on EMP Table. I have already discussed that we can also create Clustered Index on Non Primary Key columns.

Let’s Create Clustered Index on EMPID column in EMP table.


CREATE CLUSTERED INDEX EMPIDINDEX ON EMP (EMPID ASC)


As you can see now EMP table has one Clustered Index name as EMPIDINDEX.

Now let’s retrieve all the records we had inserted last time from EMP table.

GO
SELECT * FROM EMP


As you can see once after creating Clustered Index we retrieved the data, it is coming in sorted form. It is because at the time of Clustered Index creation we had given sorting Order as ASC. Data can be sort either ASC or DESC. This is the main reason we can have only one Clustered Index on a Table.
Now let’s try to create one more Clustered Index. We already know the answer that SQL Server will not allow to create another Clustered Index on EMP table.

CREATE CLUSTERED INDEX EMPIDINDEX1 ON EMP (EMPID ASC)

Here you go – got error message

Msg 1902, Level 16, State 3, Line 1
Cannot create more than one clustered index on table 'EMP'. Drop the existing clustered index 'EMPIDINDEX' before creating another.

Let’s try to insert new records

INSERT INTO EMP VALUES (8)
INSERT INTO EMP VALUES(6)
GO
SELECT * FROM EMP

As you can see from the above result output, records are coming in ASC order even after we inserted 8 before 6.

As I have already discussed if our table already has Clustered Index and Primary key constraint was not defined at the time of table creation, if we create Primary Key constraints using Alter command on EMP table then it will never create a Clustered Index, It will create a Unique Non Clustered Index.

Let’s create Primary Key constraint on EMPID column.

ALTER TABLE EMP ADD CONSTRAINT PRIMARYKEY PRIMARY KEY (EMPID)


As you can see PRIMARYKEY INDEX is a Unique Non Clustered Index.

You can find all Indexes list in SQL Server from SYS.INDEXES table.

SELECT * FROM SYS.INDEXES

Dropping index will not affect the sorted stored records. Let’s Drop the INDEX. I will discuss how to DROP Index in this article, if DROP command looks strange here then don’t worry.

DROP INDEX EMPIDINDEX ON EMP
ALTER TABLE EMP DROP CONSTRAINT PRIMARYKEY
GO
SELECT * FROM EMP

As you can see Result set is same after dropping the Index. Now let’s insert new records in EMP table.

INSERT INTO EMP VALUES (7)
GO
SELECT * FROM EMP

As you can see from the result set new records were inserted at the end not its key value position.

NonClustered Index
1.  We can have many Nonclustered (max 999) index on a Table.
2.  Uniqueness is the property of Clustered Index and Non Clustered Index.
3.  By default a Nonclustered Index is created if Clustered or Unique options are not specified in the Create Index command.
4.  A Nonclustered index never store data in sorted order if your table has more than 1 column, ASC and DESC keywords will be ignored if used at the time of Nonclustered index creation. If your table has only column then only it will store data in sorted order.
5.  A Nonclustered Index is stored separately from the table, so additional space is required. The amount of space required will depend on the size of table and the number and types of columns used in index.
6.  A Nonclustered Index contains the nonclustered Index key values and each key value entry has a pointer to the data row that contains the key value.
7.  The pointer from an index row in a nonclustered index to a data row is called a row locator.
8.  The Structure of the row locator depends on whether the data pages are stored in a heap or a clustered table. For a heap, a Row locator is a pointer to the row. For a clustered Index, the row locator is the clustered index key.
9. Whenever you create constraint, nonclustered index will be created.

Let’s go through with some example to understand Nonclustered Index

Let’s create a table name as EMP in our tutorial database TEACHMESQLSERVER

USE TEACHMESQLSERVER
IF OBJECT_ID('EMP','U') IS NOT NULL
DROP TABLE EMP
CREATE TABLE EMP(EMPID INT NOT NULL, DEPID INT NOT NULL)


 As you can see from the above image right now EMP table doesn’t has any Index. It is because we have not created any INDEX or we have not defined any constraint at the time of table creation.

Let’s insert few records in the EMP table.

INSERT INTO EMP VALUES(1,11),(3,5),(2,4),(0,7),(1,1)
GO
SELECT * FROM EMP


As you can see from the above result set records are not stored in sorted order. It was stored in the sequence it was written in query.

Let’s Create Nonclustered Index on EMPID column in EMP table.

CREATE NONCLUSTERED INDEX EMPIDINDEX ON EMP (EMPID ASC)
GO
SELECT * FROM EMP



As you can see from the above image Non-Uique, Non-Clustered index has been created.

Now let’s retrieve all the records we had inserted last time from EMP table.

GO
SELECT * FROM EMP


As you can see from the above result set, records were not stored in sorted ordered even after defining the ordering type (ASC) at the time of EMPIDINDEX Index creation.

Now let’s create one more Nonclustered Index on DEPID column in EMP table. We have already discussed that we can create many nonclustered index on a table.

CREATE NONCLUSTERED INDEX DEPIDINDEX ON EMP (DEPID DESC)
GO
SELECT * FROM EMP

Now once again let’s retrieve all the records we had inserted last time from EMP table.

GO
SELECT * FROM EMP

As you can see from the above result set, records were not stored in sorted ordered even after defining the ordering type (DESC) at the time of DEPIDINDEX Index creation. This clearly shows that Nonclustered Index never stores the data in any order.

Let’s insert a new record in EMP table

INSERT INTO EMP VALUES (-2,6)
GO
SELECT * FROM EMP

As you can see from the above result set, new record was inserted at the end of all records.

What is difference between clustered and non-clustered indexes?

1.   A table can have multiple non-clustered indexes. But, a table can have only one clustered index.
2.  A clustered index can be a disadvantage because any time a change is made to a value of an indexed column, the subsequent possibility of re-sorting rows to maintain order is a definite performance hit.
3.   A Clustered Index is faster than a Nonclustered Index but in some condition Non-Clustered index would perform better. You can refer to this article.
4.  If Clustered Index is disabled then you can’t perform DML operation, but if Nonclustered index is disabled then also you can perform DML operation.
5.   Clustered Index determines the Storage Order of rows in the table, hence doesn’t require additional disk space but where as a Nonclustered Index is stored separately from the table, so additional space is required.
6.   A clustered index determines the order in which the rows of the table will be stored on disk (asc or desc) – and it actually stores row level data in the leaf nodes of the index itself. A non-clustered index has no effect on which the order of the rows will be stored.
7.   Using a clustered index is an advantage when groups of data that can be clustered are frequently accessed by some queries. This speeds up retrieval because the data lives close to each other on disk. Also, if data is accessed in the same order as the clustered index, the retrieval will be much faster because the physical data stored on disk is sorted in the same order as the index.
8.   Non clustered indexes store both a value and a pointer to the actual row that holds that value. Clustered indexes don’t need to store a pointer to the actual row because of the fact that the rows in the table are stored on disk in the same exact order as the clustered index – and the clustered index actually stores the row-level data in it’s leaf nodes.

How to DROP INDEX?
Implicitly created index can’t be dropped using DROP INDEX command which means you can’t drop the Indexes if Indexes were created automatically by any of Primary Key or Unique constraint. To drop the index you have to drop the constraint it will automatically delete the Index.
If a clustered index is dropped on a table that has several nonclustered indexes, the nonclustered indexes are all rebuilt as part of the DROP operation. This may take significant time on large tables.

How to Enable-Disable INDEX?

DROP INDEX index_name ON table_name|view_name

Syntax to Disable Index

You can disable a particular Index using below query syntax
ALTER INDEX Index_name ON table_name|view_name  DISABLE

You can disable all available Index using below query syntax
ALTER INDEX ALL ON table_name|view_name  DISABLE

When you disable the clustered index then you can’t perform DML (INSERT, UPDATE & DELETE) operation. If you try to do then you will get below error message

Msg 8655, Level 16, State 1, Line 1
The query processor is unable to produce a plan because the index ‘Index_name’on table or view 'table_name or View_name' is disabled.

You can perform DML operation if your nonclustered index is disabled.

Syntax to Enable Index

You can enable a particular Index using below query syntax
ALTER INDEX Index_name ON table_name|view_name  REBUILD

You can enable all available Index using below query syntax
ALTER INDEX ALL ON table_name|view_name  REBUILD

What are the disadvantages of indexes?

Additional Disk Space – Clustered Index does not require any additional storage. Every Non-Clustered index requires additional space as it is stored separately from the table. The amount of space required will depend on the size of the table, and the number and types of columns used in index.

Insert, Update and Delete Operation can become Slow  When DML statements modify data in a table, the data in all the indexes also needs to be updated. Indexes can help, to search and locate the rows, that we want to delete, but too many indexes to update can actually hurt the performance of data modifications.

Click here to know how to create index on View.

I hope you enjoyed the article don't forget to share on Facebook. Please leave your comments below if there are any questions.


3 comments:

Anonymous said...

Good effort but a few inaccuracies in the article -

Ordered return of data, even with a clustered index, is not guaranteed without an ORDER BY clause.

Please research more on these two statements from your article. The first one is not always true, NC can be faster than CI in many cases. The second one is incorrect because CI (or any physical construct) needs disk space -
3. A Clustered Index is faster than a Nonclustered Index.
5. Clustered Index determines the Storage Order of rows in the table, hence doesn’t require additional disk space

Unknown said...

Thanks for your comment.

I will definitely look into your suggestions and I will also update the content if required.

ganng said...

Nyce thanks