Index basically is a way to find the relevant data row faster. so sql provides us the features of using indexes.
There are two types of indexes basically:
Clustered Index: A clustered index is one of the main index types in SQL Server. A clustered index stores the index key in a B-tree structure along with the actual table data in each leaf node of the index. Having a clustered index defined on a table eliminates the heap table structure we described in the previous section. Since the rest of the table data (eg. non-key columns) is stored in the leaf nodes of the index structure, a table can only have one clustered index defined on it.
Clustered Index Advantages:
- speeding up query performance. Queries that contain the index key columns in the WHERE clause use the index structure to go straight to the table data
- removes the need for an extra lookup, to get the rest of the column data, when querying based on the index key values
- eliminate the need to sort data. If the ORDER BY clause of a query is based on the index key values then a sort is not required since the data is already ordered by these values.
Clustered Index Disadvantages:
- overhead in maintaining the index structure with respect to any DML operation (INSERT, UPDATE, DELETE). This is especially true if you are updating the actual key values in the index as in this case all of the associated table data also has to be moved as it is stored in the leaf node of the index entry. In each case there will be some performance impact to your DML query.
Non Clustered Index: In it the index key columns are stored in a B-tree structure except in this case the actual data is not stored in the leaf nodes. In this type of index, a pointer to the actual table data is stored in the leaf node. This could point to the data value in the clustered index or in a heap structure depending on how the table data is stored.
Non Clustered Index Advantages:
- multiple non-clustered indexes defined on a single table
- Less overhead for a non-clustered index when it comes to DML operations than its clustered counterpart
Non Clustered Index Disadvantages:
- extra overhead required in maintaining the index during DML operations. It can sometimes be tricky to balance query performance as having too many non-clustered indexes on a table, while they will help all of your SELECT queries, can sometimes really slow down DML performance.
Bullet Points of clustered and non clustered indexes:
- Clustered: This is the main index of any table.
- Any table can have only one clustered index.
- whenever we define a primary key, sql automatically creates the relevant clustered index.
- if a table has already a clustered index and then a primary key is added then the new primary key index will be saved as non clustered index.
- it provides fast retrieval of data rows
- Non – Clustered: this is the secondary indexes of the table.
- A table can have multiple non clustered indexes.
- Reason of using clustered index is to help the user to get some data a bit more faster.
- It takes up more space in the memory as it takes the data and save it separately.
IF a table doesn’t have a clustered index, then the data is stored in a heap table.
Syntax of Creating an Index
create clustered index <index_name> on <table>(<column name>) (ASC/DESC)
create nonclustered index <index_name> on <table>(<column name>) (ASC/DESC)
Unique index: whenever we create a unique index it make sure that the values which are there must be unique. non-unique indexes are used solely to improve query performance by maintaining a sorted order of data values that are used frequently.
create unique index <index_name> on <table>(<column name>)(ASC/DESC)
when ever we see the execution plan of a query, we can see two components:
- Index Seek (NonClustered)
- Key Lookup (Clustered)
Above are the two steps that happens whenever we try to find a row using non clustered index. Index seek, basically get the the primary id or clustered index id and then KeyLookup, search from the clustered index.
Index Seek, basically hold the primary key value at the leaf nodes so whenever we search for the non clustered index, we basically get the primary key or clustered index id and then that id is used to get the data from the clustered index.
Types of Indexes
Index type | Description |
---|---|
Hash | With a hash index, data is accessed through an in-memory hash table. Hash indexes consume a fixed amount of memory, which is a function of the bucket count. |
memory-optimized Nonclustered | For memory-optimized nonclustered indexes, memory consumption is a function of the row count and the size of the index key columns |
Clustered | A clustered index sorts and stores the data rows of the table or view in order based on the clustered index key. The clustered index is implemented as a B-tree index structure that supports fast retrieval of the rows, based on their clustered index key values. |
Nonclustered | A nonclustered index can be defined on a table or view with a clustered index or on a heap. Each index row in the nonclustered index contains the nonclustered key value and a row locator. This locator points to the data row in the clustered index or heap having the key value. The rows in the index are stored in the order of the index key values, but the data rows are not guaranteed to be in any particular order unless a clustered index is created on the table. |
Unique | A unique index ensures that the index key contains no duplicate values and therefore every row in the table or view is in some way unique. Uniqueness can be a property of both clustered and nonclustered indexes. |
Columnstore | An in-memory columnstore index stores and manages data by using column-based data storage and column-based query processing. Columnstore indexes work well for data warehousing workloads that primarily perform bulk loads and read-only queries. Use the columnstore index to achieve up to 10x query performance gains over traditional row-oriented storage, and up to 7x data compression over the uncompressed data size. |
Index with included columns | A nonclustered index that is extended to include nonkey columns in addition to the key columns. |
Index on computed columns | An index on a column that is derived from the value of one or more other columns, or certain deterministic inputs. |
Filtered | An optimized nonclustered index, especially suited to cover queries that select from a well-defined subset of data. It uses a filter predicate to index a portion of rows in the table. A well-designed filtered index can improve query performance, reduce index maintenance costs, and reduce index storage costs compared with full-table indexes. |
Spatial | A spatial index provides the ability to perform certain operations more efficiently on spatial objects (spatial data) in a column of the geometry data type. The spatial index reduces the number of objects on which relatively costly spatial operations need to be applied. |
XML | A shredded, and persisted, representation of the XML binary large objects (BLOBs) in the xml data type column. |
Full-text | A special type of token-based functional index that is built and maintained by the Microsoft Full-Text Engine for SQL Server. It provides efficient support for sophisticated word searches in character string data. |
SORT_IN_TEMPDB Option For Indexes
When you create or rebuild an index, by setting the SORT_IN_TEMPDB option to ON you can direct the SQL Server Database Engine to use tempdb to store the intermediate sort results that are used to build the index. Although this option increases the amount of temporary disk space that is used to create an index, the option could reduce the time that is required to create or rebuild an index when tempdb is on a set of disks different from that of the user database.