Back to articles list
- 11 minutes read

What Is a Database Index, and What Does It Do?

    Tags:
  • database index

Understand what a database index does and how it can improve SQL query performance.

Modern databases store terabytes or petabytes of data and are a key resource for organizational operations and decision-making. A database index is a special data structure that allows quick access to specific pieces of information without having to read all data stored in a particular table. They ensure database performance in transactional environments.

How Does a Database Index Enhance Access to Information?

Let’s forget about databases and tables and use a “real life” example to describe and solve a problem. Assume that while you are on the bus heading to university, you suddenly remember that you have an assignment to write a few lines about Fitzgerald Scott. Unfortunately, you forgot to charge your cell phone, so you do not have internet access. Fortunately, you have just bought the book the professor suggested on the subject. You have 10 minutes to find information about the author in a 300-page book; unless you are The Flash, you cannot read the entire book!

Here is when the book’s index comes to help! At the very end of the book, the Index section has an ordered list of important words where you can easily find what you are looking for (in our scenario, author names). Next to each topic, there are some numbers that tell you in which pages of the book the author is mentioned.

Database index

Now, you can quickly turn to pages 120, 121, and 197 and read them to get the information you need. You complete the assignment before the bus reaches the university!

Database Tables and Indexes

Database tables are, in a way, like books. Data that is inserted into a table is internally stored on blocks or pages (the name varies depending on the database engine, but the meaning is the same: a set of contiguous bytes on disk) on the first empty space that is available in a block assigned to a table, with no specific order. When a block is full, a new one is assigned to the table and new data is inserted on the new block. When someone deletes data, an empty space is created on the existing blocks; that space may be used by the next insertions.

If we do not have an index and we need to find some specific information, we need to read the entire “book” (table) from beginning to end. Although modern computers and storage devices are really fast, the massive volume of some database tables would require a lot of time to find specific information.

That is the reason why database indexes are created: to allow quick access to data in a table. You can create indexes on different table columns. They help in the same way as a book index does, since they store all the values existing in that column together with a reference to all the blocks (or pages) where the values are stored or mentioned.

Indexes are automatically maintained by the database engine. This means that whenever a DML (an insert, update or delete) operation is performed on the table, the same operation (insert, update or delete) is done on each of the indexes of the table.

Take a look at All About Indexes: The Very Basics to get details on how indexes help enhance database performance when searching for specific data on large datasets.

Not All that Glitters Is Gold

We have seen the pros of database indexes, but there are some cons that must be considered when creating indexes.

First, an index is an additional data structure. That means that it needs space, both in “disk” to be stored and in “memory” to be accessed. Indexes consume blocks like a table does, so creating unnecessary indexes in a table will use more storage and consume more memory when data in the table is being accessed, inserted, updated, or deleted.

Second, the fact that indexes are automatically maintained means that the database engine needs to perform additional actions whenever a DML sentence is issued in the table. For example, if the last name of a customer is changed from “Thompson” to “Johnson”, then the database engine needs to update the block where the data is stored with the new value, remove the old “Thompson” value from any indexes where the last_name column is used, and insert the new “Johnson” value on the same indexes. This requires CPU time and disk access. Thus, keeping unnecessary indexes makes DML operations slower.

Like any other tool, indexes can be of great help and at the same time can be a problem if they are created needlessly or without good planning.

Table and Index Structures

In this article, we are going to use some examples based on a very simple Vertabelo data model for a small car rental company:

Database index

Regular tables (also known as heap tables) store data in no specific or guaranteed order. The image below represents the storage of the Vehicles table. It is composed of 14 blocks (or pages) containing the data shown on the diagram (not all columns are represented in the image). It clearly shows that the data is not ordered in any particular way:

Database index

Indexes are implemented using a binary tree structure. (This is called B-Tree or B+Tree, depending on whether it stores pointers only on leaf tables or not. You can find out more in All About Indexes Part 2: MySQL Index Structure and Performance.) The values of the column or columns used as  the index key are stored, ordered, and organized in a root block, branch blocks (if required), and leaf blocks:

Database index

In the example shown above, which uses a B+Tree structure, we have an index created on the Brand column (values in blue), with a root block (block #70) with ranges of values and a pointer (in red) to each leaf block (#71, #72 and #73) where values for that range are located. Leaf blocks include each indexed value (again in blue) and a pointer (also in red) to the table block or blocks where data is present in the table.

If we want to search for customers with a particular last name and we don’t have an index, we would have to read the entire table (14 blocks). Having an index reduces that search to 3 blocks:

  1. First read the root block to find the leaf block showing where the desired value is located in the index.
  2. Next, read the leaf index block to find the specific last name and the block (or blocks) where data is located.
  3. Finally, read the blocks with the data.

As tables get bigger, indexes may require intermediate branch blocks to store more values. In this example, we need to read 3 blocks  (root, branch, and leaf) to find a specific value:

Database index

The syntax for index creation may vary slightly between different database engines, but it is usually in this form:

CREATE INDEX IndexName 
ON TableName (IndexedColumn [, IndexedColumnN]);

Example (available in the Vertabelo diagram):

CREATE INDEX IX_Vehicle_Brand ON Vehicle (Brand);

Note: Indexes can be created on more than one column, as shown in the syntax. They are effective as long as the filter condition includes all leading columns. If we have an index on “ColumnA”, “ColumnB” and “ColumnC”, then any query that filters only by “ColumnC” will not benefit from the index, since it is not the leading column.

Index Types

There are different ways to classify indexes. We’ll review some of the most used.

Clustered Indexes (Index Organized Tables)

When we mentioned that data is not stored in a table in any particular way, we were referring to how many database engines work by default. Database engines allow data to be stored in an order defined by the user; this kind of table is referred to as a clustered index (SQL Server) or an Index Organized Table (Oracle).

This kind of structure is a mix between a table and an index. Index key column data is ordered (like in an index), but instead of having a pointer to where the actual data is in the table, it includes the rest of the data (in green; only some columns are shown) after the indexed column or columns (in blue). You can see this in the image below:

Database index

Clustered indexes are a very efficient way to retrieve any information, since there is no need to use the pointer to retrieve the actual data once the value has been found on the index. However, they have some caveats; the most important one is that the column or columns chosen as the index key should not change in order to avoid “moving” data from one block / page to another.

Syntax:

CREATE CLUSTERED INDEX IndexName 
ON TableName (IndexedColumn [, IndexedColumnN]);

Note 1: Tables can have only one clustered index, since it actually replaces the table storage. Additional non-clustered indexes can be added to the table.

Note 2: By default, SQL Servers creates a clustered index whenever a primary key is defined for a table. however, you can choose other columns for the index key or create the table as a normal heap table.

Filtered Indexes

Some database engines like SQL Server or PostgreSQL allow you to specify a filter condition for an index. This reduces the number of rows that are indexed, making the index smaller and quicker to access.

Syntax:

CREATE INDEX IndexName 
ON TableName (IndexedColumn [, IndexedColumnN])
WHERE Condition;

Example (available in the Vertabelo diagram):

CREATE INDEX IX_Rental_PlannedReturn_fEffectiveReturnIsNULL 
ON Rental (PlannedReturn)
WHERE EffectiveReturn IS NULL;

This will create an index on the Rental table. It's based on the PlannedReturn column, but only for those rows where EffectiveReturn is NULL (meaning the car has not been returned yet, so the rental is still active).

Note: Clustered Indexes cannot be filtered, since they must store all the data in the table.

Index with INCLUDE (Covering Index)

Some database engines like SQL Server allow users to specify an INCLUDE clause. This lets values for additional columns be included (but not used for ordering) besides the index key columns. We can consider an index with INCLUDE like a reduced version of a clustered index, since we store only some columns together with the index key, rather than the entire row.

Including frequently-used columns may reduce the need to access the actual table to find values for those columns, making queries even faster.

Syntax:

CREATE INDEX IndexName 
ON TableName (IndexedColumn [, IndexedColumnN])
INCLUDE (Column1 [,Column2]);

Example (Available on the Vertabelo diagram):

CREATE INDEX IX_Vehicle_LicensePlate_iBrandModel
ON Vehicle (LicensePlate)
INCLUDE (Brand, Model);

With this index, the following query can be executed without reading data from the table, as the filter condition and the columns are available in the index:

SELECT LicensePlate, Brand, Model
FROM Vehicle
WHERE LicensePlate = ‘6TRJ999’;

When creating indexes with the INCLUDE clause, remember:

  • Such indexes require more space. They need to store the index key columns, the pointer to the actual data, AND the included columns.
  • If the included column values are changed, they need to be updated both in the table and in each index where they are included. This causes overheads to DML operations.

Note: Clustered Indexes cannot use the INCLUDE clause, since they actually store all the table columns.

Bitmap Index

The BITMAP index type, available in Oracle, PostgreSQL, and Teradata, allows you to efficiently index low cardinality columns (like Boolean columns or columns having few distinct values); such columns do not store each entry.

Bitmap indexes use a block for each distinct value in a column and then set a bit for each row in the table. This bit indicates if the row matches the value of the block. In the example below, we can see that row #1 has “Red” defined as its color; the first bit (which represents the first row) in the RED block is set to true. It is  set to false in the other three blocks.

Database index

Bitmaps indexes are extremely efficient for bitwise comparisons (including OR) and consume less space. Still, the cost to maintain bitmap indexes is higher than normal B-Tree indexes, so they are not recommended for columns where data is frequently updated. They are very frequently found in data warehouses, where data is relatively stable.

Syntax:

CREATE BITMAP INDEX IndexName 
ON TableName (IndexedColumn [, IndexedColumnN]);

Example (Available on another  Vertabelo diagram designed for Oracle):

CREATE BITMAP INDEX IX_Vehicle_Color
ON Vehicle (Color);

Function-Based (Expression) Index

Some database engines like Oracle, DB2, Informix, or PostgreSQL allow users to create indexes on deterministic expressions including system provided or user defined functions without having to create a calculated or computed column on the table. (SQL Server requires a computed column to be created.)

Syntax:

CREATE INDEX IndexName 
ON TableName (Expression);

Examples:

CREATE INDEX IX_Rental_RentalYear
ON Rental (EXTRACT(YEAR FROM RentalDate));

With this index, the following query will use the index to find all products where characters in positions 3 to 6 of the ProductCode are ‘TECH’:

SELECT * FROM Rental
WHERE EXTRACT(Year FROM RentalDate) = 2021;

Other Database Index Types

There are several other index types available on different database engines; they are designed to solve specific issues (like index size) or to deal with specific data (like JSON, XML or spatial). The following is a short list of some of them:

TEXT Search

Indexes the contents of CLOB or VARCHAR columns word by word. Available in Oracle, SQL Server, and MySQL.

XML Search

Indexes internal contents of JSON documents. Available in Oracle, DB2, and SQL Server.

Spatial

Indexes geographical and spatial data. Available in Oracle and SQL Server.

JSON Search

Indexes the internal contents of JSON documents. Available in Oracle.

Reverse Index

Designed to allow a balanced distribution of leaves on indexes with ever-growing values (like creation date or identity / sequence). Available in Oracle.

Compressed

When multiple columns are defined as index keys, compressed indexes do not store repeated occurrences of leading columns; this saves space. Available in Oracle.

go to top

Our website uses cookies. By using this website, you agree to their use in accordance with the browser settings. You can modify your browser settings on your own. For more information see our Privacy Policy.

哆哆女性网周易免费生辰八字起名测名汽车网站设计网站有声小说算死命周公解梦安装桐乡设计网站起名双胞胎女沈沈姓男孩起名大全永远恋爱真美周易五行起名网俄语网站建设专家永城信息港车辆出售网站符合优化网站数据优化禹城网站制作超恐怖短片鬼故事南通seo营销高端签名台州网站排名优化建设网站需要什麽网站优化办法初夏养生宝宝起名大全姓张鼠年求职网站设计长篇鬼故事在线收听mp3黄瓜啥时候种比较好包装公司名称起名大全店铺起名名称测试早餐店起名字看着就干净的陈妙瑛演的电视剧起名家政公司淀粉肠小王子日销售额涨超10倍罗斯否认插足凯特王妃婚姻不负春光新的一天从800个哈欠开始有个姐真把千机伞做出来了国产伟哥去年销售近13亿充个话费竟沦为间接洗钱工具重庆警方辟谣“男子杀人焚尸”男子给前妻转账 现任妻子起诉要回春分繁花正当时呼北高速交通事故已致14人死亡杨洋拄拐现身医院月嫂回应掌掴婴儿是在赶虫子男孩疑遭霸凌 家长讨说法被踢出群因自嘲式简历走红的教授更新简介网友建议重庆地铁不准乘客携带菜筐清明节放假3天调休1天郑州一火锅店爆改成麻辣烫店19岁小伙救下5人后溺亡 多方发声两大学生合买彩票中奖一人不认账张家界的山上“长”满了韩国人?单亲妈妈陷入热恋 14岁儿子报警#春分立蛋大挑战#青海通报栏杆断裂小学生跌落住进ICU代拍被何赛飞拿着魔杖追着打315晚会后胖东来又人满为患了当地回应沈阳致3死车祸车主疑毒驾武汉大学樱花即将进入盛花期张立群任西安交通大学校长为江西彩礼“减负”的“试婚人”网友洛杉矶偶遇贾玲倪萍分享减重40斤方法男孩8年未见母亲被告知被遗忘小米汽车超级工厂正式揭幕周杰伦一审败诉网易特朗普谈“凯特王妃P图照”考生莫言也上北大硕士复试名单了妈妈回应孩子在校撞护栏坠楼恒大被罚41.75亿到底怎么缴男子持台球杆殴打2名女店员被抓校方回应护栏损坏小学生课间坠楼外国人感慨凌晨的中国很安全火箭最近9战8胜1负王树国3次鞠躬告别西交大师生房客欠租失踪 房东直发愁萧美琴窜访捷克 外交部回应山西省委原副书记商黎光被逮捕阿根廷将发行1万与2万面值的纸币英国王室又一合照被质疑P图男子被猫抓伤后确诊“猫抓病”

哆哆女性网 XML地图 TXT地图 虚拟主机 SEO 网站制作 网站优化