Database indexes

November, 2023

Database indexes almost always come up when discussing how to optimize networked services that maintain a persistent connection to a database. In everyday software engineering work, many engineers encounter indexes mainly as constraints declared through the ORM (Object Relational Mapper) used in the codebase they are working on. Other times, engineers create indexes directly on the database.

Something I've noticed in my short stint interviewing engineers is that a lot of people understand indexes only as constraints set when defining the database table model in their codebase. This common phenomenon leads to a shallow understanding of what the ORM is actually doing under the hood. I personally don't think that is a satisfactory level of system understanding as engineers rise through the ranks.

In this article, we will mostly talk about some of the uses of indexes, the different types of indexes that exist, and some patterns I've noticed in my work while using database indexes.

Types of Indexes

Single key indexes

This is most commonly the primary key of a table, usually set up as the `id` column. In most cases, an ORM adds it to the schema for you automatically. A single-key index can also be created by telling the ORM that a column is unique, which makes the database build a unique index on that column. This kind of index is primarily used for retrieving single items from a table. Conceptually, think of it as a map from the indexed column's value to the associated row; most relational databases implement it as a B-tree rather than a hash table, which is why the same index can also serve ordered and range lookups. A unique single-key index also enforces uniqueness: whenever you need a table to have no duplicates based on a single attribute, it is your best bet.
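To make this concrete, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory SQLite database. The `users` table and the `idx_users_email` index are hypothetical names chosen for illustration; other relational databases accept very similar SQL. It shows the unique index rejecting a duplicate email and the query planner choosing the index for a single-row lookup.

```python
import sqlite3

# In-memory database; the users table and idx_users_email index are
# hypothetical names used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL, name TEXT)"
)
# Unique single-key index: at most one row may hold a given email.
conn.execute("CREATE UNIQUE INDEX idx_users_email ON users (email)")
conn.execute(
    "INSERT INTO users (email, name) VALUES (?, ?)", ("ada@example.com", "Ada")
)


def try_insert(email, name):
    """Insert a user; return False if the unique index rejects the row."""
    try:
        conn.execute("INSERT INTO users (email, name) VALUES (?, ?)", (email, name))
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_accepted = try_insert("ada@example.com", "Another Ada")

# Ask the query planner how it would execute a single-row lookup by email.
# The last column of each EXPLAIN QUERY PLAN row is the readable detail.
plan_details = [
    row[-1]
    for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?",
        ("ada@example.com",),
    )
]

print(duplicate_accepted)  # False
print(plan_details)
```

The exact wording of the `EXPLAIN QUERY PLAN` output differs between SQLite versions, but it should mention `idx_users_email`. PostgreSQL and MySQL expose the same kind of information through `EXPLAIN`.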

Compound indexes (multi-key indexes)

A compound index covers a group of columns. We use one when a set of columns must be unique together, or when those columns are mostly used together in queries. A unique compound index enforces uniqueness across the combination of columns at the point of insertion or update, while each column on its own may still repeat. In most databases, the order of the columns in a compound index matters. A common pattern is to put the column(s) used for equality filters first and the column used for range filtering last, which limits the amount of data traversed during a search: the database can jump to the slice of the index that matches the filter columns and read only the range it needs from there. For the same reason, a B-tree index on `(a, b)` can serve queries that filter on `a` alone, but generally not queries that filter on `b` alone.
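The sketch below, again using Python's `sqlite3` module with a hypothetical `memberships` table, shows both roles of a compound index. The `(team_id, user_id)` pair must be unique, and because `team_id` is the leftmost column, the index helps queries that filter on `team_id` but not queries that filter on `user_id` alone.

```python
import sqlite3

# Hypothetical table: which users belong to which teams, plus their role.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memberships ("
    " id INTEGER PRIMARY KEY, team_id INTEGER NOT NULL,"
    " user_id INTEGER NOT NULL, role TEXT)"
)
# Compound unique index: the (team_id, user_id) pair must be unique,
# and team_id is the leftmost column of the index.
conn.execute(
    "CREATE UNIQUE INDEX idx_memberships_team_user ON memberships (team_id, user_id)"
)
conn.executemany(
    "INSERT INTO memberships (team_id, user_id, role) VALUES (?, ?, ?)",
    [(1, 10, "admin"), (1, 11, "member"), (2, 10, "member")],
)

# The same user may join different teams, but not the same team twice.
try:
    conn.execute(
        "INSERT INTO memberships (team_id, user_id, role) VALUES (1, 10, 'member')"
    )
    pair_duplicate_rejected = False
except sqlite3.IntegrityError:
    pair_duplicate_rejected = True


def plan(sql, params):
    """Return the query planner's readable detail strings for a query."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


# Filtering on the leftmost column can use the compound index...
by_team = plan("SELECT * FROM memberships WHERE team_id = ?", (1,))
# ...but filtering only on the second column falls back to a table scan here.
by_user = plan("SELECT * FROM memberships WHERE user_id = ?", (10,))

print(pair_duplicate_rejected)  # True
print(by_team)
print(by_user)
```

SQLite can sometimes apply a "skip-scan" to the second column once `ANALYZE` has gathered statistics, so treat the last result as typical rather than guaranteed. If you often query by `user_id` alone, giving it its own index is the usual fix.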

Uses of Indexes

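Unique indexes are used to enforce and express uniqueness for a table column or set of columns.
Indexes make the job of the database query planner easier: instead of scanning every row, the planner can use an index to jump straight to the matching rows, and a unique index additionally tells it that an equality lookup returns at most one row. Since the database engine maintains a structure mapping index keys to their associated table rows, indexes are a good lever to pull when we want fast queries, at the cost of extra storage and slightly slower writes, because every index must be updated on each insert or update.
Unique indexes can be used to solve the race condition that occurs when application code checks whether a record exists before inserting or updating it: two concurrent requests can both pass the check and both write, creating a duplicate. With a unique index, the database itself rejects the second write. They also reduce the number of database operations needed to guarantee uniqueness (a single write instead of a read followed by a write), which matters when you are billed per query, as with many cloud database providers.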
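The race condition described above can be replayed deterministically in a single process. The sketch below uses Python's `sqlite3` module and a hypothetical `signups` table: it simulates two requests that both run the existence check before either one inserts, then shows the alternative of a single `INSERT ... ON CONFLICT (email) DO NOTHING` statement that lets the unique index decide. The upsert syntax assumes SQLite 3.24 or newer; PostgreSQL supports the same clause.

```python
import sqlite3


def new_db(with_unique_index):
    """Create an in-memory signups table, optionally with a unique index on email."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE signups (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
    if with_unique_index:
        conn.execute("CREATE UNIQUE INDEX idx_signups_email ON signups (email)")
    return conn


def email_exists(conn, email):
    row = conn.execute("SELECT 1 FROM signups WHERE email = ?", (email,)).fetchone()
    return row is not None


def racy_check_then_insert(conn, email):
    """Replay an unlucky interleaving of two concurrent requests: both run the
    existence check before either inserts, so both see 'not found'."""
    seen_by_a = email_exists(conn, email)
    seen_by_b = email_exists(conn, email)
    for seen in (seen_by_a, seen_by_b):
        if not seen:
            conn.execute("INSERT INTO signups (email) VALUES (?)", (email,))


def insert_relying_on_index(conn, email):
    """One statement; the unique index decides atomically whether the row is new.
    Returns True if a row was inserted, False if the email already existed."""
    cur = conn.execute(
        "INSERT INTO signups (email) VALUES (?) ON CONFLICT (email) DO NOTHING",
        (email,),
    )
    return cur.rowcount == 1


def rows_for(conn, email):
    return conn.execute(
        "SELECT COUNT(*) FROM signups WHERE email = ?", (email,)
    ).fetchone()[0]


no_index = new_db(with_unique_index=False)
racy_check_then_insert(no_index, "ada@example.com")
print(rows_for(no_index, "ada@example.com"))  # 2: the check did not prevent a duplicate

with_index = new_db(with_unique_index=True)
first = insert_relying_on_index(with_index, "ada@example.com")
second = insert_relying_on_index(with_index, "ada@example.com")
print(first, second, rows_for(with_index, "ada@example.com"))  # True False 1
```

Besides closing the race, the second approach needs one statement per attempt instead of a read followed by a write.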

Patterns in the use of indexes

Queries that require range filtering: a non-unique index can be added on a set of columns of the table of interest to enable fast range filtering queries. For example, in an e-commerce marketplace orders table with `store_id`, `order_date`, `product_id`, `quantity` and other columns, a non-unique compound index on `store_id` followed by `order_date` speeds up queries that retrieve a particular store's orders across a date range: `store_id` is the equality filter and `order_date` is the range key.
Multi-key uniqueness constraints: this is used when we want uniqueness enforced at the database table level instead of checked in application code. For example, to prevent a customer from using the same coupon more than once, we can set a multi-key unique index on the `customer_id` and `coupon_id` columns of an e-commerce store's coupon usage table, so that the database itself rejects repeated usage.
Stable single-item retrieval: fetching exactly one row by a stable identifier. This pattern is usually represented as a primary key or unique index on the table.
Non-unique retrieval of multiple rows over a single key: when a column value can repeat across many rows, such as a customer id on an orders table, a non-unique single-column index speeds up fetching all rows for a given value. Such a column is often a foreign key, but a foreign key constraint is not itself an index: MySQL's InnoDB creates an index for it automatically, while PostgreSQL and SQLite leave it to you to add one on the referencing column.
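The following sketch puts these patterns into one hypothetical schema, again using Python's `sqlite3` module: an `orders` table with a `(store_id, order_date)` index for range filtering, a non-unique index on the `customer_id` foreign key, and a `coupon_usages` table with a unique `(customer_id, coupon_id)` index. All table contents and ids are made up for illustration.

```python
import sqlite3

# Hypothetical schema for the patterns above; names and data are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        store_id INTEGER NOT NULL,
        customer_id INTEGER NOT NULL REFERENCES customers (id),
        product_id INTEGER NOT NULL,
        quantity INTEGER NOT NULL,
        order_date TEXT NOT NULL  -- ISO-8601 'YYYY-MM-DD' sorts chronologically
    );
    -- Range filtering: equality column first, range column second.
    CREATE INDEX idx_orders_store_date ON orders (store_id, order_date);
    -- Non-unique index on the foreign key column; SQLite does not add one itself.
    CREATE INDEX idx_orders_customer ON orders (customer_id);

    CREATE TABLE coupon_usages (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (id),
        coupon_id INTEGER NOT NULL
    );
    -- Multi-key uniqueness: each customer may use a given coupon once.
    CREATE UNIQUE INDEX idx_coupon_usages_customer_coupon
        ON coupon_usages (customer_id, coupon_id);
    """
)
conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)", [(1, "Ada"), (2, "Lin")]
)
conn.executemany(
    "INSERT INTO orders (store_id, customer_id, product_id, quantity, order_date)"
    " VALUES (?, ?, ?, ?, ?)",
    [
        (7, 1, 100, 2, "2023-10-30"),
        (7, 2, 101, 1, "2023-11-02"),
        (7, 1, 102, 5, "2023-11-15"),
        (8, 2, 100, 1, "2023-11-03"),
    ],
)


def plan(sql, params):
    """Return the query planner's readable detail strings for a query."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


# Range filtering: store 7's orders in November 2023.
range_sql = (
    "SELECT id, order_date FROM orders"
    " WHERE store_id = ? AND order_date BETWEEN ? AND ? ORDER BY order_date"
)
range_params = (7, "2023-11-01", "2023-11-30")
november = conn.execute(range_sql, range_params).fetchall()
range_plan = plan(range_sql, range_params)

# Foreign key lookup: every order placed by customer 1.
fk_plan = plan("SELECT * FROM orders WHERE customer_id = ?", (1,))


def use_coupon(customer_id, coupon_id):
    """Record a coupon use; return False if the unique index rejects a repeat."""
    try:
        conn.execute(
            "INSERT INTO coupon_usages (customer_id, coupon_id) VALUES (?, ?)",
            (customer_id, coupon_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


coupon_results = [use_coupon(1, 50), use_coupon(1, 50), use_coupon(2, 50)]

print(november)  # [(2, '2023-11-02'), (3, '2023-11-15')]
print(coupon_results)  # [True, False, True]
print(range_plan)
print(fk_plan)
```

Storing dates as ISO-8601 text is an assumption that keeps the example simple in SQLite; in PostgreSQL or MySQL you would normally use a `DATE` or timestamp column, and the same indexes apply.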

Summary

Database indexes matter both for fast data retrieval and for the integrity of the data stored in a table. Understanding the different types of indexes, how they are used, and the scenarios where each type applies helps us improve the efficiency of our database queries and, by extension, the user experience of our customers.