Indexes and Performance

An index is a data structure that PostgreSQL maintains alongside your table data to allow faster lookups. Without indexes, every query must scan the entire table (a sequential scan or seq scan), reading every row to find the matching ones. With the right indexes, PostgreSQL can jump directly to the relevant rows (an index scan), which is orders of magnitude faster on large tables.

Understanding when indexes help, which type to use, and how to read EXPLAIN ANALYZE output is essential for building databases that perform well at scale.

How Indexes Work

When you create a B-tree index on a column, PostgreSQL builds a balanced binary search tree containing the column values and pointers to the corresponding rows (called heap tuples). When a query filters or sorts on that column, PostgreSQL searches the tree in O(log n) time instead of scanning all rows in O(n) time.

For a table with 1,000,000 rows:

Sequential scan: reads all 1,000,000 rows
Index scan: reads approximately 20 rows (log₂(1,000,000) ≈ 20)

The difference becomes dramatic as tables grow.

B-tree Indexes — The Workhorse

B-tree is the default and most versatile index type. It supports:

Equality: WHERE email = '[email protected]'
Range: WHERE price > 100 AND price < 500
Sorting: ORDER BY created_at DESC
Prefix matching: WHERE name LIKE 'Laptop%' (not %Laptop%)

-- Index for filtering by category
CREATE INDEX idx_products_category ON products (category);

-- Index for sorting by date
CREATE INDEX idx_orders_date ON orders (ordered_at DESC);

-- Composite index: covers queries that filter by (status, customer_id)
-- Order matters: put the most selective column first, or the column used in equality conditions
CREATE INDEX idx_orders_status_customer ON orders (status, customer_id);

Index Column Order in Composites

For a composite index (a, b):

Queries filtering on a alone can use the index
Queries filtering on a AND b can use the index
Queries filtering on b alone cannot use the index

This is called the "leading column" rule. Always put columns used in equality conditions before range conditions.

Partial Indexes

A partial index indexes only the rows that satisfy a WHERE condition. This creates a smaller, faster index for queries that frequently target a subset of rows:

-- Most queries only care about pending or processing orders
-- A partial index on just those rows is much smaller
CREATE INDEX idx_orders_active
ON orders (ordered_at, customer_id)
WHERE status IN ('pending', 'processing');

-- Queries that match the partial index condition use this tiny index
SELECT * FROM orders
WHERE status = 'pending'
  AND ordered_at > NOW() - INTERVAL '7 days';

A partial index for a 100,000-row orders table where 95% of orders are completed would index only 5,000 rows instead of 100,000 — a 20x size reduction with proportionally faster lookups.

Unique Indexes

A unique index both enforces uniqueness AND serves as a regular index:

-- These are equivalent — UNIQUE constraint creates a unique index
CREATE UNIQUE INDEX idx_users_email ON users (email);
ALTER TABLE users ADD CONSTRAINT uq_email UNIQUE (email);

-- Partial unique index: email must be unique among active users only
CREATE UNIQUE INDEX idx_active_users_email ON users (email)
WHERE is_active = TRUE;

GIN Indexes for Full-Text Search and JSON

GIN (Generalized Inverted Index) indexes are designed for values that contain multiple components — arrays, JSON, and full-text search documents:

-- Full-text search index
CREATE INDEX idx_articles_fts
ON articles
USING GIN (to_tsvector('english', title || ' ' || body));

-- Search articles containing both 'postgresql' and 'index'
SELECT title FROM articles
WHERE to_tsvector('english', title || ' ' || body)
      @@ to_tsquery('english', 'postgresql & index');

-- JSONB index: speeds up queries on JSON key access
CREATE INDEX idx_products_metadata ON products USING GIN (metadata);

-- Query using the GIN index
SELECT name FROM products WHERE metadata @> '{"color": "red"}';

For full-text search, pg_trgm extension provides trigram-based GIN/GiST indexes that even support ILIKE '%keyword%' searches efficiently:

CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX idx_products_name_trgm ON products USING GIN (name gin_trgm_ops);

-- This now uses the trigram index (fast even with leading wildcard!)
SELECT name FROM products WHERE name ILIKE '%keyboard%';

EXPLAIN ANALYZE — Reading Query Plans

EXPLAIN ANALYZE is your primary tool for understanding and optimizing query performance. It shows the actual execution plan that PostgreSQL chose and the real-world time spent at each step:

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT c.full_name, COUNT(o.order_id) AS order_count
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
WHERE c.created_at >= '2026-01-01'
GROUP BY c.full_name
ORDER BY order_count DESC;

Sample output:

Sort  (cost=142.23..143.48 rows=500 width=50) (actual time=2.341..2.356 rows=245 loops=1)
  Sort Key: (count(o.order_id)) DESC
  ->  HashAggregate  (actual time=2.145..2.234 rows=245 loops=1)
        ->  Hash Left Join  (actual time=0.892..1.876 rows=980 loops=1)
              Hash Cond: (o.customer_id = c.customer_id)
              ->  Seq Scan on orders o  (actual time=0.015..0.432 rows=1000 loops=1)
              ->  Hash  (actual time=0.723..0.724 rows=245 loops=1)
                    ->  Index Scan using idx_customers_created_at on customers c
                          (actual time=0.023..0.687 rows=245 loops=1)
                          Index Cond: (created_at >= '2026-01-01')
Planning Time: 0.842 ms
Execution Time: 2.523 ms

Key things to look for:

Seq Scan on large tables: may indicate a missing index
Hash Join vs Nested Loop: Hash Join is usually better for large tables
actual time: the real execution time — if far from cost estimate, run ANALYZE to update statistics
rows= after actual: compare to estimated rows; big differences signal stale statistics

Running ANALYZE and VACUUM

PostgreSQL's query planner uses statistics about each table (column value distributions, row counts) to estimate query costs and choose execution plans. These statistics go stale as data changes:

-- Update statistics for one table
ANALYZE products;

-- Update statistics for all tables in the database
ANALYZE;

-- VACUUM: reclaim space from dead rows (usually runs automatically via autovacuum)
VACUUM ANALYZE orders;

PostgreSQL's autovacuum daemon runs VACUUM ANALYZE automatically in the background. For tables with unusually high write rates, you may need to tune autovacuum settings or run it manually after large bulk operations.

Index Maintenance

-- Find unused indexes (candidates for removal)
SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexname NOT LIKE '%_pkey'
ORDER BY schemaname, tablename, indexname;

-- Find tables with sequential scans (possible missing indexes)
SELECT relname, seq_scan, seq_tup_read, idx_scan
FROM pg_stat_user_tables
WHERE seq_scan > 1000
ORDER BY seq_tup_read DESC;

-- Rebuild a bloated index
REINDEX INDEX idx_orders_customer_id;

-- Rebuild all indexes on a table
REINDEX TABLE orders;

Indexing is one of the highest-leverage performance optimizations available. A single well-placed index can reduce a 10-second query to 10 milliseconds. The next lesson covers transactions and concurrency — understanding how PostgreSQL handles multiple simultaneous users safely.