On this page

Subqueries and CTEs

14 min read TextCh. 3 — Advanced Queries

Subqueries and CTEs

As queries grow more complex, you need tools for building queries incrementally — using the results of one query as the input to another. PostgreSQL provides two primary mechanisms: subqueries (queries nested inside another query) and CTEs (Common Table Expressions, introduced with the WITH keyword).

Both solve the same fundamental problem: how to work with derived data sets without creating permanent database objects. The choice between them is mostly about readability and reuse.

Subqueries

A subquery is a SELECT statement nested inside another SQL statement. Subqueries can appear in three positions:

  1. In the SELECT list (scalar subquery)
  2. In the FROM clause (derived table or inline view)
  3. In the WHERE clause (filtering subquery)

Scalar Subqueries

A scalar subquery returns exactly one row and one column. It can be used anywhere a single value is expected:

-- Show each product's price and how much it deviates from the global average
SELECT
    name,
    price,
    ROUND(AVG(price) OVER (), 2) AS global_avg,   -- window function alternative
    ROUND(price - (SELECT AVG(price) FROM products), 2) AS deviation
FROM products
ORDER BY deviation DESC;

If a scalar subquery returns more than one row, PostgreSQL throws a runtime error. If it returns zero rows, the result is NULL.

Subqueries in FROM — Derived Tables

A subquery in the FROM clause creates a temporary result set that the outer query treats as a table:

-- Find categories where the average price is above the overall average
SELECT category, avg_price
FROM (
    SELECT
        category,
        ROUND(AVG(price), 2) AS avg_price
    FROM products
    GROUP BY category
) AS category_stats
WHERE avg_price > (SELECT AVG(price) FROM products)
ORDER BY avg_price DESC;

The subquery inside FROM must always have an alias (AS category_stats). This is called a derived table or inline view.

Subqueries in WHERE

Subqueries in WHERE are used for filtering. They come in three forms:

Comparison Subqueries

-- Products priced above the overall average
SELECT name, price
FROM products
WHERE price > (SELECT AVG(price) FROM products)
ORDER BY price DESC;

IN Subqueries

-- Customers who have placed at least one completed order
SELECT full_name, email
FROM customers
WHERE customer_id IN (
    SELECT DISTINCT customer_id
    FROM orders
    WHERE status = 'completed'
);

EXISTS Subqueries

EXISTS is often more efficient than IN because it short-circuits as soon as it finds one matching row:

-- Customers who have at least one order (using EXISTS)
SELECT c.full_name, c.email
FROM customers c
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
);

-- Customers who have NEVER ordered (NOT EXISTS anti-join)
SELECT c.full_name, c.email
FROM customers c
WHERE NOT EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
);

The SELECT 1 inside EXISTS is conventional — EXISTS only checks whether any row is returned, not what the row contains.

Correlated Subqueries

A correlated subquery references a column from the outer query. It is re-evaluated for every outer row:

-- Products priced above their own category's average
SELECT name, category, price
FROM products p
WHERE price > (
    SELECT AVG(price)
    FROM products
    WHERE category = p.category  -- 'p.category' references the outer query
)
ORDER BY category, price DESC;

While powerful, correlated subqueries can be slow because they execute once per row in the outer query. For large tables, rewrite them as JOINs against aggregated subqueries or use window functions.

Common Table Expressions (CTEs)

A CTE (Common Table Expression) is a named, temporary result set defined before the main query with the WITH keyword:

WITH cte_name AS (
    SELECT ...
)
SELECT ... FROM cte_name;

CTEs make complex queries readable by naming intermediate results. They are especially valuable when the same intermediate result is referenced multiple times.

Multiple CTEs

WITH
high_value_orders AS (
    SELECT order_id, customer_id, total_amount
    FROM orders
    WHERE total_amount > 500 AND status = 'completed'
),
top_customers AS (
    SELECT customer_id, COUNT(*) AS order_count, SUM(total_amount) AS lifetime_value
    FROM high_value_orders
    GROUP BY customer_id
    HAVING COUNT(*) >= 3
),
customer_details AS (
    SELECT c.customer_id, c.full_name, c.email, tc.lifetime_value
    FROM customers c
    JOIN top_customers tc ON tc.customer_id = c.customer_id
)
SELECT
    full_name,
    email,
    ROUND(lifetime_value, 2) AS lifetime_value,
    RANK() OVER (ORDER BY lifetime_value DESC) AS rank
FROM customer_details
ORDER BY lifetime_value DESC;

Each CTE builds on the previous ones, making the query read almost like a recipe.

Writeable CTEs

CTEs can include INSERT, UPDATE, or DELETE statements, making complex multi-step data modifications possible:

WITH moved_orders AS (
    DELETE FROM orders_temp
    WHERE ordered_at < NOW() - INTERVAL '1 year'
    RETURNING *
)
INSERT INTO orders_archive
SELECT * FROM moved_orders;

This atomically moves old orders from a temporary table to an archive table in a single statement.

Recursive CTEs

A recursive CTE references itself, enabling queries that traverse hierarchical or graph structures — category trees, organizational charts, bill of materials, comment threads.

WITH RECURSIVE org_chart AS (
    -- Anchor member: the CEO (no manager)
    SELECT
        employee_id,
        full_name,
        manager_id,
        title,
        1 AS level,
        ARRAY[employee_id] AS path
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive member: direct reports of already-found employees
    SELECT
        e.employee_id,
        e.full_name,
        e.manager_id,
        e.title,
        oc.level + 1,
        oc.path || e.employee_id
    FROM employees e
    JOIN org_chart oc ON e.manager_id = oc.employee_id
    WHERE NOT e.employee_id = ANY(oc.path)  -- cycle guard
)
SELECT
    REPEAT('  ', level - 1) || full_name AS indented_name,
    title,
    level
FROM org_chart
ORDER BY path;

The structure of every recursive CTE is:

  1. Anchor member: the base case (non-recursive part), connected by UNION ALL
  2. Recursive member: references the CTE name itself, adding one more level per iteration
  3. Termination: recursion stops when the recursive member returns no more rows

The path array and NOT e.employee_id = ANY(oc.path) cycle guard prevent infinite loops if the data has cycles.

CTE vs Subquery — When to Use Which

Scenario Use
Single use, simple condition Inline subquery
Intermediate result used multiple times CTE
Complex query needing step-by-step logic Multiple CTEs
Hierarchical / recursive data Recursive CTE
Performance optimization needed Test both — CTEs are optimization fences in older PostgreSQL versions

In PostgreSQL 12+, CTEs are "inlined" by default (treated like subqueries by the planner), removing the old performance concern. For the rare cases where you want the old materialization behavior (forcing the CTE to run independently), use WITH MATERIALIZED.

Subqueries and CTEs are the building blocks for expressing any query logic, no matter how complex. Combined with the JOINs from the previous lesson, they give you the full power of relational algebra. Next, we look at views and PL/pgSQL functions for encapsulating this logic permanently in the database.

CTEs Improve Readability
Use CTEs (WITH clauses) to break complex queries into named, readable steps. Instead of a deeply nested query that is hard to debug, a CTE lets you name each intermediate result and refer to it by name. Think of CTEs as temporary named views scoped to a single query.
Correlated vs. Uncorrelated Subqueries
An uncorrelated subquery runs once and returns the same value for every outer row. A correlated subquery references a column from the outer query and runs once per outer row — which can be slow on large tables. Often, a correlated subquery can be rewritten as a JOIN or window function for better performance.
Recursive CTE Safety
Always add a termination condition to recursive CTEs to prevent infinite loops. A runaway recursive CTE can consume all server memory. Use the depth counter pattern and add a WHERE depth < 50 guard. PostgreSQL also supports the MAXRECURSION option for safety.
sql
-- Scalar subquery: returns a single value
SELECT name, price,
    price - (SELECT AVG(price) FROM products) AS diff_from_avg
FROM products
ORDER BY diff_from_avg DESC;

-- Subquery in WHERE: products priced above category average
SELECT name, category, price
FROM products p
WHERE price > (
    SELECT AVG(price)
    FROM products
    WHERE category = p.category  -- correlated: references outer query
)
ORDER BY category, price DESC;

-- Subquery in FROM (derived table)
SELECT category, avg_price
FROM (
    SELECT category, ROUND(AVG(price), 2) AS avg_price
    FROM products
    GROUP BY category
) AS category_averages
WHERE avg_price > 50.00;

-- CTE: WITH clause — reusable named subquery
WITH revenue_per_customer AS (
    SELECT
        c.customer_id,
        c.full_name,
        ROUND(SUM(oi.quantity * oi.unit_price), 2) AS total_spent
    FROM customers c
    JOIN orders      o  ON o.customer_id  = c.customer_id
    JOIN order_items oi ON oi.order_id    = o.order_id
    WHERE o.status = 'completed'
    GROUP BY c.customer_id, c.full_name
)
SELECT
    full_name,
    total_spent,
    CASE
        WHEN total_spent >= 1000 THEN 'VIP'
        WHEN total_spent >= 500  THEN 'Regular'
        ELSE 'New'
    END AS customer_tier
FROM revenue_per_customer
ORDER BY total_spent DESC;

-- Recursive CTE: traverse a category hierarchy
WITH RECURSIVE category_tree AS (
    -- Anchor: top-level categories
    SELECT category_id, name, parent_id, 0 AS depth
    FROM categories
    WHERE parent_id IS NULL

    UNION ALL

    -- Recursive: children of each found row
    SELECT c.category_id, c.name, c.parent_id, ct.depth + 1
    FROM categories c
    JOIN category_tree ct ON c.parent_id = ct.category_id
)
SELECT depth, REPEAT('  ', depth) || name AS indented_name
FROM category_tree
ORDER BY depth, name;