beam-migrate: CREATE INDEX support by sheaf · Pull Request #789 · haskell-beam/beam

sheaf · 2026-03-16T19:15:28Z

This PR adds functionality to add secondary indices to a database to speed up queries.

The user-facing API consists of the addTableIndex function, and the helper indexCol function.

Backend support goes via the new typeclass IsSql92CreateDropIndexSyntax, with support in both the SQLite and Postgres backends.
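For orientation, here is a minimal sketch of the kind of SQL such a backend would have to emit. It does not use beam's API: `IndexSpec`, `quoteIdent`, `renderCreateIndex` and `renderDropIndex` are hypothetical names invented for this illustration, and the real `IsSql92CreateDropIndexSyntax` class may be shaped quite differently.

```haskell
import Data.List (intercalate)

-- Hypothetical description of a secondary index (not beam's actual type).
data IndexSpec = IndexSpec
  { indexName    :: String
  , indexTable   :: String
  , indexColumns :: [String]
  , indexUnique  :: Bool
  }

-- Quote an identifier, doubling any embedded double quotes.
quoteIdent :: String -> String
quoteIdent s = "\"" ++ concatMap esc s ++ "\""
  where
    esc '"' = "\"\""
    esc c   = [c]

-- Render a CREATE [UNIQUE] INDEX statement over plain column references.
renderCreateIndex :: IndexSpec -> String
renderCreateIndex (IndexSpec nm tbl cols uniq) =
  "CREATE " ++ (if uniq then "UNIQUE " else "") ++ "INDEX " ++ quoteIdent nm
    ++ " ON " ++ quoteIdent tbl
    ++ " (" ++ intercalate ", " (map quoteIdent cols) ++ ")"

-- Render a standalone DROP INDEX statement.
renderDropIndex :: String -> String
renderDropIndex nm = "DROP INDEX " ++ quoteIdent nm
```

Both statements are standalone DDL commands, accepted in this form by Postgres and SQLite alike.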

sheaf · 2026-03-16T19:16:23Z

+     -- Collect user-created secondary indices.
+     --
+     -- Excludes:
+     --   - primary keys
+     --   - indices that back a constraint (i.e. those created implicitly by UNIQUE/EXCLUDE)
+     --   - expression indices e.g. CREATE INDEX ON users (LOWER(email))
+     secondaryIndexes <-
+       map (\(schema, tblNm, idxNm, isUniq, cols) ->
+              Db.SomeDatabasePredicate
+                (Db.TableHasIndex (Db.QualifiedName schema tblNm) idxNm (V.toList cols) isUniq)) <$>
+       Pg.query_ conn (fromString (unlines
+         [ -- NULL out 'public' since it is the implicit default schema in Postgres
+           "SELECT NULLIF(ns.nspname, 'public'), c.relname, i.relname, ix.indisunique,"
+           -- re-aggregate column names in index-key order (see ORDINALITY below)
+         , "       array_agg(a.attname ORDER BY k.n ASC)"
+         , "FROM pg_index ix"
+         , "JOIN pg_class c ON c.oid = ix.indrelid"
+         , "JOIN pg_class i ON i.oid = ix.indexrelid"
+         , "JOIN pg_namespace ns ON ns.oid = c.relnamespace"
+           -- ORDINALITY allows retaining ordering of index columns
+         , "CROSS JOIN unnest(ix.indkey) WITH ORDINALITY k(attid, n)"
+         , "JOIN pg_attribute a ON a.attnum = k.attid AND a.attrelid = ix.indrelid"
+           -- only regular tables (not views, sequences, etc.)
+         , "WHERE c.relkind = 'r'"
+           -- exclude Postgres system schemas
+         , "  AND ns.nspname NOT LIKE 'pg_%'"
+         , "  AND ns.nspname != 'information_schema'"
+           -- exclude primary key indices
+         , "  AND NOT ix.indisprimary"
+           -- exclude indices created implicitly by a UNIQUE or EXCLUDE constraint
+         , "  AND NOT EXISTS (SELECT 1 FROM pg_constraint con WHERE con.conindid = ix.indexrelid)"
+           -- exclude expression indices: a key column number of 0 means that
+           -- position is an expression (e.g. lower(col)) rather than a plain
+           -- column reference, which TableHasIndex cannot represent
+         , "  AND NOT EXISTS (SELECT 1 FROM unnest(ix.indkey) AS k(attnum) WHERE k.attnum = 0)"
+         , "GROUP BY ns.nspname, c.relname, i.relname, ix.indisunique" ]))
+


Claude Code helped me write this, because I was completely out of my depth with it. I can't really vouch for its correctness.

Based on a cursory reading of the documentation it looks fine. It's exercised in tests.
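To make the intent of the query easier to check, here is a small pure model of its column-resolution step. This is only a sketch: `Attributes` and `resolveIndexColumns` are hypothetical names, the association list stands in for `pg_attribute`, and the list of attribute numbers stands in for `ix.indkey`.

```haskell
-- Models pg_attribute for a single table: attribute number -> column name.
type Attributes = [(Int, String)]

-- Resolve the key attribute numbers of an index to column names, keeping
-- index-key order (the job of WITH ORDINALITY + ORDER BY in the query).
--
-- A key entry of 0 marks an expression column, so the whole index is
-- rejected (the job of the final NOT EXISTS clause).
--
-- Simplification: an attribute number missing from the table also rejects
-- the index here, whereas the SQL inner join would merely drop that column.
resolveIndexColumns :: Attributes -> [Int] -> Maybe [String]
resolveIndexColumns attrs = traverse resolve
  where
    resolve 0 = Nothing
    resolve n = lookup n attrs
```

Note that the join on `pg_attribute` alone would not exclude expression indices: no attribute has number 0, so the join would silently drop that key position and report the index with fewer columns. That is why the explicit `NOT EXISTS` check is needed.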

LaurentRDC · 2026-03-16T19:43:47Z

Thanks! I should be able to review in the next few days.

As a side-note: did you check out other attempts at this? e.g. #335

sheaf · 2026-03-16T20:10:13Z

> As a side-note: did you check out other attempts at this? e.g. #335

I wasn't aware of this approach, I'll take a look this week.

sheaf · 2026-03-17T08:31:28Z

#335 was adding the notion of index to beam-core, at a more fundamental level in the database schema, while this PR only adds it to beam-migrate. This means the current PR is more limited in scope, as it only really supports creating/dropping secondary indices (and computing their presence by introspection), but it doesn't include them in the schema itself.

#335 has a lot of complexity due to trying to derive indices from a separate record description of indices using generics, with a separate DatabaseIndices type. This allows indices to be automatically generated, but judging from PR review the approach had some fundamental limitations.
In this PR the approach is more low-level, not far off from just declaring the indices as simple field names (but using the indexCol function to retrieve the names instead of manually writing them).

#335 was missing SQLite support, and had no schema introspection queries.

#335 also made DROP INDEX into a sub-command of ALTER TABLE, which I don't think is correct.

I took a couple of changes from that PR and pushed them here:

- use MigrationKeepsData for DROP INDEX because it doesn't actually lose data in the database (the indices can be recalculated)
- introduce an IndexOptions datatype instead of raw Bool for uniqueness, to make the design more extensible
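The extensibility argument for an options record over a raw Bool can be sketched as follows. The field and value names below are hypothetical and are not taken from the PR; the only assumption is that the record carries at least a uniqueness flag.

```haskell
-- Hypothetical options record; the real IndexOptions may have other fields.
data IndexOptions = IndexOptions
  { indexOptUnique :: Bool
    -- Further options (for example an index method) could be added here
    -- later without changing the type of functions that take IndexOptions.
  } deriving (Eq, Show)

-- Callers start from the defaults and override what they need, so adding
-- a field later does not break them.
defaultIndexOptions :: IndexOptions
defaultIndexOptions = IndexOptions { indexOptUnique = False }

uniqueIndexOptions :: IndexOptions
uniqueIndexOptions = defaultIndexOptions { indexOptUnique = True }
```

With a bare Bool argument, every new option would instead add a positional parameter at each call site.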

sheaf · 2026-03-17T12:07:05Z

In conclusion, I think this is simpler than #335 because it avoids the machinery for deriving secondary indices using Generics. That part also seemed to be what caused #335 to get bogged down. With this PR the approach is a bit more manual (in particular, the secondary indices are named manually).

None of this is meant as a judgement on #335 because that functionality is indeed quite appealing.

My use case is mainly to automate away manual index creation and have the migrations framework handle it, which this PR does quite well. But otherwise the approach here is a bit barebones in comparison.

LaurentRDC

Looking pretty good!

LaurentRDC · 2026-03-21T01:30:55Z

+      , Eq       (Sql92CreateIndexOptionsSyntax syntax)
+      , Hashable (Sql92CreateIndexOptionsSyntax syntax)
+      ) => IsSql92CreateDropIndexSyntax syntax where
+  data family Sql92CreateIndexOptionsSyntax syntax


I'm not familiar with data families. I would have expected an associated type family instead:

    class ( IsSql92DdlCommandSyntax syntax
          , Show     (Sql92CreateIndexOptionsSyntax syntax)
          , Eq       (Sql92CreateIndexOptionsSyntax syntax)
          , Hashable (Sql92CreateIndexOptionsSyntax syntax)
          ) => IsSql92CreateDropIndexSyntax syntax where
      type Sql92CreateIndexOptionsSyntax syntax

That's how other syntaxes are represented in Beam.

Is there an advantage to using data families?

It's one less indirection. With this setup one writes:

    instance IsSql92CreateDropIndexSyntax MySyntax where
      data Sql92CreateIndexOptionsSyntax MySyntax =
        MySyntaxIndexOptions { field1 :: Ty1, field2 :: Ty2 }
        deriving stock (Eq, Ord, Generic)
        deriving anyclass Hashable

whereas with a type family it would be:

    instance IsSql92CreateDropIndexSyntax MySyntax where
      type Sql92CreateIndexOptionsSyntax MySyntax = MySyntaxIndexOptions

    data MySyntaxIndexOptions =
      MySyntaxIndexOptions { field1 :: Ty1, field2 :: Ty2 }
      deriving stock (Eq, Ord, Generic)
      deriving anyclass Hashable

The latter is strictly more boilerplate, and also less permissive as one cannot write unsaturated type families while one can write unsaturated data families. Perhaps not so relevant here, but it does make some type-level programming idioms impossible, e.g. (rough sketch):

    type KnownSyntaxes = [PgCommandSyntax, SqliteCommandSyntax]

    type AllIndicesSupport :: (Type -> Constraint) -> Constraint
    type AllIndicesSupport c = All (c . Sql92CreateIndexOptionsSyntax) KnownSyntaxes

All that said, if you think it would be better for consistency I can switch the code to using type families.

In the type family case, MySyntaxIndexOptions has a standalone type, whereas the data family does not. Can the data family instance be used in a standalone way?

Imagine a backend-specific function on an index. Something like:

someFunc :: PgSyntaxIndexOptions -> Pg ()

In the data family case, would that be written like this?

someFunc :: Sql92CreateIndexOptionsSyntax PgSyntax -> Pg ()

If so, I lean towards keeping consistency with other bits of beam by using type families.

Yes, it would be written as you say.
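The difference in how the options type is named can be seen in a self-contained sketch. Everything below is a hypothetical stand-in (`PgSyntax`, the two classes, and the `describe*` functions are invented for illustration); it only mirrors the shape of the two designs discussed in this thread.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Hypothetical backend tag.
data PgSyntax = PgSyntax

-- Design 1: associated data family. The instance itself defines the type.
class HasIndexOptionsD syntax where
  data IndexOptionsD syntax
  isUniqueD :: IndexOptionsD syntax -> Bool

instance HasIndexOptionsD PgSyntax where
  data IndexOptionsD PgSyntax = PgIndexOptionsD { pgUniqueD :: Bool }
  isUniqueD = pgUniqueD

-- Design 2: associated type family pointing at a standalone type.
class HasIndexOptionsT syntax where
  type IndexOptionsT syntax

data PgIndexOptions = PgIndexOptions { pgUnique :: Bool }

instance HasIndexOptionsT PgSyntax where
  type IndexOptionsT PgSyntax = PgIndexOptions

-- With the data family, the family application is the only name available.
describeD :: IndexOptionsD PgSyntax -> String
describeD opts = if isUniqueD opts then "unique" else "non-unique"

-- With the type family, the standalone name can be used directly...
describeT :: PgIndexOptions -> String
describeT opts = if pgUnique opts then "unique" else "non-unique"

-- ...and the family application is an equivalent spelling of the same type.
describeT' :: IndexOptionsT PgSyntax -> String
describeT' = describeT
```

In the data family version, a type synonym could recover a short name if one were wanted, but it would have to be declared separately.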

All right, let's keep the consistency by using a type family instead of a data family, and then we can wrap this PR up!
Thank you for your patience through this review process.

I've updated the class to use a type family. It has the unfortunate effect of being more ambiguous, because a type signature such as

indexIsUnique :: Sql92CreateIndexOptionsSyntax syntax -> Bool

which used to be unambiguous is now ambiguous (because syntax only appears guarded under a type family).

If you look at the commit I think you'll agree things are quite a bit less ergonomic like this, but I agree it's also important to keep the interface consistent.
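The ambiguity can be reproduced in isolation. The sketch below uses hypothetical stand-ins (`HasIndexOptions`, `DemoSyntax`, `DemoIndexOptions`) and shows two common ways to disambiguate, a type application and a Proxy argument; it is not a claim about which approach the commit takes.

```haskell
{-# LANGUAGE AllowAmbiguousTypes #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE TypeFamilies #-}

import Data.Proxy (Proxy (..))

-- Hypothetical backend tag and its options type.
data DemoSyntax = DemoSyntax

newtype DemoIndexOptions = DemoIndexOptions { demoUnique :: Bool }

class HasIndexOptions syntax where
  type IndexOptions syntax

  -- 'syntax' occurs only under the (non-injective) type family, so GHC
  -- cannot infer it from an argument; this signature is rejected unless
  -- AllowAmbiguousTypes is enabled.
  indexIsUnique :: IndexOptions syntax -> Bool

  -- Alternative: mention 'syntax' outside the family via a Proxy.
  indexIsUniqueP :: Proxy syntax -> IndexOptions syntax -> Bool

instance HasIndexOptions DemoSyntax where
  type IndexOptions DemoSyntax = DemoIndexOptions
  indexIsUnique = demoUnique
  indexIsUniqueP _ = demoUnique

-- Call sites must name the syntax explicitly in both styles.
viaTypeApplication :: Bool
viaTypeApplication = indexIsUnique @DemoSyntax (DemoIndexOptions True)

viaProxy :: Bool
viaProxy = indexIsUniqueP (Proxy :: Proxy DemoSyntax) (DemoIndexOptions False)
```

With a data family neither workaround is needed, because data families are injective and `syntax` is determined by the argument type.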

LaurentRDC · 2026-03-21T01:41:27Z

Based on a cursory reading of the documentation it looks fine. It's exercised in tests.

This commit adds functionality to add secondary indices to a database to speed up queries. The user-facing API consists of the 'addTableIndex' function, and the helper 'selectorColumnName' function. Backend support goes via the new typeclass 'IsSql92CreateDropIndexSyntax', with support in both the SQLite and Postgres backends.

sheaf · 2026-03-21T09:31:35Z

The failures on 9.12 and 9.14 seem spurious as they have to do with the installation of alex/happy in the CI environment. 9.14 works fine for me locally.

LaurentRDC · 2026-03-24T13:52:35Z

Awesome, @sheaf! Thanks for your contribution.

LaurentRDC · 2026-03-24T14:01:15Z

I should be able to make new releases for beam-migrate / beam-sqlite / beam-postgres today.

LaurentRDC · 2026-03-24T17:45:28Z

Released:

sheaf force-pushed the create-index branch from bd14fd9 to 5878c39 Compare March 17, 2026 12:00

LaurentRDC reviewed Mar 17, 2026

Comment thread beam-migrate/Database/Beam/Migrate/SQL/SQL92.hs Outdated

Comment thread beam-migrate/Database/Beam/Migrate/Actions.hs Outdated

sheaf force-pushed the create-index branch from 5878c39 to 77103dc Compare March 19, 2026 10:20

LaurentRDC reviewed Mar 21, 2026

sheaf force-pushed the create-index branch 2 times, most recently from 80445a4 to 8a454b7 Compare March 21, 2026 09:11

sheaf force-pushed the create-index branch from 8a454b7 to 623a1fe Compare March 21, 2026 09:22

Make Sql92CreateIndexOptionsSyntax into a type family

7f45fc3

LaurentRDC approved these changes Mar 24, 2026

LaurentRDC merged commit 670eb81 into haskell-beam:master Mar 24, 2026
13 checks passed