beam-migrate: CREATE INDEX support #789

Merged
LaurentRDC merged 2 commits into haskell-beam:master from sheaf:create-index
Mar 24, 2026

@sheaf (Contributor) commented Mar 16, 2026

This PR adds functionality to add secondary indices to a database to speed up queries.

The user-facing API consists of the addTableIndex function, and the helper indexCol function.

Backend support goes via the new typeclass IsSql92CreateDropIndexSyntax, with support in both the SQLite and Postgres backends.

Comment on lines +408 to +444
-- Collect user-created secondary indices.
--
-- Excludes:
--   - primary keys
--   - indices that back a constraint (i.e. those created implicitly by UNIQUE/EXCLUDE)
--   - expression indices e.g. CREATE INDEX ON users (LOWER(email))
secondaryIndexes <-
  map (\(schema, tblNm, idxNm, isUniq, cols) ->
        Db.SomeDatabasePredicate
          (Db.TableHasIndex (Db.QualifiedName schema tblNm) idxNm (V.toList cols) isUniq)) <$>
    Pg.query_ conn (fromString (unlines
      [ -- NULL out 'public' since it is the implicit default schema in Postgres
        "SELECT NULLIF(ns.nspname, 'public'), c.relname, i.relname, ix.indisunique,"
        -- re-aggregate column names in index-key order (see ORDINALITY below)
      , " array_agg(a.attname ORDER BY k.n ASC)"
      , "FROM pg_index ix"
      , "JOIN pg_class c ON c.oid = ix.indrelid"
      , "JOIN pg_class i ON i.oid = ix.indexrelid"
      , "JOIN pg_namespace ns ON ns.oid = c.relnamespace"
        -- ORDINALITY allows retaining ordering of index columns
      , "CROSS JOIN unnest(ix.indkey) WITH ORDINALITY k(attid, n)"
      , "JOIN pg_attribute a ON a.attnum = k.attid AND a.attrelid = ix.indrelid"
        -- only regular tables (not views, sequences, etc.)
      , "WHERE c.relkind = 'r'"
        -- exclude Postgres system schemas
      , " AND ns.nspname NOT LIKE 'pg_%'"
      , " AND ns.nspname != 'information_schema'"
        -- exclude primary key indices
      , " AND NOT ix.indisprimary"
        -- exclude indices created implicitly by a UNIQUE or EXCLUDE constraint
      , " AND NOT EXISTS (SELECT 1 FROM pg_constraint con WHERE con.conindid = ix.indexrelid)"
        -- exclude expression indices: a key column number of 0 means that
        -- position is an expression (e.g. lower(col)) rather than a plain
        -- column reference, which TableHasIndex cannot represent
      , " AND NOT EXISTS (SELECT 1 FROM unnest(ix.indkey) AS k(attnum) WHERE k.attnum = 0)"
      , "GROUP BY ns.nspname, c.relname, i.relname, ix.indisunique" ]))

Contributor Author

Claude Code helped me write this, because I was completely out of my depth with it. I can't really vouch for its correctness.

@LaurentRDC (Member)

Thanks! I should be able to review in the next few days.

As a side-note: did you check out other attempts at this? e.g. #335

@sheaf (Contributor Author) commented Mar 16, 2026

> As a side-note: did you check out other attempts at this? e.g. #335

I wasn't aware of this approach; I'll take a look this week.

@sheaf (Contributor Author) commented Mar 17, 2026

#335 was adding the notion of index to beam-core, at a more fundamental level in the database schema, while this PR only adds it to beam-migrate. This means the current PR is more limited in scope, as it only really supports creating/dropping secondary indices (and computing their presence by introspection), but it doesn't include them in the schema itself.

#335 carries a lot of complexity because it tries to derive indices via generics from a separate record description of indices, with a separate DatabaseIndices type. This allows indices to be generated automatically, but judging from the PR review the approach had some fundamental limitations.
In this PR the approach is more low-level: it is not far off from declaring the indices as plain field names, except that the indexCol function retrieves the names instead of their being written by hand.

#335 was missing SQLite support, and had no schema introspection queries.

#335 also made DROP INDEX into a sub-command of ALTER TABLE, which I don't think is correct.

I took a couple of changes from that PR and pushed them here:

  • use MigrationKeepsData for DROP INDEX because it doesn't actually lose data in the database (the indices can be recalculated)
  • introduce an IndexOptions datatype instead of raw Bool for uniqueness, to make the design more extensible
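
For readers who want the second point spelled out, here is a minimal self-contained sketch of an options record standing in for a bare Bool. Everything in it is hypothetical: the record, its field name, and the two rendering helpers are illustrative stand-ins written for this note, not the definitions that this PR adds to beam-migrate.

```haskell
import Data.List (intercalate)

-- Hypothetical options record. Unlike a bare Bool, a record can grow
-- new fields later without changing every function that takes it.
data IndexOptions = IndexOptions
  { indexUnique :: Bool  -- ^ emit CREATE UNIQUE INDEX when True
  } deriving (Eq, Show)

-- Defaults: a plain, non-unique secondary index.
defaultIndexOptions :: IndexOptions
defaultIndexOptions = IndexOptions { indexUnique = False }

-- Render a CREATE INDEX statement from an index name, a table name and
-- the indexed columns. Identifiers are NOT quoted or escaped here, so
-- this is only suitable for illustration.
createIndexSql :: IndexOptions -> String -> String -> [String] -> String
createIndexSql opts idxName tblName cols =
  "CREATE " ++ (if indexUnique opts then "UNIQUE " else "")
    ++ "INDEX " ++ idxName ++ " ON " ++ tblName
    ++ " (" ++ intercalate ", " cols ++ ")"

-- DROP INDEX is a standalone statement, not a sub-command of ALTER TABLE.
-- It discards only the index structure; table rows are untouched.
dropIndexSql :: String -> String
dropIndexSql idxName = "DROP INDEX " ++ idxName
```

Loaded in GHCi, `createIndexSql (defaultIndexOptions { indexUnique = True }) "users_email_idx" "users" ["email"]` evaluates to `"CREATE UNIQUE INDEX users_email_idx ON users (email)"`.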

@sheaf (Contributor Author) commented Mar 17, 2026

In conclusion, I think this is simpler than #335 because it avoids the machinery for deriving secondary indices using Generics. That part also seemed to be what caused #335 to get bogged down. With this PR the approach is a bit more manual (in particular, the secondary indices are named manually).

None of this is meant as a judgement on #335 because that functionality is indeed quite appealing.

My use case is mainly to automate away manual index creation and have the migrations framework handle it, which this PR does quite well. But otherwise the approach here is a bit barebones in comparison.

@LaurentRDC (Member) left a comment

Looking pretty good!

Comment thread beam-migrate/Database/Beam/Migrate/SQL/SQL92.hs Outdated
Comment thread beam-migrate/Database/Beam/Migrate/Actions.hs Outdated
      , Eq (Sql92CreateIndexOptionsSyntax syntax)
      , Hashable (Sql92CreateIndexOptionsSyntax syntax)
      ) => IsSql92CreateDropIndexSyntax syntax where
  data family Sql92CreateIndexOptionsSyntax syntax
Member

I'm not familiar with data families. I would have expected an associated type family instead:

class ( IsSql92DdlCommandSyntax syntax
      , Show     (Sql92CreateIndexOptionsSyntax syntax)
      , Eq       (Sql92CreateIndexOptionsSyntax syntax)
      , Hashable (Sql92CreateIndexOptionsSyntax syntax)
      ) => IsSql92CreateDropIndexSyntax syntax where
  type Sql92CreateIndexOptionsSyntax syntax

That's how other syntaxes are represented in Beam.

Is there an advantage to using data families?

@sheaf (Contributor Author) commented Mar 21, 2026

It's one less indirection. With this setup one writes:

instance IsSql92CreateDropIndexSyntax MySyntax where
  data Sql92CreateIndexOptionsSyntax MySyntax
       = MySyntaxIndexOptions { field1 :: Ty1, field2 :: Ty2 }
    -- Show is included because the class lists it as a superclass constraint
    deriving stock (Show, Eq, Ord, Generic)
    deriving anyclass Hashable

whereas with a type family it would be:

instance IsSql92CreateDropIndexSyntax MySyntax where
  type Sql92CreateIndexOptionsSyntax MySyntax = MySyntaxIndexOptions

data MySyntaxIndexOptions
  = MySyntaxIndexOptions { field1 :: Ty1, field2 :: Ty2 }
    -- Show is included because the class lists it as a superclass constraint
    deriving stock (Show, Eq, Ord, Generic)
    deriving anyclass Hashable

The latter is strictly more boilerplate, and also less permissive as one cannot write unsaturated type families while one can write unsaturated data families. Perhaps not so relevant here, but it does make some type-level programming idioms impossible, e.g. (rough sketch):

type KnownSyntaxes = [PgCommandSyntax, SqliteCommandSyntax]

type AllIndicesSupport :: (Type -> Constraint) -> Constraint
type AllIndicesSupport c = All (c . Sql92CreateIndexOptionsSyntax) KnownSyntaxes

All that said, if you think it would be better for consistency I can switch the code to using type families.
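
The saturation point can be checked with a much smaller example than the rough sketch above. The following is a minimal illustration under stated assumptions: `DemoBackend`, `OptionsD`, `OptionsT` and `Apply` are hypothetical names invented for this note and do not appear in beam.

```haskell
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeFamilies   #-}

import Data.Kind (Type)

-- Hypothetical backend tag, used only to index the families below.
data DemoBackend

-- Data family: each instance introduces a fresh data type, so
-- 'OptionsD' behaves like an ordinary constructor of kind Type -> Type.
data family OptionsD (syntax :: Type) :: Type
newtype instance OptionsD DemoBackend = DemoOptionsD Bool

-- Type family: a type-level function, which GHC requires to be
-- fully applied wherever it is mentioned.
type family OptionsT (syntax :: Type) :: Type
type instance OptionsT DemoBackend = Bool

-- A wrapper that receives the constructor and its argument separately.
newtype Apply (f :: Type -> Type) (a :: Type) = Apply (f a)

-- Accepted: the data family is passed to 'Apply' unsaturated.
wrapped :: Apply OptionsD DemoBackend
wrapped = Apply (DemoOptionsD True)

-- Rejected if uncommented, because the type family is not fully applied:
-- bad :: Apply OptionsT DemoBackend
-- bad = Apply True
```

The trade-off is that `OptionsD DemoBackend` is the only name the data family instance has, which is exactly the standalone-type concern raised in the reply below.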

Member

In the type family case, MySyntaxIndexOptions has a standalone type, whereas the data family does not. Can the data family instance be used in a standalone way?

Imagine a backend-specific function on an index. Something like:

someFunc :: PgSyntaxIndexOptions -> Pg ()

In the data family case, would that be written like this?

someFunc :: Sql92CreateIndexOptionsSyntax PgSyntax -> Pg ()

If so, I lean towards keeping the consistency with other bits of beam by using type families

Contributor Author

Yes, it would be written as you say.

Member

All right, let's keep the consistency by using a type family instead of a data family, and then we can wrap this PR up!
Thank you for your patience through this review process.

Contributor Author

I've updated the class to use a type family. It has the unfortunate effect of being more ambiguous, because a type signature such as

indexIsUnique :: Sql92CreateIndexOptionsSyntax syntax -> Bool

which used to be unambiguous is now ambiguous (because syntax only appears guarded under a type family).

If you look at the commit I think you'll agree things are quite a bit less ergonomic like this, but I agree it's also important to keep the interface consistent.
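
The ambiguity is easy to reproduce in isolation. Below is a minimal sketch, assuming two toy classes that mirror the two designs. All names (`IndexSyntaxD`, `IndexSyntaxT`, `DemoBackend`, and the methods) are hypothetical and are not beam-migrate's actual class or functions.

```haskell
{-# LANGUAGE AllowAmbiguousTypes #-}
{-# LANGUAGE TypeApplications    #-}
{-# LANGUAGE TypeFamilies        #-}

-- Hypothetical backend tag.
data DemoBackend

-- Design 1: associated data family. Data families are injective, so
-- 'syntax' can be recovered from the argument type.
class IndexSyntaxD syntax where
  data IndexOptionsD syntax
  optionsAreUniqueD :: IndexOptionsD syntax -> Bool

instance IndexSyntaxD DemoBackend where
  newtype IndexOptionsD DemoBackend = DemoOptionsD Bool
  optionsAreUniqueD (DemoOptionsD u) = u

-- Design 2: associated type family. 'syntax' occurs only under the
-- family, so the method type is ambiguous; the class declaration is
-- accepted only because AllowAmbiguousTypes is enabled.
class IndexSyntaxT syntax where
  type IndexOptionsT syntax
  optionsAreUniqueT :: IndexOptionsT syntax -> Bool

instance IndexSyntaxT DemoBackend where
  type IndexOptionsT DemoBackend = Bool
  optionsAreUniqueT = id

-- With the data family, the instance is inferred from the argument.
viaDataFamily :: Bool
viaDataFamily = optionsAreUniqueD (DemoOptionsD True)

-- With the type family, the caller must name the instance explicitly.
viaTypeFamily :: Bool
viaTypeFamily = optionsAreUniqueT @DemoBackend True
```

A `Proxy syntax` argument would be an alternative to the type application. Either way, every call site has to say which syntax it means.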

Comment thread beam-migrate/Database/Beam/Migrate/Checks.hs Outdated

@sheaf force-pushed the create-index branch 2 times, most recently from 80445a4 to 8a454b7 (March 21, 2026 09:11)

This commit adds functionality to add secondary indices to a database
to speed up queries.

The user-facing API consists of the 'addTableIndex' function, and the
helper 'selectorColumnName' function.

Backend support goes via the new typeclass 'IsSql92CreateDropIndexSyntax',
with support in both the SQLite and Postgres backends.

@sheaf (Contributor Author) commented Mar 21, 2026

The CI failures on GHC 9.12 and 9.14 seem spurious, as they have to do with the installation of alex/happy in the CI environment. GHC 9.14 works fine for me locally.

@LaurentRDC (Member)

Awesome, @sheaf! Thanks for your contribution.

@LaurentRDC merged commit 670eb81 into haskell-beam:master on Mar 24, 2026
13 checks passed
@LaurentRDC (Member)

I should be able to make new releases for beam-migrate / beam-sqlite / beam-postgres today
