Skip to content

Clean table alias generation and generate unique aliases up-front #32784

@roji

Description

@roji

Table aliases need to be unique in the SQL tree (at least within a given SELECT) Our current table alias management works as follows:

  • SelectExpression has private _usedAliases state that tracks which aliases have already been used.
  • During translation, tables are created with aliases that don't yet take into account aliases used further up. As a result, when a new SQL clause (predicate, ordering...) is added to a SelectExpression, we run a visitor (AliasUniquifier) over the clause, which accepts the containing select's _usedAliases and rewrites aliases recursively inside the clause to make sure they don't conflict.
  • This has the following consequences:
    • This notably means that the same node can get visited multiple times for uniquification, as tree fragments get added to a select, and then that select gets added to another select, and so on.
    • Since we uniquify aliases late - after the table has already been created - the alias is mutable, and AliasUniquifier simply sets it. This is one place where our SQL tree isn't fully immutable, and can cause the usual issues related to mutability.
    • When we added support for UpdateExpression, we forgot to invoke the AliasUniquifier, causing #31208.

We're now exploring ways to simplify the query tree and its processing, and specifically to get rid of TableReferenceExpression and have ColumnExpression just contain its table alias directly as a string. For this to be feasible, the alias on a table cannot change after its constructed, since a column may already be referencing it via its alias (or more precisely, it can be changed, but then we need to recursively look for column in the tree and change them as well). We also generally want to make things more immutable, as well as remove private state from SelectExpression, making it more like a regular expression.

As a result, we should do the following:

  • When instantiating a table, pass an already generated, fully unique alias to it.
    • This allows referencing the table immediately from a column, and removes the need to do alias uniquification in an additional pass every time we add a clause to a SelectExpression.
  • Track used aliases globally in the query, rather than storing that state on SelectExpression.
    • We'll do this by introducing a new SqlAliasManager service, which is owned by QueryCompilationContext (so scoped to the entire compilation). The holds the state of which aliases are already in use, and is responsible for generating unique ones.
    • This also concentrates alias generation in a single, pluggable place - the actual policy for generating aliases (i.e. take the 1st character and lower-case) is currently spread all over the place, including in table expression types. Providers/users will be able to replace this service to plug in new alias generation policies.
  • Tables can get pruned out of the tree because they're not needed. In this case, we currently leave gaps in the alias numbering (#32757)
    • While this in itself isn't a big problem, uniquifying aliases eagerly as proposed above causes even more gaps, since during translation we sometimes translate a fragment and then throw it away (or retranslate it). This causes the alias to be considered "taken", although it never actually made its way into the tree.
    • To make things nice and consecutive, we'll add a post-processing step where SqlAliasManager gets a chance to go over the tree and rewrite, or "finalize" aliases. This will close any numbering gaps, and can also do fancier things in the future, like turn some aliases into wider, multi-character ones for readability rather than just adding numbers (#31134).
    • Note that alias post-processing is fully optional - removing that step would never cause any queries to break, since aliases are already uniquified at the very start. It would just cause uglier/less consecutive aliases to be generated.

Metadata

Metadata

Assignees

No fields configured for Feature.

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions