Add GROUP BY alias rendering to ClickHouse SQLAlchemy compiler#655

Merged
joe-clickhouse merged 2 commits into main from joe/add-groupby-alias-rendering
Feb 26, 2026

@joe-clickhouse
Contributor

@joe-clickhouse joe-clickhouse commented Feb 21, 2026

Summary

ClickHouse resolves SELECT aliases inside GROUP BY expressions, which causes circular-reference errors when an alias shadows a source column name. For example:

SELECT toStartOfDay(toDateTime(time)) AS time
...
GROUP BY toStartOfDay(toDateTime(time))

fails because time in the expression resolves to the alias, not the column.

This PR adds group_by_clause and visit_label overrides to ChStatementCompiler so that labeled expressions in GROUP BY render as their alias name instead of the full expression. Note that SELECT and ORDER BY rendering is unaffected.
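To illustrate the rendering rule described above, here is a minimal standalone sketch. It is not the ChStatementCompiler code from this PR and does not use SQLAlchemy; `Label`, `render_term`, `render_select`, and the `events` table are hypothetical names made up for the example. It only shows the idea: a labeled expression renders in full in SELECT and as its alias in GROUP BY.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Label:
    """A labeled SELECT expression, i.e. `expr AS name` (hypothetical helper)."""
    expr: str
    name: str


Term = Union[Label, str]


def render_term(term: Term, as_alias: bool = False) -> str:
    """Render one term; labeled terms collapse to their alias when as_alias is set."""
    if isinstance(term, Label):
        return term.name if as_alias else f"{term.expr} AS {term.name}"
    return term


def render_select(columns: List[Term], table: str, group_by: List[Term]) -> str:
    """Render SELECT with full expressions and GROUP BY with alias names."""
    select_sql = ", ".join(render_term(c) for c in columns)
    sql = f"SELECT {select_sql} FROM {table}"
    group_sql = ", ".join(render_term(g, as_alias=True) for g in group_by)
    if group_sql:
        sql += f" GROUP BY {group_sql}"
    return sql


# The alias `time` shadows the source column `time`, as in the example above.
day = Label("toStartOfDay(toDateTime(time))", "time")
sql = render_select([day, "count()"], "events", [day])
print(sql)
```

Because GROUP BY receives only the alias, the expression containing the shadowed column name appears once, in the SELECT list.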

This fixes the root cause behind apache/superset#33551.

Consequently, this should eliminate the need for Superset's alias-mangling workaround that appends hash suffixes to all ClickHouse column aliases to avoid these GROUP BY collisions.

To fully close the loop, I'll need to close apache/superset#34091, then open another PR that no-ops or removes the current _mutate_label behavior and requires a version of clickhouse-connect in Superset greater than or equal to whichever release includes this change.

Checklist

Delete items not relevant to your PR:

  • Unit and integration tests covering the common scenarios were added
  • A human-readable description of the changes was provided to include in CHANGELOG

@villebro villebro left a comment

LGTM, the Superset community will definitely ❤️ this improvement!

@joe-clickhouse
Contributor Author

Tested end to end by:

  1. ripping out the Superset mutation logic in the ClickHouse DB engine spec
  2. using this local clickhouse-connect build with the GROUP BY alias rendering

The chart renders correctly in the UI and the rendered SQL is as follows:

SELECT
  dateTrunc('DAY', toDateTime(dateTrunc('DAY', dt))) AS "startOfDay",
  sum(number) AS "sum(number)"
FROM "default"."test_dates"
GROUP BY
  "startOfDay"
ORDER BY
  "sum(number)" DESC
LIMIT 10000

@joe-clickhouse joe-clickhouse merged commit 44db042 into main Feb 26, 2026
34 checks passed
@joe-clickhouse
Contributor Author

joe-clickhouse commented Feb 26, 2026

Also worth noting: this change is backwards compatible with Superset's mutation logic. The GROUP BY change operates at the rendering level, so it doesn't care whether the alias is time or time_b1bc24. However, once the mutation logic is removed from Superset, we'll want to pin clickhouse-connect >= whatever release this change makes it into.
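As a rough sketch of what such a pin could look like as a runtime guard (an assumption; the actual pin may simply live in Superset's dependency metadata), the snippet below compares the installed clickhouse-connect version against a minimum. The release that will carry this change is not named in this thread, so `HYPOTHETICAL_MIN_VERSION` is a placeholder, and the helper names are invented for illustration.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Placeholder only: this thread does not say which release includes the change.
HYPOTHETICAL_MIN_VERSION = "9.9.9"


def version_tuple(text: str) -> tuple:
    """Reduce a version string to (major, minor, patch), ignoring suffixes."""
    parts = [int(p) for p in re.findall(r"\d+", text.split("+")[0])[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def meets_minimum(installed: str, minimum: str) -> bool:
    """True when the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def connector_supports_alias_group_by(minimum: str = HYPOTHETICAL_MIN_VERSION) -> bool:
    """Check the installed clickhouse-connect; False if it is not installed."""
    try:
        installed = version("clickhouse-connect")
    except PackageNotFoundError:
        return False
    return meets_minimum(installed, minimum)
```

A caller could keep the alias mutation enabled whenever `connector_supports_alias_group_by()` returns False, which matches the backwards-compatibility point above.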

@joe-clickhouse joe-clickhouse deleted the joe/add-groupby-alias-rendering branch May 18, 2026 21:06