Skip to content

Fix(athena): Correctly handle PartitionByProperty when it contains Iceberg transforms#4962

Closed
erindru wants to merge 1 commit intomainfrom
erin/athena-ctas-iceberg-transforms
Closed

Fix(athena): Correctly handle PartitionByProperty when it contains Iceberg transforms#4962
erindru wants to merge 1 commit intomainfrom
erin/athena-ctas-iceberg-transforms

Conversation

@erindru
Copy link
Collaborator

@erindru erindru commented Apr 9, 2025

This addresses part of SQLMesh issue: TobikoData/sqlmesh#4090

Given the following:

>>> prop = exp.PartitionedByProperty(this=exp.Schema(expressions=[exp.func("bucket", 4, exp.column("foo"))]))

Prior to this PR, when generating against Athena's Trino engine, the value would be truncated:

>>> prop.sql(dialect="athena")
"partitioned_by=ARRAY['bucket']" #should be `partitioned_by=ARRAY['bucket(4, foo)']`

The same value works correctly in Hive mode:

>>> prop.sql(dialect="hive")
'PARTITIONED BY (BUCKET(4, foo))'

However, simply fixing the truncation is not enough. The arguments are swapped when running in Hive mode vs Trino Mode. The argument order is documented in the AWS docs here for Hive and here for Trino.

This PR does the following:

  • Fix the truncation in the Presto generator (which is a superclass of Athena's Trino generator). Presto and Trino both support the Iceberg partition transform syntax and the current truncation is a bug for them as well.
  • Reorder the arguments for the bucket and truncate Iceberg transforms to be correct for whatever Athena generator is being used (Hive or Trino)

…eberg transforms and is used in both CREATE TABLE and CTAS
# Hive ref: https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg-creating-tables.html#querying-iceberg-partitioning
# Trino ref: https://docs.aws.amazon.com/athena/latest/ug/create-table-as.html#ctas-table-properties

def _reorder(e: exp.Expression) -> exp.Expression:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think this is right, you just want to alter the generator of the property, not do this complex transform

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just reordering the arguments before it hits the func generator.

I didnt think this logic belonged in the generator for exp.Func because it's very specific:

  • It only applies to exp.Func instances that are nested inside an exp.Schema that is itself nested inside exp.PartitionByProperty
  • It only applies to Athena
  • It only applies to two very specific functions that are used in Iceberg partition transforms, bucket and truncate

Copy link
Collaborator

@georgesittas georgesittas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, although I would clean up a couple of things. If @tobymao's good with getting this in I can do a quick refactor afterwards and share it with you folks.

@erindru
Copy link
Collaborator Author

erindru commented Apr 9, 2025

Closing in favour of a different approach that factors Iceberg partition transforms into their own AST nodes so they can be handled as first-class AST nodes instead of anonymous functions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants