feat(duckDB): Add transpilation support for ANY_VALUE function with HAVING MAX and MIN clauses #6325

Merged
georgesittas merged 2 commits into main from RD-1049502-transpile-big-querys-any-value-aggregate-function-to-duck-db
Nov 14, 2025

@fivetran-amrutabhimsenayachit

Add transpilation support for ANY_VALUE function with HAVING MAX and MIN clauses.

Issue:

Transpilation:

python -m sqlglot --read bigquery --write duckdb 'WITH Store AS (SELECT 20 AS sold, "apples" AS fruit UNION ALL SELECT 30 AS sold, "pears" AS fruit UNION ALL SELECT 30 AS sold, "bananas" AS fruit UNION ALL SELECT 10 AS sold, "oranges" AS fruit) SELECT ANY_VALUE(fruit HAVING MAX sold) AS a_highest_selling_fruit FROM Store' --no-pretty
-->
WITH "Store" AS (SELECT 20 AS "sold", 'apples' AS "fruit" UNION ALL SELECT 30 AS "sold", 'pears' AS "fruit" UNION ALL SELECT 30 AS "sold", 'bananas' AS "fruit" UNION ALL SELECT 10 AS "sold", 'oranges' AS "fruit") SELECT ANY_VALUE("fruit" HAVING MAX "sold") AS "a_highest_selling_fruit" FROM "Store"

Duckdb:
duckdb -c "WITH "Store" AS (SELECT 20 AS "sold", 'apples' AS "fruit" UNION ALL SELECT 30 AS "sold", 'pears' AS "fruit" UNION ALL SELECT 30 AS "sold", 'bananas' AS "fruit" UNION ALL SELECT 10 AS "sold", 'oranges' AS "fruit") SELECT ANY_VALUE("fruit" HAVING MAX "sold") AS "a_highest_selling_fruit" FROM "Store""
Parser Error:
syntax error at or near "HAVING"

LINE 1: ... SELECT 10 AS sold, 'oranges' AS fruit) SELECT ANY_VALUE(fruit HAVING MAX sold) AS a_highest_selling_fruit FROM Store

Fix:
Transform ANY_VALUE(expr HAVING MAX/MIN having_expr) to ARG_MAX/ARG_MIN
MAX:

python -m sqlglot --read bigquery --write duckdb 'WITH Store AS (SELECT 20 AS sold, "apples" AS fruit UNION ALL SELECT 30 AS sold, "pears" AS fruit UNION ALL SELECT 30 AS sold, "bananas" AS fruit UNION ALL SELECT 10 AS sold, "oranges" AS fruit) SELECT ANY_VALUE(fruit HAVING MAX sold) AS a_highest_selling_fruit FROM Store' --no-pretty
--> WITH "Store" AS (SELECT 20 AS "sold", 'apples' AS "fruit" UNION ALL SELECT 30 AS "sold", 'pears' AS "fruit" UNION ALL SELECT 30 AS "sold", 'bananas' AS "fruit" UNION ALL SELECT 10 AS "sold", 'oranges' AS "fruit") SELECT ARG_MAX("fruit", "sold") AS "a_highest_selling_fruit" FROM "Store"

sqlglot % duckdb -c "WITH "Store" AS (SELECT 20 AS "sold", 'apples' AS "fruit" UNION ALL SELECT 30 AS "sold", 'pears' AS "fruit" UNION ALL SELECT 30 AS "sold", 'bananas' AS "fruit" UNION ALL SELECT 10 AS "sold", 'oranges' AS "fruit") SELECT ARG_MAX("fruit", "sold") AS "a_highest_selling_fruit" FROM "Store""
┌─────────────────────────┐
│ a_highest_selling_fruit │
│         varchar         │
├─────────────────────────┤
│ pears                   │
└─────────────────────────┘

MIN:

python -m sqlglot --read bigquery --write duckdb 'WITH Store AS (SELECT 20 AS sold, "apples" AS fruit UNION ALL SELECT 30 AS sold, "pears" AS fruit UNION ALL SELECT 30 AS sold, "bananas" AS fruit UNION ALL SELECT 10 AS sold, "oranges" AS fruit) SELECT ANY_VALUE(fruit HAVING MIN sold) AS a_lowest_selling_fruit FROM Store' --no-pretty
--> WITH "Store" AS (SELECT 20 AS "sold", 'apples' AS "fruit" UNION ALL SELECT 30 AS "sold", 'pears' AS "fruit" UNION ALL SELECT 30 AS "sold", 'bananas' AS "fruit" UNION ALL SELECT 10 AS "sold", 'oranges' AS "fruit") SELECT ARG_MIN("fruit", "sold") AS "a_lowest_selling_fruit" FROM "Store"

sqlglot % duckdb -c "WITH "Store" AS (SELECT 20 AS "sold", 'apples' AS "fruit" UNION ALL SELECT 30 AS "sold", 'pears' AS "fruit" UNION ALL SELECT 30 AS "sold", 'bananas' AS "fruit" UNION ALL SELECT 10 AS "sold", 'oranges' AS "fruit") SELECT ARG_MIN("fruit", "sold") AS "a_lowest_selling_fruit" FROM "Store""
┌────────────────────────┐
│ a_lowest_selling_fruit │
│        varchar         │
├────────────────────────┤
│ oranges                │
└────────────────────────┘
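For readers who want to see the shape of the rewrite in isolation, here is a minimal sketch of the ANY_VALUE(expr HAVING MAX/MIN having_expr) to ARG_MAX/ARG_MIN mapping. It is a plain regex string rewrite, not sqlglot's implementation (sqlglot operates on a parsed expression tree), and the function name `rewrite_any_value_having` is hypothetical. It assumes neither argument contains parentheses.

```python
import re

# Hypothetical helper for illustration only: a string-level model of the rewrite.
# Assumes the two arguments contain no parentheses.
_ANY_VALUE_HAVING = re.compile(
    r"ANY_VALUE\(\s*(?P<arg>[^()]+?)\s+HAVING\s+(?P<kind>MAX|MIN)\s+(?P<val>[^()]+?)\s*\)",
    re.IGNORECASE,
)


def rewrite_any_value_having(sql: str) -> str:
    """Rewrite ANY_VALUE(x HAVING MAX|MIN y) to ARG_MAX|ARG_MIN(x, y)."""

    def _replace(match: re.Match) -> str:
        kind = match.group("kind").upper()
        return f"ARG_{kind}({match.group('arg')}, {match.group('val')})"

    return _ANY_VALUE_HAVING.sub(_replace, sql)


if __name__ == "__main__":
    print(rewrite_any_value_having(
        'SELECT ANY_VALUE("fruit" HAVING MIN "sold") AS "a_lowest_selling_fruit" FROM "Store"'
    ))
```

A real transpiler needs the parsed tree to handle nested calls and quoting correctly; the regex only illustrates which argument ends up in which position.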

@geooo109 left a comment


I have a general comment here regarding the transpilation and the tie-breaking.

WITH data AS (
  SELECT 'A' AS fruit, 20 AS sold UNION ALL
  SELECT 'D' AS fruit, 20 AS sold UNION ALL
  SELECT 'C' AS fruit, 0 AS sold UNION ALL
  SELECT 'B' AS fruit, 0 AS sold 
)     
SELECT
  ANY_VALUE(fruit = 'D' HAVING MAX sold) AS res
FROM data;

^ this query in bq always results in false.

WITH data AS (
  SELECT 'A' AS fruit, 20 AS sold UNION ALL
  SELECT 'D' AS fruit, 20 AS sold UNION ALL
  SELECT 'C' AS fruit, 0 AS sold UNION ALL
  SELECT 'B' AS fruit, 0 AS sold 
)
SELECT arg_max(fruit = 'D', sold) AS res
 FROM data;

^ this query in duckdb results in true or false.

The comment here is about the non-deterministic nature of aggregate functions without ordering. This is expected behaviour: even though bq returned false across multiple runs, ANY_VALUE is non-deterministic, so either result would be valid. The same applies to DuckDB's aggregate functions.
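To make the tie-breaking point concrete, here is a small pure-Python model that lists every result an engine could legitimately return for the query above. It is only a sketch of the semantics discussed in this thread, not code from either engine; `having_max_candidates` is a hypothetical name, and skipping rows whose `sold` is NULL is an assumption.

```python
from typing import Callable, Iterable, Optional, Set, Tuple

Row = Tuple[Optional[str], Optional[int]]  # (fruit, sold)


def having_max_candidates(rows: Iterable[Row], expr: Callable[[Optional[str]], object]) -> Set[object]:
    """Model of ANY_VALUE(expr HAVING MAX sold): all values any tied row could yield."""
    rows = list(rows)
    # Assumption: rows whose ordering key is NULL are not considered.
    keys = [sold for _, sold in rows if sold is not None]
    if not keys:
        return {None}
    top = max(keys)
    # Every row tied at the maximum is a valid pick, so collect all outcomes.
    return {expr(fruit) for fruit, sold in rows if sold == top}


data = [("A", 20), ("D", 20), ("C", 0), ("B", 0)]
print(having_max_candidates(data, lambda fruit: fruit == "D"))
```

Both 'A' and 'D' are tied at sold = 20, so the candidate set contains both True and False; an engine that always returns one of them is still within the documented behaviour.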

@georgesittas any thoughts on that ?

@geooo109 left a comment


bq:
WITH data AS (
  SELECT 'A' AS fruit, 20 AS sold UNION ALL
  SELECT NULL AS fruit, 25 AS sold UNION ALL
  SELECT 'D' AS fruit, 20 AS sold
)
SELECT ANY_VALUE(fruit HAVING MAX sold) AS res
FROM data;
> null
duckdb:
WITH data AS (
  SELECT 'A' AS fruit, 20 AS sold UNION ALL
  SELECT NULL AS fruit, 25 AS sold UNION ALL
  SELECT 'D' AS fruit, 20 AS sold
 )
SELECT arg_max(fruit, sold) AS res
FROM data;
┌─────────┐
│ res     │
│ varchar │
├─────────┤
│ A       │
└─────────┘

It seems the current approach doesn't handle NULLs correctly (maybe arg_max_null?).
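The difference can be modelled in a few lines of Python. This is a sketch based on DuckDB's documented behaviour (arg_max ignores rows where either the argument or the value is NULL, while arg_max_null ignores only rows where the value is NULL); the function names are hypothetical, and resolving ties by taking the first row is an arbitrary choice made here.

```python
from typing import List, Optional, Tuple

Row = Tuple[Optional[str], Optional[int]]  # (fruit, sold)


def arg_max_model(rows: List[Row]) -> Optional[str]:
    """Model of arg_max(fruit, sold): rows with a NULL fruit or NULL sold are skipped."""
    candidates = [(f, s) for f, s in rows if f is not None and s is not None]
    if not candidates:
        return None
    # Ties are resolved by first occurrence here; a real engine may pick any tied row.
    return max(candidates, key=lambda row: row[1])[0]


def arg_max_null_model(rows: List[Row]) -> Optional[str]:
    """Model of arg_max_null(fruit, sold): only rows with a NULL sold are skipped."""
    candidates = [(f, s) for f, s in rows if s is not None]
    if not candidates:
        return None
    return max(candidates, key=lambda row: row[1])[0]


null_data = [("A", 20), (None, 25), ("D", 20)]
print(arg_max_model(null_data))       # the NULL-fruit row is dropped, so a row with sold = 20 wins
print(arg_max_null_model(null_data))  # the row with sold = 25 wins, and its fruit is NULL
```

With the data from the comment above, the first model returns 'A' and the second returns None, which matches the BigQuery result for ANY_VALUE(fruit HAVING MAX sold).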

@georgesittas

@georgesittas any thoughts on that ?

I'd say it seems fine, since BigQuery's docs specify that the result is non-deterministic:

Returns expression for some row chosen from the group. Which row is chosen is nondeterministic, not random.

This seems more important, and we should fix it:

It seems the current approach doesn't handle NULLs correctly (maybe arg_max_null?).

@geooo109

@fivetran-amrutabhimsenayachit so, let's just solve the null problem, we can try the arg_max_null and arg_min_null approach.

@fivetran-amrutabhimsenayachit force-pushed the RD-1049502-transpile-big-querys-any-value-aggregate-function-to-duck-db branch from a89eef2 to 4a1350b on November 14, 2025 17:15
@fivetran-amrutabhimsenayachit

After using arg_max_null and arg_min_null:
MAX:

 bq query --use_legacy_sql=false 'WITH data AS (SELECT "A" AS fruit, 20 AS sold UNION ALL SELECT NULL AS fruit, 25 AS sold UNION ALL SELECT "D" AS fruit, 20 AS sold) SELECT ANY_VALUE(fruit HAVING MAX sold) AS result FROM data'
+--------+
| result |
+--------+
| NULL   |
+--------+
sqlglot % python -m sqlglot --read bigquery --write duckdb 'WITH data AS (SELECT "A" AS fruit, 20 AS sold UNION ALL SELECT NULL AS fruit, 25 AS sold UNION ALL SELECT "D" AS fruit, 20 AS sold) SELECT ANY_VALUE(fruit HAVING MAX sold) AS result FROM data' --no-pretty | duckdb
┌─────────┐
│ result  │
│ varchar │
├─────────┤
│ NULL    │
└─────────┘

MIN:

bq query --use_legacy_sql=false 'WITH data AS (SELECT "A" AS fruit, 20 AS sold UNION ALL SELECT NULL AS fruit, 5 AS sold UNION ALL SELECT "D" AS fruit, 10 AS sold) SELECT ANY_VALUE(fruit HAVING MIN sold) AS result FROM data'
+--------+
| result |
+--------+
| NULL   |
+--------+
sqlglot % python -m sqlglot --read bigquery --write duckdb 'WITH data AS (SELECT "A" AS fruit, 20 AS sold UNION ALL SELECT NULL AS fruit, 5 AS sold UNION ALL SELECT "D" AS fruit, 10 AS sold) SELECT ANY_VALUE(fruit HAVING MIN sold) AS result FROM data' --no-pretty | duckdb
┌─────────┐
│ result  │
│ varchar │
├─────────┤
│ NULL    │
└─────────┘

@georgesittas left a comment


Nice

@georgesittas merged commit b71990f into main on Nov 14, 2025
7 checks passed
@georgesittas deleted the RD-1049502-transpile-big-querys-any-value-aggregate-function-to-duck-db branch on November 14, 2025 18:32