fix(bigquery)!: Do not normalize JSON fields in dot notation#6320

Merged
VaggelisD merged 6 commits into main from vaggelisd/bq_json_key_normalization
Nov 14, 2025

@VaggelisD
Collaborator

In BigQuery, JSON field lookups in dot notation are case-sensitive, e.g.:

bq> WITH t AS (SELECT PARSE_JSON('{"fOo": {"BaR": 1}}') AS bla) SELECT t.bla.fOo.BaR FROM t;
BaR
1

bq> WITH t AS (SELECT PARSE_JSON('{"fOo": {"BaR": 1}}') AS bla) SELECT t.bla.foo.bar FROM t;
bar
NULL

However, until now SQLGlot normalized all BigQuery identifiers as case-insensitive. That is valid for STRUCT field lookups and other identifiers, but it can change the semantics of these JSON accesses.
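To make the failure mode concrete, here is a minimal pure-Python sketch of the two BigQuery queries above. It does not use SQLGlot; the `lookup` helper is hypothetical and only mimics a case-sensitive JSON path lookup, with `None` standing in for SQL NULL.

```python
import json


def lookup(doc, path):
    """Follow `path` through nested dicts, matching keys case-sensitively.

    Returns None (standing in for SQL NULL) when a key is missing.
    """
    current = doc
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


doc = json.loads('{"fOo": {"BaR": 1}}')

# The path exactly as the user wrote it: t.bla.fOo.BaR
original_path = ["fOo", "BaR"]

# The same path after blanket case-insensitive normalization: t.bla.foo.bar
normalized_path = [part.lower() for part in original_path]

original_result = lookup(doc, original_path)
normalized_result = lookup(doc, normalized_path)
```

Lowercasing the path is harmless for a STRUCT field, but for a JSON value it silently turns a hit into a miss, which is the regression this PR addresses.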


This PR fixes this behavior through the following steps:

  1. During normalize_identifiers: Preserve the original column + dot parts in the column's meta
  2. During qualify_columns: Distinguish the column from the dot parts, i.e., keep only the latter in the meta
  3. During annotate_types: Once all of the types are fully known, traverse JSON columns upwards to revert/repair the dot parts
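The three steps above can be sketched with a toy model. This is not SQLGlot's implementation: `DottedRef`, the function names, the meta keys, and the `column_index` argument are all hypothetical, and the toy lowercases every part during normalization purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DottedRef:
    """Toy stand-in for a dotted reference such as t.bla.fOo.BaR."""
    parts: List[str]
    meta: Dict[str, object] = field(default_factory=dict)


def normalize(ref: DottedRef) -> None:
    # Step 1: stash the original spelling before lowercasing everything.
    ref.meta["original_parts"] = list(ref.parts)
    ref.parts = [part.lower() for part in ref.parts]


def qualify(ref: DottedRef, column_index: int) -> None:
    # Step 2: once we know which part is the column, keep only the
    # trailing dot parts (the potential JSON keys) in the meta.
    original = ref.meta.pop("original_parts")
    ref.meta["dot_parts"] = original[column_index + 1:]
    ref.meta["column_index"] = column_index


def annotate(ref: DottedRef, column_type: str) -> None:
    # Step 3: with the column type known, restore the original spelling
    # of the dot parts only if the column is JSON.
    dot_parts = ref.meta.pop("dot_parts", [])
    column_index = ref.meta.pop("column_index")
    if column_type == "JSON" and dot_parts:
        ref.parts[column_index + 1:] = dot_parts


def run_pipeline(parts: List[str], column_index: int, column_type: str) -> List[str]:
    ref = DottedRef(list(parts))
    normalize(ref)
    qualify(ref, column_index)
    annotate(ref, column_type)
    return ref.parts


json_parts = run_pipeline(["T", "Bla", "fOo", "BaR"], 1, "JSON")
struct_parts = run_pipeline(["T", "Bla", "fOo", "BaR"], 1, "STRUCT")
```

The restoration is deferred to the last step because, in this sketch as in the description above, the column's type is only known after type annotation; until then the original spelling simply rides along in the meta.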

@VaggelisD VaggelisD force-pushed the vaggelisd/bq_json_key_normalization branch 2 times, most recently from 8c22aeb to 2cb0475 Compare November 13, 2025 16:01
@VaggelisD VaggelisD force-pushed the vaggelisd/bq_json_key_normalization branch from 2cb0475 to 5c9c45d Compare November 13, 2025 16:03
@VaggelisD
Collaborator Author

VaggelisD commented Nov 14, 2025

Here's my research thus far:

Case sensitivity

Dialect | JSON case sensitivity
-- | --
BigQuery | Yes
Snowflake | Yes
Databricks | Yes
DuckDB | Yes
Clickhouse | Yes

Repros:

snowflake> WITH t AS (SELECT PARSE_JSON('{"a": {"A": 1}, "A": {"a": 2}}') AS col) SELECT col:a.A, col:A.a from t;
COL:A.A | COL:A.A
-- | --
1 | 2

databricks> WITH t AS (SELECT PARSE_JSON('{"a": {"A": 1}, "A": {"a": 2}}') AS col) SELECT col:a.A, col:A.a from t;
A	a
1	2

duckdb> WITH t AS (SELECT JSON '{"a": {"A": 1}, "A": {"a": 2}}' AS col) SELECT col.a.A, col.A.a from t;
┌──────┬──────┐
│  A   │  a   │
│ json │ json │
├──────┼──────┤
│ 1    │ 2    │
└──────┴──────┘

clickhouse (online playground)> WITH t AS (SELECT '{"a": {"A": 1}, "A": {"a": 2}}'::JSON AS col) SELECT col.a.A, col.A.a from t;
col.a.A | col.A.a 
    1        2        


Other dialects

  • Postgres: Has no dot notation for JSON; the JSON extraction operator (->) requires string literals for key names:
postgres> WITH t AS (SELECT '{"a": {"A": 1}, "A": {"a": 2}}'::JSON AS col) SELECT col.a.A, col.A.a from t;
ERROR:  missing FROM-clause entry for table "a"

postgres> WITH t AS (SELECT '{"a": {"A": 1}, "A": {"a": 2}}'::JSON AS col) SELECT col->a->A, col->A->a from t;
ERROR:  column "a" does not exist
                                                             ^
postgres> WITH t AS (SELECT '{"a": {"A": 1}, "A": {"a": 2}}'::JSON AS col) SELECT col->'a'->'A', col->'A'->'a' from t;
 ?column? | ?column?
----------+----------
 1        | 2
(1 row)

  • Presto/Trino: Dot notation is limited to ROW objects:
trino> WITH t AS (SELECT CAST('{"a": {"A": 1}, "A": {"a": 2}}' AS JSON) AS col) SELECT col.a.A, col.A.a from t;
Query 20251114_094456_00003_maqp5 failed: line 1:81: Expression col is not of type ROW

@georgesittas
Collaborator

Nice work @VaggelisD, seems like Toby's hunch was right. Let's get rid of the flag and just assume that JSON values will always have case-sensitive keys.

VaggelisD and others added 2 commits November 14, 2025 11:49
Co-authored-by: Jo <46752250+georgesittas@users.noreply.github.com>
@VaggelisD VaggelisD merged commit 85ddcc5 into main Nov 14, 2025
7 checks passed
@VaggelisD VaggelisD deleted the vaggelisd/bq_json_key_normalization branch November 14, 2025 14:43