Fix ODBC datetime literal parsing in IN clauses #1

Closed
tkaunlaky-e6 wants to merge 438 commits into main from fix-odbc-date-literals-in-clause

Conversation

@tkaunlaky-e6
Owner

Problem

ODBC datetime literals like {d '2025-05-31'} failed to parse when used inside IN clauses in Databricks SQL queries, throwing "Expecting )" errors. For these literals the parser's _parse_primary() method fell back to _parse_paren(), which handles only L_PAREN tokens, not the L_BRACE tokens that open ODBC literals. As a result, complex Databricks queries with multiple date literals in IN clauses could not be parsed.

Fix

Added ODBC literal detection directly in the _parse_primary() method, before the fallback to _parse_paren(). The check looks for the exact pattern: {, followed by a valid ODBC type (d, t, or ts), followed by a STRING token. This reuses the existing _parse_odbc_datetime_literal() method, which was already implemented but unreachable from the primary expression parsing path. Verifying that the third token is a STRING also ensures that struct/map literals like {d: Map(...)} are not misidentified as ODBC literals.
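
The lookahead described above can be illustrated with a minimal, self-contained sketch. This is not the code from this PR and it does not use sqlglot's parser classes: the Token type, the token kind names as plain strings, and the is_odbc_datetime_literal() helper are all hypothetical and exist only to show the three-token check.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    """Hypothetical stand-in for a lexer token (not sqlglot's Token class)."""
    kind: str  # e.g. "L_BRACE", "VAR", "STRING", "COLON", "R_BRACE"
    text: str


# ODBC type markers named in the PR description: date, time, timestamp.
ODBC_TYPES = {"d", "t", "ts"}


def is_odbc_datetime_literal(tokens: List[Token], pos: int) -> bool:
    """Return True if tokens[pos:] starts with `{`, an ODBC type, then a STRING."""
    window = tokens[pos:pos + 3]
    if len(window) < 3:
        return False
    brace, type_marker, value = window
    return (
        brace.kind == "L_BRACE"
        and type_marker.kind == "VAR"
        and type_marker.text in ODBC_TYPES
        # The STRING check is what separates {d '...'} from {d: Map(...)}.
        and value.kind == "STRING"
    )


# Token stream for: {d '2025-05-31'}
odbc_tokens = [
    Token("L_BRACE", "{"),
    Token("VAR", "d"),
    Token("STRING", "2025-05-31"),
    Token("R_BRACE", "}"),
]

# Token stream for the start of a struct/map literal: {d: Map(...)}
struct_tokens = [
    Token("L_BRACE", "{"),
    Token("VAR", "d"),
    Token("COLON", ":"),
    Token("VAR", "Map"),
]

odbc_result = is_odbc_datetime_literal(odbc_tokens, 0)
struct_result = is_odbc_datetime_literal(struct_tokens, 0)
```

In this sketch only the first stream matches; the struct stream is rejected because its third token is a COLON rather than a STRING, so a real parser would continue down its normal brace-handling path.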

Testing

  • Added comprehensive test cases covering all ODBC datetime formats (date, time, timestamp)
  • Tests verify parsing in various SQL contexts: SELECT, WHERE, IN clauses, BETWEEN
  • All tests pass successfully with proper transpilation to E6 and other dialects

geooo109 and others added 30 commits July 3, 2025 18:15
* feat(teradata): parse column format syntax

Support Teradata FORMAT column syntax

Add Teradata format tests

Add comments and docs for Teradata FORMAT column

Modified Comments

* style and linter modifications

* feat(optimizer)!: annotate type for SHA and SHA2 (tobymao#5346)

* chore(optimizer)!: annotate type SHA1, SHA256, SHA512 for BigQuery (tobymao#5347)

* additional linter cleanup

---------

Co-authored-by: Giorgos Michas <geomichas96@gmail.com>
* Feat(duckdb): support new lambda syntax

* Improve testing coverage, fix multi-arg version
* feat(optimizer)!: annotate type for DATETIME

* fix format
* feat(optimizer)!: annotate type for ENDS_WITH

* minor test refactor
)

* feat(fabric): Treat TIMESTAMPTZ as TIMESTAMP if not used with AT TIME ZONE

* fix(fabric): simplify TIMESTAMPTZ handling

* fix(fabric): Convert TIMESTAMPTZ to UTC if not within AT TIME ZONE
* Override round for postgres generator

* Code style changes

* Include `ROUND(x, y)` test

---------

Co-authored-by: Jo <46752250+georgesittas@users.noreply.github.com>
* feat(optimizer)!: parse and annotate type for ASCII

* fix annotation
…o#5380)

* chore(postgres, hive): use ASCII node instead of UNICODE node

* refactor tests
* feat(dremio): Add TIME_MAPPING for Dremio dialect

* Fix linter checks

* Address comments

---------

Co-authored-by: Mateusz Poleski <Mateusz.Poleski@imc.com>
…o#5387)

* fix(snowflake): transpile bigquery CURRENT_DATE with timezone

* PR feedback 1 (vag)

* fix test
NiranjGaurav and others added 28 commits July 31, 2025 13:57
P2 - INTERVAL '5 hours 30 minutes' (as discussed on Zepto channel, you mentioned it has to be developed and you will check on it)
P2 - INTERVAL '5 hours 30 minutes' (as discussed on Zepto channel, you mentioned it has to be developed and you will check on it)
…ning_collabrative

# Conflicts:
#	apis/utils/supported_functions_in_all_dialects.json
#	sqlglot/dialects/databricks.py
#	sqlglot/dialects/e6.py
#	sqlglot/dialects/spark.py
#	sqlglot/expressions.py
#	tests/dialects/test_e6.py
Add TRANSLATE, TYPEOF, and FIND_IN_SET function mappings
- Add TIMESTAMP_MILLIS and TIMESTAMP_MICROS functions to Spark parser (Spark 3.1+)
- Add TIMESTAMP_MILLIS and TIMESTAMP_MICROS functions to Databricks parser
- Convert microseconds to milliseconds by dividing by 1000 in E6 dialect
- Use proper UnixToTime scale constants for consistency across dialects
- Merge statistical functions support (CORR, COVAR_POP, COVAR_SAMP, VARIANCE_SAMP, VAR_SAMP)
- Add GROUP BY ALL support
- Integrate TYPEOF, TIMEDIFF, and INTERVAL functions
- Update supported functions list for e6 dialect
…unctions

Add GROUP BY ALL and statistical functions support
I also fixed the missing comma in the JSON file.
There were some merge conflicts after the rebase that involved changes in transforms.py. While resolving them I added some things on my branch that led to an error; that is also sorted now.
Along with this, I ran make check too.
…ONCAT_WS

- Replace hardcoded 'ARRAY' string with proper exp.Array expression node
- Use self.func('ARRAY_TO_STRING', ...) directly to avoid ARRAY_JOIN mapping
- Maintain all Databricks CONCAT_WS behaviors (NULL filtering, array flattening)
feat: implement CONCAT_WS function for E6 dialect
Previously, ODBC datetime literals like {d '2025-05-31'} would fail to parse
when used inside IN clauses, throwing 'Expecting )' errors. This occurred
because _parse_primary() only called _parse_paren() for unmatched tokens,
but _parse_paren() only handles L_PAREN tokens, not L_BRACE tokens used by
ODBC literals.

The fix adds ODBC literal detection directly in _parse_primary() before
falling back to _parse_paren(). This leverages the existing
_parse_odbc_datetime_literal() method that was already implemented but
unreachable from the primary expression parsing path.

Changes:
- Added ODBC literal check in _parse_primary() method (lines 5813-5821)
- Supports {d 'date'}, {t 'time'}, and {ts 'timestamp'} formats
- Works in all SQL contexts: SELECT, WHERE, IN clauses, etc.

Fixes parsing errors for Databricks queries with ODBC escape sequences
in complex expressions.
Enhanced the ODBC datetime literal parsing to be more robust by checking
for the exact pattern (L_BRACE, VAR, STRING, R_BRACE) to avoid false
positives with struct/map literals.

Changes:
- Updated parser to verify the third token is STRING to distinguish ODBC
  literals from struct literals like {d: Map(...)}
- Added comprehensive test cases for all ODBC datetime formats (date, time, timestamp)
- Tests cover various SQL contexts: SELECT, WHERE, IN clauses, BETWEEN
- Included tests for single and multiple ODBC literals in IN clauses
- Added test case similar to the original complex Databricks query

This ensures ODBC datetime literals work correctly in all SQL contexts
while avoiding conflicts with other brace-delimited syntax.
@github-actions

github-actions bot commented Aug 7, 2025

Benchmark for 0bb5632

| Test | Base | PR | % |
|------|------|----|---|
| long | 212.7±2.42µs | 216.8±2.07µs | +1.93% |
