Fix ODBC datetime literal parsing in IN clauses #1
Closed
tkaunlaky-e6 wants to merge 438 commits into main
* feat(teradata): parse column format syntax Support Teradata FORMAT column syntax Add Teradata format tests Add comments and docs for Teradata FORMAT column Modified Comments * style and linter modifications * feat(optimizer)!: annotate type for SHA and SHA2 (tobymao#5346) * chore(optimizer)!: annotate type SHA1, SHA256, SHA512 for BigQuery (tobymao#5347) * additional linter cleanup --------- Co-authored-by: Giorgos Michas <geomichas96@gmail.com>
* Feat(duckdb): support new lambda syntax * Improve testing coverage, fix multi-arg version
* feat(optimizer)!: annotate type for DATETIME * fix format
* feat(optimizer)!: annotate type for ENDS_WITH * minor test refactor
* Override round for postgres generator * Code style changes * Include `ROUND(x, y)` test --------- Co-authored-by: Jo <46752250+georgesittas@users.noreply.github.com>
* feat(optimizer)!: parse and annotate type for ASCII * fix annotation
…o#5380) * chore(postgres, hive): use ASCII node instead of UNICODE node * refactor tests
* feat(dremio): Add TIME_MAPPING for Dremio dialect * Fix linter checks * Address comments --------- Co-authored-by: Mateusz Poleski <Mateusz.Poleski@imc.com>
…o#5387) * fix(snowflake): transpile bigquery CURRENT_DATE with timezone * PR feedback 1 (vag) * fix test
P2 - INTERVAL '5 hours 30 minutes' (as discussed on Zepto channel, you mentioned it has to be developed and you will check on it)
…r for FIND_IN_SET
…ning_collabrative
# Conflicts:
#   apis/utils/supported_functions_in_all_dialects.json
#   sqlglot/dialects/databricks.py
#   sqlglot/dialects/e6.py
#   sqlglot/dialects/spark.py
#   sqlglot/expressions.py
#   tests/dialects/test_e6.py
Add TRANSLATE, TYPEOF, and FIND_IN_SET function mappings
- Add TIMESTAMP_MILLIS and TIMESTAMP_MICROS functions to Spark parser (Spark 3.1+)
- Add TIMESTAMP_MILLIS and TIMESTAMP_MICROS functions to Databricks parser
- Convert microseconds to milliseconds by dividing by 1000 in E6 dialect
- Use proper UnixToTime scale constants for consistency across dialects
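To make the microsecond-to-millisecond conversion concrete, here is a small Python sketch of the arithmetic only. It is not the dialect code: the helper names `micros_to_millis` and `from_unix_millis` are hypothetical, and the use of integer (floor) division is an assumption, since the commit message only says "dividing by 1000".

```python
from datetime import datetime, timedelta, timezone

# Unix epoch as a timezone-aware datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_to_millis(micros: int) -> int:
    """Scale epoch microseconds down to epoch milliseconds.

    Assumption: integer division, so sub-millisecond precision is dropped.
    """
    return micros // 1000


def from_unix_millis(millis: int) -> datetime:
    """Interpret an epoch-milliseconds value as a UTC timestamp."""
    return EPOCH + timedelta(milliseconds=millis)


micros = 1_700_000_000_123_456
millis = micros_to_millis(micros)
print(millis, from_unix_millis(millis).isoformat())
```

The point of the sketch is that a microsecond input has to be rescaled before it is handed to a millisecond-based function; any precision below one millisecond is lost in this version.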
- Merge statistical functions support (CORR, COVAR_POP, COVAR_SAMP, VARIANCE_SAMP, VAR_SAMP)
- Add GROUP BY ALL support
- Integrate TYPEOF, TIMEDIFF, and INTERVAL functions
- Update supported functions list for e6 dialect
…unctions Add GROUP BY ALL and statistical functions support
Map TIMESTAMP_MILLIS to FROM_UNIXTIME
I also fixed the missing comma in the JSON file.
Rebase demo
There were some merge conflicts when the rebase was merged, including some changes in transforms.py. While resolving the conflicts I added some things on my branch that led to an error; that is also sorted now. I also ran make check.
EXTRACT- ISSUE
…ONCAT_WS
- Replace hardcoded 'ARRAY' string with proper exp.Array expression node
- Use self.func('ARRAY_TO_STRING', ...) directly to avoid ARRAY_JOIN mapping
- Maintain all Databricks CONCAT_WS behaviors (NULL filtering, array flattening)
feat: implement CONCAT_WS function for E6 dialect
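The two behaviors named above (NULL filtering and array flattening) can be modeled in a few lines of plain Python. This is only a sketch of the intended semantics, not the generator code in the E6 dialect. The NULL-separator rule and the empty-string result for no arguments reflect my understanding of Databricks' `concat_ws` and should be treated as assumptions.

```python
from typing import Optional, Sequence, Union

Arg = Union[str, Sequence[Optional[str]], None]


def concat_ws(sep: Optional[str], *args: Arg) -> Optional[str]:
    """Toy model of CONCAT_WS with NULL filtering and array flattening."""
    # Assumption: a NULL separator makes the whole result NULL.
    if sep is None:
        return None

    parts = []
    for arg in args:
        if arg is None:
            # NULL arguments are skipped entirely.
            continue
        if isinstance(arg, (list, tuple)):
            # Arrays are flattened; NULL elements inside them are skipped too.
            parts.extend(item for item in arg if item is not None)
        else:
            parts.append(arg)
    return sep.join(parts)


print(concat_ws(",", "Spark", ["S", "Q", None, "L"], None))
```

Mapping this onto `ARRAY_TO_STRING` over a single array argument, as the commit describes, works because joining a flattened, NULL-free array with a separator is the same operation.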
Previously, ODBC datetime literals like {d '2025-05-31'} would fail to parse
when used inside IN clauses, throwing 'Expecting )' errors. This occurred
because _parse_primary() only called _parse_paren() for unmatched tokens,
but _parse_paren() only handles L_PAREN tokens, not L_BRACE tokens used by
ODBC literals.
The fix adds ODBC literal detection directly in _parse_primary() before
falling back to _parse_paren(). This leverages the existing
_parse_odbc_datetime_literal() method that was already implemented but
unreachable from the primary expression parsing path.
Changes:
- Added ODBC literal check in _parse_primary() method (lines 5813-5821)
- Supports {d 'date'}, {t 'time'}, and {ts 'timestamp'} formats
- Works in all SQL contexts: SELECT, WHERE, IN clauses, etc.
Fixes parsing errors for Databricks queries with ODBC escape sequences
in complex expressions.
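The failure mode and the fix can be reproduced with a toy recursive-descent parser. Everything below is a hypothetical stand-in written for illustration: `ToyParser`, its method names, and the `odbc_in_primary` switch are not sqlglot's API, and the tokenizer only knows the handful of tokens needed for an `IN` list.

```python
import re

TOKEN_RE = re.compile(
    r"\s*(?:(?P<STRING>'[^']*')|(?P<VAR>[A-Za-z_]\w*)|(?P<L_PAREN>\()"
    r"|(?P<R_PAREN>\))|(?P<L_BRACE>\{)|(?P<R_BRACE>\})|(?P<COMMA>,))"
)
ODBC_KINDS = {"d": "DATE", "t": "TIME", "ts": "TIMESTAMP"}


def tokenize(sql):
    sql, pos, out = sql.rstrip(), 0, []
    while pos < len(sql):
        m = TOKEN_RE.match(sql, pos)
        if not m:
            raise ValueError(f"Unexpected character at position {pos}")
        out.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return out


class ToyParser:
    def __init__(self, sql, odbc_in_primary=True):
        self.tokens, self.i = tokenize(sql), 0
        self.odbc_in_primary = odbc_in_primary  # False mimics the old behavior

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else (None, None)

    def match(self, kind):
        if self.peek()[0] != kind:
            return None
        self.i += 1
        return self.tokens[self.i - 1][1]

    def expect(self, kind):
        value = self.match(kind)
        if value is None:
            raise SyntaxError(f"Expecting {kind}")
        return value

    def parse_primary(self):
        kind, text = self.peek()
        if kind == "STRING":
            self.i += 1
            return ("literal", text[1:-1])
        if kind == "VAR":
            self.i += 1
            return ("column", text)
        # The fix: look for an ODBC literal before the paren fallback.
        if self.odbc_in_primary and kind == "L_BRACE":
            return self.parse_odbc_literal()
        return self.parse_paren()

    def parse_odbc_literal(self):
        self.expect("L_BRACE")
        marker = self.expect("VAR").lower()
        if marker not in ODBC_KINDS:
            raise SyntaxError(f"Unknown ODBC marker {marker!r}")
        value = self.expect("STRING")[1:-1]
        self.expect("R_BRACE")
        return (ODBC_KINDS[marker], value)

    def parse_paren(self):
        # Only understands "(", so "{" falls through and yields None.
        if self.match("L_PAREN") is None:
            return None
        expr = self.parse_primary()
        self.expect("R_PAREN")
        return expr

    def parse_in(self):
        column = self.parse_primary()
        if self.expect("VAR").upper() != "IN":
            raise SyntaxError("Expecting IN")
        self.expect("L_PAREN")
        items = []
        while True:
            item = self.parse_primary()
            if item is None:
                break
            items.append(item)
            if self.match("COMMA") is None:
                break
        self.expect("R_PAREN")
        return ("in", column, items)
```

With the switch off, `ToyParser("dt IN ({d '2025-05-31'})", odbc_in_primary=False).parse_in()` stops at the `{` and raises `SyntaxError("Expecting R_PAREN")`, which mirrors the "Expecting )" error described above. With the switch on, the same input parses into a `DATE` item.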
Enhanced the ODBC datetime literal parsing to be more robust by checking
for the exact pattern (L_BRACE, VAR, STRING, R_BRACE) to avoid false
positives with struct/map literals.
Changes:
- Updated parser to verify the third token is STRING to distinguish ODBC
literals from struct literals like {d: Map(...)}
- Added comprehensive test cases for all ODBC datetime formats (date, time, timestamp)
- Tests cover various SQL contexts: SELECT, WHERE, IN clauses, BETWEEN
- Included tests for single and multiple ODBC literals in IN clauses
- Added test case similar to the original complex Databricks query
This ensures ODBC datetime literals work correctly in all SQL contexts
while avoiding conflicts with other brace-delimited syntax.
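The four-token lookahead (L_BRACE, VAR, STRING, R_BRACE) can be sketched as a standalone check. The tokenizer and the `looks_like_odbc_literal` helper below are hypothetical and only approximate the token kinds named in the commit message. Treating the marker case-insensitively is an assumption on my part.

```python
import re

# Minimal tokenizer: anything that is not a string, identifier, or brace
# is lumped into OTHER (colons, parentheses, dots, and so on).
TOKEN_RE = re.compile(
    r"\s*(?:(?P<STRING>'[^']*')|(?P<VAR>[A-Za-z_]\w*)"
    r"|(?P<L_BRACE>\{)|(?P<R_BRACE>\})|(?P<OTHER>.))"
)
ODBC_MARKERS = {"d", "t", "ts"}


def tokenize(sql):
    return [(m.lastgroup, m.group(m.lastgroup)) for m in TOKEN_RE.finditer(sql.strip())]


def looks_like_odbc_literal(tokens, start=0):
    """Return True only for the exact shape { <marker> '<string>' }."""
    window = tokens[start:start + 4]
    kinds = [kind for kind, _ in window]
    if kinds != ["L_BRACE", "VAR", "STRING", "R_BRACE"]:
        return False
    # Assumption: markers are compared case-insensitively.
    return window[1][1].lower() in ODBC_MARKERS


for text in ("{d '2025-05-31'}", "{ts '2025-05-31 10:00:00'}", "{d: Map(...)}"):
    print(text, looks_like_odbc_literal(tokenize(text)))
```

For `{d: Map(...)}` the third token is a colon rather than a STRING, so the check rejects it and the struct/map path is left free to handle the input.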
Benchmark for 0bb5632
Problem
ODBC datetime literals like `{d '2025-05-31'}` were failing to parse when used inside IN clauses in Databricks SQL queries, throwing "Expecting )" errors. The parser's `_parse_primary()` method would fall back to `_parse_paren()`, which only handles `L_PAREN` tokens, not the `L_BRACE` tokens used by ODBC literals. This caused complex Databricks queries with multiple date literals in IN clauses to fail parsing.
Fix
Added ODBC literal detection directly in the `_parse_primary()` method before falling back to `_parse_paren()`, checking for the exact pattern: `{`, followed by a valid ODBC type (`d`, `t`, or `ts`), followed by a STRING token. This leverages the existing `_parse_odbc_datetime_literal()` method that was already implemented but unreachable from the primary expression parsing path. The fix also ensures struct/map literals like `{d: Map(...)}` are not incorrectly identified as ODBC literals by verifying the third token is a STRING.
Testing