fix(duckdb)!: Clean up representation of exp.HexString#6045
Conversation
|
So both with and without this PR's changes, the DuckDB SQL returns a BLOB. It seems, though, that the before vs after values don't match: D SELECT CAST(HEX(FROM_HEX('ABCD')) AS VARBINARY) = FROM_HEX('ABCD');
┌──────────────────────────────────────────────────────────┐
│ (CAST(hex(from_hex('ABCD')) AS BLOB) = from_hex('ABCD')) │
│ boolean │
├──────────────────────────────────────────────────────────┤
│ false │
└──────────────────────────────────────────────────────────┘Can you provide some more details on why the new one is correct and matches Snowflake's behavior better? (I'm assuming DuckDB roundtrip isn't affected in any way? Do we test this?) |
DuckDB doesn't have a hex string afaik, so
It seemed to me that the DuckDB post-PR results matched better with Snowflake's, e.g I'd expect that fetching them directly would net However, interestingly enough this what their DataFrames actually return (using SQLMesh's engine adapter):
>>> ctx.engine_adapter
<sqlmesh.core.engine_adapter.snowflake.SnowflakeEngineAdapter object at ...>
>>> ctx.engine_adapter.fetchdf("SELECT x'ABCD'")
X'ABCD'
0 b'\xab\xcd'
>>> ctx.engine_adapter
<sqlmesh.core.engine_adapter.duckdb.DuckDBEngineAdapter object at ...>
>>> ctx.engine_adapter.fetchdf("SELECT FROM_HEX('ABCD')")
from_hex('ABCD')
0 [171, 205]
>>> ctx.engine_adapter.fetchdf("SELECT CAST(HEX(FROM_HEX('ABCD')) AS BLOB)")
CAST(hex(from_hex('ABCD')) AS BLOB)
0 [65, 66, 67, 68]So, it looks to me like In terms of actual DF values, it seems to me that the previous one matches better semantically (?) |
|
Yeah. I don't really understand the motivation behind the original issue, tbh. @kyle-cheung, can you shed some light on this? |
|
@georgesittas Sigma (the BI tool) uses Hex strings for many of their queries that involve user inputs. Sigma casts the readable hex strings to text and subsequently applies a filter on them. For example below: However, since SQLGlot transpiles this to create the value as |
|
I see, thank you for clarifying. @VaggelisD can you take a look at what other dialects do? Hex strings are natively supported in BigQuery, Postgres, etc. I wonder if "fixing" DuckDB's generator to match Snowflake's behavior will break others. In that case, we should probably match the most common semantics (if possible). |
|
@georgesittas These are the results from a quick pass:
So, it looks to me like most dialects convert hex strings to the decimal representation. |
|
These are the dataframe results for most of the dialects referenced, will be more representative:
|
|
Ok, it doesn't seem like there's a standard representation. I don't see a compelling reason to change the existing behavior without controlling the output with a new arg to accommodate other transpilation paths as well. |
georgesittas
left a comment
There was a problem hiding this comment.
Discussed in Slack and concluded that this is fine after all.
For dialects like BigQuery, doing CAST(0xA AS STRING) results in 10, so this should be transpiled to DuckDB correctly today.
There are a couple exceptions that return non-readable characters when casting these bytes sequences to a string, but I think it's fine to not deal with them for now.
LGTM @VaggelisD 👍

Fixes #6035
Note that DuckDB does not allow hex strings: