test: demonstrate INT96 read as TimestampNTZ correctness issue #4154
Draft
andygrove wants to merge 1 commit into apache:main
0lai0 reviewed on Apr 30, 2026
```scala
val actual = rows.head.getAs[LocalDateTime](0)
assert(
  actual != LocalDateTime.parse("2020-01-01T12:00:00"),
  s"native_datafusion returned the original wall-clock value $actual; " +
```
Thanks @andygrove. It looks like the strings on lines 75-76 have an `s` interpolator prefix but don't contain any variables, so the prefix can be dropped.
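The reviewer's point can be illustrated with a minimal Scala sketch: an `s`-prefixed literal that contains no `$name` or `${expr}` references is equal to the plain literal, so the prefix does nothing there. The strings and the `InterpolatorPrefixDemo` object below are hypothetical and are not the PR's actual lines 75-76.

```scala
object InterpolatorPrefixDemo {
  // Hypothetical message fragments, for illustration only.
  val withPrefix: String = s"native_datafusion should not silently shift timestamps"
  val plain: String = "native_datafusion should not silently shift timestamps"

  // The prefix only has an effect when the literal references a value.
  val rowCount: Int = 1
  val interpolated: String = s"read $rowCount row(s)"

  def main(args: Array[String]): Unit = {
    println(withPrefix == plain) // true: nothing to interpolate
    println(interpolated)        // read 1 row(s)
  }
}
```

Dropping a redundant prefix is purely a readability fix: it signals to readers that no substitution is intended in that literal.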
Which issue does this PR close?
Related to #3720.
Rationale for this change
Issue #3720 documents that the `native_datafusion` scan can silently return incorrect timestamp values when a Parquet file stores INT96 timestamps and the read schema requests `TimestampNTZ`. Spark itself raises an error in this case (SPARK-36182) to prevent the unsafe LTZ-to-NTZ reinterpretation. There is no regression test on `main` that captures the silent miscompute, so future changes could mask or unmask it without anyone noticing.

This PR adds a single targeted test that demonstrates the bug as it exists on `main`, so that the test suite records a reproducer that PR #4087 (or any future fix) can convert into a correctness assertion.

What changes are included in this PR?
A new `ParquetInt96NtzCorrectnessSuite` containing one test that:

- Sets `SESSION_LOCAL_TIMEZONE=America/Los_Angeles`, `PARQUET_OUTPUT_TIMESTAMP_TYPE=INT96`, and `USE_V1_SOURCE_LIST=parquet`.
- Writes `2020-01-01 12:00:00` as `TimestampType` (encoded as INT96).
- Asserts that `spark.read.schema("ts timestamp_ntz").parquet(...)` raises `SparkException` (Spark's reference behavior).
- With `spark.comet.scan.impl=native_datafusion`, reads the same file as `TimestampNTZ` and asserts that the returned `LocalDateTime` does not equal the original `2020-01-01T12:00:00`, capturing the silent wall-clock divergence.

How are these changes tested?
The new suite is the test. Verified locally against `apache/main` at `050e1e2f7` with Spark 3.5: the test passes, confirming the divergence between Spark's behavior (it throws) and `native_datafusion`'s behavior (it returns a shifted wall-clock value).
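To make the "shifted wall-clock value" concrete, here is a minimal `java.time` sketch of why reading an LTZ timestamp as NTZ is unsafe. It assumes that the INT96 value written under `America/Los_Angeles` holds the UTC-normalized instant, and that an unsafe reader treats that instant as a zone-less wall clock in UTC. This PR does not state which exact value `native_datafusion` returns (the test only asserts inequality), so the UTC reading below is one plausible reinterpretation, not a description of Comet's code path. `Int96NtzShift` is a hypothetical name.

```scala
import java.time.{Instant, LocalDateTime, ZoneId, ZoneOffset}

object Int96NtzShift {
  // Session time zone the test uses when writing the file.
  val sessionZone: ZoneId = ZoneId.of("America/Los_Angeles")

  // Wall-clock value written as TimestampType (LTZ).
  val written: LocalDateTime = LocalDateTime.parse("2020-01-01T12:00:00")

  // Assumption: the INT96 column stores the corresponding UTC instant.
  val storedInstant: Instant = written.atZone(sessionZone).toInstant

  // A correct LTZ read in the same session zone recovers the original wall clock.
  val ltzRoundTrip: LocalDateTime = LocalDateTime.ofInstant(storedInstant, sessionZone)

  // Hypothetical unsafe NTZ reinterpretation: the UTC instant is taken as a
  // zone-less wall clock, shifting it by the session zone's offset (UTC-8 in January).
  val naiveNtz: LocalDateTime = LocalDateTime.ofInstant(storedInstant, ZoneOffset.UTC)

  def main(args: Array[String]): Unit = {
    println(s"stored instant:    $storedInstant") // 2020-01-01T20:00:00Z
    println(s"LTZ round trip:    $ltzRoundTrip")  // 2020-01-01T12:00
    println(s"naive NTZ reading: $naiveNtz")      // 2020-01-01T20:00
  }
}
```

Because `TimestampNTZ` carries no zone, there is no information left to undo the shift after the read; this is why Spark raises an error rather than guessing.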