test: demonstrate INT96 read as TimestampNTZ correctness issue #4154

Draft
andygrove wants to merge 1 commit into apache:main from andygrove:tests/issue-3720-int96-ntz-correctness

@andygrove
Member

Which issue does this PR close?

Related to #3720.

Rationale for this change

Issue #3720 documents that the native_datafusion scan can silently return incorrect timestamp values when a Parquet file stores INT96 timestamps and the read schema requests TimestampNTZ. Spark itself raises an error in this case (SPARK-36182) to prevent the unsafe LTZ-to-NTZ reinterpretation. There is no regression test on main that captures the silent miscomputation, so future changes could mask or unmask it without anyone noticing.

This PR adds a single targeted test that demonstrates the bug as it exists on main, so we have a reproducer recorded in the test suite that PR #4087 (or any future fix) can convert into a correctness assertion.

What changes are included in this PR?

A new ParquetInt96NtzCorrectnessSuite containing one test:

  1. Configures SESSION_LOCAL_TIMEZONE=America/Los_Angeles, PARQUET_OUTPUT_TIMESTAMP_TYPE=INT96, and USE_V1_SOURCE_LIST=parquet.
  2. Writes 2020-01-01 12:00:00 as TimestampType (encoded as INT96).
  3. With Comet disabled, asserts spark.read.schema("ts timestamp_ntz").parquet(...) raises SparkException (Spark's reference behavior).
  4. With spark.comet.scan.impl=native_datafusion, reads the same file as TimestampNTZ and asserts the returned LocalDateTime does not equal the original 2020-01-01T12:00:00, capturing the silent wall-clock divergence.
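For readers who want to see the settings side by side, here is a minimal sketch of the configuration used in steps 1, 3 and 4, written as plain maps with no Spark dependency. The object name and map names are hypothetical. The string keys are what the SQLConf constants from step 1 are understood to resolve to, and `spark.comet.enabled` is assumed to be the switch behind "Comet disabled"; the suite itself may set these differently.

```scala
object Int96NtzTestConfSketch {
  // Step 1: settings in effect while the file is written.
  // ASSUMPTION: these are the string keys behind SESSION_LOCAL_TIMEZONE,
  // PARQUET_OUTPUT_TIMESTAMP_TYPE and USE_V1_SOURCE_LIST.
  val writeConf: Map[String, String] = Map(
    "spark.sql.session.timeZone" -> "America/Los_Angeles",
    "spark.sql.parquet.outputTimestampType" -> "INT96",
    "spark.sql.sources.useV1SourceList" -> "parquet")

  // Step 3: reference read with Comet turned off.
  // ASSUMPTION: spark.comet.enabled is the key that disables Comet.
  val sparkReadConf: Map[String, String] =
    writeConf + ("spark.comet.enabled" -> "false")

  // Step 4: read through the native_datafusion scan.
  val cometReadConf: Map[String, String] =
    writeConf +
      ("spark.comet.enabled" -> "true") +
      ("spark.comet.scan.impl" -> "native_datafusion")

  def main(args: Array[String]): Unit = {
    // Print the step 4 settings, one key=value pair per line.
    cometReadConf.toSeq.sorted.foreach { case (k, v) => println(s"$k=$v") }
  }
}
```

Keeping the write settings in one map and deriving the two read variants from it makes it clear that only the Comet-related keys differ between steps 3 and 4.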

How are these changes tested?

The new suite is the test. Verified locally against apache/main at 050e1e2f7 with Spark 3.5: the test passes, confirming the divergence between Spark's behavior (throws) and native_datafusion's behavior (returns shifted wall-clock value).
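To illustrate why a shifted wall-clock value is plausible, the following sketch uses only `java.time` and the timestamp and session time zone from the steps above. It assumes that a reader which reinterprets the stored instant as TimestampNTZ renders it in UTC; that is an assumption made for illustration, not a statement about what native_datafusion does internally. The object and value names are hypothetical.

```scala
import java.time.{Instant, LocalDateTime, ZoneId, ZoneOffset}

object Int96NtzShiftSketch {
  // Session time zone in effect when the value was written (from the PR description).
  val sessionZone: ZoneId = ZoneId.of("America/Los_Angeles")

  // Wall-clock value the writer intended.
  val written: LocalDateTime = LocalDateTime.parse("2020-01-01T12:00:00")

  // A TimestampType (LTZ) value denotes an instant; the session zone
  // resolves the wall-clock value to that instant.
  val storedInstant: Instant = written.atZone(sessionZone).toInstant

  // ASSUMPTION: a reader that ignores the session zone renders the
  // instant as a UTC wall clock when asked for a zone-less timestamp.
  val naiveNtz: LocalDateTime = LocalDateTime.ofInstant(storedInstant, ZoneOffset.UTC)

  def main(args: Array[String]): Unit = {
    println(s"written wall clock: $written")
    println(s"stored instant:     $storedInstant")
    println(s"naive NTZ read:     $naiveNtz")
  }
}
```

Under that assumption the value read back is eight hours ahead of the value written, because Los Angeles is at UTC-8 in January. The test in this PR only asserts that the two values differ, so it does not depend on the exact offset.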

val actual = rows.head.getAs[LocalDateTime](0)
assert(
actual != LocalDateTime.parse("2020-01-01T12:00:00"),
s"native_datafusion returned the original wall-clock value $actual; " +
Contributor

@0lai0 commented on Apr 30, 2026

Thanks @andygrove. It looks like the strings on lines 75-76 have an `s` prefix but don't contain any variables, so the prefix can be dropped.
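As a small illustration of the point about the `s` prefix, the sketch below compares an interpolated string with a plain one. The object name, the message text, and the `actual` value are hypothetical and are not taken from the suite.

```scala
import java.time.LocalDateTime

object InterpolatorSketch {
  // Hypothetical value standing in for the timestamp read back by the test.
  val actual: LocalDateTime = LocalDateTime.parse("2020-01-01T20:00:00")

  // Contains a $ placeholder, so the s prefix is required.
  val withVar: String = s"native_datafusion returned $actual; "

  // No placeholder: the s prefix changes nothing here.
  val redundant: String = s"no variables here"
  val plain: String = "no variables here"

  def main(args: Array[String]): Unit = {
    println(withVar)
    println(redundant == plain)
  }
}
```

When a message is built by concatenating several literals, only the pieces that contain a placeholder need the prefix.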

@andygrove marked this pull request as draft on April 30, 2026 13:18

2 participants