[native_datafusion] [Spark SQL Tests] Bucketing not exposed by CometNativeScan #3319

@andygrove

Summary

~7 tests in DisableUnnecessaryBucketedScanSuite and BucketedReadSuite fail because CometNativeScan doesn't expose bucketing information.

Failing Tests

  • DisableUnnecessaryBucketedScanSuite: "SPARK-32859: disable unnecessary bucketed table scan" (basic, multiple joins, multiple bucketed columns, other operators)
  • DisableUnnecessaryBucketedScanSuite: "SPARK-33075: not disable bucketed table scan for cached query"
  • DisableUnnecessaryBucketedScanSuite: "Aggregates with no groupby over tables having 1 BUCKET, return multiple rows"
  • BucketedReadSuite: "disable bucketing when the output doesn't contain all bucketing columns"
  • BucketedReadSuite: "bucket coalescing is applied when join expressions match with partitioning expressions"

Error Pattern

ArrayBuffer() had length 0 instead of expected length 1 (DisableUnnecessaryBucketedScanSuite.scala:79)

Tests look for FileSourceScanExec nodes to inspect bucketing state. CometNativeScan isn't matched, so no scan nodes are found.
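The mismatch can be reproduced with a toy model. This is a minimal sketch, not the actual suite code: `PlanNode`, `FileSourceScan`, `NativeScan`, `Project`, and `BucketedScanLookup` are hypothetical stand-ins for the Spark and Comet classes, and the assumption is that the test helper collects scan nodes with a pattern that matches only the Spark scan type.

```scala
// Toy stand-ins for physical plan nodes. These are hypothetical types for
// illustration only, not the real Spark or Comet classes.
sealed trait PlanNode { def children: Seq[PlanNode] }

final case class FileSourceScan(bucketedScan: Boolean) extends PlanNode {
  val children: Seq[PlanNode] = Nil
}

// Stand-in for a replacement scan that carries no bucketing information.
final case class NativeScan() extends PlanNode {
  val children: Seq[PlanNode] = Nil
}

final case class Project(child: PlanNode) extends PlanNode {
  val children: Seq[PlanNode] = Seq(child)
}

object BucketedScanLookup {
  // Pre-order traversal that keeps every node the partial function accepts.
  def collect[A](plan: PlanNode)(pf: PartialFunction[PlanNode, A]): Seq[A] =
    pf.lift(plan).toList ++ plan.children.flatMap(c => collect(c)(pf))

  // Assumed shape of the test helper: only one scan type is matched.
  def bucketedScans(plan: PlanNode): Seq[FileSourceScan] =
    collect(plan) { case s: FileSourceScan if s.bucketedScan => s }

  // Plan before and after the scan node is replaced.
  val sparkPlan: PlanNode = Project(FileSourceScan(bucketedScan = true))
  val cometPlan: PlanNode = Project(NativeScan())
}
```

In this sketch, `bucketedScans(sparkPlan)` finds one node while `bucketedScans(cometPlan)` finds none, which has the same shape as the "length 0 instead of expected length 1" failure above.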

Root Cause

CometNativeScan replaces FileSourceScanExec in the plan but doesn't expose bucketing metadata (bucket count, bucket columns, etc.). Tests that inspect or modify bucketing behavior can't find the scan node.

Related

Discovered in CI for #3307 (enable native_datafusion in auto scan mode).
