@MetalBlueberry (Contributor) commented May 7, 2025

This exact test seems to pass on v17. More investigation is needed.

Rationale for this change

After updating from v17 to v18, I noticed that a previously passing test case now fails.

What changes are included in this PR?

This PR contains a test case to reproduce the issue. A reference PR for version v17 can be found here

Are these changes tested?

The PR itself just contains the test

Are there any user-facing changes?

This exact test seems to pass on v17. More investigation is needed.
@MetalBlueberry changed the title from "add TestDeltaByteArray test" to "fix: TestDeltaByteArray test" on May 7, 2025
@zeroshade
Member

Can you provide the csv file that you're testing with so that I can try to reproduce?

}
require.DirExists(t, dir)

expected, err := os.ReadFile(path.Join(dir, "delta_byte_array_expect.csv"))
Contributor Author

here is the CSV

records = records[1:] // skip header

props := parquet.NewReaderProperties(memory.DefaultAllocator)
fileReader, err := file.OpenParquetFile(path.Join(dir, "delta_byte_array.parquet"),
Contributor Author

here is the Parquet

@MetalBlueberry
Contributor Author

> Can you provide the csv file that you're testing with so that I can try to reproduce?

I got the files from the parquet-testing repository. You can find the info here

https://github.com/apache/parquet-testing/blob/master/data/delta_byte_array.md
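For context, the Parquet format describes DELTA_BYTE_ARRAY as incremental (front) coding: each value is stored as the length of the prefix it shares with the previous value, plus the remaining suffix. The sketch below only illustrates that idea on plain Go slices, using the example values from the Parquet encoding documentation. The function name `decodeFrontCoded` is hypothetical, and this is not the arrow-go implementation, which reads both streams from encoded pages.

```go
package main

import "fmt"

// decodeFrontCoded rebuilds values from (prefix length, suffix) pairs.
// Value i is the first prefixLens[i] bytes of value i-1 followed by suffixes[i].
// Hypothetical helper for illustration only.
func decodeFrontCoded(prefixLens []int, suffixes []string) ([]string, error) {
	// Both streams must describe the same number of values.
	if len(prefixLens) != len(suffixes) {
		return nil, fmt.Errorf("got %d prefix lengths but %d suffixes", len(prefixLens), len(suffixes))
	}
	out := make([]string, 0, len(suffixes))
	prev := ""
	for i, n := range prefixLens {
		if n < 0 || n > len(prev) {
			return nil, fmt.Errorf("value %d: prefix length %d exceeds previous value length %d", i, n, len(prev))
		}
		prev = prev[:n] + suffixes[i]
		out = append(out, prev)
	}
	return out, nil
}

func main() {
	vals, err := decodeFrontCoded(
		[]int{0, 2, 0, 3},
		[]string{"axis", "le", "babble", "yhood"},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(vals)
}
```

The length check at the top is the relevant part for this thread: if the two streams disagree on how many values they hold, decoding cannot proceed safely.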

I'm unsure what the difference is, but looking at the v17 code, I noticed a difference between the Int32 and Int64 decoders: Int32 used totalValues and Int64 used nvals.

It looks like the Int64 approach was kept when the implementations were merged, and it does not seem to work properly for Int32.

This change at least seems to be able to read the test file, which makes me think it is the right approach.
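A rough sketch of where two different counts can come from, under an assumption this thread does not confirm: that totalValues is the value count stored in the DELTA_BINARY_PACKED header and nvals is the count handed to the decoder by its caller. Per the Parquet encoding documentation, the header holds the block size, the number of miniblocks per block, the total value count (all unsigned varints), and the first value (zigzag varint). The `deltaHeader` type, `parseDeltaHeader`, and `callerCount` are hypothetical names, and the sample bytes are made up for illustration.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// deltaHeader mirrors the four header fields the Parquet format lists for
// DELTA_BINARY_PACKED. Type and field names are hypothetical.
type deltaHeader struct {
	BlockSize   uint64 // values per block
	MiniBlocks  uint64 // miniblocks per block
	TotalValues uint64 // value count recorded by the writer
	FirstValue  int64  // stored as a zigzag varint
}

// parseDeltaHeader decodes the header and returns it with the bytes consumed.
func parseDeltaHeader(buf []byte) (deltaHeader, int, error) {
	var h deltaHeader
	pos := 0
	// The first three fields are unsigned LEB128 varints.
	for _, dst := range []*uint64{&h.BlockSize, &h.MiniBlocks, &h.TotalValues} {
		v, n := binary.Uvarint(buf[pos:])
		if n <= 0 {
			return h, pos, errors.New("truncated or malformed varint in header")
		}
		*dst = v
		pos += n
	}
	// binary.Varint decodes Go's zigzag-encoded signed varint.
	first, n := binary.Varint(buf[pos:])
	if n <= 0 {
		return h, pos, errors.New("truncated or malformed first value")
	}
	h.FirstValue = first
	return h, pos + n, nil
}

func main() {
	// Made-up header: block size 128, 4 miniblocks, 5 values, first value -3.
	buf := []byte{0x80, 0x01, 0x04, 0x05, 0x05}
	h, used, err := parseDeltaHeader(buf)
	if err != nil {
		panic(err)
	}
	callerCount := uint64(8) // hypothetical count passed in by the caller
	fmt.Printf("header=%+v bytes=%d\n", h, used)
	if h.TotalValues != callerCount {
		fmt.Println("header count and caller count differ; the decoder has to pick one")
	}
}
```

If the assumption holds, a decoder that sizes its output from one count while the data was written against the other could read too few or too many values, which would be consistent with the symptom described here. The merged fix may address something different.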
@MetalBlueberry
Contributor Author

@zeroshade see last commit for the fix.

I'm just pattern matching; I don't really know what those values mean 😅.

@MetalBlueberry MetalBlueberry marked this pull request as ready for review May 9, 2025 08:55
@MetalBlueberry MetalBlueberry requested a review from zeroshade as a code owner May 9, 2025 08:55
@MetalBlueberry changed the title from "fix: TestDeltaByteArray test" to "fix: TestDeltaByteArray implementation and fix" on May 9, 2025
@zeroshade
Member

I figured out what the underlying issue was and updated this to be a better fix that won't cause the other tests to fail 😄

@zeroshade zeroshade merged commit f464c83 into apache:main May 10, 2025
23 checks passed