-
Notifications
You must be signed in to change notification settings - Fork 89
fix: TestDeltaByteArray implementation and fix #369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This exact test seems to be passing on v17 version. More investigation is needed
|
Can you provide the csv file that you're testing with so that I can try to reproduce? |
| } | ||
| require.DirExists(t, dir) | ||
|
|
||
| expected, err := os.ReadFile(path.Join(dir, "delta_byte_array_expect.csv")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here is the CSV
| records = records[1:] // skip header | ||
|
|
||
| props := parquet.NewReaderProperties(memory.DefaultAllocator) | ||
| fileReader, err := file.OpenParquetFile(path.Join(dir, "delta_byte_array.parquet"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here is the Parquet
I got the files from the parquet-testing repository. You can find the info here https://github.com/apache/parquet-testing/blob/master/data/delta_byte_array.md |
I'm unsure about what the difference is, but by looking at version v17 code, I noticed a difference between Int32 and Int64 decoder. Int32 used totalValues and Int64 used nvals. Looks like when merging implementations, the Int64 approach was used. Which seems to not be working properly for Int32. This change at least seems to be able to read the test file, which makes me think it is the right approach.
|
@zeroshade see last commit for the fix. I'm just pattern matching, I do not really know what those values mean 😅 . |
|
I figured out what the underlying issue was and updated this to be a better fix that won't cause the other tests to fail 😄 |
This exact test seems to be passing on v17 version. More investigation is needed
Rationale for this change
After updating from v17 to v18 I noticed a test case failing that was passing before.
What changes are included in this PR?
This PR contains a test case to reproduce the issue. A reference PR for version v17 can be found here
Are these changes tested?
The PR itself just contains the test
Are there any user-facing changes?