-
Notifications
You must be signed in to change notification settings - Fork 14
Set max message size on parquet v3 reader #1198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
As I understand, PR increased message size limit from 100 megabytes to 2 gigabytes. |
I have briefly checked the |
|
Was this change tested? |
Not with real data. @dima-altinity was going to test it with real customer data, I have not heard from him yet. On my side, I have tried creating a parquet file with a big metadata footer with the following ChatGPT provided script: Before these changes, I would get: After these changes, I get: I opted for not introducing a test because the |
|
It may be interesting to know if this memory is under control of CH memory quotes (some libraries have issues with this, some others don't), but probably this is out of the scope of this PR. |
ilejn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
|
Verification test: https://github.com/Altinity/clickhouse-regression/blob/52cf60980b118efa4fe7dc0d790b1d7de2c757a0/parquet/tests/native_reader.py#L237 The test dynamically generates a parquet file, puts it to the user_files in clickhouse and does a cleanup after execution in order to not keep 300MB size parquet file. Currently reading this file is only possible with Just |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Set max message size on parquet v3 reader to avoid getting DB::Exception: apache::thrift::transport::TTransportException: MaxMessageSize reached
Documentation entry for user-facing changes
...
CI/CD Options
Exclude tests:
Regression jobs to run: