Add support for Iceberg table identifiers with special characters #33648
…cters (apache#33293)" (apache#33575) This reverts commit bb2e0ad.
|
Assigning reviewers. If you would like to opt out of this review, comment R: @kennknowles for label java. Available commands:
The PR bot will only process comments in the main thread (not review comments). |
|
R: @ahmedabu98 |
|
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment |
|
Hey @regadas, I skimmed over this and noticed that a large portion of this PR includes changes that are not update compatible. In general, for streaming pipelines we try not to change the PCollection element type (more importantly, the coder) unless it's really necessary. Otherwise, streaming pipelines will break when trying to update to the newer SDK version. It might be better to add a (static?) cache like @Abacn suggested. |
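For context, the static cache idea could look roughly like the sketch below. It is only an illustration under stated assumptions: `TableId`, `parse`, and `cachedParse` are hypothetical stand-ins (not Beam or Iceberg APIs), and the dot-splitting parse merely simulates the expensive per-element parsing discussed in this thread. The point is that the element type stays a plain string, so the coder does not change, while each distinct identifier string is parsed only once per JVM.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class IdentifierCacheSketch {
  /** Hypothetical stand-in for a parsed table identifier (not Iceberg's class). */
  public static final class TableId {
    public final List<String> namespace;
    public final String name;

    TableId(List<String> namespace, String name) {
      this.namespace = namespace;
      this.name = name;
    }
  }

  // Counts real parses so the effect of the cache is observable.
  public static final AtomicInteger PARSE_CALLS = new AtomicInteger();

  // Static, thread-safe cache shared by everything running in the same JVM.
  private static final Map<String, TableId> CACHE = new ConcurrentHashMap<>();

  /** Stand-in for the expensive parse; assumed here to split on dots. */
  public static TableId parse(String raw) {
    PARSE_CALLS.incrementAndGet();
    String[] parts = raw.split("\\.");
    List<String> all = Arrays.asList(parts);
    return new TableId(all.subList(0, parts.length - 1), parts[parts.length - 1]);
  }

  /** Parses each distinct string at most once; later calls reuse the cached value. */
  public static TableId cachedParse(String raw) {
    return CACHE.computeIfAbsent(raw, IdentifierCacheSketch::parse);
  }

  public static void main(String[] args) {
    for (int i = 0; i < 1000; i++) {
      cachedParse("db.events");
    }
    cachedParse("db.users");
    // Only two distinct keys were seen, so only two real parses happened.
    System.out.println("parse calls: " + PARSE_CALLS.get());
  }
}
```

An unbounded map is the simplest choice for a sketch; a pipeline writing to many dynamic destinations would likely want a bounded or expiring cache instead.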
|
Hey @ahmedabu98 yup that is true; the reason I went for it was because:
That said, if you still feel caching is the best bet to add support for this then I'll make a new PR 👍 |
|
Hi @regadas, thanks again for this and other contributions :) To clarify, do the performance gains here come from eliminating per-element "TableIdentifier.parse(element.getKey())" calls or something else? Also, do you have any idea of the perf gains from this approach compared to a caching-based approach? If the perf gains are similar, I think it's preferable to not break update compatibility :) Also, as a side note, we heavily discourage using Java serialization for coders. |
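On the side note about coders, here is a minimal sketch of a hand-written encoding that avoids Java serialization. It is not Beam's `Coder` API and not the code from this PR; `encode` and `decode` are hypothetical helpers that write the namespace level count followed by length-prefixed UTF strings. Because each part is stored separately, names containing dots or other special characters round-trip without any string parsing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IdentifierEncodingSketch {
  /** Writes the number of namespace levels, then each level, then the table name. */
  public static byte[] encode(List<String> namespace, String name) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(namespace.size());
      for (String level : namespace) {
        out.writeUTF(level); // length-prefixed, so no delimiter escaping is needed
      }
      out.writeUTF(name);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Returns the namespace levels followed by the table name as the last element. */
  public static List<String> decode(byte[] encoded) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded))) {
      int levels = in.readInt();
      List<String> parts = new ArrayList<>(levels + 1);
      // Read every namespace level plus one extra string for the table name.
      for (int i = 0; i <= levels; i++) {
        parts.add(in.readUTF());
      }
      return parts;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] encoded = encode(Arrays.asList("my.db", "schema"), "table.with.dots");
    System.out.println(decode(encoded));
  }
}
```

Note that `writeUTF` limits each string to 65535 encoded bytes, which is assumed to be acceptable for identifier parts in this sketch.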
|
Hello @chamikaramj, thanks, glad to help 👍
yeah ... but with the proposal in #33293 things got a bit worse, since parsing JSON per element becomes very expensive;
agree, tbh I haven't done a benchmark yet, let me get back to you on this
for good reasons 😄 , need to double check where TableIdentifierCoder is not used |
|
Hi @regadas, friendly ping to revisit this PR. Thanks! |
|
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions. |
|
This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
This is a follow-up PR to #33293 and #33575.
It is a proposal to address the performance issue by moving away from constant parsing and "stringified"
TableIdentifiers, making things a bit more type safe. @ahmedabu98 @Abacn can you take a look?