Add support for Iceberg table identifiers with special characters #33648

regadas · 2025-01-17T20:53:30Z

This is a follow up PR to #33293 and #33575

proposal to address the performance issue by moving away from constant parsing and "stringified" TableIdentifier's making things a bit more type safe.

@ahmedabu98 @Abacn can you take look?

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
Update CHANGES.md with noteworthy changes.
If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

…cters (apache#33293)" (apache#33575) This reverts commit bb2e0ad.

github-actions · 2025-01-17T22:05:58Z

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @kennknowles for label java.
R: @damccorm for label build.
R: @johnjcasey for label io.

Available commands:

stop reviewer notifications - opt out of the automated review tooling
remind me after tests pass - tag the comment author after tests pass
waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

kennknowles · 2025-01-20T20:47:04Z

R: @ahmedabu98

github-actions · 2025-01-20T20:48:14Z

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment assign set of reviewers

ahmedabu98 · 2025-01-21T20:20:52Z

Hey @regadas, I skimmed over this and noticed that a large portion of this PR includes changes that are not update compatible. In general for streaming pipelines we try not to change the PCollection element type (more importantly, the coder) unless it's really necessary. Otherwise, streaming pipelines will break when trying to update to the newer SDK version

It might be better to add a (static?) cache like @Abacn suggested

regadas · 2025-01-21T20:30:10Z

Hey @ahmedabu98 yup that is true; the reason I went for it was because:

this is a fairly new IO so probably not a lot of stuff running out there
I believe, the change is fundamentally a better approach enabling the usage of namespaces/table-names with special chars and supporting nested namespaces without requiring constant parsing.

That said, if you still feel caching is the best bet to add support for this then I'll make a new PR 👍

chamikaramj · 2025-01-21T22:40:44Z

Hi @regadas, thanks again for this and other contributions :)

To clarify, do the performance gains here come from eliminating per-element "TableIdentifier.parse(element.getKey())" calls or something else ?

Also, do you have any idea of perf gains received through this approach compared to a caching based approach ?

If pref gains are similar, I think it's preferable to not break update compatibility :)

Also, as a side note, we heavily discourage using Java serialization for coders.

regadas · 2025-01-21T23:07:10Z

Hello @chamikaramj Thanks glad to help 👍

To clarify, do the performance gains here come from eliminating per-element "TableIdentifier.parse(element.getKey())" calls or something else ?

yeah ... but with the proposal on #33293 things got a bit worse since parsing Json per element becomes very expensive;

If pref gains are similar, I think it's preferable to not break update compatibility :)

agree, tbh haven't done benchmark yet, let me get back to you on this

Also, as a side note, we heavily discourage using Java serialization for coders.

for good reasons 😄 , need to double check where TableIdentifierCoder is not used

derrickaw · 2025-03-18T03:44:23Z

Hi @regadas, friendly ping to revisit this PR. Thanks!

github-actions · 2025-05-17T12:46:04Z

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.

github-actions · 2025-05-25T12:46:15Z

This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

regadas added 2 commits January 15, 2025 06:39

Reapply "Add support for Iceberg table identifiers with special chara…

3a8d185

…cters (apache#33293)" (apache#33575) This reverts commit bb2e0ad.

Avoid TableIdentifier json parsing

b526c90

github-actions bot added java build io labels Jan 17, 2025

github-actions bot added the Next Action: Reviewers label Jan 17, 2025

github-actions bot added the stale label May 17, 2025

github-actions bot closed this May 25, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add support for Iceberg table identifiers with special characters #33648

Add support for Iceberg table identifiers with special characters #33648

Uh oh!

regadas commented Jan 17, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Jan 17, 2025

Uh oh!

kennknowles commented Jan 20, 2025

Uh oh!

github-actions bot commented Jan 20, 2025

Uh oh!

ahmedabu98 commented Jan 21, 2025

Uh oh!

regadas commented Jan 21, 2025 •

edited

Loading

Uh oh!

chamikaramj commented Jan 21, 2025

Uh oh!

regadas commented Jan 21, 2025 •

edited

Loading

Uh oh!

derrickaw commented Mar 18, 2025

Uh oh!

github-actions bot commented May 17, 2025

Uh oh!

github-actions bot commented May 25, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Add support for Iceberg table identifiers with special characters #33648

Add support for Iceberg table identifiers with special characters #33648

Uh oh!

Conversation

regadas commented Jan 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

GitHub Actions Tests Status (on master branch)

Uh oh!

github-actions bot commented Jan 17, 2025

Uh oh!

kennknowles commented Jan 20, 2025

Uh oh!

github-actions bot commented Jan 20, 2025

Uh oh!

ahmedabu98 commented Jan 21, 2025

Uh oh!

regadas commented Jan 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chamikaramj commented Jan 21, 2025

Uh oh!

regadas commented Jan 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

derrickaw commented Mar 18, 2025

Uh oh!

github-actions bot commented May 17, 2025

Uh oh!

github-actions bot commented May 25, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

regadas commented Jan 17, 2025 •

edited

Loading

regadas commented Jan 21, 2025 •

edited

Loading

regadas commented Jan 21, 2025 •

edited

Loading