
KAFKA-14670: (part 1) Wrap Connectors in IsolatedConnector objects #13185

Closed
gharris1727 wants to merge 20 commits into apache:trunk from gharris1727:kafka-14670-wrap-connectors

Conversation

Contributor

@gharris1727 gharris1727 commented Feb 1, 2023

Jira
This is the first part of the above ticket, applied only to SinkConnector and SourceConnector plugins.
Additional PRs will cover the other plugins, as the refactor was too large to reasonably review at once.

Design decisions:

  1. The IsolatedPlugin<P> class will be a common superclass for all plugin wrappers.
  2. The IsolatedPlugin superclass provides utility methods for subclasses to manage swapping the ThreadContextClassLoader for each call in a way that has minimal boilerplate.
  3. The Isolated* classes are intended to only be constructed within the plugin isolation infrastructure, and will all have package-local constructors.
  4. Testing runtime code that uses wrapped plugins will require mocking the wrappers, or instantiating a real Plugins class.
  5. Subclasses should define public methods which match the plugin class they are wrapping without being an explicit subclass. These methods should be marked with throws Exception to remind callers that they may throw arbitrary exceptions.
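
The loader-swap utility described in points 1, 2, and 5 could look roughly like the following sketch. This is illustrative only, not the actual patch: `FakeConnector` and the method names are stand-ins, and while the real design uses package-local constructors (point 3), they are public here so the example is self-contained.

```java
// Illustrative sketch of an IsolatedPlugin<P> superclass that swaps the
// thread context classloader around each delegated call (not the real patch).
public class IsolatedPluginSketch {

    public static class IsolatedPlugin<P> {
        private final P delegate;
        private final ClassLoader pluginLoader;

        // The real design uses package-local constructors; public here
        // only so this sketch is easy to run.
        public IsolatedPlugin(P delegate, ClassLoader pluginLoader) {
            this.delegate = delegate;
            this.pluginLoader = pluginLoader;
        }

        protected P delegate() {
            return delegate;
        }

        // Utility for subclasses: install the plugin's classloader for the
        // duration of one call, restoring the caller's loader afterwards.
        protected <V> V isolate(java.util.concurrent.Callable<V> call) throws Exception {
            Thread thread = Thread.currentThread();
            ClassLoader saved = thread.getContextClassLoader();
            thread.setContextClassLoader(pluginLoader);
            try {
                return call.call();
            } finally {
                thread.setContextClassLoader(saved); // always restore, even on failure
            }
        }
    }

    // A wrapper mirrors the plugin's public methods without subclassing it,
    // and declares throws Exception as a reminder to callers (point 5).
    public static class IsolatedConnector extends IsolatedPlugin<FakeConnector> {
        public IsolatedConnector(FakeConnector delegate, ClassLoader loader) {
            super(delegate, loader);
        }

        public String version() throws Exception {
            return isolate(() -> delegate().version());
        }
    }

    // Stand-in for a real SinkConnector/SourceConnector plugin.
    public static class FakeConnector {
        public String version() {
            return "1.0";
        }
    }
}
```

With this shape, every delegated call is guaranteed to run under the plugin's classloader, and the boilerplate per wrapped method is a single `isolate(...)` lambda.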

Open questions/issues:

  1. The hashCode, equals, and toString methods do not have throws Exception as the Object class does not have these throws clauses. That means that calling code cannot be forced to handle exceptions from these methods. For toString, the exception message is provided in place of the toString result, and the hashCode and equals are wholly decoupled from the underlying hashCode and equals implementations.
  2. The wrapper method signatures throw Exception rather than Throwable. The distinction is that the Java Language considers Exceptions reasonable for an application to catch, while Throwables are not. I wasn't sure whether the Connect runtime should be forced to handle errors like OutOfMemoryError, LinkageError, etc., or just let them propagate and kill the calling thread.
  3. These wrappers do not enforce that the methods are not called on the herder thread, because I didn't come up with an elegant way to do so.
  4. I did a first-pass at propagating and handling the exceptions thrown by the connector classes, but I don't know if they are reasonable. Now that the exceptions are checked, the code enforces that exceptions are handled, but it is still up to us to determine the proper way to handle the exceptions.
  5. This PR does not remove existing loaderSwap calls that are currently ensuring isolation. Those can be moved/removed after all of this refactor lands, as it may still be necessary for the other plugins.
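
As an illustration of the first open question above, the toString handling might look something like this sketch (hypothetical names; not the actual patch):

```java
// Sketch: toString cannot declare checked exceptions, so the wrapper
// substitutes the exception message when the delegate's toString throws.
// hashCode and equals use Object identity, fully decoupled from the
// delegate's (potentially throwing) implementations.
public class ToStringSketch {

    public static class Wrapper {
        private final Object delegate;

        public Wrapper(Object delegate) {
            this.delegate = delegate;
        }

        @Override
        public String toString() {
            try {
                return delegate.toString();
            } catch (Throwable t) {
                // The exception message is provided in place of the result.
                return "Plugin threw in toString: " + t.getMessage();
            }
        }
        // No hashCode/equals overrides: Object identity semantics apply.
    }

    // A plugin whose toString misbehaves.
    public static class ThrowingPlugin {
        @Override
        public String toString() {
            throw new IllegalStateException("boom");
        }
    }
}
```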

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

Signed-off-by: Greg Harris <greg.harris@aiven.io>
Contributor

@C0urante C0urante left a comment


Thanks @gharris1727. This is a great improvement that makes several common bugs much harder to write and I'm excited to see it land so that we can stop worrying about what's running on the herder thread, doing our due diligence around the context classloader, etc.

I've only taken a look at the functional changes and haven't reviewed changes to tests yet. I hope to do a full pass sometime this week.

Comment on lines +251 to +255
try {
    updateConnectorTasks(connName);
} catch (Exception e) {
    log.error("Unable to generate task configs for {}", connName, e);
}
Contributor

@C0urante C0urante Feb 28, 2023


This is a change in behavior too, right? We no longer throw in ConnectorContext::requestTaskReconfiguration if we encounter any errors.

This also seems reasonable (it aligns the behavior across standalone and distributed modes), but it does have consequences for the REST API, where restarting a connector no longer fails if we're unable to generate task configs for it (which is currently the case for both distributed and standalone modes).

Contributor Author


Yes this is a change in behavior.

There is precedent for throwing ConnectException from ConnectorContext::requestTaskReconfiguration, so perhaps wrapping this in a ConnectException and propagating it would be a better behavior. I can move this to HerderConnectorContext, except it would only be effective for the standalone herder.

We can also see this as an opportunity to improve the StandaloneHerder by handling reconfigurations asynchronously and retrying them in the background, rather than 500'ing the REST API or dropping the failure silently.
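
The wrap-and-propagate option mentioned above could be sketched as follows. The `ConnectException` here is a local stand-in for Kafka's `org.apache.kafka.connect.errors.ConnectException`, and the surrounding method and interface are illustrative, not the real herder code:

```java
// Sketch of wrap-and-propagate: rethrow task-config generation failures
// as an unchecked ConnectException so callers of requestTaskReconfiguration
// see the failure instead of a silently logged error.
public class ReconfigSketch {

    // Local stand-in for org.apache.kafka.connect.errors.ConnectException.
    public static class ConnectException extends RuntimeException {
        public ConnectException(String msg, Throwable cause) {
            super(msg, cause);
        }
    }

    // Illustrative slice of the herder's task-config generation.
    public interface TaskConfigGenerator {
        void updateConnectorTasks(String connName) throws Exception;
    }

    public static void requestTaskReconfiguration(TaskConfigGenerator herder, String connName) {
        try {
            herder.updateConnectorTasks(connName);
        } catch (Exception e) {
            // Propagate instead of logging and dropping the failure.
            throw new ConnectException("Unable to generate task configs for " + connName, e);
        }
    }
}
```

Throwing rather than logging gives the REST layer the chance to turn the failure into a 500 response instead of dropping it.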

Contributor Author


Perhaps we can also consider this a failure of the signature of Herder::requestTaskReconfiguration. The DistributedHerder makes this asynchronous, but provides no future or callback to confirm the progress of the request.
Arguably StandaloneHerder is implementing the function signature correctly as a request that either succeeds or fails.

It also makes me think that a connector which repeatedly calls requestTaskReconfiguration (and then always fails in generateTaskConfigs) could spam the herder with retried restart requests. This is such a messy situation that the old function signatures hid from us :)

Contributor


Okay, a lot to unpack here!

The more I think about it, the more I like the existing behavior for handling failures in task config generation. We automatically retry in distributed mode in order to absorb the risk of writing to the config topic or issuing a REST request to the leader, but since neither of those take place in standalone mode, it's fine to just throw the exception back to the caller (either a connector invoking ConnectorContext::requestTaskReconfiguration, or a REST API call to restart the connector) since the likeliest cause is a failed call to Connector::taskConfigs and automatic retries are less likely to be useful.

I think we should basically just preserve existing behavior here, with the one exception of fixing how we handle failed calls to requestTaskReconfiguration that occur during a call to restartConnector. Right now we don't handle any of those and, IIUC, just cause the REST request to time out after 90 seconds. Instead of timing out, we should return a 500 response in that case.

Contributor


I don't think it's especially likely for connectors to continually invoke requestTaskReconfiguration given the automatic retry logic in distributed mode, and as of #13276, the impact of ongoing retries for that operation is drastically reduced.

github-actions Bot commented Sep 7, 2023

This PR is being marked as stale since it has not had any activity in 90 days. If you would like to keep this PR alive, please ask a committer for review. If the PR has merge conflicts, please update it with the latest from trunk (or appropriate release branch).

If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed.

@github-actions github-actions Bot added the stale Stale PRs label Sep 7, 2023
@gharris1727 gharris1727 removed the stale Stale PRs label Jan 12, 2024
@github-actions

This PR is being marked as stale since it has not had any activity in 90 days. If you
would like to keep this PR alive, please leave a comment asking for a review. If the PR has
merge conflicts, update it with the latest from the base branch.

If you are having difficulty finding a reviewer, please reach out on the [mailing list](https://kafka.apache.org/contact).

If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed.

@github-actions github-actions Bot added the stale Stale PRs label Dec 27, 2024
@github-actions

This PR has been closed since it has not had any activity in 120 days. If you feel like this
was a mistake, or you would like to continue working on it, please feel free to re-open the
PR and ask for a review.

@github-actions github-actions Bot added the closed-stale PRs that were closed due to inactivity label Jan 27, 2025
@github-actions github-actions Bot closed this Jan 27, 2025

Labels

closed-stale (PRs that were closed due to inactivity), connect, stale (Stale PRs)
