KAFKA-8591: WorkerConfigTransformer NPE on connector configuration reloading #6991

Merged: hachikuji merged 3 commits into apache:trunk from nachomdo:config-provider-bug on Jul 9, 2019

@nachomdo
Contributor

A bug in WorkerConfigTransformer prevents the connector configuration reload when the ConfigData TTL expires.

The issue boils down to the fact that worker.herder().restartConnector is receiving a null callback.

[2019-06-17 14:34:12,320] INFO Scheduling a restart of connector workshop-incremental in 60000 ms (org.apache.kafka.connect.runtime.WorkerConfigTransformer:88)
[2019-06-17 14:34:12,321] ERROR Uncaught exception in herder work thread, exiting:  (org.apache.kafka.connect.runtime.distributed.DistributedHerder:227)
java.lang.NullPointerException
        at org.apache.kafka.connect.runtime.distributed.DistributedHerder$19.onCompletion(DistributedHerder.java:1187)
        at org.apache.kafka.connect.runtime.distributed.DistributedHerder$19.onCompletion(DistributedHerder.java:1183)
        at org.apache.kafka.connect.runtime.distributed.DistributedHerder.tick(DistributedHerder.java:273)
        at org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:219)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
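
The failure mode can be sketched in isolation. The snippet below is a simplified stand-in, not the actual DistributedHerder code: `MiniHerder`, its two-argument `restartConnector`, and `tickThrowsNpe` are hypothetical names invented for illustration. It only shows how a callback that is stored when the request is made and invoked later on the herder thread turns a null argument into a NullPointerException at completion time.

```java
import java.util.ArrayList;
import java.util.List;

final class NullCallbackSketch {
    /** Stand-in with the same method shape as the Callback discussed in this PR. */
    interface Callback<V> {
        void onCompletion(Throwable error, V result);
    }

    /** Hypothetical miniature herder: requests are queued, then completed on tick(). */
    static final class MiniHerder {
        private final List<Callback<Void>> pending = new ArrayList<>();

        void restartConnector(String connName, Callback<Void> cb) {
            pending.add(cb); // a null callback is accepted here without complaint
        }

        void tick() {
            List<Callback<Void>> batch = new ArrayList<>(pending);
            pending.clear();
            for (Callback<Void> cb : batch) {
                cb.onCompletion(null, null); // NullPointerException if cb is null
            }
        }
    }

    /** Returns true if completing a restart with the given callback throws an NPE. */
    static boolean tickThrowsNpe(Callback<Void> cb) {
        MiniHerder herder = new MiniHerder();
        herder.restartConnector("my-connector", cb);
        try {
            herder.tick();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("null callback fails: " + tickThrowsNpe(null));
        System.out.println("no-op callback fails: " + tickThrowsNpe((error, result) -> { }));
    }
}
```

In this sketch the exception surfaces in `tick()`, well after the scheduling call returned, which is consistent with the stack trace pointing at the herder work thread rather than at WorkerConfigTransformer.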

This patch keeps the same behaviour as before in WorkerConfigTransformer, in that any error returned from the callback is ignored. Do we still want to behave this way, or would we like to handle potential errors? 🤔 Maybe we can rely on scheduleReload retrying whenever the TTL expires or the connector task fails due to a stale configuration.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@nachomdo nachomdo changed the title from "WorkerConfigTransformer NPE on connector configuration reloading" to "KAFKA-8591: WorkerConfigTransformer NPE on connector configuration reloading" Jun 24, 2019
@nachomdo nachomdo force-pushed the config-provider-bug branch 2 times, most recently from 103f217 to 625a720 Compare July 1, 2019 16:18
Contributor

@rayokota rayokota Jul 2, 2019

Perhaps instead

Callback<Void> cb = new Callback<Void>() {
    @Override
    public void onCompletion(Throwable error, Void result) {
        // Log a failed restart instead of silently ignoring it.
        if (error != null) {
            log.error("Unexpected error during connector restart: ", error);
        }
    }
};

Contributor

@rayokota rayokota left a comment

Thanks for the PR @nachomdo ! Left a small comment.

@nachomdo nachomdo force-pushed the config-provider-bug branch from 625a720 to aee5672 Compare July 3, 2019 14:30
@nachomdo
Contributor Author

nachomdo commented Jul 3, 2019

Feedback applied ⚡️ Thanks for the 👀 @rayokota! 🏆

Contributor

@rayokota rayokota left a comment

Thanks @nachomdo ! LGTM

@rhauch, can you review and merge when you get a chance?

Contributor

@hachikuji hachikuji left a comment

Thanks for the fix. Left a couple small comments.

  public void testReplaceVariableWithTTLFirstCancelThenScheduleRestart() {
      EasyMock.expect(worker.herder()).andReturn(herder);
-     EasyMock.expect(herder.restartConnector(1L, MY_CONNECTOR, null)).andReturn(requestId);
+     EasyMock.expect(herder.restartConnector(eq(1L), eq(MY_CONNECTOR), anyObject())).andReturn(requestId);

nit: extra space before anyObject

      EasyMock.expect(worker.herder()).andReturn(herder);
-     EasyMock.expect(herder.restartConnector(1L, MY_CONNECTOR, null)).andReturn(requestId);
+     EasyMock.expect(herder.restartConnector(eq(1L), eq(MY_CONNECTOR), anyObject())).andReturn(requestId);

Hmm.. These tests all pass even if the callback is null. Perhaps we can use EasyMock.notNull()?
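
To illustrate the concern: EasyMock's `anyObject()` matcher accepts every argument, including null, whereas `notNull()` rejects null. The sketch below does not call EasyMock; it models the two matchers as plain predicates (`ANY_OBJECT`, `NOT_NULL`, and `expectationSatisfied` are hypothetical names) to show why only the stricter matcher would make the test fail on the original bug.

```java
import java.util.Objects;
import java.util.function.Predicate;

final class MatcherSketch {
    /** Models a matcher that accepts any argument, null included. */
    static final Predicate<Object> ANY_OBJECT = arg -> true;

    /** Models a matcher that accepts only non-null arguments. */
    static final Predicate<Object> NOT_NULL = Objects::nonNull;

    /** Returns true if the callback actually passed would satisfy the expectation. */
    static boolean expectationSatisfied(Predicate<Object> callbackMatcher, Object actualCallback) {
        return callbackMatcher.test(actualCallback);
    }

    public static void main(String[] args) {
        Object realCallback = new Object();
        System.out.println("any/null: " + expectationSatisfied(ANY_OBJECT, null));
        System.out.println("notNull/null: " + expectationSatisfied(NOT_NULL, null));
        System.out.println("notNull/real: " + expectationSatisfied(NOT_NULL, realCallback));
    }
}
```

With the permissive matcher, a regression that passes null again would go unnoticed by the test; with the non-null matcher, the expectation would not match and the test would fail.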

@hachikuji
Contributor

retest this please

Contributor

@hachikuji hachikuji left a comment

Thanks for the patch!

@hachikuji hachikuji merged commit 289ac09 into apache:trunk Jul 9, 2019
hachikuji pushed a commit that referenced this pull request Jul 9, 2019
…loading (#6991)

A bug in `WorkerConfigTransformer` prevents the connector configuration reload when the ConfigData TTL expires. 

The issue boils down to the fact that `worker.herder().restartConnector` is receiving a null callback. 

```
[2019-06-17 14:34:12,320] INFO Scheduling a restart of connector workshop-incremental in 60000 ms (org.apache.kafka.connect.runtime.WorkerConfigTransformer:88)
[2019-06-17 14:34:12,321] ERROR Uncaught exception in herder work thread, exiting:  (org.apache.kafka.connect.runtime.distributed.DistributedHerder:227)
java.lang.NullPointerException
        at org.apache.kafka.connect.runtime.distributed.DistributedHerder$19.onCompletion(DistributedHerder.java:1187)
        at org.apache.kafka.connect.runtime.distributed.DistributedHerder$19.onCompletion(DistributedHerder.java:1183)
        at org.apache.kafka.connect.runtime.distributed.DistributedHerder.tick(DistributedHerder.java:273)
        at org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:219)
```
This patch adds a callback which just logs the error.

Reviewers: Robert Yokota <rayokota@gmail.com>, Jason Gustafson <jason@confluent.io>
rayokota pushed a commit to confluentinc/kafka that referenced this pull request Jul 16, 2019
…loading (apache#6991)

ijuma added a commit to confluentinc/kafka that referenced this pull request Jul 20, 2019
* apache-github/2.3:
  MINOR: Update documentation for enabling optimizations (apache#7099)
  MINOR: Remove stale streams producer retry default docs. (apache#6844)
  KAFKA-8635; Skip client poll in Sender loop when no request is sent (apache#7085)
  KAFKA-8615: Change to track partition time breaks TimestampExtractor (apache#7054)
  KAFKA-8670; Fix exception for kafka-topics.sh --describe without --topic mentioned (apache#7094)
  KAFKA-8602: Separate PR for 2.3 branch (apache#7092)
  KAFKA-8530; Check for topic authorization errors in OffsetFetch response (apache#6928)
  KAFKA-8662; Fix producer metadata error handling and consumer manual assignment (apache#7086)
  KAFKA-8637: WriteBatch objects leak off-heap memory (apache#7050)
  KAFKA-8620: fix NPE due to race condition during shutdown while rebalancing (apache#7021)
  HOT FIX: close RocksDB objects in correct order (apache#7076)
  KAFKA-7157: Fix handling of nulls in TimestampConverter (apache#7070)
  KAFKA-6605: Fix NPE in Flatten when optional Struct is null (apache#5705)
  Fixes apache#8198 KStreams testing docs use non-existent method pipe (apache#6678)
  KAFKA-5998: fix checkpointableOffsets handling (apache#7030)
  KAFKA-8653; Default rebalance timeout to session timeout for JoinGroup v0 (apache#7072)
  KAFKA-8591; WorkerConfigTransformer NPE on connector configuration reloading (apache#6991)
  MINOR: add upgrade text (apache#7013)
  Bump version to 2.3.1-SNAPSHOT
xiowu0 pushed a commit to linkedin/kafka that referenced this pull request Aug 22, 2019
… connector configuration reloading (apache#6991)

TICKET = KAFKA-8591
LI_DESCRIPTION =
EXIT_CRITERIA = HASH [4fdfe2b]
ORIGINAL_DESCRIPTION =

A bug in `WorkerConfigTransformer` prevents the connector configuration reload when the ConfigData TTL expires.

The issue boils down to the fact that `worker.herder().restartConnector` is receiving a null callback.

```
[2019-06-17 14:34:12,320] INFO Scheduling a restart of connector workshop-incremental in 60000 ms (org.apache.kafka.connect.runtime.WorkerConfigTransformer:88)
[2019-06-17 14:34:12,321] ERROR Uncaught exception in herder work thread, exiting:  (org.apache.kafka.connect.runtime.distributed.DistributedHerder:227)
java.lang.NullPointerException
        at org.apache.kafka.connect.runtime.distributed.DistributedHerder$19.onCompletion(DistributedHerder.java:1187)
        at org.apache.kafka.connect.runtime.distributed.DistributedHerder$19.onCompletion(DistributedHerder.java:1183)
        at org.apache.kafka.connect.runtime.distributed.DistributedHerder.tick(DistributedHerder.java:273)
        at org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:219)
```
This patch adds a callback which just logs the error.

Reviewers: Robert Yokota <rayokota@gmail.com>, Jason Gustafson <jason@confluent.io>
(cherry picked from commit 4fdfe2b)