Skip to content

KAFKA-9673: Filter and Conditional SMTs#8699

Merged
rhauch merged 13 commits intoapache:trunkfrom
tombentley:KAFKA-9673-conditional-smt
May 28, 2020
Merged

KAFKA-9673: Filter and Conditional SMTs#8699
rhauch merged 13 commits intoapache:trunkfrom
tombentley:KAFKA-9673-conditional-smt

Conversation

@tombentley
Copy link
Copy Markdown
Member

  • Add Predicate interface
  • Add Filter SMT
  • Add the predicate implementations defined in the KIP.
  • Create abstraction in ConnectorConfig for configuring Transformations and Connectors with the "alias prefix" mechanism
  • Add tests and fix existing tests.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@tombentley
Copy link
Copy Markdown
Member Author

Still need to review test coverage, but @kkonstantine, @mimaison, @bbejeck you might want to give it an initial pass.

@tombentley
Copy link
Copy Markdown
Member Author

@C0urante you might also want to take a look.

Copy link
Copy Markdown
Contributor

@C0urante C0urante left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking pretty good @tombentley! Thanks for the ping.

I was able to get through all of the non-test files except for ConnectorConfig; will try to return and make another pass by the end of the week to cover the rest.

Also, do you think it might be worth it to write some integration tests for this stuff? I'm thinking at the very least it'd be cool to verify that Predicates work end-to-end for both source and sink connectors, and since we'd need to pick some SMTs to test the functionality of the Predicate plugin with, it might be a convenient excuse to also end-to-end test the new Filter SMT introduced here.

Comment thread connect/transforms/src/main/java/org/apache/kafka/connect/transforms/Filter.java Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here as with Filter; probably want to return a non-null ConfigDef here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is related to the issue discussed about configure().

We could return an empty ConfigDef here, but that would be a lie which could ultimately lead some other error if someone tried to use it with configure().

We can't invent a ConfigDef schema for this because the PredicatedTransformer would need to know about the Transformer is was going to be wrapping, but it can't know that before it's been configured with at least the Transformer's ConfigDef and it can't be configured before config() has been called. So we have a chicken and egg problem. Something (ConnectorConfig) must have some a priori knowledge of either PredicatedTransformer's ConfigDef, or know how to configure it without needing to call config() at all.

Since PredicatedTransformer is a purely internal class which will never be directly exposed to Connect users, we're not obliged to stick to the contract of config() and configure(). i.e. So both PredicatedTransformer.config and PredicatedTransformer.configure can throw when called, since we know no one else can call them and we know ConnectorConfig never will.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... I think there might be some awkwardness here with trying to make PredicatedTransformer implement the Transformation interface. Could we replace every Transformation in the TransformationChain's transformation list with a PredicatedTransformer and, if there are no predicates configured for a transform by the user, make the default behavior for the PredicatedTransformer class to blindly apply its transformation?

This would solve a few problems:

  • No risk of users trying to actually use a PredicatedTransformer in a connector config, which they may try to do if we don't add logic to prevent it from being picked up during plugin path scanning on startup and logged as an SMT plugin
  • No need to implement methods that aren't used
  • One code path instead of two for application of transformations
  • More flexibility in instantiation and, possibly, the ability to encapsulate some of the ConfigDef generation logic in a separate class from ConnectorConfig (haven't looked into the specifics of this yet so may not actually be feasible or that elegant)

Comment on lines 282 to 283
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to remove these properties here, or can we just read them? Removing might cause issues with SMTs that have config properties with these names; would leaving them in be likely to cause issues as well?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compatibility section of the KIP says that if a connector already has these configs then they'll be masked by the new implicit configs. If we don't remove them here then we'd be passing the KIP-585 configs to a connector which had it's own semantics for those config keys, which would be incorrect.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... wish we'd caught that earlier. Seems safer to just leave the properties in, but unless we want to call for a re-vote and an extension on the KIP deadline guess we'll have to keep this as-is.

@tombentley
Copy link
Copy Markdown
Member Author

@C0urante thanks for the review, some excellent points there! I think an integration test is a great idea, which I'll work on next. I've addressed all your other comments.

Copy link
Copy Markdown
Contributor

@kkonstantine kkonstantine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @tombentley. This looks in the right direction.
I managed to do a first pass. Comments are mostly minor.

Yet, we'll need to consider correct that the new package and interfaces are correctly included or excluded in PluginUtils to make these classes compatible with classloading isolation.

* Add Predicate interface
* Add Filter SMT
* Add the predicate implementations defined in the KIP.
* Create abstraction in ConnectorConfig for configuring Transformations and Connectors with the "alias prefix" mechanism
* Add tests and fix existing tests.
@tombentley tombentley force-pushed the KAFKA-9673-conditional-smt branch from 9d0ec29 to e76a0b4 Compare May 27, 2020 10:57
@tombentley
Copy link
Copy Markdown
Member Author

Rebased for conflict.

@kkonstantine I've addressed those first comments, thanks! Still some work on the integration test (not passing when run via gradle).

@rhauch
Copy link
Copy Markdown
Contributor

rhauch commented May 27, 2020

ok to test

Copy link
Copy Markdown
Contributor

@rhauch rhauch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @tombentley. I made a first pass, and for the most part this looks good. A few comments/questions below. I'll keep reviewing in more detail.

.build();

// start the clusters
connect.start();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we wait until all brokers and Connect workers are available, via something like:

        connect.assertions().assertExactlyNumBrokersAreUp(numBrokers, "Brokers did not start in time.");
        connect.assertions().assertExactlyNumWorkersAreUp(numWorkers, "Worker did not start in time.");

connectorHandle.expectedCommits(numFooRecords);

// start a sink connector
connect.configureConnector(CONNECTOR_NAME, props);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an asynchronous method, and it's likely the connector will not be started and running before the test proceeds to the next statements. This can lead to very flaky tests.

We could instead wait until the connector is actually running, using something like:

        connect.assertions().assertConnectorAndAtLeastNumTasksAreRunning(CONNECTOR_NAME, NUM_TASKS,
                "Connector tasks did not start in time.");

Comment on lines +60 to +65
String value = simpleConfig.getString(PATTERN_CONFIG);
try {
result = Pattern.compile(value);
} catch (PatternSyntaxException e) {
throw new ConfigException(PATTERN_CONFIG, value, "entry must be a Java-compatible regular expression: " + e.getMessage());
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we ever get to line 64? The constructor of the config (line 58) should fail if the pattern validator fails to ensure the pattern is a valid regex, which means that if we make it past 58 then line 62 will never fail.

Am I missing something?

Co-authored-by: Randall Hauch <rhauch@gmail.com>
Copy link
Copy Markdown
Contributor

@rhauch rhauch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got through the test code, and found a few more suggestions.

}
}

@Ignore("Is this really an error. There's no actual need for the predicates config (unlike transforms where it defines the order).")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be good to have the test reflect the current behavior.

Comment on lines +313 to +318
try {
new ConnectorConfig(MOCK_PLUGINS, props);
fail();
} catch (ConfigException e) {
assertTrue(e.getMessage().contains("Value must be at least 80"));
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've moved to using assertThrows here, which would look something like:

Suggested change
try {
new ConnectorConfig(MOCK_PLUGINS, props);
fail();
} catch (ConfigException e) {
assertTrue(e.getMessage().contains("Value must be at least 80"));
}
ConfigException e = assertThrows(ConfigException.class, () -> new ConnectorConfig(MOCK_PLUGINS, props));
assertTrue(e.getMessage().contains("Value must be at least 42"));

@rhauch
Copy link
Copy Markdown
Contributor

rhauch commented May 27, 2020

ok to test

@rhauch
Copy link
Copy Markdown
Contributor

rhauch commented May 28, 2020

Added several commits that fixed the failing unit and integration tests, and addressed several of my other comments.

@rhauch
Copy link
Copy Markdown
Contributor

rhauch commented May 28, 2020

ok to test

@rhauch
Copy link
Copy Markdown
Contributor

rhauch commented May 28, 2020

retest this please

1 similar comment
@rhauch
Copy link
Copy Markdown
Contributor

rhauch commented May 28, 2020

retest this please

@rhauch
Copy link
Copy Markdown
Contributor

rhauch commented May 28, 2020

@rhauch
Copy link
Copy Markdown
Contributor

rhauch commented May 28, 2020

retest this please

Copy link
Copy Markdown
Contributor

@rhauch rhauch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @tombentley! LGTM, pending a green build (which are currently waiting for executors).

@rhauch
Copy link
Copy Markdown
Contributor

rhauch commented May 28, 2020

ok to test

@rhauch
Copy link
Copy Markdown
Contributor

rhauch commented May 28, 2020

@kkonstantine
Copy link
Copy Markdown
Contributor

retest this please

Copy link
Copy Markdown
Contributor

@kkonstantine kkonstantine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, pending a green build
Nice work @tombentley !

@tombentley
Copy link
Copy Markdown
Member Author

@rhauch @kkonstantine thanks very much for making those fixes for me, I appreciate the time that must've taken. I've made one more trivial correction to the integration test.

@rhauch
Copy link
Copy Markdown
Contributor

rhauch commented May 28, 2020

Thanks, @tombentley. Merging with 3 green builds that Jenkins did not tie back to this PR.

@rhauch rhauch merged commit 1c4eb1a into apache:trunk May 28, 2020
Kvicii pushed a commit to Kvicii/kafka that referenced this pull request May 30, 2020
* 'trunk' of github.com:apache/kafka: (36 commits)
  Remove redundant `containsKey` call in KafkaProducer (apache#8761)
  KAFKA-9494; Include additional metadata information in DescribeConfig response (KIP-569) (apache#8723)
  KAFKA-10061; Fix flaky `ReassignPartitionsIntegrationTest.testCancellation` (apache#8749)
  KAFKA-9130; KIP-518 Allow listing consumer groups per state (apache#8238)
  KAFKA-9501: convert between active and standby without closing stores (apache#8248)
  KAFKA-10056; Ensure consumer metadata contains new topics on subscription change (apache#8739)
  MINOR: Log the reason for coordinator discovery failure (apache#8747)
  KAFKA-10029; Don't update completedReceives when channels are closed to avoid ConcurrentModificationException (apache#8705)
  MINOR: remove unnecessary timeout for admin request (apache#8738)
  MINOR: Relax Percentiles test (apache#8748)
  MINOR: regression test for task assignor config (apache#8743)
  MINOR: Update documentation.html to refer to 2.6 (apache#8745)
  MINOR: Update documentation.html to refer to 2.5 (apache#8744)
  KAFKA-9673: Filter and Conditional SMTs (apache#8699)
  KAFKA-9971: Error Reporting in Sink Connectors (KIP-610) (apache#8720)
  KAFKA-10052: Harden assertion of topic settings in Connect integration tests (apache#8735)
  MINOR: Slight MetadataCache tweaks to avoid unnecessary work (apache#8728)
  KAFKA-9802; Increase transaction timeout in system tests to reduce flakiness (apache#8736)
  KAFKA-10050: kafka_log4j_appender.py fixed for JDK11 (apache#8731)
  KAFKA-9146: Add option to force delete active members in StreamsResetter (apache#8589)
  ...

# Conflicts:
#	core/src/main/scala/kafka/log/Log.scala
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants