
Add Integration Test for functionality of kinesis ingestion#9576

Merged
jon-wei merged 24 commits into apache:master from maytasm:IMPLY-2223
Apr 3, 2020

Conversation

@maytasm
Contributor

@maytasm maytasm commented Mar 28, 2020

Add Integration Test for functionality of kinesis ingestion

Description

The new set of integration tests for Kinesis follows the same Bring Your Own Cloud (BYOC) concept that the S3, GCS, and Azure integration tests use (#9501). Basically, anyone running these tests has to provide their own Kinesis credentials in a conf file and pass that file to mvn using -Doverride.config.path
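A sketch of what that looks like in practice (the property keys below are illustrative placeholders, not the exact keys the Kinesis tests read):

```shell
# Write a hypothetical override file; real runs would put Kinesis
# credentials and endpoint settings here (key names are illustrative).
cat > /tmp/kinesis-it-override.conf <<'EOF'
druid_kinesis_accessKey=YOUR_ACCESS_KEY
druid_kinesis_secretKey=YOUR_SECRET_KEY
EOF

# Then pass the file to the build, e.g.:
#   mvn verify -Doverride.config.path=/tmp/kinesis-it-override.conf
```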

Added the following integration tests for Kinesis ingestion functionality:

  1. Functional tests when Druid and Kinesis are in a stable state
  • legacy parser
  • inputFormat
  • taskCount greater than 1
  2. Functional tests when Druid is in an unstable state
  • losing nodes
  • stop/start supervisor
  3. Functional tests when Kinesis is in an unstable state
  • adding shards (resharding)
  • removing shards (resharding)

To verify ingestion:

  • Kinesis lag should be minimal; the consumer should be able to pull off the stream at a rate comparable to the producer.
  • Realtime queries work from the indexing tasks.
  • Queries work when reading from historical segments (after handoff).
  • Queries return the expected count/value/etc.
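The count check in particular has to be retried, since realtime ingestion and segment handoff are eventually consistent. A minimal sketch of that retry loop (`run_count_query` is a stand-in for however the tests actually query Druid, not the tests' real helper):

```python
# Illustrative retry loop for the "queries return expected count" check.
# `run_count_query` is a stand-in callable for querying Druid.
def wait_for_expected_count(run_count_query, expected, attempts=30, sleep=lambda: None):
    """Retry a COUNT(*)-style query until it matches, since realtime
    ingestion and segment handoff are eventually consistent."""
    last = None
    for _ in range(attempts):
        last = run_count_query()
        if last == expected:
            return True
        sleep()
    raise AssertionError(f"expected {expected} rows, last saw {last}")
```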

Added integration test infrastructure/helpers:

  • Data generator - generates data to a Kinesis stream (Kafka can also be refactored to use this later)
  • DruidAdminClient - to control the integration test's Druid Docker cluster
  • KinesisAdminClient - create stream, delete stream, etc.
  • Since tests can fail or be killed unexpectedly, added a druid-ci-expire-after tag to all streams created by the integration tests. The value of this tag is a timestamp that a lambda function can use to remove unused streams (this makes cleanup easier).
  • Also made debugging integration tests easier by automatically enabling debug mode and exposing a debug port on the integration test's Druid Docker processes.
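A cleanup job built on that tag could look roughly like this. The `druid-ci-expire-after` tag name comes from this PR, but the epoch-millis tag value format and the client interface (shaped like boto3's kinesis client responses) are assumptions:

```python
# Hypothetical cleanup sketch (e.g. for a scheduled lambda). The tag name
# matches this PR; the epoch-millis value format and client shape are assumed.
STREAM_EXPIRE_TAG = "druid-ci-expire-after"

def is_expired(tag_value, now_ms):
    """True if the expire-after timestamp (assumed epoch millis) has passed."""
    try:
        return int(tag_value) < now_ms
    except (TypeError, ValueError):
        return False  # malformed tag: leave the stream alone

def find_expired_streams(client, now_ms):
    """Return names of test streams whose expire tag is in the past.

    `client` only needs list_streams() -> {"StreamNames": [...]} and
    list_tags_for_stream(StreamName=...) -> {"Tags": [{"Key": ..., "Value": ...}]}.
    """
    expired = []
    for name in client.list_streams()["StreamNames"]:
        for tag in client.list_tags_for_stream(StreamName=name)["Tags"]:
            if tag["Key"] == STREAM_EXPIRE_TAG and is_expired(tag["Value"], now_ms):
                expired.append(name)
    return expired
```

A real job would follow this with delete_stream calls and handle pagination; this only shows the tag-driven selection.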

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths.
  • added integration tests.
  • been tested in a test Druid cluster.

@maytasm maytasm changed the title Add Integration Test for functionality of kinesis ingestion [WIP] Add Integration Test for functionality of kinesis ingestion Mar 28, 2020
@maytasm maytasm changed the title [WIP] Add Integration Test for functionality of kinesis ingestion Add Integration Test for functionality of kinesis ingestion Mar 30, 2020
Contributor

@suneet-s suneet-s left a comment

Super cool! I hope to use the DruidAdminClient to automate some of these restart tests

Druid's configuration (using Docker) can be overridden by providing -Doverride.config.path=<PATH_TO_FILE>.
The file must contain one property per line; the key must start with `druid_` and use the snake-case format.

## Debugging Druid while running tests
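For reference, "debug mode" on a JVM process usually means attaching a JDWP agent; a typical flag (illustrative, not necessarily the exact option or port the test harness sets) looks like:

```
-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005
```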
Contributor

🎉


FILE_CHECK_IF_RAN=/tls/server.key
if [ -f "$FILE_CHECK_IF_RAN" ]; then
echo "Script was ran already. Skip running again."
Contributor

nit: No need to change if everything else looks good. If I saw the log line as is, it's a little ambiguous - which script? what's the impact of skipping running again?

Suggested change
echo "Script was ran already. Skip running again."
echo "Using existing tls keys since /tls/server.key exists - skipping generation of all certs. To generate certs, delete this file"

Contributor Author

Done

Comment thread: integration-tests/README.md (Outdated)
# Run all integration tests that have been verified to work against a quickstart cluster.
mvn verify -P int-tests-config-file -Dgroups=quickstart-compatible
```
>>>>>>> upstream/master
Contributor

Looks like something from a conflict

Contributor Author

Oops. Good catch. Removed

event.put("robot", "false");
event.put("anonymous", "false");
event.put("namespace", "article");
event.put("continent", "North Americ");
Contributor

North Americ -> North America

Contributor Author

Done

{
private static final Logger LOG = new Logger(AbstractKafkaIndexerTest.class);
private static final int KINESIS_SHARD_COUNT = 2;
private static final String STREAM_EXPIRE_TAG = "druid-ci-expire-after";
Contributor

How is the expire tag used?

Contributor

Ah, nevermind, just saw that part of the description

Contributor

Can you add a comment here explaining what it's used for?

Contributor Author

It's to help people clean up the test streams if the IT cleanup method fails or didn't run (this shouldn't normally happen, but can, such as when the test unexpectedly terminates midway). Added the comment.

int secondsToGenerateSecondRound = TOTAL_NUMBER_OF_SECOND / 3;
secondsToGenerateRemaining = secondsToGenerateRemaining - secondsToGenerateSecondRound;
wikipediaStreamEventGenerator.start(streamName, kinesisEventWriter, secondsToGenerateSecondRound, FIRST_EVENT_TIME.plusSeconds(secondsToGenerateFirstRound));
// Wait for kinesis stream to finish resharding
Contributor

For the resharding test, I think you'll want to have longer timers for the event generation, with only 3s here I think it's maybe possible that AWS doesn't actually begin the resharding until you've already finished this second phase. Maybe 30s is better.

Or maybe it could check for the stream status becoming UPDATING and start the second phase then.

Contributor Author

@maytasm maytasm Apr 2, 2020

Changed the logic to:

  • after issuing reshard call
  • do DescribeStream polling for an updating status or an active status with the final expected number of shards
  • begin second phase when ^ true
  • check that the stream has active status (no need to check the number of shards, since earlier we already checked for "updating status or an active status with the final expected number of shards"; hence if it is active now, it must be the active state after resharding)
  • begin third phase when ^ true
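That polling step can be sketched as follows (names are illustrative; `describe` stands in for a DescribeStream call against the real KinesisAdminClient, returning a status string and a shard count):

```python
import time

# Sketch of the polling scheme described above: wait until resharding has
# started (UPDATING), or has already finished (ACTIVE with the final count).
def wait_for_reshard_started(describe, expected_shards, timeout_s=300,
                             poll_s=1, sleep=time.sleep):
    """Block until the stream is UPDATING, or already ACTIVE with the final
    expected shard count (in case resharding finished before our first poll)."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status, shards = describe()
        if status == "UPDATING" or (status == "ACTIVE" and shards == expected_shards):
            return status
        sleep(poll_s)
    raise TimeoutError("stream never started resharding")
```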

Contributor Author

From running locally, I can see that resharding takes around 30000-40000ms (3-4 mins). This means that after issuing the reshard call, when we check for "updating status or an active status with the final expected number of shards" immediately afterwards, it will most likely be the "updating" status that returns true (rather than "active status with the final expected number of shards"). I am only including the "active status with the final expected number of shards" check in case the reshard finishes by the time we do the check (most likely won't happen).

);
// Start generating remaining data (after resharding)
wikipediaStreamEventGenerator.start(streamName, kinesisEventWriter, secondsToGenerateRemaining, FIRST_EVENT_TIME.plusSeconds(secondsToGenerateFirstRound + secondsToGenerateSecondRound));
// Verify supervisor is healthy after suspension
Contributor

Suggest having a supervisor health check as well before the resharding occurs, so that the resharding happens while the supervisor is running

Contributor Author

Done

this.queryHelper.testQueriesFromString(querySpec, 2);
LOG.info("Shutting down supervisor");
indexer.shutdownSupervisor(supervisorId);
// wait for all kafka indexing tasks to finish
Contributor

kafka -> kinesis

Contributor Author

Done

}

@Test
public void testKineseIndexDataWithLegacyParserStableState() throws Exception
Contributor

In the test names, Kinese -> Kinesis here and elsewhere

Contributor Author

Done

@jon-wei jon-wei merged commit 1852bf3 into apache:master Apr 3, 2020
jihoonson pushed a commit that referenced this pull request Apr 24, 2020
* backport Add Integration Test for functionality of kinesis ingestion (#9576)

* backport Add integration tests for kafka ingestion (#9724)

* resolve merge conflict

* integration test cluster prop change to support parallel
JulianJaffePinterest pushed a commit to JulianJaffePinterest/druid that referenced this pull request Jun 12, 2020

* kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* fix kinesis timeout

* Kinesis IT

* Kinesis IT

* fix checkstyle

* Kinesis IT

* address comments

* fix checkstyle
@clintropolis clintropolis added this to the 0.19.0 milestone Jun 26, 2020