Add Integration Test for functionality of kinesis ingestion #9576
jon-wei merged 24 commits into apache:master
suneet-s left a comment
Super cool! I hope to use the DruidAdminClient to automate some of these restart tests
| Druid's configuration (using Docker) can be overrided by providing -Doverride.config.path=<PATH_TO_FILE>.
| The file must contain one property per line, the key must start with `druid_` and the format should be snake case.
|
| ## Debugging Druid while running tests
| FILE_CHECK_IF_RAN=/tls/server.key
| if [ -f "$FILE_CHECK_IF_RAN" ]; then
| echo "Script was ran already. Skip running again."
nit: No need to change if everything else looks good. If I saw the log line as is, it's a little ambiguous - which script? what's the impact of skipping running again?
- echo "Script was ran already. Skip running again."
+ echo "Using existing tls keys since /tls/server.key exists - skipping generation of all certs. To generate certs, delete this file"
| # Run all integration tests that have been verified to work against a quickstart cluster.
| mvn verify -P int-tests-config-file -Dgroups=quickstart-compatible
| ```
| >>>>>>> upstream/master
Looks like something from a conflict
Oops. Good catch. Removed
| event.put("robot", "false");
| event.put("anonymous", "false");
| event.put("namespace", "article");
| event.put("continent", "North Americ");
North Americ -> North America
| {
| private static final Logger LOG = new Logger(AbstractKafkaIndexerTest.class);
| private static final int KINESIS_SHARD_COUNT = 2;
| private static final String STREAM_EXPIRE_TAG = "druid-ci-expire-after";
How is the expire tag used?
Ah, never mind, just saw that part of the description
Can you add a comment here explaining what it's used for?
It's to help people clean up the test streams if the IT cleanup method fails or doesn't run (this shouldn't normally happen, but it can, for example if the test terminates unexpectedly midway). Added the comment.
| int secondsToGenerateSecondRound = TOTAL_NUMBER_OF_SECOND / 3;
| secondsToGenerateRemaining = secondsToGenerateRemaining - secondsToGenerateSecondRound;
| wikipediaStreamEventGenerator.start(streamName, kinesisEventWriter, secondsToGenerateSecondRound, FIRST_EVENT_TIME.plusSeconds(secondsToGenerateFirstRound));
| // Wait for kinesis stream to finish resharding
For the resharding test, I think you'll want longer timers for the event generation. With only 3s here, I think it's possible that AWS doesn't actually begin the resharding until you've already finished this second phase. Maybe 30s is better.
Or maybe it could check for the stream status becoming UPDATING and start the second phase then.
Changed the logic to:
- after issuing the reshard call
- do DescribeStream polling for an updating status or an active status with the final expected number of shards
- begin second phase when ^ true
- check that the stream is in active status (no need to check the number of shards since we already checked earlier for "updating status or an active status with the final expected number of shards", hence if it is active now it must be the active state after resharding)
- begin third phase when ^ true
From running locally, I can see that resharding takes around 30000-40000ms (3-4 mins). This means that when we check for "updating status or an active status with the final expected number of shards" immediately after issuing the reshard call, it will most likely be the "updating status" condition that returns true (rather than "active status with the final expected number of shards"). I am only including the "active status with the final expected number of shards" check in case the reshard finishes by the time we do the check (most likely won't happen).
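The polling logic described in this thread could be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `ReshardWait`, `StreamState`, and `pollUntil` are hypothetical names, the `Supplier` stands in for whatever performs the real DescribeStream call, and how shards are counted is left to the caller.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper; names and structure are assumptions for illustration only.
final class ReshardWait {

    /** Hypothetical snapshot of the fields a DescribeStream response would supply. */
    static final class StreamState {
        final String status;   // e.g. "UPDATING" or "ACTIVE"
        final int shardCount;  // counted however the caller decides

        StreamState(String status, int shardCount) {
            this.status = status;
            this.shardCount = shardCount;
        }
    }

    /** True once resharding is in progress, or has already finished with the expected shard count. */
    static boolean reshardStartedOrDone(StreamState s, int expectedShards) {
        return "UPDATING".equals(s.status)
            || ("ACTIVE".equals(s.status) && s.shardCount == expectedShards);
    }

    /** True when the stream reports the active status. */
    static boolean isActive(StreamState s) {
        return "ACTIVE".equals(s.status);
    }

    /** Polls up to maxAttempts times; returns true as soon as the condition holds. */
    static boolean pollUntil(Supplier<StreamState> describe, Predicate<StreamState> condition,
                             int maxAttempts, long sleepMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (condition.test(describe.get())) {
                return true;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Canned responses standing in for successive DescribeStream calls.
        Iterator<StreamState> responses = Arrays.asList(
            new StreamState("ACTIVE", 2),
            new StreamState("UPDATING", 2),
            new StreamState("UPDATING", 2),
            new StreamState("ACTIVE", 4)).iterator();
        Supplier<StreamState> describe = responses::next;

        // Gate for the second phase: reshard has started (or already finished).
        boolean started = pollUntil(describe, s -> reshardStartedOrDone(s, 4), 10, 0L);
        // Gate for the third phase: stream is active again.
        boolean active = pollUntil(describe, ReshardWait::isActive, 10, 0L);
        System.out.println("reshard started: " + started + ", stream active again: " + active);
    }
}
```

Keeping the two conditions as separate predicates mirrors the two gates in the list above: the first gate accepts either status, the second only needs the active status.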
| );
| // Start generating remainding data (after resharding)
| wikipediaStreamEventGenerator.start(streamName, kinesisEventWriter, secondsToGenerateRemaining, FIRST_EVENT_TIME.plusSeconds(secondsToGenerateFirstRound + secondsToGenerateSecondRound));
| // Verify supervisor is healthy after suspension
Suggest having a supervisor healthy check as well before the resharding occurs, so the resharding occurs while the supervisor is running
| this.queryHelper.testQueriesFromString(querySpec, 2);
| LOG.info("Shutting down supervisor");
| indexer.shutdownSupervisor(supervisorId);
| // wait for all kafka indexing tasks to finish
| }
|
| @Test
| public void testKineseIndexDataWithLegacyParserStableState() throws Exception
In the test names, Kinese -> Kinesis here and elsewhere
) * kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * Kinesis IT * fix kinesis timeout * Kinesis IT * Kinesis IT * fix checkstyle * Kinesis IT * address comments * fix checkstyle
Add Integration Test for functionality of kinesis ingestion
Description
The new set of integration tests for Kinesis follows the same Bring Your Own Cloud (BYOC) concept that the S3, GCS, and Azure integration tests use (#9501). Basically, anyone running the tests will have to provide their own Kinesis credentials in a conf file and pass the file to mvn using -Doverride.config.path.
Added the following integration tests for Kinesis ingestion functionality:
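As a rough illustration of the conf file format mentioned above (one property per line, keys starting with `druid_`), a pre-flight check might look like the sketch below. It assumes `key=value` lines and treats a dot in the key as a sign that the property was not converted to snake case; both are assumptions, and `OverrideConfigCheck` and the example property names are hypothetical, not part of Druid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical pre-flight check for an override config file; not part of Druid.
final class OverrideConfigCheck {

    /** Assumes "key=value" lines whose key starts with "druid_" and contains no dots. */
    static boolean isValidLine(String line) {
        int eq = line.indexOf('=');
        if (eq <= 0) {
            return false; // no separator, or empty key
        }
        String key = line.substring(0, eq).trim();
        return key.startsWith("druid_")
            && key.length() > "druid_".length()
            && key.indexOf('.') < 0;
    }

    /** Returns the non-blank lines that fail the check. */
    static List<String> invalidLines(List<String> lines) {
        List<String> bad = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // ignore blank lines
            }
            if (!isValidLine(line)) {
                bad.add(line);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Made-up property names, used only to exercise the check.
        List<String> lines = Arrays.asList(
            "druid_example_property=some-value",
            "druid.example.property=some-value",
            "example_property=some-value");
        System.out.println("Lines that would be rejected: " + invalidLines(lines));
    }
}
```

In practice the lines would come from the file passed via -Doverride.config.path, for example by reading it with `java.nio.file.Files.readAllLines`.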
To verify ingestion:
Added integration test infrastructure/helper:
This PR has: