Skip to content

Adding s3, gcs, azure integration tests#9501

Merged
clintropolis merged 15 commits intoapache:masterfrom
maytasm:IMPLY-2324
Mar 17, 2020
Merged

Adding s3, gcs, azure integration tests#9501
clintropolis merged 15 commits intoapache:masterfrom
maytasm:IMPLY-2324

Conversation

@maytasm
Copy link
Copy Markdown
Contributor

@maytasm maytasm commented Mar 10, 2020

Adding s3, gcs, azure integration tests

Description

Adding s3, gcs, azure integration tests which can be run BYO cloud style.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths.
  • added integration tests.
  • been tested in a test Druid cluster.

@maytasm maytasm changed the title Adding s3, gcs, azure integration tests [WIP] Adding s3, gcs, azure integration tests Mar 10, 2020
} No newline at end of file
}

setupData()
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will work for how our travis CI is setup but if user try running manually locally with...

  1. -DexcludedGroups=not-query
  2. without -Dgroups and without DexcludedGroups
    then the data won't be setup

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comments indicating the above.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, one more thing I forgot about, please update the docs in https://github.com/apache/druid/blob/master/integration-tests/README.md to include instructions for how to supply credentials and run these tests

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@maytasm maytasm changed the title [WIP] Adding s3, gcs, azure integration tests Adding s3, gcs, azure integration tests Mar 13, 2020
Copy link
Copy Markdown
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 overall, will be nice to have integration tests available for stuff that isn't easily tested in travis due to external dependencies.

public class ITAzureParallelIndexTest extends AbstractITBatchIndexTest
{
// START: Change this with the configs for your Azure data
private static final String CONTAINER = "my-container";
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this mean you need to edit this code to run the tests? Can this be a config so it can be passed in?

Copy link
Copy Markdown
Contributor Author

@maytasm maytasm Mar 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change these to configs.


# move extensions into a seperate extension folder
# For druid-s3-extensions
mkdir -p $SHARED_DIR/docker/extensions/druid-s3-extensions
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, this doesn't seem a great way to handle this and doesn't really reflect how this will likely be run in practice, because the extension dependency jars will still all be on the classpath, just the extension jar itself will be in another folder. I think we will probably want to get the druid install similar to how it is in normal packaging to provide a more realistic test environment, which probably precludes using mvn -B dependency:copy-dependencies, or at least the way we are currently using it, so that way all extensions are split out in the separate extensions folder and not on the classpath.

I guess this could be fine for now though, and we could figure out how to do this in a future PR.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. It's very hard to separate out the extensions cleanly using mvn -B dependency:copy-dependencies
Moving away from mvn -B dependency:copy-dependencies to how it is in normal packaging using the distribution is the right way in my opinion but is a much bigger/unrelated change than the purpose of this PR. For the purpose of this PR, as long as we don't load the extension jar which checks for configs/credentials to be set, we are good.

druid_azure_account=<OVERRIDE_THIS>
druid_azure_key=<OVERRIDE_THIS>
druid_azure_container=<OVERRIDE_THIS>
druid_extensions_loadList=["druid-azure-extensions"]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this approach with trying to use overrides seems like it is going to be sort of brittle and hard to maintain. Like, if i want to ingest data from s3 into hdfs deep storage, or any combination of extensions, I'm not sure how well this approach will hold up, I guess we'll need separate override files for each configuration? That said, maybe is fine until we determine a better solution to integration test config management.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The override-examples directory is meant to be a general guideline of what to override. You may override more or less depending on what test you are running/writing. You may combine multiple files from the override-examples directory if you are running combination of extensions (like ingest data from s3 into hdfs deep storage).
Maybe we can revisit this better for a more user-friendly and easier way

# The "query" and "security" test groups require data to be setup before running the tests.
# In particular, they requires segments to be download from a pre-existing s3 bucket.
# This is done by using the loadSpec put into metadatastore and s3 credientials set below.
if [ "$DRUID_INTEGRATION_TEST_GROUP" = "query" ] || [ "$DRUID_INTEGRATION_TEST_GROUP" = "security" ]; then
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this doesn't necessarily need to be changed now, but I think longer term we are going to need a mapping of test groups to configurations, but I'm not sure it should be encoded in a file in the container. This part should maybe be repurposed to allow running of initialization scripts that configurations bring with them to the container, and this stuff be in a script that initializes the legacy integration test environment.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a good idea. Should definitely revisit this

Copy link
Copy Markdown
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants