
add test for batch indexing from hadoop #2650

Merged
himanshug merged 1 commit into apache:master from rasahner:IT_index_hadoop
Mar 23, 2016

@rasahner
Contributor

This PR adds a new integration test that runs Hadoop batch indexing.

I wanted to keep this new test separate from the existing tests so that it wouldn't run with the "mvn -P integration-tests" command. I think it's reasonable to expect that not all test environments have access to Hadoop (some of our test environments don't). Also, the current Docker setup does not include Hadoop access. It should be possible to add that, but I didn't want to do it as part of this PR.

So I put the test into a new package, "hadoop", so that it's easy to exclude in testng.xml. TestNG does not seem to provide a way to exclude classes or methods from included packages. I could have changed testng.xml to list the classes explicitly, but then adding a new indexer or query test would also require editing testng.xml.
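To make the package-based approach concrete, here is a plain-Java model of the selection rule: include every class under a root package, then drop anything in an excluded subpackage. This is a sketch of the idea only, not TestNG's implementation or API; the class name `PackageSelection` and the package and test names used with it (such as `io.druid.tests.hadoop`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model (not TestNG code) of package-based test selection:
 * keep classes under a root package unless they sit in an excluded subpackage.
 */
final class PackageSelection {
    private PackageSelection() {}

    /** True if className is directly in pkg or in one of pkg's subpackages. */
    static boolean inPackage(String className, String pkg) {
        return className.startsWith(pkg + ".");
    }

    /** Returns the selected class names, preserving their input order. */
    static List<String> select(List<String> classNames, String rootPackage,
                               List<String> excludedPackages) {
        List<String> selected = new ArrayList<>();
        for (String name : classNames) {
            if (!inPackage(name, rootPackage)) {
                continue; // outside the suite entirely
            }
            boolean excluded = false;
            for (String pkg : excludedPackages) {
                if (inPackage(name, pkg)) {
                    excluded = true;
                    break;
                }
            }
            if (!excluded) {
                selected.add(name);
            }
        }
        return selected;
    }
}
```

Under this rule, a new class such as a hypothetical `io.druid.tests.query.ITNewQueryTest` is picked up without touching the exclusion list, while everything under the `hadoop` package stays out, which is the property the package split is meant to give.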

I made the location of the input data in HDFS configurable because, as in our case, people running the test may be restricted to particular areas of HDFS that are available for their use.

I also made a couple of changes so that this configured value can be supplied when running with Docker, in anticipation of running the test that way someday.

@fjy
Contributor

fjy commented Mar 14, 2016

👍

FWIW, I think it'd be awesome to have a Hadoop test farm where integration tests run against different versions of Hadoop and people can contribute their own versions.

fjy added this to the 0.9.1 milestone Mar 14, 2016
Review comment on integration-tests/pom.xml (outdated)
Contributor

Is it documented that this must be set in the docker image?

Contributor

Or is this just evaluated in the host environment? I'm a bit flummoxed about when this gets executed.

Contributor Author

The ${env.xxx} values have to be set as environment variables when the mvn command is run to execute the tests with Docker. The values from the environment then get injected, as if by magic, into DockerConfigProvider.
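A simplified model of the ${env.xxx} step may help with the "when does this get evaluated" question: placeholders are filled from the environment of the shell that runs mvn, so a variable that isn't exported there has nothing to be filled from. The class name `EnvInterpolation` is hypothetical, leaving unset placeholders untouched is a choice of this sketch rather than documented Maven behavior, and how the values then reach DockerConfigProvider is not modeled.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical, simplified model (not Maven's implementation) of expanding
 * ${env.NAME} placeholders from a map of environment variables.
 */
final class EnvInterpolation {
    // Matches ${env.NAME} and captures NAME.
    private static final Pattern ENV_REF =
        Pattern.compile("\\$\\{env\\.([A-Za-z_][A-Za-z0-9_]*)\\}");

    private EnvInterpolation() {}

    static String interpolate(String template, Map<String, String> env) {
        Matcher m = ENV_REF.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = env.get(m.group(1));
            // In this sketch an unset variable leaves the literal placeholder in place.
            String replacement = (value != null) ? value : m.group(0);
            // quoteReplacement stops '$' and '\' in the value from being treated as group refs.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Passing the environment as a `Map` keeps the sketch testable; in a real run one would pass `System.getenv()`.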

If we had Docker set up so it could run Hadoop, we'd edit integration-tests/README.md to say to export HADOOP_DIR as the name of a directory containing batchHadoop1, where the test data is stored in HDFS.
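As a minimal sketch, assuming the test simply appends batchHadoop1 to the value of HADOOP_DIR to find its input, the resolution step might look like the following. The class and method names are hypothetical, and failing fast with a clear message when the variable is unset is a design choice of this sketch, not something the PR states.

```java
import java.util.Map;

/** Hypothetical helper: derives the HDFS input location from HADOOP_DIR. */
final class HadoopInputLocation {
    static final String ENV_VAR = "HADOOP_DIR";
    static final String DATA_DIR = "batchHadoop1";

    private HadoopInputLocation() {}

    static String resolve(Map<String, String> env) {
        String base = env.get(ENV_VAR);
        if (base == null || base.isEmpty()) {
            // Fail early instead of indexing from a bogus path.
            throw new IllegalStateException(
                ENV_VAR + " must be set to the HDFS directory containing " + DATA_DIR);
        }
        // Avoid a double slash when the configured directory ends with '/'.
        String trimmed = base.endsWith("/") ? base.substring(0, base.length() - 1) : base;
        return trimmed + "/" + DATA_DIR;
    }
}
```

With `export HADOOP_DIR=/user/qa`, calling `HadoopInputLocation.resolve(System.getenv())` would yield `/user/qa/batchHadoop1`.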

Even though the tests can't be run in Docker yet, I wanted to add this line and make the changes to DockerConfigProvider.java that allow configuring the HDFS location while I still had in my head what needed to be done. Maybe I should file an issue about Hadoop-in-Docker, noting that the code already supports it and describing what would need to be added to the README.

Contributor Author

I've modified issue #2531, which was originally about a batch indexing test, so that it now covers being able to run such a test with Docker.

Any other comments on this PR?
