add test for batch indexing from hadoop #2650
👍 FWIW, I think it'd be awesome to have a Hadoop test farm where integration tests run against different versions of Hadoop and ppl can contribute their own versions
Is it documented that this must be set in the docker image?
or is this just evaluated in the host env? I'm a bit flummoxed on when this gets executed.
The ${env.xxx} placeholders are filled in from environment variables, so those variables have to be set in the environment where the mvn command is given to run tests with docker. The values from the environment are then injected into DockerConfigProvider.
If we had docker set up so it could do hadoop, we'd edit integration-tests/README.md to say to export HADOOP_DIR to the name of a directory containing batchHadoop1, where the test data is in hdfs.
Even though the tests can't be run in docker yet, I wanted to add this line and make the mods to DockerConfigProvider.java to be able to do the configuration for the hdfs location, while I had it in my head what needed to be done. Maybe I should file an issue about hadoop-in-docker, and say that the support for using it is already there in the code, and what would need to be added to the README.
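To make the configuration idea above concrete, here is a minimal sketch of how an HDFS input location could be derived from the HADOOP_DIR environment variable. It is not the actual DockerConfigProvider code: the class `HadoopDirConfig`, the method `resolveInputPath`, and the choice to fail when the variable is unset are all assumptions made for illustration. Only the names HADOOP_DIR and batchHadoop1 come from the discussion.

```java
import java.util.Map;

// Hypothetical helper for illustration only; not part of the PR.
public final class HadoopDirConfig {
    // Names taken from the discussion above.
    public static final String ENV_NAME = "HADOOP_DIR";
    public static final String DATA_SUBDIR = "batchHadoop1";

    private HadoopDirConfig() {}

    // Builds the path to the test data from the given environment map.
    // Assumption: a missing or empty HADOOP_DIR is treated as an error.
    public static String resolveInputPath(Map<String, String> env) {
        String dir = env.get(ENV_NAME);
        if (dir == null || dir.isEmpty()) {
            throw new IllegalStateException(
                ENV_NAME + " must be exported before running the hadoop test");
        }
        // Drop one trailing slash so the joined path has no "//".
        String base = dir.endsWith("/") ? dir.substring(0, dir.length() - 1) : dir;
        return base + "/" + DATA_SUBDIR;
    }

    public static void main(String[] args) {
        try {
            // System.getenv() is the environment of the process running mvn.
            System.out.println(resolveInputPath(System.getenv()));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Taking the environment as a `Map` argument rather than calling `System.getenv()` inside the method keeps the lookup easy to exercise without changing the real environment.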
I've modified issue #2531, which was for a batch indexing test, to be for being able to run such a test with docker.
Any other comments on this PR?
ebabc05 to e7a7ecd
This PR includes a new integration test that does hadoop indexing.
I wanted to keep this new test separate from the existing tests, so that it wouldn't run with the "mvn -P integration-tests" command. That's because I thought it reasonable to expect that not all test environments would have access to hadoop (that's true of some of our test environments). Also, the current docker setup does not include hadoop access. I expect it's possible to add that, but I didn't want to do that as part of this PR.
So, I put the test into a new package "hadoop" so that it's easy to exclude it in testng.xml. TestNG does not seem to provide a way to exclude classes or methods from included packages. I could have changed testng.xml to list the classes, but then adding an indexer or query test would always require editing testng.xml as well.
I decided to make the location of the input data in hdfs configurable, because (as is the case for us) folks running the test might be constrained as to which space in hdfs is available for their use.
I made a couple of changes to make it possible to express that configured value when using docker, in anticipation of running the test that way someday.