Skip to content

Abstractify HadoopTask#1367

Merged
cheddar merged 1 commit intoapache:masterfrom
metamx:hadoopTaskAbstraction
May 21, 2015
Merged

Abstractify HadoopTask#1367
cheddar merged 1 commit intoapache:masterfrom
metamx:hadoopTaskAbstraction

Conversation

@drcrallen
Copy link
Copy Markdown
Contributor

  • Add invokeForeignLoader to commonize the way tasks are attempted to be launched in a foreign class loader
  • Add buildClassLoader to accomplish the common tasks for hadoop jobs when building a ClassLoader

This is to help take a baby step towards abstracting hadoop tasks better (and allowing things like Conversion tasks also use hadoop)

* Add `invokeForeignLoader` to commonize the way tasks are attempted to be launched in a foreign class loader
* Add `buildClassLoader` to accomplish the common tasks for hadoop jobs when building a ClassLoader
@cheddar
Copy link
Copy Markdown
Contributor

cheddar commented May 18, 2015

This generally LGTM, assuming it still works :). Is there a way to add tests?

@drcrallen
Copy link
Copy Markdown
Contributor Author

@cheddar : @xvrl and I briefly spoke about what the best way to test this kind of stuff is. Due to the overhead of setting up and tearing down a yarn and hdfs temporary single-node-cluster, a test outside the default maven test route would be preferable

It is worth noting that I used this technique for the recompressing segments and reduced our metrics datasize by about 20%. So in general the technique works. The question simply remains how best to test that this task still functions.

A few options for adding in hadoopy tests are:

  1. Add hadoopy stuff in the official integration test
  2. Add self-contained but still integration tests as a separate maven profile (example maven clean test -P IT)
  3. Manually set @Ignore until we decide it is time to run the test

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

earlier it was HadoopIndexGeneratorInnerProcessing.class.getName() which is more compiler friendly. You can possibly change the invokeForeignLoader(Class clazz,...) instead of invokeForeignLoader(String clazz,..) too.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it this way on purpose. The prior way didn't work as expected because it was causing the clazz to get loaded in the current classloader instead of the foreign classloader. This way achieves complete isolation.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you have an example of what did not work as a consequence of this? This is something we should have a test for

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was in some modifications I did for the compression task, not in the hadoop task (though some OTHER hadoop problems may have been a result of this).

I tried to find a way to add unit tests but couldn't find a clean method (all of maybe 20 minutes). If anyone knows how to do classloading isolation unit tests I'm all ears.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically, there were some classes in the target clazz (method parameters, etc) that were getting loaded in the current classloader, so when it came time to use them, they were failing because the class was not loaded properly.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need to make sure this doesn't introduce new problems that the old way of loading somehow masked.

@drcrallen
Copy link
Copy Markdown
Contributor Author

It works locally on a yarn single-node cluster:

screen shot 2015-05-21 at 10 22 02 am

cheddar added a commit that referenced this pull request May 21, 2015
@cheddar cheddar merged commit fae86e8 into apache:master May 21, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants