Make Cascading Fabric selection configurable, no longer hardcoding Hadoo...#1220
Make Cascading Fabric selection configurable, no longer hardcoding Hadoo...#1220ianoc merged 6 commits intotwitter:developfrom
Conversation
|
Hello, while experimenting with cascading-3.0.0-wip-75, noticed HadoopFlowConnector was always pulled in. Thanks :) |
|
This doesn't look like it would want to necessarily be merged to develop, the hadoop2 settings reflect some changed classes in cascading3 by your comment? and with other non-compatible changes expected maybe there needs to be a cascading3 branch? |
|
Well, this may be a little premature indeed, but this was pretty much the only breaking API change I encountered today. There may very well be data-breaking changes down the road (just dipping my toes at this point), but compile-wise, that seems it as of 3.0.0-wip-75. To the best of my knowledge, hadoop2-mr1 is a 2.6.x feature, which is already hard to access in scalding due to the hard dependency on HadoopFlowConnector; this is why it seemed to me a good idea to remove the hard dep and use a crude reflection-based factory. Would removing the tez defaults (keeping the command-line override for the adventurous) make this more suitable for inclusion? Le 6 mars 2015 17:10:17 GMT+01:00, ianoc notifications@github.com a écrit :
|
|
If the new args don't break anything in cascading 2.6 (ideally 2.5? ), just that the tez options don't work, then i've actually no real problem with the inclusion. The new class for hadoop2 is in 2.6 with the other name is it? Most existing users will continue to use --hdfs so the way you've done it shouldn't break anything for them. With these changes were you able to run a job on tez? |
There was a problem hiding this comment.
Should probably try catch around this and print a good error message, why we are loading it and that it comes from that field in the jobconf
There was a problem hiding this comment.
Can we use the config passed into this method instead of the jobConf?
There was a problem hiding this comment.
Yes, it'd make sense to wrap classloader exceptions here and at least steer the user to the wiki, as already done on other exceptions. And on the wiki, a longer explanation that the user should provide the appropriate cascading fabric jar.
Will update that on Monday! Thanks for the suggestion.
|
I'll check and report (Monday) the 2.5 / 2.6 story. Yes, the intent is to have --hdfs behave as it always has. On tez, yes, I have some bits running, some failing (definitely not yet prime time), will have more to report on Monday (including a preliminary answer to the question 'did it come to the same results😜'). Le 6 mars 2015 18:12:53 GMT+01:00, ianoc notifications@github.com a écrit :
|
|
Awesome, this is great work btw thanks. Very interested to hear how you get on with the tez experiements! |
|
Thanks for the feedback, @ianoc ; updated with more explicit diagnostics checks. Running with --hadoop2-mr1 works with Cascading 2.6.3 and triggers the use of Hadoop2MR1FlowConnector as expected. With regards to "Can we use the config passed into this method instead of the jobConf?": I'm not sure what this entails. I was after using command-line switched in order to make it easier to override the default FlowConnector / FlowProcess class selection, but anything could be done. RFC: at this point, the XHandlerTest logic fails, because RichXHandler$'s apply method wants to match the mapping only against the root cause exception. I'm considering changing this to looking at the mapping at each exception nesting level, until one non-null mapping or the root cause is found. I'm not sure this would risk breaking much beyond the #anchor of the error url and guess, but I'm a bit afraid of the consequences… |
a5b23ef to
e210437
Compare
|
Somehow I missed this (I had just glanced at it). This is amazing. It would be great to get this merged in time for the next release. @cchepelov can you link and issues you hit? I saw this one: |
|
thanks @johnynek, the credit really goes to the many teams building all those nice LEGO bricks. I'm just throwing assemblies at walls until they break ;) https://issues.apache.org/jira/browse/TEZ-2237 is where I'm hitting a bump at the moment; it appears it might be an avatar of https://issues.apache.org/jira/browse/TEZ-2214 after all. I intend to send a writeup of the experience to the mailing lists, with numbers, as soon as I reach comparable outputs (hopefully in about a week), but the really terrific news is that the changes, if any, are mostly localised to SBT and the command line (in my setup, which is a little crude: CSV to CPU to CSV). Didn't give the hadoop2-mr1 backend the love it also deserves, but it appears to perform so far (it'll be in our Jenkins compar-o-matic by the time we report). |
|
@cchepelov Did you manage to compile scalding against cascading 3 for your tests? or build binaries with the cascading2.x jar and at runtime try use tez? |
|
Hello @ianoc; I'm using scalding built against cascading 2.6.1 (or whatever was in remotes/upstream/develop as of mid-March). My application does pull cascading 3.0.0-wip-X which evicts the 2.6.1 pulled by scalding-0.13.1-PR1220. |
|
So you've never tried to run the scalding tests against cascading 3 then? your going between two major versions of cascading that aren't necessarily binary compatible. (indeed when i tried it tests don't pass, which would be worrysome to merge this). |
|
Indeed no, I haven't tried to run the scalding tests against cascading As I understand it, enabling fabrics and upgrading the version of the With respect to Cascading 3.0.0-wip-X and Scalding (whichever the Le 06/05/2015 17:23, ianoc a écrit :
|
|
@cchepelov i think we could make a cascading-3 branch of scalding. How does this sound @isnotinvain @ianoc ? Then we can make PRs against that, and try to get the tests green. When we can get the existing tests green and some new Tez tests, we can look at publishing a jar. How does this sound? |
|
This looks good to me now. Pretty small change. Can we merge with develop, make sure we are still green and then we can merge this. Next step, get the cascading 3 branch. |
e210437 to
973bc77
Compare
|
Uh oh, it seems Travis has trouble downloading SBT today http://stackoverflow.com/questions/31556049/configure-repo-for-sbt-launcher-in-travis-build which causes the test failures here. There seems to be a copy around at http://dl.bintray.com/typesafe/ivy-releases/org.scala-sbt/sbt-launch/0.13.8/ but I haven't checked personally the checksums. Currently the test suite locally (the patch went stale because of a very trivial import issue, now fixed), will report. |
…doopFlowConnector
07b71be to
59899eb
Compare
|
Hi guys, what is the plan about merging this commit and enabling the scalding on top of the Tez in general? |
|
I think this can be merged, per @johnynek prior review. However this by itself won't make cascading3/Tez stuff work fully is my understanding. Scalding will not compile against cascading 3.0 on develop since the binary signatures changed there. I've not seen any fully working PR's so far to get a vertically working cascading 3.0 patch in. Few people have made some strides but not sure when the leap will be made is the short answer. |
|
Thanks for the feedback @ianoc. Yes, this patch is a stepping stone, but not the end of the "vertically In addition to PR1220 and PR1446, I'll tip my hat to @themodernlife, who |
|
@cchepelov Btw thanks again for driving this all along. Twitter should be putting some resources to this effort starting this month to get us internally using your work so that should hopefully help you with testing and getting it all done. |
Make Cascading Fabric selection configurable, no longer hardcoding Hadoo...
This removes hardcoding for HadoopFlowConnector in c.t.s.Mode, and adds in new job configuration parameters to allow for supplying the desired class instead.
Also, the --hdfs parameter is complemented with --hadoop2 (same behaviour), --hadoop2-mr1 and --hadoop2-tez which supply implementation names for FlowConnector and FlowProcess
--local remains hard-wired to LocalFlowConnector