Skip to content

Make Cascading Fabric selection configurable, no longer hardcoding Hadoo...#1220

Merged
ianoc merged 6 commits intotwitter:developfrom
cchepelov:free-flow-connector
Jan 4, 2016
Merged

Make Cascading Fabric selection configurable, no longer hardcoding Hadoo...#1220
ianoc merged 6 commits intotwitter:developfrom
cchepelov:free-flow-connector

Conversation

@cchepelov
Copy link
Copy Markdown
Contributor

This removes hardcoding for HadoopFlowConnector in c.t.s.Mode, and adds in new job configuration parameters to allow for supplying the desired class instead.

Also, the --hdfs parameter is complemented with --hadoop2 (same behaviour), --hadoop2-mr1 and --hadoop2-tez which supply implementation names for FlowConnector and FlowProcess

--local remains hard-wired to LocalFlowConnector

@cchepelov
Copy link
Copy Markdown
Contributor Author

Hello,

while experimenting with cascading-3.0.0-wip-75, noticed HadoopFlowConnector was always pulled in.
This just enables experimentation, even though looking at https://groups.google.com/forum/#!topic/scalding-dev/h7Rk0VZ6BUg , some breakage's going to be expected.

Thanks :)

@ianoc
Copy link
Copy Markdown
Contributor

ianoc commented Mar 6, 2015

This doesn't look like it would want to necessarily be merged to develop, the hadoop2 settings reflect some changed classes in cascading3 by your comment? and with other non-compatible changes expected maybe there needs to be a cascading3 branch?

@cchepelov
Copy link
Copy Markdown
Contributor Author

Well, this may be a little premature indeed, but this was pretty much the only breaking API change I encountered today. There may very well be data-breaking changes down the road (just dipping my toes at this point), but compile-wise, that seems it as of 3.0.0-wip-75.

To the best of my knowledge, hadoop2-mr1 is a 2.6.x feature, which is already hard to access in scalding due to the hard dependency on HadoopFlowConnector; this is why it seemed to me a good idea to remove the hard dep and use a crude reflection-based factory.

Would removing the tez defaults (keeping the command-line override for the adventurous) make this more suitable for inclusion?

Le 6 mars 2015 17:10:17 GMT+01:00, ianoc notifications@github.com a écrit :

This doesn't look like it would want to necessarily be merged to
develop, the hadoop2 settings reflect some changed classes in
cascading3 by your comment? and with other non-compatible changes
expected maybe there needs to be a cascading3 branch?


Reply to this email directly or view it on GitHub:
#1220 (comment)

@ianoc
Copy link
Copy Markdown
Contributor

ianoc commented Mar 6, 2015

If the new args don't break anything in cascading 2.6 (ideally 2.5? ), just that the tez options don't work, then i've actually no real problem with the inclusion. The new class for hadoop2 is in 2.6 with the other name is it? Most existing users will continue to use --hdfs so the way you've done it shouldn't break anything for them.

With these changes were you able to run a job on tez?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably try catch around this and print a good error message, why we are loading it and that it comes from that field in the jobconf

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the config passed into this method instead of the jobConf?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it'd make sense to wrap classloader exceptions here and at least steer the user to the wiki, as already done on other exceptions. And on the wiki, a longer explanation that the user should provide the appropriate cascading fabric jar.

Will update that on Monday! Thanks for the suggestion.

@cchepelov
Copy link
Copy Markdown
Contributor Author

I'll check and report (Monday) the 2.5 / 2.6 story. Yes, the intent is to have --hdfs behave as it always has.

On tez, yes, I have some bits running, some failing (definitely not yet prime time), will have more to report on Monday (including a preliminary answer to the question 'did it come to the same results😜').

Le 6 mars 2015 18:12:53 GMT+01:00, ianoc notifications@github.com a écrit :

If the new args don't break anything in cascading 2.6 (ideally 2.5? ),
just that the tez options don't work, then i've actually no real
problem with the inclusion. The new class for hadoop2 is in 2.6 with
the other name is it? Most existing users will continue to use --hdfs
so the way you've done it shouldn't break anything for them.

With these changes were you able to run a job on tez?


Reply to this email directly or view it on GitHub:
#1220 (comment)

@ianoc
Copy link
Copy Markdown
Contributor

ianoc commented Mar 6, 2015

Awesome, this is great work btw thanks. Very interested to hear how you get on with the tez experiements!

@cchepelov
Copy link
Copy Markdown
Contributor Author

Thanks for the feedback, @ianoc ; updated with more explicit diagnostics checks.

Running with --hadoop2-mr1 works with Cascading 2.6.3 and triggers the use of Hadoop2MR1FlowConnector as expected.

With regards to "Can we use the config passed into this method instead of the jobConf?": I'm not sure what this entails. I was after using command-line switched in order to make it easier to override the default FlowConnector / FlowProcess class selection, but anything could be done.

RFC: at this point, the XHandlerTest logic fails, because RichXHandler$'s apply method wants to match the mapping only against the root cause exception. I'm considering changing this to looking at the mapping at each exception nesting level, until one non-null mapping or the root cause is found. I'm not sure this would risk breaking much beyond the #anchor of the error url and guess, but I'm a bit afraid of the consequences…
Something like that: https://gist.github.com/cchepelov/9241e7618dcd19c7f6d9

@cchepelov cchepelov force-pushed the free-flow-connector branch 2 times, most recently from a5b23ef to e210437 Compare March 17, 2015 08:13
@johnynek
Copy link
Copy Markdown
Contributor

Somehow I missed this (I had just glanced at it).

This is amazing. It would be great to get this merged in time for the next release.

@cchepelov can you link and issues you hit? I saw this one:
https://issues.apache.org/jira/browse/TEZ-2237

@cchepelov
Copy link
Copy Markdown
Contributor Author

thanks @johnynek, the credit really goes to the many teams building all those nice LEGO bricks. I'm just throwing assemblies at walls until they break ;)

https://issues.apache.org/jira/browse/TEZ-2237 is where I'm hitting a bump at the moment; it appears it might be an avatar of https://issues.apache.org/jira/browse/TEZ-2214 after all.

I intend to send a writeup of the experience to the mailing lists, with numbers, as soon as I reach comparable outputs (hopefully in about a week), but the really terrific news is that the changes, if any, are mostly localised to SBT and the command line (in my setup, which is a little crude: CSV to CPU to CSV).

Didn't give the hadoop2-mr1 backend the love it also deserves, but it appears to perform so far (it'll be in our Jenkins compar-o-matic by the time we report).

@ianoc
Copy link
Copy Markdown
Contributor

ianoc commented Apr 23, 2015

@cchepelov Did you manage to compile scalding against cascading 3 for your tests? or build binaries with the cascading2.x jar and at runtime try use tez?

@cchepelov
Copy link
Copy Markdown
Contributor Author

Hello @ianoc; I'm using scalding built against cascading 2.6.1 (or whatever was in remotes/upstream/develop as of mid-March). My application does pull cascading 3.0.0-wip-X which evicts the 2.6.1 pulled by scalding-0.13.1-PR1220.

@ianoc
Copy link
Copy Markdown
Contributor

ianoc commented May 6, 2015

So you've never tried to run the scalding tests against cascading 3 then? your going between two major versions of cascading that aren't necessarily binary compatible. (indeed when i tried it tests don't pass, which would be worrysome to merge this).

@cchepelov
Copy link
Copy Markdown
Contributor Author

Indeed no, I haven't tried to run the scalding tests against cascading
3; wanted to patch with as light a touch as I could. However, the patch
is also valid on a pure Cascading 2.6.X perspective: it enables running
with the /hadoop2-mr1/ fabric rather than the legacy /hadoop/ fabric.
And the tests did pass, at least when I last updated the PR.

As I understand it, enabling fabrics and upgrading the version of the
Cascading frameworks involved are quite orthogonal undertakings. This PR
only enables the selection of the fabric, bumping cascading's major
version remains an application-level responsibility (thankfully, the API
subset used by Scalding hasn't changed).

With respect to Cascading 3.0.0-wip-X and Scalding (whichever the
backend between local, hadoop, hadoop2-mr1 and hadoop2-tez): we're now
running all our application-level acceptance tests on all four backends
at all times (a milestone which we hadn't achieved a month ago when I
last updated this PR). This has since my initial public announcement
uncovered a few subtle nasty behaviours, which we're reporting on a
real-time basis to @cwensel. We have almost-daily exchanges of reports
and updated -wips, and from our consumer perspective, the progress is great.
I wonder if it wouldn't make sense to start something automated in
scalding to track those non-passing scalding tests as the 3.0.0-wip-X
creeps towards general release, at least on the -hadoop backend. Again,
this is a distinct issue from fabric-switching. Would a separate PR be
useful to track cascading-3 compatibility?

Le 06/05/2015 17:23, ianoc a écrit :

So you've never tried to run the scalding tests against cascading 3
then? your going between two major versions of cascading that aren't
necessarily binary compatible. (indeed when i tried it tests don't
pass, which would be worrysome to merge this).


Reply to this email directly or view it on GitHub
#1220 (comment).

@johnynek
Copy link
Copy Markdown
Contributor

johnynek commented May 7, 2015

@cchepelov i think we could make a cascading-3 branch of scalding. How does this sound @isnotinvain @ianoc ? Then we can make PRs against that, and try to get the tests green. When we can get the existing tests green and some new Tez tests, we can look at publishing a jar.

How does this sound?

@johnynek
Copy link
Copy Markdown
Contributor

This looks good to me now. Pretty small change. Can we merge with develop, make sure we are still green and then we can merge this.

Next step, get the cascading 3 branch.

@cchepelov cchepelov force-pushed the free-flow-connector branch from e210437 to 973bc77 Compare July 22, 2015 07:46
@cchepelov
Copy link
Copy Markdown
Contributor Author

Uh oh, it seems Travis has trouble downloading SBT today http://stackoverflow.com/questions/31556049/configure-repo-for-sbt-launcher-in-travis-build which causes the test failures here. There seems to be a copy around at http://dl.bintray.com/typesafe/ivy-releases/org.scala-sbt/sbt-launch/0.13.8/ but I haven't checked personally the checksums.

Currently the test suite locally (the patch went stale because of a very trivial import issue, now fixed), will report.

@cchepelov cchepelov force-pushed the free-flow-connector branch from 07b71be to 59899eb Compare July 23, 2015 06:39
@JTaky
Copy link
Copy Markdown

JTaky commented Dec 24, 2015

Hi guys, what is the plan about merging this commit and enabling the scalding on top of the Tez in general?
Thanks

@ianoc
Copy link
Copy Markdown
Contributor

ianoc commented Jan 4, 2016

I think this can be merged, per @johnynek prior review. However this by itself won't make cascading3/Tez stuff work fully is my understanding. Scalding will not compile against cascading 3.0 on develop since the binary signatures changed there. I've not seen any fully working PR's so far to get a vertically working cascading 3.0 patch in.

Few people have made some strides but not sure when the leap will be made is the short answer.

@cchepelov
Copy link
Copy Markdown
Contributor Author

Thanks for the feedback @ianoc.

Yes, this patch is a stepping stone, but not the end of the "vertically
working cascading 3.0 patch" indeed. There's more in
#1446 (now revised to include
your earlier feedback).

In addition to PR1220 and PR1446, I'll tip my hat to @themodernlife, who
already took it one step further
http://themodernlife.github.io/scala/hadoop/hdfs/sclading/flink/streaming/realtime/2015/12/20/running-scalding-jobs-on-apache-flink/
­— there's eagerness out there I guess ;-)

@ianoc
Copy link
Copy Markdown
Contributor

ianoc commented Jan 4, 2016

@cchepelov Btw thanks again for driving this all along. Twitter should be putting some resources to this effort starting this month to get us internally using your work so that should hopefully help you with testing and getting it all done.

ianoc added a commit that referenced this pull request Jan 4, 2016
Make Cascading Fabric selection configurable, no longer hardcoding Hadoo...
@ianoc ianoc merged commit 55f03f7 into twitter:develop Jan 4, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants