exclude hadoop-common deps from management pom
|
A couple of points of context around what changed here:
One thing to note is that when this PR is merged, we will need to regenerate the image used in vagrant for |
|
The Dyn DNS attacks seem to be affecting the Travis website. Will check on this later. |
|
I ran this up on EC2 and found a couple of issues:
|
|
What should the path be for EC2? In the Flume Ansible scripts I see the following - java_home: /usr/jdk64/jdk1.8.0_60 |
|
Can we close and reopen this so Travis runs again, @mmiklavc? |
|
It appears that the first issue (snort) is being addressed as part of METRON-514. Any insight on what's going on with the |
|
There also appears to be a version issue with the Storm-kafka client and server versions due to a mismatch with HDP. HDP 2.5 pulls in some commits from a later version of Storm. http://stackoverflow.com/questions/39932441/storm-ui-throwing-offset-lags-for-kafka-not-supported-for-older-versions-pleas This doesn't appear to affect the topologies working, but it does make it appear like there's a problem through the UI. Should we hold off on this PR until this is resolved? One suggestion is to leverage profiles to enable building against different repos. There would still need to be an Apache version that is built and deployed with EC2, full-dev, and quick-dev. And all of those environments are based on HDP bits, meaning the lag error will show up there regardless of what we do with profiles. We might also try modifying the classpath used when submitting the topology to use the local storm-kafka bits. Community thoughts welcome. |
|
@cestella the /tmp issue did not reappear after a redeploy |
|
So long as the warning does not affect functionality (any storm committer around want to comment on this assertion?), I would vote that it is not a blocker for this PR. I would suggest a follow-on PR to introduce profiles to support the HDP repos if we really don't like this error. If the |
|
The /tmp issue didn't re-occur. I'd like to see if we can get this out without the error display in the UI. I was confused by it and I suspect others would be as well. |
|
@cestella - I do. It's something we'll need to do anyway, may as well. I can't think of anything easier that'd do it. |
|
I've added a profile and am currently testing this out. |
|
I'm now seeing a unit test failure when swapping out Apache Storm 1.0.1 for the HDP repo version. Tests pass in IntelliJ, not on the CLI. Investigating. EDIT: |
|
It looks like those are just testing defaults that we don't actually set. Do I have that right? |
|
Ok, looks like Travis is failing due to a license check. PMC members, do we need to run this for all profiles, or just the default? |
|
I just added a commit that should address the issues with the licenses. I've modified verify_license.py to print a list of all offending licenses at once rather than printing them one by one. Also, the script will now check licenses for the default profile as well as the HDP-2.5.0.0 profile. |
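For readers following along, the "collect everything, then report once" behavior described here can be sketched roughly as follows. This is a minimal illustration and not the actual verify_license.py: the function names, the assumption that the first CSV column holds the dependency coordinate, and the sample coordinates are all hypothetical.

```python
import csv
import io


def load_known(csv_text):
    """Return the set of dependency coordinates found in the first CSV column.

    Assumption: the license file lists one dependency per row, with the
    coordinate in column one. The real file layout may differ.
    """
    reader = csv.reader(io.StringIO(csv_text))
    return {row[0].strip() for row in reader if row and row[0].strip()}


def find_unlisted(dependencies, known):
    """Collect every dependency that has no entry in the known set, sorted."""
    seen = {dep.strip() for dep in dependencies if dep.strip()}
    return sorted(seen - known)


def report(unlisted):
    """Build a single message naming all offenders, not just the first one."""
    if not unlisted:
        return "All dependencies have a license entry."
    lines = ["Dependencies missing a license entry:"]
    lines.extend("  " + dep for dep in unlisted)
    return "\n".join(lines)
```

To cover more than one build profile, the same three calls could be run once per profile's dependency list, so that a single run surfaces every missing entry.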
|
Before we accept this, I want to point out that I've changed the dependencies_with_url.csv file and that it's probably worth a look. |
|
This ran well on EC2: deployment was good, expected data flow was good, and Kafka offset tracking worked as expected. I'm +1, but there are some things to do prior to pulling this in, or Quick Dev and the Docker containers will break.
I'm all set, +1, great job all! |
|
Are there instructions for doing either of those things? |
|
Yes. The Packer stuff is part of Metron and those instructions are in a README. The Docker stuff is something I maintain as a courtesy to the community based on docker-ambari. My fork with the latest jdk8 stuff is here. That's what I intend to update to use HDP 2.5. |
|
@mmiklavc - I had to make a small tweak to the Quick Dev Vagrantfile for the new image. It's backward compatible, fwiw. I just added ambari-slave to the default tags. Do you want that as a PR against your branch or a separate Jira/PR pair? |
|
@dlyle65535 I don't have a strong opinion on it. I'm giving attribution to @justinleet on this PR, since he laid the foundation of pretty much everything here. If you file separately we can give you attribution for the vagrant change. |
|
A note for the community - the /tmp file problem did reoccur for us. As it turns out, the timeout default for starting up topologies in Monit was set too low. Normally, Storm cleans up after itself whether a topology succeeds or fails. But due to Monit's timeout setting, it was killing the process prior to completion. As a result, the tmp jar files were being left in /tmp, and Monit continued to retry every minute or two, subsequently filling up the disk space pretty quickly with the ~70MB uber jars. |
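To give a rough sense of how quickly this can fill a disk, here is a back-of-the-envelope sketch. The ~70MB jar size and the retry interval of a minute or two come from the comment above; the amount of free space is a made-up input, and the function name is hypothetical.

```python
def minutes_until_full(free_mb, jar_mb=70, retry_interval_min=1.5):
    """Estimate how long leftover jars take to exhaust the free space.

    Assumes each retry leaves exactly one jar of jar_mb behind and that
    nothing else writes to or cleans up the disk in the meantime.
    """
    if jar_mb <= 0 or retry_interval_min <= 0:
        raise ValueError("jar_mb and retry_interval_min must be positive")
    retries_that_fit = free_mb // jar_mb  # whole jars that still fit
    return retries_that_fit * retry_interval_min


# Hypothetical example: 7000 MB free, one leftover jar per retry.
fast = minutes_until_full(7000, retry_interval_min=1)  # retry every minute
slow = minutes_until_full(7000, retry_interval_min=2)  # retry every two minutes
```

Under those assumptions, 7000 MB of free space holds 100 jars, so the disk would fill in somewhere between 100 and 200 minutes.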
|
@mmiklavc PR sent. Thanks! |
Make sure ambari-agent has started prior to starting services.
|
I think this Monit timeout issue is part of the problem on low-resource machines, and with 'zombie' Storm threads being left behind |
|
@ottobackwards - concur. If it exceeds the start/stop timeout (defaults to 30 seconds), Monit will terminate the start/stop process and try again. So, 60 worked on my larger machines and on my quick dev testing, but may not be correct for everybody. Maybe monitor and adjust if necessary? |
|
I checked the licensing changes @mmiklavc and they look sensible to me. |
|
@dlyle65535 gave it a provisional +1 (pending docker images), but I want to pile on with a +1 (non-binding since I have some commits in here). Great job @mmiklavc seeing this to completion. Very non-trivial, so kudos. |
|
Hello, I am not sure if this is a good place for jumping in, but I have installed Metron with HDP 2.5 using this great article: After fixing a few issues here and there, I was able to get it running. However, all my Storm topologies are having: Considering that Michael Miklavcic said: "Modifying build versions for Storm removes the Storm Kafka lag error from the UI." I ran my Maven build like this: Is there anything else I need to do in order to get Storm working with HDP 2.5? P.S. I've used the latest Metron code base from Apache incubator. |
|
Once you have data in your kafka queue this should go away. |
|
Would it make sense to put an instantiation or genesis message on the topic?
Jon
On Tue, Nov 8, 2016, 11:47 James Sirota wrote: |
|
Thank you James,
That is true! Once I create a topic and stream data through it, the error is gone. My data is now going to enrichment, and both bolts and spouts (all of them) are showing this weird error: The supervisor also crashes after 5-10 minutes with: This happens even though I have more than 30 GB of RAM available. Do I need to tune Storm for better memory usage?
|



Note: @cestella and I picked up Justin's work while he's out, but attribution for this PR should go to @justinleet.
Original testing plan here
Big thanks to @ottobackwards for assisting with testing and verification.