-
Notifications
You must be signed in to change notification settings - Fork 505
METRON-1732: Fix job status liveness bug and parallelize finalizer file writing #1157
METRON-1732: Fix job status liveness bug and parallelize finalizer file writing #1157
Conversation
…to account for 25% of total job progress
…lism as property in Ambari.
|
This PR also updates the status reporting to include 25% of the progress to include the finalizer. Testing locally found that a query via the Alerts PCAP UI with page size set small (10 results per page), resulting in 7,299 pages took 15 minutes with parallelism set to 1. With parallelism set to 8 it went down to 2-3 minutes. |
| Pcap query jobs can be configured for submission to a YARN queue. This setting is exposed as the Spring property `pcap.yarn.queue`. If configured, the REST application will set the `mapreduce.job.queuename` Hadoop property to that value. | ||
|
|
||
| Pcap query jobs have a finalization routine that writes their results out to HDFS in pages. There is a threadpool used for this finalization that can be configured to use a specified number of threads. | ||
| This setting is exposed as the Spring property `pcap.finalizer.threadpool.size` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we document the default value for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have any advice on when a user should increase/decrease this value? Are there errors I might see that would be resolved by increasing/decreasing this value?
If you don't have a good understanding of this, then we don't need to worry about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we mention that 1C, 4C are valid values in addition to integers? Perhaps just copy the text you have in the Ambari description into the README. Good stuff.
| private static int getNumThreads(String numThreads) { | ||
| String numThreadsStr = ((String) numThreads).trim().toUpperCase(); | ||
| if (numThreadsStr.endsWith("C")) { | ||
| Integer factor = Integer.parseInt(numThreadsStr.replace("C", "")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a catch block for when a user enters an invalid value?
We could catch and provide a helpful exception message like "Invalid value for property 'pcap.finalizer.threadpool.size'; value='3CCC'". Otherwise, its kind of murky to the user what happened.
| } | ||
| mrJob.submit(); | ||
| jobStatus.withState(State.SUBMITTED).withDescription("Job submitted").withJobId(mrJob.getJobID().toString()); | ||
| synchronized (this) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a comment about why we need the lock here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do. This lock is about thread visibility as opposed to actual issues with concurrent modification. It may be that this lock is not need with getStatus being synchronized. I will double check and report back via modified code and/or code comment on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fyi, turns out I was right first time around. Synchronization is necessary for visibility in the timer thread that is started after these modifications. I've updated the comments in code to describe this.
| outFiles.add(path); | ||
| } | ||
| } catch (IOException ioe) { | ||
| throw new RuntimeException("Failed to write results", ioe); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add the path that failed to write to the exception message?
| ForkJoinPool tp = new ForkJoinPool(parallelism); | ||
| try { | ||
| tp.submit(() -> { | ||
| toWrite.entrySet().parallelStream().forEach(e -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we be calling tp.submit for each (path, data)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand it, submit is effectively submitting the set of tasks for the parallel stream to execute within this threadpool, e.g. https://www.baeldung.com/java-8-parallel-streams-custom-threadpool. As a side note, the reason for a custom threadpool at all is so that this doesn't cause issues with other streams since the default in Java is to use a global context for this sort of thing. Liveness issues may arise when using the shared global context.
|
Good feedback @nickwallen, I'll make adjustments. |
|
@nickwallen I've addressed your review comments. Let me know what you think. |
|
Testing Test plan pulled from here - #1081 (comment) Get PCAP data into Metron:
Fixed filter
Query filter
Also run riffs on the fixed query via the Metron Alerts UI PCAP query panel. |
|
Note - I also added a small blurb about pcap page size to the README.md alongside the notes on setting the finalizer threads. This was missed previously. |
|
I ran a quick test in REST and it looks like the status never gets to Here is my request: After the job finishes (looking at the RM UI), the status is: If I keep requesting status it always returns this. |
|
Please disregard. I failed to deploy the Ambari changes correctly. |
|
Awesome! Thanks for the review and smoke test @nickwallen and @merrimanr. I am going to go ahead and merge this. |
…le writing (mmiklavc via mmiklavc) closes #1157
|
Closing as merged |

Contributor Comments
https://issues.apache.org/jira/browse/METRON-1732
This still needs to have the # of finalizer threads option exposed for the REST application, but since it's multi-threaded code I wanted to get the review process started while I finish that part up.
Test plan and more detailed description to follow.
Pull Request Checklist
For all changes:
For code changes:
Have you included steps to reproduce the behavior or problem that is being changed or addressed?
Have you included steps or a guide to how the change may be verified and tested manually?
Have you ensured that the full suite of tests and checks have been executed in the root metron folder via:
Have you written or updated unit tests and or integration tests to verify your changes?
n/a
If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?
For documentation related changes:
Have you ensured that format looks appropriate for the output in which it is rendered by building and verifying the site-book? If not then run the following commands and the verify changes via
site-book/target/site/index.html:Note:
Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
It is also recommended that travis-ci is set up for your personal repository such that your branches are built there before submitting a pull request.