This repository was archived by the owner on Aug 20, 2025. It is now read-only.

Conversation

@mmiklavc
Contributor


https://issues.apache.org/jira/browse/METRON-1641

This enables Pcap Jobs to be run asynchronously. The PcapJob class itself is now a Statusable implementation that allows clients to poll for current JobStatus. This implementation exposes the new functionality on the job class but keeps the existing PcapCli functionality intact and unchanged. The tests for this will be in a comment below, taken from #256.

This validation should check that the current pcap functionality does not break. Follow-on PRs will leverage the new asynchronous capabilities.

Pull Request Checklist

Thank you for submitting a contribution to Apache Metron.
Please refer to our Development Guidelines for the complete guide to follow for contributions.
Please refer also to our Build Verification Guidelines for complete smoke testing guides.

In order to streamline the review of the contribution, we ask that you follow these guidelines and double-check the following:

For all changes:

  • Is there a JIRA ticket associated with this PR? If not, one needs to be created at Metron Jira.
  • Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
  • Has your PR been rebased against the latest commit within the target branch (typically master)?

For code changes:

  • Have you included steps to reproduce the behavior or problem that is being changed or addressed?

  • Have you included steps or a guide to how the change may be verified and tested manually?

  • Have you ensured that the full suite of tests and checks have been executed in the root metron folder via:

    mvn -q clean integration-test install && dev-utilities/build-utils/verify_licenses.sh 
    
  • Have you written or updated unit tests and or integration tests to verify your changes?

  • Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?

For documentation related changes:

  • Have you ensured that the format looks appropriate for the output in which it is rendered by building and verifying the site-book? If not, then run the following commands and verify the changes via site-book/target/site/index.html:

    cd site-book
    mvn site
    

Note:

Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
It is also recommended that travis-ci is set up for your personal repository such that your branches are built there before submitting a pull request.

@mmiklavc
Contributor Author

Testing

Get PCAP data into Metron:

  1. Install and setup pycapa - look for "Install pycapa" here https://cwiki.apache.org/confluence/display/METRON/Metron+0.4.1+with+HDP+2.5+bare-metal+install+on+Centos+7+with+MariaDB+for+Metron+REST
  2. (if using singlenode vagrant) Kill the enrichment and sensor topologies via for i in bro enrichment yaf snort;do storm kill $i;done
  3. Start the pcap topology via $METRON_HOME/bin/start_pcap_topology.sh
  4. Start the pycapa packet capture producer on eth1 via /usr/bin/pycapa --producer --topic pcap -i eth1 -k node1:6667
  5. Watch the topology in the Storm UI and, once the number of packets ingested exceeds 3k, kill the packet capture utility started above.
  6. Ensure that at least 3 files exist on HDFS by running hadoop fs -ls /apps/metron/pcap
  7. Choose a file (denoted by $FILE) and dump a few of the contents using the pcap_inspector utility via $METRON_HOME/bin/pcap_inspector.sh -i $FILE -n 5
  8. Choose one of the lines and note the protocol.
  9. Note that when you run the commands below, the resulting files will be placed in the directory from which you kicked off the job.

Fixed filter

  1. Run a fixed filter query by executing the following command with the values noted above (match your start_time format to the date format provided - default is to use millis since epoch)
  2. $METRON_HOME/bin/pcap_query.sh fixed -st <start_time> -df "yyyyMMdd" -p <protocol_num> -rpf 500
  3. Verify the MR job finishes successfully. Upon completion, you should see multiple files named with relatively current datestamps in your current directory, e.g. pcap-data-20160617160549737+0000.pcap
  4. Copy the files to your local machine and verify you can open them in Wireshark. I chose a middle file and the last file. The middle file should have 500 records (per the records_per_file option), and the last one will likely have a number of records <= 500.
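The file-count arithmetic implied by the -rpf option can be sketched as below. This is a hypothetical helper, not Metron code; it just illustrates why every output file except possibly the last holds exactly rpf records.

```java
// Illustrative sketch of the -rpf (records_per_file) partitioning arithmetic.
// Hypothetical class and method names, not the actual Metron implementation.
public class RpfMath {
    // Total number of output files: ceiling division of records by rpf.
    static int fileCount(int totalRecords, int rpf) {
        return (totalRecords + rpf - 1) / rpf;
    }

    // Records in the final file: the remainder, or a full rpf if it divides evenly.
    static int lastFileRecords(int totalRecords, int rpf) {
        int rem = totalRecords % rpf;
        return rem == 0 ? rpf : rem;
    }

    public static void main(String[] args) {
        // e.g. 3,200 matched packets at -rpf 500 -> 7 files, the last with 200 records
        System.out.println(fileCount(3200, 500) + " files, last has "
                + lastFileRecords(3200, 500) + " records");
    }
}
```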

Query filter

  1. Run a Stellar query filter query by executing a command similar to the following, with the values noted above (match your start_time format to the date format provided - default is to use millis since epoch)
  2. $METRON_HOME/bin/pcap_query.sh query -st "20160617" -df "yyyyMMdd" -query "protocol == '6'" -rpf 500
  3. Verify the MR job finishes successfully. Upon completion, you should see multiple files named with relatively current datestamps in your current directory, e.g. pcap-data-20160617160549737+0000.pcap
  4. Copy the files to your local machine and verify you can open them in Wireshark. I chose a middle file and the last file. The middle file should have 500 records (per the records_per_file option), and the last one will likely have a number of records <= 500.

@cestella
Member

I like this! I especially like the statusable abstraction here. Good job; I'm +1 after the full-dev testing checkbox is checked and the small comment I had.

<module>metron-enrichment</module>
<module>metron-solr</module>
<module>metron-parsers</module>
<module>metron-job</module>
Member

can we adjust the indentation here?

Contributor Author

Heh, the pom.xml has tabs instead of spaces. Rather than reformat everything in the file, I just changed that line to use tabs.

@merrimanr
Contributor

Looks good! One thing I'm trying to wrap my head around is how we get status if we only have a job id or unique identifier for a job. JobStatus doesn't have an id, so I'm assuming resultPath is the unique identifier here.

As far as I can tell, an instance of org.apache.hadoop.mapreduce.Job is kept in memory and is responsible for reporting status. I can think of a couple of scenarios where this might be problematic.
One is if I ran a query from the CLI but then wanted to get status from REST. How would that work? That's probably not a likely use case, so maybe not an issue there. What happens if I submit a query through REST and REST is restarted while jobs are running? Do we lose job status information?

@cestella
Member

I think for a first cut, it's ok to have the restrictions that:

  • The REST API controls only the jobs it creates. Otherwise, we would need more refactoring in the CLI to drop the output in the same HDFS directory rather than it being user-specifiable and output locally. Ultimately, while they use the same mechanism, the UX is different between the two approaches (e.g. the CLI entirely cleans up after itself and outputs to the local directory, whereas the REST approach stores the results in HDFS until manual cleanup).
  • If a job is running while the REST API dies, we should consider that job a runaway: it needs to be killed by the admin or left to complete without the result being published. One thing we might consider doing is giving the job name a prefix of METRON_REST_PCAP so that, upon REST start, it can kill existing jobs. I think for THIS PR, we should just have REST pcap jobs carry that prefix and leave the actual killing to a follow-on PR.
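The naming convention proposed above could look something like the following sketch. The class, method names, and name format are hypothetical, not the actual Metron API; the point is that a well-known prefix lets a later cleanup pass identify REST-submitted pcap jobs.

```java
// Hypothetical sketch of the METRON_REST_PCAP job-naming convention.
public class PcapJobNames {
    static final String REST_PREFIX = "METRON_REST_PCAP";

    // Build a job name like METRON_REST_PCAP_<submit-time-millis> (assumed format).
    static String restJobName(long submitTimeMillis) {
        return REST_PREFIX + "_" + submitTimeMillis;
    }

    // A restart-time sweep could kill any running job whose name matches this check.
    static boolean isRestPcapJob(String jobName) {
        return jobName != null && jobName.startsWith(REST_PREFIX);
    }

    public static void main(String[] args) {
        String name = restJobName(1531324800000L);
        System.out.println(name + " is REST pcap job: " + isRestPcapJob(name));
    }
}
```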

@cestella
Member

One thing I didn't see: can we make sure we pass along the YARN queue to the job?

@cestella
Member

One more comment about restartability: I think we could potentially support this with this architecture in the future. You can recover the Job object from MR via the JobClient:

Configuration conf = new Configuration();
JobClient jobClient = new JobClient(new JobConf(conf)); // deprecation WARN
JobID jobID = JobID.forName("job_201107011451_0001");   // deprecation WARN
RunningJob runningJob = jobClient.getJob(jobID);

We could look for jobs which are completed but not in the HDFS structure and recover them on REST start. I would suggest doing that as a follow-on though.

@merrimanr
Contributor

Perfect. That addresses my concern. Doing this in a follow-on is fine since it's not necessary when using the CLI.

@merrimanr
Contributor

I spun this up in full dev, ran a fixed query, and got this error:

Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: file://./, expected: hdfs://node1:8020
	at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:666)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:214)
	at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1181)
	at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1177)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1195)
	at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1169)
	at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1925)
	at org.apache.metron.common.utils.HDFSUtils.write(HDFSUtils.java:71)
	at org.apache.metron.pcap.writer.ResultsWriter.write(ResultsWriter.java:38)
	at org.apache.metron.pcap.mr.PcapJob.writeResults(PcapJob.java:270)
	at org.apache.metron.pcap.query.PcapCli.run(PcapCli.java:155)
	at org.apache.metron.pcap.query.PcapCli.main(PcapCli.java:52)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:148)

Looks like the MR jobs succeeded but partitioning the files to the local FS did not work.

@mmiklavc
Contributor Author

Good catch on the FS, @merrimanr - I'm also finding that via manual testing. I believe I have a workaround that degrades nicely to the configuration default and also allows you to pass the scheme in the path.

FileSystem fs = FileSystem.newInstance(outPath.toUri(), config);
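A plain-JDK sketch of why this workaround helps: the CLI's local output path carries a "file" scheme, while the configured default filesystem is hdfs://node1:8020, so resolving the filesystem from the path's own URI (as FileSystem.newInstance(outPath.toUri(), config) does) selects the local FS for local paths and falls back to the default only when no scheme is given. The helper below is illustrative, not Metron code.

```java
import java.net.URI;

// Illustrative scheme resolution: mimic Hadoop's behavior of using the path's
// URI scheme when present and falling back to the configured default otherwise.
public class SchemeDemo {
    static String schemeOrDefault(String path, String defaultScheme) {
        String scheme = URI.create(path).getScheme();
        return scheme != null ? scheme : defaultScheme;
    }

    public static void main(String[] args) {
        System.out.println(schemeOrDefault("file://./", "hdfs"));             // file
        System.out.println(schemeOrDefault("hdfs://node1:8020/out", "hdfs")); // hdfs
        System.out.println(schemeOrDefault("/apps/metron/pcap", "hdfs"));     // hdfs
    }
}
```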

@mmiklavc
Contributor Author

Also, @merrimanr, agreed about the points you made regarding restartability, etc. in the long run. In the short term, @cestella has done a rather good job of summarizing my thoughts on a v1 pass at this feature set.

I will:

  • Add the Job ID to the returned status (I was going to add that when I do the job manager follow-on to this PR, but I'll just do it here)
  • Add the ability to pass a job name, i.e. job.setJobName(name) - I'll handle the actual job naming in the pcapservice and pass that as a parameter. I think that's a natural place for that logic.
  • Add the ability to pass the queue name

How's that sound to you both?

@mmiklavc
Contributor Author

Heads up: the Hadoop Configuration class is where you set the queue name, IIRC. We already pass that in as an arg, so this would simply need to be provided via the calling job manager class: config.set("mapreduce.job.queuename", "somequeue");

@merrimanr
Contributor

That sounds good to me.

@merrimanr
Contributor

I tested this and it's working for me in full dev. I think it's good enough to go into the feature branch. +1

@merrimanr
Contributor

Can we merge this? Any other items you would like addressed, @cestella?

@cestella
Member

+1, lgtm

@cestella
Member

@mmiklavc can you merge and close this PR?

asfgit pushed a commit that referenced this pull request Jul 11, 2018
@mmiklavc
Contributor Author

Committed to the feature branch.
