
Conversation

@cestella (Member)

As it stands, the existing approach to handling PCAP data has trouble with high-volume packet capture. With the advent of a DPDK plugin for capturing packet data, we are going to hit limits on consumption throughput if we continue to try to push packet data into HBase at line speed.

Furthermore, storing PCAP data in HBase limits the range of filter queries we can perform to those expressible within the key. As of now, we require all fields to be present (source IP/port, destination IP/port, and protocol) rather than allowing any wildcards.

To address these issues, we should create a higher-performance topology which attaches the appropriate header to the raw packet and timestamp read from Kafka (as placed onto Kafka by the packet capture sensor) and appends this packet to a sequence file in HDFS. The sequence file will be rolled based on the number of packets or on time (e.g. 1 hour's worth of packets in a given sequence file).
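For illustration, a minimal sketch of what that write path could look like, assuming packets are keyed by a LongWritable timestamp with the raw bytes as a BytesWritable value. The class name, the thresholds, and the file-naming scheme below are hypothetical, not the actual Metron implementation:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;

/**
 * Hypothetical sketch of the write side: each packet pulled from Kafka is
 * appended to a SequenceFile, which is rolled once either a packet-count
 * or an elapsed-time threshold is hit.
 */
public class RollingPcapWriter {
  private final Configuration conf = new Configuration();
  private final long maxPackets;     // roll after this many packets...
  private final long maxIntervalMs;  // ...or after this much time
  private SequenceFile.Writer writer;
  private long packetsWritten = 0;
  private long fileOpenedAt = 0;

  public RollingPcapWriter(long maxPackets, long maxIntervalMs) {
    this.maxPackets = maxPackets;
    this.maxIntervalMs = maxIntervalMs;
  }

  public void write(long tsNanos, byte[] rawPacket) throws IOException {
    long now = System.currentTimeMillis();
    if (writer == null || packetsWritten >= maxPackets
        || now - fileOpenedAt >= maxIntervalMs) {
      roll(now);
    }
    writer.append(new LongWritable(tsNanos), new BytesWritable(rawPacket));
    packetsWritten++;
  }

  private void roll(long now) throws IOException {
    if (writer != null) {
      writer.close();
    }
    // Illustrative naming only; the real path/name convention may differ.
    Path path = new Path("/apps/metron/pcap/pcap_" + now + ".seq");
    writer = SequenceFile.createWriter(conf,
        SequenceFile.Writer.file(path),
        SequenceFile.Writer.keyClass(LongWritable.class),
        SequenceFile.Writer.valueClass(BytesWritable.class));
    packetsWritten = 0;
    fileOpenedAt = now;
  }
}
```

Rolling on both packet count and elapsed time bounds the size of any one file while still guaranteeing that a slow feed produces a new file every interval.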

On the query side, we should adjust the middle-tier service layer to start an MR job over the appropriate set of sequence files and filter for the matching packets. NOTE: the UI modifications to make this reasonable for the end-user will need to be done in a follow-on JIRA.
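To make the query side concrete, here is a hedged sketch of the kind of filtering mapper such an MR job could run. PcapFilterMapper and the pcap.filter.* configuration keys are invented for illustration, and the sketch assumes the stored bytes begin at the IPv4 header (the real on-disk layout may include link-layer framing). The point is that any field left blank acts as a wildcard, which the HBase key scheme could not express:

```java
import java.io.IOException;

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Mapper;

/** Hypothetical mapper: pass through only the packets matching the filter. */
public class PcapFilterMapper
    extends Mapper<LongWritable, BytesWritable, LongWritable, BytesWritable> {

  private String srcIp, dstIp;
  private int srcPort, dstPort;

  @Override
  protected void setup(Context ctx) {
    // Filter fields arrive via the job configuration; blank or -1 means
    // "wildcard" -- the flexibility the HBase key approach lacked.
    srcIp   = ctx.getConfiguration().get("pcap.filter.srcIp", "");
    dstIp   = ctx.getConfiguration().get("pcap.filter.dstIp", "");
    srcPort = ctx.getConfiguration().getInt("pcap.filter.srcPort", -1);
    dstPort = ctx.getConfiguration().getInt("pcap.filter.dstPort", -1);
  }

  @Override
  protected void map(LongWritable ts, BytesWritable value, Context ctx)
      throws IOException, InterruptedException {
    if (matches(value.copyBytes())) {
      ctx.write(ts, value);
    }
  }

  // Assumes the payload starts at the IPv4 header (no link-layer framing).
  private boolean matches(byte[] p) {
    int ihl = (p[0] & 0x0F) * 4;  // IPv4 header length in bytes
    String src = ipToString(p, 12);
    String dst = ipToString(p, 16);
    int sPort = ((p[ihl] & 0xFF) << 8) | (p[ihl + 1] & 0xFF);
    int dPort = ((p[ihl + 2] & 0xFF) << 8) | (p[ihl + 3] & 0xFF);
    return (srcIp.isEmpty() || srcIp.equals(src))
        && (dstIp.isEmpty() || dstIp.equals(dst))
        && (srcPort < 0 || srcPort == sPort)
        && (dstPort < 0 || dstPort == dPort);
  }

  private static String ipToString(byte[] p, int off) {
    return (p[off] & 0xFF) + "." + (p[off + 1] & 0xFF) + "."
         + (p[off + 2] & 0xFF) + "." + (p[off + 3] & 0xFF);
  }
}
```

The surviving (timestamp, packet) pairs would then be collected and reassembled by the service layer into the single PCAP payload that is returned to the caller.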

In order to test this PR, I would suggest doing the following as the "happy path":

  1. Install the pycapa library & utility via instructions here
  2. (if using singlenode vagrant) Kill the enrichment and sensor topologies via for i in bro enrichment yaf snort;do storm kill $i;done
  3. Start the pcap topology via /usr/metron/0.1BETA/bin/start_pcap_topology.sh
  4. Start the pycapa packet capture producer on eth1 via /usr/bin/pycapa --producer --topic pcap -i eth1 -k node1:6667
  5. Watch the topology in the Storm UI and kill the packet capture utility from before when the number of packets ingested is over 1k.
  6. Ensure that at least 2 files exist on HDFS by running hadoop fs -ls /apps/metron/pcap
  7. Choose a file (denoted by $FILE) and dump a few of the contents using the pcap_inspector utility via /usr/metron/0.1BETA/bin/pcap_inspector.sh -i $FILE -n 5
  8. Choose one of the lines and note the source IP/port and destination IP/port
  9. Go to the Kibana app at http://node1:5000 on the singlenode vagrant (YMMV on EC2) and enter that query in the Kibana PCAP panel.
  10. Wait patiently while the MR job completes; the results are sent back in the form of a valid PCAP payload suitable for opening in Wireshark.
  11. Open the payload in Wireshark to ensure it is valid.

If the payload is not valid PCAP, then please look at the job history and note the reason for the job failure, if any.

Also, please note the changes and additions to the documentation for the pcap service and pcap backend.

```yaml
command: storm jar {{ metron_directory }}/lib/{{ metron_parsers_jar_name }} org.apache.storm.flux.Flux --filter {{ metron_parsers_properties_config_path }} --remote {{ item }}
command: "{{ metron_directory }}/bin/start_parser_topology.sh {{ item }}"
with_items:
  - "{{ storm_parser_topologies }}"
```
Contributor

Everything worked well in EC2. Could you add an auto-start capability to the deployment? Perhaps just add pcap to the list of parser topologies?

Member Author

So, adding pcap to the list of parser topologies won't do it, because pcap has a special script (start_pcap_topology.sh) owing to its different config file (all of the parser topologies share the same config). Also, it's just a different sort of beast from a parser topology (i.e. we don't actually parse anything; we just take the raw data, slap on a header, and put it in HDFS).

That being said, what I think we need to do is start the pcap topology when pycapa is installed. I'll have to look into where and how to do that in ansible. If you have any thoughts or suggestions, I'd be all ears. ;)

Member Author

In retrospect, why don't we push this to a follow-on JIRA?

Contributor

I'm for that. It lets us have a bit of a think on it without holding this up.

@dlyle65535 (Contributor)

Since we're fixed on Java 8 after this, do you think it would make sense to get rid of all of the JVM parameters that cause these kinds of warnings during the Maven run:

```
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option UseSplitVerifier; support was removed in 8.0
```

@cestella (Member Author)

@dlyle65535 Definitely agreed, I'll submit a change this morning to remove the warnings.

@dlyle65535 (Contributor)

+1 on this, looks great!

Ran it up in EC2 with pycapa enabled (the default) after starting the topology; everything just worked.

@nickwallen (Contributor)

+1 Deployed successfully on EC2. All existing feeds worked out of the box. Followed the manual instructions to deploy the topology. Was able to successfully open and validate the pcap file produced by the Metron UI.
