
Conversation

@mmiklavc (Contributor) commented Aug 16, 2019

Contributor Comments

https://issues.apache.org/jira/browse/METRON-2217

UPDATE - Test plan has been added below. I ran through this plan and variations of it many times over as I went through testing.

I want to get this in front of people to start reviewing ASAP. It's going to take me a couple of days to work through a reasonable test plan for this, but that should not hold up reviewing the approach. Of note, HBase's change from HTableInterface to Table has shifted the burden of connection management from the HTable implementation to the end user/client. We previously had very few, if any, hooks to close our HBase tables or attempt to clean up resources. In response to this change, there were a couple of options for dealing with it:

  1. Completely rewrite our HBase client logic to fully manage connection lifecycle
  2. Isolate the connection management change to the existing HTableProvider implementation used throughout Metron and make a smaller, incremental change to set us up for the eventual upgrade to HBase 2.x.

This PR takes the approach in option 2. The biggest question surrounding this approach is whether the connection management changes introduced in the TableProvider are sufficient, or whether we need to immediately take a more robust connection pooling approach to dealing with HBase connections. I spent some time looking at the current HTable implementation that we depend on. Every time an HTable is created, the underlying code makes a call to an internal connection manager. It's unclear to me what the connection management contract is for the user in this case, e.g. stale connection cleanup, connection retries, connection pooling, etc. This is probably the riskiest part of this change. The way I'm handling this is to:

  1. Make the connections thread safe through use of a ThreadLocal connection variable. There are issues with instantiating a ThreadLocal variable by default in code that will be serialized by Storm. ThreadLocal is not serializable. In order to get around a massive API rewrite that would add initialization similar to our other startup hooks, e.g. StellarFunctions.initialize(), I opted for an approach that would allow us to get a similar effect via lazy initialization.
  2. Provide some basic connection retry logic that will initiate a new connection if the thread's current connection happens to have closed for whatever reason.

The combination of these two changes provides a reasonably robust way to handle connections without boiling the ocean, while the per-thread connection re-use should limit the overall number of connections we keep open to HBase in a reasonable and reliable way.
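
To make that concrete, here is a rough sketch of the lazy-initialization idea, written against the HBase 1.x client API (ConnectionFactory/Connection/Table). It is illustrative only: the real PR wraps the connection in a RetryingConnection, and the signatures shown here are approximate.

import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;

public class HTableProvider {

  // One connection per (Configuration, thread). The map is static and the ThreadLocal values
  // are created lazily on the Storm worker, never on the topology submitter, which sidesteps
  // the fact that ThreadLocal is not serializable.
  private static Map<Configuration, ThreadLocal<Connection>> connMap = new ConcurrentHashMap<>();

  public Table getTable(Configuration config, String tableName) throws IOException {
    return getConnection(config).getTable(TableName.valueOf(tableName));
  }

  private Connection getConnection(Configuration config) throws IOException {
    ThreadLocal<Connection> threadLocal = connMap.computeIfAbsent(config, c -> new ThreadLocal<>());
    Connection conn = threadLocal.get();
    // Basic retry: if this thread has no connection yet, or its connection has been
    // closed/aborted for whatever reason, open a fresh one and cache it for the thread.
    if (conn == null || conn.isClosed() || conn.isAborted()) {
      conn = ConnectionFactory.createConnection(config);
      threadLocal.set(conn);
    }
    return conn;
  }
}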

I'd like to connect with some HBase committers or just hop on their dev list and ask what the implications of leaving connections alone would be. Again, we don't really do anything to manage creating/closing connections currently, and afaict we have never had any reports of leaking connections. This doesn't mean we won't now, but it's worth noting what the status quo has been for a while now.

Note: This is a backwards-compatible change.

Pull Request Checklist

Thank you for submitting a contribution to Apache Metron.
Please refer to our Development Guidelines for the complete guide to follow for contributions.
Please refer also to our Build Verification Guidelines for complete smoke testing guides.

In order to streamline the review of the contribution we ask you follow these guidelines and ask you to double check the following:

For all changes:

  • Is there a JIRA ticket associated with this PR? If not, one needs to be created at Metron Jira.
  • Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
  • Has your PR been rebased against the latest commit within the target branch (typically master)?

For code changes:

  • Have you included steps to reproduce the behavior or problem that is being changed or addressed?

  • Have you included steps or a guide to how the change may be verified and tested manually?

  • Have you ensured that the full suite of tests and checks has been executed in the root metron folder via:

    mvn -q clean integration-test install && dev-utilities/build-utils/verify_licenses.sh 
    
  • Have you written or updated unit tests and or integration tests to verify your changes?

  • n/a If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?

  • Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?

For documentation related changes:

  • n/a Have you ensured that the format looks appropriate for the output in which it is rendered by building and verifying the site-book? If not, then run the following commands and verify the changes via site-book/target/site/index.html:

    cd site-book
    mvn site
    
  • n/a Have you ensured that any documentation diagrams have been updated, along with their source files, using draw.io? See Metron Development Guidelines for instructions.

Note:

Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
It is also recommended that travis-ci is set up for your personal repository such that your branches are built there before submitting a pull request.

return getConnection(config).getTable(TableName.valueOf(tableName));
}

private Connection getConnection(Configuration config) throws IOException {

Will we use different Configurations here?

* the interface everywhere we touch HBase, we can use a lazy initialization scheme to encapsulate
* this within the HTableProvider. This is a sort of poor man's connection pool.
*/
private static Map<Configuration, ThreadLocal<RetryingConnection>> connMap = new ConcurrentHashMap<>();

And actually, Connection is thread safe, so you only need to cache Tables. And since the Table is obtained from the Connection, which means it will not make new TCP connections to the HBase cluster, maybe we do not even need to cache the Tables, as it is only some in-memory operations.


Plus you are going to leak HBaseConnections here, as they never get cleaned up.

@mmiklavc (Contributor, Author)

Linking this thread from the Apache HBase dev list - https://lists.apache.org/thread.html/6b83cd7548efb8c37899063affc97e4c5ce823a13359a49b477e3c07@%3Cdev.hbase.apache.org%3E

Thanks @Apache9 for the feedback and assistance here - greatly appreciated!

@tigerquoll (Contributor) left a comment

HBase Connections are thread-safe objects. Setting up an HBase Connection is pretty resource intensive. If you are authenticating with a static identity that is not going to change (i.e. a service account), and you are constantly interacting with HBase, then I believe it is recommended that you keep the HBase Connection object around in a cache or context, and reuse it to pull out Table and Mutation objects.


@mmiklavc (Contributor, Author)

> HBase Connections are thread-safe objects. Setting up an HBase Connection is pretty resource intensive. If you are authenticating with a static identity that is not going to change (i.e. a service account), and you are constantly interacting with HBase, then I believe it is recommended that you keep the HBase Connection object around in a cache or context, and reuse it to pull out Table and Mutation objects.

Yes, that is what we're doing here. The connection model differs subtly from a standard request/response lifecycle web app in that we don't need to open/close the connections. They are bound to the process on startup for the duration of the program's execution. There should be no instance where we are left with a leaked connection because we're only opening a new one when the given thread does not already have an open connection. This generally only happens during application initialization/startup for us.
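
In HBase client terms, the pattern being described looks roughly like this (a sketch for illustration only, not code from this PR; the table name and lookup details are placeholders):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;

public class ConnectionReuseSketch {

  // Heavyweight, thread-safe, expensive to create: open once and keep it for the
  // life of the process (or per thread, as this PR does).
  private static Connection connection;

  public static byte[] lookup(byte[] rowKey, byte[] cf, byte[] qualifier) throws Exception {
    if (connection == null || connection.isClosed()) {
      connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
    }
    // Table handles are lightweight and come from the shared Connection; close the
    // Table after each use, but leave the Connection open.
    try (Table table = connection.getTable(TableName.valueOf("enrichment"))) {
      Result result = table.get(new Get(rowKey));
      return result.getValue(cf, qualifier);
    }
  }
}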

@mmiklavc (Contributor, Author) commented Aug 27, 2019

Test Plan

Enrichments

This will cover enrichments, threat intel, and the bulk loading utilities that write data to HBase

Test basic enrichment

Spin up full dev

Optional - free up resources. We're going to be spinning up some additional topologies. The resources in full dev are limited, so you'll probably want to stop non-critical topologies in order to have enough Storm slots.

for parser in bro__snort__yaf profiler pcap batch_indexing; do storm kill $parser; done

Follow the [updated] blog series steps here to get some data into Metron using Squid along with an enrichment:

  1. https://cwiki.apache.org/confluence/display/METRON/2016/04/25/Metron+Tutorial+-+Fundamentals+Part+1%3A+Creating+a+New+Telemetry
  2. https://cwiki.apache.org/confluence/display/METRON/2016/04/28/Metron+Tutorial+-+Fundamentals+Part+2%3A+Creating+a+New+Enrichment

Test threat intel

  1. https://cwiki.apache.org/confluence/display/METRON/2016/05/02/Metron+Tutorial+-+Fundamentals+Part+4%3A+Pluggable+Threat+Intelligence

Test multi-threading

For the final step, we'll deviate from the blog a bit so we can test that the thread pool doesn't cause any deadlocking/threading issues with the new HBase connection approach. Taken from https://cwiki.apache.org/confluence/display/METRON/2016/06/16/Metron+Tutorial+-+Fundamentals+Part+6%3A+Streaming+Enrichment. Follow the steps in the blog tutorial for setting up the user streaming enrichment, but instead of modifying/using bro as suggested at the end, follow the instructions below.

Let's load the original whois list from step 1 as threat intel for added fun. This way we can run multiple enrichments and also have it trigger threat intel from the same messages. Create a file blocklist2.csv with the following contents:

[root@node1: ~]
# cat blocklist2.csv
aliexpress.com,squidblacklist.org
pravda.ru,squidblacklist.org
google.com,squidblacklist.org
brightsideofthesun.com,squidblacklist.org
microsoftstore.com,squidblacklist.org
autonews.com,squidblacklist.org
facebook.com,squidblacklist.org
ebay.com,squidblacklist.org
recruit.jp,squidblacklist.org
lada.ru,squidblacklist.org
aliexpress.com,squidblacklist.org

Load the threat intel into HBase
${METRON_HOME}/bin/flatfile_loader.sh -i blocklist2.csv -t threatintel -c t -e threatintel_extractor_config.json
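
The extractor config passed via -e is not reproduced in this plan. One that matches the two-column CSV above and the squidBlacklist type referenced later in squid.json would look roughly like the following (patterned after the flatfile_loader CSV extractor format from the Part 4 tutorial, so treat the exact contents as an assumption):

{
  "config" : {
    "columns" : {
      "domain" : 0,
      "source" : 1
    },
    "indicator_column" : "domain",
    "type" : "squidBlacklist",
    "separator" : ","
  },
  "extractor" : "CSV"
}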

Clear the squid logs

rm /var/log/squid/access.log
touch /var/log/squid/access.log
chown squid:squid /var/log/squid/access.log
service squid restart

Re-run new squid client commands similar to step 1. Rather than a fraction of the records matching on domain for the whois enrichment, we'll have them all match for this test.

squidclient "https://www.google.com/maps/place/Waterford,+WI/@42.7639877,-88.2867248,12z/data=!4m5!3m4!1s0x88059e67de9a3861:0x2d24f51aad34c80b!8m2!3d42.7630722!4d-88.2142563"
squidclient "http://www.help.1and1.co.uk/domains-c40986/transfer-domains-c79878"
squidclient "https://community.cisco.com/t5/technology-and-support/ct-p/technology-support"
squidclient "https://www.capitalone.com/support-center"
squidclient "https://www.cnn.com/about"
squidclient "https://contact.nba.com/"
squidclient "https://www.espn.com/nfl/team/_/name/cle/cleveland-browns"

Update your squid.json enrichment to include Stellar enrichments. We're going to duplicate the whois enrichment multiple times for the sake of simplicity.

# cat $METRON_HOME/config/zookeeper/enrichments/squid.json
{
  "enrichment" : {
    "fieldMap" : {
      "hbaseEnrichment" : [ "domain_without_subdomains" ],
      "stellar" : {
       "config" : {
         "e1" : {
           "user" : "ENRICHMENT_GET('user', ip_src_addr, 'enrichment', 't')"
         },
         "e2" : {
           "dws1" : "ENRICHMENT_GET('whois', domain_without_subdomains, 'enrichment', 't')"
         },
         "e3" : {
           "dws2" : "ENRICHMENT_GET('whois', domain_without_subdomains, 'enrichment', 't')"
         },
         "e4" : {
           "dws3" : "ENRICHMENT_GET('whois', domain_without_subdomains, 'enrichment', 't')"
         },
         "e5" : {
           "dws4" : "ENRICHMENT_GET('whois', domain_without_subdomains, 'enrichment', 't')"
         },
         "e6" : {
           "dws5" : "ENRICHMENT_GET('whois', domain_without_subdomains, 'enrichment', 't')"
         }
       }
     }
    },
    "fieldToTypeMap" : {
      "domain_without_subdomains" : [ "whois" ]
    },
    "config" : { }
  },
  "threatIntel" : {
    "fieldMap" : {
      "hbaseThreatIntel" : [ "domain_without_subdomains" ]
    },
    "fieldToTypeMap" : {
      "domain_without_subdomains" : [ "squidBlacklist" ]
    },
    "config" : { },
    "triageConfig" : {
      "riskLevelRules" : [ ],
      "aggregator" : "MAX",
      "aggregationConfig" : { }
    }
  },
  "configuration" : { }
}

Load the changed enrichment

${METRON_HOME}/bin/zk_load_configs.sh -m PUSH -z $ZOOKEEPER -i ${METRON_HOME}/config/zookeeper
# verify it loaded
${METRON_HOME}/bin/zk_load_configs.sh -m DUMP -z $ZOOKEEPER -c ENRICHMENT -n squid

Wipe your squid indexes in ES

curl -XDELETE "http://node1:9200/squid*"

Stop the enrichment topology

In Ambari, navigate to Metron > Configs > Enrichment. Make the following config adjustments:

  1. Set Unified Enrichment Parallelism to 3
  2. Set Unified Threat Intel Parallelism to 3
  3. Set Unified Enrichment Cache Size to 0 (force cache misses so we hit HBase)
  4. Set Unified Threat Intel Cache Size to 0 (force cache misses so we hit HBase)
  5. Set Unified Enrichment Thread Pool Size to 5.

Restart the enrichment topology. You should see a log message in the storm worker logs similar to the following:

2019-08-26 17:52:40.162 o.a.m.e.b.UnifiedEnrichmentBolt Thread-8-threatIntelBolt-executor[7 7] [INFO] Creating new threadpool of size 5

Import the squid access data to Kafka. Run it multiple times by running the following:

for i in {1..30}; do cat /var/log/squid/access.log | ${HDP_HOME}/kafka-broker/bin/kafka-console-producer.sh --broker-list $BROKERLIST --topic squid; done

After a bit of time, you should see new records in the squid index that have the new enrichment and threat intel fields (note the fields dws1 through dws5). You should get 210 records in your squid index, assuming you set up your squid access log with 7 records during the earlier squidclient setup.

{
"_index": "squid_index_2019.08.24.00",
"_type": "squid_doc",
"_id": "AWzBEZ7MZrHsl7xo6X-6",
"_version": 1,
"_score": 1,
"_source": {
"enrichments:hbaseEnrichment:domain_without_subdomains:whois:owner": "ESPN, Inc.",
"full_hostname": "www.espn.com",
"dws1:home_country": "US",
"dws1:domain": "espn.com",
"dws2:domain": "espn.com",
"dws3:home_country": "US",
"dws1:domain_created_timestamp": "781268400000",
"enrichments:hbaseEnrichment:domain_without_subdomains:whois:home_country": "US",
"enrichments:hbaseEnrichment:domain_without_subdomains:whois:domain_created_timestamp": "781268400000",
"dws5:home_country": "US",
"parallelenricher:enrich:end:ts": "1566607252930",
"adapter:threatinteladapter:end:ts": "1566607252930",
"original_string": "1566604971.782 732 127.0.0.1 TCP_MISS/200 331562 GET https://www.espn.com/nfl/team/_/name/cle/cleveland-browns - DIRECT/54.152.255.68 text/html",
"dws3:registrar": "ESPN, Inc.",
"dws4:owner": "ESPN, Inc.",
"action": "TCP_MISS",
"dws4:domain": "espn.com",
"dws5:domain": "espn.com",
"dws3:domain": "espn.com",
"enrichments:hbaseEnrichment:domain_without_subdomains:whois:registrar": "ESPN, Inc.",
"dws5:domain_created_timestamp": "781268400000",
"method": "GET",
"parallelenricher:enrich:begin:ts": "1566607252928",
"user:user": "mmiklavcic",
"adapter:simplehbaseadapter:end:ts": "1566607252925",
"dws3:domain_created_timestamp": "781268400000",
"dws2:domain_created_timestamp": "781268400000",
"user:timestamp": 1566598784187,
"dws2:registrar": "ESPN, Inc.",
"user:source:type": "user",
"dws4:domain_created_timestamp": "781268400000",
"adapter:threatinteladapter:begin:ts": "1566607252928",
"guid": "919b421a-b2ec-4e82-951e-3ee031c5a394",
"dws3:owner": "ESPN, Inc.",
"dws2:owner": "ESPN, Inc.",
"code": 200,
"adapter:stellaradapter:end:ts": "1566607252922",
"enrichments:hbaseEnrichment:domain_without_subdomains:whois:domain": "espn.com",
"dws2:home_country": "US",
"dws4:home_country": "US",
"dws1:registrar": "ESPN, Inc.",
"elapsed": 732,
"source:type": "squid",
"ip_dst_addr": "54.152.255.68",
"dws5:registrar": "ESPN, Inc.",
"domain_without_subdomains": "espn.com",
"ip_src_addr": "127.0.0.1",
"timestamp": 1566604971782,
"adapter:stellaradapter:begin:ts": "1566607252906",
"url": "https://www.espn.com/nfl/team/_/name/cle/cleveland-browns",
"dws1:owner": "ESPN, Inc.",
"parallelenricher:splitter:begin:ts": "1566607252928",
"dws5:owner": "ESPN, Inc.",
"user:guid": "d8fb60b7-1670-4f96-a413-cb185afbe0de",
"bytes": 331562,
"parallelenricher:splitter:end:ts": "1566607252928",
"user:original_string": "mmiklavcic,127.0.0.1",
"dws4:registrar": "ESPN, Inc.",
"adapter:simplehbaseadapter:begin:ts": "1566607252906"
}
}
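
A quick way to sanity-check the expected record count, assuming Elasticsearch is reachable at node1:9200 as in the other commands in this plan:

curl -XGET "http://node1:9200/squid*/_count?pretty"

The count field in the response should read 210 once all of the imported messages have been enriched and indexed.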

Test recoverability with HBase down

Now, again clear your squid index.

curl -XDELETE "http://node1:9200/squid*"

Stop HBase and wait a few moments. Import the squid data again:

cat /var/log/squid/access.log | ${HDP_HOME}/kafka-broker/bin/kafka-console-producer.sh --broker-list $BROKERLIST --topic squid

Wait about a minute and check your squid index. You should not see any new data in the index. Now, restart HBase again in Ambari. After HBase has restarted, check the squid index. After some amount of time, the data should be able to flow through enrichments and make it to the squid index.

After completing the above steps you should not see any HBase exceptions or errors in the enrichment logs.

Profiler

Stop the profiler. In Ambari, set the profiler period duration to 1 minute via the Profiler config section.
Adjust $METRON_HOME/config/zookeeper/global.json to set the matching client period duration:

vim ${METRON_HOME}/config/zookeeper/global.json
"profiler.client.period.duration" : "1",
"profiler.client.period.duration.units" : "MINUTES",

Create $METRON_HOME/config/zookeeper/profiler.json and save the following contents:

{
  "profiles": [
    {
      "profile": "hello-world",
      "onlyif":  "exists(ip_dst_addr)",
      "foreach": "ip_dst_addr",
      "init":    { "count": "0" },
      "update":  { "count": "count + 1" },
      "result":  "count"
    }
  ]
}

Modify ${METRON_HOME}/config/zookeeper/enrichments/squid.json so it pulls values from the profiler. Update our previous example to add the following Stellar enrichment "e7":

         "e6" : {
           "dws5" : "ENRICHMENT_GET('whois', domain_without_subdomains, 'enrichment', 't')"
         },
         "e7" : {
           "profile_for_ip_dst_addr" : "PROFILE_GET( 'hello-world', ip_dst_addr, PROFILE_FIXED(2, 'MINUTES'))"
         }

Push your changes to Zookeeper

${METRON_HOME}/bin/zk_load_configs.sh -m PUSH -z $ZOOKEEPER -i ${METRON_HOME}/config/zookeeper

Restart the profiler again.

Clear your squid data

curl -XDELETE "http://node1:9200/squid*"

And publish some squid data to the squid topic for roughly 500 seconds. This is a somewhat arbitrary choice, but we want to give the profiles enough time to flush in order for the enrichments to start picking up the profile data from HBase.

for i in {1..100}; do cat /var/log/squid/access.log | ${HDP_HOME}/kafka-broker/bin/kafka-console-producer.sh --broker-list $BROKERLIST --topic squid; sleep 5; done

Once this process completes, you should note the following:

  1. No errors/exceptions in the profiler or enrichment Storm logs
  2. 700 records get written to the Squid index in ES
  3. You should see many records (not all, especially the early ones) written with non-empty values for the field profile_for_ip_dst_addr, e.g.
    curl -XGET "http://node1:9200/squid*/_search?size=700&pretty=true" | grep -A 2 profile_for_ip_dst_addr
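
As an additional spot check that the profile itself was written to HBase, the Stellar REPL can read it back directly (a sketch; the IP below is one of the destination addresses from the sample squid data, and any other ip_dst_addr seen in the index can be substituted):

# open the Stellar REPL, assuming $ZOOKEEPER is set (e.g. node1:2181)
${METRON_HOME}/bin/stellar -z $ZOOKEEPER

# inside the REPL: this should return a non-empty list of counts once the profile has flushed
PROFILE_GET('hello-world', '54.152.255.68', PROFILE_FIXED(30, 'MINUTES'))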
    

@tigerquoll (Contributor) commented Aug 27, 2019

Metron tutorial needs updating for the new Kafka client tools. Step 5 needs to be updated to read from Kafka via:
${HDP_HOME}/kafka-broker/bin/kafka-console-consumer.sh --bootstrap-server $BROKERLIST --topic squid --from-beginning

@tigerquoll (Contributor)

Metron tutorial step 5: the global.json addition to support squid needs a comma after the last square bracket.

@tigerquoll (Contributor)

Metron tutorial Step 5: no spare workers available to run the squid topology. Need to add instructions on adding another supervisor slot port.

@tigerquoll (Contributor)

Metron tutorial part 2: to "Check if Zookeeper enrichment tag was properly populated", the command should be changed to
${METRON_HOME}/bin/zk_load_configs.sh -m DUMP -z $ZOOKEEPER -c ENRICHMENT -n squid

@tigerquoll (Contributor)

Metron tutorial part 2 should have instructions to retrigger the squid data feed, and directions on how to drive the REST interface to prove data is being enriched.

@tigerquoll (Contributor)

Threat intel tutorial: dropping the Elasticsearch indexes requires sudo, as they're owned by root.

@tigerquoll (Contributor)

What commands are you using to query the ES index?

@mmiklavc (Contributor, Author)

@tigerquoll thanks for the test script feedback! Ultimately, I think it would be nice to get the blog entries added to our use-cases directory where we can track and maintain them more easily. This is a good first step.

> Metron tutorial needs updating for the new Kafka client tools. Step 5 needs to be updated to read from Kafka via:
> ${HDP_HOME}/kafka-broker/bin/kafka-console-consumer.sh --bootstrap-server $BROKERLIST --topic squid --from-beginning

Updated

> Metron tutorial step 5: the global.json addition to support squid needs a comma after the last square bracket.

Not sure what you mean here - this is a snippet to use as guidance to modify the global.json, not replace it.

> Metron tutorial Step 5: no spare workers available to run the squid topology. Need to add instructions on adding another supervisor slot port.

I added instructions to clean up resources. I'm going to add the same to my test script here.

> Metron tutorial part 2: to "Check if Zookeeper enrichment tag was properly populated", the command should be changed to
> ${METRON_HOME}/bin/zk_load_configs.sh -m DUMP -z $ZOOKEEPER -c ENRICHMENT -n squid

Fixed

> Metron tutorial part 2 should have instructions to retrigger the squid data feed, and directions on how to drive the REST interface to prove data is being enriched.

That tutorial references back to part 1 for instructions, but I added an extra line with the command detail. The Head plugin can be used for searching the ES indexes in this context.

> Threat intel tutorial: dropping the Elasticsearch indexes requires sudo, as they're owned by root.

This series of tutorials assumes full dev - you should be running as root by default in that environment.

> What commands are you using to query the ES index?

Head browser plugin. From the first blog.

@mmiklavc (Contributor, Author)

@tigerquoll - I needed to update my enrichment config - I used Stellar expression groups and forgot to update the enrichment config in my test script accordingly.

@mmiklavc (Contributor, Author)

I just merged this PR with master, which includes the latest Maven changes from #1436. This PR (1483) doesn't include any changes to Maven dependencies. I reviewed the test plan on 1436 and it looks like we may have missed a test case for that PR, which introduces a regression for streaming enrichments. I'm logging a ticket against master.

@mmiklavc (Contributor, Author)

I was getting classpath issues with my PR that now seem to be fixed with #1498. That PR should get merged sometime tomorrow, barring any discussion to the contrary. In the meantime, I've merged those changes into this PR for the sake of testing. I ran through the major parts of the test plan one more time and everything works as expected. I did not repeat the failure recovery scenario.

For reference, this is another exception I ran into while running the basic parser test. Again, this issue appears to have been resolved with the changes in PR #1498:

2019-08-27 20:38:55.531 o.a.s.d.executor Thread-12-parserBolt-executor[5 5] [ERROR]
java.lang.IllegalStateException: Unable to process transformation: DOMAIN_REMOVE_SUBDOMAINS(full_hostname) for domain_without_subdomains because Unable to parse: DOMAIN_REMOVE_SUBDOMAINS(full_hostname) due to: EXACT with relevant variables full_hostname=www.aliexpress.com
        at org.apache.metron.common.field.transformation.StellarTransformation.map(StellarTransformation.java:73) ~[stormjar.jar:?]
        at org.apache.metron.common.configuration.FieldTransformer.transform(FieldTransformer.java:139) ~[stormjar.jar:?]
        at org.apache.metron.common.configuration.FieldTransformer.transformAndUpdate(FieldTransformer.java:159) ~[stormjar.jar:?]
        at org.apache.metron.parsers.ParserRunnerImpl.applyFieldTransformations(ParserRunnerImpl.java:299) ~[stormjar.jar:?]
        at org.apache.metron.parsers.ParserRunnerImpl.processMessage(ParserRunnerImpl.java:254) ~[stormjar.jar:?]
        at org.apache.metron.parsers.ParserRunnerImpl.lambda$execute$0(ParserRunnerImpl.java:151) ~[stormjar.jar:?]
        at java.util.ArrayList.forEach(ArrayList.java:1249) ~[?:1.8.0_112]
        at org.apache.metron.parsers.ParserRunnerImpl.execute(ParserRunnerImpl.java:150) ~[stormjar.jar:?]
        at org.apache.metron.parsers.bolt.ParserBolt.execute(ParserBolt.java:257) [stormjar.jar:?]
        at org.apache.storm.daemon.executor$fn__10252$tuple_action_fn__10254.invoke(executor.clj:735) [storm-core-1.1.0.2.6.5.1175-1.jar:1.1.0.2.6.5.1175-1]
        at org.apache.storm.daemon.executor$mk_task_receiver$fn__10171.invoke(executor.clj:466) [storm-core-1.1.0.2.6.5.1175-1.jar:1.1.0.2.6.5.1175-1]
        at org.apache.storm.disruptor$clojure_handler$reify__9685.onEvent(disruptor.clj:40) [storm-core-1.1.0.2.6.5.1175-1.jar:1.1.0.2.6.5.1175-1]
        at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:472) [storm-core-1.1.0.2.6.5.1175-1.jar:1.1.0.2.6.5.1175-1]
        at org.apache.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:451) [storm-core-1.1.0.2.6.5.1175-1.jar:1.1.0.2.6.5.1175-1]
        at org.apache.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:73) [storm-core-1.1.0.2.6.5.1175-1.jar:1.1.0.2.6.5.1175-1]
        at org.apache.storm.daemon.executor$fn__10252$fn__10265$fn__10320.invoke(executor.clj:855) [storm-core-1.1.0.2.6.5.1175-1.jar:1.1.0.2.6.5.1175-1]
        at org.apache.storm.util$async_loop$fn__553.invoke(util.clj:484) [storm-core-1.1.0.2.6.5.1175-1.jar:1.1.0.2.6.5.1175-1]
        at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
Caused by: org.apache.metron.stellar.dsl.ParseException: Unable to parse: DOMAIN_REMOVE_SUBDOMAINS(full_hostname) due to: EXACT with relevant variables full_hostname=www.aliexpress.com
        at org.apache.metron.stellar.common.BaseStellarProcessor.createException(BaseStellarProcessor.java:173) ~[dep-stellar-common-0.7.2-uber-6958fda1-1cb6-4e03-80ba-c5171221d03a.jar.1566935872000:?]
        at org.apache.metron.stellar.common.BaseStellarProcessor.parse(BaseStellarProcessor.java:154) ~[dep-stellar-common-0.7.2-uber-6958fda1-1cb6-4e03-80ba-c5171221d03a.jar.1566935872000:?]
        at org.apache.metron.stellar.common.CachingStellarProcessor.parseUncached(CachingStellarProcessor.java:161) ~[dep-stellar-common-0.7.2-uber-6958fda1-1cb6-4e03-80ba-c5171221d03a.jar.1566935872000:?]
        at org.apache.metron.stellar.common.CachingStellarProcessor.parse(CachingStellarProcessor.java:155) ~[dep-stellar-common-0.7.2-uber-6958fda1-1cb6-4e03-80ba-c5171221d03a.jar.1566935872000:?]
        at org.apache.metron.common.field.transformation.StellarTransformation.map(StellarTransformation.java:50) ~[stormjar.jar:?]
        ... 18 more
Caused by: java.lang.NoSuchFieldError: EXACT
        at org.apache.metron.guava.17.0.net.InternetDomainName.findPublicSuffix(InternetDomainName.java:173) ~[dep-stellar-common-0.7.2-uber-6958fda1-1cb6-4e03-80ba-c5171221d03a.jar.1566935872000:?]
        at org.apache.metron.guava.17.0.net.InternetDomainName.<init>(InternetDomainName.java:158) ~[dep-stellar-common-0.7.2-uber-6958fda1-1cb6-4e03-80ba-c5171221d03a.jar.1566935872000:?]
        at org.apache.metron.guava.17.0.net.InternetDomainName.from(InternetDomainName.java:213) ~[dep-stellar-common-0.7.2-uber-6958fda1-1cb6-4e03-80ba-c5171221d03a.jar.1566935872000:?]
        at org.apache.metron.stellar.dsl.functions.NetworkFunctions.toDomainName(NetworkFunctions.java:271) ~[dep-stellar-common-0.7.2-uber-6958fda1-1cb6-4e03-80ba-c5171221d03a.jar.1566935872000:?]
        at org.apache.metron.stellar.dsl.functions.NetworkFunctions.access$000(NetworkFunctions.java:34) ~[dep-stellar-common-0.7.2-uber-6958fda1-1cb6-4e03-80ba-c5171221d03a.jar.1566935872000:?]
        at org.apache.metron.stellar.dsl.functions.NetworkFunctions$RemoveSubdomains.apply(NetworkFunctions.java:86) ~[dep-stellar-common-0.7.2-uber-6958fda1-1cb6-4e03-80ba-c5171221d03a.jar.1566935872000:?]
        at org.apache.metron.stellar.dsl.BaseStellarFunction.apply(BaseStellarFunction.java:30) ~[dep-stellar-common-0.7.2-uber-6958fda1-1cb6-4e03-80ba-c5171221d03a.jar.1566935872000:?]
        at org.apache.metron.stellar.common.StellarCompiler.lambda$exitTransformationFunc$13(StellarCompiler.java:664) ~[dep-stellar-common-0.7.2-uber-6958fda1-1cb6-4e03-80ba-c5171221d03a.jar.1566935872000:?]
        at org.apache.metron.stellar.common.StellarCompiler$Expression.apply(StellarCompiler.java:259) ~[dep-stellar-common-0.7.2-uber-6958fda1-1cb6-4e03-80ba-c5171221d03a.jar.1566935872000:?]
        at org.apache.metron.stellar.common.BaseStellarProcessor.parse(BaseStellarProcessor.java:151) ~[dep-stellar-common-0.7.2-uber-6958fda1-1cb6-4e03-80ba-c5171221d03a.jar.1566935872000:?]
        at org.apache.metron.stellar.common.CachingStellarProcessor.parseUncached(CachingStellarProcessor.java:161) ~[dep-stellar-common-0.7.2-uber-6958fda1-1cb6-4e03-80ba-c5171221d03a.jar.1566935872000:?]
        at org.apache.metron.stellar.common.CachingStellarProcessor.parse(CachingStellarProcessor.java:155) ~[dep-stellar-common-0.7.2-uber-6958fda1-1cb6-4e03-80ba-c5171221d03a.jar.1566935872000:?]
        at org.apache.metron.common.field.transformation.StellarTransformation.map(StellarTransformation.java:50) ~[stormjar.jar:?]
        ... 18 more

@tigerquoll (Contributor)

For the test script: full dev still needed another supervisor slot in Storm to allow the squid enrichment topology to run.

@mmiklavc (Contributor, Author)

@tigerquoll what all are you running? This leaves me with 1 slot available still

@mmiklavc (Contributor, Author)

@tigerquoll I've updated the test script on this PR to reflect the note in the first tutorial.

@MohanDV (Contributor) commented Aug 30, 2019

@mmiklavc I have completed tests on a 5-node cluster. For the profiler tests, I see the below exception in the enrichment worker log:

java.lang.IllegalStateException: ENRICHMENT error with stellar failed: Unable to parse: PROFILE_GET( 'hello-world', ip_dst_addr, PROFILE_FIXED(2, 'MINUTES')) due to: null with relevant variables ip_dst_addr=missing
        at org.apache.metron.enrichment.parallel.ParallelEnricher.lambda$apply$0(ParallelEnricher.java:195) ~[stormjar.jar:?]
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) ~[?:1.8.0_112]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_112]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_112]

@mmiklavc (Contributor, Author)

@MohanDV thanks for the review! The enrichment I constructed wasn't robust in the sense that if an ip_dst_addr is missing, it would be expected to blow up like this. The data samples I provided for squid should all have had an ip_dst_addr. This looks like one of the records may be null for some reason or another. Can you share the sample squid access log? I suspect there's a record in there that doesn't look right.

@MohanDV (Contributor) commented Aug 30, 2019

@mmiklavc Yes I have an entry 1567166198.507 0 ::1 TAG_NONE/400 3983 GET ;i=1;doned42.7630722pwdd-88.2142563 - HIER_NONE/- text/html in the squid access log for
squidclient "https://www.google.com/maps/place/Waterford,+WI/@42.7639877,-88.2867248,12z/data=!4m5!3m4!1s0x88059e67de9a3861:0x2d24f51aad34c80b!8m2!3d42.7630722!4d-88.2142563"

@mmiklavc (Contributor, Author)

@MohanDV that makes sense then - this would be an expected exception in that case.
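
For anyone re-running this plan against squid data that contains malformed entries like that, one way to make the e7 enrichment tolerant of a missing ip_dst_addr would be a guard along these lines (a sketch only, reusing the same exists() check the profile's onlyif uses; the NULL fallback is an assumption about the desired behavior):

         "e7" : {
           "profile_for_ip_dst_addr" : "if exists(ip_dst_addr) then PROFILE_GET('hello-world', ip_dst_addr, PROFILE_FIXED(2, 'MINUTES')) else NULL"
         }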

@tigerquoll (Contributor)

Tests completed on a fresh pull / full dev deployment. All tests passed successfully. One nit: Command for stopping unneeded parsers should be:
for parser in bro__snort__yaf profiler pcap batch_indexing; do storm kill $parser; done

@tigerquoll (Contributor)

One thing to note: there was a non-fatal ClassCastException in the Nimbus logs:

2019-09-02 01:51:04.756 o.a.s.m.ClusterMetricsConsumerExecutor main [ERROR] Could not instantiate or prepare Cluster Metrics Consumer with fully qualified name org.apache.storm.metric.LoggingMetricsConsumer
java.lang.ClassCastException: org.apache.storm.metric.LoggingMetricsConsumer cannot be cast to org.apache.storm.metric.api.IClusterMetricsConsumer

@MohanDV (Contributor) commented Sep 3, 2019

Thanks @mmiklavc! It's a +1 with verification from my side.

@asfgit asfgit closed this in c402e64 Sep 4, 2019