METRON-2193 Upgrade Enrichments for HBase 2.0.2 #1470
Conversation
…m.google.thirdparty.publicsuffix.PublicSuffixPatterns not being relocated with the rest of Guava.
…ot be needed at all
…ld not initialize class com.github.fge.jackson.JsonLoader
…mentConverterTest failures. Getting family, qualifier, value from Cell works differently in 2.0.2
…egration test will just run forever
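One of the commits above notes that getting family, qualifier, and value from a `Cell` works differently in 2.0.2. For readers unfamiliar with the change, a minimal sketch of the 2.x accessors (illustrative only, not Metron's actual code):

```java
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;

public class CellAccessExample {
  // Copies family, qualifier and value out of a Cell the HBase 2.x way.
  // The old KeyValue-style getFamily()/getQualifier()/getValue() accessors
  // are no longer the supported route in 2.0.2.
  static byte[][] unpack(Cell cell) {
    return new byte[][] {
        CellUtil.cloneFamily(cell),
        CellUtil.cloneQualifier(cell),
        CellUtil.cloneValue(cell)
    };
  }
}
```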
nickwallen left a comment
Some commentary on these changes.
```xml
</relocation>
<relocation>
  <pattern>com.google.thirdparty</pattern>
  <shadedPattern>org.apache.metron.guava.thirdparty.${guava_version}</shadedPattern>
```
This fixes an error with `SimpleFlatFileSummarizerTest.testWholeFile` caused by how we relocate Guava.
We relocate `com.google.common`, which does not include all packages within Guava. It specifically misses `com.google.thirdparty.publicsuffix.PublicSuffixPatterns`, which is the root cause of this test failure.
We may want to ensure that we relocate everything under `com.google` everywhere, but I didn't want to make such a drastic change just yet. And I know @merrimanr has a PR out that may shuffle what we need to do here.
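For context on the failure mode: Guava's `InternetDomainName` is relocated along with `com.google.common`, but it internally references `com.google.thirdparty.publicsuffix.PublicSuffixPatterns`, which the old rules left un-relocated. A minimal illustration of the kind of call that trips over this (assuming the old shading rules):

```java
import com.google.common.net.InternetDomainName;

public class PublicSuffixExample {
  public static void main(String[] args) {
    // Under the old shading rules, the relocated InternetDomainName could not
    // load the un-relocated PublicSuffixPatterns class, producing a
    // NoClassDefFoundError at a call like this one.
    System.out.println(InternetDomainName.from("www.example.co.uk").publicSuffix());
  }
}
```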
```diff
 <dependency>
   <groupId>org.apache.hbase</groupId>
-  <artifactId>hbase-server</artifactId>
+  <artifactId>hbase-mapreduce</artifactId>
```
In the newer version of HBase, we can pull in `hbase-mapreduce` rather than all of `hbase-server`. The `hbase-mapreduce` module did not exist in older versions of HBase.
```xml
<!-- Test dependencies needed preferentially -->
<dependency>
  <groupId>com.github.fge</groupId>
  <artifactId>jackson-coreutils</artifactId>
```
Fixed an issue with the metron-parsers unit tests: `NoClassDefFoundError: Could not initialize class com.github.fge.jackson.JsonLoader`.
…3.1' into METRON-2193
…mentLookup and is only used in the tests
```diff
 boolean initialized = false;
-private static Cache<Table, EnrichmentLookup> enrichmentCollateralCache = CacheBuilder.newBuilder()
-    .build();
+private static Cache<Table, EnrichmentLookup> lookupCache = createCache();
```
Altering the `HBaseClient` so that it can support multiple tables with the same connection would make sense here, instead of creating a separate `HBaseClient` for each `Table`. This was originally discussed here. There is a similar scenario in the `TaxiiHandler`.
I'd like to do this work as a follow-on PR, rather than increasing the heft of this PR.
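For reference, a rough sketch of that follow-on direction: share one `Connection` and cache per-table handles, rather than one client per table. Class and method names here are hypothetical, not the existing `HBaseClient` API:

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Table;

// Hypothetical multi-table client: one shared Connection across all tables.
public class MultiTableHBaseClient implements AutoCloseable {
  private final Connection connection;
  private final Map<TableName, Table> tables = new ConcurrentHashMap<>();

  public MultiTableHBaseClient(Connection connection) {
    this.connection = connection;
  }

  // Lazily opens and caches a Table handle per table name. A benign race is
  // possible here; acceptable for a sketch.
  public Table table(String name) throws IOException {
    TableName tableName = TableName.valueOf(name);
    Table table = tables.get(tableName);
    if (table == null) {
      table = connection.getTable(tableName);
      tables.put(tableName, table);
    }
    return table;
  }

  @Override
  public void close() throws IOException {
    for (Table t : tables.values()) {
      t.close();
    }
    connection.close();
  }
}
```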
…e of enum or class
…n.rest.service.impl.SensorEnrichmentConfigServiceImpl required a bean of type 'org.apache.metron.hbase.client.HBaseClient' that could not be found.
I'm currently hunting down the source of this issue for this PR. It's clearly a classpath conflict of some sort, but it will take a bit of sleuthing to find out why the dependency is a problem now and wasn't in HDP 2.x.
…y the mpack. Thanks @mmiklavc
```diff
 chdir: "{{ metron_build_dir }}"
 with_items:
-  - mvn package -DskipTests -T 2C -P HDP-2.5.0.0,mpack
+  - mvn package -DskipTests -T 2C -P HDP-3.1,mpack
```
How exciting! 🎉
```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class LeastRecentlyUsedPruner {
```
Removes the `LeastRecentlyUsedPruner`. This logic is exposed to the user in the script `bin/threatintel_bulk_prune.sh`. I had some difficulty getting the integration test working and I do not believe this is worth the effort to upgrade. I found almost no documentation around this functionality. I fully expect to initiate more discussion around this. If there is a need for this, I can work further on upgrading it.
I think deprecating an entire feature set probably requires a discussion before we accept work that completely removes it. Typically when we've done this in the past there's been at least a complementary, if not more robust, alternative to the existing functionality.
I completely agree that deprecating a feature merits a community discussion. I'd start a discuss thread about this whole PR if there are any breaking changes.
For the record, I created the feature and would be in favor of deprecating it, but only after a discussion.
Agreed. I can open that discuss thread.
@cestella can you answer the discuss thread around what this actually does and what the goal was?
Nice to see you ;)
nice to see you too! I just responded.
This PR piqued my interest. :) First off, I'm glad to see we're fixing the use of deprecated APIs. I will say, however, that I was surprised at the size of a single PR until I looked. This PR seems to be a mix of:
I have some concerns about this approach. Conflating so many things inside of a PR that is already inherently risky seems to increase the risk multiplicatively. It is also difficult to review, I'd say. I would suggest that these three separate concerns be split across 3 separate PRs:
As I say, please don't take my feedback as an indication that I don't appreciate the work that went into this. There is much good here, but there is just...well...very much here. :)
I do agree with you @cestella that this PR is large and I very much prefer small, isolated PRs. The HDP-3.1 upgrade has been a huge effort so far and I've tried to break it down into small, reviewable PRs. Here are all the PRs that we've been able to land so far as part of the upgrade.
I tried to follow a similar pattern prior to opening this PR and ran into some problems. I attempted to open separate PRs for each functional area included here. Something along the lines of...
But since (1) the changes in the areas listed below are not backwards compatible (unlike all the preceding PRs listed above) and (2) there are many interdependencies between these areas, I was not able to submit separate PRs that would actually compile. To help cut the fat and reduce the size of this PR, there are some changes here that I could try to undo or extract into separate PRs. These come to mind immediately:
Do the items I listed above, (1) public field access and (2) `LookupKV` -> `EnrichmentResult`, cover what you mean by rearchitecture? Are there other items that you are thinking of that fall under this heading?
Many of the changes here are not backwards compatible, which prevents me from introducing them against master, unfortunately.
For the work on this feature branch, most components now go through the `HBaseClient`. There are still a few remaining places that don't directly use an `HBaseClient`. For example, we have to integration test "streaming enrichments" against HBase, Kafka and Storm. With our current IT approach, these all have to run in-memory. But we cannot run HBase in-memory alongside Storm and Kafka, which is why those integration tests rely on the `FakeEnrichmentLookup` rather than a live HBase instance.
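To make the fake-vs-live split concrete, here is a minimal sketch of the pattern described above. Method names and the key/value import paths are illustrative; the real `EnrichmentLookup` contract lives on the feature branch:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
// Key/value types from the Metron codebase; exact paths may differ slightly.
import org.apache.metron.enrichment.converter.EnrichmentKey;
import org.apache.metron.enrichment.converter.EnrichmentValue;

// Illustrative shape: production code implements this against HBase, while
// integration tests swap in the in-memory fake below.
interface EnrichmentLookupSketch {
  EnrichmentValue get(EnrichmentKey key) throws IOException;
}

class FakeEnrichmentLookupSketch implements EnrichmentLookupSketch {
  private final Map<EnrichmentKey, EnrichmentValue> store = new HashMap<>();

  // Tests seed the fake directly instead of writing to a live HBase table.
  void put(EnrichmentKey key, EnrichmentValue value) {
    store.put(key, value);
  }

  @Override
  public EnrichmentValue get(EnrichmentKey key) {
    return store.get(key);
  }
}
```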
Still trying to wrap my head around this PR and the scope of changes.
HBase introduced the new API in earlier versions, prior to deprecation. I don't doubt the problem exists, but I'm having a hard time understanding what we depended on that has 2 incompatible analogs in the newer versions of the HBase API. Can you provide some concrete examples?
The Coprocessor changes are definitely not backwards compatible. That's what I can remember off-hand. I'd have to try and split this PR apart to see what else breaks to provide more detail.
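For reviewers wanting a concrete picture of that break: HBase 2.x removed `BaseRegionObserver` and reworked how observers are registered. A minimal sketch of the 2.x shape (illustrative only, not Metron's actual coprocessor):

```java
import java.io.IOException;
import java.util.Optional;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessor;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.coprocessor.RegionObserver;
import org.apache.hadoop.hbase.wal.WALEdit;

// In 1.x an observer extended BaseRegionObserver; that class is gone in 2.x.
// A 2.x coprocessor implements RegionCoprocessor and exposes its observer.
public class ExampleObserver implements RegionCoprocessor, RegionObserver {

  @Override
  public Optional<RegionObserver> getRegionObserver() {
    return Optional.of(this);
  }

  @Override
  public void postPut(ObserverContext<RegionCoprocessorEnvironment> ctx,
                      Put put, WALEdit edit, Durability durability) throws IOException {
    // React to a write, e.g. record the enrichment type that was written.
  }
}
```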
It took some work, but I was at least able to extract the core Enrichment components, the HBase adapter, and the Enrichment functions into their own separate PR #1482. That should help reviewers. Please take a look at #1482 and provide feedback. I will continue to work on cutting up this PR into additional bite-sized chunks. I am going to close out this PR, since it is no longer needed.
This change upgrades all Enrichment and Data Management related components to work with HBase 2.0.2.
This work targets the `feature/METRON-2088-support-HDP-3.1` feature branch. This PR is dependent on the following PRs, and the diff will show those changes here until they are merged.
Changes
- Changes the default build profile so that everything is built against HBase 2.0.2. Up until this PR, all changes on the `feature/METRON-2088-support-HDP-3.1` feature branch were backwards compatible. I was not able to find a way to make many of these changes backwards compatible.
- Replaces all usages of the `LegacyHBaseClient` with the `HBaseClient`. The `LegacyHBaseClient` is not compatible with 2.0.2.
- Removed all of the legacy `TableProvider` and mock HBase related classes. These cannot be upgraded to HBase 2.0.2.
- Creates the `EnrichmentLookup` interface so that different implementations can be swapped in for testing where needed. For example, a `FakeEnrichmentLookup` allows the Enrichment integration test to function where we are not able to run a live HBase instance.
- Updated the TAXII loader to use an `HBaseClient`.
- Updated Streaming Enrichments to use an `HBaseClient`.
- Updated the Enrichment coprocessor to use an `HBaseClient`. The interfaces provided by HBase for a coprocessor also changed and required an update.
- Updated the Stellar functions `ENRICHMENT_GET` and `ENRICHMENT_EXISTS` for HBase 2.0.2.
- Updated the legacy HBase adapters for HBase 2.0.2.
- Removes the `LeastRecentlyUsedPruner`. This logic is exposed to the user in the script `bin/threatintel_bulk_prune.sh`. I had some difficulty getting the integration test working and I do not believe this is worth the effort to upgrade. I found almost no documentation around this functionality. I fully expect to initiate more discussion around this. If there is a need for this, I can work further on upgrading it.
- To simplify the class structure, `LookupKV<KEY_T, VALUE_T>` was replaced by `EnrichmentResult` (see the sketch after this list). The only `KEY_T` ever used is `EnrichmentKey` and the only `VALUE_T` ever used is `EnrichmentValue`.
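Since `LookupKV` only ever carried one key type and one value type, the replacement can be a small concrete class. A minimal sketch of what `EnrichmentResult` might look like (field and accessor names are assumptions, not the actual class):

```java
// Key/value types from the Metron codebase; exact paths may differ slightly.
import org.apache.metron.enrichment.converter.EnrichmentKey;
import org.apache.metron.enrichment.converter.EnrichmentValue;

// Sketch only: fixes the generic KEY_T/VALUE_T of LookupKV to the one
// combination that was ever used in practice.
public class EnrichmentResult {
  private final EnrichmentKey key;
  private final EnrichmentValue value;

  public EnrichmentResult(EnrichmentKey key, EnrichmentValue value) {
    this.key = key;
    this.value = value;
  }

  public EnrichmentKey getKey() { return key; }
  public EnrichmentValue getValue() { return value; }
}
```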
Acceptance Testing
Basics
Verify data is flowing through the system, from parsing to indexing
Open Ambari and navigate to the Metron service http://node1:8080/#/main/services/METRON/summary
Open the Alerts UI
Verify alerts show up in the main UI - click the search icon (you may need to wait a moment for them to appear)
Head back to Ambari and select the Kibana service http://node1:8080/#/main/services/KIBANA/summary
Open the Kibana dashboard via the "Metron UI" option in the quick links
Verify the dashboard is populating
Enrichment Coprocessor
Run the following command from the CLI - you should see the coprocessor in the table attributes. Ambari should set this up as part of the MPack installation.
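The exact CLI command is elided above, but the same check can be expressed against the HBase 2.x Java API. The table name `enrichment` is an assumption here, not taken from the instructions:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CoprocessorCheck {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      // The table descriptor's string form includes any attached coprocessors.
      System.out.println(admin.getDescriptor(TableName.valueOf("enrichment")));
    }
  }
}
```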
Before we start adding enrichments, let's verify the `enrichment_list` table is empty.
Go to Swagger
Click the `sensor-enrichment-config-controller` option.
Click the `GET /api/v1/sensor/enrichment/config/list/available/enrichments` option.
And finally click the "Try it out!" button. You should see an empty array returned in the response body.
Streaming Enrichments and Enrichment Stellar Functions in the REPL
Create a Streaming Enrichment by following these instructions.
Define the streaming enrichment and save it as a new source of telemetry.
Go to the Management UI and start the new parser called 'user'.
Create some test telemetry.
Ensure that the enrichments are persisted in HBase.
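One hedged way to spot-check this from a scratch class; the table name `enrichment`, and the assumption that the streaming enrichment writes land there, are mine rather than from the instructions:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

public class EnrichmentTableCheck {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("enrichment"))) {
      // Print a handful of rows to confirm the writes landed.
      try (ResultScanner scanner = table.getScanner(new Scan().setLimit(10))) {
        for (Result result : scanner) {
          System.out.println(result);
        }
      }
    }
  }
}
```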
Load CSV enrichment data
Now, let's perform an enrichment load. We'll do this as the `metron` user.
Download the Alexa 1m dataset:
Create an `extractor.json` for the CSV data by editing `extractor.json` and pasting in these contents:
Import the data.
Validate that the data was loaded. Expect at least 10k records.
Enrichment Coprocessor
Confirm that the enrichments added in the previous steps were 'found' by the coprocessor.
Go to Swagger
Click the `sensor-enrichment-config-controller` option.
Click the `GET /api/v1/sensor/enrichment/config/list/available/enrichments` option.
Click the "Try it out!" button. You should see an array returned with the value of each enrichment type that you have loaded:
[ "alexa", "user" ]Enrichment Stellar Functions in Storm
Follow instructions similar to these to load the user data.
Create a simple file called `user.csv`:
jdoe,192.168.138.2
Create a file called `user-extractor.json`.
Import the data.
Enrich the Bro telemetry using the "user" data. Similar to here.
Validate that the enrichment loaded successfully.
Use the User data to enrich the telemetry. Run the following commands in the REPL.
Wait for the new configuration to be picked up by the running topology.
Review the telemetry indexed into Elasticsearch. Look for records where the `ip_dst_addr` is `192.168.138.2`. Ensure that some of the messages have a field called `alexa` created from this enrichment.
Legacy Adapters in Storm
A legacy HBase adapter is used in the default demo telemetry.
Review the telemetry indexed into Elasticsearch. Ensure that the additional enrichment fields from the "malicious_ip" data are indexed into Elasticsearch.
Pull Request Checklist