Skip to content
This repository was archived by the owner on Aug 20, 2025. It is now read-only.

Conversation

@merrimanr
Copy link
Contributor

@merrimanr merrimanr commented Mar 5, 2019

Contributor Comments

This PR includes the necessary changes to move Storm-dependent code to a separate module. In most cases classes were moved unchanged with the exception of the package changing from org.apache.metron.common.utils.* to org.apache.metron.storm.common.utils.*. More complex changes are listed in the next section.

Changes Included

  • Removal of Storm and Flux Maven dependencies from the metron-common pom file.
  • Additional of the org.ow2.asm:asm:5.0.3 Maven dependency to metron-common pom file. This dependency was inherited from the Storm dependencies.
  • Created a new metron-storm-common module for classes migrated out of metron-common.
  • Refactored the RawMessageUtil and MetadataUtil classes to separate Storm dependent code. The MetadataUtil.extractMetadata method's purpose is to extract metadata from a tuple and should be in metron-storm-common while the other methods in MetadataUtil I believe are reusable and should be kept in metron-common. For now I moved RawMessageUtil to metron-storm-common and moved the method to RawMessageUtil.extractMetadata.
  • Removed the TopologyContext parameter from BulkMessageWriter.init and adjusted implementations to match.
  • All BulkMessageWriter implementations did not use TopologyContext except for HdfsWriter. This class is tightly coupled to Storm dependencies and will likely need to be rewritten without them for other streaming platforms. In this PR, it's treated as a special case in BulkMessageWriterBolt with the TopologyContext call in a separate HdfsWriter method.
  • Storm-dependent classes and tests were moved from metron-common to metron-storm-common including:
    • Classes in the org.apache.metron.common.bolt package
    • MessageGetStrategy interface and implementations
    • RawMessageUtil including the extractMetadata method

Testing

I have done some basic testing in full dev. I've spun it up and made some basic assertions:

To test streaming enrichments in the enrichment topology:

  1. Add an ip address that matches the bro data set to the 'user' enrichment set up in METRON-2012 Unable to Execute Stellar Functions Against HBase in the REPL #1345:
[Stellar]>>> msgs := ["user1,192.168.66.1"]
[user1,192.168.66.1]
[Stellar]>>> KAFKA_PUT("user", msgs)
1
  1. Add the enrichment to the bro enrichment config:
curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -d '{
  "enrichment": {
    "fieldMap": {
      "geo": [
        "ip_dst_addr",
        "ip_src_addr"
      ],
      "host": [
        "host"
      ],
      "stellar": {
        "config": {
          "enrichment": "ENRICHMENT_GET('user', ip_src_addr, 'enrichment', 't')"
        }
      }
    },
    "fieldToTypeMap": {},
    "config": {}
  },
  "threatIntel": {
    "fieldMap": {
      "hbaseThreatIntel": [
        "ip_src_addr",
        "ip_dst_addr"
      ]
    },
    "fieldToTypeMap": {
      "ip_src_addr": [
        "malicious_ip"
      ],
      "ip_dst_addr": [
        "malicious_ip"
      ]
    },
    "config": {},
    "triageConfig": {
      "riskLevelRules": [],
      "aggregator": "MAX",
      "aggregationConfig": {}
    }
  },
  "configuration": {}
}' 'http://user:password@node1:8082/api/v1/sensor/enrichment/config/bro'
  1. Verify there are no errors in the enrichment topology and that the new enrichment field appears in the bro index.

Outstanding Issues

  • I think we could improve on the MetadataUtil refactor. Should we rename RawMessageUtil to something more appropriate? Should we reorganize that code differently?
  • Is the HdfsWriter approach good enough? How else could we do this?
  • Is metron-platform/metron-storm-common the right module for common Storm code?

Pull Request Checklist

Thank you for submitting a contribution to Apache Metron.
Please refer to our Development Guidelines for the complete guide to follow for contributions.
Please refer also to our Build Verification Guidelines for complete smoke testing guides.

In order to streamline the review of the contribution we ask you follow these guidelines and ask you to double check the following:

For all changes:

  • Is there a JIRA ticket associated with this PR? If not one needs to be created at Metron Jira.
  • Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
  • Has your PR been rebased against the latest commit within the target branch (typically master)?

For code changes:

  • Have you included steps to reproduce the behavior or problem that is being changed or addressed?

  • Have you included steps or a guide to how the change may be verified and tested manually?

  • Have you ensured that the full suite of tests and checks have been executed in the root metron folder via:

    mvn -q clean integration-test install && dev-utilities/build-utils/verify_licenses.sh 
    
  • Have you written or updated unit tests and or integration tests to verify your changes?

  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?

  • Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered by building and verifying the site-book? If not then run the following commands and the verify changes via site-book/target/site/index.html:

    cd site-book
    mvn site
    

Note:

Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
It is also recommended that travis-ci is set up for your personal repository such that your branches are built there before submitting a pull request.

@mmiklavc
Copy link
Contributor

mmiklavc commented Mar 6, 2019

@merrimanr - thanks for taking this on! These refactoring PR's can get pretty deep, and your work here is greatly appreciated.

After a quick pass, I have a couple initial Q's/thoughts

  1. I see the writer interface change to remove TopologyContext topologyContext. It doesn't look like we actually used that context anywhere explicitly, other than maybe the HDFSWriter. Seems like this was a pretty clean change for us in that regard. Is that the case, or can you call out any specific places where this may have caused drama?
  2. metron-storm-common is a desirable split. What do you think about exploring following a similar packaging/module approach as was taken in metron-parsing. ie
* metron-common/
    * metron-common-storm/
    * metron-common-spark/ (upcoming)

I think that might organize things for us just a bit neater.

@merrimanr
Copy link
Contributor Author

The HDFSWriter class was the only special case. Let me know if my explanation about how that was handled doesn't make sense or if you see a better way.

Sure I can move metron-storm-common there. Are you thinking like this:

* metron-common/
    * metron-common/
    * metron-common-storm/
    * metron-common-spark/ (upcoming)

@merrimanr
Copy link
Contributor Author

Actually we can't have 2 modules with the same name. Would it be metron-common-common?

# Conflicts:
#	metron-platform/metron-common/pom.xml
@mmiklavc
Copy link
Contributor

@merrimanr - I believe a module can be an aggregator, a parent, and a jar-producing pom. So you could still keep it as:

* metron-common/
    * metron-common-storm/
    * metron-common-spark/ (upcoming)

It doesn't fit the layout of our other modules, however. What do you think of keeping metron-common by itself and creating another parent module for the streaming/platform code? e.g.

* metron-common/
* metron-common-streaming/
    * metron-common-storm/
    * metron-common-spark/ (upcoming)

@merrimanr
Copy link
Contributor Author

@mmiklavc latest commit updates the module structure to match your suggestion. Let me know what you think.

# Conflicts:
#	metron-platform/metron-enrichment/metron-enrichment-storm/src/main/java/org/apache/metron/enrichment/bolt/GenericEnrichmentBolt.java
TestingServer testZkServer = new TestingServer(true);
this.zookeeperUrl = testZkServer.getConnectString();
byte[] globalConfig = ConfigurationsUtils.readGlobalConfigFromFile(TestConstants.SAMPLE_CONFIG_PATH);
byte[] globalConfig = ConfigurationsUtils.readGlobalConfigFromFile("../" + TestConstants.SAMPLE_CONFIG_PATH);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make all of these relative paths changes a local test constant? We could probably spend more time to make this less brittle, but for now I think it's a reasonable compromise. See https://github.com/apache/metron/blob/master/metron-platform/metron-enrichment/metron-enrichment-storm/src/test/java/org/apache/metron/enrichment/integration/EnrichmentIntegrationTest.java#L94, for example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

HdfsWriter writer = new HdfsWriter().withFileNameFormat(testFormat);
writer.init(new HashMap<String, String>(),createTopologyContext(), config);
writer.init(new HashMap<String, String>(), config);
writer.initFileNameFormat(createTopologyContext());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for reference later - this is not part of the metron-common refactoring, but we should remove our Storm dependencies from the writer infrastructure as well.

@mmiklavc
Copy link
Contributor

@merrimanr Save for a small nit with the relative paths "../" duplicated in multiple places within individual tests, I think this is pretty solid. Also looks like a small conflict with metron-platform/pom.xml. Thanks for the effort on this!

You mentioned doing some light testing. Considering the classpath issues we've recently worked through, do you think it's worth running through this test script before we bring this in? #1368 (comment)

@merrimanr
Copy link
Contributor Author

I will spin this up in full dev again and run through testing.

@merrimanr
Copy link
Contributor Author

I ran through the testing instructions again and everything worked as expected.

@mmiklavc
Copy link
Contributor

@merrimanr - looks good. Just to confirm, did you run through the script from 1368? I'm +1 pending confirmation on that as we need to be extremely careful about regressions.

@merrimanr
Copy link
Contributor Author

I ran the tests again and did not notice any regressions.

@mmiklavc
Copy link
Contributor

Cool, just to be clear, +1 stands - go ahead and merge this.

@asfgit asfgit closed this in 2f2c802 Apr 15, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants