-
Notifications
You must be signed in to change notification settings - Fork 505
METRON-777 Metron Extension System and Parser Extensions #530
Conversation
Based on Apache Nifi Nars NAR changes * new lib , rebrand to bundles from NAR * port to VFS/FileObject from File based * ability to set property values * Rework FileUtils so that you can derive and override * added initializers to set 'classes' that we care about instead of hard coding them, still needs defaults * added components nec. for integration tests ( do not want dep. on metron-* ) * VFSClassloader for NarClassLoader * Hdfs based integration test version of unpacknars tests * HDFS ( filesystem ) based fileutilities to cover for writes to hdfs, since VFS is currently R/O HDFS * modified plugin to support configuration of outputs * use class index not service loader ( both subclass and annotated supported ) Archetype * Parser Extension archetyp * incudes all configuration * creates tar.gz with bundle and configuration * class index support ( automatic generation ) Extensions * new extensions modules * parser * archetype built module for each parser type * support for configuration only parsers with tests Parsers * moved all but json, csv, grok to extensions * Bolt now loads from bundle properties Deployment * rpms for parsers * create extension directories * ambari initializes zookeeper per parser * amabri creates hdfs directories * ISSUE: Writing to hdfs Rest-API * only test against parsers in metron-parsers * still needs integration
|
Woah, big contribution here; thanks @ottobackwards ! So this one is hard to review because a lot of it is:
Would you mind giving us a list of files where changes are made that don't fit those two categories? I think that'd help us isolate the bits to review easier. In the meantime, I have a couple of questions:
|
|
RE: Travis The system is out of resources. |
|
RE: parser size |
|
So do you like want a listing of files? Here is a high level by the file guide, but I may be missing something changes to OOM issues in travis container.travis.yml packaging and deploymentmetron-deployment/packaging/ambari/metron-mpack/src/main/resources/common-services/METRON/CURRENT/configuration/metron-env.xml metron-extensions/README.md asa integration test, but from bundle, not dep on asametron-extensions/metron-parser-extensions/metron-parser-bundle-tests/pom.xml fixes for testsmetron-interface/metron-rest/src/main/java/org/apache/metron/rest/MetronRestConstants.java new functions for bundle.properties and tests - dealing with paths etcmetron-platform/metron-common/src/main/java/org/apache/metron/common/configuration/ConfigurationsUtils.java support new directoriesmetron-platform/metron-enrichment/src/test/java/org/apache/metron/enrichment/integration/components/ConfigUploadComponent.java load parsers from bundlesmetron-platform/metron-parsers/src/main/java/org/apache/metron/parsers/bolt/ParserBolt.java the system bundle properties to testsmetron-platform/metron-integration-test/src/main/config/zookeeper/bundle.properties test function fixes for new pathsmetron-platform/metron-management/src/test/java/org/apache/metron/management/ConfigurationFunctionsTest.java fix to work with new pathsmetron-platform/metron-test-utilities/src/main/java/org/apache/metron/test/utils/SampleDataUtils.java |
|
if you don't review the parsers, because the code didn't change, you still need to review or think about the concept that each parser, as an extension should have all of the things ( configuration etc ) that it needs within it's package. So the parsers have their grok statements, their configuration ( index, enrichment , and parsers and in the future ES + log rotate ) |
… it is when running from archetype, causing the extension versio to be used for this dependency
…archetype to use metronVersion
|
@mattf-horton I really like how this is evolving. One thing I have been thinking of since adding the BundleSystem interface ( which should be the main external interface ) is that I would like to refactor the packages, and introduce a .core, with the current roots and subfolders, exposing only the BundleSystem at the root. Do you have any thoughts on this? I know it assumes you have caught up with and are OK with the BundleSystem.... sorry |
|
@ottobackwards , many good improvements here. A few comments on the singleton idiom, and a couple other details. |
| public final class BundleClassLoaders { | ||
|
|
||
| private static volatile BundleClassLoaders bundleClassLoaders; | ||
| private volatile InitContext initContext; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since BundleClassLoaders is a singleton, this (initContext) should be static too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| } | ||
| } | ||
| return result; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this can return an uninitialized singleton. It is better to have getInstance() just test for null and throw UnitializedException or return an initialized instance. The creation of the singleton should be in init() -- which must be static so you can invoke before the instance exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| * @throws IllegalStateException when already initialized with a given set of extension directories | ||
| * and extensionDirs does not match | ||
| */ | ||
| public void init(final FileSystemManager fileSystemManager, final List<FileObject> extensionsDirs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As noted above, this method should be static, and create the singleton instance at the same time as initializing it. Creation and init of both BundleClassLoaders and InitContext instances should be in the same synchronized block, and of course be conditional on either not already existing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| synchronized (this) { | ||
| if(initContext != null) { | ||
| initContext = null; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recommend combining reset and unInit methods. This avoids the issues of
- having reset static while unInit is per-instance (doesn't matter which since is singleton, but confusing to use both)
- using double synchronize calls (ok even with both static or both per-instance, because Java locks are reentrant, but unnecessary and raises concerns in paranoid reviewers like me)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| static final Logger LOG = LoggerFactory.getLogger(BundleThreadContextClassLoader.class); | ||
| static final ContextSecurityManager contextSecurityManager = new ContextSecurityManager(); | ||
| private final ClassLoader forward = ClassLoader.getSystemClassLoader(); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I very much like that you made BundleThreadContextClassLoader statically self-initializing, by using ExtensionManager.getInstance().getExtensionClasses() instead of keeping a copy here that had to be initialized. Good job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can't get rid of all the static init, at least we can get rid of as much as possible.
| } | ||
|
|
||
| private InitContext discoverExtensions(final List<Class> classes, final Bundle systemBundle, final Set<Bundle> bundles) | ||
| throws NotInitializedException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can NotInitializedException still occur? If so, something below is calling getInstance() instead of static methods, which it will need to change anyway in order to change init() to static.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gone
| * | ||
| * {@see com.google.common.collect} | ||
| */ | ||
| public class ImmutableCollectionUtils { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you're trying to assure thread-safeness. Is it worth the extra effort and object spawning? Are the underlying objects likely to be attempted to be modified by other threads, given first-level ("shallow") immutability?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is more const correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
accepted.
| */ | ||
| public BundleSystem build() throws NotInitializedException { | ||
| if (this.properties == null) { | ||
| throw new IllegalArgumentException("BundleProperties are required"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But you also said the only use of properties file is for the extension classes. This suggests an either/or with the "extensionClasses" field. But if you feel making properties mandatory is future-proofing, that's okay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need the properties for other things
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, then please correct the above comment on withExtensionClasses() that says "If [ExtensionClasses] provided, the properties file will not be used.".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will make it more clear they will not be used for the classes, but will be used for other things
| * The ExtensionMapping represents a mapping of the extensions available to the system. | ||
| * It is the product of the BundleMapper. | ||
| * | ||
| * It is NOT used at runtime for loading extensions, rather it may be used by a system to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please note which method does load extensions at startup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| * extension type. | ||
| * @param extensionTypeName the extension type name, such as parser, stellar, indexing | ||
| * @return Map of extension class name to a Set of BundleCoordinates | ||
| */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good. Thanks for adding all the javadocs.
|
@mattf-horton all feedback addressed, as noted above |
|
I'm seeing this error spinning up topologies in full-dev. I get this for both an existing parser, yaf in this case, as well as a newly-created parser. Would any of this have changed in this PR? |
|
I don't think so. Are you running the latest? I'll run up full dev and check. |
| return bundleClassLoaders; | ||
| } | ||
| return result; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no longer a need for getInstance() to synchronize, if we deal with bundleClassLoader atomically. (See also init() method comment below):
public static BundleClassLoaders getInstance() throws NotInitializedException {
BundleClassLoaders result = bundleClassLoaders;
if (result == null) {
throw new NotInitializedException("BundleClassLoaders not initialized");
}
return result;
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
| throw new IllegalStateException("BundleClassloader already exists"); | ||
| } | ||
| bundleClassLoaders = new BundleClassLoaders(); | ||
| initContext = bundleClassLoaders.load(fileSystemManager, extensionsDirs, props); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that getInstance() can safely not synchronize, do this:
BundleClassLoaders b = new BundleClassLoaders();
InitContext ic = b.load(fileSystemManager, extensionsDirs, props);
initContext = ic;
bundleClassLoaders = b;There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
| return extensionManager; | ||
| } | ||
| return result; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As with BundleClassLoader, to deal with extensionManager atomically:
public static ExtensionManager getInstance() throws NotInitializedException {
ExtensionManager result = extensionManager;
if (result == null) {
throw new NotInitializedException("ExtensionManager not initialized");
}
return result;
}| throw new IllegalStateException("ExtensionManager already exists"); | ||
| } | ||
| extensionManager = new ExtensionManager(); | ||
| initContext = extensionManager.discoverExtensions(classes, systemBundle, bundles); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And to protect unsynchronized getInstance():
ExtensionManager em = new ExtensionManager();
InitContext ic = em.discoverExtensions(classes, systemBundle, bundles);
initContext = ic;
extensionManager = em;There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
| * before the rest of the methods are called afterwards. | ||
| * This is for TESTING ONLY at this time. Reset does not unload or clear any loaded classloaders. | ||
| */ | ||
| public static void reset() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add @ VisibleForTesting annotation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
| @AfterClass | ||
| public static void after() { | ||
| BundleClassLoaders.reset(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you intend to do ExtensionManager.reset(); here also?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hate this static stuff.
|
@ottobackwards , looking very good. Couple more comments above (unfortunately the system decided they are on "outdated" code, so you'll have to press the "Show outdated" buttons), then you have my +1. |
|
Looks good, you've responded to all my comments satisfactorily, and Travis is still happy. |
|
And @ottobackwards , I want to say thank you for all your patience plowing thru this stuff, with unavoidably slow rates of review. It's been a monumental contribution, and I look forward to seeing it in. |
|
Thank you @mattf-horton. |
|
Here is the current status
I think I have the hdfs grok stuff fixed, accept in the rest services which are super tightly coupled to metron-parsers and count on the common rules being in the classloader. |
|
OK @merrimanr , @mmiklavc Please review so we can get this into 777. |
Because I was already tasked with making this PR smaller ( don't laugh ), I left the Configuration UI and REST not integrated with Bundles. That means that they only have access to metron-parsers homed parser types ( JSON, CSV, GROK ) when creating parsers. All of the ZK based stuff works, and all parsers are listed. But there are areas where the REST API is tightly coupled to metron-parsers and is using the service loader pattern that have technically regressed. If we need this to be addressed as part of this PR, I can do that. |
|
And also, the PR for fixing grok needs to go in to do the next step there as well. |
|
There are some consequences of not having the bundle classes available in rest. The list of available classes will only contain metron-parsers homed parser types so those are the only type of parsers you would be able to create in the UI. Editing parsers outside of these will also not work correctly because the classes won't be available to parse sample messages. These are fairly significant regressions in my opinion. I'm fine with this work being split up into 2 PRs but I believe they should go in at the same time. Otherwise we'll be in a state where the UI does not work. |
|
closing this in preparation for feature branch |
|
Note to all reviewers: #720 I believe I have addressed everything brought up here at this point ( including grok not working, and the config-ui creating instances of the non-base parsers ) |
Contributor Comments
The pr. introduces an extension system for metron, along with refactoring the metron parsers on top of it. This is the base work for METRON-258 - Sideloading parsers, which is to follow, as it enables the creation and management of extensions outside of the metron codebase. The work for enabling side loading is the ability to install and deploy 3rd party extensions/parsers.
There is a lot that can be done with this, but I could nibble at it forever, and I'd like to get feedback and improvements going. There is still more documentation work that can be done for example.
The areas of change:
Travis
Metron Maven Bundle Plugin
bundle-lib
My goal here was not to make any radical changes. If you want the diff, I would suggest cloning nifi latest and diffing this module with nifi-nar_utilsMetron Maven Parser Extension Archetype
Metron-Platform/Metron-Extensions / Metron-Parser-Extensions/**
[note - during integration tests, the parsers are found in the system classloader not the bundles, because the way the tests are tied together]
Bundle Extension Test
Metron-Parsers
Metron-configuration
Metron Tests
RPM-Build
/usr/metron/V/ now has a new directories for extensions:
/extension_etc/PARSER_NAME/ -> that parser's configuration
/extension_alt_etc/ -> location for 3rd Party extension configuration
/extension_lib -> location on disk for rpm to place bundles
/extension_alt_lib -> location on disk for staging 3rd party bundles
METRON-SERVICE ambari
the metron workflow that this enables
We need a new parser:
*create with archetype under metron-extensions/metron-parser-extensions
*implement including tests and test data, all configurations
*add to the copy-resources of RPM-Docker pom
*add to the spec file
*add to the all_parsers variable in params
I have been working in Full Dev to get this going, and I believe it is working enough to get this started.
At the end of vagrant up with full dev, you should have data in kibana, as if nothing had changed ;)
There are issues however:
I have not integrated this with the Metron Docker project, I'm not sure how yet.
I have fixed Metron-Interface to get the test to run, but I think that work needs to be done there.
The next steps here are follow ons for installing parsers from the ui.
Testing:
I am sure I missed many things, and that there are things that could be better. Thank you in advance for your review.
Pull Request Checklist
Thank you for submitting a contribution to Apache Metron (Incubating).
Please refer to our Development Guidelines for the complete guide to follow for contributions.
Please refer also to our Build Verification Guidelines for complete smoke testing guides.
In order to streamline the review of the contribution we ask you follow these guidelines and ask you to double check the following:
For all changes:
For code changes:
[x ] Have you included steps to reproduce the behavior or problem that is being changed or addressed?
[x ] Have you included steps or a guide to how the change may be verified and tested manually?
[x ] Have you ensured that the full suite of tests and checks have been executed in the root incubating-metron folder via:
[x ] Have you written or updated unit tests and or integration tests to verify your changes?
[x ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
[x ] Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?
For documentation related changes:
[ x] Have you ensured that format looks appropriate for the output in which it is rendered by building and verifying the site-book? If not then run the following commands and the verify changes via
site-book/target/site/index.html:Note:
Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
It is also recommened that travis-ci is set up for your personal repository such that your branches are built there before submitting a pull request.