-
Notifications
You must be signed in to change notification settings - Fork 505
Metron-498 Grok patterns are now read from zookeeper parser config property "grokPattern" #308
Conversation
| "parserConfig": | ||
| { | ||
| "grokPath":"/patterns/websphere", | ||
| "grokPattern":"# Days - two digit number is used\nDAY \\d{1,2}\n# Time - two digit hour, minute, and second\nTIME \\d{2}:\\d{2}:\\d{2}\n# Timestamp - month, day, and time\nTIMESTAMP %{MONTH:UNWANTED}\\s+%{DAY:UNWANTED} %{TIME:UNWANTED}\n# Generic word field\nWORD \\w+\n# Priority\nPRIORITY \\d+\n# Log start - the first part of the log line\nLOGSTART <%{PRIORITY:priority}>?%{TIMESTAMP:timestamp_string} %{WORD:hostname}\n# Security domain\nSECURITY_DOMAIN [%{WORD:security_domain}]\n# Log middle - the middle part of the log line\nLOGMIDDLE (\\[%{WORD:security_domain}\\])?\\[%{WORD:event_code}\\]\\[%{WORD:event_type}\\]\\[%{WORD:severity}\\]\n# Define IP address formats\nIPV6 ((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3}))|:)))(%.+)?\nIPV4 (?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9])\nIP (?:%{IPV6:UNWANTED}|%{IPV4:UNWANTED})\n# Message - the message body of the log\nMESSAGE .*\n# WebSphere - the entire log message\nWEBSPHERE %{LOGSTART:UNWANTED} %{LOGMIDDLE:UNWANTED} %{MESSAGE:message}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the only way to do multi line grok statements? This looks super awkward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. The websphere Grok pattern is awkward to begin with so it's even worse in JSON. It's a tradeoff between simpler code and being easier to read. I prefer simpler code but I'm probably biased because I was the one writing the code :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please do something to allow multiline strings to be a bit easier to write? My vote is accepting lists of strings or strings for grokPattern and joining them in the case of list a la http://stackoverflow.com/a/7744658
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to test the heck out of this, guys. i like the change because this makes things much simpler because there is no level of indirection anymore where you have to load the grok statement from a file system. but i see potential impacts on vagrant and management pack because we are now changing the way a setting is being interpreted. I didn't see any mods there and I would have expected that. this is one of those PRs we probably want 2 people to independently verify to make sure it's solid. i think there are two things that still need to be done. first, i think the patterns do have to be removed from HDFS because they are now redundant. second, ansible and message pack installers have to be modified to reflect the fact that the value is now in zookeeper. i volunteer to do the first pass and run it up on AWS once this is done. i also volunteer to clean up the docs and tutorials to account for this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That wasn't too bad. The grokPattern property now accepts a string or list of strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So - if the grok parser loaded all the grokPattern, and also all the grokPatternSets.. as the improvement? Then moving the old grok patterns out of the parser configs and back into the original file -> ZK patternSet area?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry I don't quite understand the question. Do you mind expanding on that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was cribbing how I would propose it would work, for the new jira. Sorry. I am not sure about creating a follow on jira until this is merged, what would you suggest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think something like this deserves the same treatment a new feature would get (discussion around requirements, design, etc). There are lots of issues to work though including how the patterns are organized in ZK, how they get upload to ZK, how the parser selectively loads the appropriate patterns and more. We could start with a Jira after this is merged and go from there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that sounds good
… unnecessary pattern files
|
Also, should we remove these files?
I don't think they are being used but wasn't sure. |
…the grokPattern property
…pts that create and deploy to grok patterns to HDFS directories
mattf-apache
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall a nice piece of work. Thanks for the opportunity to review it.
| if (grok == null) { | ||
| public List<JSONObject> parse(byte[] rawMessage, SensorParserConfig sensorParserConfig) { | ||
| if (grok == null || isGrokPatternUpdated(sensorParserConfig) || isPatternLabelUpdated(sensorParserConfig)) { | ||
| configure(sensorParserConfig.getParserConfig()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the pattern timestampField reliable? If so, comparing it would be much cheaper than comparing the whole grokPattern, which can be long, and additionally comparing the patternLabel would be unnecessary. If not, a SHA1 hash signature of the grokPattern+patternLabel could be added. This comparison is being done on every message, so worth streamlining.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The timestampField is not what you think it is. It represents the primary timestamp field of the message that's formatted downstream in the parser, not the last time the pattern was changed. I would like to see it go away eventually, since you can now do the same thing with Stellar in the transformation phase.
I agree with you though, since it's called for every message that check needs to be as fast as possible (although you wouldn't use a GrokParser for a high velocity sensor anyways). I think a hash of grokPattern + patternLabel will do the trick. I will add that in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, on both counts. Thanks for the correction.
| + originalMessage + " and the parsed message was: " + message + " . Check the pattern at: " | ||
| + grokPath); | ||
| + originalMessage + " , parsed message was: " + message + " , pattern was: " | ||
| + grokPattern); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Including several KB of grokPattern in the exception log might not be necessary, especially since if DEBUG is turned on, it was already dumped to log in line 106. Suggest the exception should log only patternLabel and timestampField (or patternHash if you add one). This is comparable to the former behavior of logging the path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have a point here. The whole pattern shouldn't be logged on every failure, especially since a bad grok pattern will get logged for every message. However there is a reason I did it that way. As a developer, I prefer to have everything I need to troubleshoot something in the same log line. If we log a pattern only when it changes it will be hard to tell which grokPattern caused a failure. You have the message but you'll have to go digging in the logs to find the grokPattern that was applied. Can you think of a way to log efficiently but still provide the context needed to troubleshoot an issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of "+ grokPattern", would you consider:
- (LOG.isDebugEnabled() ? grokPattern : (patternLabel + " (Turn on DEBUG logging to see pattern text.)"))
But if you think it's really important to log the full pattern regardless of DEBUG setting, I can accept that. It's just that under load in the field you might run 100,000's of messages through a faulty grokPattern before realizing the error, and blow out your log storage. Which irritates users :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is exactly what I had in mind. Thanks!
| * @return If null is returned, this is treated as an empty list. | ||
| */ | ||
| List<T> parse(byte[] rawMessage); | ||
| List<T> parse(byte[] rawMessage, SensorParserConfig sensorParserConfig); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently only the GrokParser wants to see the sensorParserConfig, right? Then this patch could be significantly simplified, if desired, by providing both one and two-argument polymorphisms of the parse() method. BasicParser would provide an overridable implementation of the two-argument form as a trivial call to the one-argument form. GrokParser can implement the one-argument form with either an exception throw, or a two-argument call with "current parserConfig" (which would need to be saved from the GrokParser:configure() call). Of course the clients of GrokParser still need to be modified, but all the non-Grok parsers and their unit tests no longer need to change. I think that eliminates 20 of the 38 files in this patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this idea but I'm a little confused on the implementation. Wouldn't just adding this do the trick: "BasicParser would provide an overridable implementation of the two-argument form as a trivial call to the one-argument form"? Why would we want both methods in the interface? Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, since sensorParserConfig is so nicely cached as a POJO, why pass it through arguments at all? GrokParser can just fetch it again, for the comparison in GrokParser:134. Then the signature of parse() can remain the same and none of the client parsers or their unit tests need to change; just the unit test for overall GrokParser. Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I had previously suggested polymorphism is so that the children of BasicParser and the children of GrokParser would not have apparently different interfaces. Granted making both implement both, when each only wants one, is less than ideal. But I think the suggestion I just sent is best: no need to change the interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where will GrokParser fetch the latest grokPattern from? We can't cache it in the GrokParser object because that copy will be stale as soon as ZK is updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll be honest, the Java 8 defaults weren't obvious to me either until I looked closer at the MessageParser interface :)
All this synchronization talk makes me nervous. Would it make things simpler if we detected the config change in ParserBolt.execute and re-initialized the grok object from there? We could save a cached copy of the config in the ParserBolt on each call to execute and detect changes before parse is called using the hashing approach you suggested earlier. This is the reason I initially changed the interface parse method to include the most recent config, to avoid multi-threading complexity. This is approach is very similar, we're just using a different method to pass this config in and detecting a change in ParserBolt instead of GrokParser. The parse method is synchronous by nature anyways. It doesn't need to be aware of a config change immediately, only when the parse method is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My inclination (and my original suggestion) is to keep all the asynchronous behavior entirely in the ParserBolt (where it already is) and only call parser.configurationUpdate() synchronously in the execute method. So yes to that. Then no synch at all needed in GrokParser.
However, your understanding that you could use ParserBolt#updateConfig() to capture notification of changes WITHOUT expensive checks, is brilliant, and exactly what updateConfig() is for. If we do it in ParserBolt, all you need is one synchronized routine:
private boolean configUpdatedFlag = false;
synchronized public boolean atomicCheckAndSet(boolean newValue) {
boolean oldValue = configUpdatedFlag; configUpdatedFlag = newValue; return oldValue;}
Trivial, safe as can be, and ultra cheap. No way to get deadlock, altho if paranoid, one could give it its own lock object instead of using implicit locking on the ParserBolt instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, that may not have been real clear. To complete the thought,
In ParserBolt:
AtomicBoolean configUpdatedFlag = new AtomicBoolean(false);
@OverRide
public void updateConfig(String path, byte[] data) throws IOException {
super.updateConfig(path, data);
if (path.startsWith(ConfigurationType.PARSER.getZookeeperRoot() + "/" + getSensorType())) {
configUpdatedFlag.set(true);
}}
and in execute() method, just before invoking parser.parseOptional(), do:
if (configUpdatedFlag.getAndSet(false)) { parser.configurationUpdated(getSensorParserConfig().getParserConfig());}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course one really should use AtomicBoolean. Then, again, no locking. Sigh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it. Will add these changes in and put it up for review. Thanks @mattf-horton!
| if(sensorParserConfig != null) { | ||
| List<FieldValidator> fieldValidations = getConfigurations().getFieldValidations(); | ||
| Optional<List<JSONObject>> messages = parser.parseOptional(originalMessage); | ||
| Optional<List<JSONObject>> messages = parser.parseOptional(originalMessage, sensorParserConfig); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This patch is premised on the idea that changes to the sensorParserConfig will be recognized and passed in to the GrokParser from the ParserBolt. Sorry I'm not familiar enough with the code, but I can't find where the sensorParserConfig is refreshed, or how often it happens. Can you please point it out? Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 112. The ConfiguredParserBolt classes ensures a call to getSensorParserConfig() always returns the latest config. Check out that class and ConfiguredBolt, they leverage Apache Curator to listen for Zookeeper changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sensorParserConfig refresh happens asynchronously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. ConfiguredBolt:prepare() installs a Curator event handler which can call updateConfig() asynchronously. Thanks for the pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't metron have a Configuration Management object/service - + interfaces that was the 'source of truth' for the configuration? A default implementation that handled the ZK/Curator changes hand handed out the most recent configuration when asked could then be created. A different implementation could be from file or from ????? etc.
We could have the access to that object/service be part of the interface that the parser could use, instead of implicitly passing it around and cluttering the api
All of these concerns can be encapsulated, and don't need to bleed through to the bolts and parsers.
They don't need to know how to manage the configurations
|
Has this been tested with the rpm and mpack stuff? Pretty sure this will break those items, so there are probably changes that would need to be made as part of this PR to keep from having regressions. |
|
The ASA patterns file |
- reverted MessageParser.parse method back to original form and added MessageParser.configurationUpdated method - changed ParserBolt to detect config changes, set atomic flag and call MessageParser.configurationUpdated on next execute cycle - changed GrokParser to implement MessageParser.configurationUpdated method and only log grokPatterns when log level is set to debug - brought ParserBoltTest to 100% test coverage
|
Added a commit that addresses the concerns about changing the MessageParser interface. All non-Grok related parser files have been reverted and are no longer part of this PR. Passing unit/integration tests and tested on quick-dev. In addition to any feedback people might have, there are 2 issues still pending: testing with rpm/mpack and adjusting code to accommodate PR #276 once it is merged. |
| int numWritten = 0; | ||
| if(sensorParserConfig != null) { | ||
| if (configUpdatedFlag.getAndSet(false)) { | ||
| parser.configurationUpdated(getSensorParserConfig().getParserConfig()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch on the need to re-call getSensorParserConfig(). But I think it needs to be captured in the sensorParserConfig variable for use in lines 132 and 134, eg:
if (configUpdatedFlag.getAndSet(false)) {
sensorParserConfig = getSensorParserConfig(); //re-fetch in case it's changed
parser.configurationUpdated(sensorParserConfig.getParserConfig());
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right that's definitely incorrect. I think it would be safer if we use the config object fetched in line 118 though. Is there a benefit to updating the config in the middle of the execute method? We're already delaying the change in the parser until the next execute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is necessary to re-call getSensorParserConfig(), because of the tiny chance that the change in config was interjected between the "execute()" thread doing line 118 and doing line 125. If this happened, we would send the OLD config (from sensorParserConfig) to parser.configurationUpdated(), then clear the flag, resulting in not picking up the config change until...the next config change. Bad. Unless I misunderstand the TreeCache, which may be.
However, if we do this correctly, we are not delaying the change to the parser in a significant way, because the ParserBolt calls parser.configurationUpdated() before calling any other parser.* methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I wasn't completely clear. Line 125 would change to this:
parser.configurationUpdated(sensorParserConfig.getParserConfig());
So config is fetched in line 118 and used everywhere after that until the next execute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. All that's needed is to fetch sensorParserConfig AFTER doing configUpdatedFlag.getAndSet(false). So right before line 118 do
boolean mustUpdateConfig = configUpdatedFlag.getAndSet(false); //must check before fetching sensorParserConfig
then in the current line 125 use mustUpdateConfig instead of configUpdatedFlag.getAndSet(false).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just added these in.
| @Override | ||
| public void updateConfig(String path, byte[] data) throws IOException { | ||
| super.updateConfig(path, data); | ||
| if (path.startsWith(ConfigurationType.PARSER.getZookeeperRoot() + "/" + getSensorType())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a risk that one sensor type might be a leading substring of another? (foo and foobar). If so, need to test for match with terminal slash, or equality without it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes good catch.
…the execute method. Also improved Zookeeper parser path checking on config update and added more test cases.
| super.updateConfig(path, data); | ||
| if (path.startsWith(ConfigurationType.PARSER.getZookeeperRoot() + "/" + getSensorType())) { | ||
| String pathWithoutTrailingSlash = path.replaceAll("/+$", ""); | ||
| if (pathWithoutTrailingSlash.equals(ConfigurationType.PARSER.getZookeeperRoot() + "/" + getSensorType())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned that there may be at least one segment of structured name between sensorType and a value that may have changed, thus still requiring a "starts with"-like calculation. How about replace both lines 188 and 189 with:
if (path.matches("^" + ConfigurationType.PARSER.getZookeeperRoot() + "/" + getSensorType() + "(/|$)")) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry I don't follow. ConfigurationType.PARSER.getZookeeperRoot() is constant and sensorType is a leaf node in Zookeeper. Maybe an example will help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, if sensorType is known to always be a leaf node in ZK, that's sufficient. Thanks.
|
And one last, doubtless unwelcome :-) thought -- So, does the timestampField in the parserConfig get changed with every message? If so, will we end up re-loading the grokPattern on every message? |
|
No problem, that's a good question. The "timestampField" property only changes when you change it in Zookeeper by uploading a config. Let's say we're parsing a sensor and the parsed message keeps it's timestamp value in a field named "timestamp_field". The configuration would be "timestampField=timestamp_field". If for some reason the structure of the message changed and the field were now named different, then you would change that setting to the new name. |
|
Ah, that makes sense -- although I would have called it "timestampFieldName" :-) +1. I think it's ready to go, and a great piece of work! |
|
Can't argue with you there, that is a better name. Thanks for all the great suggestions @mattf-horton. |
|
@merrimanr Can you resolve the branch conflicts so we can get this merged into master? Maybe you didn't notice. |
This PR moves Grok patterns from HDFS to a property in Zookeeper ("SensorParserConfig.parserConfig.grokPattern"). Most of the important changes were made to the GrokParser class and related tests. There were 2 primary challenges with implementing this:
The first challenge was solved by changing the MessageParser.parse method to include an additional "SensorParserConfig" parameter that the GrokParser can use to determine if the grok pattern has changed and reinit if necessary. The ParserBolt now passes in the most recent SensorParserConfig to the MessageParser for every message processed. This is similar to how the writer interfaces work. I believe this approach is simpler than having the ParserBolt listen for changes and reconfigure/reinit the GrokParser. Changing this interface makes this PR look much bigger than it actually is because all the parsers and parser tests had to be adjusted. Most of files included are 1 line changes.
The second challenge was overcome by using '\n' to separate lines inside the grok pattern JSON property. An extra '' also had to be added to '' escape characters in the grok pattern files. The advantage of this is that no special serialization/deserialization is needed inside of the GrokParser code. The disadvantage is that the patterns are not as human-readable.
All unit tests and integration tests are passing. I added an extra unit test in SampleGrokParserTest to simulate and test a config being updated. I also tested yaf and squid sensors in quick-dev by updating the grokPattern properties and verifying the new patterns were applied.