Conversation
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.metron.stellar.dsl.functions;
How do other Stellar functions handle config? This is the only config class I see in stellar-common and I'm wondering if there's another idiom. That aside, there are some existing config examples out there, e.g. PcapConfig and PcapOptions. Any specific reason you're using this approach rather than an enum for your options?
I looked at two different patterns that already exist.
The first was the pattern you mentioned, the ConfigOptions interface. The problem is that this class lives in metron-common, which depends on stellar-common. Regardless, I don't think that pattern is a great fit because it doesn't serialize/deserialize easily and contains a lot of features that are not needed here (functional interfaces, for example).
The second was the PROFILER_INIT function in the ProfilerFunctions class. This approach uses a Java bean (ProfilerConfig) to hold configuration, which allows easy serialization/deserialization. This is actually what I used at first, until I made the change to accept a config as an argument expression.
I went with option 2 at first but found it cumbersome to merge configs together. I ended up with the pattern I did because a map supports applying properties on top of each other well, and serialization/deserialization works without issue. We ended up having to do something similar for PcapConfig and PcapOptions in the REST layer by creating the PcapRequest class, which extends a Map.
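For illustration, here is a minimal sketch of the map-backed config pattern described above. The class name RestConfig matches the PR, but the specific property, default, and getter shown here are assumptions, not the actual implementation.
import java.util.HashMap;

public class RestConfig extends HashMap<String, Object> {

  // illustrative property name; the real class defines its own set of keys
  public static final String TIMEOUT = "timeout";

  public RestConfig() {
    // defaults that later putAll() calls can override
    put(TIMEOUT, 1000);
  }

  public Integer getTimeout() {
    return (Integer) get(TIMEOUT);
  }
}
Because the config is just a Map, layering global config settings and per-expression overrides reduces to successive putAll() calls, and JSON serialization/deserialization works with no extra code.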
@Override
public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
  super.prepare(stormConf, context, collector);
Do we necessarily want this in all cases? What about users that never need or want this function?
I can add a property to configure this. Should it be included by default or not?
Do you think instantiating this object but never using it would be harmful? It might make things simpler for users if there is one less setting to worry about.
More specifically, shouldn't this be isolated to the stellar function itself? This seems like a bleeding of concerns. With zookeeper, it makes sense to me because we're taking the architectural position that enabling dynamic, real-time configuration loading is part of the framework. I don't think that same general applicability applies to HttpClients. What about using the initialize method instead?
public interface StellarFunction {
  Object apply(List<Object> args, Context context) throws ParseException;
  void initialize(Context context);
  boolean isInitialized();
}
I tried that initially but realized there is no shutdown hook in Stellar. HttpClient objects need to be closed and the only hook that exists is in the bolts.
If we did have a way to close it, I would be in favor of managing it in the function. If we decide it's worth adding that hook in Stellar (I think we should, as a follow-on), we can easily move this inside the function.
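For reference, a rough sketch of what managing the client inside the function could look like if Stellar had such a hook. The close() method below is hypothetical (it is not part of StellarFunction at this point), and the class and field names are illustrative only.
import java.io.IOException;
import java.util.List;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.metron.stellar.dsl.Context;
import org.apache.metron.stellar.dsl.ParseException;
import org.apache.metron.stellar.dsl.StellarFunction;

public class RestGetSketch implements StellarFunction {

  private CloseableHttpClient httpClient;

  @Override
  public void initialize(Context context) {
    // build the client when the function is first resolved
    httpClient = HttpClients.createDefault();
  }

  @Override
  public boolean isInitialized() {
    return httpClient != null;
  }

  @Override
  public Object apply(List<Object> args, Context context) throws ParseException {
    // issue the request with httpClient here
    return null;
  }

  // hypothetical shutdown hook; Stellar does not expose this today
  public void close() throws IOException {
    if (httpClient != null) {
      httpClient.close();
    }
  }
}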
I'm inclined to add the shutdown hook. This stuck out to me also as bleeding concerns a bit.
Adding a shutdown hook in Stellar is not trivial and, in my opinion, out of scope for this PR. I would prefer that be a follow-on.
Actually, am I wrong here? Would adding a shutdown hook be simple?
}

@Override
public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
Same as the comment for enrichment - do we want this tied to the bolts this way?
 */
private Object doGet(RestConfig restConfig, HttpGet httpGet, HttpClientContext httpClientContext) throws IOException {

  // Schedule a command to abort the httpGet request if the timeout is exceeded
Could you elaborate in the docs somewhere that this is the hard timeout, and that the reason we need hard timeouts is that the library-provided timeouts are insufficient to achieve the hard timeout?
Did you see the REST section in the Stellar README? I can add more content to make that clearer.
I may have forgotten to refresh since before the readme was added. I'll check it out.
In that README, could you alter
timeout - Hard timeout for the total request time. Defaults to 1000 ms.
to be a bit more explicit about it? E.g. something like
timeout - Stellar enforced hard timeout for the total request time. Defaults to 1000 ms. Client timeouts alone are insufficient to guarantee the hard timeout.
Is this a reasonable statement of the problem? Unless I'm mistaken, this gets alluded to a bit, but is never explicitly stated.
I like it. Done.
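For readers following along, here is a minimal sketch of the hard-timeout technique discussed above: schedule a task that aborts the in-flight request once the configured time elapses. The class, field, and method names are illustrative assumptions, not the PR's actual code.
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import org.apache.http.client.methods.HttpGet;

public class HardTimeoutSketch {

  private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

  public void executeWithHardTimeout(HttpGet httpGet, long timeoutMillis) {
    // abort the request if it is still running when the hard timeout expires
    ScheduledFuture<?> abortTask =
        scheduler.schedule(httpGet::abort, timeoutMillis, TimeUnit.MILLISECONDS);
    try {
      // httpClient.execute(httpGet, ...) would run here; an aborted request
      // surfaces as an exception from execute()
    } finally {
      // cancel the pending abort if the request finished in time
      abortTask.cancel(true);
    }
  }
}
This is why the README calls the setting a hard timeout: the client-level timeouts bound connection and socket waits, but only an explicit abort bounds the total request time.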
// Add the basic auth credentials if the rest config settings are present
if (restConfig.getBasicAuthUser() != null && restConfig.getBasicAuthPasswordPath() != null) {
  String password = new String(readBytes(new Path(restConfig.getBasicAuthPasswordPath())), StandardCharsets.UTF_8).trim();
Why are we calling trim()? Is whitespace not valid at the front/back of passwords?
Just being defensive. I found I had to do this when reading the contents from HDFS. I can look into it again and try to track down exactly what the problem was.
The issue was with my testing instructions. I was unintentionally adding an extra newline at the end of the password file. I removed trim() and added a warning to the README. Will also update the test instructions.
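For context, a fragment sketching how the credentials block above might wire into the request once trim() is removed. readBytes() and the RestConfig getters are the PR's own helpers referenced in the diff; the rest uses standard HttpComponents and Hadoop classes. Treat this as an illustration, not the exact implementation.
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.fs.Path;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.protocol.HttpClientContext;
import org.apache.http.impl.client.BasicCredentialsProvider;

private HttpClientContext buildBasicAuthContext(RestConfig restConfig) throws IOException {
  // no trim(): the password file must not end with a trailing newline (see the README warning)
  String password = new String(readBytes(new Path(restConfig.getBasicAuthPasswordPath())),
      StandardCharsets.UTF_8);
  BasicCredentialsProvider credentialsProvider = new BasicCredentialsProvider();
  credentialsProvider.setCredentials(
      AuthScope.ANY,
      new UsernamePasswordCredentials(restConfig.getBasicAuthUser(), password));
  HttpClientContext httpClientContext = HttpClientContext.create();
  httpClientContext.setCredentialsProvider(credentialsProvider);
  return httpClientContext;
}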
@merrimanr - about the need for a close/shutdown hook - why is it needed in the first place? I don't believe we shut down Zookeeper connections explicitly. They end when the topology spins down. Why should this be any different?
We do shut down Zookeeper connections here: https://github.com/apache/metron/blob/master/metron-platform/metron-common/src/main/java/org/apache/metron/common/bolt/ConfiguredBolt.java#L129. The Apache HttpComponents docs recommend closing clients: http://hc.apache.org/httpcomponents-client-4.5.x/tutorial/html/fundamentals.html#d5e217.
I'm just not sure we need to shut them down. It's not like there are writing or state-related concerns that push the need to finish with a formal shutdown. The only time we would ever shut down a client connection would be when we're killing the topology, right? I really think we need to get this client code out of the bolts. I looked a bit through the Stellar code, and if we really think we need this, I think it can be accomplished through a change to the StellarFunction interface.
I think we should close it; the docs are pretty clear about that. I think it's possible connections could be left open on the server side. Moving the code out of the bolt is an organization/style issue. I think there should be a functional justification, like there is for closing a client.
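For what it's worth, the kind of close the docs recommend is small. As a sketch, a bolt-level cleanup might look like the following; httpClient and LOG are assumed fields, and (as noted later in this thread) Storm does not guarantee cleanup() runs in production, so this is best-effort.
@Override
public void cleanup() {
  try {
    if (httpClient != null) {
      // releases the underlying connection manager and any pooled connections
      httpClient.close();
    }
  } catch (IOException e) {
    LOG.error("Unable to close HttpClient", e);
  }
}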
It's not just a style issue. It's an architectural issue, otherwise I wouldn't care.
Throwing this out there for consideration - #1251
The latest commit incorporates the changes from #1251 and moves the HttpClient inside the function. I was able to verify proper closing in the REPL and the EnrichmentIntegrationTest. I was not able to verify in the Storm topologies because Storm does not guarantee these methods will be called in production. I added code comments in the bolts where we are calling close(). I spun this up in full dev again and ran through the test plan. Everything continued to work as expected.
Travis failure does not appear related here.
@merrimanr Pending a successful Travis run, I'm +1 by inspection. Thanks!
+1, thanks!
Contributor Comments
This PR adds a Stellar REST function that can be used to enrich messages with data from third-party REST services. The function leverages the Apache HttpComponents library for HTTP requests.
The function call follows this format:
REST_GET(rest uri, optional rest settings)
There are a handful of settings, including basic authentication credentials, proxy support, and timeouts. These settings are included in the RestConfig class. Any of these settings can be defined in the global config under the stellar.rest.settings property and will override any default values. The global config settings can also be overridden by passing in a config object to the expression. This will allow support for multiple REST services that may have different requirements.
Responses are expected to be in JSON format. Successful requests will return a MAP object in Stellar. Errors will be logged and null will be returned by default. There are ways to override this behavior by configuring a list of acceptable status codes and/or values to be returned on error or empty responses.
Considering this will be used on streaming data, how we handle timeouts is important. This function exposes HttpClient timeout settings but those are not enough to keep unwanted latency from being introduced. A hard timeout is implemented in the function to abort a request if the timeout is exceeded. The idea is to guarantee the total request time will not exceed a configured value.
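To make the settings precedence concrete, here is a small fragment sketching the layering described above (variable names are illustrative): hard-coded defaults first, then the global config stellar.rest.settings property, then the optional config argument passed to the expression. For example, a call such as REST_GET('http://httpbin.org/get', { 'timeout' : 2000 }) (syntax illustrative) would override only the timeout.
// illustrative merge order only; the PR's actual code may differ
Map<String, Object> settings = new HashMap<>(defaultSettings);
settings.putAll(globalRestSettings);      // values from the stellar.rest.settings global config property
settings.putAll(expressionRestSettings);  // per-call config argument wins last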
Changes Included
Testing
There are several different ways to test this feature and I would encourage reviewers to get creative and look for cases I may not have thought of. For my testing, I used an online HTTP service that provides simple endpoints for simulating different use cases: http://httpbin.org/#/. Feel free to try your own or use this one.
I tested this in full dev using the Stellar REPL and the parser and enrichment topologies. First you need to perform a couple of setup steps:
In /etc/squid/squid.conf, under the lines with acl Safe_ports*:
To test with the Stellar REPL, follow these steps:
Verify the REST_GET function is available:
This behavior can be changed by configuring a 404 to be an acceptable status code and returning an empty object instead of null:
Start the REPL with the custom log4j.properties file:
Then run a REST_GET command and quit. The shell should quit successfully and you should see a log message indicating close() was called on the Stellar functions:
To test with the parser and enrichment topologies, follow these steps:
The parser_rest_result field should now be present.
The enrichment_rest_result field should now be present.
Outstanding Issues
Pull Request Checklist
Thank you for submitting a contribution to Apache Metron.
Please refer to our Development Guidelines for the complete guide to follow for contributions.
Please refer also to our Build Verification Guidelines for complete smoke testing guides.
In order to streamline the review of the contribution we ask you follow these guidelines and ask you to double check the following:
For all changes:
For code changes:
Have you included steps to reproduce the behavior or problem that is being changed or addressed?
Have you included steps or a guide to how the change may be verified and tested manually?
Have you ensured that the full suite of tests and checks have been executed in the root metron folder via:
Have you written or updated unit tests and or integration tests to verify your changes?
If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?
For documentation related changes:
Have you ensured that format looks appropriate for the output in which it is rendered by building and verifying the site-book? If not, then run the following commands and then verify changes via site-book/target/site/index.html:
Note:
Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
It is also recommended that travis-ci is set up for your personal repository such that your branches are built there before submitting a pull request.