-
Notifications
You must be signed in to change notification settings - Fork 506
METRON-1834: Migrate Elasticsearch from TransportClient to new Java REST API #1242
Conversation
…nsformer Shade plugin transformer.
…ate properties for new client.
|
A recent merge with #1218 has caused a number of integration test failures. Looking into it. |
df2a77d to
1a47ded
Compare
|
Note, I force pushed this branch with revised history after the revert done by @nickwallen on #1218. This is effectively a merge with the latest master minus 1218 changes while we resolve the latent issues with ES and doc ids and guids. |
...-elasticsearch/src/main/java/org/apache/metron/elasticsearch/client/ElasticsearchClient.java
Outdated
Show resolved
Hide resolved
...csearch/src/main/java/org/apache/metron/elasticsearch/config/ElasticsearchClientOptions.java
Show resolved
Hide resolved
...search/src/main/java/org/apache/metron/elasticsearch/dao/ElasticsearchColumnMetadataDao.java
Outdated
Show resolved
Hide resolved
...m/metron-elasticsearch/src/main/java/org/apache/metron/elasticsearch/utils/FieldMapping.java
Show resolved
Hide resolved
...etron-elasticsearch/src/main/java/org/apache/metron/elasticsearch/utils/FieldProperties.java
Show resolved
Hide resolved
...-elasticsearch/src/main/java/org/apache/metron/elasticsearch/writer/ElasticsearchWriter.java
Outdated
Show resolved
Hide resolved
...-elasticsearch/src/main/java/org/apache/metron/elasticsearch/client/ElasticsearchClient.java
Show resolved
Hide resolved
| } | ||
| } | ||
|
|
||
| public String[] getIndices() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there unit tests specifically for this class? I don't see any, but maybe I missed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not unit tests, but the integration tests definitely cover this bit.
...on-elasticsearch/src/main/java/org/apache/metron/elasticsearch/utils/ElasticsearchUtils.java
Outdated
Show resolved
Hide resolved
|
I'm happy to clarify variable and method names here, but I want to be careful about overdoing it on the javadoc. It gets stale, is not testable, and is far more often overlooked than the code itself. I'm all for javadoc on public API's. |
|
Just chiming in here in support of taking some care around javadoc. I tend to agree that it goes stale and it can become misleading. I think the metric (and this is just personal opinion here) should be that any public API should be well javadoc'd and otherwise documented (e.g. the REST API) and we should seriously consider javadoc'ing classes which are used broadly. I tend to prefer a compromise whereby the class itself has documentation around the intent and use of the class in Javadoc and the methods are named well and broken up well. From there, I'm ok if there's less effort put around javadoc of individual methods. Just my $0.02 |
|
Thanks for the feedback/viewpoint @cestella . I think @nickwallen has some good points, in particular about why/when to choose one client or the other. I'm adding some of that detail now, and am linking aggressively to the ES docs so that we're not simply reproducing their documentation. In general, I do NOT think we should be writing javadoc like this - https://docs.oracle.com/javase/8/docs/api/java/util/Arrays.html. It is certainly detailed, but arguably pedantic for our purposes. Oh, and we should also be documenting Stellar functions - which is a feature we already support via the annotations you added some time ago. |
|
I get your high-level take on DRY. Sure, makes sense. And I'm certainly not asking for anything close to the level of detail in the Arrays example. I just so often find myself tracing through code for 15 minutes to find answers to questions that could have been answered with one, well-placed comment. Spent most of the day in that situation actually. That being said, whatever middle ground you can find will likely suffice. |
|
I've started addressing some of the questions you had @nickwallen. Not quite finished yet, but the rest should be soon to follow. I'd like to clean up the way we handle the ElasticsearchClient and its instantiation a bit. |
|
Just to chime in, understanding why something was done is important in maintaining the code down the line. @mmiklavc will not be the only one working on this code forever, and as such there should be some documentation somewhere for areas, no matter where they are, to explain important implementation decisions and concepts. The code is the ultimate source of truth, so I would prefer there. As someone who has to go spelunking through undocumented metron source, almost all of which I did not write or review, I can testify to this being helpful and in fact necessary if going forward we want to avoid regressions. Side effects of methods are important too. |
|
I think we're all on the same page @ottobackwards, @nickwallen, and @cestella . As I mentioned early on, this effort was somewhat unique as it was built off of some POC work from @cestella from a couple months back, and I wanted to enhance that work (e.g. addressing the TODO's, bugs, and configuration issues mentioned) while refactoring/changing as little as possible. That's on me for not having gotten out in front of polishing a few of the common/reusable pieces a bit more, so your comments are well received. Updates soon to follow. |
|
@ottobackwards @nickwallen - just to shed some additional light on the choices made here, take a look at the DAO classes. The new |
…h, and kerberos/security documentation
|
Adapted from #840 (comment) Test ScriptRun full dev and verify you see data populating the alerts UI. Testing Instructions beyond the normal smoke test (i.e. letting data PreliminariesSetup env vars. I like to do something like the following: Deploy the dummy parser
Send dummy data through
Test Case: Update via patch
Test Case: Update via replace
Meta Alerts TestSet Up Base DataWe're going to set up a bit of base data. In this case, we care about Create a Meta AlertAt this point, we'll group these alerts together. In Ambari, go to Metron -> Quick Links -> Swagger UI and go to the "Meta Alert Controller". Validate that the metaalert was created and looks good: Note that both sub alerts are present, the various counts are filled in, a GUID has been given specifically to this meta alert, etc. SearchingSearching from the REST API works mostly as expected. After the above data has been created, use the search endpoint to run this query: The result should look similar to the following, and have the messages we'd expect (one from snort and one from metaalerts): |
|
@nickwallen I updated the ElasticsearchSearchIntegrationTest a bit. I'm now using the new client to setup the bro/snort templates, create the indices, and load the data. |
Testing Notes For X-Pack AND SSL (I know, gettin fancy):
|
|
+1 Big effort and many thanks for this. I ran this up in Full Dev and it worked like a charm. |
|
Thanks @nickwallen. I pushed one more commit to make a note of this Jira in Upgrading.md. I verified it looks correct in the site-book and checked it off on the Jira tasks as well. I guess just a quick look (it's like 5 lines) when you get a moment would be great. The last remaining bit before I merge this will be sharing some results around the performance/regression testing performed by @anandsubbu. |
|
@mmiklavc +1 remains. Addition to upgrade doc looks good. A user may just have to translate from old properties to new. |
I completed a set of performance comparison tests between the REST and Transport clients. Here's the update. Note on the Testing Done
Observations
Based on the above, it is seen that there is no immediate regression in performance b/w the REST and Transport client. +1, great work @mmiklavc |
|
I'm about to merge this and wanted to give some added feedback. @anandsubbu and I made some additional changes to the tuning options, specifically setting |
|
Ship it! |
And the number of Workers too. i.e. In my test, I set this to 10 and saw no failed tuples in the kafkaSpout. My +1 holds. Thanks @mmiklavc ! |
Contributor Comments
https://issues.apache.org/jira/browse/METRON-1834
This task has been a long time coming after having completed the ES upgrade in https://issues.apache.org/jira/browse/METRON-939. Motivation for completing this now is that Elasticsearch will be deprecating use of the TransportClient in v 7.x. This PR migrates the Elasticsearch client from TransportClient to the newer Java REST API.
This builds off and finishes work started by @cestella here - https://github.com/cestella/incubator-metron/tree/es_rest_client. I condensed his branch into 1 flattened commit and built on top of it in order to provide attribution.
I have a number of tasks I'm still working through, but I wanted to get the review process started. I've minimally validated X-Pack auth and will have some follow-up for SSL. Test plans and a breakdown of the changes will be soon to follow. For starters, full dev should continue to work as normal and you should see data flowing into indexes for bro, snort, and yaf. There are some additional changes to how this client will be configured, which I'll be documenting shortly. The new client does not take a Map of settings any longer now that it is leverage Apache HTTP Async Client https://www.elastic.co/guide/en/elasticsearch/client/java-rest/5.6/java-rest-low-usage-dependencies.html under the hood. This meant choosing a set of properties to expose and doing a translation to the builder pattern under the hood. Again, I'll have a write-up of this in the migration guide and update the README's accordingly.
NOTE: This checks off 2 items from this follow-on list #840 (comment)
Per discussion in the Metron Slack channel, I will be updating the Jira ticket with a series of tasks to be completed prior to acceptance, including performance regression testing compared with the old API.
Pull Request Checklist
Thank you for submitting a contribution to Apache Metron.
Please refer to our Development Guidelines for the complete guide to follow for contributions.
Please refer also to our Build Verification Guidelines for complete smoke testing guides.
In order to streamline the review of the contribution we ask you follow these guidelines and ask you to double check the following:
For all changes:
For code changes:
Have you included steps to reproduce the behavior or problem that is being changed or addressed?
Have you included steps or a guide to how the change may be verified and tested manually?
Have you ensured that the full suite of tests and checks have been executed in the root metron folder via:
Have you written or updated unit tests and or integration tests to verify your changes?
If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?
For documentation related changes:
Have you ensured that format looks appropriate for the output in which it is rendered by building and verifying the site-book? If not then run the following commands and the verify changes via
site-book/target/site/index.html:Note:
Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
It is also recommended that travis-ci is set up for your personal repository such that your branches are built there before submitting a pull request.