Skip to content
This repository was archived by the owner on Aug 20, 2025. It is now read-only.

Conversation

@nickwallen
Copy link
Contributor

@nickwallen nickwallen commented Nov 16, 2018

With this change, when documents are written to Elasticsearch the document ID is no longer set as the Metron GUID, but instead left unset so that Elasticsearch can auto-generate it. Doing this improves write performance into Elasticsearch.

This will also be the case for any Lucene based Indexer, including Solr. This work only covers Elasticsearch, but the same should be done for Solr as part of a separate effort.

This change is dependent on the following open pull requests.

Changes

  • The ElasticsearchRetrieveLatestDao was updated since the GUID is no longer the document ID. This instead does a terms query on the GUID field instead of an ID query.

  • The Document class now contains an optional documentID field. If the Document is retrieved from one of the DAOs this field will be populated. When creating a new document, this field will be empty.

  • Many of the integrations tests had to be updated because the GUID and document ID are now different.

  • The Alert UI was updated so that it visually looks the same. By default, the Metron GUID is still shown as one of the first columns in the table.

    • The table is actually showing the document's GUID field instead of the document ID as it was before. The ID field remains, which contains the document ID generated by Elasticsearch. The user can choose to add this to the table, if they like.
  • This change is backwards compatible. The Alerts UI should continue to work no matter if some of the underlying indices were written with the Metron GUID as the document ID and some are written with the auto-generated document ID.

Testing

  1. Spin-up Full Dev.

  2. Open up the Alerts UI and perform the following basic actions.

    • Search for alerts
    • Escalate an alert
    • Comment on an alert
    • Delete a comment from an alert
    • Create a meta-alert
    • Escalate a meta-alert
  3. Click on the configure wheel and add the 'id' field to the table view. This will now display both the GUID and document ID in the table. They of course will be different.

  4. Click on the 'guid' field in any row to filter the search results by the guid.
    screen shot 2018-11-14 at 2 44 21 pm

  5. Click on the 'id' field to filter the search results by the document ID.
    screen shot 2018-11-14 at 2 44 08 pm

  6. Group by some fields to drill into the data. In the tree view, click on the 'guid' column and ensure the data sorts correctly. Do the same for the 'id' column that was added.
    screen shot 2018-11-14 at 2 46 43 pm

  7. Perform the following actions on a meta-alert.

    • Filter by click. Click on the title of the meta-alert to view only that meta-alert.
    • Rename the meta-alert.
    • Expand/collapse the meta-alert.
    screen shot 2018-11-19 at 12 08 48 pm screen shot 2018-11-19 at 12 09 09 pm

Pull Request Checklist

  • Is there a JIRA ticket associated with this PR? If not one needs to be created at Metron Jira.
  • Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
  • Has your PR been rebased against the latest commit within the target branch (typically master)?
  • Have you included steps to reproduce the behavior or problem that is being changed or addressed?
  • Have you included steps or a guide to how the change may be verified and tested manually?
  • Have you ensured that the full suite of tests and checks have been executed in the root metron folder via:
  • Have you written or updated unit tests and or integration tests to verify your changes?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?

cestella and others added 30 commits October 8, 2018 18:06
Document toPatch = retrieveLatestDao.getLatest(guid, sensorType);
if(toPatch != null && toPatch.getDocument() != null) {
originalSource = toPatch.getDocument();
documentID = toPatch.getDocumentID().orElse(null);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to set the doc ID when patching an already indexed document.


} catch (Throwable e) {
container = new DocumentContainer(e);
LOG.error("Unable to find latest document; indexDao={}, error={}",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had no logging to indicate when a particular IndexDao fails. I ran into this and it took some effort to trace since we lacked logging.


} catch (Throwable e) {
container = new DocumentContainer(e);
LOG.error("Unable to remove comment from alert; indexDao={}, error={}",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had no logging to indicate when a particular IndexDao fails. I ran into this and it took some effort to trace since we lacked logging.


} catch (Throwable e) {
container = new DocumentContainer(e);
LOG.error("Unable to add comment to alert; indexDao={}, error={}",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had no logging to indicate when a particular IndexDao fails. I ran into this and it took some effort to trace since we lacked logging.

query = idsQuery.addIds(guid);
// should match any of the guids
// the 'guid' field must be of type 'keyword' or this term query will not match
BoolQueryBuilder guidQuery = boolQuery().must(termsQuery(Constants.GUID, guids));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can no longer use an id query, since the doc ID is no longer the same as the GUID.

private Document toDocument(SearchResult result, Long timestamp) {
Document document = Document.fromJSON(result.getSource());
document.setTimestamp(timestamp);
document.setDocumentID(result.getId());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to set the document ID when retrieving the document.


addFilter(field: string, value: string) {
field = (field === 'id') ? 'guid' : field;
field = (field === 'id') ? '_id' : field;
Copy link
Contributor Author

@nickwallen nickwallen Dec 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allows a user to filter on the document ID, if they choose to add it to their view/table.

onSort(sortEvent: SortEvent) {
let sortOrder = (sortEvent.sortOrder === Sort.ASC ? 'asc' : 'desc');
let sortBy = sortEvent.sortBy === 'id' ? 'guid' : sortEvent.sortBy;
let sortBy = sortEvent.sortBy === 'id' ? '_uid' : sortEvent.sortBy;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allows a user to sort by the document ID, if they choose to add it to their view/table.

</td>
<td [attr.colspan]="alertsColumnsToDisplay.length - 1">
<a (click)="addFilter('guid', alert.id)" [attr.title]="alert.id" style="color:#689AA9"> {{ alert.source['name'] ? alert.source['name'] : alert.id | centerEllipses:20:cell }}</a>
<a (click)="addFilter('guid', alert.source['guid'])" [attr.title]="alert.source['guid']" style="color:#689AA9"> {{ alert.source['name'] ? alert.source['name'] : alert.source['guid'] | centerEllipses:20:cell }}</a>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Displays the GUID, if a user has not renamed a meta alert.

@justinleet
Copy link
Contributor

This is probably relatively minor, but we seem to map between the underlying "_id" and "id" on the UI. However, in the search box, I can only do "_id:" instead of "id:blah", probably because we pass that stuff directly.

Making that work in the general case (given that there could be other modifiers), is probably not really worth it, given that it's an id field.

However, are we okay with the user having to know that to find a particular id in the alerts UI they need to do "_id"? It's particularly confusing, because the results of a findOne don't include "id" at all (because it's an ES added field, not a Metron document field), and the results of search include "id", e.g.

"results": [
    {
      "id": "AWejNX8L98Y5WuImUxxU",
      "source": {
          ...
      }
    }
]

I'm a bit confused why the search endpoint returns with "id', when directly hitting ES returns "_id". I don't see anything here that would obviously cause it, but maybe I'm not looking in the right place.
For example

curl -X GET "localhost:9200/bro*/_search?pretty=1" -H 'Content-Type: application/json' -d'
{
  "query": {
    "terms": {
      "_id": [ "AWejNX8L98Y5WuImUxxU" ]
    }
  }
}
'

Returns

...
      "hits" : [
      {
        ...
        "_id" : "AWejNX8L98Y5WuImUxxU",
        ...
      }

@justinleet
Copy link
Contributor

justinleet commented Dec 12, 2018

I assume it's unrelated to this PR, but I did want to point out that renaming the metaalert doesn't actually take effect in the Alerts UI pane until something else happens (like a new search, or an escalation). Basically, it doesn't do the optimistic updating we do elsewhere, but instead only comes back on search

@nickwallen
Copy link
Contributor Author

This is probably relatively minor, but we seem to map between the underlying "_id" and "id" on the UI. However, in the search box, I can only do "_id:" instead of "id:blah", probably because we pass that stuff directly.

The UI code has a few of places where it does special treatment for a field called "id" that I had to remap.

One related thing I did ensure we handle is that if you add the "id" field to the table, then click on it, it will correctly populate the search bar so that it filters for that document ID. That is shown in the screenshot of step 5 in the testing instructions. From that screenshot, you see that it actually adds _id to the search bar. And that just confirms what you are seeing.

Logically, it does make sense to me that a user should be able to type "id:XYZ" into the search bar and it filters by document ID, since the column is labelled "id". Either that or we should get rid of this special thing called "id" and represent it consistently in the UI as "_id". Either way it really should be one way or the other.

I can look into this and see how much effort it would take.

However, are we okay with the user having to know that to find a particular id in the alerts UI they need to do "_id"?

What is your opinion on this? Do you think it is worth the effort?

@nickwallen
Copy link
Contributor Author

Also note that this inconsistency exists in master too.

(1) The table has a column called "id" that when you click on it, it populates the search bar with "guid:fff6c243-dedc-4b46-b005-9a187dbb9d24".

screen shot 2018-12-12 at 1 58 08 pm

(2) If I manually type in "id:fff6c243-dedc-4b46-b005-9a187dbb9d24", the UI returns no results because it doesn't map ID correctly to the guid.

screen shot 2018-12-12 at 2 00 48 pm

That being said, the behavior in this PR is rather similar to master in this way. Or is that just me trying to make a case to be lazy and not address this issue? :)

@mmiklavc
Copy link
Contributor

Was the intent for the id mapping to make our client searching doc index tech agnostic? It seems like a reasonable thing to want our client-facing querying interface to be shielded from changes to our underlying dependencies.

@mmiklavc
Copy link
Contributor

@nickwallen are you able to search for "_id" or is that restricted by the UI due to special chars? I think we might want to encapsulate a formal concept of an "id" field that maps to whatever the underlying doc store is - solr/es/other, and it sounds like we already do this to some extent. But I might also expect that users could search on fields specific to the doc store that we don't explicitly handle, e.g. id maps to _id for ES, but maybe you could also search for "_id" explicitly, if desired. Again, this may cause problems with syntax parsing in the UI so I'm just thinking out loud.

@nickwallen
Copy link
Contributor Author

@mmiklavc: are you able to search for "_id" or is that restricted by the UI due to special chars?

Yes, see the screenshot in step 5 of the test instructions.

@justinleet
Copy link
Contributor

If it exists in master, I'm fine leaving it working that way here (maybe document it somewhere?). I'd like to see a follow-on Jira to clean this up though. It seems like we wanted some special casing, but it's not clear for what and it's obviously not a clean implementation

@justinleet
Copy link
Contributor

+1, assuming @mmiklavc is fine with the "id". Ran it up and it worked really well. Thanks for the contribution!

@mmiklavc
Copy link
Contributor

+1 by inspection.

On the subject of the id vs _id vs _uid vs guid - I do think we need to clean this up at some point. _uid, which is _type/doctype + _id, is used for sorting bc you can't sort on _id in ES. In 6.x _uid is deprecated https://www.elastic.co/guide/en/elasticsearch/reference/6.0/mapping-uid-field.html. We've touched on this before, but we should consider a translation layer that manages variations between our optional downstream indexing engines. I'm not a big fan of this implementation bleed into the UI, but I recognize that this is bigger than just the id field.

@asfgit asfgit closed this in eb2aee1 Dec 13, 2018
JonZeolla pushed a commit to JonZeolla/metron that referenced this pull request Jan 24, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants