
The segmentMetadata query is dreadfully slow [O(n*n*log(n))] #2081

@vogievetsky


It seems that the segmentMetadata query does an operation on the broker that is O(n*n*log(n)) in the worst case, where n is the number of segments it is merging over.

The cost is in the segment merging step (look for the code that does a Joda Time interval condense).

It does an n*log(n) operation n times in the worst case. This is hosing some people using Pivot.
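To make the shape of the problem concrete, here is a minimal sketch in Python. It is not Druid's actual code: the function names (`condense`, `merge_pairwise`, `merge_once`) and the use of integer pairs for intervals are assumptions made for illustration. It contrasts condensing after every merge step (the pattern described above) with collecting all intervals first and condensing once.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end) as integers, e.g. epoch millis.
Interval = Tuple[int, int]


def condense(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or abut.

    A comparison sort of k intervals costs up to O(k log k).
    """
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def merge_pairwise(per_segment: List[List[Interval]]) -> List[Interval]:
    """Condense after every merge step.

    With n segments whose intervals never join up, the accumulated list
    grows to n entries and condense() is called n times.
    """
    acc: List[Interval] = []
    for intervals in per_segment:
        acc = condense(acc + intervals)
    return acc


def merge_once(per_segment: List[List[Interval]]) -> List[Interval]:
    """Collect every interval first, then condense a single time."""
    everything = [iv for intervals in per_segment for iv in intervals]
    return condense(everything)
```

Both functions return the same intervals; the second one only pays for a single sort, which is the kind of change that would bring the merge down to O(n*log(n)).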

Also, this is the result of the segmentMetadata query for one of my datasources.

Query:

{
  "queryType": "segmentMetadata",
  "dataSource": "my-datasource",
  "merge": true,
  "analysisTypes": []
}

Result:

[
  {
    "id": "merged",
    "intervals": [
      "2015-12-04T00:00:00.000Z/2015-12-10T00:59:59.454Z",
      "2015-12-10T01:00:00.000Z/2015-12-10T01:59:59.965Z",
      "2015-12-10T02:00:00.000Z/2015-12-10T02:59:59.740Z",
      "2015-12-10T03:00:00.000Z/2015-12-10T03:59:59.960Z",
      "2015-12-10T04:00:00.000Z/2015-12-10T04:59:59.613Z",
      "2015-12-10T05:00:00.000Z/2015-12-10T05:59:59.079Z",
      "2015-12-10T06:00:00.000Z/2015-12-10T06:59:59.565Z",
      "2015-12-10T07:00:00.000Z/2015-12-10T07:59:59.086Z",
      "2015-12-10T08:00:00.000Z/2015-12-10T08:59:59.577Z",
      "2015-12-10T09:00:00.000Z/2015-12-10T09:59:59.106Z",
      "2015-12-10T10:00:00.000Z/2015-12-10T10:59:59.620Z",
      "2015-12-10T11:00:00.000Z/2015-12-10T11:59:59.143Z",
      "2015-12-10T12:00:00.000Z/2015-12-10T12:59:59.645Z",
      "2015-12-10T13:00:00.000Z/2015-12-10T13:59:59.842Z",
      "2015-12-10T14:00:00.000Z/2015-12-10T14:59:59.686Z",
      "2015-12-10T15:00:00.000Z/2015-12-10T15:59:59.202Z",
      "2015-12-10T16:00:00.000Z/2015-12-10T16:59:59.722Z",
      "2015-12-10T17:00:00.000Z/2015-12-10T17:59:59.742Z",
      "2015-12-10T18:00:00.000Z/2015-12-10T18:59:59.739Z",
      "2015-12-10T19:00:00.000Z/2015-12-10T19:59:59.237Z",
      "2015-12-10T20:00:00.000Z/2015-12-10T20:59:59.764Z",
      "2015-12-10T21:00:00.000Z/2015-12-10T21:59:59.285Z",
      "2015-12-10T22:00:00.000Z/2015-12-10T22:59:59.808Z",
      "2015-12-10T23:00:00.000Z/2015-12-10T23:35:38.446Z"
    ],
    "columns": {
       "omitted": "..."
    }
  }
]

What are these intervals?
Why do they start on a round number and end on the interval max data time?
What is the expected use for them?

I want to be able to disable any interval information coming back to me. I cannot imagine what someone might do with this info, so I would argue that disabling it would be a good default.

When intervals are sent back, please consider reporting them as intervalStart/intervalEnd, not intervalStart/dataEnd.
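A small Python sketch of why this matters, using three of the intervals from the result above. The hourly segment granularity is an assumption inferred from that output, and `parse`, `ceil_to_hour` and `condense` are hypothetical helpers, not Druid or Joda Time APIs. With intervalStart/dataEnd there is a small gap before each hour boundary, so nothing condenses; with intervalStart/intervalEnd the intervals abut and collapse into one.

```python
from datetime import datetime, timedelta


def parse(ts):
    """Parse the ISO timestamps used in the query result (UTC, 'Z' suffix)."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def ceil_to_hour(t):
    """Round up to the next hour boundary (assumes hourly segments)."""
    floor = t.replace(minute=0, second=0, microsecond=0)
    return floor if floor == t else floor + timedelta(hours=1)


def condense(intervals):
    """Sort intervals and merge any that overlap or abut."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Three intervals copied from the result above (intervalStart/dataEnd).
reported = [
    ("2015-12-10T01:00:00.000Z", "2015-12-10T01:59:59.965Z"),
    ("2015-12-10T02:00:00.000Z", "2015-12-10T02:59:59.740Z"),
    ("2015-12-10T03:00:00.000Z", "2015-12-10T03:59:59.960Z"),
]

data_end = [(parse(s), parse(e)) for s, e in reported]
bucket_end = [(s, ceil_to_hour(e)) for s, e in data_end]

print(len(condense(data_end)))    # 3: the gaps keep the intervals apart
print(len(condense(bucket_end)))  # 1: 01:00 to 04:00 as a single interval
```

If the intervals condensed like this, the merged list would also stay short, which would help with the merge cost described at the top.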
