Skip to content

Add deep storage segment metric#16072

Merged
georgew5656 merged 3 commits intoapache:masterfrom
georgew5656:addColdSegmentMetric
Mar 11, 2024
Merged

Add deep storage segment metric#16072
georgew5656 merged 3 commits intoapache:masterfrom
georgew5656:addColdSegmentMetric

Conversation

@georgew5656
Copy link
Copy Markdown
Contributor

@georgew5656 georgew5656 commented Mar 7, 2024

Addressing some followup comments from the change in the metric made in #16020

Description

If a datasource has a replicaCount=0 rule that applies to some segments that are not covered by a replicaCount>0 rule, those segments will not be reported as unavailable in segment/unavailable/count. Since they are not queryable by native queries (only using the query from deep storage feature), we want a separate metric that lets us check how many segments fit these criteria.

Fixed the bug ...

Renamed the class ...

Added a forbidden-apis entry ...

Add metric segment/deepStorage/count that counts the number of segments that are only queryable through the query from deep storage metric.

Release note

New metric to support query from deep storage feature.

Key changed/added classes in this PR
  • DruidCoordinator
  • Stats

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@georgew5656 georgew5656 requested review from cryptoe and kfaraz March 7, 2024 21:43
@georgew5656 georgew5656 changed the title Add cold segment metric Add deep storage segment metric Mar 7, 2024
@kfaraz
Copy link
Copy Markdown
Contributor

kfaraz commented Mar 8, 2024

If a datasource has a replicaCount=0 rule that applies to some segments that are not covered by a replicaCount>0 rule, those segments will not be reported as available in segment/unavailable/count

After this patch, will such segments be reported as available or unavailable?
If they need to be reported as unavailable, would we need to remove the replicaCount.required() == 0 condition from the computation of unavailable/count?

return datasourceToUnavailableSegments;
}

public Object2IntMap<String> getDatasourceToDeepStorageQueryOnlySegmentCount()
Copy link
Copy Markdown
Contributor

@kfaraz kfaraz Mar 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Some of the code in this method may be commoned out with getDatasourceToUnavailableSegmentCount.

Comment thread docs/operations/metrics.md Outdated
|`segment/unneededEternityTombstone/count`|Number of non-overshadowed eternity tombstones marked as unused.| |Varies|
|`segment/unavailable/count`|Number of unique segments left to load until all used segments are available for queries.|`dataSource`|0|
|`segment/underReplicated/count`|Number of segments, including replicas, left to load until all used segments are available for queries.|`tier`, `dataSource`|0|
|`segment/deepStorageOnly/count`|Number of unique segments that are only available for querying directly from deep storage.|`dataSource`|Varies|
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about the metric name deepStorageQueryOnly or better yet availableDeepStorageOnly as it aligns better with unavailable/count?

@georgew5656
Copy link
Copy Markdown
Contributor Author

If a datasource has a replicaCount=0 rule that applies to some segments that are not covered by a replicaCount>0 rule, those segments will not be reported as available in segment/unavailable/count

After this patch, will such segments be reported as available or unavailable? If they need to be reported as unavailable, would we need to remove the replicaCount.required() == 0 condition from the computation of unavailable/count?

that should say "not be reported as unavailable", my bad.

i think they should not be reported as unavailable b/c that metric is supposed to be 0 during steady state and enabling query from deep storage shouldn't change this

@georgew5656 georgew5656 requested a review from kfaraz March 8, 2024 16:22
@suneet-s
Copy link
Copy Markdown
Contributor

suneet-s commented Mar 8, 2024

What is the purpose of this metric? The doc says it's typical value is varies - is there a good or bad value?

I can imagine tracking the size of cold data as a useful metric, but I'm not sure about the use of seeing how the count of cold segments changes over time will be helpful. Maybe some doc updates about that would help clarify the use of the metric.

@kfaraz
Copy link
Copy Markdown
Contributor

kfaraz commented Mar 9, 2024

i think they should not be reported as unavailable b/c that metric is supposed to be 0 during steady state

Thanks, @georgew5656 , the steady state reasoning makes sense to me.

@suneet-s , one use of this metric would be to find out if the number of segments that are not available for querying using the native engine. For example, if there are some segments that are available for querying from deep storage using MSQ, they would not be counted towards the unavailable/count, but these would still not be available to the native engine. The new metric would thus be reported as non-zero for such cases.

Copy link
Copy Markdown
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
As @suneet-s requested, you may add more info in the docs. But the current definition of the metric also seems sufficient to me.

@georgew5656 georgew5656 merged commit 94d2a28 into apache:master Mar 11, 2024
@adarshsanjeev adarshsanjeev added this to the 30.0.0 milestone May 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants