Skip to content

[bugfix] Run cold schema refresh thread periodically #16873

Merged
cryptoe merged 10 commits intoapache:masterfrom
findingrish:fix-cold-schema
Aug 13, 2024
Merged

[bugfix] Run cold schema refresh thread periodically #16873
cryptoe merged 10 commits intoapache:masterfrom
findingrish:fix-cold-schema

Conversation

@findingrish
Copy link
Copy Markdown
Contributor

@findingrish findingrish commented Aug 9, 2024

Issue

Parent issue: #14989
Fixes a bug in #16676.
Schema from cold segments wasn't getting reflected in the datasource schema.
The problem turned out to be in the refresh thread, it ran only once instead of running periodically.

This PR also adds some metrics to monitor cold schema refresh process.

Testing

  1. Added unit test
  2. Verified in a Druid cluster that,
    • Completely cold datasource is queryable. Refer to cold datasource in the attached screen shot with 1 segment.

Screenshot 2024-08-09 at 5 40 33 PM

Screenshot 2024-08-09 at 5 39 39 PM

  • Datasource schema for partially cold datasource includes columns from cold segments. Refer to partially_cold datasource in the attached screen shot with 2 segments. The hot segment has columns c3, c4, c5 and cold segment has columns c1, c2, c3.

Screenshot 2024-08-09 at 5 40 05 PM

  • Schema for new cold segments created after enabling the CentralizedDatasourceSchema feature is included in datasource schema.
  • Entirely cold datasources which existed before enabling CentralizedDatasourceSchema will not be queryable until at least 1 segment is loaded after enabling the feature. The datasource remains queryable even if the segment is unloaded.

Release Notes

Added followings metrics to monitor cold schema refresh process.

metadatacache/cold/segment/count -> Number of cold segment per datasource. 
metadatacache/cold/refresh/count -> Number of cold segments with cached schema per datasource. 
metadatacache/cold/process/time -> Time taken in milliseconds to process cold segments. 

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

ServiceMetricEvent.Builder metricBuilder =
new ServiceMetricEvent.Builder().setDimension(DruidMetrics.DATASOURCE, dataSourceName);

emitter.emit(metricBuilder.setMetric("metadatacache/cold/segment/count", coldSegments));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Use a prefix for metadatacache/cold/segment/

Comment thread docs/operations/metrics.md Outdated
|`metadatacache/temporaryMetadataQueryResults/count`|Number of segments for which schema was fetched by executing segment metadata query.||Eventually it should be 0.|
|`metadatacache/temporaryPublishedMetadataQueryResults/count`|Number of segments for which schema is cached after back filling in the database.||This value gets reset after each database poll. Eventually it should be 0.|
|`metadatacache/cold/segment/count`|Number of cold segments.|`dataSource`||
|`metadatacache/cold/refresh/count`|Number of cold segments with cached schema.|`dataSource`||
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"cold segment" is not a term that exists anywhere outside of CoordinatorSegmentMetadataCache.
It would be better to use a term that is more easily relatable.

Also, the metadatacache metrics seems all over the place with their prefixes. It is unavoidable for metrics across multiple features to not have a consistent naming scheme. But metrics within the same feature should be consistent. Since this feature is still nascent, I would advise we revisit all the metric names for this feature and categorize them nicely with proper prefixes. (not necessarily in this PR)

cc: @cryptoe

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to use a term that is more easily relatable.

I don't find much references for such segments, do you have any suggestions?

Since this feature is still nascent, I would advise we revisit all the metric names for this feature and categorize them nicely with proper prefixes.

Are you suggesting to use a different prefix than metadatacache? How about cds?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes lets change the wording from cold to deepstorageonly.

@cryptoe cryptoe merged commit f67ff92 into apache:master Aug 13, 2024
@kfaraz kfaraz added this to the 31.0.0 milestone Oct 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants