Skip to content

Add paramter to loadstatus API to compute underdeplication against cluster view#11056

Merged
zachjsh merged 9 commits intoapache:masterfrom
zachjsh:load-status-use-cluster-view
Apr 5, 2021
Merged

Add paramter to loadstatus API to compute underdeplication against cluster view#11056
zachjsh merged 9 commits intoapache:masterfrom
zachjsh:load-status-use-cluster-view

Conversation

@zachjsh
Copy link
Copy Markdown
Contributor

@zachjsh zachjsh commented Mar 31, 2021

Description

This change adds a query parameter computeUsingClusterView to loadstatus apis
that if specified have the coordinator compute undereplication for segments based
on the number of services available within cluster that the segment can be replicated
on, instead of the configured replication count configured in load rule. This query
parameter is only available when also requesting full loadstatus. A default load rule is
created in all clusters that specified that all segments should be replicated 2 times. As
replicas are forced to be on separate nodes in the cluster, this causes the loadstatus api
to report that there are under-replicated segments when there is only 1 data server in the
cluster. In this case, calling loadstatus api without this new query parameter will always
result in a response indicating under-replication of segments

  • loadstatus api returns 503 if the cluster object isnt yet initialized. This should only happen
    during the period of time before the coordinator run loop has executed at least once, as this
    object is set during every instantiation of the coordinator run loop.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

…uster view

This change adds a query parameter `computeUsingClusterView` to loadstatus apis
that if specified have the coordinator compute undereplication for segments based
on the number of services available within cluster that the segment can be replicated
on, instead of the configured replication count configured in load rule. A default
load rule is created in all clusters that specified that all segments should be
replicated 2 times. As replicas are forced to be on separate nodes in the cluster,
this causes the loadstatus api to report that there are under-replicated segments
when there is only 1 data server in the cluster. In this case, calling loadstatus
api without this new query parameter will always result in a response indicating
under-replication of segments
@zachjsh zachjsh requested review from clintropolis and jon-wei March 31, 2021 18:47
}

@Override
public void updateUnderReplicatedWithClusterView(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add an implementation for BroadcastDistributionRule as well? I think the implementation there could just call updateUnderReplicated and ignore the cluster parameter, since it's already looking at the available servers in the cluster.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

break;
}

if (computeUsingClusterView && (rule instanceof LoadRule)) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The (rule instanceof LoadRule) should be removed, since BroadcastDistributionRule can also load segments (it also returns true for canLoadSegments(), which is already checked before this)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed accordingly. Before though, for broadcast rule, it was falling through to else branch and effectively doing same thing. Thought it was weird to add implementation for Broadcast rule that didn't use the parameter, but I guess not a big deal.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks, I thought it would be cleaner to have the new method be supported by all segment-loading rules, vs. the behavior specific to just LoadRule

{
DataSourcesResource dataSourcesResource = new DataSourcesResource(inventoryView, segmentsMetadataManager, null, null, null, null);
Response response = dataSourcesResource.getDatasourceLoadstatus("datasource1", null, null, null, null);
Response response = dataSourcesResource.getDatasourceLoadstatus("datasource1", null, null, null, null, null);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add tests that use the new parameter?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added test.

@zachjsh zachjsh requested a review from jon-wei April 1, 2021 21:36
@zachjsh
Copy link
Copy Markdown
Contributor Author

zachjsh commented Apr 1, 2021

@techdocsmith can you take a look at the change to docs.

Copy link
Copy Markdown
Contributor

@techdocsmith techdocsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That last sentence was complicated. I am not sure if I parsed it out correctly, but mabye my suggestions can help lead to some clarification.

Comment thread docs/operations/api-reference.md Outdated
Comment thread docs/operations/api-reference.md Outdated
Comment thread docs/operations/api-reference.md Outdated
Comment thread docs/operations/api-reference.md Outdated
Comment thread docs/operations/api-reference.md Outdated
Comment thread docs/operations/api-reference.md Outdated
Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@techdocsmith techdocsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some edits to the new bit and found some typos in the previous bit. I am still not 100% sure about this bit:

Or the number of replicas for the segment in each tier where it is configured to be replicated equals the available nodes of a service type that are currently allowed to load the segment in the tier.

I feel like it needs to be simplified or broken down somehow.

Comment thread docs/operations/api-reference.md Outdated
Comment thread docs/operations/api-reference.md Outdated
Comment thread docs/operations/api-reference.md Outdated
Comment thread docs/operations/api-reference.md Outdated
Comment thread docs/operations/api-reference.md Outdated
Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
@zachjsh zachjsh closed this Apr 2, 2021
@zachjsh zachjsh reopened this Apr 2, 2021
@zachjsh zachjsh merged commit 8cf1e83 into apache:master Apr 5, 2021
@zachjsh zachjsh deleted the load-status-use-cluster-view branch April 5, 2021 04:02
@clintropolis clintropolis added this to the 0.22.0 milestone Aug 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants