Skip to content

Introduce per-HealthCheckRegistration parameters#42646

Merged
halter73 merged 93 commits into
dotnet:mainfrom
francedot:healthcheck-periodicity
Jul 28, 2023
Merged

Introduce per-HealthCheckRegistration parameters#42646
halter73 merged 93 commits into
dotnet:mainfrom
francedot:healthcheck-periodicity

Conversation

@francedot
Copy link
Copy Markdown
Contributor

@francedot francedot commented Jul 10, 2022

Introduce per-HealthCheckRegistration parameters

  • You've read the Contributor Guide and Code of Conduct.
  • You've included unit or integration tests for your change, where applicable.
  • You've included inline docs for your change, where applicable.
  • There's an open issue for the PR that you are making. If you'd like to propose a new feature or change, please open an issue to discuss the change or find an existing issue.

Change the delay and periodicity on a per-Healthcheck basis. Currently all health checks share the same delay and period.

Description

In the current implementation, delay and periodicity of the healthchecks are set at the HealthCheckPublisherOptions configuration level and shared across all HealthCheckRegistration.

This PR introduces per-HealthCheck delay, period (namely Individual Healthchecks) by extending the HealthCheckRegistration and HealthCheckPublisherHostedService`.

If no Delay or Period is specified during the IHealthChecksBuilder.Add(HealthCheckRegistration registration), default values from the HealthCheckPublisherOptions will be used instead.

The HealthCheck API layer will change for:

HealthCheckRegistration.cs

namespace Microsoft.Extensions.Diagnostics.HealthChecks;

public sealed class HealthCheckRegistration
{
     public HealthCheckRegistration(string name, IHealthCheck instance, HealthStatus? failureStatus, IEnumerable<string>? tags, TimeSpan? timeout);
     public HealthCheckRegistration(string name, Func<IServiceProvider, IHealthCheck> factory, HealthStatus? failureStatus, IEnumerable<string>? tags, TimeSpan? timeout);

+    public TimeSpan? Delay { get; set; }
+    public TimeSpan? Period { get; set; }
}

Here is an example on how to use the modified HealthChecksBuilder:

builder.Services.AddHealthChecks()
   .Add(new HealthCheckRegistration(
               name: "SampleHealthCheck1",
               instance: new DelegateHealthCheck(_ => Task.FromResult(HealthCheckResult.Healthy(HealthyMessage))),
               failureStatus: null,
               tags: null,
               timeout: default)
          {
               Delay = TimeSpan.FromSeconds(40),
               Period = TimeSpan.FromSeconds(30)
          });

Fixes #42645

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Jul 10, 2022
@ghost
Copy link
Copy Markdown

ghost commented Jul 10, 2022

Thanks for your PR, @francedot. Someone from the team will get assigned to your PR shortly and we'll get it reviewed.

@francedot francedot marked this pull request as ready for review July 11, 2022 14:13
@francedot francedot requested a review from Pilchie as a code owner July 11, 2022 14:13
@mkArtakMSFT mkArtakMSFT added the old-area-web-frameworks-do-not-use *DEPRECATED* This label is deprecated in favor of the area-mvc and area-minimal labels label Jul 11, 2022
@rafikiassumani-msft rafikiassumani-msft removed their assignment Jul 11, 2022
@rafikiassumani-msft rafikiassumani-msft requested a review from a team July 11, 2022 16:32
@halter73 halter73 requested a review from Tratcher July 11, 2022 18:05
@halter73
Copy link
Copy Markdown
Member

The API added here doesn't exactly match the proposed API in #42645. Let's line these up and get it approved before merging.

@francedot
Copy link
Copy Markdown
Contributor Author

francedot commented Jul 12, 2022

Proposed API change lined up for AddCheck, AddAsyncCheck, AddCheck<T>, AddTypeActivatedCheck<T> in HealthChecksBuilderDelegateExtensions.cs and HealthChecksBuilderAddCheckExtensions.cs

Comment thread src/HealthChecks/HealthChecks/src/HealthCheckPublisherHostedService.cs Outdated
Comment thread src/HealthChecks/HealthChecks/src/PublicAPI.Shipped.txt Outdated
@francedot francedot marked this pull request as draft July 13, 2022 22:41
Comment thread src/HealthChecks/HealthChecks/src/HealthCheckPublisherHostedService.cs Outdated
Comment thread src/HealthChecks/HealthChecks/src/DefaultHealthCheckService.cs Outdated
Copy link
Copy Markdown
Member

@halter73 halter73 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good. My only remaining comments are pretty minor test-related stuff.

Comment thread src/HealthChecks/HealthChecks/src/HealthCheckPublisherHostedService.cs Outdated
Comment thread src/HealthChecks/HealthChecks/test/HealthCheckPublisherHostedServiceTest.cs Outdated
Comment thread src/HealthChecks/HealthChecks/test/HealthCheckPublisherHostedServiceTest.cs Outdated
Comment thread src/HealthChecks/HealthChecks/test/HealthCheckPublisherHostedServiceTest.cs Outdated
name: "CheckDelay9Period5",
instance: new DelegateHealthCheck(_ =>
{
unblockDelayedCheck.TrySetResult(null); // Unblock last delayed check
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean the test takes at least 9 seconds? 😨

Ideally, we'd use a mockable timer for this, but since that'd be a lot of work that wasn't done before either, I think we can settle for making it reasonably fast.

Can we divide all of the delays by a factor of 10? Same for both tests. That should shave 16 seconds of these tests.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@halter73 - I have updated the tests to block on an HC with delay 2s. With delays / 10 I am ending up with non-deterministic tests (at least locally).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How was it failing with all of the delays divided by a factor of 10? I tried it on my machine before suggesting it, and it seemed to consistently pass. I didn't change any of the periods.

Maybe my machine is just faster or something, but I feel like we could reduce all the delays by a factor of 10 and then just delay an extra second before checking the report entries, it should pass almost always. The periods are sufficiently large that we shouldn't get any repeats either.

I would love for these to be really deterministic though. So it'd even pass given a starved scheduler and whatnot. If you're feeling ambitious, you could kick the tires on the new TimeProvider API. It was basically made for exactly this use case, so you wouldn't have to go and make your own custom abstraction. It was just merged 4 days ago, so it's not available quite yet in the aspnetcore repo, but it should arrive in the next SDK update which usually happens Monday. It's already available in the latest installer.

We should also verify exactly what report entries were recorded and which weren't especially in the predicate tests. It's not enough to do a couple contains checks when part of what we need to verify is we're filtering health checks out.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we're tight on time, and using TimeProvider will also involve our first reference to https://www.nuget.org/packages/Microsoft.Bcl.TimeProvider as a package, I filed #49715 as a follow up. These kind of target-specific dependencies have caused issues in the recent past and we're trying to reduce risk.

HealthChecks previously used Timer and was not updated to use TimeProvider like auth or Kestrel, so this is not a regression. @Tratcher

Comment thread src/HealthChecks/HealthChecks/test/HealthCheckPublisherHostedServiceTest.cs Outdated

Assert.True(entries.Count() == 3);
Assert.Equal(
new[] { "CheckDelay2Period18", "CheckDelay7Period11", "CheckDelay9Period5" },
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also shorten the delays by a factor of 10 in this test and check the entries after the finally when we know the timers are stopped.

Comment thread src/HealthChecks/HealthChecks/test/HealthCheckPublisherHostedServiceTest.cs Outdated
Comment thread src/HealthChecks/HealthChecks/test/HealthCheckPublisherHostedServiceTest.cs Outdated
@halter73 halter73 removed the pending-ci-rerun When assigned to a PR indicates that the CI checks should be rerun label Apr 1, 2023
@ghost
Copy link
Copy Markdown

ghost commented Apr 8, 2023

Looks like this PR hasn't been active for some time and the codebase could have been changed in the meantime.
To make sure no breaking changes are introduced, please leave an /azp run comment here to rerun the CI pipeline and confirm success before merging the change.

@ghost ghost added the pending-ci-rerun When assigned to a PR indicates that the CI checks should be rerun label Apr 8, 2023
@danmoseley danmoseley added area-healthchecks Includes: Healthchecks (some bugs also in Extensions repo) and removed old-area-web-frameworks-do-not-use *DEPRECATED* This label is deprecated in favor of the area-mvc and area-minimal labels labels Jul 7, 2023
@francedot
Copy link
Copy Markdown
Contributor Author

/azp run

@azure-pipelines
Copy link
Copy Markdown

Commenter does not have sufficient privileges for PR 42646 in repo dotnet/aspnetcore

@Tratcher
Copy link
Copy Markdown
Member

/azp run

@ghost ghost removed the pending-ci-rerun When assigned to a PR indicates that the CI checks should be rerun label Jul 24, 2023
@azure-pipelines
Copy link
Copy Markdown

No commit pushedDate could be found for PR 42646 in repo dotnet/aspnetcore

Copy link
Copy Markdown
Member

@halter73 halter73 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@francedot Thank you for continuing to push on this. This looks great! I filed a follow up issue to use TimeProvider. Hopefully it's resilient to potentially slow CI machines 🤞 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-healthchecks Includes: Healthchecks (some bugs also in Extensions repo) community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Introduce per-HealthCheck Delay and Period

9 participants