RavenDB index health monitoring #2478
I am hesitant about waiting for non-stale indexes at start, or throwing, until we are sure that no customers have stale indexes and a working ServiceControl. I fear that we would blow up perfectly fine instances of ServiceControl.
A perfectly fine instance wouldn't need much work to get non-stale indexes at start: at most a few seconds if the instance was under load when it exited gracefully. This guards against the case where the SC instance exits ungracefully (kill, crash, etc.) and has index issues that make RavenDB rebuild indexes at start: startup is then delayed until the indexes are non-stale.
@ramonsmits one of the arguments for not delaying the start was to avoid hurting clients whose indexes are already failing or significantly lagging behind. That said, these clients will first face the delay when upgrading to the new minor of ServiceControl. I would assume that this is a moment when they have actually set some time aside to take a closer look at this. Secondly, if we put the bare minimum alerting on lagging indexes in place, they should be covered on an ongoing basis. Finally, I would add a setting flag that skips the checks when set, to make sure we can unblock any client that runs into problems in production.
@tmasternak Then let's only keep the reporter?
As for me, I like the idea with a flag(-s)
or vice versa; it doesn't matter at all, as long as we add the possibility.
@ramonsmits and remove the startup delay?
… and a custom check that reports index lag issues.
…on-stale at startup.
@tmasternak @SzymonPobiega I've removed the wait for indexes to become non-stale at start. It still throws on index errors.
mikeminutillo
left a comment
Overall, I love this! Great job.
@mikeminutillo @danielmarbach thanks for your great feedback. I've applied some refactorings. The index error check at start is removed; only reporting remains. I'll cherry-pick the index error check on startup into a separate PR.
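For readers following the thread, here is a rough sketch of the "report only" approach the discussion settled on: a periodic check that classifies index lag and reports it, without delaying startup or throwing. It is not the code in this PR. It is written in Python for brevity, and the `IndexStats` type, the `lag` measure, the thresholds, and the index names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class CheckResult(Enum):
    PASS = "pass"
    WARN = "warn"
    FAIL = "fail"


@dataclass
class IndexStats:
    name: str  # hypothetical index name
    lag: int   # hypothetical measure: writes the index still has to process


# Hypothetical thresholds; the thread does not state the values used in the PR.
WARN_LAG = 10_000
FAIL_LAG = 100_000


def check_index_lag(indexes, warn_lag=WARN_LAG, fail_lag=FAIL_LAG):
    """Classify index lag and collect messages; never blocks and never raises."""
    worst = CheckResult.PASS
    messages = []
    for index in indexes:
        if index.lag > fail_lag:
            worst = CheckResult.FAIL
            messages.append(f"{index.name} is lagging by {index.lag} (above {fail_lag})")
        elif index.lag > warn_lag:
            # Only upgrade to WARN if nothing worse was seen already.
            if worst is CheckResult.PASS:
                worst = CheckResult.WARN
            messages.append(f"{index.name} is lagging by {index.lag} (above {warn_lag})")
    return worst, messages


if __name__ == "__main__":
    result, messages = check_index_lag(
        [IndexStats("ExampleIndexA", 5), IndexStats("ExampleIndexB", 20_000)]
    )
    print(result.value, messages)
```

Because the check only returns a status and messages, a lagging instance keeps running and the problem surfaces through monitoring, which matches the concern raised earlier about not blowing up working instances.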