Commit 518b446
committed
pkg/clusterconditions/promql: Reduce MinBetweenMatches to 1s
530a509 (pkg/cvo/availableupdates: Prioritize conditional risks for
largest target version, 2023-03-06, #909) prioritized the order in
which risks were evaluated. But we were still waiting 10 minutes
between different PromQL evaluations while evaluating conditional
update risks. The original 10m requirement is from the enhancement
[1], and was implemented in ca186ed (pkg/clusterconditions/cache:
Add a cache wrapper for client-side throttling, 2021-11-10, #663).
But discussing with Lala, Scott, and Ben, we feel like the addressing
the demonstrated user experience need of low-latency risk evaluation
[2] is worth reducing the throttling to 1s per expression evaluation.
We still have MinForCondition set to an hour, so with this commit, a
cluster-version operator evaluating three risks will move from a
timeline like:
1. 0s, hear about risks that depend on PromQL A, B, and C. Evaluate A for the first time.
2. 10m, evaluate B for the first time (MinBetweenMatches after 1).
3. 20m, evaluate C for the first time (MinBetweenMatches after 2).
4. 1h, evaluate A again (MinForCondition after 1, also well past MinBetweenMatches after 3).
5. 1h10m, evaluate B again (MinForCondition after 2 and MinBetweenMatches after 4).
6. 1h20m, evaluate C again (MinForCondition after 3 and MinBetweenMatches after 5).
7. 2h, evaluate A again (MinForCondition after 4, also well past MinBetweenMatches after 6).
8. 2h10m, evaluate B again (MinForCondition after 5 and MinBetweenMatches after 7).
9. 2h20m, evaluate C again (MinForCondition after 6 and MinBetweenMatches after 8).
to a timeline like:
1. 0s, hear about risks that depend on PromQL A, B, and C. Evaluate A for the first time.
2. 1s, evaluate B for the first time (MinBetweenMatches after 1).
3. 2s, evaluate C for the first time (MinBetweenMatches after 2).
4. 1h, evaluate A again (MinForCondition after 1, also well past MinBetweenMatches after 3).
5. 1h1s, evaluate B again (MinForCondition after 2 and MinBetweenMatches after 4).
6. 1h2s, evaluate C again (MinForCondition after 3 and MinBetweenMatches after 5).
7. 2h, evaluate A again (MinForCondition after 4, also well past MinBetweenMatches after 6).
8. 2h1s, evaluate B again (MinForCondition after 5 and MinBetweenMatches after 7).
9. 2h2s, evaluate C again (MinForCondition after 6 and MinBetweenMatches after 8).
We could deliver faster cache warming while preserving spaced out
refresh evaluation by splitting MinBetweenMatches into a 1s
MinBetweenMatchesInitial and 10m MinBetweenMatchesWhenCached, which
would produce timelines like:
1. 0s, hear about risks that depend on PromQL A, B, and C. Evaluate A for the first time.
2. 1s, evaluate B for the first time (MinBetweenMatchesInitial after 1).
3. 2s, evaluate C for the first time (MinBetweenMatchesInitial after 2).
4. 1h, evaluate A again (MinForCondition after 1, also well past MinBetweenMatchesWhenCached after 3).
5. 1h10m, evaluate B again (MinForCondition after 2 and MinBetweenMatchesWhenCached after 4).
6. 1h20m, evaluate C again (MinForCondition after 3 and MinBetweenMatchesWhenCached after 5).
7. 2h, evaluate A again (MinForCondition after 4, also well past MinBetweenMatchesWhenCached after 6).
8. 2h10m, evaluate B again (MinForCondition after 5 and MinBetweenMatchesWhenCached after 7).
9. 2h20m, evaluate C again (MinForCondition after 6 and MinBetweenMatchesWhenCached after 8).
but again discussing with Lala, Scott, and Ben, the code complexity to
deliver that distinction does not seem to be worth thet protection it
delivers to the PromQL engine. And really, PromQL engines concerned
about load should harden themselves, including via Retry-After [3]
that allow clients to back off gracefully when the service needs that,
instead of relying on clients to guess about the load the service
could handle and back off without insight into actual server capacity.
[1]: https://github.com/openshift/enhancements/blame/158111ce156aac7fa6063a47c00e129c13033aec/enhancements/update/targeted-update-edge-blocking.md#L323-L325
[2]: https://issues.redhat.com/browse/OCPBUGS-19512
[3]: https://www.rfc-editor.org/rfc/rfc9110#name-retry-after1 parent 965bfb2 commit 518b446
1 file changed
Lines changed: 1 addition & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
62 | 62 | | |
63 | 63 | | |
64 | 64 | | |
65 | | - | |
| 65 | + | |
66 | 66 | | |
67 | 67 | | |
68 | 68 | | |
| |||
0 commit comments