Upgrade the fabric client to support newer versions of k8s #13804
churromorales wants to merge 16 commits into apache:master
Conversation
…ty to override peon monitors
|`druid.indexer.runner.javaOptsArray`|`JsonArray`|Java options for the task.|`-Xmx1g`|No|
|`druid.indexer.runner.labels`|`JsonObject`|Additional labels you want to add to the peon pod.|`{}`|No|
|`druid.indexer.runner.annotations`|`JsonObject`|Additional annotations you want to add to the peon pod.|`{}`|No|
|`druid.indexer.runner.peonMonitors`|`JsonArray`|Overrides `druid.monitoring.monitors`. Use this when you have monitors configured on the overlord and do not want the peons to inherit them.|`[]`|No|
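As an illustration, a hypothetical overlord `runtime.properties` fragment using these properties might look like the following. The label and annotation values are made up; `org.apache.druid.java.util.metrics.JvmMonitor` is one of Druid's standard monitors, shown here only as an example override.

```properties
druid.indexer.runner.labels={"team": "data-platform"}
druid.indexer.runner.annotations={"owner": "oncall@example.com"}
# Peons report only JVM metrics instead of inheriting the overlord's
# druid.monitoring.monitors value.
druid.indexer.runner.peonMonitors=["org.apache.druid.java.util.metrics.JvmMonitor"]
```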
Nit: It would be useful to mention this property in the extension-specific docs as well.
```java
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

class LogWatchInputStreamTest
```
Check notice (Code scanning / CodeQL): Unused classes and interfaces
```diff
 .waitUntilCondition(
-    x -> x != null && x.getStatus() != null && x.getStatus().getActive() == null,
+    x -> x != null && x.getStatus() != null && x.getStatus().getActive() == null
+        && (x.getStatus().getFailed() != null || x.getStatus().getSucceeded() != null),
```
This was a tricky one. For some background: the overlord launches a task, waits for it to complete, and then returns the status. In k8s < 1.25 that just meant: once the pod is not active, go grab the status.
The contract has changed a bit, because 1.25 introduced finalizers:
kubernetes/kubernetes#110948
We noticed that when we ran tasks on a 1.25 k8s Druid cluster, they would complete fine but be marked as failures.
We printed the job status and saw that the job was not active, but neither success nor failure was set; instead there was this field:
`uncountedTerminatedPods=UncountedTerminatedPods(failed=[], succeeded=[e916cbf9-467a-45f3-86a7-3767145d6384], additionalProperties={})`
which, from the docs:
> UncountedTerminatedPods holds the UIDs of Pods that have terminated but the job controller hasn't yet accounted for in the status counters. The job controller creates pods with a finalizer. When a pod terminates (succeeded or failed), the controller does three steps to account for it in the job status: (1) Add the pod UID to the arrays in this field. (2) Remove the pod finalizer. (3) Remove the pod UID from the arrays while increasing the corresponding counter. This field is beta-level. The job controller only makes use of this field when the feature gate JobTrackingWithFinalizers is enabled (enabled by default). Old jobs might not be tracked using this field, in which case the field remains null.

So now the job goes from a state where it is not active, to having uncountedTerminatedPods, to finally having a status with success or failure set. I will push up a one-line fix to make this work, but for those of you working with the 1.25 version of k8s, I'm sure you will be affected as well.
Basically, add another check to wait on. Right now we wait for this:

```java
// block until
job.getStatus() != null && job.getStatus().getActive() == null
// then return
return job.getStatus().getSucceeded() != null
```

So the change has to become:

```java
// block until
job.getStatus() != null && job.getStatus().getActive() == null
    && (job.getStatus().getFailed() != null || job.getStatus().getSucceeded() != null)
// then return
return job.getStatus().getSucceeded() != null
```

This should keep things backwards compatible and working in all versions of k8s.
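The behavior of the old versus new condition can be sketched without the Kubernetes client at all. The `JobStatus` class below is a hypothetical, minimal stand-in for the generated Job status model (not the real fabric8 type); it exists only to show how the combined condition behaves across the three phases a job passes through on k8s 1.25+.

```java
// Hypothetical minimal stand-in for the Kubernetes JobStatus model,
// used only to illustrate the completion checks discussed above.
class JobStatus {
    Integer active;     // null once no pods are running
    Integer succeeded;  // set only after finalizers are removed and counters updated
    Integer failed;

    JobStatus(Integer active, Integer succeeded, Integer failed) {
        this.active = active;
        this.succeeded = succeeded;
        this.failed = failed;
    }
}

public class JobCompletionCheck {
    // Old condition: only waits for the job to stop being active.
    static boolean oldCondition(JobStatus s) {
        return s != null && s.active == null;
    }

    // New condition: also waits for a terminal counter, so the intermediate
    // "uncountedTerminatedPods" phase on k8s 1.25+ no longer looks terminal.
    static boolean newCondition(JobStatus s) {
        return s != null && s.active == null
            && (s.failed != null || s.succeeded != null);
    }

    public static void main(String[] args) {
        // On 1.25+, after the pod exits the job is briefly inactive with
        // neither succeeded nor failed set (pods still "uncounted").
        JobStatus uncounted = new JobStatus(null, null, null);
        JobStatus done = new JobStatus(null, 1, null);

        System.out.println(oldCondition(uncounted)); // true: fires too early
        System.out.println(newCondition(uncounted)); // false: keeps waiting
        System.out.println(newCondition(done));      // true: terminal state
    }
}
```

Since the old condition also returns true for the terminal state, waiting on the new condition changes nothing on pre-1.25 clusters, which is why the fix stays backwards compatible.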
So for the older k8s versions, we can expect that when `job.getStatus().getActive()` is null, one of the success or failure fields must be set. Question: do we need to check the `getActive()` result at all? Can we just block until the job has either succeeded or failed?
That could work. Want me to change it? The version above definitely works; I don't have access to many different k8s environments, and someone in the community helped me test it on 1.25. I could definitely change this, but I unfortunately have no way of testing it out.
Please make this change, we can test it out on a v1.25 cluster
@nlippis I believe you can test by taking this commit, making that small change, and applying it onto the 25 branch. If it works for you, I'll resubmit this with the change.
@abhishekagarwal87 what do you think? This should bring the client up to date as well as fix things for those Druid users running on k8s 1.25+ that were having issues.
```java
{
  @Test
  void makingCodeCoverageHappy()
```
Hmm. This is a bit of a bummer. Is there no way to avoid this?
So I had tests for a class, then I upgraded a client library and had to change some things, but the functions were all tested before. I would expect that if the tests pass I'd be good, since the behavior was already tested. But it turns out that if you have to make any changes to adapt to the new library APIs, code coverage complains. It is unfortunate here, because I didn't change any behavior.
aren't there any tests covering that path?
I added a getter to a class, and the code coverage tool complained that I added new code without a test. So I added a test for the getter. That was new code, so I guess it's okay to add a test if that's what Druid wants, although I feel tests like this are not really that useful and make the whole suite slower in general.
```java
import static org.junit.jupiter.api.Assertions.assertEquals;

// must have a kind / minikube cluster installed and the image pushed to your repository
@Disabled
```
We are using K3S now. Do you still see the same issues?
K3S? This was for anyone who has an existing k8s cluster; they can just run this test and point it there. The k8s integration tests are definitely not flexible enough for me to test mm-less without rewriting a lot of them.
how was this running before?
I had a kind cluster running locally; it was in the original patch. For anyone who wants to test with a local k8s cluster, I left this test in the codebase. If you want me to remove it, I'll be happy to get rid of it. I find it useful when I make changes, since I can't integration test this with the way the current druid-it tests are set up.
ektravel
left a comment
Left one suggestion for the doc.
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
@churromorales - can you address the build failures?
techdocsmith
left a comment
Small stylistic change for doc.
jwitko
left a comment
Very much needed changes. Thank you for making them.
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
…ests caused by a merge conflict
@abhishekagarwal87 you good with this now? I explained about the test coverage in an earlier comment.
Do you also need to make this change?

```diff
--- a/extensions-contrib/kubernetes-overlord-extensions/src/main/java/org/apache/druid/k8s/overlord/KubernetesTaskRunner.java
+++ b/extensions-contrib/kubernetes-overlord-extensions/src/main/java/org/apache/druid/k8s/overlord/KubernetesTaskRunner.java
@@ -180,7 +180,7 @@ public class KubernetesTaskRunner implements TaskLogStreamer, TaskRunner
       completedPhase = monitorJob(peonPod, k8sTaskId);
     } else {
       Job job = existingJob.get();
-      if (job.getStatus().getActive() == null) {
+      if (job.getStatus() != null && job.getStatus().getActive() == null && (job.getStatus().getFailed() != null || job.getStatus().getSucceeded() != null)) {
         if (job.getStatus().getSucceeded() != null) {
           completedPhase = new JobResponse(job, PeonPhase.SUCCEEDED);
         } else {
```

I believe that is the crux of the change. I explain why here: #13804 (comment) Or let me know if I misunderstood you.
@abhishekagarwal87 Are there any additional concerns with this PR apart from the code coverage failure? From the discussion it looks like there isn't really a way to make the coverage tool pass without adding non-meaningful tests.
> Or let me know if I misunderstood you.

Sorry I wasn't clear. There are two spots in the code that have that same success/failure logic; I only see one in this PR, after the job is monitored. But I also see one before the job is monitored, just after it is spawned in KubernetesTaskRunner. If the logic is not necessary at that point, please feel free to ignore me.
@a2l007 - Sorry if I am missing something, but if CC is failing anyway, there is no need to add
It is a good question. I think the way I coded it was not very clear, I apologize for that. Let's look at this case: startup overlord … The conditional block you are referring to handles that case. It first does a check to see if the task is not active, then checks the state. If it is still running it does the … Does that help make sense of how things are working? These are good questions, and I want to make sure the code is doing the correct thing.
@churromorales - Thank you for resolving the conflicts. There are build failures. Can you take a look?
@abhishekagarwal87 I pushed up the fix; there were some conflicts due to something else getting merged before this.
@abhishekagarwal87 The failing tests are from the other PR that was merged. I don't have cycles to look into and fix whatever is wrong, and we honestly forked our Druid, so whether this goes upstream is not a big deal. I think the issue is that the other PR is using all the old libraries, and it can't deserialize the pod template anymore. So if you want to update this feature to work with newer k8s versions, that will have to be sorted out at some point, or you can just close off this PR and worry about upgrades when the time comes. As it stands, mm-less won't work in newer versions of k8s, so I'll leave it up to the maintainers to decide what they want to do.
I understand @churromorales. Would you be open to giving write access on your branch to someone who wants to take this PR forward?
Closing since these changes were added to #14028
This PR contains 2 items.
We found that when we used the `TaskCountStatsMonitor` in the overlord config, the peon tasks would not start, because they inherit the monitors from the parent process, which used to be the Middle Manager (and would never have had that monitor originally). So now you can override this value with the following config: `druid.indexer.runner.peonMonitors`