USHIFT-1085: feat: initial implementation of storage migration #1956
eggfoobar wants to merge 8 commits into openshift:main
Conversation
@eggfoobar: This pull request references USHIFT-1085 which is a valid jira issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: eggfoobar. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Force-pushed from b556240 to c23955d
What do you think about changing the order: first merging "partial start" (etcd+KAS) in the places where migration would happen, and then adding the code from this PR so the migrator could already be executed?
I'm wondering if, at this point, we shouldn't just call off the whole thing. We care for "all or nothing".
Do you foresee another component that would handle failure in results?
The failure state is something we need to think about in terms of our resources and the customers. Typically, if migration fails it means there's a compatibility error with the resource versions; we should catch that in our testing for our CRs, but if the user has applied different CRDs that fail migration, they should be notified to fix them manually. I was thinking of this function call returning both a migrator failure (i.e. can't reach the server, can't list resources) and a resource migration error, which would be recoverable.
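For illustration, a rough sketch of what that split could look like; the names here (ErrMigratorUnavailable, ResourceMigrationError) are hypothetical, not from this PR:

```go
package migrate

import (
	"errors"
	"fmt"
)

// ErrMigratorUnavailable is a hypothetical sentinel for migrator-level
// failures (e.g. the API server can't be reached or resources can't be
// listed); these are not recoverable by the user.
var ErrMigratorUnavailable = errors.New("storage migrator unavailable")

// ResourceMigrationError is a hypothetical per-resource failure, e.g. a CRD
// whose stored version can no longer be converted; the user can fix the
// offending resource manually.
type ResourceMigrationError struct {
	GroupResource string
	Err           error
}

func (e *ResourceMigrationError) Error() string {
	return fmt.Sprintf("migrating %s: %v", e.GroupResource, e.Err)
}

// recoverable reports whether an error is a per-resource data error (keep
// going and notify the user) rather than a migrator failure (abort).
func recoverable(err error) bool {
	var re *ResourceMigrationError
	return errors.As(err, &re)
}
```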
Data errors are recoverable, and it may make sense to collect all of them unless it makes the logic significantly more complicated.
Not sure if we need that. Is there a way to log that some resource was migrated from version A to B? We could log that in "real time", instead of packing all results into a collection and then iterating over it to log the items.
Sure thing, we can do that. The primary reason it's here is just to give us raw access to write the information in any format we want if we need to store the results somewhere easily.
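A minimal sketch of the "log in real time" version, assuming a hypothetical migrateOne helper standing in for the PR's actual per-resource migration call:

```go
package migrate

import (
	"context"

	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/klog/v2"
)

// migrateAll logs each resource as it is migrated instead of collecting all
// results and iterating over them afterwards.
func migrateAll(ctx context.Context, resources []schema.GroupVersionResource,
	migrateOne func(context.Context, schema.GroupVersionResource) error) {
	for _, gvr := range resources {
		if err := migrateOne(ctx, gvr); err != nil {
			// Data errors are recoverable: report them and keep going.
			klog.ErrorS(err, "storage migration failed", "resource", gvr.String())
			continue
		}
		klog.InfoS("storage migration succeeded", "resource", gvr.String())
	}
}
```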
Is the pre-run phase logging to the standard MicroShift log, or do we see these errors in the greenboot health check log? It would be nice if the greenboot log could at least report "there was a data migration error, check the MicroShift logs for details".
It's currently just using klog. I'm not sure how we send logs to greenboot, but I love the idea; we should make it do that.
The klog output is going to go to stdout/stderr. I guess that's going to the systemd unit where pre-run is running, rather than a greenboot-specific log. Can we put these messages (and any others) somewhere for our health-check script to collect and report?
Sure, I added some files here to use in greenboot, https://github.com/openshift/microshift/pull/1956/files#diff-e78e17834c76663b756393d04ebe7aceb8ea4059d44b7d2ae3ee4e9999aead6eR23-R24
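For context, a rough sketch of the kind of helper that could surface this to greenboot; the file name and message format below are assumptions for illustration, the real files are the ones linked in the diff above:

```go
package prerun

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeMigrationStatus writes a small status file that a greenboot
// health-check script can pick up and report.
func writeMigrationStatus(dir string, failedResources []string) error {
	msg := "storage migration: success"
	if len(failedResources) > 0 {
		msg = fmt.Sprintf("storage migration: data migration error for %v, check the MicroShift logs for details",
			failedResources)
	}
	return os.WriteFile(filepath.Join(dir, "storage-migration-status"), []byte(msg+"\n"), 0o644)
}
```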
This seems like a good opportunity to use a channel to communicate between the goroutines, instead of managing a lock.
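For example, something along these lines, with a hypothetical result type: the workers send onto a buffered channel and a single loop collects, so no shared slice or mutex is needed.

```go
package migrate

import "sync"

// result is a hypothetical per-resource outcome.
type result struct {
	resource string
	err      error
}

// collect fans the migrations out to goroutines and gathers their outcomes
// over a channel instead of appending to a shared slice under a lock.
func collect(resources []string, migrateOne func(string) error) []result {
	results := make(chan result, len(resources))

	var wg sync.WaitGroup
	for _, r := range resources {
		wg.Add(1)
		go func(r string) {
			defer wg.Done()
			results <- result{resource: r, err: migrateOne(r)}
		}(r)
	}
	wg.Wait()
	close(results)

	out := make([]result, 0, len(resources))
	for res := range results {
		out = append(out, res)
	}
	return out
}
```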
Force-pushed from 1ccbbd9 to e55e400
Signed-off-by: ehila <ehila@redhat.com>
Signed-off-by: ehila <ehila@redhat.com>
removing for now to reduce complexity; no clear performance gain is present, should revisit if performance issues crop up. Signed-off-by: ehila <ehila@redhat.com>
Signed-off-by: ehila <ehila@redhat.com>
Signed-off-by: ehila <ehila@redhat.com>
updated gather logic to be more in line with the trigger controller in the kube storage migrator. Signed-off-by: ehila <ehila@redhat.com>
put in helper logic to write log and status to the backup folder for validation. Signed-off-by: ehila <ehila@redhat.com>
Force-pushed from e55e400 to f999ea3
Signed-off-by: ehila <ehila@redhat.com>
@eggfoobar: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Naming the variable the same as the package caught my eye here. If there's another reason to update the PR, you could consider renaming the variable to something like dynamicClient.
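For example, with the client-go dynamic package, a variable named dynamic shadows the package for the rest of the scope, while dynamicClient keeps both usable (a sketch, not the PR's actual code):

```go
package migrate

import (
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

// newClient shows why dynamicClient is the friendlier name: calling the
// variable "dynamic" would shadow the package and make later references
// like dynamic.NewForConfig impossible in this scope.
func newClient(cfg *rest.Config) (dynamic.Interface, error) {
	dynamicClient, err := dynamic.NewForConfig(cfg)
	if err != nil {
		return nil, err
	}
	return dynamicClient, nil
}
```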
What's the extra level of indirection for here?
/close Closing PR in favor of other approaches. In this PR we were trying to implement this feature without the cluster being fully up; upon further investigation, we will need the cluster up in order to fully support CRDs and their migration webhooks.
@eggfoobar: Closed this PR.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Initial implementation for storage migrator
This currently only contains the code for the migrator itself. I tested a few ways of running this for performance; we're looking at around 2.8 seconds for a migration to run.
The calling code isn't present; I'll open up a new PR for that logic.
/assign @pmtk