Skip to content

USHIFT-1085: feat: initial implementation of storage migration#1956

Closed
eggfoobar wants to merge 8 commits intoopenshift:mainfrom
eggfoobar:storage-migration
Closed

USHIFT-1085: feat: initial implementation of storage migration#1956
eggfoobar wants to merge 8 commits intoopenshift:mainfrom
eggfoobar:storage-migration

Conversation

@eggfoobar
Copy link
Copy Markdown
Contributor

Initial implementation for storage migrator

This currently only contains the code for the migrator itself. I tested a few ways of running this for performance, we're looking at around 2.8 seconds for a migration to run.

The calling code isn't present, I'll open up a new PR for that logic.

/assign @pmtk

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 23, 2023
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented Jun 23, 2023

@eggfoobar: This pull request references USHIFT-1085 which is a valid jira issue.

Details

In response to this:

Initial implementation for storage migrator

This currently only contains the code for the migrator itself. I tested a few ways of running this for performance, we're looking at around 2.8 seconds for a migration to run.

The calling code isn't present, I'll open up a new PR for that logic.

/assign @pmtk

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 23, 2023
@openshift-ci openshift-ci Bot requested review from jerpeter1 and pacevedom June 23, 2023 17:47
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jun 23, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: eggfoobar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 23, 2023
@eggfoobar eggfoobar force-pushed the storage-migration branch 2 times, most recently from b556240 to c23955d Compare June 23, 2023 20:48
@pmtk
Copy link
Copy Markdown
Member

pmtk commented Jun 26, 2023

What do you think about changing the order: first merging "partial start" (etcd+KAS) in places where migration would happen, and then adding code from this PR so migrator could already be executed?

Comment thread pkg/admin/migration/migration.go Outdated
Comment thread pkg/admin/migration/migration.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if, at this point, we shouldn't just call off the whole thing. We care for "all or nothing".
Do you foresee another component that would handle failure in results?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The failure state is something we need to think in terms of our resources and the customers, typically if it fails it means theres a compatibility error with the resource versions, we should catch that in our testing for our CRs, but if the user has applied different CRDs that fail migration, they should be notified to fix them manually. I was thinking of this function call giving a migrator failure (i.e. can't reach server can't list resources) and a resource migration error, that would be recoverable.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Data errors are recoverable, and it may make sense to collect all of them unless it makes the logic significantly more complicated.

Comment thread pkg/admin/migration/migration.go Outdated
Comment thread pkg/admin/migration/migration.go Outdated
Comment thread pkg/admin/migration/migration.go Outdated
Comment thread pkg/admin/migration/migration.go Outdated
Comment thread pkg/admin/migration/migration.go Outdated
Comment thread pkg/admin/migration/migration.go Outdated
Comment thread pkg/admin/migration/types.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we need that. Is there a way to log that some resource was migration from version A to B? We could log that in "real time", not packing all results into a collection and then iterating to log the items

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing, we can do that. The primary reason it's here is just to give us raw access to write the information in any format we want if we need to store the results somewhere easily.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the pre-run phase logging to the standard MicroShift log, or do we see these errors in the greenboot health check log? It would be nice if the greenboot log could at least report "there was a data migration error, check the MicroShift logs for details".

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's currently just using klog, I'm not sure how we send logs to greenboot but I love the idea. We should make it do that

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The klog output is going to go to stdout/stderr. I guess that's going to the systemd unit where pre-run is running, rather than a greenboot-specific log. Can we put these messages (and any others) somewhere for our health-check script to collect and report?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread pkg/admin/migration/types.go Outdated
Comment thread pkg/admin/migration/migration.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Data errors are recoverable, and it may make sense to collect all of them unless it makes the logic significantly more complicated.

Comment thread pkg/admin/migration/migration.go Outdated
Comment thread pkg/admin/migration/migration.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a good opportunity to use a channel to communicate between the goroutines, instead managing a lock.

Comment thread pkg/admin/migration/migration.go Outdated
Comment thread pkg/admin/migration/types.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the pre-run phase logging to the standard MicroShift log, or do we see these errors in the greenboot health check log? It would be nice if the greenboot log could at least report "there was a data migration error, check the MicroShift logs for details".

eggfoobar added 7 commits July 6, 2023 09:56
Signed-off-by: ehila <ehila@redhat.com>
Signed-off-by: ehila <ehila@redhat.com>
removing for now to remove complexity, no clear performance gain is present
should revisit if performance issues crop up

Signed-off-by: ehila <ehila@redhat.com>
Signed-off-by: ehila <ehila@redhat.com>
Signed-off-by: ehila <ehila@redhat.com>
updated gather logic to be more inline with trigger controller in kube storage migrator

Signed-off-by: ehila <ehila@redhat.com>
put in helper logic to write log and status to backup folder for uvalidation

Signed-off-by: ehila <ehila@redhat.com>
@eggfoobar eggfoobar force-pushed the storage-migration branch from e55e400 to f999ea3 Compare July 6, 2023 13:57
@eggfoobar eggfoobar changed the title [WIP] USHIFT-1085: feat: initial implementation of storage migration USHIFT-1085: feat: initial implementation of storage migration Jul 6, 2023
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 6, 2023
Signed-off-by: ehila <ehila@redhat.com>
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jul 6, 2023

@eggfoobar: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/microshift-e2e-arm 09a2556 link false /test microshift-e2e-arm
ci/prow/e2e-openshift-conformance-reduced-arm 09a2556 link false /test e2e-openshift-conformance-reduced-arm

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Comment thread pkg/admin/migration/migration.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming the variable the same as the package caught my eye here. If there's another reason to update the PR, you could consider renaming the variable to something like dynamicClient.

Comment thread pkg/admin/migration/migration.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the extra level of indirection for here?

Comment thread pkg/admin/migration/migration.go Outdated
@eggfoobar
Copy link
Copy Markdown
Contributor Author

/close

Closing PR in favor of other approaches. In this PR we were trying to implement this feature with out the cluster being fully up, upon further investigation, we will need the cluster up in order to fully support CRDs and their migration webhooks.

@openshift-ci openshift-ci Bot closed this Jul 8, 2023
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jul 8, 2023

@eggfoobar: Closed this PR.

Details

In response to this:

/close

Closing PR in favor of other approaches. In this PR we were trying to implement this feature with out the cluster being fully up, upon further investigation, we will need the cluster up in order to fully support CRDs and their migration webhooks.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants