Create a mechanism for backup/restore of ${PROJECTS_ROOT} from one OpenShift cluster to another. #23570

@cgruver

Description

Is your enhancement related to a problem? Please describe

In the event of a cluster outage, a significant number of developers will be unable to work.

What is needed is a mechanism to recover a workspace back to a known good state, including any uncommitted changes to the code base or other work.

Describe the solution you'd like

I am building a simple prototype that uses a FROM scratch container image to store the state of a workspace.

https://github.com/cgruver/workspace-backup-prototype

Prototype Backup flow -

  1. A CronJob in the Dev Spaces namespace runs every hour.
  2. The CronJob looks for dev workspaces which were stopped within the last hour and are currently not running.
  3. The CronJob creates a Job in the user's namespace which uses Buildah to create a container image with the contents of /projects from the workspace PVC.
  4. The container image is pushed to an external image registry.
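
As a rough illustration of step 2, the selection logic could look like the sketch below. It is not taken from the prototype. The `WorkspaceState` record and its `stopped_at` field are hypothetical stand-ins for whatever the CronJob actually reads from the DevWorkspace resources, and the one-hour window simply mirrors the schedule described above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class WorkspaceState:
    """Hypothetical, simplified view of a dev workspace (not a real API type)."""
    name: str
    namespace: str
    started: bool                    # True while the workspace is running
    stopped_at: Optional[datetime]   # assumed to be recorded when it stops


def needs_backup(ws: WorkspaceState, now: datetime,
                 window: timedelta = timedelta(hours=1)) -> bool:
    """True if the workspace is not running and stopped within the window."""
    if ws.started or ws.stopped_at is None:
        return False
    return now - window <= ws.stopped_at <= now


def select_for_backup(workspaces: List[WorkspaceState], now: datetime,
                      window: timedelta = timedelta(hours=1)) -> List[WorkspaceState]:
    """Return the workspaces for which a backup Job would be created."""
    return [ws for ws in workspaces if needs_backup(ws, now, window)]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    candidates = [
        WorkspaceState("recently-stopped", "user1-devspaces", False,
                       now - timedelta(minutes=20)),
        WorkspaceState("still-running", "user2-devspaces", True, None),
    ]
    for ws in select_for_backup(candidates, now):
        print(f"would create backup Job for {ws.namespace}/{ws.name}")
```

Keeping the selection as a pure function of the workspace state and the current time makes the hourly window easy to test independently of the cluster.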

Prototype Restore flow -

  1. The user logs into a secondary OpenShift cluster that has Dev Spaces installed.
  2. The user creates a new workspace from the Git URL of the workspace that needs to be restored.
  3. The user indicates that they want the workspace to be restored from a backup. (Right now this is a manual flow that modifies the Devfile to inject an init container. The desired flow is a selection in the dashboard to request a restore.)
  4. The workspace is created via the normal flow except that an init-container is run after PVC creation that pulls the backup image and copies the contents to ${PROJECTS_ROOT} before starting the workspace.
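
To make step 4 more concrete, here is a minimal sketch of how the injected init container might be described. It only builds a Kubernetes container spec as a plain dictionary. The container name, the `BACKUP_IMAGE` environment variable, and the idea of a separate tooling image that pulls and unpacks the backup are assumptions made for illustration, not details of the prototype. A tooling image is assumed because an image built FROM scratch contains no shell or copy utility of its own.

```python
from typing import Any, Dict


def build_restore_init_container(tool_image: str,
                                 backup_image: str,
                                 volume_name: str,
                                 projects_root: str = "/projects") -> Dict[str, Any]:
    """Build an init container spec that mounts the workspace PVC.

    tool_image is a hypothetical image whose entrypoint is assumed to pull
    BACKUP_IMAGE and unpack its contents into PROJECTS_ROOT.
    """
    if not backup_image:
        raise ValueError("backup_image must not be empty")
    return {
        "name": "restore-projects",  # hypothetical container name
        "image": tool_image,
        "env": [
            {"name": "BACKUP_IMAGE", "value": backup_image},    # assumed variable
            {"name": "PROJECTS_ROOT", "value": projects_root},
        ],
        # Mount the workspace PVC so restored files land in the projects root.
        "volumeMounts": [{"name": volume_name, "mountPath": projects_root}],
    }


if __name__ == "__main__":
    import json
    spec = build_restore_init_container(
        tool_image="registry.example.com/tools/restore:latest",      # placeholder
        backup_image="registry.example.com/backups/my-ws:latest",    # placeholder
        volume_name="workspace-data",                                # placeholder
    )
    print(json.dumps(spec, indent=2))
```

Passing the backup image reference through an environment variable keeps the container spec identical for every workspace, so a dashboard option would only need to supply the image reference.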

I am currently working on extending the prototype to use the internal registry of the secondary OCP cluster in order to manage RBAC on the container images and restrict access to the user who created the original workspace.

Describe alternatives you've considered

PVC snapshots - rejected because of the complexity of managing restore across clusters.

DevWorkspace mirroring - rejected because of the complexity of syncing data and Custom Resources across clusters.

Additional context

No response

Metadata

Assignees

Labels

area/che-operator: Issues and PRs related to Eclipse Che Kubernetes Operator
area/dashboard
kind/enhancement: A feature request - must adhere to the feature request template.
severity/P1: Has a major impact to usage or development of the system.

Projects

Status

🚧 In Progress

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
