CLOUDSTACK-9538: FIX failure in Deleting Snapshot From Primary Storage RBD Storage if vm has been removed #1710
Conversation
Hi @ustcweizhou, do we need to take any further action or inform anyone from ACS to get this PR into upcoming releases?

@jburwell Could you please review this PR? Thanks. @ozhanrk wants to make it into 4.9.1.

@ustcweizhou @ozhanrk Reading the ticket and the patch, this behavior seems like a common request across storage implementations. Some users want snapshots to be automatically removed and others do not. Given the variability of use cases, I think choosing one approach over another would be inappropriate. It feels like we should step back and evaluate a global setting with a VM-level override that specifies whether or not snapshots should be removed when a VM is destroyed. Does this reasoning make sense, or am I misunderstanding the goal? Also, this feels like a minor enhancement rather than a defect fix. Therefore, I think we should move the target of the work to …

Hi @jburwell, this is a missing three-line fix for a problem introduced by the fixes for CLOUDSTACK-8302 and CLOUDSTACK-9297. Those bug fixes are already in the 4.9.0 release, so the code @ustcweizhou fixed here is the required, remaining part of the complete functional fix for CLOUDSTACK-9297. Also, this fix only affects RBD users; users of other storage types are not affected. So, if possible, please let's include this fix in the upcoming 4.9 LTS release, and if we want to add a new enhancement, let's plan it for 4.10 or later. We have been trying to fix this problem since around the 4.8 release, roughly 6-9 months or more. We are currently using this PR in our 4.9 production environment by compiling this code on top of the 4.9.0 code, so we really need this fix in the upcoming LTS release.

@ozhanrk I recommend combining the fixes into a single PR for review cohesion and to ensure that the merge maintains the stability of the release branches. You could either amend this PR with the necessary commits or create a new PR and close this one -- whichever approach is easier for you. Ultimately, the PR should have one commit per JIRA ticket fixed.

Hi @jburwell, I am sorry, but I do not understand what you mean by combining fixes under a single PR, because this is a single fix with a four-line code change that only affects users of RBD primary storage. The two tickets mentioned above were both fixed and are already in the 4.9.0.0 code, so this is the only remaining part of those fixes for RBD users. @ustcweizhou, I am not a developer; if I am wrong, could you explain our current status in a more technical way? I am very new to the ACS community, so maybe I could not express the PR's current status :)

@jburwell John, this is a small bug fix for the issue: failing to delete a snapshot on RBD when the original volume has been removed.

Hi @rhtyd @jburwell,

@ustcweizhou @ozhanrk this feels like a relatively narrow case (RBD on KVM), and we are extremely late for 4.9.1.0. We have numerous "small" fixes that could be included that would further delay significant fixes that affect nearly all users. I am sorry, but this fix will have to wait until 4.9.2.0/4.10.1.0/4.11.0.0.

@ozhanrk @ustcweizhou I asked a question regarding this functionality to which I cannot find an answer. Namely, this patch deletes snapshots in a circumstance where not all users want them deleted.

Hi @jburwell, this PR does not delete any kind of snapshot; this patch only fixes a timeout issue in the code, nothing more. It is a small four-line fix: the delete-snapshot code could not find the IDs of snapshots that had already been deleted, so it waited until the timeout was reached. With this patch, it recognizes that they are already deleted, so it does not wait and continues the process.

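To make the comment above concrete, here is a minimal, hypothetical sketch of the idea behind the patch (the class and method names are illustrative, not the actual CloudStack code): when the snapshot's backing volume has already been removed, report the snapshot as cleaned up immediately instead of polling for it until a timeout expires.

```java
// Hypothetical sketch, NOT the real CloudStack implementation: models the
// "skip the wait when the volume is already gone" behavior described above.
import java.util.HashMap;
import java.util.Map;

public class SnapshotCleanupSketch {

    // Stand-in for the volume DAO: maps volumeId -> "removed" flag.
    private final Map<Long, Boolean> volumeRemoved = new HashMap<>();

    public void addVolume(long volumeId, boolean removed) {
        volumeRemoved.put(volumeId, removed);
    }

    /**
     * Returns true when the snapshot can be considered deleted without
     * waiting. Before the fix, a removed volume meant the lookup never
     * succeeded, so the caller waited until the timeout was reached.
     */
    public boolean canSkipWait(long volumeId) {
        Boolean removed = volumeRemoved.get(volumeId);
        // Volume row is gone or marked removed: nothing left to clean up.
        return removed == null || removed;
    }

    public static void main(String[] args) {
        SnapshotCleanupSketch sketch = new SnapshotCleanupSketch();
        sketch.addVolume(1L, false); // live volume: do the real delete
        sketch.addVolume(2L, true);  // removed volume: skip the wait
        System.out.println(sketch.canSkipWait(1L)); // false
        System.out.println(sketch.canSkipWait(2L)); // true
        System.out.println(sketch.canSkipWait(3L)); // true (no row at all)
    }
}
```

The actual patch performs this check only for the RBD pool type, as shown in the reviewed diff further down.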
@blueorangutan package

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-191

@blueorangutan test

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

Trillian test result (tid-377)

Do we have LGTMs on this?

Hi @kiwiflyer @nathanejohnson, I saw that you have merged this fix and are using it in your ENA CloudStack branch. When you have time, could you check the code for a possible LGTM, please?

LGTM! We're running it in our labs.

This code LGTM.

Hi @rhtyd, we have the required LGTMs now.

Thanks @ozhanrk, I'll kick a final test round with xenserver and merge this.

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + xenserver-65sp1) has been kicked to run smoke tests

@blueorangutan test

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

Trillian test result (tid-419)

Trillian test result (tid-418)

The code under review (excerpt from the patch):

```java
SnapshotInfo snapshotOnPrimaryInfo = snapshotDataFactory.getSnapshot(snapshotId, DataStoreRole.Primary);
if (((PrimaryDataStoreImpl)snapshotOnPrimaryInfo.getDataStore()).getPoolType() == StoragePoolType.RBD) {
    long volumeId = snapshotOnPrimary.getVolumeId();
    VolumeVO volumeVO = volumeDao.findById(volumeId);
```
This will throw an NPE if the snapshot has no primary volume id. Can there be such a case?
@rhtyd Hi Rohit, each snapshot should have a volume_id (at least on all our platforms).
Actually, in engine/storage/snapshot/src/org/apache/cloudstack/storage/snapshot/SnapshotObject.java, getVolumeId() also returns long, not Long.
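A minimal, hypothetical illustration of the point (the types below are stand-ins, not CloudStack's actual classes): a method declared to return the primitive long can never hand back null, so no unboxing NPE can come from getVolumeId() itself. The value that can be null is the DAO lookup result, which is the thing a defensive caller would guard.

```java
// Hypothetical sketch: why a primitive-long getter cannot cause an unboxing
// NPE, while the DAO lookup result still can be null.
public class VolumeLookupSketch {

    static class SnapshotObject {
        private final long volumeId;            // primitive: cannot be null
        SnapshotObject(long volumeId) { this.volumeId = volumeId; }
        long getVolumeId() { return volumeId; } // returns long, not Long
    }

    static class VolumeVO {
        final long id;
        VolumeVO(long id) { this.id = id; }
    }

    // Stand-in for volumeDao.findById(): returns null for unknown ids.
    static VolumeVO findById(long id) {
        return id == 42L ? new VolumeVO(42L) : null;
    }

    public static void main(String[] args) {
        SnapshotObject snap = new SnapshotObject(7L);
        long volumeId = snap.getVolumeId(); // safe: no unboxing involved
        VolumeVO vo = findById(volumeId);
        if (vo == null) {
            // This is the case a defensive caller would handle.
            System.out.println("volume " + volumeId + " not found");
        }
    }
}
```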
Thanks @ustcweizhou

@ustcweizhou @ozhanrk can you check the snapshot- and volume-related errors in the last test runs above?

Some intermittent errors were seen; I'll re-kick the tests.

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-252

@blueorangutan test

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

Trillian test result (tid-438)

@blueorangutan package

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

Packaging result: ✔centos6 ✔centos7 ✖debian. JID-262

The Debian packaging failure was due to an environment issue.

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

Trillian test result (tid-462)

@blueorangutan test centos7 xenserver-65sp1

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + xenserver-65sp1) has been kicked to run smoke tests

Trillian test result (tid-468)

Tests LGTM.
Merged:

CLOUDSTACK-9538: FIX failure in Deleting Snapshot From Primary Storage RBD Storage if vm has been removed

* pr/1710: CLOUDSTACK-9538: FIX failure in Deleting Snapshot From Primary Storage RBD Storage if vm has been removed

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>