
Conversation

Contributor

@mlsorensen mlsorensen commented Aug 30, 2023

Description

This PR allows the PowerFlex plugin to gracefully handle the case where a volume has already been removed from the backend PowerFlex storage and a user attempts to expunge the volume.

Without this, attempting the corresponding cleanup in CloudStack results in volumes that are stuck in the Destroy state unless an admin manually edits the database. If the volume is missing on the back end and admins want to recover, they can still attempt to fix this in some way, and the record is preserved in CloudStack as long as a volume expunge is not attempted.

Failed to expunge the volume Vol[37950|name=testvol|vm=null|DATADISK] in Primary data store : Unable to delete PowerFlex volume: xxxxx:xxxxx due to API failed due to: {"message":"Could not find the volume","httpStatusCode":500,"errorCode":79}

We catch this error case and match the error message to determine whether the volume is already gone. There may be documentation for handling the error codes more directly, but currently the PowerFlex client does not handle errors in this way.
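The approach described above can be sketched as follows. This is a minimal, self-contained illustration, not the actual `ScaleIOGatewayClientImpl` code: the `ServerApiException` stub, the `deleteVolume` signature, and the `existsOnBackend` flag are hypothetical placeholders standing in for the real client and API call; only the "match the message, treat a missing volume as success" pattern reflects the change.

```java
// Hypothetical sketch: treat "volume not found" from the backend as a
// successful delete so CloudStack can finish expunging the volume record.
public class VolumeDeleteSketch {
    static final String VOLUME_NOT_FOUND = "Could not find the volume";

    // Stand-in for CloudStack's ServerApiException.
    static class ServerApiException extends RuntimeException {
        ServerApiException(String msg) { super(msg); }
    }

    /**
     * Returns true if the volume is gone after this call, either because the
     * backend deleted it now or because it was already absent.
     * existsOnBackend simulates the backend's state for this sketch.
     */
    static boolean deleteVolume(String volumeId, boolean existsOnBackend) {
        try {
            if (!existsOnBackend) {
                // Mimic the PowerFlex API's error body for a missing volume.
                throw new ServerApiException("API failed due to: {\"message\":\""
                        + VOLUME_NOT_FOUND + "\",\"httpStatusCode\":500,\"errorCode\":79}");
            }
            return true; // backend delete succeeded
        } catch (Exception ex) {
            if (ex instanceof ServerApiException && ex.getMessage().contains(VOLUME_NOT_FOUND)) {
                // Volume is already gone on the backend: log and handle gracefully.
                System.out.println("API says deleting volume " + volumeId
                        + " does not exist, handling gracefully");
                return true;
            }
            throw ex; // any other failure still propagates
        }
    }

    public static void main(String[] args) {
        // Missing on the backend: logs a warning and reports success.
        System.out.println(deleteVolume("4ace104b00000005", false));
        // Present on the backend: plain successful delete.
        System.out.println(deleteVolume("4ace104b00000005", true));
    }
}
```

As the review discussion below notes, matching on message text is fragile; matching on the numeric `errorCode` (79 in the quoted response) would be sturdier if the client parsed error bodies into structured fields.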

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

Member

@yadvr yadvr left a comment


Code LGTM, didn't test it though.

Member

yadvr commented Aug 30, 2023

@blueorangutan package

@blueorangutan

@rohityadavcloud a [SF] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.


codecov bot commented Aug 30, 2023

Codecov Report

Merging #7924 (42acba9) into 4.18 (439d70f) will decrease coverage by 0.01%.
The diff coverage is 0.00%.

@@             Coverage Diff              @@
##               4.18    #7924      +/-   ##
============================================
- Coverage     13.06%   13.06%   -0.01%     
  Complexity     9093     9093              
============================================
  Files          2720     2720              
  Lines        257431   257437       +6     
  Branches      40141    40142       +1     
============================================
  Hits          33622    33622              
- Misses       219582   219588       +6     
  Partials       4227     4227              
Files Changed Coverage Δ
...age/datastore/client/ScaleIOGatewayClientImpl.java 0.00% <0.00%> (ø)


@blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 6943

Member

@harikrishna-patnala harikrishna-patnala left a comment


LGTM

Member

yadvr commented Aug 31, 2023

@blueorangutan test

@blueorangutan

@rohityadavcloud a [SF] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

Member

@harikrishna-patnala harikrishna-patnala left a comment


Quickly verified the issue and the fix. Volume delete with expunge completed gracefully.

2023-08-31 06:19:26,717 WARN [o.a.c.s.d.c.ScaleIOGatewayClientImpl] (API-Job-Executor-1:ctx-c77978d8 job-154 ctx-2c92ed87) (logid:2628581a) API says deleting volume 4ace104b00000005 does not exist, handling gracefully

        return removeVolumeStatus;
    }
} catch (Exception ex) {
    if (ex instanceof ServerApiException && ex.getMessage().contains("Could not find the volume")) {
Contributor


Depending on the textual contents of the exception message seems a bit unreliable and maintenance-sensitive; is there no return code that can be queried? (excuse my lack of ScaleIO/PowerFlex knowledge) (and clgtm otherwise)


@DaanHoogland Marcus mentioned this in the PR description
"We catch this error case and match the error message to determine if the volume is already gone. There may be some document for handling the error codes more directly, but currently the PowerFlex client doesn't handle errors in this way."

Contributor


sorry, only reviewed the code :(

Member

@weizhouapache weizhouapache left a comment


Code LGTM.

This will be merged when the Trillian tests finish.

@blueorangutan

[SF] Trillian test result (tid-7610)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 45706 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr7924-t7610-kvm-centos7.zip
Smoke tests completed. 107 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_02_upgrade_kubernetes_cluster Failure 664.58 test_kubernetes_clusters.py

@weizhouapache weizhouapache merged commit 89e0a4c into apache:4.18 Aug 31, 2023