Adds the option to auto-retry for downloading metadata from GlotPress (#474)
Conversation
As a side note, just for your information as a future direction (even though this probably won't be implemented anytime soon given our other priorities): of course, in the meantime, the auto-retry mechanism you propose here is still useful to get a workaround in ASAP, so the PR is still useful. But I just figured I'd link to those ideas we've been brainstorming to ultimately get rid of those 429s altogether, FYI :)

@AliSoftware Thank you for linking those. I was aware that this action is likely to be deprecated, but I need a temporary solution to continue my progress on release management in CI.
AliSoftware left a comment:

Logic looks sound, but I left some ideas; curious what you think.
```ruby
description: 'Whether to auto retry downloads after Too Many Requests error',
type: FastlaneCore::Boolean,
optional: true,
default_value: false),
```
tbh I wouldn't mind having this set to true, given how useful it'll be to have this feature (even for repos that are still doing manual releases and not using release-on-CI) 🙃
I wouldn't mind it either, but that'd be somewhat of a breaking change. If others feel the same way about setting it to true, it's easy enough to change :)
+1 for setting to true for the convenience it would provide.
I can see how it would change the default behavior and be seen as a breaking change. However, given we are the only consumers, I think our threshold for when to introduce breaking changes is relatively low. I'd also be fine with this being a minor version bump, as updating to this version would not "break" anything.
```ruby
module Helper
  class MetadataDownloader
    AUTO_RETRY_SLEEP_TIME = 20
    MAX_AUTO_RETRY_ATTEMPTS = 30
```
I wonder if it wouldn't make more sense for the auto_retry ConfigItem to be an Integer that sets that max-retry-attempts value?
That way we could use a value like 30 (!) for CI, because there we really need to be sure to end up in success, while repos still doing manual releases could use a value like 3: it'd be nice not to have to UI.confirm the retry manually every time this happens on the first or second try, but if it fails consistently more than 3 times we might want to wait longer and confirm manually.
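The suggestion above might look something like this sketch; the key name, description, and default value are my own assumptions, not code from the PR:

```ruby
# Hypothetical sketch only: an Integer option capping auto-retry attempts,
# instead of the PR's Boolean `auto_retry`. Key name and default are assumed.
FastlaneCore::ConfigItem.new(
  key: :auto_retry_max_attempts,
  description: 'How many times to auto-retry after a 429 (0 disables auto-retry)',
  type: Integer,
  optional: true,
  default_value: 0
)
```

A CI lane could then pass `auto_retry_max_attempts: 30`, while a repo doing manual releases could pass `3`.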
That's a cool idea. I could make that change if there is practical use for it for others, or if you feel strongly that it is a better approach.
My understanding is that this issue is mostly specific to WPAndroid & WPiOS, and as you mentioned, it's a temporary one. That's why I did this in the most straightforward way.
FWIW, I don't feel strongly about it.
```ruby
# We got rate-limited, auto_retry or offer to try again with a prompt
if @auto_retry && @auto_retry_attempt_counter <= MAX_AUTO_RETRY_ATTEMPTS
  UI.message("Received 429 for `#{locale}`. Auto retrying in #{AUTO_RETRY_SLEEP_TIME} seconds...")
  sleep(AUTO_RETRY_SLEEP_TIME)
```
I wonder if we wouldn't benefit from a linear or exponential wait time between attempts instead of a fixed one?
e.g. `sleep(10 + @auto_retry_attempt_counter * 5)` or something?
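As a runnable illustration of that idea (my own sketch, not the PR's code), the delay could grow with the attempt counter; the exponential variant and its cap are added for comparison and are assumptions:

```ruby
# Linear backoff, matching the `sleep(10 + counter * 5)` idea above.
def linear_backoff(attempt, base: 10, step: 5)
  base + (attempt * step)
end

# Exponential backoff with a cap (the cap value is an arbitrary assumption).
def exponential_backoff(attempt, base: 10, cap: 300)
  [base * (2**attempt), cap].min
end

# First four delays: linear 10, 15, 20, 25; exponential 10, 20, 40, 80.
```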
I've considered this one, but in my experience it's really not necessary. I think the server just checks the number of requests in a certain amount of time, but I could be wrong.
For what it's worth, I actually wanted to go with something shorter, but since it's in CI, it doesn't really matter if it takes 1 or 2 minutes longer. Funnily enough, I tried to trigger the 429 error today for quite some time and couldn't get it even once.
Happy to apply the suggestion if you feel there is enough benefit.
Thank you for doing this @oguzkocer! I think about auto-retry literally every time I run the lane, but never made the time for it.
```ruby
@target_files = target_files
@auto_retry = auto_retry
@alternates = {}
@auto_retry_attempt_counter = 0
```
It just occurred to me that the downloader instance is reused at the call site for each locale to be downloaded.
This means that if you only set @auto_retry_attempt_counter in initialize, the counter will be shared across all usages of the downloader, and thus across all locales you call download(…) for.
Was it intentional for MAX_AUTO_RETRY_ATTEMPTS to mean "max 30 retries in total", as opposed to "max 30 retries per locale"? If not, I think we need to make sure @auto_retry_attempt_counter is reset to 0 every time the caller calls download(…) for a new locale.
For example, maybe you could move the current 3-line implementation of download to a private try_download method, then change download(…) to reset @auto_retry_attempt_counter = 0 and call try_download(…)?
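A minimal sketch of that refactor (my own illustration; the request logic is stubbed out, and everything beyond the `download`/`try_download` split is an assumption):

```ruby
class MetadataDownloader
  MAX_AUTO_RETRY_ATTEMPTS = 30

  attr_reader :auto_retry_attempt_counter

  def initialize
    @auto_retry_attempt_counter = 0
  end

  # Public entry point: reset the counter so the cap applies per locale...
  def download(locale)
    @auto_retry_attempt_counter = 0
    try_download(locale)
  end

  private

  # ...then delegate to the previous implementation (stubbed out here).
  def try_download(locale)
    # perform the request; on a 429, increment @auto_retry_attempt_counter
    # and retry while it is <= MAX_AUTO_RETRY_ATTEMPTS
  end
end
```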
This was intentional. The only goal I had with this attempt counter is to make sure we don't get into an infinite loop. I don't think we'll ever hit this limit unless there is an issue with the server or CI, in which case I thought it'd be best to stop and let the release manager handle it, either manually or by retrying once the issue is resolved.
What does it do?

This PR adds an `auto_retry` option to the `gp_downloadmetadata` action. This is an important feature for release management in CI because we can't interact with the process.

The implementation is pretty basic. If we get a `429: Too Many Requests` error, we wait for `20` seconds and try again. If we auto-retry `30` times, we'll stop auto-retrying, but the manual retry will still work. When I manually retry, I typically do it within a few seconds, but in CI we have the luxury of waiting, so I randomly selected `20` seconds. For the max retries, I thought `10` minutes of total retrying would be a reasonable limit. However, I am happy to change these numbers if you have any feedback.

@mokagio You might especially be interested in this PR as the WPiOS release manager.
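The behavior described above can be sketched as a small runnable loop (my own simplification, not the PR's actual code; the sleep time is zeroed so the demo runs instantly):

```ruby
MAX_AUTO_RETRY_ATTEMPTS = 30
AUTO_RETRY_SLEEP_TIME = 0 # the PR uses 20 seconds; 0 keeps this demo instant

# Calls the given request until it returns 200, retrying up to
# MAX_AUTO_RETRY_ATTEMPTS times with a sleep between attempts.
def download_with_retry(&request)
  attempts = 0
  loop do
    return :success if request.call == 200
    attempts += 1
    raise 'Gave up after max auto-retries' if attempts > MAX_AUTO_RETRY_ATTEMPTS
    sleep(AUTO_RETRY_SLEEP_TIME)
  end
end

# Two simulated 429s followed by a 200 succeed on the third attempt.
responses = [429, 429, 200]
download_with_retry { responses.shift } # => :success
```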
To Test
I found testing these changes to be a bit tricky because we don't know whether we'll get a `429` from the server or not. So, I prepared a test branch `test/auto-retry-gp_downloadmetadata` which switches the `200` and `429` status codes, so we can reproduce the scenario easily. I also changed `MAX_AUTO_RETRY_ATTEMPTS` to `3`, so we can easily test that as well.

- Check out the `test/auto-retry-gp_downloadmetadata` WordPress-Android branch. This branch is already set up to use the correct branch for testing this PR and has `auto_retry` enabled.
- Run `bundle exec fastlane download_metadata_strings`.
- You should see `Received 429 for ar. Auto retrying in 20 seconds...` 4 times: the first is the original attempt, followed by 3 auto-retry attempts. (It's very unlikely, but you might get a different locale than `ar`.)
- When you get the prompt `Retry downloading ar after receiving 429 from the API? (y/n)`, answer `n` and then quickly do `ctrl+c`. Otherwise you might get stuck in a loop 😅

Checklist before requesting a review
- Run `bundle exec rubocop` to test for code style violations and recommendations.
- Add unit tests (`specs/*_spec.rb`) if applicable.
- Run `bundle exec rspec` to run the whole test suite and ensure all your tests pass.
- Update the `CHANGELOG.md` file to describe your changes under the appropriate existing `###` subsection of the existing `## Trunk` section.