
Adds the option to auto-retry for downloading metadata from GlotPress #474

Merged
oguzkocer merged 4 commits into trunk from auto-retry-gp_downloadmetadata
May 19, 2023
Conversation

@oguzkocer
Contributor

@oguzkocer oguzkocer commented May 10, 2023

What does it do?

This PR adds an auto_retry option to the gp_downloadmetadata action. This is an important feature for release management in CI, because we can't interact with the process there.

The implementation is pretty basic: if we get a 429: Too Many Requests error, we wait for 20 seconds and try again. After 30 auto-retry attempts we stop, but the manual retry will still work. When I retry manually, I typically do it within a few seconds, but in CI we have the luxury of waiting, so I somewhat arbitrarily picked 20 seconds. For the max retries, I figured 10 minutes of total retrying would be reasonable. That said, I'm happy to change these numbers if you have any feedback.
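
The policy described above can be sketched in plain Ruby. This is a hypothetical illustration, not the actual action code; the names (`download_with_auto_retry`, the `sleep_time`/`max_attempts` parameters) are made up for the example, while the two constants mirror the values discussed in this PR:

```ruby
# Illustrative sketch of the retry policy, not the actual gp_downloadmetadata code.
AUTO_RETRY_SLEEP_TIME = 20     # seconds between automatic retries
MAX_AUTO_RETRY_ATTEMPTS = 30   # ~10 minutes of total waiting

# Calls the given block until it returns something other than 429,
# sleeping between attempts. Gives up (returns 429) once the retry
# budget is exhausted, so the manual retry prompt can take over.
def download_with_auto_retry(max_attempts: MAX_AUTO_RETRY_ATTEMPTS,
                             sleep_time: AUTO_RETRY_SLEEP_TIME)
  attempts = 0
  loop do
    status = yield
    return status unless status == 429

    attempts += 1
    # Budget exhausted: hand back to the manual retry flow.
    return 429 if attempts > max_attempts

    sleep(sleep_time)
  end
end
```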

@mokagio You might especially be interested in this PR as WPiOS release manager.

To Test

I found testing these changes a bit tricky, because we can't control whether we'll get a 429 from the server or not. So I prepared a test branch, test/auto-retry-gp_downloadmetadata, which swaps the 200 and 429 status codes so we can reproduce the scenario easily. I also changed MAX_AUTO_RETRY_ATTEMPTS to 3, so we can easily test that limit as well.

  • Check out the test/auto-retry-gp_downloadmetadata WordPress-Android branch. This branch is already set up to use the correct branch for testing this PR and has auto_retry enabled.
  • Run bundle exec fastlane download_metadata_strings
  • Verify the log Received 429 for ar. Auto retrying in 20 seconds... appears 4 times: once for the original attempt, then once for each of the 3 auto-retry attempts (it's very unlikely, but you might get a different locale than ar)
  • Verify the prompt Retry downloading ar after receiving 429 from the API? (y/n) appears
  • Abort the process: you can type n and then quickly press Ctrl+C. Otherwise you might get stuck in a loop 😅

Checklist before requesting a review

  • Run bundle exec rubocop to test for code style violations and recommendations
  • Add Unit Tests (aka specs/*_spec.rb) if applicable
  • Run bundle exec rspec to run the whole test suite and ensure all your tests pass
  • Make sure you added an entry to the CHANGELOG.md file describing your changes, under the appropriate ### subsection of the ## Trunk section.

Base automatically changed from remove/git-push-actions to trunk May 12, 2023 01:14
@oguzkocer oguzkocer force-pushed the auto-retry-gp_downloadmetadata branch 2 times, most recently from 0d09afd to e1736fb May 18, 2023 09:54
@oguzkocer oguzkocer force-pushed the auto-retry-gp_downloadmetadata branch from e1736fb to 17ca0b7 May 18, 2023 09:56
@oguzkocer oguzkocer marked this pull request as ready for review May 18, 2023 10:19
@oguzkocer oguzkocer enabled auto-merge May 18, 2023 10:19
@oguzkocer oguzkocer requested a review from a team May 18, 2023 10:19
@AliSoftware
Contributor

As a side note, just for your information about future directions (even though this probably won't be implemented anytime soon given our other priorities):
We've been working with the i18n folks to nudge things toward using a single request to get a zip download of all locales of a GlotPress project at once, instead of one request per locale (see also this PR that adds the feature to GlotPress' .com instance), and hopefully that might allow us to reduce the risk of 429 errors.

Of course, in the meantime the auto-retry mechanism you propose here is still useful as a workaround we can get in ASAP. I just figured I'd link to the ideas we've been brainstorming to ultimately get rid of those 429s altogether, FYI :)

@oguzkocer
Contributor Author

@AliSoftware Thank you for linking those. I was aware that this current action is likely to be deprecated, but I need a temporary solution to continue my progress in release management in CI.

Contributor

@AliSoftware AliSoftware left a comment

Logic looks sound, but I left some ideas; curious what you think.

description: 'Whether to auto retry downloads after Too Many Requests error',
type: FastlaneCore::Boolean,
optional: true,
default_value: false),
Contributor

tbh I wouldn't mind having this set to true, given how useful it'll be to have this feature (even for repos that are still doing manual releases and not using release-on-CI) 🙃

Contributor Author

I wouldn't mind it either, but that'd be somewhat of a breaking change. If others feel the same way about setting it to true, it's easy enough to change :)

Contributor

+1 for setting to true for the convenience it would provide.

I can see how it would change the default behavior and be seen as a breaking change. However, given we are the only consumers, I think our threshold for when to introduce breaking changes is relatively low. I'd also be fine with this being a minor version bump, as updating to this version would not "break" anything.

Contributor Author

Addressed in dc56ef0.

module Helper
class MetadataDownloader
AUTO_RETRY_SLEEP_TIME = 20
MAX_AUTO_RETRY_ATTEMPTS = 30
Contributor

@AliSoftware AliSoftware May 18, 2023

I wonder if it wouldn't make more sense to have the auto_retry ConfigItem be an Integer that would allow setting that max-retry-attempts value?

That way we could use a value like 30 (!) for CI, because there we really need to be sure we end up with a success, while we could use a value like 3 for repos still doing manual releases: it'd be nice not to have to UI.confirm the retry manually every time this happens on just the first or second try, but if it fails consistently more than 3 times we might want to wait longer before confirming manually.
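
For illustration, a sketch of what that option might look like (hypothetical: the key name, description, and default are made up here, not the actual PR code):

```ruby
# Hypothetical sketch: expose the retry budget as an Integer option
# instead of a Boolean, per the suggestion above.
FastlaneCore::ConfigItem.new(
  key: :auto_retry_max_attempts,
  description: 'Maximum number of automatic retries after a 429 error (0 disables auto-retry)',
  type: Integer,
  optional: true,
  default_value: 0
)
```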

Contributor Author

That's a cool idea. I could make that change if there is practical use for it for others or if you feel strongly that it is a better approach.

My understanding is that this issue is mostly specific to WPAndroid and WPiOS, and as you mentioned, it's a temporary one. That's why I did this in the most straightforward way.

Contributor

FWIW, I don't feel strongly about it.

# We got rate-limited, auto_retry or offer to try again with a prompt
if @auto_retry && @auto_retry_attempt_counter <= MAX_AUTO_RETRY_ATTEMPTS
UI.message("Received 429 for `#{locale}`. Auto retrying in #{AUTO_RETRY_SLEEP_TIME} seconds...")
sleep(AUTO_RETRY_SLEEP_TIME)
Contributor

I wonder if we wouldn't benefit from a linear or exponential wait time between attempts instead of a fixed time?

e.g. sleep(10 + @auto_retry_attempt_counter*5) or something?
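
To make the suggested linear schedule concrete, here is a small sketch (the helper name is hypothetical, chosen just for this illustration):

```ruby
# Hypothetical helper implementing the linear backoff suggested above:
# a 10-second base plus 5 extra seconds per attempt already made.
def backoff_delay(attempt_counter)
  10 + attempt_counter * 5
end

# First five retry delays: 15, 20, 25, 30, 35 seconds.
delays = (1..5).map { |n| backoff_delay(n) }
```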

Contributor Author

I've considered this one, but in my experience, it's really not necessary. I think the server just checks the number of requests in a certain amount of time, but I could be wrong.

For what it's worth, I actually wanted to go with something shorter, but since this runs in CI, it doesn't really matter if it takes 1 or 2 minutes longer. Funnily enough, I tried to trigger the 429 error today for quite some time and couldn't get it even once.

Happy to apply the suggestion if you feel there is enough benefit.

@mokagio
Contributor

mokagio commented May 19, 2023

Thank you for doing this @oguzkocer ! I think about auto-retry literally every time I run the lane, but never made the time for it.

@oguzkocer oguzkocer requested a review from a team May 19, 2023 04:27
@target_files = target_files
@auto_retry = auto_retry
@alternates = {}
@auto_retry_attempt_counter = 0
Contributor

It just occurred to me that the downloader instance is reused at the call site for each locale to be downloaded.

This means that if you only set @auto_retry_attempt_counter in initialize, the counter will be shared across all usages of the downloader, and thus across all locales you call download(…) for.

Was it intentional that MAX_AUTO_RETRY_ATTEMPTS be read as "max 30 retries in total", as opposed to "max 30 retries per locale"? If not, I think we need to make sure @auto_retry_attempt_counter is reset to 0 every time the caller calls download(…) for a new locale.

For example, maybe you could move the current 3-LoC implementation of download to a private try_download method, then change download(…) to reset @auto_retry_attempt_counter to 0 and call try_download(…)?
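
The refactor suggested above could be sketched like this (hypothetical: try_download here is a stand-in stub, not the real download logic):

```ruby
# Hypothetical sketch of the suggested refactor: reset the shared
# counter at the start of each public download call, and keep the
# retry logic in a private try_download helper.
class MetadataDownloader
  attr_reader :auto_retry_attempt_counter

  def initialize
    @auto_retry_attempt_counter = 0
  end

  def download(locale)
    @auto_retry_attempt_counter = 0 # fresh retry budget per locale
    try_download(locale)
  end

  private

  # Stand-in for the real download + retry logic, which would bump
  # the counter on every 429 it encounters.
  def try_download(_locale)
    @auto_retry_attempt_counter += 1
  end
end
```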

Contributor Author

This was intentional. The only goal I had with this attempt counter is to make sure we don't get into an infinite loop. I don't think we'll ever hit this limit unless there is an issue with the server or CI, in which case I thought it'd be best to stop and let the release manager handle it, either manually or by retrying once the issue is resolved.

Contributor

SGTM 👍

@oguzkocer oguzkocer merged commit f1e5479 into trunk May 19, 2023
@oguzkocer oguzkocer deleted the auto-retry-gp_downloadmetadata branch May 19, 2023 09:29