AWS: Fix S3InputStream retry policy #11335
@amogh-jahagirdar @jackye1995 @danielcweeks PTAL. Thanks!
amogh-jahagirdar
left a comment
Thanks for fixing this, @edgarRd; this was a miss on my part on the original PR. Yeah, we need to be using `onRetry` to trigger the input stream reset, not `onFailure`. I had some minor comments on the log message; it would be great to fix them so we can get this into 1.7 correctly.
```java
void resetForRetry() throws IOException {
  resetForRetryCounter.incrementAndGet();
  super.resetForRetry();
}
```
Thanks for fixing this; it was crucial for verifying that the retry behavior actually resets the input stream. It seems we were just getting lucky with the tests before, since they only counted the number of attempts, not what actually happened in each attempt.
Yes. That's correct.
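The testing gap discussed above can be reproduced in a small, self-contained sketch. Everything in it is hypothetical and written only for illustration: `MiniRetry` is a toy retry helper, not the Failsafe API, that merely mimics the documented timing of the two listeners (an `onRetry` hook runs right before each retry; an `onFailure` hook runs once, after the final attempt has failed), and `ScriptedStream` is a test double whose first reads fail whether or not it was reset. With such a double, an attempt-counting assertion passes for both wirings; only a reset counter in the spirit of `resetForRetryCounter` tells them apart.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical toy retry helper (NOT the Failsafe API). It only mimics when
// the listeners fire: onRetry before each retry, onFailure after the last failure.
final class MiniRetry {
  private final int maxRetries;
  private Runnable onRetry = () -> {};
  private Runnable onFailure = () -> {};

  MiniRetry(int maxRetries) {
    this.maxRetries = maxRetries;
  }

  MiniRetry onRetry(Runnable listener) {
    this.onRetry = listener;
    return this;
  }

  MiniRetry onFailure(Runnable listener) {
    this.onFailure = listener;
    return this;
  }

  <T> T get(Callable<T> action) throws Exception {
    Exception last = null;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      if (attempt > 0) {
        onRetry.run(); // right before each retry attempt
      }
      try {
        return action.call();
      } catch (Exception e) {
        last = e;
      }
    }
    onFailure.run(); // once, after every attempt has failed
    throw last;
  }
}

// Hypothetical test double: the first `failures` reads fail whether or not the
// stream was reset, so the read "recovers" on its own after enough attempts.
final class ScriptedStream {
  final AtomicInteger attempts = new AtomicInteger();
  final AtomicInteger resetForRetryCounter = new AtomicInteger();
  private final int failures;

  ScriptedStream(int failures) {
    this.failures = failures;
  }

  int read() throws IOException {
    if (attempts.incrementAndGet() <= failures) {
      throw new IOException("Connection reset");
    }
    return 42;
  }

  void resetForRetry() {
    resetForRetryCounter.incrementAndGet();
  }
}

final class WiringComparison {
  // Returns {attempts, resets} for a read that fails twice and then succeeds.
  static int[] run(boolean resetOnRetry) {
    ScriptedStream stream = new ScriptedStream(2);
    MiniRetry retry = new MiniRetry(3);
    if (resetOnRetry) {
      retry.onRetry(stream::resetForRetry); // fixed wiring
    } else {
      retry.onFailure(stream::resetForRetry); // original wiring
    }
    try {
      retry.get(stream::read);
    } catch (Exception e) {
      throw new IllegalStateException("read failed after retries", e);
    }
    return new int[] {stream.attempts.get(), stream.resetForRetryCounter.get()};
  }
}
```

Both `WiringComparison.run(false)` and `WiringComparison.run(true)` report three attempts, so a test that only counts attempts cannot distinguish them. The reset counter can: the `onRetry` wiring reports two resets, while the `onFailure` wiring reports zero, because the retries eventually succeed and the failure listener never fires.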
@amogh-jahagirdar I've addressed the comment. PTAL. Thanks!
amogh-jahagirdar
left a comment
Thanks @edgarRd, really appreciate the fix here! I'll keep it open for a little bit in case anyone else had any comments.
singhpk234
left a comment
LGTM as well, Thanks @edgarRd !
Thanks @edgarRd, I'll go ahead and merge. Thank you @Parth-Brahmbhatt @singhpk234 for reviewing!
Thanks for the reviews, @amogh-jahagirdar, @singhpk234 and @Parth-Brahmbhatt!
The retry policy for `S3InputStream` reads introduced in c0d73f4 (#10433) does not actually re-open the stream on each retry attempt, because it uses `onFailure`, which, as per the documentation, is triggered only after the whole (retry) policy has failed completely and was unable to produce a successful result, i.e. after all retries have been exhausted and failed. With the existing implementation, once a stream fails with something like `SSLException` due to a socket connection reset, the retries keep failing (because the stream is never actually re-opened), so the retry policy does not help. Re-opening the stream from `onFailure` does not work either, because by the time `openStream` is called, the retry policy failure has already been propagated, failing the read operation.

The tests also did not check that the retry policy actually called `openStream(true)` to reset the stream; they only checked that the calls went through the retry policy a given number of times, generally succeeding on the last attempt.

This PR fixes the problem by using `onRetry` instead, which, as per the documentation, is called right before a retry is attempted, to re-open the stream before the read is attempted again. It also fixes the tests so that they actually check that the stream is reset (re-opened).
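To illustrate the production failure mode, here is a minimal, hypothetical sketch; none of these names come from Iceberg. `MiniRetry` is a toy helper, not the Failsafe API, that only mimics the documented listener timing: an `onRetry` hook runs right before each retry, and an `onFailure` hook runs once after the final attempt fails. `BrokenUntilReopened` stands in for a stream that, after a connection reset, keeps failing every read until it is re-opened, similar to the `SSLException` scenario described above.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical toy retry helper (NOT the Failsafe API): onRetry runs before
// each retry, onFailure runs once after the final attempt has failed.
final class MiniRetry {
  private final int maxRetries;
  private Runnable onRetry = () -> {};
  private Runnable onFailure = () -> {};

  MiniRetry(int maxRetries) {
    this.maxRetries = maxRetries;
  }

  MiniRetry onRetry(Runnable listener) {
    this.onRetry = listener;
    return this;
  }

  MiniRetry onFailure(Runnable listener) {
    this.onFailure = listener;
    return this;
  }

  <T> T get(Callable<T> action) throws Exception {
    Exception last = null;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      if (attempt > 0) {
        onRetry.run();
      }
      try {
        return action.call();
      } catch (Exception e) {
        last = e;
      }
    }
    onFailure.run();
    throw last;
  }
}

// Hypothetical stream: after a simulated connection reset, every read fails
// until reopen() is called.
final class BrokenUntilReopened {
  private boolean broken = true;
  int resets = 0;

  int read() throws IOException {
    if (broken) {
      throw new IOException("Connection reset");
    }
    return 42;
  }

  void reopen() {
    resets++;
    broken = false;
  }
}

final class ReopenDemo {
  // Describes the outcome of a read with 3 retries under the chosen wiring.
  static String run(boolean reopenOnRetry) {
    BrokenUntilReopened stream = new BrokenUntilReopened();
    MiniRetry retry = new MiniRetry(3);
    if (reopenOnRetry) {
      retry.onRetry(stream::reopen); // fixed: re-open before each retry
    } else {
      retry.onFailure(stream::reopen); // original: re-open after retries are exhausted
    }
    String outcome;
    try {
      outcome = "ok:" + retry.get(stream::read);
    } catch (Exception e) {
      outcome = "failed";
    }
    return outcome + ", resets=" + stream.resets;
  }
}
```

In both wirings the stream is re-opened exactly once. With the `onFailure` wiring that happens only after all four attempts have failed, so `ReopenDemo.run(false)` returns `"failed, resets=1"`; with the `onRetry` wiring the re-open happens before the second attempt, which succeeds, so `ReopenDemo.run(true)` returns `"ok:42, resets=1"`.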