Skip to content

Commit a3ecbcc

Browse files
committed
Fix 429 retry-after handling of the LFS batch API endpoint
While the 429 retry-after HTTP error handling for the storage endpoint works just as expected, the same was not true for batch API endpoint. In case of a 429 error, the call to the batch API endpoint would be retried as well as the `retry-after` HTTP header honoured correctly, but the LFS command would still exit with a generic `LFS: Error` message. This was caused by the fact, that while the retry-able error from the `Batch` method was correctly overwritten by `nil` in case it of a retry, it would finally again be converted into a retry-able error and hence be no longer `nil`. This would lead to a bubble-up of the original 429 retry-able HTTP error to the final error check, altough the exact operation was successfully retried. Furthermore, the overwrite with `nil` during correct handling of the first object caused all subsequent objects to fail and hence never being enqueued for retrial. This was solved by removing the immediate overwrite by `nil` in case of a retry-able-later error and doing the final error conversion based on the whole batch result instead of a single object result.
1 parent 32f571d commit a3ecbcc

File tree

1 file changed

+10
-2
lines changed

1 file changed

+10
-2
lines changed

tq/transfer_queue.go

Lines changed: 10 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -557,22 +557,30 @@ func (q *TransferQueue) enqueueAndCollectRetriesFor(batch batch) (batch, error)
557557
var err error
558558
bRes, err = Batch(q.manifest, q.direction, q.remote, q.ref, batch.ToTransfers())
559559
if err != nil {
560+
var hasNonScheduledErrors = false
560561
// If there was an error making the batch API call, mark all of
561562
// the objects for retry, and return them along with the error
562563
// that was encountered. If any of the objects couldn't be
563564
// retried, they will be marked as failed.
564565
for _, t := range batch {
565566
if q.canRetryObject(t.Oid, err) {
567+
hasNonScheduledErrors = true
566568
enqueueRetry(t, err, nil)
567569
} else if readyTime, canRetry := q.canRetryObjectLater(t.Oid, err); canRetry {
568-
err = nil
569570
enqueueRetry(t, err, &readyTime)
570571
} else {
572+
hasNonScheduledErrors = true
571573
q.wait.Done()
572574
}
573575
}
574576

575-
return next, errors.NewRetriableError(err)
577+
// Only return error and mark operation as failure if at least one object
578+
// was not enqueued for retrial at a later point.
579+
if hasNonScheduledErrors {
580+
return next, errors.NewRetriableError(err)
581+
} else {
582+
return next, nil
583+
}
576584
}
577585
}
578586

0 commit comments

Comments
 (0)