out_s3: implement retry_limit parameter #10825
Conversation
Walkthrough

Replaces the global MAX_UPLOAD_ERRORS with a per-instance retry_limit.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor FLB as S3 Output Instance
    participant CH as Chunk / Multipart Upload
    participant S3 as AWS S3
    participant INS as Instance (retry_limit)
    FLB->>CH: initiate upload/flush/part
    CH->>S3: send request
    S3-->>CH: success / error
    alt Success
        CH-->>FLB: mark complete
    else Error
        CH->>INS: increment failure counter
        INS->>INS: compare failures >= retry_limit
        alt failures < retry_limit
            INS-->>CH: schedule retry
        else failures >= retry_limit
            INS-->>CH: inactivate/discard upload
            CH-->>FLB: log exceeded retry_limit
        end
    end
```
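For readers skimming the diagram, the error branch reduces to a counter compared against retry_limit. The following is a tiny, self-contained C sketch of that flow; all names and values here are illustrative stand-ins, not the plugin's actual API:

```c
#include <stdio.h>

int main(void)
{
    int retry_limit = 3;   /* would come from ctx->ins->retry_limit */
    int failures = 0;

    for (;;) {
        /* pretend every send fails, e.g. the bucket does not exist */
        failures++;
        if (failures >= retry_limit) {
            /* diagram: inactivate/discard upload, log exceeded retry_limit */
            printf("failed %d times, exceeded retry_limit, discarding\n",
                   failures);
            break;
        }
        /* diagram: failures < retry_limit, schedule retry */
        printf("scheduling retry %d/%d\n", failures, retry_limit);
    }
    return 0;
}
```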
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
plugins/out_s3/s3.c (3)
1873-1881: Gate retry logic by FLB_OUT_RETRY_UNLIMITED

Wrap the existing retry-limit check so `FLB_OUT_RETRY_UNLIMITED` doesn't prematurely stop retries (otherwise any non-negative counter >= -1 is always true):

```diff
--- a/plugins/out_s3/s3.c
+++ b/plugins/out_s3/s3.c
@@ -1873,7 +1873,9 @@
     upload_contents->retry_counter++;
-    if (upload_contents->retry_counter >= ctx->ins->retry_limit) {
+    if (ctx->ins->retry_limit != FLB_OUT_RETRY_UNLIMITED &&
+        upload_contents->retry_counter >= ctx->ins->retry_limit) {
         flb_plg_warn(ctx->ins, "Chunk file failed to send %d times, will not "
                      "retry", upload_contents->retry_counter);
         s3_store_file_inactive(ctx, upload_contents->upload_file);
```

This change aligns with other plugins' unlimited-retry gating and restores true unlimited behavior.
3291-3296: Gate the `complete_errors` check by FLB_OUT_RETRY_UNLIMITED

Wrap the existing threshold test so it's only applied when `retry_limit` isn't unlimited:

```diff
-    if (m_upload->complete_errors >= ctx->ins->retry_limit) {
+    if (ctx->ins->retry_limit != FLB_OUT_RETRY_UNLIMITED &&
+        m_upload->complete_errors >= ctx->ins->retry_limit) {
```
1628-1633: Gate the multipart upload error threshold when `retry_limit` is finite

Modify plugins/out_s3/s3.c:1628-1633 as follows:

```diff
-    if (tmp_upload->upload_errors >= ctx->ins->retry_limit) {
+    if (ctx->ins->retry_limit != FLB_OUT_RETRY_UNLIMITED &&
+        tmp_upload->upload_errors >= ctx->ins->retry_limit) {
```

This skips the error threshold when `retry_limit` is `FLB_OUT_RETRY_UNLIMITED` (-1), avoiding immediate completion for unlimited retries. (fossies.org)
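The three fixes above share one pattern: a finite threshold check must be skipped when the limit is unlimited. Below is a minimal, compilable sketch of that gate; the helper name is made up for illustration, and `FLB_OUT_RETRY_UNLIMITED` mirrors Fluent Bit core's sentinel value of -1:

```c
#include <stdio.h>

#define FLB_OUT_RETRY_UNLIMITED -1  /* mirrors Fluent Bit core's value */

/* Hypothetical helper showing the gating pattern requested above:
 * the failure threshold only applies when the limit is finite. */
static int retry_limit_reached(int retry_limit, int failures)
{
    if (retry_limit == FLB_OUT_RETRY_UNLIMITED) {
        return 0;  /* unlimited: never give up */
    }
    return failures >= retry_limit;
}

int main(void)
{
    printf("%d\n", retry_limit_reached(FLB_OUT_RETRY_UNLIMITED, 100)); /* 0 */
    printf("%d\n", retry_limit_reached(5, 5));                         /* 1 */
    return 0;
}
```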
🧹 Nitpick comments (1)
plugins/out_s3/s3.h (1)
59-61: Clarify that retry_limit comes from the output instance (Retry_Limit).

To avoid confusion with plugin-local fields, consider wording this note to reference the generic output Retry_Limit (`ins->retry_limit`) that Fluent Bit core already provides, since that's what the code uses.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
- plugins/out_s3/s3.c (7 hunks)
- plugins/out_s3/s3.h (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
plugins/out_s3/s3.c (1)
src/flb_fstore.c (1)
flb_fstore_file_inactive (218-235)
🔇 Additional comments (1)
plugins/out_s3/s3.h (1)
100-103: Comment tweak looks fine.
Basically it looks good, but the output's retry mechanism allows infinite retries.
So it shouldn't be used directly; we should keep following the previous behavior, especially by refusing to run with infinite retries.
I wanted to prevent that relatively dangerous behavior with:
```diff
diff --git a/plugins/out_s3/s3.c b/plugins/out_s3/s3.c
@@
+static inline int s3_retry_cap(struct flb_s3 *ctx)
+{
+    int lim = ctx->ins->retry_limit;
+    if (lim < 0) {
+        /* Avoid infinite internal loops on non-existent buckets, etc. */
+        lim = 5; /* keep historical behavior */
+    }
+    return lim;
+}
@@ -1336,11 +1336,12 @@ static int put_all_chunks(struct flb_s3 *ctx)
-        if (chunk->failures >= ctx->ins->retry_limit) {
+        if (chunk->failures >= s3_retry_cap(ctx)) {
             flb_plg_warn(ctx->ins,
-                         "Chunk for tag %s failed to send %i times, "
-                         "will not retry",
-                         (char *) fsf->meta_buf, ctx->ins->retry_limit);
+                         "Chunk for tag %s failed to send %d/%d times, will not retry",
+                         (char *) fsf->meta_buf,
+                         chunk->failures, s3_retry_cap(ctx));
             flb_fstore_file_inactive(ctx->fs, fsf);
             continue;
         }
@@ -1625,7 +1626,7 @@ static struct multipart_upload *get_upload(struct flb_s3 *ctx,
-        if (tmp_upload->upload_errors >= ctx->ins->retry_limit) {
+        if (tmp_upload->upload_errors >= s3_retry_cap(ctx)) {
             tmp_upload->upload_state = MULTIPART_UPLOAD_STATE_COMPLETE_IN_PROGRESS;
             flb_plg_error(ctx->ins, "Upload for %s has reached max upload errors",
                           tmp_upload->s3_key);
         }
@@ -1871,7 +1872,7 @@ static void s3_upload_queue(struct flb_config *config, void *out_context)
         /* If retry limit was reached, discard file and remove file from queue */
         upload_contents->retry_counter++;
-        if (upload_contents->retry_counter >= ctx->ins->retry_limit) {
+        if (upload_contents->retry_counter >= s3_retry_cap(ctx)) {
             flb_plg_warn(ctx->ins, "Chunk file failed to send %d/%d times, will not retry",
-                         upload_contents->retry_counter, ctx->ins->retry_limit);
+                         upload_contents->retry_counter, s3_retry_cap(ctx));
             s3_store_file_inactive(ctx, upload_contents->upload_file);
             mk_list_del(&upload_contents->_head);
             continue;
         }
@@ -3270,6 +3271,10 @@ static void cb_s3_upload(struct flb_config *config, void *data)
         ret = send_chunk(...);
         if (ret != FLB_OK) {
             flb_plg_error(ctx->ins, "Could not send chunk with tag %s",
                           (char *) fsf->meta_buf);
-            if (chunk->failures >= ctx->ins->retry_limit) {
+            if (chunk->failures >= s3_retry_cap(ctx)) {
                 flb_plg_warn(ctx->ins,
-                             "Chunk for tag %s failed to send %i times, will not retry",
-                             (char *) fsf->meta_buf, ctx->ins->retry_limit);
+                             "Chunk for tag %s failed to send %d/%d times, will not retry",
+                             (char *) fsf->meta_buf, chunk->failures, s3_retry_cap(ctx));
                 flb_fstore_file_inactive(ctx->fs, fsf);
                 continue;
             }
         }
@@ -3288,7 +3293,7 @@ static void cb_s3_upload(struct flb_config *config, void *data)
-        if (m_upload->complete_errors >= ctx->ins->retry_limit) {
+        if (m_upload->complete_errors >= s3_retry_cap(ctx)) {
             flb_plg_error(ctx->ins,
                           "Upload for %s has reached max completion errors, plugin will give up",
                           m_upload->s3_key);
         }
@@ -3789,10 +3794,10 @@ static void cb_s3_flush(...)
-        if (upload_file != NULL && upload_file->failures >= ctx->ins->retry_limit) {
-            flb_plg_warn(ctx->ins, "File with tag %s failed to send %d times, will not retry",
-                         event_chunk->tag, ctx->ins->retry_limit);
+        if (upload_file != NULL && upload_file->failures >= s3_retry_cap(ctx)) {
+            flb_plg_warn(ctx->ins, "File with tag %s failed to send %d/%d times, will not retry",
+                         event_chunk->tag, upload_file->failures, s3_retry_cap(ctx));
             s3_store_file_inactive(ctx, upload_file);
             upload_file = NULL;
         }
```

Alternatively, forcibly rounding a negative retry limit up to 5 in `static int cb_s3_init(..)` would be better.
What do you think about closing off this possibility?
@cosmo0920 I agree. I kept the infinite retry feature because I thought it was expected to be available here. Thanks a lot for the suggestions.
Let me try the cb_s3_init changes and check.
Thanks for your response. I'm still considering where in Fluent Bit would be the appropriate place to check this condition.
The patches look good. Now we need to squash them into one commit.
retry_limit parameter is not honored and is set to 5. This feature provides dynamic retry_limit based on configuration for the out_s3 plugin.

Signed-off-by: usharma <ujjwal_sharma@intuit.com>

out_s3: fix retry_limit as it is part of default instance

Fixed the retry_limit as it was not needed and it is part of the default instance.

Signed-off-by: usharma <ujjwal_sharma@intuit.com>

out_s3: fix infinite retry_limit in cb_s3_init

Fixes the infinite retry limit to MAX_UPLOAD_ERRORS (old behaviour) in cb_s3_init. If a negative retry limit is set, the retry limit is set to MAX_UPLOAD_ERRORS.

Signed-off-by: usharma <ujjwal_sharma@intuit.com>

out_s3: fix infinite retry_limit in cb_s3_init

Fixes the infinite retry limit to MAX_UPLOAD_ERRORS (old behaviour) in cb_s3_init. If a negative retry limit is set, the retry limit is set to MAX_UPLOAD_ERRORS.

Signed-off-by: usharma <ujjwal_sharma@intuit.com>
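For reference, the approach the commit message describes (clamping a negative, i.e. unlimited, retry limit back to the historical MAX_UPLOAD_ERRORS at init time) might look roughly like the sketch below. This is a hedged illustration, not the actual patch; the constants are stand-ins for the values Fluent Bit and the plugin define:

```c
#include <stdio.h>

/* Stand-in values: Fluent Bit core defines FLB_OUT_RETRY_UNLIMITED as -1,
 * and the S3 plugin historically capped retries at MAX_UPLOAD_ERRORS (5). */
#define FLB_OUT_RETRY_UNLIMITED -1
#define MAX_UPLOAD_ERRORS        5

/* Sketch of the init-time clamp: any negative (unlimited) limit
 * falls back to the old hardcoded cap. */
static int clamp_retry_limit(int configured)
{
    if (configured < 0) {
        return MAX_UPLOAD_ERRORS; /* keep historical behaviour */
    }
    return configured;
}

int main(void)
{
    printf("%d\n", clamp_retry_limit(FLB_OUT_RETRY_UNLIMITED)); /* prints 5 */
    printf("%d\n", clamp_retry_limit(3));                       /* prints 3 */
    return 0;
}
```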
Force-pushed from d026f5f to a9f62f2.
Hi @cosmo0920, apologies for the delay on the PR, and thanks for the feedback. I have squashed the commits into one. Let me know if something else is needed.
Note to myself: in order to approve PRs we need to check that the commits are clean. In this case there is a bad squash; I know these errors happen, and I will check how to prevent this in the CI.
The retry_limit parameter is not honored and is hardcoded to 5. This change makes retry_limit configurable for the out_s3 plugin.
It fixes the infinite retry loop when buffers are cleared and the bucket name does not exist, and it honors retry_limit when users provide it at the out_s3 plugin level.
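For illustration, a configuration exercising this change could look like the following. The bucket, region, and limit values are placeholders; Retry_Limit is Fluent Bit's generic per-output property, which is what populates the `ins->retry_limit` field the plugin reads:

```ini
# Hypothetical example: cap S3 upload retries at 3 for this output.
[OUTPUT]
    Name         s3
    Match        *
    bucket       my-example-bucket
    region       us-east-1
    Retry_Limit  3
```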
Enter [N/A] in the box if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:
flb.log
This run is for the scenario where we do not push the records.
valgrind_report.txt
This report is justified: we don't want to delete the file, so we use `flb_fstore_file_inactive(ctx->fs, fsf)` instead of `flb_fstore_destroy(ctx->fs)`. It correlates with the current master version:
valgrind_mem_check_4.0.8.txt
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
Use the `ok-package-test` label to test for all targets (requires a maintainer to do).

Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Additional tests run, with logs:

- Buffers cleared every time, bucket does not exist
- Buffers cleared every time, bucket exists
Summary by CodeRabbit
New Features
Documentation
fixes: #10819