Skip to content

Optionally redrive failed requests#19

Merged
astuyve merged 4 commits intomainfrom
aj/retryable-flush-from-flusher
May 22, 2025
Merged

Optionally redrive failed requests#19
astuyve merged 4 commits intomainfrom
aj/retryable-flush-from-flusher

Conversation

@astuyve
Copy link
Copy Markdown
Contributor

@astuyve astuyve commented May 20, 2025

What does this PR do?

Modifies the trace flusher and metric flusher to return vecs of data which can be resubmitted in the event of an intermittent failure due to the start/stop behavior of serverless functions.

Motivation

Allows us to redrive data outside of a typical retry loop

Additional Notes

Describe how to test/QA your changes

@astuyve astuyve requested a review from Copilot May 20, 2025 17:21
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces changes to enable optionally redriving failed metrics and trace requests in serverless environments by returning the unsent data from the flush operations for retry.

  • Updated the dogstatsd flusher to return failed Series and SketchPayload vectors via a new flush_with_retries function.
  • Modified the trace flusher to accept and return failed trace data for potential retransmission.
  • Added Clone derivations in the Datadog-related types to support safe data retries.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
crates/dogstatsd/src/flusher.rs Adjusted flush logic to optionally return failed metrics for retry and improved error logging.
crates/dogstatsd/src/datadog.rs Added Clone trait derivations to resource and metric types to support retry functionality.
crates/datadog-trace-agent/src/trace_flusher.rs Updated send and flush to return unsent trace data for redriving failed requests.

// Return the failed metrics for potential retry
Some((series_failed, sketches_failed))
} else {
debug!("Some metrics were not sent but no errors occurred");
Copy link

Copilot AI May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Consider adding a comment explaining why failed batches that did not encounter a shipping error are not returned for retry. This clarification would help future maintainers understand the intended behavior.

Copilot uses AI. Check for mistakes.
Comment thread crates/datadog-trace-agent/src/trace_flusher.rs
astuyve and others added 2 commits May 20, 2025 13:23
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

async fn should_try_next_batch(resp: Result<Response, ShippingError>) -> bool {
/// Returns a tuple (continue_to_next_batch, should_retry_this_batch)
async fn should_try_next_batch(resp: Result<Response, ShippingError>) -> (bool, bool) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this probably should just return an enum instead of a tuple of bools

.last_result
{
Ok(_) => debug!("Successfully flushed traces"),
Err(e) => {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to only clone it after it fails here? Or is coalesce directly taking ownership of it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah unfortunately coalesce_and_send takes ownership. This can be improved inside libdatadog

Copy link
Copy Markdown

@hghotra hghotra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏼

@astuyve astuyve merged commit b1583da into main May 22, 2025
25 checks passed
@astuyve astuyve deleted the aj/retryable-flush-from-flusher branch May 22, 2025 16:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants