Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #433 +/- ##
==========================================
+ Coverage 67.88% 68.56% +0.68%
==========================================
Files 193 193
Lines 24643 25086 +443
==========================================
+ Hits 16728 17201 +473
+ Misses 7915 7885 -30
🚀 New features to boost your workflow:
|
4e750da to
3c993d6
Compare
7a45559 to
87f7441
Compare
87f7441 to
7bca31b
Compare
9b363c5 to
f0dd686
Compare
pierotibou
left a comment
There was a problem hiding this comment.
Thanks a lot. Nice and simple solution!
| pub delay_ms: Duration, | ||
| /// The type of backoff to use for the delay between retries. | ||
| pub backoff_type: RetryBackoffType, | ||
| /// An optional jitter to add randomness to the delay. |
There was a problem hiding this comment.
Do we really needed the jitter? I don't mind leaving it but I'm a big fan of simplicity and implementing only what we need at a given time (as I saw too many times "vital" features nobody was using :p)
There was a problem hiding this comment.
I can't say with confidence that we need jitter. Perhaps someone from the trace-agent team would have a better sense of how important it is. It was a suggestion made in a libdatadog weekly sync and was simple enough to implement.
There is nothing blocking the sidecar from continually flushing during connectivity issues, so it is conceivable that there are several SendData objects attempting retries concurrently and having jitter in place may help with this.
I'm happy to defer to others on this and remove it if we think it's overkill.
There was a problem hiding this comment.
You can leave it for now anyway and we can reconsider later.
About the trace agent, it has a backoff mechanism, not yet implemented by any tracer and we have a ticket for it
| retry_strategy.delay(2).await; | ||
| let elapsed = start.elapsed(); | ||
|
|
||
| // For the Exponential strategy, the delay for the second attempt should be double the |
There was a problem hiding this comment.
nitpicking - we should maybe add one other iteration as double is also the case for the double strategy, to make sure we don't mix up anything.
| // result. Otherwise, delay and try again. | ||
| match &result { | ||
| Ok(response) => { | ||
| if response.status().is_client_error() || response.status().is_server_error() { |
There was a problem hiding this comment.
As you asked in standup, this is good for now. We have one story - APMSP-1054 - to improve this down the road (as nobody implemented it yet)
bantonsson
left a comment
There was a problem hiding this comment.
Some general perf comments. I'll continue looking at it.
|
|
||
| if target.api_key.is_some() { | ||
| req = req.header("Content-type", "application/x-protobuf"); | ||
| let agent_payload = construct_agent_payload(self.tracer_payloads.clone()); |
There was a problem hiding this comment.
I can't figure out a way to avoid this clone 😢 The raw mapping of proto definitions to rust structs make this messy.
Adding retry logic as-is would be a bit problematic. Hyper doesn't allow the use of references and we can't clone request objects so we need to refactor where we construct the objects to support retries.
Add RetryStrategy to send_data to support retrying requests.
f0dd686 to
d37f762
Compare
1 similar comment
d37f762 to
7b7225c
Compare
bantonsson
left a comment
There was a problem hiding this comment.
Thanks for the great tests. This looks good, but I'm confused by the RetryBackoffType naming.
What does this PR do?
Introduces retry logic to send traces in trace_utils::SendData. The retry strategy is configurable via the
RetryStrategystruct.SendDatawill use a Default strategy, but callers have the option to configure the strategy via the SendData.set_retry_strategy() function. TraceExporter, Mini-agent, and SIdecar will have the default retry strategy behavior.Motivation
In order to turn the sidecar on by default for PHP there needs to be support for retries when flushing traces fail. Adding the retry logic to trace_utils made the most sense (and can be used outside of the sidecar)
Additional Notes
The
RetryStrategystruct should live in another file. And some of the test helper functions should move to a common place. But, there are other PRs in flight that will be modifyingsend_data.rs. So we can do that work in a follow-up PR to minimize disruption. TODO comments with Jira IDs have been added to the code in these spots.How to test the change?
Unit tests have been added. Longer-term we may want to consider using the test-agent.