feat: [Trace Stats] Move stats generation after trace obfuscation (#855)
Conversation
```rust
if compute_trace_stats {
    if let Err(err) = stats_sender.send(&processed_traces) {
        error!("OTLP | Error sending traces to the stats concentrator: {err}");
        return (
            StatusCode::INTERNAL_SERVER_ERROR,
            json!({ "message": format!("Error sending traces to the stats concentrator: {err}") }).to_string()
        ).into_response();
```
Core change: Add a stats generation hook in OTLP agent.
Same comment as for traces
```rust
if config.compute_trace_stats {
    if let Err(err) = stats_sender.send(&traces) {
        return error_response(
            StatusCode::INTERNAL_SERVER_ERROR,
            format!("Error sending stats to the stats aggregator: {err}"),
        );
    }
}
```
Moving this into `send_processed_traces()` below
```rust
if config.compute_trace_stats {
    if let Err(err) = self.stats_sender.send(&processed_traces) {
        error!("TRACE_PROCESSOR | Error sending traces to the stats concentrator: {err}");
        return Err(SendingTraceProcessorError::SendStatsError(err));
    }
}
```
Core change: this is moved into `send_processed_traces()` after obfuscation is done.
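A minimal sketch of the reordering (types and names here are illustrative stand-ins, not the crate's actual API): obfuscation runs first, and stats are generated from the already-obfuscated payload rather than from raw traces.

```rust
// Hypothetical trace type standing in for the real payload.
#[derive(Clone, Debug)]
struct Trace(String);

// Stand-in for the real obfuscation pass (e.g. scrubbing sensitive literals).
fn obfuscate(traces: Vec<Trace>) -> Vec<Trace> {
    traces
        .into_iter()
        .map(|t| Trace(t.0.replace("secret", "?")))
        .collect()
}

// Stand-in for the stats concentrator: just counts spans here.
fn generate_stats(traces: &[Trace]) -> usize {
    traces.len()
}

// Sketch of the new ordering: stats are computed *after* obfuscation,
// so they never observe un-obfuscated values.
fn send_processed_traces(raw: Vec<Trace>) -> (Vec<Trace>, usize) {
    let processed = obfuscate(raw);
    let stats = generate_stats(&processed);
    (processed, stats)
}

fn main() {
    let (processed, stats) = send_processed_traces(vec![Trace("SELECT secret".to_string())]);
    assert_eq!(processed[0].0, "SELECT ?");
    assert_eq!(stats, 1);
    println!("stats generated from obfuscated traces");
}
```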
Should we allow processing to continue even if trace stats forwarding failed? Wouldn't it be better to keep forwarding at least some data?
I think the fundamental question is whether trace stats are as important as the traces themselves. If yes, then when stats fail to send, we should return an error and let the caller handle that case; otherwise we can return Ok. It seems you think stats are less important than traces, so let me swallow the error then.
As written, I think this would directly impede sending traces. I wouldn't expect this error to happen often, but ideally we'd like to have some data as opposed to none.
I'm not sure if the tracer would send the data back if we respond with an error; is that the case?
Because this error, in theory, is not related to any Datadog logic, just a failed channel forwarding.
> we'd like to have some data as opposed to none

What do you mean by "have some data"? Do you mean traces should still be sent to Datadog even if stats fail to send?

> I'm not sure if the tracer would send the data back if we respond with an error, is that the case?

What do you mean by "send the data back"?
I'm not sure how tracers work, but if we think this error is critical, the extension should surface it to the caller.
> What do you mean by "have some data"? Do you mean traces should still be sent to Datadog even if stats fail to send?

Correct, as before. WDYT?

> What do you mean by "send the data back"?

As in, if you reply back to the tracer with a 400/500 status, will it re-send the tracer payload we failed to process?

> I'm not sure how tracers work, but if we think this error is critical, the extension should surface it to the caller.

I agree, but in this case it's not a processing error; it's more of a critical error in the extension, right? So we could say it's not the tracer's fault, but ours(?)
> As in, if you reply back to the tracer with a 400/500 status, will it re-send the tracer payload we failed to process?
I spot checked dd-trace-py. It doesn't retry as long as it gets a valid response, even if the status is 400/500.
I'm okay either way. I pushed a commit to swallow and log the error. Could you review it?
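A minimal sketch of the swallow-and-log behavior using a plain `std::sync::mpsc` channel (the real code uses the extension's own sender and the `error!` macro; the types here are stand-ins):

```rust
use std::sync::mpsc;

// Hypothetical stand-in for the extension's processed-trace payload.
#[derive(Clone, Debug)]
struct ProcessedTraces(Vec<String>);

// Swallow-and-log: a send failure on the stats channel is logged
// but does NOT abort trace forwarding.
fn send_stats_best_effort(
    stats_sender: &mpsc::Sender<ProcessedTraces>,
    processed_traces: &ProcessedTraces,
) {
    if let Err(err) = stats_sender.send(processed_traces.clone()) {
        // The real code would use the `error!` logging macro here.
        eprintln!("TRACE_PROCESSOR | Error sending traces to the stats concentrator: {err}");
        // No early return: traces are still forwarded to Datadog.
    }
}

fn main() {
    let traces = ProcessedTraces(vec!["span".to_string()]);

    // Dropping the receiver makes every send fail, simulating a dead concentrator.
    let (tx, rx) = mpsc::channel::<ProcessedTraces>();
    drop(rx);

    send_stats_best_effort(&tx, &traces); // logs the error, does not panic
    println!("traces still forwarded: {}", traces.0.len());
}
```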
```rust
#[allow(clippy::too_many_arguments)]
#[async_trait]
pub trait TraceProcessor {
    async fn process_traces(
```
This doesn't need to be async.
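A minimal sketch of what dropping `async` would look like, assuming the method does no awaiting internally (names and signature are illustrative, not the trait's real definition):

```rust
// Plain sync trait: no #[async_trait] attribute or async fn needed
// when the method body never awaits.
trait TraceProcessor {
    fn process_traces(&self, traces: Vec<String>) -> Vec<String>;
}

struct Passthrough;

impl TraceProcessor for Passthrough {
    fn process_traces(&self, traces: Vec<String>) -> Vec<String> {
        // Stand-in for the real processing logic.
        traces
    }
}

fn main() {
    let p = Passthrough;
    let out = p.process_traces(vec!["span".to_string()]);
    assert_eq!(out.len(), 1);
    println!("sync trait method works without async_trait");
}
```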
```rust
self.stats_concentrator.add(stats)?;

pub fn send(
    &self,
    traces: &TracerPayloadCollection,
```
Generating stats from processed traces instead of raw traces
```diff
 };
-let builder = SendDataBuilder::new(body_size, payload, header_tags, &endpoint)
+let builder = SendDataBuilder::new(body_size, payload.clone(), header_tags, &endpoint)
```
I have to clone the processed trace payload so I can return it to generate stats. Let me know if you have a more efficient approach.
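One possible alternative to a deep `clone()`, assuming `SendDataBuilder` could be made to accept a shared handle (an assumption, not the current API): put the payload behind an `Arc`, so the builder and the stats path each hold a cheap reference-counted copy instead of duplicating the data.

```rust
use std::sync::Arc;

// Hypothetical payload type; the real SendDataBuilder takes the payload by value.
#[derive(Debug)]
struct Payload(Vec<u8>);

fn main() {
    let payload = Arc::new(Payload(vec![1, 2, 3]));

    // Arc::clone copies a pointer and bumps a refcount; the bytes are shared.
    let for_builder = Arc::clone(&payload); // would go to SendDataBuilder
    let for_stats = Arc::clone(&payload);   // would go to stats generation

    assert_eq!(for_builder.0.len(), 3);
    assert_eq!(for_stats.0.len(), 3);
    assert_eq!(Arc::strong_count(&payload), 3);
    println!("payload shared without a deep copy");
}
```

Whether this is worth it depends on payload size and how invasive the `Arc` change would be to the builder's signature.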
```rust
}

// Extracts information from traces related to stats and sends it to the stats concentrator
impl SendingTraceStatsProcessor {
```
The name `SendingTraceStatsProcessor` confuses me about what it does.
It's modified from `SendingTraceProcessor`. As the PR summary says, I plan to rename it to `TraceStatsGenerator`. Does this sound good to you?
This PR
Architecture
Copied from #842
Testing
Tested in the next PR #856, which implements stats concentrator. Trace stats appeared in Datadog.
Next steps
- Implement `StatsConcentrator`
- Rename `SendingTraceStatsProcessor` -> `TraceStatsGenerator`
- Rename `stats_sender` -> `stats_generator`
- Use `stats_sender` instead of `stats_concentrator_handle`. Right now `SendingTraceStatsProcessor::new()` is called in three places; it might be possible to call it only once and then pass it around.

Notes
Jira: https://datadoghq.atlassian.net/browse/SVLS-7593