feat: [Trace Stats] Implement stats concentrator by lym953 · Pull Request #856 · DataDog/datadog-lambda-extension

lym953 · 2025-09-19T20:13:14Z

This PR

Implements the trace stats concentrator, which aggregates trace stats by time bucket and aggregation key.

This gives us minimal working support for trace stats. To enable it, set the env var DD_COMPUTE_TRACE_STATS to true.
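As a rough illustration of what "aggregates by time bucket and aggregation key" means, here is a minimal sketch: a two-level map from bucket start time to aggregation key to accumulated counters. It assumes 10-second buckets and assigns each span to a bucket by its end timestamp. The names `Concentrator`, `add` and `BUCKET_DURATION_NS`, and the two-field key, are hypothetical simplifications, not the PR's `StatsConcentrator` API.

```rust
use std::collections::HashMap;

const BUCKET_DURATION_NS: u64 = 10 * 1_000_000_000; // assumed 10 s buckets

// Hypothetical, simplified aggregation key (the real one has more fields).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct AggregationKey {
    name: String,
    resource: String,
}

// Counters accumulated for one key within one bucket.
#[derive(Debug, Default, Clone, PartialEq)]
struct Stats {
    hits: i32,
    duration: i64, // total, in nanoseconds
    errors: i32,
}

#[derive(Default)]
struct Concentrator {
    // bucket start (ns since epoch) -> aggregation key -> stats
    buckets: HashMap<u64, HashMap<AggregationKey, Stats>>,
}

impl Concentrator {
    fn add(&mut self, end_ns: u64, key: AggregationKey, duration_ns: i64, is_error: bool) {
        // Align the span's end time down to the start of its bucket.
        let bucket = end_ns - end_ns % BUCKET_DURATION_NS;
        let stats = self.buckets.entry(bucket).or_default().entry(key).or_default();
        stats.hits += 1;
        stats.duration += duration_ns;
        if is_error {
            stats.errors += 1;
        }
    }
}

fn main() {
    let mut c = Concentrator::default();
    let key = AggregationKey { name: "aws.lambda".into(), resource: "my-fn".into() };
    c.add(12_000_000_000, key.clone(), 5_000_000, false); // bucket 10 s
    c.add(15_000_000_000, key.clone(), 7_000_000, true); // bucket 10 s
    c.add(21_000_000_000, key.clone(), 1_000_000, false); // bucket 20 s
    let b10: u64 = 10_000_000_000;
    // prints: Stats { hits: 2, duration: 12000000, errors: 1 }
    println!("{:?}", c.buckets[&b10][&key]);
}
```

Because each span only bumps counters, memory grows with the number of distinct keys per bucket rather than with the number of spans.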

Testing

Steps:

1. Invoke a function 5000 times.
2. Check the sum of the trace.aws.lambda.hits trace metric.

Result

The trace metric is accurate (5000) for most runtimes, but some runtimes undercount, notably golang and ruby32. I will debug this as a next step.

- dotnet6: 4999
- dotnet8: 5000
- golang: 4750
- java11: 5000
- java17: 5000
- java21: 5000
- node18: 5000
- node20: 5000
- node22: 5000
- python39: 5000
- python310: 5000
- python311: 5000
- python312: 5000
- ruby32: 4753

Thanks @purple4reina for testing.

Next steps

1. Investigate the undercounting issue.
2. Handle the fields currently marked as TODO, so that metrics are grouped properly and metrics other than hits are reported correctly.
3. At the same time, evaluate whether we can reuse the existing Rust-based solution.

Note

Jira: https://datadoghq.atlassian.net/browse/SVLS-7593

## This PR

1. Move stats generation after trace obfuscation, which is the correct order as suggested by the Trace Agent team. Currently, stats generation happens before trace obfuscation.
2. Also generate trace stats for the OTLP agent. Currently we only do it for the trace agent.

## Architecture

Copied from #842:

[Architecture diagram]

## Testing

Tested in the next PR, #856, which implements the stats concentrator. Trace stats appeared in Datadog.

[Screenshot: trace stats in Datadog]

## Next steps

1. Implement `StatsConcentrator`
2. Rename for clarity:
   - `SendingTraceStatsProcessor` -> `TraceStatsGenerator`
   - `stats_sender` -> `stats_generator`
3. Small refactor: consider passing around `stats_sender` instead of `stats_concentrator_handle`. Right now `SendingTraceStatsProcessor::new()` is called in three places; it might be possible to call it only once and then pass it around.

## Notes

Jira: https://datadoghq.atlassian.net/browse/SVLS-7593
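One plausible way to see why stats should be generated after obfuscation: stats are grouped by fields such as the resource, so computing them on raw resources gives every literal value its own key. The sketch below uses a hypothetical toy `obfuscate` that only replaces digit runs with `?` (the real obfuscator does much more) and counts hits per resource before and after obfuscation; `hits_by_resource` is likewise a hypothetical helper.

```rust
use std::collections::HashMap;

// Hypothetical toy obfuscator: replaces each run of ASCII digits with "?".
// The real obfuscator is far more thorough; this only shows the effect.
fn obfuscate(resource: &str) -> String {
    let mut out = String::with_capacity(resource.len());
    let mut in_digits = false;
    for ch in resource.chars() {
        if ch.is_ascii_digit() {
            if !in_digits {
                out.push('?');
            }
            in_digits = true;
        } else {
            out.push(ch);
            in_digits = false;
        }
    }
    out
}

// Counts hits per resource, i.e. groups by the resource part of the key.
fn hits_by_resource(resources: impl Iterator<Item = String>) -> HashMap<String, u32> {
    let mut hits = HashMap::new();
    for r in resources {
        *hits.entry(r).or_insert(0) += 1;
    }
    hits
}

fn main() {
    let spans = ["SELECT * FROM users WHERE id = 1", "SELECT * FROM users WHERE id = 2"];
    // Stats computed before obfuscation: one key per literal value.
    let before = hits_by_resource(spans.iter().map(|s| s.to_string()));
    // Stats computed after obfuscation: both spans share one key.
    let after = hits_by_resource(spans.iter().map(|s| obfuscate(s)));
    // prints: keys before: 2, keys after: 1
    println!("keys before: {}, keys after: {}", before.len(), after.len());
}
```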

lym953 · 2025-09-23T18:44:36Z

+        stats: Stats,
+    ) -> pb::ClientStatsPayload {
+        pb::ClientStatsPayload {
+            // TODO: handle this


I marked many fields with TODO so I can keep this PR small, iterate quickly, and handle them in future PRs.
Some of them will need code changes; others may only need investigation and an updated comment.

litianningdatadog · 2025-09-24T14:28:20Z

+use std::{
+    collections::HashMap,
+    sync::Arc,
+    time::{SystemTime, UNIX_EPOCH},


Should we switch to the tokio::time module to align more closely with Tokio's scheduling?

Could you elaborate?

For example, what's the problem if we use std::time?

Per tokio-rs/tokio#4633, it is better to use tokio's time module when tokio scheduling is involved.

Talked offline. We will keep using std::time because tokio::time is mainly used for handling time intervals, but for trace stats we need to work with absolute timestamps.
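To illustrate the absolute-timestamp point: `std::time::Instant` and `tokio::time::Instant` are both opaque monotonic readings with no conversion to Unix time, while stats buckets have to line up with the epoch-based timestamps carried by spans. A minimal sketch using `SystemTime`, assuming 10-second buckets; `unix_nanos` and `bucket_start` are hypothetical helpers.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

const S_TO_NS: u64 = 1_000_000_000;
const BUCKET_DURATION_NS: u64 = 10 * S_TO_NS; // assumed bucket width

// Wall-clock time as nanoseconds since the Unix epoch.
fn unix_nanos(t: SystemTime) -> u64 {
    t.duration_since(UNIX_EPOCH)
        .expect("system clock is set before the Unix epoch")
        .as_nanos() as u64
}

// Start of the bucket containing `ts_ns`, also in ns since the epoch.
fn bucket_start(ts_ns: u64) -> u64 {
    ts_ns - ts_ns % BUCKET_DURATION_NS
}

fn main() {
    // A fixed time keeps the output deterministic; in practice use SystemTime::now().
    let t = UNIX_EPOCH + Duration::from_secs(1_758_312_794);
    println!("{}", bucket_start(unix_nanos(t)) / S_TO_NS); // prints: 1758312790
}
```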

litianningdatadog · 2025-09-24T14:30:08Z

+pub struct Stats {
+    pub hits: i32,
+    pub duration: i64, // in nanoseconds
+    pub error: i32,


Does this int represent an error code or an error count? Can we clarify it in a comment or rename it?

Added a comment
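Since `errors` is a count (the field was later renamed from `error`), per-span stats can be summed field by field. A minimal sketch, assuming a simplified copy of the PR's `Stats` and a hypothetical `from_span` helper; the `AddAssign` impl is an illustration, not necessarily how the PR merges stats.

```rust
use std::ops::AddAssign;

// Simplified copy of the PR's `Stats`, after the `error` -> `errors` rename.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct Stats {
    pub hits: i32,
    pub duration: i64, // total duration in nanoseconds
    pub errors: i32,   // number of spans with an error (a count, not an error code)
}

impl Stats {
    // Hypothetical helper: stats contributed by a single span.
    pub fn from_span(duration_ns: i64, is_error: bool) -> Self {
        Stats { hits: 1, duration: duration_ns, errors: i32::from(is_error) }
    }
}

impl AddAssign for Stats {
    fn add_assign(&mut self, other: Self) {
        self.hits += other.hits;
        self.duration += other.duration;
        self.errors += other.errors;
    }
}

fn main() {
    let mut total = Stats::default();
    for (d, e) in [(3_000_000, false), (5_000_000, true), (2_000_000, true)] {
        total += Stats::from_span(d, e);
    }
    // prints: Stats { hits: 3, duration: 10000000, errors: 2 }
    println!("{:?}", total);
}
```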

duncanista · 2025-09-24T20:51:12Z

+    pub name: String,
+    // e.g. "my-lambda-function-name", "datadog_lambda.handler", "urllib.request"
+    pub resource: String,
+    // e.g. "aws.lambda.load", "aws.lambda.import"


Suggested change: replace `// e.g. "aws.lambda.load", "aws.lambda.import"` with `// e.g. "serverless"`.

Not sure if the comment is correct for `r#type`.

You are right. Will fix.
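`type` is a reserved keyword in Rust, which is why the field is spelled `r#type` (a raw identifier). A minimal sketch of a hashable key using the example values from the review comments; the value `"aws.lambda"` for `name` is an assumption inferred from the `trace.aws.lambda.hits` metric, and the three-field struct is a trimmed-down, hypothetical version of the real key.

```rust
use std::collections::HashMap;

// Hypothetical, trimmed-down aggregation key. `type` is a Rust keyword,
// so the field is written with the raw-identifier syntax `r#type`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct AggregationKey {
    pub name: String,     // e.g. "aws.lambda" (assumed)
    pub resource: String, // e.g. "my-lambda-function-name"
    pub r#type: String,   // e.g. "serverless"
}

fn main() {
    let key = |resource: &str| AggregationKey {
        name: "aws.lambda".to_string(),
        resource: resource.to_string(),
        r#type: "serverless".to_string(),
    };
    // Equal keys hash to the same entry, so their hits are aggregated.
    let mut hits: HashMap<AggregationKey, i32> = HashMap::new();
    for r in ["fn-a", "fn-a", "fn-b"] {
        *hits.entry(key(r)).or_insert(0) += 1;
    }
    // prints: 2 keys, fn-a hits = 2
    println!("{} keys, fn-a hits = {}", hits.len(), hits[&key("fn-a")]);
}
```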

duncanista · 2025-09-24T20:51:50Z

+// This is to reduce the chance of flushing stats that are still being collected to save some cost.
+const NO_FLUSH_BUCKET_COUNT: u64 = 2;
+
+const S_TO_NS: u64 = 1_000_000_000;


This probably already exists somewhere?

It only exists as an f64, in datadog-lambda-extension/bottlecap/src/lifecycle/invocation/processor.rs (line 47 in ee8fd6f):

pub const S_TO_NS: f64 = 1_000_000_000.0;

So I need to define a separate u64 constant.
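A sketch of how `NO_FLUSH_BUCKET_COUNT` might be applied, assuming 10-second buckets and assuming it means "never flush the N most recent buckets (the current one included) unless forced", e.g. at shutdown. `take_flushable` is a hypothetical helper and each bucket's payload is reduced to a hit count. Doing this arithmetic in u64 nanoseconds is also a reason to want a u64 `S_TO_NS`: current Unix timestamps in nanoseconds (about 1.7 × 10^18) exceed 2^53, beyond which f64 can no longer represent every integer exactly.

```rust
use std::collections::BTreeMap;

const S_TO_NS: u64 = 1_000_000_000;
const BUCKET_DURATION_NS: u64 = 10 * S_TO_NS; // assumed bucket width
const NO_FLUSH_BUCKET_COUNT: u64 = 2;

// Hypothetical helper: removes and returns the buckets that are old enough to
// flush. Unless `force` is set, the NO_FLUSH_BUCKET_COUNT most recent buckets
// (the current one included) stay in `buckets`, since they may still receive spans.
fn take_flushable(buckets: &mut BTreeMap<u64, u32>, now_ns: u64, force: bool) -> BTreeMap<u64, u32> {
    if force {
        return std::mem::take(buckets);
    }
    let current = now_ns - now_ns % BUCKET_DURATION_NS;
    let keep_from = current.saturating_sub((NO_FLUSH_BUCKET_COUNT - 1) * BUCKET_DURATION_NS);
    // split_off leaves keys < keep_from in `buckets` and returns the rest.
    let kept = buckets.split_off(&keep_from);
    std::mem::replace(buckets, kept)
}

fn main() {
    // bucket start (ns) -> hit count
    let mut buckets: BTreeMap<u64, u32> =
        [0u64, 10, 20, 30].iter().map(|&s| (s * S_TO_NS, 1)).collect();
    let flushed = take_flushable(&mut buckets, 35 * S_TO_NS, false);
    let secs = |m: &BTreeMap<u64, u32>| m.keys().map(|k| k / S_TO_NS).collect::<Vec<_>>();
    // prints: flushed [0, 10], kept [20, 30]
    println!("flushed {:?}, kept {:?}", secs(&flushed), secs(&buckets));
}
```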

lym953 mentioned this pull request on Sep 19, 2025 in feat: [Trace Stats] Move stats generation after trace obfuscation #855 (merged)

Base automatically changed from yiming.luo/trace-stats-6 to main September 22, 2025 20:02

lym953 force-pushed the yiming.luo/trace-stats-7 branch from d6b18a8 to ef07b85 on September 23, 2025 17:21

lym953 commented Sep 23, 2025


lym953 force-pushed the yiming.luo/trace-stats-7 branch from 018242c to c27a768 on September 23, 2025 18:46

lym953 marked this pull request as ready for review September 23, 2025 18:49

lym953 requested a review from a team as a code owner September 23, 2025 18:49

litianningdatadog reviewed Sep 24, 2025


duncanista reviewed Sep 24, 2025


duncanista approved these changes Sep 24, 2025


lym953 force-pushed the yiming.luo/trace-stats-7 branch from c3fcc18 to 0bf2d8b on September 25, 2025 17:34

lym953 added 9 commits September 25, 2025 14:59:

- 8b39c5e Implement stats concentrator
- c323875 fmt
- dbfd983 Add lots of TODOs
- 4be028d Rename test
- a8e9255 fmt
- 4d2fe96 Remove unused import
- 12d25e5 Rename: error -> errors
- cb6be37 fmt
- c099094 Fix comment for r#type

lym953 force-pushed the yiming.luo/trace-stats-7 branch from 0bf2d8b to c099094 on September 25, 2025 18:59

lym953 merged commit 53659e5 into main Sep 25, 2025
46 checks passed

lym953 deleted the yiming.luo/trace-stats-7 branch September 25, 2025 19:31

lym953 merged 9 commits into main from yiming.luo/trace-stats-7
