
feat: Lazily resolve api key#717

Merged
lym953 merged 1 commit into main from yiming.luo/lazy-api-key-3 on Jul 17, 2025

Conversation

Contributor

@lym953 lym953 commented Jun 24, 2025

Motivation

From @astuyve:

today we basically block/await on that decrypt call before we can call /next
so if we can instead make that async and then resolve the future only when we need to flush data, that can be a big win for many customers.

https://datadoghq.atlassian.net/browse/SVLS-6995

Previous work

DataDog/serverless-components#21 and DataDog/serverless-components#24 created ApiKeyFactory, a utility that enables lazy API key resolution.

This PR

Updates the Bottlecap code to use ApiKeyFactory to lazily resolve the API key: instead of resolving it by querying Secrets Manager or KMS during the init phase, resolve it at flush time, when the API key is actually needed.
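The core idea can be sketched with std's `OnceLock`. The real `ApiKeyFactory` in DataDog/serverless-components is async and tokio-based; the names below mirror it, but the implementation is a simplified synchronous stand-in:

```rust
use std::sync::OnceLock;

/// Simplified, synchronous stand-in for the async ApiKeyFactory:
/// the (potentially slow) Secrets Manager / KMS lookup runs only on
/// the first `get_api_key()` call, not during the init phase.
struct ApiKeyFactory {
    key: OnceLock<String>,
    resolver: Box<dyn Fn() -> String + Send + Sync>,
}

impl ApiKeyFactory {
    fn new(resolver: impl Fn() -> String + Send + Sync + 'static) -> Self {
        Self { key: OnceLock::new(), resolver: Box::new(resolver) }
    }

    /// First call runs the resolver; subsequent calls return the cached key.
    fn get_api_key(&self) -> &str {
        self.key.get_or_init(|| (self.resolver)())
    }
}

fn main() {
    // Init phase: constructing the factory is cheap; no lookup happens yet.
    let factory = ApiKeyFactory::new(|| {
        // Imagine a Secrets Manager / KMS decrypt call here.
        "dummy-api-key".to_string()
    });
    // Flush time: the key is resolved only now, when it is actually needed.
    assert_eq!(factory.get_api_key(), "dummy-api-key");
}
```

This is why the `ready in:` time drops: the decrypt call moves off the init path entirely.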

Note

This PR changes the behavior when key resolution fails, i.e. when resolve_secrets() returns None.

  • Before: run extension_loop_idle(), which does not stop the runtime
  • After: panic, which will stop the runtime (if I understand correctly). That's not ideal, of course. Any better ideas?
    • It's harder now to run extension_loop_idle() because the API key resolution code is no longer in the main loop, but in the various consumers of the API key
    • Is there a way to gracefully shut down the extension without affecting the runtime?

Update: Added a PR to address resolution failure: #732
These two PRs should be merged together. Keeping them separate PRs just to make review easier.

Testing

Setup

  • Runtime: Go1 on Amazon Linux 2
  • Architecture: arm64
  • An app with empty implementation code

Result

Below is the logged `Datadog Next-Gen Extension ready in:` time.

  • Before: (prod extension arn:aws:lambda:us-east-1:464622532012:layer:Datadog-Extension-ARM:82)

    • 88.6 ± 1.8 (ms)
  • After: (test extension arn:aws:lambda:us-east-1:425362996713:layer:Datadog-Bottlecap-Beta-ARM-yiming:2)

    • 35.4 ± 5.1 (ms)
    • (-60.0%)

Both use 5 samples.

Notes

https://datadoghq.atlassian.net/issues/SVLS-6996
https://datadoghq.atlassian.net/issues/SVLS-6998

@lym953 lym953 force-pushed the yiming.luo/separate-aws-creds branch from f749307 to 11c1c77 on June 27, 2025 17:35
lym953 added a commit that referenced this pull request Jul 2, 2025
# Problem
Right now `AwsConfig` has a lot of fields, including the ones related to
credential:
```
    pub aws_access_key_id: String,
    pub aws_secret_access_key: String,
    pub aws_session_token: String,
    pub aws_container_credentials_full_uri: String,
    pub aws_container_authorization_token: String,
```

The next PR #717
wants to lazily load the API key and the credentials. To do that, the
resolver function `resolve_secrets()` needs its `aws_config` param
changed from `&AwsConfig` to `Arc<RwLock<AwsConfig>>`. Because
`aws_config` is passed to many places, this change would involve
updating lots of functions, which is formidable.

# This PR
Separates these credential-related fields out from `AwsConfig` and
creates a new struct `AwsCredentials`

Thus, the next PR will only need to change the `aws_credentials` param
from `&AwsCredentials` to `Arc<RwLock<AwsCredentials>>`. Because
`aws_credentials` is passed to far fewer places, the next PR becomes
easier.

https://datadoghq.atlassian.net/issues/SVLS-6996
https://datadoghq.atlassian.net/issues/SVLS-6998
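A minimal sketch of the split described in this commit message (the credential field names come from the PR description; `region` is a hypothetical stand-in for the remaining non-credential fields):

```rust
use std::sync::{Arc, RwLock};

// Non-credential fields stay in AwsConfig (contents hypothetical).
#[allow(dead_code)]
struct AwsConfig {
    region: String,
}

// Credential fields move into their own struct, so only this small
// struct needs to become Arc<RwLock<_>> in the next PR.
#[allow(dead_code)]
struct AwsCredentials {
    aws_access_key_id: String,
    aws_secret_access_key: String,
    aws_session_token: String,
}

fn main() {
    let creds = Arc::new(RwLock::new(AwsCredentials {
        aws_access_key_id: String::new(),
        aws_secret_access_key: String::new(),
        aws_session_token: String::new(),
    }));
    // A resolver running later can fill the credentials in behind the
    // lock, while other holders of the Arc read them.
    creds.write().unwrap().aws_access_key_id = "example-key-id".to_string();
    assert_eq!(creds.read().unwrap().aws_access_key_id, "example-key-id");
}
```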
Base automatically changed from yiming.luo/separate-aws-creds to main July 2, 2025 21:11
@lym953 lym953 force-pushed the yiming.luo/lazy-api-key-3 branch from fdd93a4 to 4ac16ad on July 2, 2025 21:31
@lym953 lym953 requested a review from Copilot July 3, 2025 15:52
Contributor

Copilot AI left a comment


Pull Request Overview

This PR integrates ApiKeyFactory across Bottlecap to defer DD-API-KEY resolution until flush/send time, reducing init latency. Key changes include:

  • Replace direct API key strings with Arc<ApiKeyFactory> in all flushers, agents, and tests
  • Refactor trace/stat flusher and log flusher to initialize endpoints and headers lazily via OnceCell
  • Update resolve_secrets to use an async RwLock for AWS credentials and adjust related helper signatures

Reviewed Changes

Copilot reviewed 13 out of 15 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/metrics_integration_test.rs | Switched FlusherConfig to use ApiKeyFactory |
| tests/logs_integration_test.rs | Switched LogsFlusher instantiation to use factory |
| src/traces/trace_processor.rs | Made process_traces async and use ApiKeyFactory |
| src/traces/trace_agent.rs | Replaced stored key with factory; await per request |
| src/traces/stats_flusher.rs | Swapped in factory and lazily build Endpoint |
| src/secrets/decrypt.rs | Converted credentials to Arc<RwLock<_>> and updated calls |
| src/proxy/mod.rs | Changed should_start_proxy to take Arc<AwsConfig> |
| src/proxy/interceptor.rs | Updated interceptor to use Arc<AwsConfig> |
| src/otlp/agent.rs | Updated OTLP agent to await process_traces |
| src/logs/flusher.rs | Introduced factory plus lazy HeaderMap caching |
| src/lifecycle/invocation/span_inferrer.rs | Updated tests to pass Arc<AwsConfig> |
| src/lifecycle/invocation/processor.rs | Refactored processor to use Arc<AwsConfig> |
| Cargo.toml | Bumped dogstatsd revision |

Comment on lines +40 to 42

```
endpoint: OnceCell<Endpoint>,
}
```


Copilot AI Jul 3, 2025


Caching the Endpoint with an initial API key may lead to stale keys if ApiKeyFactory rotates or refreshes credentials. Consider regenerating or invalidating the cell when the underlying key changes.

Suggested change

```
endpoint: OnceCell<Endpoint>,
}

impl ServerlessStatsFlusher {
    async fn construct_endpoint(&self) -> Endpoint {
        let api_key = self.api_key_factory.get_api_key().await.to_string();
        let stats_url = trace_stats_url(&self.config.site);
        Endpoint {
            url: hyper::Uri::from_str(&stats_url)
                .expect("can't make URI from stats url, exiting"),
            api_key: Some(api_key.clone().into()),
            timeout_ms: self.config.flush_timeout * 1_000,
            test_token: None,
        }
    }
}
```

```
config: Arc<config::Config>,
headers: HeaderMap,
api_key_factory: Arc<ApiKeyFactory>,
headers: OnceCell<HeaderMap>,
```

Copilot AI Jul 3, 2025


Storing headers including DD-API-KEY in a OnceCell will cache the first resolved key indefinitely. If the key can change at runtime, you might emit outdated headers; consider refreshing per flush or using a time-to-live.

Suggested change

```
headers: OnceCell<HeaderMap>,
headers: Arc<Mutex<HeaderMap>>,
```

Comment thread bottlecap/src/traces/trace_processor.rs Outdated

```
let received_payload =
    if let TracerPayloadCollection::V07(payload) = tracer_payload.get_payloads() {
    if let TracerPayloadCollection::V07(payload) = tracer_payload.await.get_payloads() {
```

Copilot AI Jul 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Awaiting the future inline in the match expression can reduce readability. It may help to .await process_traces into a local send_data variable first, then call get_payloads() on it.

Copilot uses AI. Check for mistakes.
Comment thread bottlecap/src/bin/bottlecap/main.rs Outdated
Comment on lines +333 to +351
```
let aws_config = Arc::new(aws_config);
let aws_credentials = Arc::new(RwLock::new(aws_credentials));
let api_key_factory = {
    let config = Arc::clone(&config);
    let aws_config = Arc::clone(&aws_config);
    let aws_credentials = Arc::clone(&aws_credentials);

    Arc::new(ApiKeyFactory::new_from_resolver(Arc::new(move || {
        let config = Arc::clone(&config);
        let aws_config = Arc::clone(&aws_config);
        let aws_credentials = Arc::clone(&aws_credentials);

        Box::pin(async move {
            resolve_secrets(config, aws_config, aws_credentials)
                .await
                .unwrap_or_else(|| {
                    error!("Failed to resolve API key");
                    String::new()
                })
        })
    })))
};
```
Contributor Author


During INIT phase, instead of resolving API key, just initialize an API key factory.

```
#[allow(clippy::too_many_lines)]
async fn extension_loop_active(
aws_config: &AwsConfig,
aws_config: Arc<AwsConfig>,
```
Contributor Author


Changing it to Arc so it can be passed to ApiKeyFactory and shared across threads.

```
config: Arc<config::Config>,
headers: HeaderMap,
api_key_factory: Arc<ApiKeyFactory>,
headers: OnceCell<HeaderMap>,
```
Contributor Author


Lazily initialize the headers, which include the API key.
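The pattern can be sketched with std's `OnceLock` (the real flusher uses tokio's async `OnceCell` and http's `HeaderMap`; plain std types stand in here): the headers containing DD-API-KEY are built once, on first flush, rather than during init.

```rust
use std::sync::OnceLock;

// Stand-in for LogsFlusher: a Vec of pairs plays the role of HeaderMap.
struct LogsFlusher {
    headers: OnceLock<Vec<(String, String)>>,
}

impl LogsFlusher {
    // Built lazily on first use; later calls get the cached value.
    fn get_headers(&self, api_key: &str) -> &Vec<(String, String)> {
        self.headers.get_or_init(|| {
            vec![("DD-API-KEY".to_string(), api_key.to_string())]
        })
    }
}

fn main() {
    let flusher = LogsFlusher { headers: OnceLock::new() };
    // First flush resolves and caches the headers...
    assert_eq!(flusher.get_headers("abc")[0].1, "abc");
    // ...and, as Copilot's review notes, a later key is ignored:
    assert_eq!(flusher.get_headers("rotated")[0].1, "abc");
}
```

The second assertion is exactly the staleness concern raised in the review above: once initialized, the cell never observes a rotated key.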

```
aws_config: &AwsConfig,
aws_credentials: &mut AwsCredentials,
aws_config: Arc<AwsConfig>,
aws_credentials: Arc<RwLock<AwsCredentials>>,
```
Contributor Author


A core change: added RwLock

Contributor


ah and we need an RwLock here because the factory will lazily write/update this struct member?

Contributor Author


Yes. The factory fills in aws_access_key_id, aws_secret_access_key and aws_session_token if we are in snap start.

Comment thread bottlecap/src/secrets/decrypt.rs
```
config: Arc<config::Config>,
endpoint: Endpoint,
api_key_factory: Arc<ApiKeyFactory>,
endpoint: OnceCell<Endpoint>,
```
Contributor Author


Lazily resolve the endpoint, which contains the API key.

Comment thread bottlecap/src/bin/bottlecap/main.rs Outdated
```
Box::pin(async move {
    resolve_secrets(config, aws_config, aws_credentials)
        .await
        .expect("Failed to resolve API key")
```
Contributor Author


Any better way to handle this?

Contributor


Mmm ideally, if failing on resolve you'd enter the noop loop, what would happen if you added that here?

Contributor Author


It will panic at flush time and stop the runtime, so I wonder if there's a way for the extension to stop gracefully at that time without stopping the runtime.

@lym953 lym953 marked this pull request as ready for review July 3, 2025 17:09
@lym953 lym953 requested a review from a team as a code owner July 3, 2025 17:09
@lym953 lym953 force-pushed the yiming.luo/lazy-api-key-3 branch from d623d67 to 3724235 on July 7, 2025 17:02
Contributor

@astuyve astuyve left a comment


Nice work Yiming! Let's get this onto self monitoring to test for a few days

@lym953 lym953 force-pushed the yiming.luo/lazy-api-key-3 branch from cec3247 to 727d04f on July 9, 2025 15:47
Comment thread bottlecap/src/bin/bottlecap/main.rs Outdated
Comment on lines +335 to +351
```
let api_key_factory = {
    let config = Arc::clone(&config);
    let aws_config = Arc::clone(&aws_config);
    let aws_credentials = Arc::clone(&aws_credentials);

    Arc::new(ApiKeyFactory::new_from_resolver(Arc::new(move || {
        let config = Arc::clone(&config);
        let aws_config = Arc::clone(&aws_config);
        let aws_credentials = Arc::clone(&aws_credentials);

        Box::pin(async move {
            resolve_secrets(config, aws_config, aws_credentials)
                .await
                .expect("Failed to resolve API key")
        })
    })))
};
```
Contributor


Wondering if we could make this a method here or in the API key resolver. I really don't like the pattern of declaring the same variables three times just because we're nesting; given how much the main code has grown, it would be good to hide this somewhere and document it.

Contributor Author


Done! Extracted a function create_api_key_factory()

Contributor

@duncanista duncanista left a comment


Great PR @lym953 !

@lym953 lym953 force-pushed the yiming.luo/lazy-api-key-3 branch from 3be2928 to e820be0 on July 10, 2025 20:18
@lym953 lym953 changed the base branch from main to jordan.gonzalez/trace-agent/aggregation-for-proxy July 10, 2025 20:19
@duncanista duncanista force-pushed the jordan.gonzalez/trace-agent/aggregation-for-proxy branch from 4c1162d to 07aa341 on July 14, 2025 19:11
Base automatically changed from jordan.gonzalez/trace-agent/aggregation-for-proxy to main July 16, 2025 19:47
lym953 added a commit that referenced this pull request Jul 16, 2025
…745)

# Background
Right now `SendData` is passed around across channels.

# This PR

Instead of passing `SendData`, pass `SendDataBuilderInfo`, which bundles
`SendDataBuilder` and payload size. Just before flush, call
`SendDataBuilder.build()` to build `SendData`.

# Motivation
DataDog/libdatadog#1140 (comment)
It is suggested that the function `set_api_key()` shouldn't be added on
`SendData`, but on `SendDataBuilder`. Because we need to call
`set_api_key()` just before flush, we need to make sure the object
is a `SendDataBuilder` instead of `SendData` until flush time.

And because we need payload size in Trace Aggregator, and
`SendDataBuilder` doesn't expose this field, we need to pass it
explicitly along with `SendDataBuilder`.
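The shapes involved might look like this (field names hypothetical; the real `SendData` / `SendDataBuilder` live in libdatadog):

```rust
// Hypothetical shapes illustrating the flow described above.
struct SendData {
    api_key: String,
    payload: Vec<u8>,
}

struct SendDataBuilder {
    payload: Vec<u8>,
    api_key: Option<String>,
}

impl SendDataBuilder {
    // The API key is stamped in just before flush...
    fn set_api_key(mut self, key: &str) -> Self {
        self.api_key = Some(key.to_string());
        self
    }
    // ...and only then is the final SendData built.
    fn build(self) -> SendData {
        SendData {
            api_key: self.api_key.unwrap_or_default(),
            payload: self.payload,
        }
    }
}

// What travels across channels: the builder bundled with the payload
// size, which the aggregator needs but the builder doesn't expose.
struct SendDataBuilderInfo {
    builder: SendDataBuilder,
    size: usize,
}

fn main() {
    let info = SendDataBuilderInfo {
        builder: SendDataBuilder { payload: vec![1, 2, 3], api_key: None },
        size: 3,
    };
    assert_eq!(info.size, 3);
    let send_data = info.builder.set_api_key("resolved-key").build();
    assert_eq!(send_data.api_key, "resolved-key");
    assert_eq!(send_data.payload.len(), 3);
}
```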

# Next steps
Update #717 and #732 so that `get_api_key()` is called just before flush.

# Dependency
DataDog/libdatadog#1140
@lym953 lym953 force-pushed the yiming.luo/lazy-api-key-3 branch from 2ef2a9a to 37caca4 on July 16, 2025 20:51
@lym953 lym953 force-pushed the yiming.luo/lazy-api-key-3 branch from 37caca4 to ef63759 on July 16, 2025 21:46
@lym953 lym953 merged commit a8d05a1 into main Jul 17, 2025
46 checks passed
@lym953 lym953 deleted the yiming.luo/lazy-api-key-3 branch July 17, 2025 19:53
lym953 added a commit that referenced this pull request Jul 17, 2025
# Context
The previous PR
#717 defers API
key resolution from extension init stage to flush time. However, that PR
doesn't handle the failure case well.
- Before that PR, if resolution fails in init stage, the extension will
run an idle loop.
- After that PR, the extension will crash at flush time, which will kill
the runtime as well, which is not desired.

# What does this PR do?
1. For traces, defer key resolution from
`TraceProcessor.process_traces()` to `TraceFlusher.flush()`.
- (This should ideally be in the previous PR, but since that is already
approved, let me add this change in this new PR.)
2. If resolution fails at flush time, then make flush a no-op, so the
extension can keep running and consume events without crashing.
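The failure handling can be sketched as follows (hypothetical signature; the real flushers are async and send the payloads over HTTP):

```rust
// Sketch of the behavior: if the key could not be resolved at flush
// time, skip the flush instead of panicking, so the extension keeps
// running and consuming events.
fn flush(resolved_key: Option<&str>, pending: &mut Vec<String>) {
    let Some(_api_key) = resolved_key else {
        eprintln!("Failed to resolve API key; skipping flush");
        return; // no-op: keep the payloads, don't crash
    };
    // ...send `pending` with the key attached...
    pending.clear();
}

fn main() {
    let mut pending = vec!["span payload".to_string()];
    flush(None, &mut pending);        // resolution failed: nothing sent
    assert_eq!(pending.len(), 1);
    flush(Some("key"), &mut pending); // resolution succeeded: flushed
    assert!(pending.is_empty());
}
```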

# Dependencies
1. DataDog/serverless-components#25
2. DataDog/libdatadog#1140

# Manual Test

## Steps
1. Create a layer in sandbox
2. Apply the layer to a Lambda function
3. Set the env var `DD_API_KEY_SECRET_ARN` to an invalid value
4. Run the Lambda
5. Then set `DD_API_KEY_SECRET_ARN` to a valid value
6. Run the Lambda

## Result
1. The function was successful
<img width="319" alt="image"
src="https://github.com/user-attachments/assets/f8a5cb36-f678-4643-ba1c-85f41256ffa1"
/>

2. The extension printed some error logs
<img width="737" height="33" alt="image"
src="https://github.com/user-attachments/assets/22553d24-e1f5-4ee5-9a91-0d18e3e2f297"
/>

<img width="603" height="186" alt="image"
src="https://github.com/user-attachments/assets/e797f991-ecba-45f0-8f49-7b7b59dd9e7b"
/>

3. With valid secret ARN, the Lambda runs successfully and reports to
Datadog
<img width="678" height="150" alt="image"
src="https://github.com/user-attachments/assets/073089f8-1e9a-4728-b8d1-1db7aa85d031"
/>

<img width="533" height="96" alt="image"
src="https://github.com/user-attachments/assets/d5f2b81c-5e02-42bc-b3ef-85e611228fc6"
/>


# Automated Test

I didn't add any automated tests because, from what I see in the codebase,
existing tests are usually unit tests for short functions, not for the
long functions this PR touches. Please let me know if you think I
should add automated tests.
lym953 added a commit that referenced this pull request Sep 19, 2025
# Problem
When a Lambda (1) uses snap start, and (2) specifies Datadog API key
using `DD_API_KEY_SECRET_ARN`, the extension will encounter a deadlock.
For a `RwLock`, the extension first gets a read lock:

https://github.com/DataDog/datadog-lambda-extension/blob/daf633dd003447d78261e7c371838b5af21073a1/bottlecap/src/secrets/decrypt.rs#L45
then tries to get a write lock:

https://github.com/DataDog/datadog-lambda-extension/blob/daf633dd003447d78261e7c371838b5af21073a1/bottlecap/src/secrets/decrypt.rs#L65

which never finishes. This causes the function to time out.

This bug was introduced in
#717.

# This PR
Fix this bug by removing the `RwLock` usage. `AwsCredentials` is only
created and used once in `resolve_secrets()`, and `resolve_secrets()` is
only called once, so there's no need to protect this struct with a lock.
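The bug pattern is easy to reproduce in miniature. The extension uses tokio's async `RwLock`, where the second acquisition simply never completes; with std's `RwLock`, `try_write` makes the same conflict observable without hanging:

```rust
use std::sync::RwLock;

fn main() {
    let creds = RwLock::new(String::from("initial"));

    // Step 1 (decrypt.rs:45 in the linked code): take a read lock.
    let read_guard = creds.read().unwrap();

    // Step 2 (decrypt.rs:65): try to take a write lock on the same
    // RwLock while the read guard is still alive. A blocking write()
    // here would never return, which is the deadlock that timed out
    // the Lambda; try_write() shows the conflict without hanging.
    assert!(creds.try_write().is_err());

    drop(read_guard);
    // Once the read guard is gone, the write lock succeeds.
    assert!(creds.try_write().is_ok());
}
```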

# Testing
Tested on a Lambda with:
- Python 3.13 runtime
- snap start
- using `DD_API_KEY_SECRET_ARN`

Before:
- The function timed out.
- Data failed to be sent to Datadog.

After:
- The function finished without timeout.
- Data was sent to Datadog successfully.

# Notes
Jira: https://datadoghq.atlassian.net/browse/SLES-2482
duncanpharvey pushed a commit that referenced this pull request Mar 10, 2026
# Problem
Right now `AwsConfig` has a lot of fields, including the ones related to
credential:
```
    pub aws_access_key_id: String,
    pub aws_secret_access_key: String,
    pub aws_session_token: String,
    pub aws_container_credentials_full_uri: String,
    pub aws_container_authorization_token: String,
```

The next PR #717
wants to lazily load the API key and the credentials. To do that, the
resolver function `resolve_secrets()` needs its `aws_config` param
changed from `&AwsConfig` to `Arc<RwLock<AwsConfig>>`. Because
`aws_config` is passed to many places, this change would involve
updating lots of functions, which is formidable.

# This PR
Separates these credential-related fields out from `AwsConfig` and
creates a new struct `AwsCredentials`

Thus, the next PR will only need to change the `aws_credentials` param
from `&AwsCredentials` to `Arc<RwLock<AwsCredentials>>`. Because
`aws_credentials` is passed to far fewer places, the next PR becomes
easier.

https://datadoghq.atlassian.net/issues/SVLS-6996
https://datadoghq.atlassian.net/issues/SVLS-6998
duncanpharvey added a commit to DataDog/serverless-components that referenced this pull request Mar 11, 2026
* chore(bottlecap): make config a folder module (#242)

* remove `config.rs` file

* create `config/mod.rs`

* move to `config/flush_strategy.rs`

* move to `config/log_level.rs`

* update imports

* fmt

* feat(bottlecap): add logs processing rules (#243)

* add logs processing rules field

* add `regex` crate

* add `processing_rules.rs` config module

* use `processing_rule` module instead

* update logs `processor` to use compiled rules

* update unit test

* Svls 4825 support encrypted keys manual (#258)

* add plumbing for aws secret manager

* strip as much deps as possible

* fix test

* remove unused warning

* reorg runner for bottlecap

* fix overwriting of arch

* add full error to the panic

* avoid building the go agent all the time

* rename module

* speed up build

* add simple scripts to build and publish

* remove deleted call

* remove changes from common scripts

* resolve import conflicts

* wrong file pushed

* make sure permissions are right

* move secret parsing after log activation

* add some stat to build

* add manual req for secret (still broken)

* rebuild after conflict on cargo loc

* automate update and call

* change headers and fix signature

* fix typo and small refactor

* remove useless thread spawn

* small refactors on deploy scripts

* use access key always for signatures

* the secret has to be used to sign

* fix: missing newline in request

* use only manual decrypt

* add timed steps

* add scripts to force restarts

* fix launch script

* refactor decrypt

* cargo format and clippy

* fix clippy error

add formatting/clippy functions

---------

Co-authored-by: AJ Stuyvenberg <astuyve@gmail.com>

* add kms handling (#261)

* add kms handling

* fix return value

* fix test

* fix kms

* remove committed test file

* rename

* format

* fmt after fix

* fix conflicts

* await async stuff

* formatting

* bubble up error converting to sdt

* use box dyn for generic errors

* reformat

* address other comments

* remove old build file added with conflict

* Svls 4978 handle secrets error (#271)

* add kms handling

* fix return value

* fix test

* fix kms

* remove committed test file

* rename

* format

* fmt after fix

* fix conflicts

* await async stuff

* formatting

* bubble up error converting to sdt

* use box dyn for generic errors

* reformat

* address other comments

* remove old build file added with conflict

* do not pass around the whole config for just the secret

* fix scope and just bubble up errors

* reformat

* renaming

* without api key, just call next loop

* fix types and format

* fix folder path

* fix cd and returns

* resolve conflicts

* formatter

* chore(bottlecap): log failover reason (#292)

* print failover reason as json string

* fmt

* update key to be more verbose

* Add APM tracing support (#294)

* wip: tracing

* feat: tracing WIP

* feat: rename mini agent to trace agent

* feat: fmt

* feat: Fix formatting after rename

* fix: remove extra tokio task

* feat: allow tracing

* feat: working v5 traces

* feat: Update to use my branch of libdatadog so we have v5 support

* feat: Update w/ libdatadog to pass trace encoding version

* feat: update w/ merged libdatadog changes

* feat: Refactor trace agent, reduce code duplication, enum for trace version. Pass trace provider. Manual stats flushing. Custom create endpoint until we clean up that code in libdatadog.

* feat: Unify config, remove trace config. Tests pass

* feat: fmt

* feat: fmt

* clippy fixes

* parse time

* feat: clippy again

* feat: revert dockerfile

* feat: no-default-features

* feat: Remove utils, take only what we need

* feat: fmt moves the import

* feat: replace info with debug. Replace log with tracing lib

* feat: more debug

* feat: Remove call to trace utils

* feat: Allow appsec but in a disabled-only state until we add support for the runtime proxy (#296)

* feat: Allow appsec but in a disabled-only state until we add support for the runtime proxy

* feat: Log failover reason

* fix: serverless_appsec_enabled. Also log the reason

* feat: Require DD_EXTENSION_VERSION: next (#302)

* feat: Require DD_EXTENSION_VERSION: next

* feat: add tests, fix metric tests

* feat: revert metrics test byte changes

* feat: fmt

* feat: remove ref

* feat: honor enhanced metrics bool (#307)

* feat: honor enhanced metrics bool

* feat: add test

* feat: refactor to log instead of return result

* fix: clippy

* feat: warn by default (#316)

* chore(bottlecap): fallback on `datadog.yaml` usage (#326)

* fallback on `datadog.yaml` presence

* add comment

* fix(bottlecap): filter debug logs from external crates (#329)

* remove `tracing-log`

instead, use the `tracing-subscriber` `tracing-log` feature

* capitalize debugs

* remove unnecessary file

* update log formatter prefix

* update log filter

* fmt

* chore(bottlecap): switch flushing strategy to race (#318)

* feat: race flush

* refactor: periodic only when configured

* fmt

* when flushing strategy is default, set periodic flush tick to `1s`

* on `End`, never flush until the end of the invocation

* remove `tokio_unstable` feature for building

* remove debug comment

* remove `invocation_times` mod

* update `flush_control.rs`

* use `flush_control` in main

* allow `end,<ms>` strategy

allows to flush periodically over a given amount of seconds and at the end

* update `debug` comment for flushing

* simplify logic for flush strategy parsing

* remove log that could spam debug

* refactor code and add unit test

---------

Co-authored-by: jordan gonzález <30836115+duncanista@users.noreply.github.com>
Co-authored-by: alexgallotta <5581237+alexgallotta@users.noreply.github.com>

* remove log that might confuse customers (#333)

* Fix dogstatsd multiline (#335)

* test: add invalid string and multi line distro test with empty newline

* test: move unit test to appropriate package

* fix: do not error log for empty and new line strings

---------

Co-authored-by: jordan gonzález <30836115+duncanista@users.noreply.github.com>

* add env vars to be ignored (#337)

* feat: Open up more env vars which we don't rely on (#344)

* feat: Allow trace disabled plugins (#348)

* feat: Allow trace disabled plugins

* feat: trace debug

* feat: Allowlist additional env vars (#354)

* feat: Allowlist additional env vars

* fix: fmt

* feat: and repo url

* aj/allow apm replace tags array (#358)

* fix: allow objects to be ignored

* feat: specs

* fix(bottlecap): set explicit deny list and allow yaml usage (#363)

* set explicit deny list

also allow `datadog.yaml` usage

* add unit test for parsing rule from yaml

* remove `object_ignore.rs`

* remove import

* remove logging failover reason when user is not opt-in

* chore(bottlecap): fast failover (#371)

* failover fast

* typo

* failover on `/opt/datadog_wrapper` set

* aj/fix log level casing (#372)

* feat: serde's rename_all isn't working, use a custom deserializer to lowercase loglevels

* feat: default is warn

* feat: Allow repetition to clear up imports

* feat: rebase

* feat: failover on dd proxy (#391)

* feat: support HTTPS_PROXY (#381)

* feat: support DD_HTTP_PROXY and DD_HTTPS_PROXY

* fix: remove import

* fix: fmt

* feat: Revert fqdn changes to enable testing

* feat: Use let instead of repeated instantiation

* feat: Rip out proxy stuff we dont need but make sure we dont proxy the telemetry or runtime APIs with system proxies

* feat: remove debug

* fix: no debugs for hyper/h2

* fix: revert cargo changes

* feat: Pin libdatadog deps to v13.1

* fix: rebase with dogstatsd 13.1

* fix: use main for dsdrs

* fix: remove unwrap

* fix: fmt

* fix: licenses

* increase size boo

* fix: size ugh

* fix: install_default() in tests

* aj/honor both proxies in order (#410)

* feat: Honor priority order of DD_PROXY_HTTPS over HTTPS_PROXY

* feat: fmt

* fix: Prefer Ok over some + ok

* Feat: Use tags for proxy support in libdatadog

* fix: no proxy for tests

* fix: license

* all this for a comma

* accept `datadog_wrapper`

* Revert "accept `datadog_wrapper`"

This reverts commit 9560657582f2f22c8e68af5d0bb9d7d2b0765650.

* accept `datadog_wrapper` (#373)

* feat(bottlecap): create Inferred Spans baseline + infer API Gateway HTTP spans (#405)

* add `Trigger` trait for inferred spans

* add `ApiGatewayHttpEvent` trigger

* add `SpanInferrer`

* make `invocation::processor` to use `SpanInferrer`

* send `aws_config` to `invocation::processor`

* use incoming payload for `invocation::processor` for span inferring

* add `api_gateway_http_event.json` for testing

* add `api_gateway_proxy_event.json` for testing

* fix: Convert tag hashmap to sorted vector of tags

* fix: fmt

---------

Co-authored-by: AJ Stuyvenberg <astuyve@gmail.com>

* feat(bottlecap): Add Composite Trace Propagator (#413)

* add `trace_propagation_style.rs`

* add Trace Propagation to `config.rs`

also updated unit tests, as we have custom behavior, we should check only the fields we care about in the tests

* add `links` to `SpanContext`

* add composite propagator

also known as our internal http propagator, but in reality, http doesnt make any sense to me, its just a composite propagator which we used based on our configuration

* update `TextMapPropagator`s to comply with interface

also updated the naming

* fmt

* add unit testing for `config.rs`

* add `PartialEq` to `SpanContext`

* correct logic from `text_map_propagator.rs`

logic was wrong in some parts, this was discovered through unit tests

* add unit tests for `DatadogCompositePropagator`

also corrected some logic

* feat(bottlecap): add capture lambda payload (#454)

* add `tag_span_from_value`

* add `capture_lambda_payload` config

* add unit testing for `tag_span_from_value`

* update listener `end_invocation_handler`

parsing should not be handled here

* add capture lambda payload feature

also parse body properly, and handle `statusCode`

* feat(bottlecap): add Cold Start Span + Tags (#450)

* add some helper functions to `invocation::lifecycle` mod

* create cold start span on processor

* move `generate_span_id` to father module

* send `platform_init_start` data to processor

* send `PlatformInitStart` to main bus

* update cold start `parent_id`

* fix start time of cold start span

* enhanced metrics now have a `dynamic_value_tags` for tags which we have to calculate at points in time

* `AwsConfig` now has a `sandbox_init_time` value

* add `is_empty` to `ContextBuffer`

* calculate init tags on invoke

also add a method to reset processor invocation state

* restart init tags on set

* set tags properly for proactive init

* fix unit test

* remove debug line

* make sure `cold_start` tag is only set in one place

* feat(bottlecap): support service mapping and `peer.service` tag (#455)

* add some helper functions to `invocation::lifecycle` mod

* create cold start span on processor

* move `generate_span_id` to parent module

* send `platform_init_start` data to processor

* send `PlatformInitStart` to main bus

* update cold start `parent_id`

* fix start time of cold start span

* enhanced metrics now have a `dynamic_value_tags` for tags which we have to calculate at points in time

* `AwsConfig` now has a `sandbox_init_time` value

* add `is_empty` to `ContextBuffer`

* calculate init tags on invoke

also add a method to reset processor invocation state

* restart init tags on set

* set tags properly for proactive init

* fix unit test

* remove debug line

* make sure `cold_start` tag is only set in one place

* add service mapping config serializer

* add `service_mapping.rs`

* add `ServiceNameResolver` interface

for service mapping

* implement interface in every trigger

* send `service_mapping` lookup table to span enricher

* create `SpanInferrer` with `service_mapping` config

* fmt

* rename failover to fallback (#465)

* fix(bottlecap): fallback when otel set (#470)

* fallback on otel

* add unit test

* feat(bottlecap): fallback on opted out only (#473)

* fallback on opted out only

* log on opted out

* fix(bottlecap): fallback on yaml otel config (#474)

* fallback on opted out only

* fallback on yaml otel config

* switch `legacy` to `compatibility`

* feat: honor serverless_logs (#475)

* feat: honor serverless_logs

* fmt

---------

Co-authored-by: jordan gonzález <30836115+duncanista@users.noreply.github.com>

* feat: Flush timeouts (#480)

* fix version parsing for number (#492)

* fix: fallback on intake urls (#495)

* fallback on `dd_url`, and the APM and logs intake URLs

* fix env var for apm url

* grammar

* set dogstatsd timeout (#497)

* set dogstatsd timeout

* add todo for other edge case

* add comment on jitter. Likely not required for lambda

* fmt

* update license

* update sha for dogstatsd

---------

Co-authored-by: jordan gonzález <30836115+duncanista@users.noreply.github.com>

* fix: set right domain and arn by region on secrets manager (#511)

* check whether the region is in China and use the appropriated domain

* correct arn for lambda in chinese regions

* fix: typo in china arn

* fix: reuse function to detect right aws partition and support gov too

* nest and rearrange imports

* fix imports again

* fix: Honor noproxy and skip proxying if ddsite is in the noproxy list (#520)

* fix: Honor noproxy and skip proxying if ddsite is in the noproxy list

* feat: specs

* feat: Oneline check, add comment

* Support proxy yaml config (#523)

* fix: Honor noproxy and skip proxying if ddsite is in the noproxy list

* feat: specs

* feat: yaml proxy had a different format

* feat: Oneline check, add comment

* feat: Support nonstandard proxy config

* feat: specs

* fix: bad merge whoops

* feat: Support snapstart's vended credentials (#532)

* feat: Support snapstart's vended credentials

* feat: Add snapstart events

* fix: specs

* feat: Make config mutable, as it is consumed entirely by the secrets module.

* fix: needless borrow

* feat: add zstd and compress (#558)

* feat: add zstd and compress

* hack: skip clippy for a sec

* feat: Honor logs config settings.

* fix: dont set zstd header unless we compress

* fmt

* clippy

* fmt

* fix: ints

* licenses

* remove debug code

* wtf clippy and fmt, pick one

---------

Co-authored-by: jordan gonzález <30836115+duncanista@users.noreply.github.com>

* Svls 6036 respect timeouts (#537)

* log shipping times

* set flush timeout for traces

* remove retries

* fix conflicts

* address comments

* Fallback on gov regions (#550)

* Aj/support pci and custom endpoints (#585)

* feat: logs_config_logs_dd_url

* feat: apm pci endpoints

* feat: metrics

* feat: support metrics using dogstatsd methods

* fix: use the right var

* tests: use server url override

* feat: refactor into flusher method

* feat: clippy

* Aj/yaml apm replace tags (#602)

* feat: yaml APM replace tags rule parsing

* feat: Custom deserializer for replace tags. yaml -> JSON so we can rely on the same method because ReplaceRule is totally private

* remove aj

* feat: merge w/ libdatadog main

* feat: Parse http obfuscation config from yaml

* feat: licenses

* feat: parse env and service as strings or ints (#608)

* feat: parse env and service as strings or ints

* feat: add service test

* fmt

* Add DSM and Profiling endpoints (#622)

- **feat: Support DSM proxy endpoint**
- **feat: profiling support**
- **feat: add additional tags**

* chore(config): parse config only twice  (#651)

# What?

Removes `FallbackConfig` and `FallbackYamlConfig` in favor of the
existing configurations.

# How?

1. Using only the known places where we are going to fall back from the
available configs.
2. Moved environment variables and yaml config to their own files for
readability.

# Notes

- Added fallbacks for OTLP (in preparation for that PR, allowed some
fields to not fallback).

* fix: Parse DD_APM_REPLACE_TAGS env var (#656)

Fixes an issue where we didn't parse `DD_APM_REPLACE_TAGS` because the
yaml block includes an additional `config` word after APM, which is not
present in the env var.

As usual, env vars override config file settings

* feat: Optionally disable proc enhanced metrics (#663)

Fixes #648

For customers using very very fast/small lambda functions (usually just
rust), there can be a small 1-2ms increase in runtime duration when
collecting metrics like open file descriptors or tmp file usage.

We still enable these by default, but customers can now optionally
disable them

* fix(config): serialize booleans from anything (#657)

# What?

Deserializes any value in `0|1|true|TRUE|False|false` to its
boolean equivalent.

# How?

Using `serde-aux` crate to leverage the unit testing and ownership.

# Motivation

Some products at Datadog allow these values as they coalesce them –
[SVLS-6687](https://datadoghq.atlassian.net/browse/SVLS-6687)

[SVLS-6687]:
https://datadoghq.atlassian.net/browse/SVLS-6687?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
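
The coercion described above can be sketched as follows (a minimal, stdlib-only illustration; the real implementation relies on the `serde-aux` crate, and `coerce_bool` is a hypothetical name, not Bottlecap's API):

```rust
// Hypothetical sketch: map the accepted spellings (0|1|true|TRUE|False|false,
// case-insensitively) to a bool; anything else yields None so the caller can
// fall back to the default instead of failing config parsing.
fn coerce_bool(raw: &str) -> Option<bool> {
    match raw.trim().to_ascii_lowercase().as_str() {
        "1" | "true" => Some(true),
        "0" | "false" => Some(false),
        _ => None,
    }
}

fn main() {
    assert_eq!(coerce_bool("TRUE"), Some(true));
    assert_eq!(coerce_bool("0"), Some(false));
    assert_eq!(coerce_bool("yes"), None);
    println!("ok");
}
```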

* chore(config): create `aws` module (#659)

# What?

Refactors methods related to AWS config into its own module

# Motivation

Just cleaning and removing stuff from main
– [SVLS-6686](https://datadoghq.atlassian.net/browse/SVLS-6686)

[SVLS-6686]:
https://datadoghq.atlassian.net/browse/SVLS-6686?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

* feat: [SVLS-6242] bottlecap fips builds (#644)

Building bottlecap with fips mode.

This is entirely focused on removing `ring` (and other
non-FIPS-compliant dependencies) from our `fips`-featured builds.

* fix(config): remove `apm_ignore_resources` check in OTEL (#676)

# What?

Removes usage of `DD_APM_IGNORE_RESOURCES` in the OTEL span transform.

# Why?

1. The implementation was incorrect and shouldn't check for resources to
ignore in the transformation step.
2. It was not properly used in the `apm_config` for YAML files.

# Notes:

- Follow up PR to implement `APM_IGNORE_RESOURCES` properly in the Trace
Agent.

# More

Learn about ignoring resources:
https://docs.datadoghq.com/tracing/guide/ignoring_apm_resources/?tab=datadogyaml#ignoring-based-on-resources

`DD_APM_IGNORE_RESOURCES` is specified as:

```
A list of regular expressions can be provided to exclude certain traces based on their resource name.
All entries must be surrounded by double quotes and separated by commas.
```

A correct usage would be:

```env
DD_APM_IGNORE_RESOURCES="(GET|POST) /healthcheck,API::NotesController#index"
```

or in yaml
```yaml
apm_config:
  ignore_resources: ["(GET|POST) /healthcheck","API::NotesController#index"]
```

* feat(proxy): abstract lambda runtime api proxy (#669)

# What?

Abstracts the concept of the `proxy` from the Lambda Web Adapter
implementation.
This will unlock the usage of ASM.

# How?

Using `axum` crate, we create a new server proxy with specific routes
from the Lambda Runtime API which we are interested in proxying.

# Motivation

ASM and [SVLS-6760](https://datadoghq.atlassian.net/browse/SVLS-6760)



[SVLS-6760]:
https://datadoghq.atlassian.net/browse/SVLS-6760?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

* fix(config): fix otlp trace agent to start when right configuration is set (#680)

# What?

Ensures that OTLP agent is only enabled when the
`otlp_config_receiver_protocols_http_endpoint` is set, and when
`otlp_config_traces_enabled` is `true`

 # Motivation

#678 

# Notes

OTEL agent should only spin up when receiver protocols endpoint is set,
so this was a miss on my side.
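
The start condition described above can be sketched as a simple predicate (function and parameter names are illustrative; the field names mirror the config keys mentioned in the PR body):

```rust
// Hypothetical sketch: the OTLP agent should only start when the HTTP
// receiver endpoint is configured AND traces are enabled.
fn should_start_otlp_agent(http_endpoint: Option<&str>, traces_enabled: bool) -> bool {
    http_endpoint.is_some() && traces_enabled
}

fn main() {
    assert!(should_start_otlp_agent(Some("0.0.0.0:4318"), true));
    assert!(!should_start_otlp_agent(None, true));
    assert!(!should_start_otlp_agent(Some("0.0.0.0:4318"), false));
    println!("ok");
}
```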

* feat: continuous flushing strategy for high throughput functions (#684)

This is a heavy refactor and new feature.
- Introduces FlushDecision and separates it from FlushStrategy
- Cleans up FlushControl logic and methods

It also adds the ability to flush telemetry across multiple serial
lambda invocations. This is done using the `continuous` strategy.

This is a huge win for busy functions as seen in our test fleet, where
the p99/max drops precipitously, which also causes the average to
plummet. This also helps reduce the number of cold starts encountered
during scaleup events, which further reduces latency along with costs:

![image](https://github.com/user-attachments/assets/14851e22-327d-43b0-8246-5780cfbf6ef7)

Technical implementation:
We spawn the task and collect the flush handles, then in the two
periodic strategies we check if there were any errors or unresolved
futures in the next flush cycle. If so, we switch to the `periodic`
strategy to ensure flushing completes successfully.

We don't adapt to the continuous strategy unless the last 20 invocations
occurred within the `config.flush_timeout` value, which has been
increased by default. This is a naive implementation. A better one would
be to calculate the first derivative of the invocation periodicity. If
the rate is increasing, we can adapt to the continuous strategy. If the
rate slows, we should fall back to the periodic strategy.
<img width="807" alt="image"
src="https://github.com/user-attachments/assets/d3c25419-f1da-4774-975f-0e254047b9b7"
/>

The existing implementation is cautious in that we could definitely
adapt sooner but don't.


Todo: add a feature flag for continuous flushing?
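
The adaptation rule described above can be sketched as follows (a minimal illustration under the stated "last 20 invocations within the flush timeout" heuristic; `ConcreteStrategy` and `evaluate_strategy` are illustrative names, not Bottlecap's actual types):

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum ConcreteStrategy {
    Continuous,
    Periodic,
}

// Hypothetical sketch: stay on the continuous strategy only when the most
// recent 20 invocation intervals all fall within the flush timeout;
// otherwise fall back to periodic flushing.
fn evaluate_strategy(intervals: &[Duration], flush_timeout: Duration) -> ConcreteStrategy {
    const WINDOW: usize = 20;
    if intervals.len() >= WINDOW
        && intervals.iter().rev().take(WINDOW).all(|d| *d <= flush_timeout)
    {
        ConcreteStrategy::Continuous
    } else {
        ConcreteStrategy::Periodic
    }
}

fn main() {
    // A busy function: invocations every 100ms, well inside a 10s timeout.
    let busy = vec![Duration::from_millis(100); 25];
    assert_eq!(evaluate_strategy(&busy, Duration::from_secs(10)), ConcreteStrategy::Continuous);
    // A sparse function: 30s gaps exceed the timeout, so flush periodically.
    let sparse = vec![Duration::from_secs(30); 25];
    assert_eq!(evaluate_strategy(&sparse, Duration::from_secs(10)), ConcreteStrategy::Periodic);
    println!("ok");
}
```

A first-derivative version, as the PR body suggests, would compare successive intervals instead of a fixed window.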

* fix: bump flush_timeout default (#697)

A little goofy because we use this to determine when/how to move over to
continuous flushing, but the gist is that our invocation context tracks
the start time of each invocation. Because it's all local to a single
sandbox, this means that the time diff between invocations includes post
runtime duration, so it's very common to have 20 invocations greater
than 10s if there are even a couple of periodic/end flushes in there.

This is customizable with `DD_FLUSH_TIMEOUT`, so if people want to set it
to a very short timeout, they are able to.

* feat: Allow users to specify continuous strategy (#701)

https://datadoghq.atlassian.net/browse/SVLS-6994

* feat: Use http2 unless overridden or using a proxy (#706)

We rolled out HTTP/2 support for logs in v81, which seems to have broken
logs for some users relying on proxies which may not support http2.

This change introduces a new configuration option called `use_http1`.

1. If `DD_HTTP_PROTOCOL` is explicitly set to `http1`, we'll use it
2. If `DD_HTTP_PROTOCOL` is not set and the user is using a proxy, we'll
default to http1; setting `DD_HTTP_PROTOCOL` to anything other than
`http1` overrides this.

fixes #705
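
The selection rule described above can be sketched as follows (names like `choose_http_version` and `HttpVersion` are illustrative, not the extension's actual API):

```rust
#[derive(Debug, PartialEq)]
enum HttpVersion {
    Http1,
    Http2,
}

// Hypothetical sketch: explicit `http1` wins; any other explicit value keeps
// http2; with no explicit setting, a proxy forces http1 since the proxy may
// not support http2; otherwise default to http2.
fn choose_http_version(dd_http_protocol: Option<&str>, uses_proxy: bool) -> HttpVersion {
    match dd_http_protocol {
        Some("http1") => HttpVersion::Http1,
        Some(_) => HttpVersion::Http2,
        None if uses_proxy => HttpVersion::Http1,
        None => HttpVersion::Http2,
    }
}

fn main() {
    assert_eq!(choose_http_version(None, true), HttpVersion::Http1);
    assert_eq!(choose_http_version(None, false), HttpVersion::Http2);
    assert_eq!(choose_http_version(Some("http1"), false), HttpVersion::Http1);
    println!("ok");
}
```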

* Dual shipping metrics support (#704)

Adds support for dual shipping metrics to endpoints configured using the
`additional_endpoints` YAML or `DD_ADDITIONAL_ENDPOINTS` env var config.

For each configured endpoint/API key combination, we now create a
separate `MetricsFlusher` to flush the same batch of metrics to multiple
endpoints in parallel. Also, updates the retry logic to retry flushing
for the specific flusher that encountered an error.

Tested dual shipping metrics to 2 additional orgs/endpoints including
eu1.

Depends on dogstatsd changes:
https://github.com/DataDog/serverless-components/pull/20
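
The per-endpoint fan-out and retry described above can be sketched as follows (a minimal illustration; `flush_all` and the closure-based `send` are hypothetical names, not Bottlecap's `MetricsFlusher` API):

```rust
// Hypothetical sketch: the same batch is sent to every configured
// endpoint/API-key pair, and only the endpoints whose flush failed are
// collected so that just those flushers retry.
fn flush_all<F>(endpoints: &[&str], mut send: F) -> Vec<usize>
where
    F: FnMut(&str) -> bool, // true = flush succeeded
{
    endpoints
        .iter()
        .enumerate()
        .filter(|(_, endpoint)| !send(endpoint))
        .map(|(idx, _)| idx) // indices of flushers that need a retry
        .collect()
}

fn main() {
    let endpoints = ["https://api.datadoghq.com", "https://api.datadoghq.eu"];
    // Pretend only the EU endpoint failed; only it should be retried.
    let to_retry = flush_all(&endpoints, |e| !e.ends_with(".eu"));
    assert_eq!(to_retry, vec![1]);
    println!("ok");
}
```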

* chore: Separate AwsCredentials from AwsConfig (#716)

# Problem
Right now `AwsConfig` has a lot of fields, including the ones related to
credential:
```rust
    pub aws_access_key_id: String,
    pub aws_secret_access_key: String,
    pub aws_session_token: String,
    pub aws_container_credentials_full_uri: String,
    pub aws_container_authorization_token: String,
```

The next PR https://github.com/DataDog/datadog-lambda-extension/pull/717
wants to lazily load API key and the credentials. To do that, for the
resolver function `resolve_secrets()`, I need to change the param
`aws_config` from `&AwsConfig` to `Arc<RwLock<AwsConfig>>`. Because
`aws_config` is passed to many places, this change involves updating
lots of functions, which is formidable.

# This PR
Separates these credential-related fields out from `AwsConfig` and
creates a new struct `AwsCredentials`

Thus, the next PR will only need to change the param `aws_credentials`
from `&AwsCredentials` to `Arc<RwLock<AwsCredentials>>`. Because
`aws_credentials` is not used in lots of places, the next PR becomes
easier.

https://datadoghq.atlassian.net/issues/SVLS-6996
https://datadoghq.atlassian.net/issues/SVLS-6998
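
The struct split and the `Arc<RwLock<…>>` signature change described above can be sketched as follows (field names follow the snippet in the PR body; everything else, including `resolve_secrets`'s body, is illustrative):

```rust
use std::sync::{Arc, RwLock};

// Hypothetical sketch: credential-related fields pulled out of AwsConfig
// into their own struct, so only credential consumers need the shared handle.
struct AwsCredentials {
    aws_access_key_id: String,
    aws_secret_access_key: String,
    aws_session_token: String,
}

// After the split, only this signature changes to the shared, mutable handle.
fn resolve_secrets(credentials: Arc<RwLock<AwsCredentials>>) -> usize {
    let creds = credentials.read().expect("lock poisoned");
    creds.aws_access_key_id.len() // stand-in for the real Secrets Manager/KMS call
}

fn main() {
    let creds = Arc::new(RwLock::new(AwsCredentials {
        aws_access_key_id: "AKIA...".to_string(),
        aws_secret_access_key: String::new(),
        aws_session_token: String::new(),
    }));
    assert_eq!(resolve_secrets(Arc::clone(&creds)), 7);
    println!("ok");
}
```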

* chore(config): separate config from sources (#709)

# What?

Separates the configuration from sources, allowing it to be used in more
use cases.

# How?

Creates new default configuration and separates the environment
variables and YAML sources from the default.

# Why?

Make it easier to track changes in every source, as the field names
might differ from what is used at the configuration level.

# Notes

I expect to abstract this even more by providing it as a crate which can
have features; that way customers can use only the sources and
product-specific fields they need.

---------

Co-authored-by: Aleksandr Pasechnik <aleksandr.pasechnik@datadoghq.com>
Co-authored-by: Florentin Labelle <florentin.labelle@outlook.fr>

* Dual Shipping Logs Support (#718)

Adds support for dual shipping logs to endpoints configured using the
`logs_config` YAML or `DD_LOGS_CONFIG_ADDITIONAL_ENDPOINTS` env var
config.

Implemented a `LogsFlusher` as a wrapper around all the `Flusher`
instances that manages flushing to all configured endpoints.

Moved retry logic to `LogsFlusher`, as the retry request contains the
endpoint details and does not have to be tied to a particular flusher.

---------

Co-authored-by: jordan gonzález <30836115+duncanista@users.noreply.github.com>

* chore: upgrade rust version for toolchain to 1.84.1 (#743)

# This PR
1. In `rust-toolchain.toml`, upgrade Rust version from `1.81.0` to
`1.84.1`.
2. Fix/mute clippy errors caused by the upgrade
- some errors require non-trivial code changes, so I muted them for now
and added a TODO to fix them in separate PRs.

# Motivation
`libdatadog` now uses `1.84.1`
https://github.com/DataDog/libdatadog/blame/main/Cargo.toml#L62

To test changes on `libdatadog`, I need to change the Rust version in
`datadog-lambda-extension` to 1.84.1 as well.

Making this a separate PR:
1. so it's easier to test multiple PRs that depend on changes on
`libdatadog` in parallel after I merge this PR to main.
2. because this PR also involves lots of code changes needed to make
clippy happy

* feat: dual shipping APM support (#735)

Adds support for dual shipping traces to endpoints configured using the
`apm_config` YAML or `DD_APM_CONFIG_ADDITIONAL_ENDPOINTS` env var
config.

#### Additional Notes:
- Bumped libdatadog (and serverless-components) to include
https://github.com/DataDog/libdatadog/pull/1139
- Adds configuration option to set compression level for trace payloads

* chore: Add doc and rename function for flushing strategy (#740)

# Motivation

It took me quite some effort to understand flushing strategies. I want
to make it easier to understand for me and future developers.

# This PR
Tries to make flushing strategy code more readable:
1. Add/move comments
2. Create an enum `ConcreteFlushStrategy`, which doesn't contain
`Default` because it is required to be resolved to a concrete strategy
3. Rename `should_adapt` to `evaluate_concrete_strategy()`

# To reviewers
There are still a few things I don't understand, which are marked with
`TODO`. Explanations appreciated! Also, correct me if any of the comments
I added are wrong.

* chore: upgrade to edition 2024 and fix all linter warnings (#754)

Also updates CI to run `clippy` on `--all-targets` so that linter errors
aren't ignored on side targets such as tests.

* fix(apm): Enhance Synthetic Span Service Representation (#751)


### What does this PR do?

Rollout of span naming changes to align serverless product with tracer
to create streamlined Service Representation for Serverless

Key Changes:

- Change service name to match instance name for all managed services
(aws.lambda -> lambda name, etc) (breaking)
- Opt out via `DD_TRACE_AWS_SERVICE_REPRESENTATION_ENABLED`

- Add `span.kind:server` on synthetic spans made via span-inferrer, cold
start and lambda invocation spans

- Remove `_dd.base_service` tags on synthetic spans to avoid
unintentional service override

### Motivation


Improve Service Map for Serverless. This allows for synthetic spans to
have their own service on the map which connects with the inferred spans
from the tracer side.

* feat: port of Serverless AAP from Go to Rust (#755)

# What?

Ports the Serverless App & API Protection feature (AAP, also known as
Serverless AppSec) from the Go extension to Rust.

This is using https://github.com/DataDog/libddwaf-rust to provide
bindings to the in-app WAF.

This provides enhanced support for API Protection (notably, response
schema collection) compared to the Go version.

Tradeoff is that XML request and response security processing is not
currently supported in this version (it was in Go, but likely seldom
used).

This introduces a `bottlecap::appsec::processor::Processor` that is
integrated in the `bottlecap::proxy::Interceptor` (for request &
response acquisition) as well as in the
`bottlecap::trace_processor::TraceProcessor` (to decorate the
`aws.lambda` span with security data).

# Why?

We plan on decommissioning the Go version of the agent and a tracer-side
version of the Serverless AAP feature will not be available across all
supported language runtimes before several weeks/months.

Also [SVLS-5286](https://datadoghq.atlassian.net/browse/SVLS-5286)

# Notes


[SVLS-5286]:
https://datadoghq.atlassian.net/browse/SVLS-5286?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

---------

Co-authored-by: jordan gonzález <30836115+duncanista@users.noreply.github.com>

* feat: No longer launch Go-based agent for compatibility/OTLP/AAP config (#788)

https://datadoghq.atlassian.net/browse/SVLS-7398

- As part of the coming release, the bottlecap agent no longer launches the
Go-based agent when compatibility/AAP/OTLP features are active
- Emit the same metric when detecting any of the above configurations
- Update corresponding unit tests

Manifests:
- [Test lambda
function](https://us-east-1.console.aws.amazon.com/lambda/home?region=us-east-1#/functions/ltn1-fullinstrument-bn-cold-python310-lambda?code=&subtab=envVars&tab=testing)
with
[logs](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logsV2:log-groups/log-group/$252Faws$252Flambda$252Fltn1-fullinstrument-bn-cold-python310-lambda/log-events/2025$252F08$252F21$252F$255B$2524LATEST$255Df3788d359677452dad162488ff15456f$3FfilterPattern$3Dotel)
showing compatibility/AAP/OTLP are enabled
<img width="2260" height="454" alt="image"
src="https://github.com/user-attachments/assets/5dfd4954-5191-4390-83f5-a8eb3bffb9d3"
/>

-
[Logging](https://app.datadoghq.com/logs/livetail?query=functionname%3Altn1-fullinstrument-bn-cold-python310-lambda%20Metric&agg_m=count&agg_m_source=base&agg_t=count&cols=host%2Cservice&fromUser=true&messageDisplay=inline&refresh_mode=paused&storage=driveline&stream_sort=desc&viz=stream&from_ts=1755787655569&to_ts=1755787689060&live=false)
<img width="1058" height="911" alt="image"
src="https://github.com/user-attachments/assets/629f75d1-e115-4478-afac-ad16d9369fa7"
/>

-
[Metric](https://app.datadoghq.com/screen/integration/aws_lambda_enhanced_metrics?fromUser=false&fullscreen_end_ts=1755788220000&fullscreen_paused=true&fullscreen_refresh_mode=paused&fullscreen_section=overview&fullscreen_start_ts=1755787200000&fullscreen_widget=2&graph-explorer__tile_def=N4IgbglgXiBcIBcD2AHANhAzgkAaEAxgK7ZIC2A%2BhgHYDWmcA2gLr4BOApgI5EfYOxGoTphRJqmDhQBmSNmQCGOeJgIK0CtnhA8ObCHyagAJkoUVMSImwIc4IMhwT6CDfNQWP7utgE8AjNo%2BvvaYRGSwpggKxkgA5gB0kmxgemh8mAkcAB4IHBIQ4gnSChBoSKlswAAkCgDumBQKBARW1Ai41ZxxhdSd0kTUBAi9AL4ABABGvuPAA0Mj4h6OowkKja2DCAAUAJTaCnFx3UpyoeEgo6wgsvJEGgJCN3Jk9wrevH6BV-iWbMqgTbtOAAJgADPg5MY9BRpkZEL4UHZ4LdXhptBBqNDsnAISAoXp7NDVJdmKMfiBsL50nBgOSgA&refresh_mode=sliding&from_ts=1755783890661&to_ts=1755787490661&live=true)
<img width="1227" height="1196" alt="image"
src="https://github.com/user-attachments/assets/2922eb54-9853-4512-a902-dfa97916b643"
/>

* Revert "feat: No longer launch Go-based agent for compatibility/OTLP/AAP config (#788)"

This reverts commit 0f5984571eb842e5ce1cbadbec0f92d73befcd08.

* Ignoring Unwanted Resources in APM (#794)

## Task
https://datadoghq.atlassian.net/browse/SVLS-6846

## Overview
We want to allow users to set filter tags, which drop traces whose root
spans match specified span tags. Specifically, users can set
`DD_APM_FILTER_TAGS_REQUIRE` or `DD_APM_FILTER_TAGS_REJECT`.

More info
[here](https://docs.datadoghq.com/tracing/guide/ignoring_apm_resources/?tab=datadogyaml#trace-agent-configuration-options).

## Testing
Deployed changes to Lambda. Invoked Lambda directly and through API
Gateway to check with different root spans. Set the tags to either be
REQUIRE or REJECT with value `name:aws.lambda`. Confirmed in logs and UI
that we were dropping spans.

* feat: Add hierarchical configurable compression levels (#800)

feat: Add hierarchical configurable compression levels

- Add global compression_level config parameter (0-9, default: 6) with
fallback hierarchy
- Support 2-level compression configuration: global level first, then
module-specific
- This makes configuration more convenient - set once globally or
override per module
- Apply compression configuration to metrics flushers and trace
processor
  - Add environment variable DD_COMPRESSION_LEVEL for global setting
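
The fallback hierarchy above can be sketched as follows (a minimal illustration assuming module-specific settings override the global one; `resolve_compression_level` and its parameters are illustrative names, not the extension's config fields):

```rust
// Hypothetical sketch: prefer the module-specific compression level, fall
// back to the global one, and default to 6 when neither is configured.
fn resolve_compression_level(module_level: Option<u32>, global_level: Option<u32>) -> u32 {
    module_level.or(global_level).unwrap_or(6)
}

fn main() {
    assert_eq!(resolve_compression_level(None, None), 6);    // default
    assert_eq!(resolve_compression_level(None, Some(3)), 3); // global only
    assert_eq!(resolve_compression_level(Some(9), Some(3)), 9); // module wins
    println!("ok");
}
```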

Test
- Configuration:
<img width="966" height="742" alt="image"
src="https://github.com/user-attachments/assets/b33c0fd3-2b02-4838-8660-fc9ea9493998"
/>
-
([log](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logsV2:log-groups/log-group/$252Faws$252Flambda$252Fltn1-fullinstrument-bn-cold-python310-lambda/log-events/2025$252F08$252F25$252F$255B$2524LATEST$255D9c19719435bc48839f6f005d2b58b552))
Configuration:
<img width="965" height="568" alt="image"
src="https://github.com/user-attachments/assets/dfef594a-549f-4773-879d-549234f03fb7"
/>

* cherry pick: No longer launch Go-based agent for compatibility/OTLP/AAP config (#817)

Cherry pick of previously reverted #788 

https://datadoghq.atlassian.net/browse/SVLS-7398

- As part of the coming release, the bottlecap agent no longer launches the
Go-based agent when compatibility/AAP/OTLP features are active
- Emit the same metric when detecting any of the above configurations
- Update corresponding unit tests

Attention: there is a known issue with .NET
https://github.com/aws/aws-lambda-dotnet/issues/2093

Manifests:
- [Test lambda
function](https://us-east-1.console.aws.amazon.com/lambda/home?region=us-east-1#/functions/ltn1-fullinstrument-bn-cold-python310-lambda?code=&subtab=envVars&tab=testing)
with

[logs](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logsV2:log-groups/log-group/$252Faws$252Flambda$252Fltn1-fullinstrument-bn-cold-python310-lambda/log-events/2025$252F08$252F21$252F$255B$2524LATEST$255Df3788d359677452dad162488ff15456f$3FfilterPattern$3Dotel)
showing compatibility/AAP/OTLP are enabled
<img width="2260" height="454" alt="image"

src="https://github.com/user-attachments/assets/5dfd4954-5191-4390-83f5-a8eb3bffb9d3"
/>

-

[Logging](https://app.datadoghq.com/logs/livetail?query=functionname%3Altn1-fullinstrument-bn-cold-python310-lambda%20Metric&agg_m=count&agg_m_source=base&agg_t=count&cols=host%2Cservice&fromUser=true&messageDisplay=inline&refresh_mode=paused&storage=driveline&stream_sort=desc&viz=stream&from_ts=1755787655569&to_ts=1755787689060&live=false)
<img width="1058" height="911" alt="image"

src="https://github.com/user-attachments/assets/629f75d1-e115-4478-afac-ad16d9369fa7"
/>

-

[Metric](https://app.datadoghq.com/screen/integration/aws_lambda_enhanced_metrics?fromUser=false&fullscreen_end_ts=1755788220000&fullscreen_paused=true&fullscreen_refresh_mode=paused&fullscreen_section=overview&fullscreen_start_ts=1755787200000&fullscreen_widget=2&graph-explorer__tile_def=N4IgbglgXiBcIBcD2AHANhAzgkAaEAxgK7ZIC2A%2BhgHYDWmcA2gLr4BOApgI5EfYOxGoTphRJqmDhQBmSNmQCGOeJgIK0CtnhA8ObCHyagAJkoUVMSImwIc4IMhwT6CDfNQWP7utgE8AjNo%2BvvaYRGSwpggKxkgA5gB0kmxgemh8mAkcAB4IHBIQ4gnSChBoSKlswAAkCgDumBQKBARW1Ai41ZxxhdSd0kTUBAi9AL4ABABGvuPAA0Mj4h6OowkKja2DCAAUAJTaCnFx3UpyoeEgo6wgsvJEGgJCN3Jk9wrevH6BV-iWbMqgTbtOAAJgADPg5MY9BRpkZEL4UHZ4LdXhptBBqNDsnAISAoXp7NDVJdmKMfiBsL50nBgOSgA&refresh_mode=sliding&from_ts=1755783890661&to_ts=1755787490661&live=true)
<img width="1227" height="1196" alt="image"

src="https://github.com/user-attachments/assets/2922eb54-9853-4512-a902-dfa97916b643"
/>
====
Another manifest for .Net:
- [Lambda
function](https://us-east-1.console.aws.amazon.com/lambda/home?region=us-east-1#/functions/ltn1-fullinstrument-bn-cold-dotnet6-lambda?code=&subtab=envVars&tab=testing)
-
[Log](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logsV2:log-groups/log-group/$252Faws$252Flambda$252Fltn1-fullinstrument-bn-cold-dotnet6-lambda/log-events/2025$252F08$252F29$252F$255B$2524LATEST$255D15ca867ee94049129ed461283ae46f01$3FfilterPattern$3Dfailover)
- Configuration
<img width="1490" height="902" alt="image"
src="https://github.com/user-attachments/assets/b070e5e1-8335-4494-877f-6475d9959af2"
/>
- Log shows the issue reasons
<img width="990" height="536" alt="image"
src="https://github.com/user-attachments/assets/5503de33-ea92-401c-a595-c165e39b0c6e"
/>
<img width="848" height="410" alt="image"
src="https://github.com/user-attachments/assets/54d1e87c-93e7-4084-8a9a-173cb7d0c4a7"
/>
<img width="938" height="458" alt="image"
src="https://github.com/user-attachments/assets/4f205ec2-d923-47d1-9005-762650798894"
/>

---------

Co-authored-by: Tianning Li <tianning.li@datadoghq.com>

* feat: [Trace Stats] Add feature flag DD_COMPUTE_TRACE_STATS (#841)

## This PR

Adds a feature flag `DD_COMPUTE_TRACE_STATS`.
- If true, trace stats will be computed from the extension side. In this
case, we set `_dd.compute_stats` to `0`, so trace stats won't be
computed on the backend.
- If false, trace stats will NOT be computed from the extension side. In
this case, we set `_dd.compute_stats` to `1`, so trace stats will be
computed on the backend.
- Defaults to false for now, so `_dd.compute_stats` still defaults to
`1`, i.e. default behavior is not changed.
- After we fully support computing trace stats on extension side, I will
change the default to true then delete the flag.

Jira: https://datadoghq.atlassian.net/browse/SVLS-7593
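
The flag-to-tag mapping above can be sketched as follows (the `0`/`1` semantics of `_dd.compute_stats` are from the PR body; `compute_stats_tag` is an illustrative name, not the extension's API):

```rust
// Hypothetical sketch: if the extension computes trace stats, set
// `_dd.compute_stats` to 0 so the backend skips computation; otherwise
// set it to 1 so the backend keeps computing stats (current default).
fn compute_stats_tag(compute_on_extension: bool) -> f64 {
    if compute_on_extension { 0.0 } else { 1.0 }
}

fn main() {
    // Flag defaults to false, so the backend keeps computing stats.
    assert_eq!(compute_stats_tag(false), 1.0);
    assert_eq!(compute_stats_tag(true), 0.0);
    println!("ok");
}
```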

* fix: use tokio time instead of std time because tokio time can be frozen (#846)

Tokio time allows us to sleep without blocking the runtime. It also
allows time to be frozen (mainly for tests). I think we may need the
sleep to force blocking code to yield.

---------

Co-authored-by: jordan gonzález <30836115+duncanista@users.noreply.github.com>

* add support for observability pipeline (#826)

## Task

https://datadoghq.atlassian.net/jira/software/c/projects/SVLS/boards/5420?quickFilter=7573&selectedIssue=SVLS-7525

## Overview
* Add support for sending logs to an Observability Pipeline instead of
directly to Datadog.
* To enable, customers must set
`DD_ENABLE_OBSERVABILITY_PIPELINE_FORWARDING` to true, and
`DD_LOGS_CONFIG_LOGS_DD_URL` to their Observability Pipeline endpoint.
Will fast follow and update docs to reflect this.
* Initially, I was setting up the observability pipeline with
'Datadog Agent' as the source. This required us to format the log
message in a certain way. However, after chatting with the Observability
Pipeline team, they actually recommend we use 'Http Server' as the
source for our pipeline setup instead, since it just accepts any JSON.

## Testing
Created an [observability
pipeline](https://ddserverless.datadoghq.com/observability-pipelines/b15e4a64-880d-11f0-b622-da7ad0900002/view)
and deployed a lambda function with the changes. Triggered the lambda
function and confirmed we see it in our
[logs](https://ddserverless.datadoghq.com/logs?query=function_arn%3A%22arn%3Aaws%3Alambda%3Aus-east-1%3A425362996713%3Afunction%3Aobcdkstackv3-hellofunctionv3ec5a2fbe-l9qvtrowzb5q%22&agg_m=count&agg_m_source=base&agg_t=count&cols=host%2Cservice&messageDisplay=inline&refresh_mode=sliding&storage=hot&stream_sort=desc&viz=stream&from_ts=1758196420534&to_ts=1758369220534&live=true).
We know it is going through the observability pipeline because we can
see 'http_server' attached as the source type.

* feat: lower zstd default compression (#867)

A quick test run showed our max duration skews upward on smaller lambda
sizes with lots of data when the zstd compression level is set to 6. Looks
like we start to block the CPU at around this mark.

Gonna default it to 3, as tested below with 3 500k runs.
<img width="1293" height="319" alt="image"
src="https://github.com/user-attachments/assets/d1224676-f14f-4a55-8440-089bb9ff91d0"
/>

* revert(#817): reverts fallback config  (#871)

# What?

This reverts commit 2396c4fe102677179c834c2dd65cb5b2ea79ca8f from #817 

# Why?

Need a release

# Notes

We'll cherry pick and bring it back at some point

* chore: [Trace Stats] Rename env var DD_COMPUTE_TRACE_STATS (#875)

# This PR
As @apiarian-datadog suggested in
https://github.com/DataDog/datadog-lambda-extension/pull/841#discussion_r2376111825,
rename the feature flag `DD_COMPUTE_TRACE_STATS` to
`DD_COMPUTE_TRACE_STATS_ON_EXTENSION` for clarity.

# Notes
Jira: https://datadoghq.atlassian.net/browse/SVLS-7593

* feat: remove failover to go (#882)

Removes the failover to Go. If we can't parse any of the config options,
we log the failing value and move on with the default specified.

* fix: use datadog as default propagation style if supplied version is malformed (#891)

Fixes an issue where config parsing fails if the supplied propagation
style version is malformed

* fix: use None if propagation style is invalid (#895)

After internal discussion we determined that the tracing libraries use
None if the trace propagation style is invalid or malformed.

This brings us into alignment.

* feat: Support periodic reload for api key secret (#893)

# This PR
Supports the env var `DD_API_KEY_SECRET_RELOAD_INTERVAL`, in seconds. It
applies when Datadog API Key is set using `DD_API_KEY_SECRET_ARN`. For
example:
- If it's `120`, then the api key will be reloaded about every 120 seconds.
Note that a reload can only be triggered when the api key is used, usually
when data is being flushed. If there is no invocation and no data needs
to be flushed, then the reload won't happen.
- If it's not set or set to `0`, then the api key will only be loaded once,
the first time it is used, and won't be reloaded.
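
The lazy-reload rule above can be sketched as follows (a minimal illustration; `should_reload` and its parameters are hypothetical names, not `ApiKeyFactory`'s real API):

```rust
use std::time::{Duration, Instant};

// Hypothetical sketch: the key is only refreshed lazily, at the moment it is
// actually used, and only if the configured interval has elapsed. An interval
// of 0 (or unset) means "load once, never reload".
fn should_reload(last_loaded: Option<Instant>, interval_secs: u64, now: Instant) -> bool {
    match (last_loaded, interval_secs) {
        (None, _) => true, // first use: always load
        (_, 0) => false,   // 0 / unset: never reload
        (Some(t), secs) => now.duration_since(t) >= Duration::from_secs(secs),
    }
}

fn main() {
    let start = Instant::now();
    assert!(should_reload(None, 0, start));
    assert!(!should_reload(Some(start), 0, start + Duration::from_secs(999)));
    assert!(should_reload(Some(start), 120, start + Duration::from_secs(121)));
    assert!(!should_reload(Some(start), 120, start + Duration::from_secs(60)));
    println!("ok");
}
```

This also explains why a reload can lag behind the interval: the check only runs when a flush actually uses the key.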

# Motivation
Some customers regularly rotate their api key in a secret. We need to
provide a way for them to update our cached key.
https://github.com/DataDog/datadog-lambda-extension/issues/834

# Testing
## Steps
1. Set the env var `DD_API_KEY_SECRET_RELOAD_INTERVAL` to `120`

2. Invoke the Lambda every minute

## Result
The reload interval is passed to the `ApiKeyFactory`
<img width="711" height="25" alt="image"
src="https://github.com/user-attachments/assets/6fcc5081-accb-4928-8fa7-094d36aa2fa1"
/>

Reload happens roughly every 120 seconds. It's sometimes longer than 120
seconds due to the reason explained above.
<img width="554" height="252" alt="image"
src="https://github.com/user-attachments/assets/3fa78249-ff98-47d2-a953-f090630bbeb1"
/>

# Notes to Users
When you use this env var, also keep a grace period for the old API key
after you update the secret to the new key, and make the grace period
longer than the reload interval to give the extension sufficient time
to reload the secret.

# Internal Notes
Jira: https://datadoghq.atlassian.net/browse/SVLS-7572

* [SVLS-7885] update tag splitting to allow for ',' and ' ' (#916)

## Overview
We currently split `DD_TAGS` only on `,`. A customer asked whether we
can also split on spaces, since that is common for container images and
Lambda lets you deploy images.
(https://docs.datadoghq.com/getting_started/tagging/assigning_tags/?tab=noncontainerizedenvironments)
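A minimal sketch of splitting on either delimiter (the helper name is illustrative, not the actual config code):

```rust
/// Split a DD_TAGS-style string on commas and/or spaces, dropping
/// empty fragments produced by repeated delimiters.
fn split_tags(raw: &str) -> Vec<String> {
    raw.split(|c| c == ',' || c == ' ')
        .filter(|s| !s.is_empty())
        .map(str::to_string)
        .collect()
}
```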

* [SLES-2547] add metric namespace for DogStatsD (#920)

Follow up from https://github.com/DataDog/serverless-components/pull/48

What does this PR do?
Add support for DD_STATSD_METRIC_NAMESPACE.

Motivation
This was brought up by a customer who noticed issues migrating to
Bottlecap. Our docs show we should support this, but we currently don't
have it implemented:
https://docs.datadoghq.com/serverless/guide/agent_configuration/#dogstatsd-custom-metrics.

Additional Notes
Requires changes in agent/extension. Will follow up with those PRs.

Describe how to test/QA your changes
Deployed changes to extension and tested with and without the custom
namespace env variable. Confirmed that metrics are getting the prefix
attached:
[metrics](https://ddserverless.datadoghq.com/metric/explorer?fromUser=false&graph_layout=stacked&start=1762783238873&end=1762784138873&paused=false#N4Ig7glgJg5gpgFxALlAGwIYE8D2BXJVEADxQEYAaELcqyKBAC1pEbghkcLIF8qo4AMwgA7CAgg4RKUAiwAHOChASAtnADOcAE4RNIKtrgBHPJoQaUAbVBGN8qVoD6gnNtUZCKiOq279VKY6epbINiAiGOrKQdpYZAYgUJ4YThr42gDGSsgg6gi6mZaBZnHKGABuMMiZeBoIOKoAdPJYTFJNcMRwtRIdmfgiCMAAVDwgfKCR0bmxWABMickIqel4WTl5iIXFIHPlVcgAVjiMIk3TmvIY2U219Y0tbYwdXT0EkucDeEOj4zwAXSornceEwoXCINUYIwMVK8QmFFAUJhcJ0CwmQJA9SwaByoGueIQCE2UBwMCcmXBGggmUSaFEcCcckUynSDKg9MZTnoTGUIjcHjQiKSEHsmCwzIUmwZIiUgJ4fGx8gZCAAwlJhDAUCIwWgeEA)

* refactor: Move metric namespace validation to dogstatsd util (#921)

https://datadoghq.atlassian.net/browse/SLES-2547

- Updates the dependency to use the centralized parse_metric_namespace function.
- Removes duplicate code in favor of the shared implementation.


Test:
- Deploy the extension and configure
[DD_STATSD_METRIC_NAMESPACE](https://us-east-1.console.aws.amazon.com/lambda/home?region=us-east-1#/functions/ltn-fullinstrument-bn-10bst-node22-lambda?subtab=envVars&tab=configure)
<img width="964" height="290" alt="image"
src="https://github.com/user-attachments/assets/94836a3a-9905-44b4-9565-185745e47981"
/>
- Invoke the function and expect to see the metric using this custom
namespace prefix
<img width="1170" height="516" alt="Screenshot 2025-11-11 at 4 59 57 PM"
src="https://github.com/user-attachments/assets/0bf4ac5e-ac1c-4cfe-817e-89b004717caf"
/>

[Metric
link](https://ddserverless.datadoghq.com/metric/explorer?fromUser=true&graph_layout=stacked&start=1762897808375&end=1762898083375&paused=true#N4Ig7glgJg5gpgFxALlAGwIYE8D2BXJVEADxQEYAaELcqyKBAC1pEbghkcLIF8qo4AMwgA7CAgg4RKUAiwAHOChASAtnADOcAE4RNIKtrgBHPJoQaUAbVBGN8qVoD6gnNtUZCKiOq279VKY6epbINiAiGOrKQdpYZAYgUJ4YThr42gDGSsgg6gi6mZaBZnHKGABuMMhsaGg4YG5oUAB0WmiCLapS4m6iMMAAVDwgPAC6VBpyaDmg8hgzCAg5STgwTpmYGhoQmYloonBOcorK6QdQ+4dO9EzKIm4eaKP8EPaYWMcKKwciSuM8Pggd7iADCUmEMBQIjwdR4QA)

* [SVLS-7704] add support for SSM Parameter API key (#924)

## Overview
* Add support for customers storing the Datadog API key in SSM Parameter
Store.

## Testing
* Deployed changes and confirmed this works with Parameter Store String
and SecureString.

* feat: Add support for DD_LOGS_ENABLED as alias for DD_SERVERLESS_LOGS_ENABLED (#928)

https://datadoghq.atlassian.net/browse/SVLS-7818

## Overview
Add DD_LOGS_ENABLED environment variable and YAML config option as an
alias for DD_SERVERLESS_LOGS_ENABLED. Both variables now use OR logic,
meaning logs are enabled if either variable is set to true.

Changes:
- Add logs_enabled field to EnvConfig and YamlConfig structs
- Implement OR logic in merge_config functions: logs are enabled if
either DD_LOGS_ENABLED or DD_SERVERLESS_LOGS_ENABLED is true
- Add comprehensive test coverage with 9 test cases covering all
combinations of the two variables
- Maintain backward compatibility with existing configurations
- Default value remains true when neither variable is set
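The merge rule can be sketched as (an illustrative helper, not the actual merge_config code; the false-wins-only-when-explicitly-set behavior is inferred from the testing notes below):

```rust
/// OR-merge of the two enablement flags: if neither variable is set,
/// default to true; otherwise logs are enabled if either one is
/// explicitly set to true.
fn merge_logs_enabled(logs: Option<bool>, serverless_logs: Option<bool>) -> bool {
    match (logs, serverless_logs) {
        (None, None) => true, // default when neither is set
        _ => logs.unwrap_or(false) || serverless_logs.unwrap_or(false),
    }
}
```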


## Testing 
Set DD_LOGS_ENABLED and DD_SERVERLESS_LOGS_ENABLED to false and expect:
- [Log can be found in AWS
console](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logsV2:log-groups/log-group/$252Faws$252Flambda$252Fltn-fullinstrument-bn-cold-node22-lambda/log-events/2025$252F11$252F13$252F$255B$2524LATEST$255D455478dcbc944055b5be933e2e099f6a$3FfilterPattern$3DREPORT+RequestId)
- [Log could NOT be found in DD
console](https://ddserverless.datadoghq.com/logs?query=source%3Alambda%20%40lambda.arn%3A%22arn%3Aaws%3Alambda%3Aus-east-1%3A425362996713%3Afunction%3Altn-fullinstrument-bn-cold-node22-lambda%22%20AND%20%22REPORT%20RequestId%22&agg_m=count&agg_m_source=base&agg_t=count&clustering_pattern_field_path=message&cols=host%2Cservice%2C%40lambda.request_id&fromUser=true&messageDisplay=inline&refresh_mode=paused&storage=hot&stream_sort=desc&viz=stream&from_ts=1763063694206&to_ts=1763065424700&live=false)

Otherwise the log should be available in DD console.

* chore: Upgrade libdatadog and construct http client for traces (#917)

Upgrade libdatadog. Including:
- Rename a few crates:
  - `ddcommon` -> `libdd-common`
  - `datadog-trace-protobuf` -> `libdd-trace-protobuf`
  - `datadog-trace-utils` -> `libdd-trace-utils`
  - `datadog-trace-normalization` -> `libdd-trace-normalization`
  - `datadog-trace-stats` -> `libdd-trace-stats`
- Use the new API to send traces, which takes in an http_client object
instead of a proxy URL string

GitHub issue:
https://github.com/DataDog/datadog-lambda-extension/issues/860
Jira: https://datadoghq.atlassian.net/browse/SLES-2499
Slack discussion:
https://dd.slack.com/archives/C01TCF143GB/p1762526199549409

* Merge Lambda Managed Instance feature branch (#947)

https://datadoghq.atlassian.net/browse/SVLS-8080

## Overview
Merge Lambda Managed Instance feature branch

## Testing 
Covered by individual commits

Co-authored-by: shreyamalpani <shreya.malpani@datadoghq.com>
Co-authored-by: duncanista <30836115+duncanista@users.noreply.github.com>
Co-authored-by: astuyve <aj.stuyvenberg@datadoghq.com>
Co-authored-by: jchrostek-dd <john.chrostek@datadoghq.com>
Co-authored-by: tianning.li <tianning.li@datadoghq.com>

* fix(config): support colons in tag values (URLs, etc.) (#953)

https://datadoghq.atlassian.net/browse/SVLS-8095

## Overview
Tag parsing previously used `split(':')`, which broke values containing
colons, like URLs (`git.repository_url:https://...`). Changed to use
`splitn(2, ':')` to split only on the first colon, preserving the rest
as the value.

Changes:
 - Add parse_key_value_tag() helper to centralize parsing logic
 - Refactor deserialize_key_value_pairs to use helper
 - Refactor deserialize_key_value_pair_array_to_hashmap to use helper
 - Add comprehensive test coverage for URL values and edge cases
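The first-colon split can be sketched as follows (the name mirrors the PR's `parse_key_value_tag`, but the signature here is illustrative):

```rust
/// Split a `key:value` tag on the first colon only, so values that
/// themselves contain colons (e.g. URLs) survive intact.
fn parse_key_value_tag(tag: &str) -> Option<(String, String)> {
    let mut parts = tag.splitn(2, ':');
    match (parts.next(), parts.next()) {
        (Some(k), Some(v)) if !k.is_empty() => Some((k.to_string(), v.to_string())),
        _ => None, // no colon, or empty key
    }
}
```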

## Testing 
Unit tests; e2e tests are expected to pass.

Co-authored-by: tianning.li <tianning.li@datadoghq.com>

* [SVLS-7934] feat: Support TLS certificate for trace/stats flusher (#961)

## Problem
A customer reported that their Lambda is behind a proxy, and the
Rust-based extension can't send traces to Datadog via the proxy, while
the previous go-based extension worked.

## This PR
Supports the env var `DD_TLS_CERT_FILE`: the path to a file of
concatenated CA certificates in PEM format, e.g.
`DD_TLS_CERT_FILE=/opt/ca-cert.pem`. When the extension flushes
traces/stats to Datadog, the HTTP client it creates can load and use
this certificate and connect through the proxy properly.

## Testing
### Steps
1. Create a Lambda in a VPC with an NGINX proxy.
2. Add a layer to the Lambda, which includes the CA certificate
`ca-cert.pem`
3. Set env vars:
    - `DD_TLS_CERT_FILE=/opt/ca-cert.pem`
- `DD_PROXY_HTTPS=http://10.0.0.30:3128`, where `10.0.0.30` is the
private IP of the proxy EC2 instance
    - `DD_LOG_LEVEL=debug`
4. Update routing rules of security groups so the Lambda can reach
`http://10.0.0.30:3128`
5. Invoke the Lambda
### Result
**Before**
Trace flush failed with error logs:
> DD_EXTENSION | ERROR | Max retries exceeded, returning request error
error=Network error: client error (Connect) attempts=1
DD_EXTENSION | ERROR | TRACES | Request failed: No requests sent

**After**
Trace flush is successful:
> DD_EXTENSION | DEBUG | TRACES | Flushing 1 traces
DD_EXTENSION | DEBUG | TRACES | Added root certificate from
/opt/ca-cert.pem
DD_EXTENSION | DEBUG | TRACES | Proxy connector created with proxy:
Some("http://10.0.0.30:3128")
DD_EXTENSION | DEBUG | Sending with retry
url=https://trace.agent.datadoghq.com/api/v0.2/traces payload_size=1120
max_retries=1
DD_EXTENSION | DEBUG | Received response status=202 Accepted attempt=1
DD_EXTENSION | DEBUG | Request succeeded status=202 Accepted attempts=1
DD_EXTENSION | DEBUG | TRACES | Flushing took 1609 ms

## Notes
This fix only covers the trace flusher and stats flusher, which use
`ServerlessTraceFlusher::get_http_client()` to create the HTTP client.
It doesn't cover the logs flusher and proxy flusher, which use a
different function (`http.rs:get_client()`) to create the HTTP client.
However, logs flushing was successful in my tests, even with no
certificate added. We can come back to the logs/proxy flushers if
someone reports an error.

* chore: Upgrade libdatadog (#964)

## Overview
The crate `datadog-trace-obfuscation` has been renamed to
`libdd-trace-obfuscation`. This PR updates this dependency.

## Testing

* [SVLS-8211] feat: Add timeout for requests to span_dedup_service (#986)

## Problem
Span dedup service sometimes fails to return the result and thus logs
the error:
> DD_EXTENSION | ERROR | Failed to send check_and_add response: true

I see this error in our Self Monitoring and in a customer's account.
I also believe it causes the extension to fail to receive traces from
the tracer, causing missing traces. This is because the caller of span
dedup is in `process_traces()`, the function that handles the tracer's
HTTP request to send traces. If this function fails to get the span
dedup result and gets stuck, the HTTP request will time out.

## This PR
While I don't yet know what causes the error, this PR adds a patch to
mitigate the impact:
1. Change log level from `error` to `warn`
2. Add a timeout of 5 seconds to the span dedup check, so that if the
caller doesn't get an answer soon, it defaults to treating the trace as
not a duplicate, which is the most common case.
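The real code is async; as a rough synchronous analogue of the patch (with std's `recv_timeout` standing in for `tokio::time::timeout`, and the function name illustrative):

```rust
use std::sync::mpsc;
use std::time::Duration;

/// Ask the dedup service whether a trace is a duplicate, but only wait
/// up to `deadline`. On timeout (or a dropped sender), default to
/// "not a duplicate", the most common case, instead of blocking the
/// tracer's HTTP request.
fn is_duplicate_with_timeout(rx: mpsc::Receiver<bool>, deadline: Duration) -> bool {
    rx.recv_timeout(deadline).unwrap_or(false)
}
```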

## Testing
Merge this PR, then check logs in Self Monitoring, as it's hard to run
high-volume tests in Self Monitoring from a non-main branch.

* [SVLS-8150] fix(config): ensure logs intake URL is correctly prefixed (#1021)

## Overview

Ensures `DD_LOGS_CONFIG_LOGS_DD_URL` is correctly prefixed with
`https://`
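The normalization can be sketched as (the helper name is illustrative; whether `http://` is also passed through is an assumption here):

```rust
/// Ensure a logs intake URL carries an explicit scheme, defaulting
/// to `https://` when none is supplied.
fn ensure_https_prefix(url: &str) -> String {
    if url.starts_with("https://") || url.starts_with("http://") {
        url.to_string() // scheme already present, leave as-is
    } else {
        format!("https://{url}")
    }
}
```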

## Testing 

Manually tested that logs get sent to alternate logs intake

* chore(deps): upgrade dogstatsd (#1020)

## Overview

Continuation of #1018, removing the unnecessary mut lock on callers for
dogstatsd

* chore(deps): upgrade rust to `v1.93.1` (#1034)

## What?

Upgrade rust to latest stable 1.93.1

## Why?

The `time` vulnerability fix is only available on Rust >= 1.88.0

* feat(http): allow skip ssl validation (#1064)

## Overview

Add DD_SKIP_SSL_VALIDATION support, parsed from both env and YAML,
matching the datadog-agent's behavior. It is applied to all outgoing
HTTP clients (reqwest via danger_accept_invalid_certs, hyper via a
custom ServerCertVerifier).

## Motivation

Customers in environments with corporate proxies or custom CA setups
need the ability to disable TLS certificate validation, matching the
existing datadog-agent config option. The Go agent applies
tls.Config{InsecureSkipVerify: true} to all HTTP transports via a
central CreateHTTPTransport(); we mirror this by wiring the config
through to both client builders.

And [SLES-2710](https://datadoghq.atlassian.net/browse/SLES-2710)

## Changes

Config (config/mod.rs, config/env.rs, config/yaml.rs):
- Add skip_ssl_validation: bool to Config, EnvConfig, and YamlConfig
with default false

reqwest client (http.rs):
- .danger_accept_invalid_certs(config.skip_ssl_validation) on the shared
client builder

hyper client (traces/http_client.rs):
- Custom NoVerifier implementing
rustls::client::danger::ServerCertVerifier that accepts all certificates
- Uses CryptoProvider::get_default() (not hardcoded aws_lc_rs) for
FIPS-safe signature scheme reporting
- New skip_ssl_validation parameter on create_client()

## Testing 

Unit tests and self monitoring

[SLES-2710]:
https://datadoghq.atlassian.net/browse/SLES-2710?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

* add Cargo.toml for datadog-agent-config

* update licenses

* remove aws.rs from datadog-agent-config

* chore: upgrade workspace rust edition to 2024 (#96)

* upgrade rust edition to 2024 for workspace

* apply formatting

---------

Co-authored-by: jordan gonzález <30836115+duncanista@users.noreply.github.com>
Co-authored-by: alexgallotta <5581237+alexgallotta@users.noreply.github.com>
Co-authored-by: AJ Stuyvenberg <astuyve@gmail.com>
Co-authored-by: Nicholas Hulston <nicholashulston@gmail.com>
Co-authored-by: Aleksandr Pasechnik <aleksandr.pasechnik@datadoghq.com>
Co-authored-by: shreyamalpani <shreya.malpani@datadoghq.com>
Co-authored-by: Yiming Luo <yiming.luo@datadoghq.com>
Co-authored-by: Florentin Labelle <florentin.labelle@outlook.fr>
Co-authored-by: Romain Marcadier <romain.muller@telecomnancy.net>
Co-authored-by: Zarir Hamza <zarir.hamza@datadoghq.com>
Co-authored-by: Romain Marcadier <romain.marcadier@datadoghq.com>
Co-authored-by: Tianning Li <tianning.li@datadoghq.com>
Co-authored-by: jchrostek-dd <john.chrostek@datadoghq.com>
Co-authored-by: astuyve <aj.stuyvenberg@datadoghq.com>