feat: Add helper functions to ApiKeyFactory struct #24

Merged
lym953 merged 15 commits into main from
yiming.luo/api-key-future-cursor-2
Jun 27, 2025

Conversation

@lym953
Contributor

@lym953 lym953 commented Jun 25, 2025

What does this PR do?

  1. Add helper functions to ApiKeyFactory struct, including:
    1. new_from_resolver(), which takes in a resolver function and enables lazy resolution
    2. new_from_static_key(), which takes in a static api key string slice
    3. get_api_key()
  2. Move ApiKeyFactory related code to a separate module api_key

Motivation

As a continuation of #21.

From @astuyve:

today we basically block/await on that decrypt call before we can call /next
so if we can instead make that async and then resolve the future only when we need to flush data, that can be a big win for many customers.

https://datadoghq.atlassian.net/browse/SVLS-6995

Additional Notes

Describe how to test/QA your changes

  1. Passed the added automated tests
  2. Used it in the Lambda extension, where it did improve start time. See "feat: Lazily resolve api key" (datadog-lambda-extension#717)

@lym953 lym953 requested a review from Copilot June 25, 2025 17:50

This comment was marked as outdated.

@lym953 lym953 force-pushed the yiming.luo/api-key-future-cursor-2 branch from f606cce to f6c7a65 June 27, 2025 14:44
@lym953 lym953 requested a review from Copilot June 27, 2025 15:52
@lym953 lym953 changed the title from "feat: Move key resolution logic to ApiKeyFactory struct" to "feat: Add helper functions to ApiKeyFactory struct" Jun 27, 2025

This comment was marked as outdated.

Comment thread crates/dogstatsd/src/api_key.rs
@lym953 lym953 marked this pull request as ready for review June 27, 2025 16:08
Comment thread crates/dogstatsd/src/api_key.rs Outdated

#[derive(Clone)]
pub struct ApiKeyFactory {
inner: Arc<ApiKeyFactoryInner>,

Do you need this Arc? The ApiKeyResolverFn is already wrapped in an Arc, and the OnceCell should already be Send + Sync. Can you just derive Clone on ApiKeyFactoryInner? If you can do that, you no longer need the wrapper struct. You might even be able to get away with not having the Arc around the ApiKeyResolverFn, since you already require it to be Send + Sync.

I might be wrong about OnceCell; it looks like its memory is copied. You might be able to get away with just wrapping the OnceCell in an Arc. That way at least you don't need the wrapper struct, and in the default case there is no overhead for using the static string.

There is also LazyLock which might give the correct behaviour with less overhead https://doc.rust-lang.org/std/sync/struct.LazyLock.html

Contributor Author

1.

Do you need this Arc? The ApiKeyResolverFn is already wrapped in an Arc, and the OnceCell should already be Send + Sync. Can you just derive Clone on ApiKeyFactoryInner? If you can do that, you no longer need the wrapper struct.

Great! This works.

2.

You might even be able to get away with not having the Arc around the ApiKeyResolverFn, since you already require it to be Send + Sync

Do you mean this?

pub type ApiKeyResolverFn = dyn Fn() -> Pin<Box<dyn Future<Output = String> + Send>> + Send + Sync;

#[derive(Clone)]
pub enum ApiKeyFactory {
    Static(String),
    Dynamic {
        resolver_fn: ApiKeyResolverFn,
        api_key: OnceCell<String>,
    },
}

This doesn't work because resolver_fn would be a bare dyn Fn, whose size is unknown at compile time, so it has to sit behind a pointer such as Arc or Box.

3.

I might be wrong about OnceCell; it looks like its memory is copied. You might be able to get away with just wrapping the OnceCell in an Arc

You are right. Will add Arc.

4.

There is also LazyLock which might give the correct behaviour with less overhead

This doesn't work because std::sync::LazyLock doesn't allow async functions. OnceCell is the best choice I found that works inside tokio.

Thanks for the comments! I also just learned that an enum can have an impl block.
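For readers following the thread, here is a minimal sketch of the shape the discussion converges on: an enum whose dynamic variant keeps the resolver behind an Arc (so the unsized `dyn Fn` has a known size and the enum can derive Clone) and shares one cell across clones. It is not the merged code. To stay free of external crates it assumes `std::sync::OnceLock` in place of the tokio OnceCell mentioned above, and `ApiKeyFuture`, `block_on`, and `demo` are hypothetical names added only for illustration.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, OnceLock};
use std::task::{Context, Poll, Wake, Waker};

/// Boxed future produced by a resolver (hypothetical alias for readability).
pub type ApiKeyFuture = Pin<Box<dyn Future<Output = String> + Send>>;

/// `dyn Fn` is unsized, so it sits behind an `Arc`, which also keeps the enum `Clone`.
pub type ApiKeyResolverFn = Arc<dyn Fn() -> ApiKeyFuture + Send + Sync>;

#[derive(Clone)]
pub enum ApiKeyFactory {
    Static(String),
    Dynamic {
        resolver_fn: ApiKeyResolverFn,
        // Shared cell: once any clone resolves the key, every clone sees it.
        // Assumption: OnceLock stands in for the async OnceCell from the thread.
        api_key: Arc<OnceLock<String>>,
    },
}

impl ApiKeyFactory {
    pub fn new_from_static_key(api_key: &str) -> Self {
        Self::Static(api_key.to_string())
    }

    pub fn new_from_resolver(resolver_fn: ApiKeyResolverFn) -> Self {
        Self::Dynamic { resolver_fn, api_key: Arc::new(OnceLock::new()) }
    }

    pub async fn get_api_key(&self) -> &str {
        match self {
            Self::Static(api_key) => api_key.as_str(),
            Self::Dynamic { resolver_fn, api_key } => {
                if let Some(key) = api_key.get() {
                    return key.as_str();
                }
                // In this std-only sketch, two concurrent first callers could
                // both run the resolver; only one result is stored.
                let resolved = (resolver_fn)().await;
                api_key.get_or_init(|| resolved).as_str()
            }
        }
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical helper: polls a future in a loop until it is ready.
/// Only suitable for futures that never actually wait on I/O.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

/// Usage sketch: nothing is resolved until the first `get_api_key` call.
pub fn demo() -> String {
    let factory = ApiKeyFactory::new_from_resolver(Arc::new(|| -> ApiKeyFuture {
        Box::pin(async { "resolved-key".to_string() })
    }));
    block_on(factory.get_api_key()).to_string()
}
```

Inside a tokio runtime, an async once-cell would replace the `OnceLock` and the hand-rolled `block_on`, which is why the thread settles on OnceCell rather than `std::sync::LazyLock`.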

@lym953 lym953 requested a review from Copilot June 27, 2025 18:31
Contributor

Copilot AI left a comment

Pull Request Overview

This PR introduces a dedicated api_key module to encapsulate API key resolution logic and updates existing components to use the new ApiKeyFactory helpers.

  • Moved ApiKeyFactory logic into its own module with static and dynamic constructors.
  • Refactored Flusher to depend on ApiKeyFactory instead of raw closures.
  • Updated integration tests and serverless-compat usage to call the new helpers.

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| crates/dogstatsd/tests/integration_test.rs | Swapped manual API key closure for ApiKeyFactory::new_from_static_key |
| crates/dogstatsd/src/lib.rs | Added pub mod api_key |
| crates/dogstatsd/src/flusher.rs | Updated Flusher to accept Arc<ApiKeyFactory> |
| crates/dogstatsd/src/api_key.rs | Added ApiKeyFactory enum, helper constructors, and unit tests |
| crates/datadog-serverless-compat/src/main.rs | Replaced inline API key closures with ApiKeyFactory::new_from_static_key |

Comments suppressed due to low confidence (2)

crates/dogstatsd/src/api_key.rs:37

  • Add a unit test to verify that the dynamic resolver is only invoked once (i.e., that the OnceCell caching works as intended).
                    .get_or_init(|| async { (resolver_fn)().await })

crates/dogstatsd/src/api_key.rs:17

  • [nitpick] Add module- or item-level doc comments for ApiKeyFactory, new_from_resolver, new_from_static_key, and get_api_key to clarify intended usage.
impl ApiKeyFactory {
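
The suppressed suggestion about verifying that the resolver runs only once can be sketched with a call counter. This is an illustrative, synchronous stand-in and not the crate's test: `CachedKey` and `count_resolver_calls` are hypothetical names, and `std::sync::OnceLock` is assumed in place of the async OnceCell used in the PR.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, OnceLock};

/// Hypothetical synchronous stand-in for the dynamic (resolver-backed) key.
struct CachedKey {
    resolver: Box<dyn Fn() -> String + Send + Sync>,
    cell: OnceLock<String>,
}

impl CachedKey {
    fn new(resolver: impl Fn() -> String + Send + Sync + 'static) -> Self {
        Self { resolver: Box::new(resolver), cell: OnceLock::new() }
    }

    /// Runs the resolver on the first call only; later calls hit the cache.
    fn get(&self) -> &str {
        self.cell.get_or_init(|| (self.resolver)()).as_str()
    }
}

/// Returns how many times the resolver ran across `reads` lookups.
fn count_resolver_calls(reads: usize) -> usize {
    let calls = Arc::new(AtomicUsize::new(0));
    let counter = Arc::clone(&calls);
    let key = CachedKey::new(move || {
        counter.fetch_add(1, Ordering::SeqCst);
        "resolved-key".to_string()
    });
    for _ in 0..reads {
        assert_eq!(key.get(), "resolved-key");
    }
    calls.load(Ordering::SeqCst)
}
```

With zero reads the counter stays at zero, which also checks that constructing the value does not trigger resolution.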

pub struct Flusher {
// Accept a future so the API key resolution is deferred until the flush happens
api_key_factory: ApiKeyFactory,
// Allow accepting a future so the API key resolution is deferred until the flush happens

Copilot AI Jun 27, 2025

[nitpick] Update this comment to reflect that api_key_factory now uses ApiKeyFactory rather than accepting a raw future-producing closure.

Suggested change
- // Allow accepting a future so the API key resolution is deferred until the flush happens
+ // Uses ApiKeyFactory to defer API key resolution until the flush happens

Comment thread crates/dogstatsd/src/api_key.rs Outdated
}

impl ApiKeyFactory {
pub fn new_from_resolver(resolver_fn: ApiKeyResolverFn) -> Self {
Contributor

Maybe add some docs on what is considered a resolver.

Comment thread crates/dogstatsd/src/api_key.rs Outdated

#[cfg(test)]
pub mod tests {
use crate::api_key::ApiKeyFactory;
Contributor

@duncanista duncanista Jun 27, 2025

In tests, it's common to just use `use super::*`, which gives you all the imports from the file you're in.
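
As a small illustration of this suggestion (not code from the PR), the sketch below puts a test module next to the code it exercises and pulls everything in with `use super::*`. The `normalize_api_key` helper is hypothetical and exists only to give the test something to call.

```rust
/// Hypothetical helper: strips surrounding whitespace from a raw key.
pub fn normalize_api_key(raw: &str) -> String {
    raw.trim().to_string()
}

#[cfg(test)]
mod tests {
    // Brings every item from the parent module into scope, including
    // `normalize_api_key`, so no explicit `crate::...` path is needed.
    use super::*;

    #[test]
    fn test_normalize_api_key_trims_whitespace() {
        assert_eq!(normalize_api_key("  abc123\n"), "abc123");
    }
}
```

Because the glob import follows the parent module, new items become available to the tests without touching the import list.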

Comment thread crates/dogstatsd/src/api_key.rs Outdated
use std::sync::Arc;

#[tokio::test]
async fn new_from_resolver() {
Contributor

It's good practice to name tests in the form test_new_from_resolver so readers don't confuse them with the function under test above. If you want to be more descriptive and add more context, you can also use test_new_from_resolver_<EXPECTED-ACTION>.

@lym953 lym953 merged commit b29bba8 into main Jun 27, 2025
26 checks passed
@lym953 lym953 deleted the yiming.luo/api-key-future-cursor-2 branch June 27, 2025 20:25
lym953 added a commit that referenced this pull request Jul 10, 2025
lym953 added a commit to DataDog/datadog-lambda-extension that referenced this pull request Jul 17, 2025
# Motivation
From @astuyve:
> today we basically block/await on that decrypt call before we can call
/next
so if we can instead make that async and then resolve the future only
when we need to flush data, that can be a big win for many customers.

https://datadoghq.atlassian.net/browse/SVLS-6995

# Previous work
DataDog/serverless-components#21,
DataDog/serverless-components#24 created
`ApiKeyFactory`, which is a util to enable lazy API key resolution.

# This PR

Updates Bottlecap code to use `ApiKeyFactory` to lazily resolve API key,
i.e. instead of resolving it by querying Secret Manager or KMS during
init phase, do it at flushing time when api key is actually needed.

# Note

This PR changes the behavior when key resolution fails, i.e. when
`resolve_secrets()` returns `None`.
- Before: run `extension_loop_idle()`, which does not stop the runtime
- After: panic, which will stop the runtime (if I understand correctly).
Of course it's not ideal. Any better idea?
- It's harder now to run `extension_loop_idle()` because api key
resolution code is not in the main loop anymore, but in various consumer
code of api key
- Is there a way to gracefully shut down the extension without affecting
the runtime?

Update: Added a PR to address resolution failure:
#732
These two PRs should be merged together. Keeping them separate PRs just
to make review easier.

# Testing
## Setup
- Runtime: Go1 on Amazon Linux 2
- Architecture: arm64
- An app with empty implementation code

## Result
Below is the `Datadog Next-Gen Extension ready in:` time logged.

- Before: (prod extension
`arn:aws:lambda:us-east-1:464622532012:layer:Datadog-Extension-ARM:82`)
  - 88.6 ± 1.8 (ms)

- After: (test extension
`arn:aws:lambda:us-east-1:425362996713:layer:Datadog-Bottlecap-Beta-ARM-yiming:2`)
  - 35.4 ± 5.1 (ms)
  - (-60.0%)

<img width="461" alt="image"
src="https://github.com/user-attachments/assets/b2973aae-d8f2-4003-a37f-6af05a42e059"
/>

Both use 5 samples.

# Notes
https://datadoghq.atlassian.net/issues/SVLS-6996
https://datadoghq.atlassian.net/issues/SVLS-6998


4 participants