[SVLS-7018] Support profiling in Azure Functions (#40)
Conversation
Force-pushed from c0abc20 to d9ea13b
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
```rust
// Extract headers and body
let (parts, body) = request.into_parts();

let body_bytes = match body.collect().await {
    Ok(collected) => collected.to_bytes(),
    Err(e) => {
        return log_and_create_http_response(
            &format!("Error reading profiling request body: {e}"),
            StatusCode::BAD_REQUEST,
        );
    }
};

// Create proxy request
let proxy_request = ProxyRequest {
    headers: parts.headers,
    body: body_bytes,
    target_url: config.profiling_intake.url.to_string(),
};
```
Forward profiling query string to Datadog
The profiling proxy handler builds the outgoing request URL from the static profiling_intake endpoint and ignores parts.uri.query(). Profiling clients pass service, environment, and other metadata via the query string when posting to /profiling/v1/input; dropping these parameters means the flusher forwards payloads without required metadata, causing ingestion failures or misattribution. Append the incoming query parameters to ProxyRequest.target_url before sending to the proxy flusher.
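For illustration, a minimal sketch of the suggested change, reusing the names from the excerpt above (hypothetical; per the reply below, the PR instead keeps parity with the lambda extension):

```rust
// Hypothetical sketch: carry the incoming query string over to the intake URL.
let mut target_url = config.profiling_intake.url.to_string();
if let Some(query) = parts.uri.query() {
    target_url.push('?');
    target_url.push_str(query);
}
let proxy_request = ProxyRequest {
    headers: parts.headers,
    body: body_bytes,
    target_url,
};
```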
The lambda extension doesn't preserve query parameters: https://github.com/DataDog/datadog-lambda-extension/blob/main/bottlecap/src/traces/proxy_flusher.rs#L119
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
```rust
/// Handles incoming proxy requests for profiling - can be abstracted into a generic proxy handler for other proxy requests in the future
async fn profiling_proxy_handler(
    config: Arc<config::Config>,
    request: hyper_migration::HttpRequest,
    proxy_tx: Sender<ProxyRequest>,
) -> http::Result<hyper_migration::HttpResponse> {
    debug!("Trace Agent | Received profiling request");

    // Extract headers and body
    let (parts, body) = request.into_parts();

    let body_bytes = match body.collect().await {
        Ok(collected) => collected.to_bytes(),
        Err(e) => {
            return log_and_create_http_response(
                &format!("Error reading profiling request body: {e}"),
                StatusCode::BAD_REQUEST,
            );
```
Enforce profiling payload size before buffering
The new profiling proxy endpoint reads the entire request body into memory (body.collect().await) without validating the Content-Length. The trace and stats handlers call verify_request_content_length to reject payloads larger than Config::max_request_content_length before allocation, but the profiling handler omits this guard. A client can therefore POST an arbitrarily large profiling payload and the mini agent will buffer it in RAM before forwarding, allowing a single request to exhaust memory and crash the function. Consider applying the same content length check as the other endpoints before collecting the body.
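For illustration, a hedged sketch of such a guard written inline (the exact signature of verify_request_content_length isn't quoted in this thread, so this checks the header directly; max_request_content_length follows the comment above):

```rust
// Hypothetical sketch: reject oversized payloads before buffering the body in RAM.
let content_length = parts
    .headers
    .get(http::header::CONTENT_LENGTH)
    .and_then(|v| v.to_str().ok())
    .and_then(|v| v.parse::<usize>().ok());
if content_length.map_or(true, |len| len > config.max_request_content_length) {
    return log_and_create_http_response(
        "Rejecting profiling request: missing or oversized Content-Length",
        StatusCode::PAYLOAD_TOO_LARGE,
    );
}
```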
The lambda extension also doesn't validate the content-length, probably because profiling payloads are typically much larger and they're coming from the Datadog profiler.
duncanpharvey left a comment
Looks good! Just a few minor comments.
Let's make sure to test this in all hosting plans and operating systems prior to releasing. I've seen issues in the past with TLS certificates. See DataDog/libdatadog#523
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Force-pushed from 7671d2d to 0663d2c
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
```rust
let profiling_intake_url = format!("https://intake.profile.{}/api/v2/profile", dd_site);

// DD_APM_DD_URL env var will primarily be used for integration tests
// overrides the entire trace/trace stats intake url prefix
if let Ok(endpoint_prefix) = env::var("DD_APM_DD_URL") {
```
Honor DD_APM_DD_URL for profiling intake
When a custom APM endpoint is configured via DD_APM_DD_URL (used in integration tests and deployments without direct internet access), both trace and trace-stats intakes are redirected to that prefix, but the new profiling intake stays hardcoded to https://intake.profile.{dd_site}/api/v2/profile (lines 125-132). Profiling payloads will therefore bypass the override and attempt to reach the public intake, leading to failed requests in restricted networks or data being sent to the wrong backend. The profiling endpoint should apply the same DD_APM_DD_URL override as the other intakes.
It looks like the Go Agent uses DD_APM_PROFILING_DD_URL for this: https://github.com/DataDog/datadog-agent/blob/f166f77aad04d382e1e5dceb183c7548ba73f496/pkg/config/setup/apm.go#L141
Fixed in 9576d47 to respect the DD_APM_PROFILING_DD_URL override!
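For illustration, a rough sketch of that override, assuming the env var is treated as a URL prefix (per the commit note below) and mirroring the DD_APM_DD_URL block in the quoted excerpt:

```rust
// Hypothetical sketch: DD_APM_PROFILING_DD_URL overrides the profiling intake
// URL prefix, mirroring how DD_APM_DD_URL overrides the trace/stats intakes.
let mut profiling_intake_url =
    format!("https://intake.profile.{}/api/v2/profile", dd_site);
if let Ok(endpoint_prefix) = env::var("DD_APM_PROFILING_DD_URL") {
    profiling_intake_url = format!("{endpoint_prefix}/api/v2/profile");
}
```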
… should be the prefix of the url
Force-pushed from 7d47f2e to a5d3ca6
Force-pushed from a5d3ca6 to b749b9d
Duncan has reviewed it twice so I'll ship this now! I'll chat with him about the serverless scheduler in the tracers when he's back.
What does this PR do?
Profiling payloads received by the mini agent's new proxy endpoint are sent to an mpsc channel, where the proxy flusher receives them and forwards them to Datadog, handling retries and exponential backoffs. Profiles are tagged with _dd.origin:azurefunction and functionname:<name-of-function>.
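For illustration, a minimal sketch of that handoff, reusing names from the handler excerpt above and assuming proxy_tx is a tokio::sync::mpsc Sender<ProxyRequest> (the error handling is illustrative, not the PR's exact code):

```rust
// Illustrative sketch: hand the proxied profiling request to the flusher,
// which forwards it to Datadog with retries and exponential backoff.
if let Err(e) = proxy_tx.send(proxy_request).await {
    return log_and_create_http_response(
        &format!("Error sending profiling request to proxy flusher: {e}"),
        StatusCode::INTERNAL_SERVER_ERROR,
    );
}
```

Motivation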
Profiling is not yet supported in Azure Functions.
Additional Notes
Tested across all hosting plans. This works for Python and Node.js.
Describe how to test/QA your changes
To test this, follow the Serverless Compatibility Layer instructions to deploy function apps across runtimes and hosting plans with the built binary. Add DD_PROFILING_ENABLED=true in Azure app settings. Generate load on the function app and check Datadog to see profiles show up, and click into the profile > Runtime Info to see the added tags!

Ticket for handling profile payloads: SVLS-7982
Ticket for adding tags to profiles: SVLS-7983
FRSLES-759
Follow-up task: talk with APM Serverless about changes that will need to be made in their tracer code. SVLS-7841