-
Notifications
You must be signed in to change notification settings - Fork 1
[SVLS-7018] Support profiling in Azure Functions #40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
b9e66d5
49fa6d3
39eab7d
9c1f4e9
156b8ec
3870258
7dbb84a
5744931
15ddf9f
c5d51af
5eeefb3
f1ec0b8
a17218b
9818997
c23f3eb
a678f23
082e21c
bf61041
6049999
66d97a8
d846870
299488d
b58e2d7
e5fec36
92655f4
8fcf166
9330986
9c17914
ebf4733
0f671f7
4e2a124
b749b9d
b33a25b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Large diffs are not rendered by default.
Large diffs are not rendered by default.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,6 +1,7 @@ | ||
| // Copyright 2023-Present Datadog, Inc. https://www.datadoghq.com/ | ||
| // SPDX-License-Identifier: Apache-2.0 | ||
|
|
||
| use http_body_util::BodyExt; | ||
| use hyper::service::service_fn; | ||
| use hyper::{http, Method, Response, StatusCode}; | ||
| use libdd_common::hyper_migration; | ||
|
|
@@ -12,7 +13,8 @@ use std::time::Instant; | |
| use tokio::sync::mpsc::{self, Receiver, Sender}; | ||
| use tracing::{debug, error}; | ||
|
|
||
| use crate::http_utils::log_and_create_http_response; | ||
| use crate::http_utils::{log_and_create_http_response, verify_request_content_length}; | ||
| use crate::proxy_flusher::{ProxyFlusher, ProxyRequest}; | ||
| use crate::{config, env_verifier, stats_flusher, stats_processor, trace_flusher, trace_processor}; | ||
| use libdd_trace_protobuf::pb; | ||
| use libdd_trace_utils::trace_utils; | ||
|
|
@@ -22,8 +24,10 @@ const MINI_AGENT_PORT: usize = 8126; | |
| const TRACE_ENDPOINT_PATH: &str = "/v0.4/traces"; | ||
| const STATS_ENDPOINT_PATH: &str = "/v0.6/stats"; | ||
| const INFO_ENDPOINT_PATH: &str = "/info"; | ||
| const PROFILING_ENDPOINT_PATH: &str = "/profiling/v1/input"; | ||
| const TRACER_PAYLOAD_CHANNEL_BUFFER_SIZE: usize = 10; | ||
| const STATS_PAYLOAD_CHANNEL_BUFFER_SIZE: usize = 10; | ||
| const PROXY_PAYLOAD_CHANNEL_BUFFER_SIZE: usize = 10; | ||
|
|
||
| pub struct MiniAgent { | ||
| pub config: Arc<config::Config>, | ||
|
|
@@ -32,6 +36,7 @@ pub struct MiniAgent { | |
| pub stats_processor: Arc<dyn stats_processor::StatsProcessor + Send + Sync>, | ||
| pub stats_flusher: Arc<dyn stats_flusher::StatsFlusher + Send + Sync>, | ||
| pub env_verifier: Arc<dyn env_verifier::EnvVerifier + Send + Sync>, | ||
| pub proxy_flusher: Arc<ProxyFlusher>, | ||
| } | ||
|
|
||
| impl MiniAgent { | ||
|
|
@@ -42,7 +47,7 @@ impl MiniAgent { | |
| let mini_agent_metadata = Arc::new( | ||
| self.env_verifier | ||
| .verify_environment( | ||
| self.config.verify_env_timeout, | ||
| self.config.verify_env_timeout_ms, | ||
| &self.config.env_type, | ||
| &self.config.os, | ||
| ) | ||
|
|
@@ -82,6 +87,17 @@ impl MiniAgent { | |
| .await; | ||
| }); | ||
|
|
||
| // channels to send processed profiling requests to our proxy flusher | ||
| let (proxy_tx, proxy_rx): (Sender<ProxyRequest>, Receiver<ProxyRequest>) = | ||
| mpsc::channel(PROXY_PAYLOAD_CHANNEL_BUFFER_SIZE); | ||
|
|
||
| // start our proxy flusher for profiling requests | ||
| let proxy_flusher = self.proxy_flusher.clone(); | ||
| tokio::spawn(async move { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. do we have ways to gracefully shut down this flusher? do we need them?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right now we don't, but that's a good point, if this background task is still completing a request when the function handler returns, that could result in data loss. I looked at my notes from when I initially wrote this code and I had this from talking to Darcy: For Azure Functions there are ~20s after invocation before it starts to freeze. If flush period < 20s then chance of data loss is low I also have this from Azure support with some information about the lifecycle of Azure functions (TLDR they didn't have a documented answer):
Looking at this log, it seems to be able to send the request in <200ms consistently so chances of data loss seem fairly low?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. fair point. i guess if we don't need it here, we can leave this as it is. if we ever extract this out into a shared component that would be useful in environments that do have some sort of shutdown detection, we can add something then. |
||
| let proxy_flusher = proxy_flusher.clone(); | ||
| proxy_flusher.start_proxy_flusher(proxy_rx).await; | ||
| }); | ||
|
|
||
| // setup our hyper http server, where the endpoint_handler handles incoming requests | ||
| let trace_processor = self.trace_processor.clone(); | ||
| let stats_processor = self.stats_processor.clone(); | ||
|
|
@@ -96,14 +112,17 @@ impl MiniAgent { | |
| let endpoint_config = endpoint_config.clone(); | ||
| let mini_agent_metadata = Arc::clone(&mini_agent_metadata); | ||
|
|
||
| let proxy_tx = proxy_tx.clone(); | ||
|
|
||
| MiniAgent::trace_endpoint_handler( | ||
| endpoint_config, | ||
| req.map(hyper_migration::Body::incoming), | ||
| trace_processor, | ||
| trace_tx, | ||
| stats_processor, | ||
| stats_tx, | ||
| mini_agent_metadata, | ||
| trace_processor.clone(), | ||
| trace_tx.clone(), | ||
| stats_processor.clone(), | ||
| stats_tx.clone(), | ||
| Arc::clone(&mini_agent_metadata), | ||
| proxy_tx.clone(), | ||
| ) | ||
| }); | ||
|
|
||
|
|
@@ -167,6 +186,7 @@ impl MiniAgent { | |
| stats_processor: Arc<dyn stats_processor::StatsProcessor + Send + Sync>, | ||
| stats_tx: Sender<pb::ClientStatsPayload>, | ||
| mini_agent_metadata: Arc<trace_utils::MiniAgentMetadata>, | ||
| proxy_tx: Sender<ProxyRequest>, | ||
| ) -> http::Result<hyper_migration::HttpResponse> { | ||
| match (req.method(), req.uri().path()) { | ||
| (&Method::PUT | &Method::POST, TRACE_ENDPOINT_PATH) => { | ||
|
|
@@ -190,6 +210,15 @@ impl MiniAgent { | |
| ), | ||
| } | ||
| } | ||
| (&Method::POST, PROFILING_ENDPOINT_PATH) => { | ||
| match Self::profiling_proxy_handler(config, req, proxy_tx).await { | ||
| Ok(res) => Ok(res), | ||
| Err(err) => log_and_create_http_response( | ||
| &format!("Error processing profiling request: {err}"), | ||
| StatusCode::INTERNAL_SERVER_ERROR, | ||
| ), | ||
| } | ||
| } | ||
| (_, INFO_ENDPOINT_PATH) => match Self::info_handler(config.dd_dogstatsd_port) { | ||
| Ok(res) => Ok(res), | ||
| Err(err) => log_and_create_http_response( | ||
|
|
@@ -205,13 +234,67 @@ impl MiniAgent { | |
| } | ||
| } | ||
|
|
||
| /// Handles incoming proxy requests for profiling - can be abstracted into a generic proxy handler for other proxy requests in the future | ||
| async fn profiling_proxy_handler( | ||
| config: Arc<config::Config>, | ||
| request: hyper_migration::HttpRequest, | ||
| proxy_tx: Sender<ProxyRequest>, | ||
| ) -> http::Result<hyper_migration::HttpResponse> { | ||
| debug!("Trace Agent | Received profiling request"); | ||
|
|
||
| // Extract headers and body | ||
| let (parts, body) = request.into_parts(); | ||
| if let Some(response) = verify_request_content_length( | ||
| &parts.headers, | ||
| config.max_request_content_length, | ||
| "Error processing profiling request", | ||
| ) { | ||
| return response; | ||
| } | ||
|
|
||
| let body_bytes = match body.collect().await { | ||
| Ok(collected) => collected.to_bytes(), | ||
| Err(e) => { | ||
| return log_and_create_http_response( | ||
| &format!("Error reading profiling request body: {e}"), | ||
| StatusCode::BAD_REQUEST, | ||
| ); | ||
|
Comment on lines
+237
to
+261
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The new profiling proxy endpoint reads the entire request body into memory ( Useful? React with 👍 / 👎.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The lambda extension also doesn't validate the content-length, probably because profiling payloads are typically much larger and they're coming from the Datadog profiler |
||
| } | ||
| }; | ||
|
|
||
| // Create proxy request | ||
| let proxy_request = ProxyRequest { | ||
| headers: parts.headers, | ||
| body: body_bytes, | ||
| target_url: config.profiling_intake.url.to_string(), | ||
|
kathiehuang marked this conversation as resolved.
|
||
| }; | ||
|
Comment on lines
+245
to
+270
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The profiling proxy handler builds the outgoing request URL from the static Useful? React with 👍 / 👎.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The lambda extension doesn't preserve query parameters: https://github.com/DataDog/datadog-lambda-extension/blob/main/bottlecap/src/traces/proxy_flusher.rs#L119 |
||
|
|
||
| debug!( | ||
| "Trace Agent | Sending profiling request to channel, target: {}", | ||
| proxy_request.target_url | ||
| ); | ||
|
|
||
| // Send to channel | ||
| match proxy_tx.send(proxy_request).await { | ||
| Ok(_) => log_and_create_http_response( | ||
| "Successfully buffered profiling request to be flushed", | ||
| StatusCode::OK, | ||
| ), | ||
| Err(err) => log_and_create_http_response( | ||
| &format!("Error sending profiling request to the proxy flusher: {err}"), | ||
| StatusCode::INTERNAL_SERVER_ERROR, | ||
| ), | ||
| } | ||
| } | ||
|
|
||
| fn info_handler(dd_dogstatsd_port: u16) -> http::Result<hyper_migration::HttpResponse> { | ||
| let response_json = json!( | ||
| { | ||
| "endpoints": [ | ||
| TRACE_ENDPOINT_PATH, | ||
| STATS_ENDPOINT_PATH, | ||
| INFO_ENDPOINT_PATH | ||
| INFO_ENDPOINT_PATH, | ||
| PROFILING_ENDPOINT_PATH | ||
| ], | ||
| "client_drop_p0s": true, | ||
| "config": { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is
DD_APM_PROFILING_DD_URLa standard config name? if not, can we add prefix somewhere in there to make it clear that it's the prefix (presumably including the scheme, host, and port?)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
datadog-agent uses
DD_APM_PROFILING_DD_URL: https://github.com/DataDog/datadog-agent/blob/main/pkg/config/setup/apm.go#L141But good point, I made the comment clearer!