[SVLS-6337] Shut down trace agent if resource group can't be determined#39
kathiehuang merged 6 commits into main
Conversation
@codex review

Codex Review: Didn't find any major issues. More of your lovely PRs please.
@codex review

Codex Review: Didn't find any major issues. Keep it up!

// Check for Azure Flex Consumption plan without DD_AZURE_RESOURCE_GROUP
if is_azure_flex_without_resource_group() {
    error!(
what log level does this use? what amount of logging would it cause?
I was following the pattern of how other parts of this file handle errors, but I'm a little confused because in the Azure logs the error is being mapped to a severityLevel of 1, while the Azure docs say errors should be severity level 3.

As for the amount of logging, it seems to log this error whenever the Azure Function app instance starts (a couple of logs after "Initializing Warm up Extension"). But the trace agent seems to keep trying to restart, so a lot of logs like these get output:

I'm a little bit confused but happy to call about this!
What does this PR do?
- Adds a check to see if we are in an Azure function that is on the flex consumption plan and doesn't have the DD_AZURE_RESOURCE_GROUP env var set. If so, shut down the trace agent.
- […] libdatadog Azure metadata detection logic to check for the env var similarly

Motivation
- The aas.resource.group span attribute for functions on flex consumption plans is set incorrectly in Datadog - they're all set to "flex"
- aas.resource.id is built using aas.resource.group, and the resource id is used in billing, which needs to be accurate
- […] DD_AZURE_RESOURCE_GROUP env var. Rather than handling that in libdatadog, we decided to do it at a higher level in the trace agent to inform the customer of the error while shutting down the trace agent gracefully and preventing any traces from being sent to Datadog

Jira Ticket
Describe how to test/QA your changes
- […] everywhere that libdatadog is used to the most recent commit hash of the PR in libdatadog (currently d1b35ef21fff3c4588073504905081c8923bbc4b)
- […] use_serverless_compat_local_path to true and making sure the built binary is in your python folder
- […] DD_AZURE_RESOURCE_GROUP env var, you should see no traces. Check the logs in Azure Portal - you should see an error log with the message "ERROR: Resource group not found. If you are using Azure Functions on Flex Consumption plan, please add your resource group name as an environment variable called DD_AZURE_RESOURCE_GROUP in Azure app settings."
- […] DD_AZURE_RESOURCE_GROUP as an environment variable with your resource group. Repeat step 4, you should see the correct resource group in the resource.group span attribute!

Expected error without DD_AZURE_RESOURCE_GROUP, no traces sent to DD:

With DD_AZURE_RESOURCE_GROUP - traces sent to DD with the correct resource group/id.
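
For readers who want to see the shape of the check described above, here is a minimal sketch, not the code from this PR. The function names, the WEBSITE_SKU variable with the value "FlexConsumption" as the signal for the flex consumption plan, and the choice to treat a blank DD_AZURE_RESOURCE_GROUP as missing are all assumptions made for illustration; only DD_AZURE_RESOURCE_GROUP itself comes from the description.

```rust
// Assumed value that marks a Flex Consumption plan (not confirmed by the PR).
const FLEX_SKU: &str = "FlexConsumption";

/// Hypothetical helper: returns true when the function app looks like it is
/// on the flex consumption plan and no usable resource group was provided.
/// Taking the values as arguments keeps the decision easy to unit test.
fn flex_plan_missing_resource_group(
    website_sku: Option<&str>,
    resource_group: Option<&str>,
) -> bool {
    let on_flex = website_sku == Some(FLEX_SKU);
    // Assumption: an empty or whitespace-only value counts as "not set".
    let has_group = resource_group.map_or(false, |g| !g.trim().is_empty());
    on_flex && !has_group
}

/// Hypothetical wrapper that reads the process environment.
/// WEBSITE_SKU is an assumed variable name; DD_AZURE_RESOURCE_GROUP is the
/// variable named in the PR description.
fn flex_plan_missing_resource_group_from_env() -> bool {
    let sku = std::env::var("WEBSITE_SKU").ok();
    let group = std::env::var("DD_AZURE_RESOURCE_GROUP").ok();
    flex_plan_missing_resource_group(sku.as_deref(), group.as_deref())
}
```

A caller would run the environment-reading wrapper once at startup and, if it returns true, log the error and shut down before any traces are sent, which is the behavior the description asks for.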