Conversation
hush-hush
added a commit
that referenced
this pull request
Apr 17, 2019
Update README and return type
safchain
added a commit
to safchain/datadog-agent
that referenced
this pull request
May 26, 2020
Fix syscall prefix on old kernels
L3n41c
added a commit
that referenced
this pull request
Apr 26, 2021
Fix the following DCA error:
```
2021-04-26 15:23:32 UTC | CLUSTER | ERROR | (pkg/clusteragent/externalmetrics/datadogmetric_controller.go:170 in process) | Impossible to synchronize DatadogMetric (attempt #3): datadog-agent-helm/dcaautogen-afda243730e31f83713b52b84d9d2bac9b6d1f, err: Unable to create DatadogMetric: datadog-agent-helm/dcaautogen-afda243730e31f83713b52b84d9d2bac9b6d1f, err: datadogmetric.datadoghq.com "dcaautogen-afda243730e31f83713b52b84d9d2bac9b6d1f" is invalid: kind: Invalid value: "datadogmetric": must be DatadogMetric
```
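The error boils down to case sensitivity of the `kind` field: the apiserver validates that a submitted custom resource's `kind` matches the CRD's declared kind exactly, so the lowercase `"datadogmetric"` is rejected. A minimal standalone sketch of that check (hypothetical names, no client-go dependency):

```go
package main

import "fmt"

// expectedKind is the kind declared by the DatadogMetric CRD.
const expectedKind = "DatadogMetric"

// validateKind mirrors the apiserver check behind the error above: the
// kind on the submitted object must match the CRD's kind exactly,
// including case.
func validateKind(kind string) error {
	if kind != expectedKind {
		return fmt.Errorf("kind: Invalid value: %q: must be %s", kind, expectedKind)
	}
	return nil
}

func main() {
	fmt.Println(validateKind("datadogmetric")) // rejected: wrong case
	fmt.Println(validateKind("DatadogMetric")) // accepted
}
```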
celenechang
pushed a commit
that referenced
this pull request
Apr 29, 2021
Fix the following DCA error: `kind: Invalid value: "datadogmetric": must be DatadogMetric`
celenechang
pushed a commit
that referenced
this pull request
Apr 30, 2021
Fix the following DCA error: `kind: Invalid value: "datadogmetric": must be DatadogMetric`
celenechang
pushed a commit
that referenced
this pull request
May 5, 2021
Fix the following DCA error: `kind: Invalid value: "datadogmetric": must be DatadogMetric`
iglendd
added a commit
that referenced
this pull request
Mar 18, 2022
Added reference to the upgraded gohai commit (#3)
chouetz
pushed a commit
that referenced
this pull request
Apr 23, 2024
CelianR
added a commit
that referenced
this pull request
May 6, 2024
rdesgroppes
added a commit
that referenced
this pull request
Jul 9, 2025
First, the fallback expression `|| EXIT_CODE=$?` was evaluated in a subshell, which caused it to be swallowed from the perspective of the parent shell. The fix simply consists of moving it to the parent shell.

In addition, it turns out that `notarytool submit [...] --wait` doesn't return a nonzero exit code on `Invalid` statuses, which might be by design (or might be fixed in a later Xcode version). The present change therefore switches to failing fast(er) by first enabling [recommended shell flags](https://vaneyckt.io/posts/safer_bash_scripts_with_set_euxo_pipefail), and then verifying that `notarytool log`'s resulting status is exactly `Accepted`, as per https://keith.github.io/xcode-man-pages/notarytool.1.html#wait.

For instance, should `notarytool log`'s resulting status be `Invalid`, the `agent_dmg-x64-a7` job would then terminate as follows:
```
+ '[' Invalid = Accepted ']'
Attempt #3 failed with error code 1
Running after_script
Cleaning up project directory and file based variables
ERROR: Job failed: exit status 1
```
Conversely, for an `Accepted` resulting status:
```
+ '[' Accepted = Accepted ']'
+ exit 0
\e[0K Built signed package
$ $S3_CP_CMD $OMNIBUS_PACKAGE_DIR/version-manifest.json $S3_SBOM_STORAGE_URI/$CI_JOB_NAME/version-manifest.json
upload: omnibus/pkg/version-manifest.json to s3://sbom-root-us1-ddbuild-io/datadog-agent/99999999/agent_dmg-x64-a7/version-manifest.json
Running after_script
Saving cache for successful job
Uploading artifacts for successful job
Cleaning up project directory and file based variables
Job succeeded
```
aiuto
added a commit
that referenced
this pull request
Sep 17, 2025
[build] Fork rules_multitool to our own extension.

Import the core parts of rules_multitool, with appropriate modifications for our needs:
- Re-root things so this is a local extension, rather than a distinct module.
- Remove the ability to use .netrc for authentication.
- Remove the :cwd and :workspace_root variations.
  - There are equivalent workarounds using `--run_under="cd <path> &&"`.
  - Using `bazel run` is usually not the best practice. Rules should use `$(location <tool>)`. If a tool is so common that people need to call it anywhere, at any time, then it should be in their path.
  - The code is left commented out, ready to be enabled if a good case is presented.
- Remove the support for WORKSPACE.

Next steps:
- Change the download structure so the executable name is the tool, rather than `"executable"`.
- Stop passing the attributes of each binary around as JSON blobs to be encoded and decoded.
- Add a `:path` feature that will print the full execution path.

Follow-up commits in this squash update bazel/multitool/extension.bzl and bazel/multitool/private/templates.bzl.
Co-authored-by: Joseph Gette <jgettepost@gmail.com>
aiuto
added a commit
that referenced
this pull request
Sep 17, 2025
[build] Fork rules_multitool to our own extension.
Co-authored-by: Joseph Gette <jgettepost@gmail.com>
dd-mergequeue Bot
pushed a commit
that referenced
this pull request
Dec 8, 2025
### What does this PR do?
This syncs additional code from the original PAR into the datadog agent (all the kubernetes bundles).
### Motivation
### Describe how you validated your changes
Code is not called yet.
### Additional Notes
Previous PR: #43595
More PRs to come.
Co-authored-by: gabriel.plassard <gabriel.plassard@datadoghq.com>
aiuto
added a commit
that referenced
this pull request
Jan 13, 2026
Keep our own copy of cacert.pem.
- Replace the omnibus fetch from upstream with that static copy.
- Include text in the BUILD file about how we check for new upstream versions.
- Add an explanation of why we have this.
https://datadoghq.atlassian.net/browse/ABLD-169
Follow-up commits in this squash switch to a plain copy on Windows.
aiuto
added a commit
that referenced
this pull request
Jan 13, 2026
Keep our own copy of cacert.pem, with follow-up fixups: use a plain copy on Windows, drop unneeded pkg_install targets, and update the cert to 2025-09-09.
https://datadoghq.atlassian.net/browse/ABLD-169
gh-worker-dd-mergequeue-cf854d Bot
pushed a commit
that referenced
this pull request
Jan 27, 2026
### What does this PR do?
Skip the SSH session patcher and add a test to illustrate the current issue.
In addition, adds the possibility to check specific fields in the json returned for ssh_session events.
### Motivation
The retry mechanism could cause the agent to send no more than one event per minute if an SSH session was not properly resolved.
Previously, the event was not sent and the agent would wait one minute before sending it with the `unknown` type. However, this `authtype` would never be resolved because the session was initialized before the agent started processing events. As a result, every subsequent SSH event would wait one minute for nothing, causing a significant delay in agent events, potentially blocking all the other events.
### Describe how you validated your changes
Added a test that illustrates the issue: `TestSSHUserSessionBlocking`.
With this change, the ssh_session event is now sent immediately, with `authtype` set to `unknown`.
Error without commenting out the patcher:
```
Error: Received unexpected error:
All attempts fail:
#1: not found
#2: not found
#3: not found
#4: not found
#5: not found
#6: not found
#7: not found
#8: not found
#9: not found
#10: not found
#11: not found
#12: not found
#13: not found
#14: not found
#15: not found
#16: not found
#17: not found
#18: not found
#19: not found
#20: not found
#21: not found
#22: not found
#23: not found
#24: not found
#25: not found
#26: not found
#27: not found
#28: not found
#29: not found
#30: not found
Test: TestSSHUserSessionBlocking/second_ssh_no_auth
```
Co-authored-by: theo.putegnat <theo.putegnat@datadoghq.com>
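The head-of-line blocking described above can be sketched with a toy model (names and delays are hypothetical, not the agent's actual API): under the old behavior, every event with an unresolvable auth type waits out its retry delay and stalls everything queued behind it, while the fix sends such events immediately as `unknown`.

```go
package main

import (
	"fmt"
	"time"
)

// event models an SSH event whose authType may never resolve because the
// session predates the agent.
type event struct{ authType string }

const retryDelay = 10 * time.Millisecond // stand-in for the agent's 1 minute

// drainOld models the old behavior: wait for each unresolved event before
// sending it as "unknown"; the wait never helps and delays the whole queue.
func drainOld(queue []event) time.Duration {
	start := time.Now()
	for _, e := range queue {
		if e.authType == "" {
			time.Sleep(retryDelay) // hope it resolves; it never does
		}
		_ = e // send with resolved authType or "unknown"
	}
	return time.Since(start)
}

// drainNew models the fix: send unresolved events right away as "unknown".
func drainNew(queue []event) time.Duration {
	start := time.Now()
	for _, e := range queue {
		if e.authType == "" {
			e.authType = "unknown"
		}
		_ = e // send immediately
	}
	return time.Since(start)
}

func main() {
	queue := make([]event, 5) // five unresolved SSH events
	fmt.Println("old path blocked:", drainOld(queue) >= 5*retryDelay)
	fmt.Println("new path immediate:", drainNew(queue) < retryDelay)
}
```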
dd-octo-sts Bot
added a commit
that referenced
this pull request
Jan 27, 2026
Skip the SSH session patcher and add a test to illustrate the current issue. In addition, adds the possibility to check specific fields in the json returned for ssh_session events.
(cherry picked from commit 40d1f09)
Co-authored-by: Théo Putegnat <theo.putegnat@datadoghq.com>
dd-octo-sts Bot
added a commit
that referenced
this pull request
Jan 27, 2026
Skip the SSH session patcher and add a test to illustrate the current issue. In addition, adds the possibility to check specific fields in the json returned for ssh_session events.
(cherry picked from commit 40d1f09)
Co-authored-by: Théo Putegnat <theo.putegnat@datadoghq.com>
gh-worker-dd-mergequeue-cf854d Bot
pushed a commit
that referenced
this pull request
Jan 28, 2026
Backport 40d1f09 from #45437: skip the SSH session patcher and add a test to illustrate the current issue.
Co-authored-by: axel.vonengel <axel.vonengel@datadoghq.com>
gh-worker-dd-mergequeue-cf854d Bot
pushed a commit
that referenced
this pull request
Jan 28, 2026
Backport 40d1f09 from #45437: skip the SSH session patcher and add a test to illustrate the current issue.
Co-authored-by: YoannGh <yoann.ghigoff@datadoghq.com>
Co-authored-by: florent.clarret <florent.clarret@datadoghq.com>
theomagellan
pushed a commit
that referenced
this pull request
Feb 2, 2026
### What does this PR do?
Skip the SSH session patcher and add a test to illustrate the current issue. In addition, adds the possibility to check specific fields in the json returned for ssh_session events.
Co-authored-by: theo.putegnat <theo.putegnat@datadoghq.com>
wynbennett
added a commit
that referenced
this pull request
Feb 23, 2026
Summary of Changes

HIGH priority issues fixed:
- #1: Write lock held across network I/O (impl/delegatedauth.go:270). Refactored refreshAndGetAPIKey to release the lock before making network calls (authenticate); the lock is now only held briefly to check/update state, not during network I/O.
- #2: Context not propagated to signer.SignHTTP (aws.go:195). Updated generateAwsAuthData to accept a context parameter and changed signer.SignHTTP(context.Background(), ...) to signer.SignHTTP(ctx, ...).
- #3: Context not propagated to getCredentials IMDS call (aws.go:119). Updated getCredentials to accept a context parameter; removed ctx := context.Background() and now uses the passed context for IMDS calls.

MEDIUM priority issues fixed:
- #4: No response body size limit (api/delegated_auth.go:97). Added a maxResponseBodySize = 1 * 1024 * 1024 constant (1 MB) and wrapped the response body with io.LimitReader to prevent memory exhaustion.
- #5: No overall HTTP client timeout (api/delegated_auth.go:82). Added an httpClientTimeout = 30 * time.Second constant and set Timeout: httpClientTimeout on the HTTP client.
- #6: config.Set called while holding write lock (impl/delegatedauth.go:341). Moved the updateConfigWithAPIKey call outside the lock in startBackgroundRefresh: the API key is captured while holding the lock, which is then released before calling config.Set.
- #7: Blocking IMDS calls while holding write lock (impl/delegatedauth.go:127). Refactored initializeIfNeeded to perform cloud detection without holding locks; IMDS calls now happen outside any lock, then state is updated under a brief write lock.
- #8: Regex fails silently for non-standard formats (api/delegated_auth.go:36). Added a debug log when the endpoint doesn't match the known Datadog domain pattern, and updated the function documentation to clarify the behavior.
- #9: Uncached IMDS credential fetch (aws.go:104). Added documentation explaining the trade-off (the refresh interval is typically 60 minutes, so caching is not critical).
- #10: Auth proof format undocumented (aws.go:98). Added a detailed comment documenting the auth proof format: <base64-body>|<base64-headers>|<method>|<base64-url>.

LOW priority issues fixed:
- #11: Unnecessarily exported types (aws.go). Changed SigningData to signingData and AWSAuth.AwsRegion to AWSAuth.region (unexported); updated all references in aws.go and aws_test.go.
- #12: Tests exercise a copy of the goroutine (impl/delegatedauth_test.go:19). Added documentation explaining why the tests use a simplified goroutine pattern, and clarified that integration tests cover the actual startBackgroundRefresh function.
- #13: Subsequent Config param silently ignored (def/delegatedauth.go:24). Updated the documentation to clearly state that only the first Config is used, and added a warning log when a different Config is passed on subsequent calls.
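The pattern behind fixes #1, #6, and #7 above can be sketched in a few lines (hypothetical names, not the package's actual API): hold the mutex only to read or publish state, and do the slow network call with the lock released.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// keyCache caches an API key fetched over the network.
type keyCache struct {
	mu  sync.Mutex
	key string
}

// slowAuthenticate stands in for the real network call.
func slowAuthenticate() string {
	time.Sleep(10 * time.Millisecond)
	return "fresh-api-key"
}

func (c *keyCache) refreshAndGet() string {
	// 1) Briefly lock to check current state.
	c.mu.Lock()
	if c.key != "" {
		k := c.key
		c.mu.Unlock()
		return k
	}
	c.mu.Unlock()

	// 2) Network I/O happens with the lock released, so other
	//    callers are not blocked behind a slow authenticate call.
	k := slowAuthenticate()

	// 3) Briefly lock again to publish the result.
	c.mu.Lock()
	c.key = k
	c.mu.Unlock()
	return k
}

func main() {
	var c keyCache
	fmt.Println(c.refreshAndGet()) // fresh-api-key
	fmt.Println(c.refreshAndGet()) // cached: fresh-api-key
}
```

Note that in this sketch two concurrent first callers may both authenticate; real code typically deduplicates the in-flight call (e.g. with golang.org/x/sync/singleflight) on top of this locking discipline.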
misteriaud
added a commit
that referenced
this pull request
Apr 16, 2026
…ctions

Replace the monolithic batcher (5 ring buffers sharing one transport) with a generic pipeline[T] struct. Each pipeline owns its own ring buffers, flush goroutines, and dedicated UDS connection:
- metricsPipeline = pipeline[metricPoint] + unixConn #1
- logsPipeline = pipeline[logEntry] + unixConn #2
- tracePipeline = pipeline[capturedTraceStat] + unixConn #3

Pipelines are fully independent: one slow pipeline (e.g. logs sending large frames) cannot block or starve another.

Key changes:
- pipeline[T]: generic struct with AddEntry(T), AddContextDef, Stop. 1-2 flush goroutines per pipeline (entries + optional contexts). flushChunked reused unchanged.
- unixConn: simple per-connection transport replacing pooledTransport. Lazy dial, mutex held during Send, reconnect once on error.
- activate(): creates 3 pipelines with 3 independent connections. sync.Once coordinates teardown when any transport disconnects.

Testbench results (all 0 drops):
- dogstatsd-p99: 2.8M metrics sent
- logs-high-throughput: 40M logs at 10 MiB/s, 3 GB Parquet
- metrics-logs-combined: 743K metrics + 1M logs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
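The pipeline[T] shape described above can be sketched minimally; this is a simplification under stated assumptions (a buffered channel stands in for the ring buffers, and a fake transport's Send stands in for the per-pipeline UDS connection):

```go
package main

import "fmt"

// conn models the dedicated per-pipeline transport (stand-in for unixConn).
type conn struct{ name string }

func (c *conn) Send(b []byte) { /* write to this pipeline's own socket */ }

// pipeline owns its buffer, flush goroutine, and transport, so one slow
// pipeline cannot starve another.
type pipeline[T any] struct {
	entries chan T
	tr      *conn
	done    chan struct{}
}

func newPipeline[T any](tr *conn, encode func(T) []byte) *pipeline[T] {
	p := &pipeline[T]{entries: make(chan T, 1024), tr: tr, done: make(chan struct{})}
	go func() { // flush goroutine: owned by this pipeline alone
		for e := range p.entries {
			p.tr.Send(encode(e))
		}
		close(p.done)
	}()
	return p
}

func (p *pipeline[T]) AddEntry(e T) { p.entries <- e }
func (p *pipeline[T]) Stop()        { close(p.entries); <-p.done }

type metricPoint struct{ name string }
type logEntry struct{ msg string }

func main() {
	// Each pipeline gets its own transport; a slow logs pipeline
	// cannot block metrics.
	metrics := newPipeline(&conn{"uds-1"}, func(m metricPoint) []byte { return []byte(m.name) })
	logs := newPipeline(&conn{"uds-2"}, func(l logEntry) []byte { return []byte(l.msg) })
	metrics.AddEntry(metricPoint{"cpu.user"})
	logs.AddEntry(logEntry{"hello"})
	metrics.Stop()
	logs.Stop()
	fmt.Println("ok")
}
```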
This was referenced Apr 24, 2026
Inspired by kubernetes and etcd, this keeps executables and libraries in different packages.
Added shell scripts to build the binaries and run tests.
Massive cleanup of unused resources.