[core] Config loader: first draft #4

Merged
masci merged 1 commit into master from massi/loader_first_draft on Jun 21, 2016

Conversation

masci (Contributor) commented Jun 13, 2016

First draft of the loader system for checks' configurations.
The agent main module was adapted to follow the new logic (first search for the configuration, then search for the corresponding checks), but only for the Python checks at the moment.

More details are in the README file.
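
The "configuration first, then checks" order described above could look roughly like the sketch below. It is only an illustration: `CheckConfig`, `ConfigProvider`, `staticProvider`, and `ResolveChecks` are hypothetical names chosen for this example, not the types introduced by this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// CheckConfig is a hypothetical container for one check's configuration.
type CheckConfig struct {
	Name string // check name, used afterwards to look up an implementation
}

// ConfigProvider is a hypothetical interface: anything able to list check configs.
type ConfigProvider interface {
	Collect() ([]CheckConfig, error)
}

// staticProvider is an in-memory stand-in for a file-based provider.
type staticProvider struct{ configs []CheckConfig }

func (p staticProvider) Collect() ([]CheckConfig, error) { return p.configs, nil }

// ResolveChecks first gathers configurations from every provider, then looks
// for a matching check. Names without a known check are reported separately.
func ResolveChecks(providers []ConfigProvider, known map[string]bool) ([]string, []string, error) {
	var runnable, missing []string
	for _, p := range providers {
		configs, err := p.Collect()
		if err != nil {
			return nil, nil, fmt.Errorf("collecting configs: %w", err)
		}
		for _, c := range configs {
			if known[c.Name] {
				runnable = append(runnable, c.Name)
			} else {
				missing = append(missing, c.Name)
			}
		}
	}
	sort.Strings(runnable)
	sort.Strings(missing)
	return runnable, missing, nil
}

func main() {
	providers := []ConfigProvider{
		staticProvider{configs: []CheckConfig{{Name: "disk"}, {Name: "redis"}}},
	}
	// Assumption for the example: only the "disk" check has an implementation.
	known := map[string]bool{"disk": true}

	runnable, missing, err := ResolveChecks(providers, known)
	if err != nil {
		panic(err)
	}
	fmt.Println("runnable:", runnable, "missing:", missing)
}
```

Driving the lookup from the configurations means a check with no configuration is never searched for at all, which is one plausible reason for the ordering.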

masci force-pushed the massi/loader_first_draft branch from 65dc49a to 52f2d52 on June 15, 2016 10:18
Comment thread pkg/loader/file_provider.go Outdated
Member

I like go-logging but we've been using seelog almost everywhere else...

Contributor Author

I will open a PR to switch the logging library.

@olivielpeau
Member

👍 LGTM, don't have any additional comment to make

changed config loading logic
cosmetics
use config provider interface
renamed modules
docs
masci force-pushed the massi/loader_first_draft branch from 85d7386 to f32d3a5 on June 21, 2016 15:22
masci merged commit d4636e9 into master on Jun 21, 2016
masci deleted the massi/loader_first_draft branch on June 21, 2016 15:26
safchain added a commit to safchain/datadog-agent that referenced this pull request May 11, 2020
Add open, rename, rmdir and unlink events
safchain pushed a commit to safchain/datadog-agent that referenced this pull request Jun 4, 2020
@akarpz akarpz mentioned this pull request Oct 26, 2023
10 tasks
dd-mergequeue Bot pushed a commit that referenced this pull request May 6, 2024
* Revert "Revert gitlab-use-module #3 (#25024)"

This reverts commit b98551c.

* [gitlab-use-module] Fixed trigger child pipeline

* [gitlab-use-module] Applied suggestion
CelianR added a commit that referenced this pull request Aug 20, 2025
aiuto added a commit that referenced this pull request Sep 17, 2025
# This is the 1st commit message:

[build] Fork rules_multitool to our own extension.

Import the core parts of rules_multitool but with appropriate modifications for our needs.

- Re-root things so this is a local extension, rather than a distinct module.
- remove the ability to use .netrc for authentication.
- remove the :cwd and :workspace_root variations.
  - There are equivalent workarounds using `--run_under="cd <path> &&"`
  - Using `bazel run` is usually not the best practice. Rules should use `$(location <tool>)`. If a tool is so common that people need to call it anywhere, at any time, then it should be in their path.
  - The code is left commented out. Ready to be enabled if a good case is presented.
- Remove the support for WORKSPACE.

Next steps:
- Change download structure so the executable name is the tool, rather than `"executable"`.
- Stop passing the attributes of each binary around as json blobs to be encoded and decoded.
- Make `:path` feature that will print the full execution path.

# This is the commit message #2:

Update bazel/multitool/extension.bzl

Yeah. That's a better name.

Co-authored-by: Joseph Gette <jgettepost@gmail.com>
# This is the commit message #3:

Update bazel/multitool/extension.bzl

Co-authored-by: Joseph Gette <jgettepost@gmail.com>
# This is the commit message #4:

Update bazel/multitool/private/templates.bzl

Co-authored-by: Joseph Gette <jgettepost@gmail.com>
# This is the commit message #5:

render

# This is the commit message #6:

iff

# This is the commit message #7:

commit

# This is the commit message #8:

lintyfresth
dd-mergequeue Bot pushed a commit that referenced this pull request Oct 28, 2025
#42435)

…2229)"

### What does this PR do?
This reverts commit 57eccdf.

### Motivation
[#incident-44969](https://dd.enterprise.slack.com/archives/C09P8AQQBV0)

### Describe how you validated your changes

### Additional Notes


Co-authored-by: stanley.liu <stanley.liu@datadoghq.com>
songy23 added a commit that referenced this pull request Oct 28, 2025
s-alad pushed a commit that referenced this pull request Nov 21, 2025
initial pass at yaml file secret backend
dd-mergequeue Bot pushed a commit that referenced this pull request Dec 15, 2025
### What does this PR do?

This syncs additional code from the original PAR into the datadog agent (all the gitlab bundles)

### Motivation

We want to integrate the PAR in the agent

### Describe how you validated your changes

The code is not called yet and won't be packaged into any of the existing agent binaries, so the failing quality gate is linked to the broader quality gate incident rather than to this PR.

### Additional Notes

Requires #agent-devx review because of the change to the license file for the GitLab client; this is discussed in our #opensource channel.
Previous PR: #43599
More PRs to come.


Co-authored-by: gabriel.plassard <gabriel.plassard@datadoghq.com>
Ishirui added a commit that referenced this pull request Dec 26, 2025
scottopell added a commit that referenced this pull request Jan 12, 2026
Key optimizations:
- Fix #1: Pass Arc<HashSet> directly to avoid cloning container sets in predicates
- Fix #4: Extract mtime once per file before sorting (avoids O(n log n) syscalls)
- Fix #9: Add refresh staleness check to skip redundant file discovery
- Fix #10: Wrap stats cache in Arc to avoid deep clones on cache hit
- Fix #11: Use sort_unstable for timeseries points (faster, no stability needed)
- Fix #13: Single division instead of two in rate calculation
- Pre-allocate RawContainerData vectors with estimated capacity
- Remove metric_name from projection (only needed for predicate, not output)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
aiuto added a commit that referenced this pull request Jan 13, 2026
# This is the 1st commit message:

Keep our own copy of cacert.pem

- Replace omnibus fetch from upstream with that static copy.
- Include text in the BUILD file about how we check for new upstream versions.
- Add explanation of why we have this.

https://datadoghq.atlassian.net/browse/ABLD-169

# This is the commit message #2:

just use copy for windows

# This is the commit message #3:

qmarks

# This is the commit message #4:

omnibus is to blame

# This is the commit message #5:

maybe

# This is the commit message #6:

add back in default version

# This is the commit message #7:

drop livestream on debug

# This is the commit message #8:

You're kidding, :live_stream?

# This is the commit message #9:

srsly

# This is the commit message #10:

just copy on windows

# This is the commit message #11:

cwd with copy probably does not work

# This is the commit message #12:

just give up on pkg_install for certs

# This is the commit message #13:

drop unneeded pkg_install targets

# This is the commit message #14:

- use cwd to make it a little cleaner
- update cert to 2025-09-09

# This is the commit message #15:

comma

# This is the commit message #16:

Revert use of cwd on copy.
It doesn't matter if it is ugly or not. We are going to delete it this quarter anyway.
gh-worker-dd-mergequeue-cf854d Bot pushed a commit that referenced this pull request Jan 27, 2026
### What does this PR do?

Skip the SSH session patcher and add a test to illustrate the current issue.
In addition, adds the possibility to check specific fields in the json returned for ssh_session events.

### Motivation

The retry mechanism could cause the agent to send no more than one event per minute if an SSH session was not properly resolved.
Previously, the event was not sent and the agent would wait one minute before sending it with the `unknown` type. However, this `authtype` would never be resolved because the session was initialized before the agent started processing events. As a result, every subsequent SSH event would wait one minute for nothing, causing a significant delay in agent events, potentially blocking all the other events.

### Describe how you validated your changes
Added a test that illustrates the issue: `TestSSHUserSessionBlocking`
With this change, the ssh_session event is now sent immediately with `authtype` set to `unknown`.


Error without commenting out the patcher:
```
        	Error:      	Received unexpected error:
        	            	All attempts fail:
        	            	#1: not found
        	            	#2: not found
        	            	#3: not found
        	            	#4: not found
        	            	#5: not found
        	            	#6: not found
        	            	#7: not found
        	            	#8: not found
        	            	#9: not found
        	            	#10: not found
        	            	#11: not found
        	            	#12: not found
        	            	#13: not found
        	            	#14: not found
        	            	#15: not found
        	            	#16: not found
        	            	#17: not found
        	            	#18: not found
        	            	#19: not found
        	            	#20: not found
        	            	#21: not found
        	            	#22: not found
        	            	#23: not found
        	            	#24: not found
        	            	#25: not found
        	            	#26: not found
        	            	#27: not found
        	            	#28: not found
        	            	#29: not found
        	            	#30: not found
        	Test:       	TestSSHUserSessionBlocking/second_ssh_no_auth
```

Co-authored-by: theo.putegnat <theo.putegnat@datadoghq.com>
dd-octo-sts Bot added a commit that referenced this pull request Jan 27, 2026
(cherry picked from commit 40d1f09)

___

Co-authored-by: Théo Putegnat <theo.putegnat@datadoghq.com>
gh-worker-dd-mergequeue-cf854d Bot pushed a commit that referenced this pull request Jan 28, 2026
Backport 40d1f09 from #45437.

 ___

Co-authored-by: axel.vonengel <axel.vonengel@datadoghq.com>
gh-worker-dd-mergequeue-cf854d Bot pushed a commit that referenced this pull request Jan 28, 2026
Backport 40d1f09 from #45437.

 ___

Co-authored-by: YoannGh <yoann.ghigoff@datadoghq.com>
Co-authored-by: florent.clarret <florent.clarret@datadoghq.com>
theomagellan pushed a commit that referenced this pull request Feb 2, 2026
wynbennett added a commit that referenced this pull request Feb 23, 2026
Summary of Changes

  HIGH Priority Issues Fixed:

  #1: Write lock held across network I/O (impl/delegatedauth.go:270)
  - Refactored refreshAndGetAPIKey to release the lock before making network calls (authenticate)
  - The lock is now only held briefly to check/update state, not during network I/O

  #2: Context not propagated to signer.SignHTTP (aws.go:195)
  - Updated generateAwsAuthData to accept a context parameter
  - Changed signer.SignHTTP(context.Background(), ...) to signer.SignHTTP(ctx, ...)

  #3: Context not propagated to getCredentials IMDS call (aws.go:119)
  - Updated getCredentials to accept a context parameter
  - Removed ctx := context.Background() and now uses the passed context for IMDS calls

  MEDIUM Priority Issues Fixed:

  #4: No response body size limit (api/delegated_auth.go:97)
  - Added maxResponseBodySize = 1 * 1024 * 1024 constant (1 MB)
  - Wrapped response body with io.LimitReader to prevent memory exhaustion

  #5: No overall HTTP client timeout (api/delegated_auth.go:82)
  - Added httpClientTimeout = 30 * time.Second constant
  - Added Timeout: httpClientTimeout to the HTTP client

  #6: config.Set called while holding write lock (impl/delegatedauth.go:341)
  - Moved updateConfigWithAPIKey call outside the lock in startBackgroundRefresh
  - Captured the API key while holding the lock, then released it before calling config.Set

  #7: Blocking IMDS calls while holding write lock (impl/delegatedauth.go:127)
  - Refactored initializeIfNeeded to perform cloud detection without holding locks
  - IMDS calls now happen outside any lock, then state is updated with a brief write lock

  #8: Regex fails silently for non-standard formats (api/delegated_auth.go:36)
  - Added debug log when endpoint doesn't match known Datadog domain pattern
  - Updated function documentation to clarify behavior

  #9: Uncached IMDS credential fetch (aws.go:104)
  - Added documentation explaining the trade-off (refresh interval is typically 60 minutes, so caching is not critical)

  #10: Auth proof format undocumented (aws.go:98)
  - Added detailed comment documenting the auth proof format: <base64-body>|<base64-headers>|<method>|<base64-url>

  LOW Priority Issues Fixed:

  #11: Unnecessarily exported types (aws.go)
  - Changed SigningData to signingData (unexported)
  - Changed AWSAuth.AwsRegion to AWSAuth.region (unexported)
  - Updated all references in aws.go and aws_test.go

  #12: Tests exercise copy of goroutine (impl/delegatedauth_test.go:19)
  - Added documentation explaining why tests use a simplified goroutine pattern
  - Clarified that integration tests cover the actual startBackgroundRefresh function

  #13: Subsequent Config param silently ignored (def/delegatedauth.go:24)
  - Updated documentation to clearly state that only the first Config is used
  - Added warning log when a different Config is passed on subsequent calls
StephenWakely added a commit that referenced this pull request Mar 10, 2026
- Run benchmarks across comp/dogstatsd/server, pkg/aggregator, and
  comp/forwarder/defaultforwarder; results saved to plans/bench-baseline-forwarder.txt
- Generated pprof profiles (mem.out, cpu.out) from aggregator flush benchmarks
- Created scripts/profile_pipeline.sh documenting exact reproduction commands
- Documented top 10 allocation sites and CPU hotspots in plans/profiling-baseline.md
- Added benchmark test files for aggregator (time_sampler, context_resolver),
  forwarder, and dogstatsd/server that will be used in subsequent stories
- Typecheck passes (go build ./comp/... ./pkg/aggregator/... ./comp/forwarder/...)

Key findings:
  - pkg/metrics.(*Gauge).flush is #1 allocator (26.97% of objects) — target US-004
  - contextResolver.trackContext is #4 allocator and 26.58% cumulative CPU — target US-003
  - GC overhead accounts for ~22% of CPU — directly reducible via alloc reduction
  - Forwarder: 15 allocs/op per transaction creation — target US-007

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
chouetz added a commit that referenced this pull request Apr 16, 2026

3 participants