
[dogstatsd][aggregator] First pass#8

Merged
olivielpeau merged 1 commit into master from mergeable-dogstatsd
Jul 5, 2016

Conversation

@olivielpeau
Member

Simple prototype of a Dogstatsd server and an aggregator:

  • both support only gauges for now
  • the aggregator sends payloads to the metrics API endpoint of a local
    forwarder (not implemented, so use a running dd-agent to forward the
    packets)
  • the aggregator has been plugged into the existing check runner
  • each check must declare its run interval to the aggregator
  • most of the tests from dd-agent's dogstatsd have been copied (most of
    them are not implemented yet) to give us an idea of the cases we need
    to support
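
As an illustration of the gauge-only behaviour described above, here is a minimal sketch, not the code from this PR. It assumes the usual DogStatsD datagram shape `name:value|g|#tag1,tag2` and assumes that a gauge keeps the last value seen per context until the next flush. The names `gaugeSample`, `parseGauge`, `keyFor`, and `gaugeAggregator` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// gaugeSample is a hypothetical parsed gauge; the PR's real types may differ.
type gaugeSample struct {
	Name  string
	Value float64
	Tags  []string
}

// parseGauge parses "name:value|g" with an optional "|#tag1,tag2" section.
// Any metric type other than "g" is rejected, mirroring "only gauges for now".
func parseGauge(packet string) (gaugeSample, error) {
	parts := strings.Split(packet, "|")
	if len(parts) < 2 || parts[1] != "g" {
		return gaugeSample{}, fmt.Errorf("unsupported or malformed packet: %q", packet)
	}
	nameValue := strings.SplitN(parts[0], ":", 2)
	if len(nameValue) != 2 || nameValue[0] == "" {
		return gaugeSample{}, fmt.Errorf("missing name or value: %q", packet)
	}
	value, err := strconv.ParseFloat(nameValue[1], 64)
	if err != nil {
		return gaugeSample{}, fmt.Errorf("invalid value in %q: %w", packet, err)
	}
	sample := gaugeSample{Name: nameValue[0], Value: value}
	for _, section := range parts[2:] {
		if strings.HasPrefix(section, "#") {
			sample.Tags = strings.Split(section[1:], ",")
		}
	}
	return sample, nil
}

// keyFor builds a context identifier from the metric name and its sorted tags.
func keyFor(s gaugeSample) string {
	tags := append([]string(nil), s.Tags...)
	sort.Strings(tags)
	return s.Name + "|" + strings.Join(tags, ",")
}

// gaugeAggregator keeps the last value seen per context until flushed.
type gaugeAggregator struct {
	values map[string]float64
}

func newGaugeAggregator() *gaugeAggregator {
	return &gaugeAggregator{values: make(map[string]float64)}
}

// add overwrites any previous value for the same context.
func (a *gaugeAggregator) add(s gaugeSample) { a.values[keyFor(s)] = s.Value }

// flush returns the current values and resets the aggregator.
func (a *gaugeAggregator) flush() map[string]float64 {
	out := a.values
	a.values = make(map[string]float64)
	return out
}

func main() {
	agg := newGaugeAggregator()
	for _, p := range []string{"cpu.load:1.5|g|#host:a", "cpu.load:2.5|g|#host:a"} {
		s, err := parseGauge(p)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		agg.add(s)
	}
	fmt.Println(agg.flush())
}
```

In this sketch the flushed map would be what gets serialized and posted to the forwarder's metrics endpoint; that part is left out because the payload format is not described here.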

@olivielpeau
Member Author

cc @yannmh

Comment thread pkg/aggregator/sampler.go Outdated
Contributor

You can omit the JSON tag here since contextKey is not exported.
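
A small sketch of why the tag is redundant: `encoding/json` only marshals exported fields, so a tag on an unexported field has no effect. The `sample` struct below is hypothetical; only the field name `contextKey` comes from the comment above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sample is a hypothetical struct used only to illustrate the review comment.
type sample struct {
	Name       string `json:"name"`
	contextKey string `json:"context_key"` // tag has no effect: the field is unexported
}

// encode marshals s and returns the JSON text, or the error message on failure.
func encode(s sample) string {
	b, err := json.Marshal(s)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// prints {"name":"cpu.load"}
	fmt.Println(encode(sample{Name: "cpu.load", contextKey: "abc"}))
}
```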

@olivielpeau olivielpeau force-pushed the mergeable-dogstatsd branch from 95337da to d25d098 Compare July 5, 2016 12:56
@masci
Contributor

masci commented Jul 5, 2016

LGTM, time to merge!

@olivielpeau olivielpeau force-pushed the mergeable-dogstatsd branch from d25d098 to 9245a39 Compare July 5, 2016 17:29
@olivielpeau olivielpeau merged commit ef96bfb into master Jul 5, 2016
@olivielpeau olivielpeau deleted the mergeable-dogstatsd branch July 5, 2016 17:32
@dcoleman17 dcoleman17 restored the mergeable-dogstatsd branch July 20, 2017 15:30
@masci masci deleted the mergeable-dogstatsd branch July 26, 2017 16:44
masci added a commit that referenced this pull request Apr 1, 2019
Add RTLD_GLOBAL flag to dlopen
hush-hush pushed a commit that referenced this pull request Apr 17, 2019
Add RTLD_GLOBAL flag to dlopen
safchain pushed a commit to safchain/datadog-agent that referenced this pull request May 20, 2020
safchain pushed a commit to safchain/datadog-agent that referenced this pull request Jun 4, 2020
aiuto added a commit that referenced this pull request Sep 17, 2025
# This is the 1st commit message:

[build] Fork rules_multitool to our own extension.

Import the core parts of rules_multitool but with appropriate modifications for our needs.

- Re-root things so this is a local extension, rather than a distinct module.
- remove the ability to use .netrc for authentication.
- remove the :cwd and :workspace_root variations.
  - There are equivalent workarounds using `--run_under="cd <path> &&"`
  - Using `bazel run` is usually not the best practice. Rules should use `$(location <tool>)`. If a tool is so common that people need to call it anywhere, at any time, then it should be in their path.
  - The code is left commented out. Ready to be enabled if a good case is presented.
- Remove the support for WORKSPACE.

Next steps:
- Change download structure so the executable name is the tool, rather than `"executable"`.
- Stop passing the attributes of each binary around as json blobs to be encoded and decoded.
- Make `:path` feature that will print the full execution path.

# This is the commit message #2:

Update bazel/multitool/extension.bzl

Yeah. That's a better name.

Co-authored-by: Joseph Gette <jgettepost@gmail.com>
# This is the commit message #3:

Update bazel/multitool/extension.bzl

Co-authored-by: Joseph Gette <jgettepost@gmail.com>
# This is the commit message #4:

Update bazel/multitool/private/templates.bzl

Co-authored-by: Joseph Gette <jgettepost@gmail.com>
# This is the commit message #5:

render

# This is the commit message #6:

iff

# This is the commit message #7:

commit

# This is the commit message #8:

lintyfresth
s-alad pushed a commit that referenced this pull request Nov 21, 2025
aiuto added a commit that referenced this pull request Jan 13, 2026
# This is the 1st commit message:

Keep our own copy of cacert.pem

- Replace omnibus fetch from upstream with that static copy.
- Include text in the BUILD file about how we check for new upstream versions.
- Add explanation of why we have this.

https://datadoghq.atlassian.net/browse/ABLD-169

# This is the commit message #2:

just use copy for windows

# This is the commit message #3:

qmarks

# This is the commit message #4:

omnibus is to blame

# This is the commit message #5:

maybe

# This is the commit message #6:

add back in default version

# This is the commit message #7:

drop livestream on debug

# This is the commit message #8:

You're kidding, :live_stream?

# This is the commit message #9:

srsly

# This is the commit message #10:

just copy on windows

# This is the commit message #11:

cwd with copy probably does not work

# This is the commit message #12:

just give up on pkg_install for certs

# This is the commit message #13:

drop unneeded pkg_install targets

# This is the commit message #14:

- use cwd to make it a little cleaner
- update cert to 2025-09-09

# This is the commit message #15:

comma

# This is the commit message #16:

Revert use of cwd on copy.
It doesn't matter if it is ugly or not. We are going to delete it this quarter anyway.
gh-worker-dd-mergequeue-cf854d Bot pushed a commit that referenced this pull request Jan 27, 2026
### What does this PR do?

Skip the SSH session patcher and add a test that illustrates the current issue.
Also add the ability to check specific fields in the JSON returned for ssh_session events.

### Motivation

The retry mechanism could cause the agent to send no more than one event per minute if an SSH session was not properly resolved.
Previously, the event was not sent and the agent would wait one minute before sending it with the `unknown` type. However, this `authtype` would never be resolved because the session was initialized before the agent started processing events. As a result, every subsequent SSH event would wait one minute for nothing, causing a significant delay in agent events, potentially blocking all the other events.

### Describe how you validated your changes
Added a test that illustrates the issue: `TestSSHUserSessionBlocking`.
With this change, the ssh_session event is sent immediately with `authtype` set to `unknown`.


Error when the patcher is not commented out:
```
        	Error:      	Received unexpected error:
        	            	All attempts fail:
        	            	#1: not found
        	            	#2: not found
        	            	#3: not found
        	            	#4: not found
        	            	#5: not found
        	            	#6: not found
        	            	#7: not found
        	            	#8: not found
        	            	#9: not found
        	            	#10: not found
        	            	#11: not found
        	            	#12: not found
        	            	#13: not found
        	            	#14: not found
        	            	#15: not found
        	            	#16: not found
        	            	#17: not found
        	            	#18: not found
        	            	#19: not found
        	            	#20: not found
        	            	#21: not found
        	            	#22: not found
        	            	#23: not found
        	            	#24: not found
        	            	#25: not found
        	            	#26: not found
        	            	#27: not found
        	            	#28: not found
        	            	#29: not found
        	            	#30: not found
        	Test:       	TestSSHUserSessionBlocking/second_ssh_no_auth
```

Co-authored-by: theo.putegnat <theo.putegnat@datadoghq.com>
dd-octo-sts Bot added a commit that referenced this pull request Jan 27, 2026
(cherry picked from commit 40d1f09)

___

Co-authored-by: Théo Putegnat <theo.putegnat@datadoghq.com>
gh-worker-dd-mergequeue-cf854d Bot pushed a commit that referenced this pull request Jan 28, 2026
Backport 40d1f09 from #45437.

 ___

Co-authored-by: axel.vonengel <axel.vonengel@datadoghq.com>
gh-worker-dd-mergequeue-cf854d Bot pushed a commit that referenced this pull request Jan 28, 2026
Backport 40d1f09 from #45437.

 ___

Co-authored-by: YoannGh <yoann.ghigoff@datadoghq.com>
Co-authored-by: florent.clarret <florent.clarret@datadoghq.com>
theomagellan pushed a commit that referenced this pull request Feb 2, 2026
wynbennett added a commit that referenced this pull request Feb 23, 2026
Summary of Changes

  HIGH Priority Issues Fixed:

  #1: Write lock held across network I/O (impl/delegatedauth.go:270)
  - Refactored refreshAndGetAPIKey to release the lock before making network calls (authenticate)
  - The lock is now only held briefly to check/update state, not during network I/O

  #2: Context not propagated to signer.SignHTTP (aws.go:195)
  - Updated generateAwsAuthData to accept a context parameter
  - Changed signer.SignHTTP(context.Background(), ...) to signer.SignHTTP(ctx, ...)

  #3: Context not propagated to getCredentials IMDS call (aws.go:119)
  - Updated getCredentials to accept a context parameter
  - Removed ctx := context.Background() and now uses the passed context for IMDS calls

  MEDIUM Priority Issues Fixed:

  #4: No response body size limit (api/delegated_auth.go:97)
  - Added maxResponseBodySize = 1 * 1024 * 1024 constant (1 MB)
  - Wrapped response body with io.LimitReader to prevent memory exhaustion

  #5: No overall HTTP client timeout (api/delegated_auth.go:82)
  - Added httpClientTimeout = 30 * time.Second constant
  - Added Timeout: httpClientTimeout to the HTTP client

  #6: config.Set called while holding write lock (impl/delegatedauth.go:341)
  - Moved updateConfigWithAPIKey call outside the lock in startBackgroundRefresh
  - Captured the API key while holding the lock, then released it before calling config.Set

  #7: Blocking IMDS calls while holding write lock (impl/delegatedauth.go:127)
  - Refactored initializeIfNeeded to perform cloud detection without holding locks
  - IMDS calls now happen outside any lock, then state is updated with a brief write lock

  #8: Regex fails silently for non-standard formats (api/delegated_auth.go:36)
  - Added debug log when endpoint doesn't match known Datadog domain pattern
  - Updated function documentation to clarify behavior

  #9: Uncached IMDS credential fetch (aws.go:104)
  - Added documentation explaining the trade-off (refresh interval is typically 60 minutes, so caching is not critical)

  #10: Auth proof format undocumented (aws.go:98)
  - Added detailed comment documenting the auth proof format: <base64-body>|<base64-headers>|<method>|<base64-url>

  LOW Priority Issues Fixed:

  #11: Unnecessarily exported types (aws.go)
  - Changed SigningData to signingData (unexported)
  - Changed AWSAuth.AwsRegion to AWSAuth.region (unexported)
  - Updated all references in aws.go and aws_test.go

  #12: Tests exercise copy of goroutine (impl/delegatedauth_test.go:19)
  - Added documentation explaining why tests use a simplified goroutine pattern
  - Clarified that integration tests cover the actual startBackgroundRefresh function

  #13: Subsequent Config param silently ignored (def/delegatedauth.go:24)
  - Updated documentation to clearly state that only the first Config is used
  - Added warning log when a different Config is passed on subsequent calls