olivielpeau left a comment:
Left a few comments.
The main thing would be to replace datadog-agent/dd-agent with datadog-agent6 (which would allow us to have Agent 5 and 6 installed simultaneously). At some point, we'll want to change this back to datadog-agent.
```
name 'datadog-agent'

source path: '..'
```
Let's define here all the dependencies that datadog-agent requires (Python, for instance; I can't think of any other one, though).
```
@@ -0,0 +1,11 @@
name 'datadog-agent'

source path: '..'
```
I think the path needs to be an absolute path; are you sure this works? (See http://www.rubydoc.info/github/opscode/omnibus/Omnibus/Software#source-instance_method.)
```
# Clean up what we installed
rm -f /etc/init.d/datadog-agent
rm -rf /etc/datadog/agent
rm -rf /opt/$PROJECT_NAME/*
```
This should be /opt/datadog/agent, right?
(Or rather /opt/datadog-agent6.)
```
#
require "./lib/ostools.rb"

name 'datadog-agent'
```
Let's temporarily name the package differently (so that we can have both Agent 5 and 6 installed on the same machine); datadog-agent6 would make sense to me.
```
name 'datadog-agent'
maintainer 'Datadog Packages <package@datadoghq.com>'
homepage 'http://www.datadoghq.com'
install_dir '/opt/datadog/agent'
```
```
if linux?
  # Debian
  if debian?
    extra_package_file '/lib/systemd/system/datadog-agent.service'
```
-> datadog-agent6.service? (etc.)
```
end

# Example configuration files for the agent and the checks
extra_package_file '/etc/datadog/agent/datadog.conf.example'
```
-> /etc/datadog-agent6/datadog.conf.example? (etc.)
olivielpeau left a comment:
Added another batch of comments: some of them important, others not so much. Let me know what you think :)
```
# .deb specific flags
package :deb do
  vendor 'Datadog <info@datadoghq.com>'
```
Unrelated to your changes, but the email should be package@datadoghq.com. Could you fix it now? 🙇
(I doubt we'll ever think about fixing it otherwise.)
```
@@ -0,0 +1,17 @@
#!/bin/sh
```
These package scripts need to live in the folder named after the project (see https://github.com/DataDog/omnibus-ruby/blob/datadog-5.0.0/lib/omnibus/project.rb#L571), so they would need to be moved to a datadog-agent6 folder too.
```
# Just a dummy file that needs to be in the RPM package list if we don't want it to be removed
# during RPM upgrades. (the old files from the RPM file list that are not in the new RPM file
# list will get removed, that's why we need this one here)
extra_package_file '/usr/bin/datadog-agent6'
```
Now that I think about it, maybe we should just remove all these extra_package_files, since we're not actually packaging them (at least for now). It's just a suggestion, though; that's really up to you.
Let's remove them. I prefer to start with the bare minimum to avoid carrying around dead or unused code.
```
# software definitions, but you can clone/fork Chef's to get you started.
gem 'omnibus-software', git: 'git://github.com/datadog/omnibus-software.git', branch: ENV['OMNIBUS_SOFTWARE_BRANCH']

gem 'ohai', '~> 8.10.0'
```
You can remove this dependency. I added it a while ago to pin ohai, but it's not needed anymore.
* Provide Omnibus installer for the project (#22) + GitLab CI
* Use the GitLab cache system instead of putting stuff in /cache
[cpu] Add count for number of logical processors
### What does this PR do?
Skip the SSH session patcher and add a test to illustrate the current issue.
In addition, it adds the ability to check specific fields in the JSON returned for ssh_session events.
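A minimal Go sketch of what such a field check could look like, assuming the event is available as raw JSON and that nested fields are addressed with a dot-separated path. `fieldAt`, `checkFields`, and the `ssh_session.authtype` nesting are hypothetical, for illustration only; they are not the test suite's actual helpers or event schema.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"strings"
)

// fieldAt decodes raw JSON and follows a dot-separated path such as
// "ssh_session.authtype" (hypothetical nesting) down to a single value.
func fieldAt(raw []byte, path string) (any, error) {
	var cur any
	if err := json.Unmarshal(raw, &cur); err != nil {
		return nil, err
	}
	for _, key := range strings.Split(path, ".") {
		obj, ok := cur.(map[string]any)
		if !ok {
			return nil, fmt.Errorf("%q: parent is not a JSON object", key)
		}
		if cur, ok = obj[key]; !ok {
			return nil, fmt.Errorf("%q: not found", key)
		}
	}
	return cur, nil
}

// checkFields verifies that each path holds the expected value.
// JSON numbers decode to float64, so expected numbers must be float64 too.
func checkFields(raw []byte, want map[string]any) error {
	for path, expected := range want {
		got, err := fieldAt(raw, path)
		if err != nil {
			return fmt.Errorf("%s: %w", path, err)
		}
		if !reflect.DeepEqual(got, expected) {
			return fmt.Errorf("%s: got %v, want %v", path, got, expected)
		}
	}
	return nil
}

func main() {
	event := []byte(`{"ssh_session": {"authtype": "unknown"}}`)
	fmt.Println(checkFields(event, map[string]any{"ssh_session.authtype": "unknown"})) // <nil>
}
```

Checking only the listed paths keeps such a test robust to unrelated fields being added to the event later.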
### Motivation
The retry mechanism could cause the agent to send no more than one event per minute if an SSH session was not properly resolved.
Previously, when the session could not be resolved, the event was not sent right away: the agent waited one minute before sending it with the `unknown` type. However, this `authtype` would never be resolved, because the session had been initialized before the agent started processing events. As a result, every subsequent SSH event waited one minute for nothing, significantly delaying agent events and potentially blocking all other events.
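The fix amounts to not waiting on a resolution that can never happen. A minimal Go sketch of that non-blocking behavior, assuming a simple in-memory map from session ID to auth type; `SSHEvent`, `authResolver`, and the `publickey` value are hypothetical and do not reflect the agent's actual types.

```go
package main

import "fmt"

// SSHEvent is a hypothetical, simplified stand-in for an ssh_session event.
type SSHEvent struct {
	SessionID uint64
	AuthType  string
}

// authResolver is a hypothetical lookup of the auth type recorded per session.
// Sessions opened before the agent started are never present in it.
type authResolver map[uint64]string

// resolve fills in AuthType without waiting: if the session is unknown, the
// event is tagged "unknown" and returned immediately instead of being
// re-queued for a delayed retry that could never succeed.
func (r authResolver) resolve(ev SSHEvent) SSHEvent {
	if t, ok := r[ev.SessionID]; ok {
		ev.AuthType = t
	} else {
		ev.AuthType = "unknown"
	}
	return ev
}

func main() {
	resolver := authResolver{42: "publickey"}
	for _, ev := range []SSHEvent{{SessionID: 42}, {SessionID: 7}, {SessionID: 42}} {
		out := resolver.resolve(ev)
		fmt.Println(out.SessionID, out.AuthType)
	}
}
```

The trade-off is a less precise `authtype` for sessions that predate the agent, in exchange for never stalling the events queued behind them.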
### Describe how you validated your changes
Added a test that illustrates the issue: `TestSSHUserSessionBlocking`.
With this change, the ssh_session event is sent immediately, with `authtype` set to `unknown`.
Error when the patcher is not commented out:
```
Error: Received unexpected error:
All attempts fail:
#1: not found
#2: not found
#3: not found
#4: not found
#5: not found
#6: not found
#7: not found
#8: not found
#9: not found
#10: not found
#11: not found
#12: not found
#13: not found
#14: not found
#15: not found
#16: not found
#17: not found
#18: not found
#19: not found
#20: not found
#21: not found
#22: not found
#23: not found
#24: not found
#25: not found
#26: not found
#27: not found
#28: not found
#29: not found
#30: not found
Test: TestSSHUserSessionBlocking/second_ssh_no_auth
```
Co-authored-by: theo.putegnat <theo.putegnat@datadoghq.com>
(cherry picked from commit 40d1f09)
___
Co-authored-by: Théo Putegnat <theo.putegnat@datadoghq.com>
Backport 40d1f09 from #45437.
___
Co-authored-by: axel.vonengel <axel.vonengel@datadoghq.com>
Backport 40d1f09 from #45437.
___
Co-authored-by: YoannGh <yoann.ghigoff@datadoghq.com>
Co-authored-by: florent.clarret <florent.clarret@datadoghq.com>
…ame detection

Switch from quick-xml's serde deserialization to xml-rs's iterative streaming parser to address security audit findings:
- #21: Stack overflow via deeply nested XML (quick-xml recurses per nesting level)
- #22: CPU exhaustion via many attributes (quick-xml serde is O(n²) per attribute)
- #13: XXE defense-in-depth (xml-rs has no entity expansion support)

The new xml_parser module provides shared helpers with MAX_DEPTH=256 and MAX_ATTRS=256 limits, plus security regression tests. All five XML-parsing files (tomcat, jboss, weblogic, websphere, jee) are converted from serde structs to streaming event loops.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This PR implements an Omnibus installer for the project.
For the time being, so that developers working on the new agent can keep Agent v5 installed on their dev boxes, I've changed the install and configuration paths.