[windows] make windows python checks (specifically WMI) load and run#303

Closed
derekwbrown wants to merge 98 commits into master from db/win_py_checks

@derekwbrown
Contributor

Work in progress.
Agent6 can now load python checks, specifically the WMI check.

Comment thread cmd/agent/common/helpers.go Outdated
Contributor Author

This is the change that concerns me the most. Python should find everything in "site-packages", but the included DLLs weren't found until I explicitly added the directories below. That makes me concerned about the checks that use the other packages in site-packages.

Member

hmm, so on Unix the script that's actually executed to start the agent is https://github.com/DataDog/datadog-agent/blob/1b15f18edbfb5848f16a4613c858ae31b42289d5/pkg/collector/dist/agent, which populates the PYTHONPATH before calling the actual agent binary.
Is that what was missing on Windows, and does it justify this change, or is there a larger issue?

gmmeyer and others added 20 commits June 14, 2017 12:15
…294)

It should only panic when it makes sense for it to panic
* Fix error handling by using the same `stickyLock` when fetching the
interpreter error

Changing locks between a python call that sets the error indicator
on the interpreter and a fetch of that error can result in the
error indicator not being there anymore when the fetch is attempted.

I'm not sure exactly why this happens. The fact that it happens
"randomly" makes me think that it's related to how the goroutines are
scheduled on OS threads by the go runtime, and to how this can affect
the state of the python interpreter.

* Various improvements on `getPythonError`:
  * now a method of the `stickyLock` struct for clarity
  * check that an error occurred before trying to fetch it (improves
  quality of error that's returned by the function)
  * normalize the python error (recommended by the python C API docs)
  * use the string representation of `pvalue` (improves error message)


* Use `getPythonError` in py's `check.Run` to fetch errors from the
python interpreter. We should use it wherever possible.
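
To illustrate the locking point above, here is a pure-Python analogy, not the agent's Go code. `FakeInterpreter` and `call_and_fetch_error` are hypothetical names; the shared `_error` slot stands in for the interpreter's error indicator. The sketch keeps the call and the error fetch inside one critical section, checks that an error occurred before fetching it, and returns its string representation.

```python
import threading


class FakeInterpreter:
    """Hypothetical stand-in for an embedded interpreter with one shared error slot."""

    def __init__(self):
        self._lock = threading.Lock()
        self._error = None  # plays the role of the interpreter's error indicator

    def call_and_fetch_error(self, func):
        """Run func and read the error slot inside a single critical section."""
        with self._lock:
            self._error = None
            try:
                func()
            except Exception as exc:
                self._error = exc
            # Still holding the same lock, so no other caller can have
            # cleared or overwritten the slot between the call and the fetch.
            if self._error is None:
                return None
            return str(self._error)
```

Releasing the lock after `func()` and re-acquiring it before the fetch would open exactly the window the commit message describes.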
* [py] Add `increment`/`decrement` methods to `AgentCheck`

Eases the transition from agent 5 to agent 6 for the checks. The
methods use `count` after adding a suffix to the metric name.

The suffix is required because `count` submits metrics with the
`COUNT` api metric type, whereas in agent 5 these methods use the
`RATE` api metric type, so we need to use a different metric name
(the backend doesn't support using a different API metric type for
the same metric name).

* [py] Log deprecation warning on first usage of `increment`/`decrement`

Logged at most once per check to avoid spamming the logs.
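
A minimal sketch of the two commits above, under stated assumptions: the `.count` suffix, the `submitted` list and the `_warn_once` helper are hypothetical, since the commit messages don't give the real suffix or implementation. It shows `increment`/`decrement` delegating to `count` under a suffixed metric name, with the deprecation warning logged once per check instance.

```python
import logging

log = logging.getLogger(__name__)


class AgentCheck:
    # Hypothetical suffix; the actual one is not stated in the commit message.
    COUNT_SUFFIX = ".count"

    def __init__(self):
        self.submitted = []  # (name, value, tags) tuples, recorded for illustration only
        self._deprecation_logged = False

    def count(self, name, value, tags=None):
        self.submitted.append((name, value, tags))

    def _warn_once(self):
        # Logged at most once per check instance to avoid spamming the logs.
        if not self._deprecation_logged:
            log.warning("increment/decrement are deprecated, use count instead")
            self._deprecation_logged = True

    def increment(self, name, value=1, tags=None):
        self._warn_once()
        self.count(name + self.COUNT_SUFFIX, value, tags)

    def decrement(self, name, value=-1, tags=None):
        self._warn_once()
        self.count(name + self.COUNT_SUFFIX, value, tags)
```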
* [snmp] instance should be snmp_device, submit counters as rates.

* [snmp] fix broken snmp tests.
This time when `instances` is the only argument passed in `kwargs`.

This pattern is quite common in the existing checks (since it's the
signature of the agent5's `AgentCheck.__init__`), so it makes sense
to support it.

Added a test case, and assertions in the python test code.
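
A rough sketch of an `__init__` that accepts the agent 5 call patterns, including `instances` as the only keyword argument. The parameter order follows agent 5's `AgentCheck.__init__(name, init_config, agentConfig, instances=None)`; the defaults and the `_POSITIONAL` helper are assumptions for illustration, not the code in this PR.

```python
class AgentCheck:
    # Parameter order of the agent 5 constructor.
    _POSITIONAL = ("name", "init_config", "agentConfig", "instances")

    def __init__(self, *args, **kwargs):
        # Fresh defaults on every call, so instances never share mutable state.
        params = {"name": "", "init_config": {}, "agentConfig": {}, "instances": []}
        # Positional arguments first, then keyword arguments override them.
        params.update(zip(self._POSITIONAL, args))
        params.update((k, v) for k, v in kwargs.items() if k in params)

        self.name = params["name"]
        self.init_config = params["init_config"]
        self.agentConfig = params["agentConfig"]
        self.instances = params["instances"] or []
```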
* Add expvars to the scheduler

* Add expvar to check conf
* cleaned up dogstatsd rake commands

* build the static version of the binary when running size test
[percentiles] first pass at adding percentile sketch
* [jmx][auto-discovery] enabling auto-config for JMXfetch - WIP

[jmxfetch] bumping jmxfetch to 0.14.0.

[pipe][windows] fixing compilation issues.

[jmx] try to instantiate loader, if we fail to create pipe - skip.

[py] skip python if we can't load python loader.

[jmx][auto-discovery] implementing whitelist for specific jmxfetch checks.

[jmx][auto-discovery] fix output YAML indentation.

[config] include String() method.

[auto-discovery] separator includes cr.

[auto-discovery] renaming some more vars.

* [jmxfetch] fix multiple vet/lint errors, adding test.

* [jmx][test] fix loader test with in-memory pipe + parallel I/O

* [check][test] adding check String function unit test.

* [jmx] renaming loader module in embed package to jmxloader.
Only imported the modules that are used by existing integrations-core
checks.
As discussed. I've removed them entirely from `.gitlab-ci.yml`
since there's nothing very specific to the gitlab tasks.
We can re-add them once we've put some work into actually
making them fully standalone.
* consolidate default values for settings

* explicitly set default values

* improved descriptions in example file

* moar fixes
Should be replaced by a "standard" `device:` tag. This change shouldn't
make a difference in the backend, I'll make sure of that though.

2 parts to this change:
* `AgentCheck` supports `device_name` as a parameter to the metric
submission methods for backwards-compatibility, but we should stop
supporting it at some point (we log a deprecation notice when
`device_name` is used).
* Remove all traces of `DeviceName` field in aggregator
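
The backwards-compatible handling of `device_name` could look roughly like this. `normalize_tags` is a hypothetical helper, not a function from the agent; it only shows the idea of folding `device_name` into a `device:` tag and logging a deprecation notice.

```python
import logging

log = logging.getLogger(__name__)


def normalize_tags(tags=None, device_name=None):
    """Return a new tag list, folding a legacy device_name into a device: tag."""
    tags = list(tags or [])
    if device_name is not None:
        log.warning("device_name is deprecated, use a 'device:' tag instead")
        tags.append("device:" + device_name)
    return tags
```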
olivielpeau and others added 20 commits July 7, 2017 17:03
[k8s] Support older versions missing ListDeployment API
Quite a few `integrations-core` checks use this class of exceptions
for some of the exceptions they raise. It makes sense to keep it, it
encourages some level of exception granularity in the checks.
[percentile] change GKArray methods to value receivers
* [load] adding system check.

* [system][iostats] adding iostats check + tests.

* [iostats] more precise comment..

* [uptime] adding check + tests.

* [load] removing logging statement.

* [iostats][test] amending expected number of asserted calls.

* [iostats] improve windows support.

* [iostats] refactor unix specific io submission, for clarity.

* [iostats] refactor, windows numbers differ greatly. Currently not BW compatible.

* [iostats] removing unnecessary C(go) header.

* [iostats] implement device blacklist.

* [iostats] fixing blacklist management + adding test.

* [windows] implement windows IO check

* Initial checkin of code using both WMI & PDH

* Modified APM request

* touch up logging.  Remove calls to APM

* Switch back to WMI, at least for now. Neither PDH nor WMI is picking up disk changes, at least in the manner tested.

* Fix problems with merge

* Fix merge. Currently uses WMI; will re-add PDH-based IO check at a later date

* Changes to reflect review feedback

* centos6 changes (#354)

* adds upstart file to centos6

* adds upstart file to rpm

* changes conditional

* enables alternative python

* Fix gitlab ci so ci testing can continue

* Final review feedback.  Make string conversion more efficient by using a
byte buffer to do the original conversion, rather than appending a
character on to the string for each pass.
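
The same buffer-then-convert idea, sketched in Python rather than the Go of the actual change. The input format (a NUL-terminated sequence of UTF-16 code units, as Windows APIs commonly return) is an assumption, and `utf16_units_to_str` is a hypothetical name.

```python
import struct


def utf16_units_to_str(units):
    """Collect code units in a byte buffer, then decode to a string once."""
    buf = bytearray()
    for unit in units:
        if unit == 0:  # stop at the NUL terminator
            break
        buf += struct.pack("<H", unit)  # append as little-endian 16-bit
    return buf.decode("utf-16-le")
```

The buffer grows in place, whereas appending to an immutable string on each pass may copy the whole string every time.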

These python checks should be pulled from integrations-core now.

Also, log the `self.warning` messages at the warning level
Thin class that allows running the integrations-core `NetworkCheck`s
out of the box.

Also, added the `default_integration_http_timeout` config parameter
and `AgentCheck` attribute that some checks use.
[percentile/forwarder] use v2 endpoint to forward sketch series
* add send_host_metadata to disable host metadata collection if running several instances/host

* rename option to enable_metadata_collection and reword documentation
Using `exec` makes the agent process replace the shell script's
process instead of starting the agent process as a child process.

This is desirable for init systems that can't really track the actual
agent PID otherwise, and can then fail to track the agent process
properly.
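
The effect of `exec` on the PID can be checked with a small experiment. This is a sketch for POSIX systems only (on Windows, `os.execv` starts a new process instead), and `pids_around_exec` and `CHILD` are hypothetical names. A child process prints its PID, replaces itself via `os.execv`, and the replacement prints its PID again.

```python
import subprocess
import sys

# Program run by the child: print the PID, then exec a second interpreter.
CHILD = r"""
import os, sys
print(os.getpid(), flush=True)  # flush, since exec discards unwritten buffers
os.execv(sys.executable, [sys.executable, "-c", "import os; print(os.getpid())"])
"""


def pids_around_exec():
    """Return the PID seen before and after the child calls exec."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True, text=True, check=True,
    )
    before, after = result.stdout.split()
    return int(before), int(after)
```

On POSIX both values are equal, which is why an init system that recorded the script's PID ends up tracking the agent itself.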
* [py] Initialize site, and use PYTHONHOME

* Allow python to automatically initialize `site` when we initialize
the interpreter (see https://docs.python.org/2/library/site.html).
This makes python build its own default `PYTHONPATH` from the
`PYTHONHOME`, so that it can load modules (built-ins, `site-packages`
modules, etc).
* Instead of setting `PYTHONPATH` (since we let python build that on
its own), set `PYTHONHOME`. That'll pick up the embedded python, the
rest will be handled by python.

* [py] Log paths that we add to the python path
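
To see what the `site` module derives from a given home directory, a small sketch using Python 3's `site.getsitepackages` (the agent discussed here embedded Python 2, so this only illustrates the mechanism). `site_packages_for` is a hypothetical helper and `/opt/embedded` below is a made-up prefix.

```python
import site


def site_packages_for(prefix):
    """Return the site-packages directories Python would derive from one prefix."""
    return site.getsitepackages([prefix])
```

For example, `site_packages_for("/opt/embedded")` returns only paths under that prefix, which is the part `PYTHONHOME` controls; the exact subdirectories depend on the platform and Python version.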
* Clean up the `SubmitV2` methods that were duplicates of the `Submit`
methods
* Rename `CheckRuns` to `ServiceChecks` on the v2 endpoint, and change
the endpoint to `/v2/service_checks`
* Add the `UNKNOWN` service check status
* Remove constants that were there only for the agent6-specific checks
@olivielpeau
Member

@derekwbrown Can you try the changes that I've merged in #390? I think it should solve the loading issues you had on the modules in the site-packages directory.

@derekwbrown
Contributor Author

created new PR #406

@derekwbrown deleted the db/win_py_checks branch on October 12, 2018, 01:45