Release v0.14.0-rc.1. by SuperQ · Pull Request #423 · prometheus/node_exporter

SuperQ · 2017-01-15T17:32:08Z

Update CHANGELOG
Update VERSION

Changes:
NOTE: We are deprecating several collectors in this release.

gmond - Out of scope.
megacli - Requires forking, moved to textfile collection.
ntp - Out of scope.

[FEATURE] Added loadavg collector for Solaris #311
[FEATURE] Add StorCli text collector example script #320
[FEATURE] Add collector for Linux EDAC #324
[FEATURE] Add text file utility for SMART metrics #354
[FEATURE] Add a collector for NFS client statistics. #360
[FEATURE] Add mountstats collector for detailed NFS statistics #367
[FEATURE] A collector for DRBD #365
[FEATURE] Add cpu collector for darwin #391
[FEATURE] Add netdev collector for darwin #393
[FEATURE] Collect CPU temperatures on FreeBSD #397
[FEATURE] Add ZFS collector + review feedback from PRs 213 and 369 #410
[FEATURE] Add initial wifi collector #413
[FEATURE] Add NFS event metrics to mountstats collector #415
[IMPROVEMENT] hwmon: Provide annotation metric to link chip sysfs paths to human-readable chip types #359
[IMPROVEMENT] Add node_filesystem_device_errors_total metric #374
[IMPROVEMENT] Add runit service dir flag #375
[IMPROVEMENT] Improve Docker documentation #376
[IMPROVEMENT] Ignore autofs filesystems on linux #384
[IMPROVEMENT] Replace some FreeBSD collectors with pure Go versions #385
[IMPROVEMENT] Use filename as label, move 'label' to own metric #411 (hwmon)
[BUGFIX] mips64 build fix #361
[BUGFIX] Update vendoring #372 (to fix Panic with DBus #242)
[BUGFIX] Convert remaining collectors to use ConstMetrics #389
[BUGFIX] Check for errors in netdev scanner #398
[BUGFIX] Don't leak or race in FreeBSD devstat collector #396

mdlayher · 2017-01-15T20:29:58Z

Wifi collector is Linux only for now, by the way.

Stoked!

mdlayher · 2017-01-15T20:31:25Z

Also, thoughts on enabling more collectors by default?

I can speak for the wifi and mountstats collectors, at least, being useful to have enabled by default.

If the machine isn't using WiFi or NFS, neither will report any metrics.

SuperQ · 2017-01-15T21:21:30Z

I don't object, as long as they behave well when nothing is enabled on the node.

discordianfish · 2017-01-16T12:30:49Z

There is still #216 open, which we wanted to get in. I'm flying out tomorrow, so I won't have time this week. If you think we should get this out now, we can postpone it IMO.

discordianfish

LGTM

discordianfish

err LGTM

SuperQ · 2017-01-16T12:41:20Z

Yea, I wanted to see that get in, but there's been no progress on #390 in 2 weeks. I'm not sure it's worth waiting for.

jcberthon · 2017-02-09T14:34:40Z

Would it be possible to share with us why ntp is considered out of scope? Is it because it relies on the ntpd software being installed and therefore should get a specific exporter? If yes, will the current code be reused to provide an ntpd_exporter?

matthiasr · 2017-02-09T15:06:42Z

@jcberthon yes, that is the reason. Right now nothing is being removed, and we'd like to only do so once alternatives are available. Would you like to take the code and throw together a standalone exporter?

SuperQ · 2017-02-09T15:11:52Z

The reason we decided ntp was out of scope is that it functions as a blackbox probe. The collector does a real-time NTP probe against an external server. This could be very high traffic if someone were to point a large number of servers at pool.ntp.org or similar. It uses a golang implementation of the NTP protocol, which is totally fine, but we didn't feel like it was a good fit for keeping in the node_exporter.

I personally found this probe to be very useful as an additional check against ntpd or other such time syncing client software running on servers. However, it did produce a lot of jitter, and a lot of extra packets to our NTP server pools in production. A typical NTP client only sends one probe every ~15 minutes per server, not every 15-30sec like a node_exporter being scraped.
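As a rough illustration of that traffic difference, here is a small sketch using the intervals mentioned above (one probe every ~15 minutes versus one per 15 or 30 second scrape). The fleet size is a made-up number, and the function name is only for illustration.

```python
def probes_per_day(interval_seconds: float) -> float:
    """Number of probes one host sends per day at a fixed interval."""
    return 86400 / interval_seconds


ntp_client = probes_per_day(15 * 60)  # typical NTP client, ~15 minutes
scrape_15s = probes_per_day(15)       # collector probing on every 15 s scrape
scrape_30s = probes_per_day(30)       # collector probing on every 30 s scrape

HOSTS = 1000  # hypothetical fleet size, not a figure from this thread

print("per host per day:", ntp_client, scrape_15s, scrape_30s)
print("15 s scrape vs. NTP client:", scrape_15s / ntp_client)
print("fleet probes per day at 15 s:", HOSTS * scrape_15s)
```

Under these assumptions, a collector probed on every 15 second scrape sends 60 times as many packets per host as a regular client polling every 15 minutes.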

It's also useful for nodes that are not running NTP clients for whatever reason.

There are a few ways we can replace this functionality.

The code would be easy to adapt to a stand-alone blackbox prober.
We would like to add a node_system_clock_milliseconds metric to the node_exporter, and a function in Prometheus to compare the metric to the scrape time of the sample. This would give us +- 1ms diff of the clock vs the clock of the Prometheus server without having to actually probe anything external to the node.
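The comparison described here can be sketched as a simple subtraction. This is only an illustration of the idea, not an existing Prometheus function: `clock_offset_ms` and the sample numbers are hypothetical, and it assumes the scrape timestamp is taken close enough to the moment the node read its clock.

```python
def clock_offset_ms(node_clock_ms: int, scrape_time_ms: int) -> int:
    """Offset of the node clock relative to the Prometheus server clock.

    node_clock_ms:  value reported by the node (e.g. the proposed
                    node_system_clock_milliseconds metric)
    scrape_time_ms: timestamp the server attached to that sample
    A positive result means the node clock is ahead of the server.
    """
    return node_clock_ms - scrape_time_ms


# Hypothetical sample: the node reports a time 250 ms ahead of the scrape time.
print(clock_offset_ms(1486653112250, 1486653112000))
```

Any delay between the node reading its clock and the server stamping the sample shows up as error in the result, so the achievable precision depends on scrape latency.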

As for monitoring ntpd and other ntp clients, this is something we could easily add as a textfile helper tool. This would export the real metrics provided by NTP client software running on nodes. I have already written a couple, but they're currently not open source. I will attempt to re-implement them and publish them sometime soon. Maybe in Python this time instead of shell. 😄
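A minimal sketch of what such a textfile helper could look like in Python follows. It is not the helper mentioned above: the metric names, file name, and directory are placeholders, and it assumes the textfile collector is pointed at a directory of `*.prom` files. The file is written to a temporary name and then renamed, so the collector never reads a half-written file.

```python
import os
import tempfile


def render_metrics(metrics: dict) -> str:
    """Render name -> value pairs in the Prometheus text format as gauges."""
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def write_textfile(directory: str, filename: str, metrics: dict) -> str:
    """Write metrics to directory/filename via a temp file and atomic rename."""
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        f.write(render_metrics(metrics))
    os.chmod(tmp_path, 0o644)  # mkstemp creates the file as 0600
    final_path = os.path.join(directory, filename)
    os.replace(tmp_path, final_path)  # atomic within one filesystem
    return final_path


if __name__ == "__main__":
    # Hypothetical metric names and values, written to a scratch directory.
    with tempfile.TemporaryDirectory() as d:
        path = write_textfile(d, "ntpd.prom", {"ntpd_offset_ms": 0.25})
        print(open(path).read())
```

A real helper would fill the dictionary from the NTP client's own status output and run periodically, for example from cron.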

discordianfish · 2017-02-09T16:13:53Z

There is already a node_system_clock_milliseconds metric. Not sure how it's named. I think node_time or something.

SuperQ · 2017-02-09T16:25:55Z

@discordianfish Seems like node_time is seconds resolution, so not really sufficient for monitoring node offsets.

discordianfish · 2017-02-09T16:37:15Z

@SuperQ Right but how would you figure out the precise timestamp of the scrape? The best I came up with was just time() - node_time which might be off by the scrape interval anyway.

SuperQ · 2017-02-09T16:40:24Z

@brian-brazil was talking at FOSDEM about a specific function to compare sample values with their collection timestamps. Not implemented yet.

discordianfish · 2017-02-09T16:41:47Z

Ah, yes something like that would be great.

jcberthon · 2017-02-14T10:45:05Z

@matthiasr "would like" yes, but with 4 very young kids at home in Winter and a full time job, I have very limited time available for that. In addition, I haven't yet installed prometheus (no time) but I was looking into it to see if I could use it to monitor my Raspberry Pi NTP server, hence my initial interest ;-)

SuperQ · 2017-02-14T11:05:55Z

If you want to monitor an NTP server, you definitely want the NTP metrics helper script, not the ntp collector plugin.

See: #458

jcberthon · 2017-02-14T13:53:35Z

@SuperQ thanks for the hint. However, ntpq -pn gives you a view of the sync status per source. It is useful information, but not enough to know the "quality" of your NTP server, especially because those sources might change (if you use the NTP pool project, for instance).

IMHO, it is better to use ntpq -c rv in order to get the ntp server kernel status (sync or not, stratum, rootdisp+rootdelay/2 (which is for me the maximum time difference to true UTC in ms), and offset (also in ms, but I'm still unsure of what that is exactly, ntp documentation is very unclear)). I'm far from being an NTP specialist, but I think those values give you a better view of the "quality" of your NTP server than what ntpq -pn returns.
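To make the rootdisp+rootdelay/2 figure concrete, here is a small sketch that parses `name=value` pairs in the comma-separated style printed by `ntpq -c rv` and computes that bound. The sample string and its numbers are invented for illustration, and the parser is deliberately naive: it would mis-split quoted values that contain commas.

```python
def parse_rv(text: str) -> dict:
    """Parse 'name=value, name=value' pairs into a dict of strings."""
    values = {}
    for item in text.replace("\n", ",").split(","):
        name, sep, value = item.strip().partition("=")
        if sep:  # skip empty fragments and items without '='
            values[name] = value.strip()
    return values


def max_error_ms(rv: dict) -> float:
    """rootdisp + rootdelay/2, both in milliseconds."""
    return float(rv["rootdisp"]) + float(rv["rootdelay"]) / 2


# Invented sample in the style of `ntpq -c rv` output.
SAMPLE = "rootdelay=12.0, rootdisp=20.5,\noffset=0.25, sys_jitter=0.1"

rv = parse_rv(SAMPLE)
print(max_error_ms(rv))
```

With the invented values above, the bound comes out at 20.5 + 12.0/2 = 26.5 ms.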

I'm also monitoring the frequency and sys_jitter returned value of ntpq -c rv but I'm not sure how to interpret them correctly.

For monitoring this, I'm using a simple script at the moment which loops around this command:

echo "ntpstats,host=$(hostname) $(ntpq -c "rv 0 offset,sys_jitter" | sed 's/ //g'),$(ntpq -c "rv 0 rootdisp,rootdelay" | sed 's/ //g')" | curl --silent --show-error -i -XPOST 'https://influxdb.lan:8086/write?db=telegraf' -u ${username}:${password} --data-binary  @- >/dev/null

Note: as one can see, I'm using the telegraf line protocol with an influxdb database, and grafana for display. But I want to investigate alternatives, hence my interest in prometheus.

SuperQ · 2017-02-14T14:41:36Z

@jcberthon, That's super useful information! I have filed #462 to add additional metrics.

Release v0.14.0-rc.1.

0d4e881

SuperQ assigned discordianfish Jan 15, 2017

SuperQ requested review from discordianfish and juliusv January 16, 2017 01:04

discordianfish reviewed Jan 16, 2017

discordianfish approved these changes Jan 16, 2017

SuperQ merged commit 5a07f41 into master Jan 16, 2017

SuperQ deleted the superq/v0.14.0-rc.1 branch January 16, 2017 15:55

SuperQ mentioned this pull request Feb 14, 2017

Improve ntpd helper script #462

Closed
