Skip to content

Comments

collector/filesystem: s/MNT_NOWAIT/MNT_WAIT#2960

Open
rexagod wants to merge 1 commit intoprometheus:masterfrom
rexagod:1498
Open

collector/filesystem: s/MNT_NOWAIT/MNT_WAIT#2960
rexagod wants to merge 1 commit intoprometheus:masterfrom
rexagod:1498

Conversation

@rexagod
Copy link
Contributor

@rexagod rexagod commented Mar 18, 2024

getfsstat(2) spec mentions that using MNT_NOWAIT will return the information it has available without requesting an update from each file system. Hence, use MNT_WAIT in place of the earlier used mode, and make changes to the affected collectors to avoid being stuck for long intervals.

Fixes: #1498

@rexagod rexagod marked this pull request as draft March 18, 2024 20:04
@rexagod rexagod marked this pull request as ready for review March 18, 2024 20:05
@rexagod rexagod force-pushed the 1498 branch 2 times, most recently from d9f495a to 514d009 Compare March 18, 2024 20:15
@rexagod rexagod marked this pull request as draft March 18, 2024 22:40
`getfsstat(2)` spec mentions that using `MNT_NOWAIT` will return the
information it has available without requesting an update from each file
system. Hence, use `MNT_WAIT` in place of the earlier used mode, and
make changes to the affected collectors to avoid being stuck for long
intervals.

Fixes: prometheus#1498

Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
@rexagod rexagod marked this pull request as ready for review March 19, 2024 12:55
@discordianfish
Copy link
Member

I wonder if we could refactor the timeout logic implemented in https://github.com/prometheus/node_exporter/pull/997/files and re-use it here?

@rexagod
Copy link
Contributor Author

rexagod commented Mar 30, 2024

@discordianfish Sure! Do you think it'd be better to do that in a separate PR?

@discordianfish
Copy link
Member

@rexagod I think you can just do it here and use to to address the MNT_WAIT change

@rexagod
Copy link
Contributor Author

rexagod commented Jun 22, 2024

@discordianfish I revisited the machinery introduced in #997, and although it seems fitting for that use case, I've had to drop it entirely in this patch to accommodate for the varying design patterns for different filesystems. As such, I believe a per-filesystem case makes sense here since encapsulating that logic such that it fits all filesystems can get non-scalable in the future as more corner cases are added to satisfy the various implementations. Please note that while the previous machinery is being dropped here, the expected behavior is still the same.

LMK if you think otherwise, though. :)

@discordianfish
Copy link
Member

Would be nice if we could avoid duplicating the channel/goroutine/timeout logic

@rexagod
Copy link
Contributor Author

rexagod commented Dec 16, 2024

So I ended up taking a couple of stabs at this, but I believe that the refactor (that essentially takes the goroutine, channel, and the timeout logic into a common place may add up to the maintenance complexity, since there are different APIs being used between linux|bsd|{open,free}bsd (since they may not be available on other operating systems) that makes it harder to abstract things more. Owing to this, I am able to establish a common pattern between {free,open}bsd, but not linux and bsd as they differ significantly.

PLMK there's anything I'm missing that could help address your earlier comment.

@discordianfish
Copy link
Member

@SuperQ If you're ok with this, I'd say we merge if @rexagod can rebase it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

node_filesystem_{size,avail}_bytes report wrong values for ZFS filesystems

2 participants