node_exporter causes high load average when remote shares fail #1259

Hello.

Recently we experienced a local network glitch during which some remote NFS shares became unavailable. When that happened, node_exporter started creating extra threads and the host load average rose to ~1000. The host was still reachable and we observed no significant latency, but the system logs reported that node_exporter had run out of file descriptors (the limit defaults to 1000). Our OS is RHEL7.

During the event, `ls /failed/mount` and `df -h /failed/mount` hung indefinitely and returned nothing.

It looks like each scrape started a new batch of threads, which queried the filesystems again and got stuck again. Could we have a per-mount mutex that prevents starting another check for a filesystem while the previous one is still running, and reports empty data for it instead? This would help avoid problems such as:

1. Indefinite wait on NFS mounts when the remote host is not available.
2. A stuck syscall on FUSE mounts (such as SSHFS) when the userland process has already died but the in-kernel part of the mount is stuck and cannot be used until it is cleaned up.

Thanks in advance.
