Description
Hello.
Recently we experienced a local network glitch during which some remote NFS shares became unavailable. When that happened, node_exporter started creating extra threads and the host load rose to ~1000. The host remained reachable and we observed no significant latency, but the system logs reported that node_exporter had run out of file descriptors (the limit defaults to 1000). Our OS is RHEL7.
During the event, `ls /failed/mount` and `df -h /failed/mount` simply hung indefinitely and returned nothing.
It looks like each scrape starts a new set of threads that query the filesystems again and get stuck again. Could we have a per-mount mutex that prevents starting another check for a filesystem whose previous check has not returned yet, and simply reports empty data for it instead? This would help avoid problems like:
- Indefinite wait on NFS mounts when the remote host is not available.
- Stuck syscalls on FUSE mounts (like SSHFS) when the userland process has already died but the in-kernel part of the mount is stuck and cannot be used until it is cleaned up.
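
To make the request concrete, here is a minimal sketch of the kind of per-mount guard I have in mind. It is not taken from the node_exporter code base: `MountGuard`, `ProbeFunc`, `ErrMountStuck` and `DefaultProbeTimeout` are all hypothetical names, and the probe function stands in for whatever call (for example `statfs`) can block on a dead mount.

```go
package main

import (
	"errors"
	"sync"
	"time"
)

// DefaultProbeTimeout is an illustrative value, not an existing setting.
const DefaultProbeTimeout = 5 * time.Second

// ErrMountStuck means the probe was skipped or did not finish in time.
var ErrMountStuck = errors.New("mount point is stuck")

// ProbeFunc queries one mount point and may block forever on a dead mount.
type ProbeFunc func(mountPoint string) (uint64, error)

// MountGuard allows at most one outstanding probe per mount point.
type MountGuard struct {
	mu       sync.Mutex
	inFlight map[string]struct{} // mounts whose probe has not returned yet
	timeout  time.Duration
	probe    ProbeFunc
}

func NewMountGuard(timeout time.Duration, probe ProbeFunc) *MountGuard {
	return &MountGuard{
		inFlight: make(map[string]struct{}),
		timeout:  timeout,
		probe:    probe,
	}
}

type probeResult struct {
	size uint64
	err  error
}

// Check runs the probe unless an earlier probe of the same mount is still
// pending. In that case it returns ErrMountStuck at once, so a dead mount
// holds at most one blocked goroutine no matter how many scrapes arrive.
func (g *MountGuard) Check(mountPoint string) (uint64, error) {
	g.mu.Lock()
	if _, pending := g.inFlight[mountPoint]; pending {
		g.mu.Unlock()
		return 0, ErrMountStuck
	}
	g.inFlight[mountPoint] = struct{}{}
	g.mu.Unlock()

	// Buffered so the goroutine can finish even after Check has timed out.
	done := make(chan probeResult, 1)
	go func() {
		size, err := g.probe(mountPoint)
		g.mu.Lock()
		delete(g.inFlight, mountPoint) // mount answered, allow probes again
		g.mu.Unlock()
		done <- probeResult{size, err}
	}()

	timer := time.NewTimer(g.timeout)
	defer timer.Stop()
	select {
	case r := <-done:
		return r.size, r.err
	case <-timer.C:
		return 0, ErrMountStuck
	}
}
```

A collector could call `Check` once per mount on every scrape and emit no data (or an error metric) for mounts that return `ErrMountStuck`. One trade-off of this sketch: two overlapping scrapes of a healthy but slow mount would also see one skipped result, which seems acceptable compared with piling up blocked threads.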
Thanks in advance.