Limit nut-scanner thread count by jimklimov · Pull Request #1158 · networkupstools/nut

jimklimov · 2021-11-03T21:15:21Z

If extremely many threads are spawned, e.g. when scanning a very large IP address range, the nut-scanner program can crash due to resource exhaustion, or by hitting constraints in the pthreads implementation (various effects were seen on different generations of OSes) or in third-party libraries used by particular scanners.

This PR adds a limit (CLI-configurable as --thread N) on the number of threads spawned by the different parallelized scanners (snmp, nut, netxml, serial). If the limit is hit, the scanner calls pthread_tryjoin_np() to wait for some of the threads to complete their work, and only then proceeds to spawn another.

Note that while changes to the overall thread counter are locked, the checks against it are not mutex'ed, so I anticipate that small accounting errors are possible (e.g. launching up to +1 "extra" thread for each scan type).
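The herding loop described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the names `scan_one_host`, `reap_finished` and `scan_with_limit` are hypothetical, the `usleep()` calls stand in for real network waits, and only the general idea (a mutex-guarded counter, an unlocked limit check, and `pthread_tryjoin_np()` to reap finished workers) is taken from the description.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* pthread_tryjoin_np() is a GNU extension */
#endif
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical names for illustration; the PR's real code may differ. */
static pthread_mutex_t threadcount_mutex = PTHREAD_MUTEX_INITIALIZER;
static size_t curr_threads = 0;

static void *scan_one_host(void *arg)
{
    (void)arg;
    usleep(10000); /* stand-in for: send a request, wait for reply or timeout */
    return NULL;
}

/* Join any workers that already finished, without blocking.
 * Returns the number of threads reaped. */
static size_t reap_finished(pthread_t *threads, int *active, size_t count)
{
    size_t reaped = 0;
    for (size_t i = 0; i < count; i++) {
        if (active[i] && pthread_tryjoin_np(threads[i], NULL) == 0) {
            active[i] = 0;
            pthread_mutex_lock(&threadcount_mutex);
            curr_threads--;
            pthread_mutex_unlock(&threadcount_mutex);
            reaped++;
        }
    }
    return reaped;
}

/* Scan `hosts` targets with at most `max_threads` workers counted at once.
 * Returns the peak value the counter reached (0 on setup failure). */
static size_t scan_with_limit(size_t hosts, size_t max_threads)
{
    pthread_t *threads = calloc(hosts ? hosts : 1, sizeof(*threads));
    int *active = calloc(hosts ? hosts : 1, sizeof(*active));
    size_t peak = 0;

    if (threads == NULL || active == NULL || max_threads == 0) {
        free(threads);
        free(active);
        return 0;
    }
    for (size_t i = 0; i < hosts; i++) {
        /* Unlocked check, mirroring the accounting caveat noted above. */
        while (curr_threads >= max_threads) {
            if (reap_finished(threads, active, i) == 0)
                usleep(1000); /* nothing finished yet: back off briefly */
        }
        if (pthread_create(&threads[i], NULL, scan_one_host, NULL) != 0)
            break;
        active[i] = 1;
        pthread_mutex_lock(&threadcount_mutex);
        curr_threads++;
        if (curr_threads > peak)
            peak = curr_threads;
        pthread_mutex_unlock(&threadcount_mutex);
    }
    /* Drain whatever is still running. */
    for (size_t i = 0; i < hosts; i++) {
        if (active[i]) {
            pthread_join(threads[i], NULL);
            pthread_mutex_lock(&threadcount_mutex);
            curr_threads--;
            pthread_mutex_unlock(&threadcount_mutex);
        }
    }
    free(threads);
    free(active);
    return peak;
}
```

In this sketch only one thread spawns and reaps, so the unlocked check is harmless; the "+1 extra thread" effect mentioned above would only appear when several scanners share the counter and race between the check and the increment.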

Given that scanning usually means sending a request and waiting to see whether a reply arrives within a timeout, the processing load of each scanned item is negligible while considerable wall-clock time is spent. Lack of parallelism does add up: in tests, an SNMP scan of a /24 subnet took ~5 seconds without hitting a limit, 20 seconds with a limit of 64 threads, and ~1200 seconds with a limit of 1 thread.
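A back-of-the-envelope model fits these timings reasonably well: if hosts are scanned in waves of at most N threads and each wave lasts about one timeout, the wall-clock time is roughly ceil(hosts / N) times the per-host wait. The function below is only a sketch of that arithmetic; `estimate_scan_seconds` is a hypothetical helper, and the 5-second per-host wait and the 256 addresses in a /24 are assumptions inferred from the unlimited run, not values stated in the PR.

```c
/* Rough wall-clock estimate: hosts are scanned in waves of `max_threads`,
 * and each wave is assumed to last one per-host wait. */
static unsigned long estimate_scan_seconds(unsigned long hosts,
                                           unsigned long max_threads,
                                           unsigned long per_host_seconds)
{
    unsigned long waves;

    if (max_threads == 0)
        max_threads = 1; /* treat "no threads" as strictly sequential */
    waves = (hosts + max_threads - 1) / max_threads; /* ceil(hosts / max_threads) */
    return waves * per_host_seconds;
}
```

With 256 addresses and a 5-second wait this gives 5 seconds for a limit of 1024, 20 seconds for a limit of 64, and 1280 seconds for a limit of 1, which is in the same range as the measurements quoted above.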

The default limit in this PR is arbitrarily set to 1024 threads overall, to accommodate fast scans of a typical /24 subnet with several protocols at once.
UPDATE1: The default can get reduced if the allowed file descriptor ulimit is smaller.
UPDATE2: Also individual protocol scanners can be subjected to their own limits; whichever is smaller (and gets hit first) wins.

PS: This is a separate issue from broken netmask processing, addressed in PR #1157

aquette · 2021-11-04T08:11:06Z

side note:
years ago, I uncovered a limitation in the Net-SNMP shared lib: when spawning more than 1024 handles, you crash it.
Worth tracking this.

aquette

LGTM, as discussed

jimklimov · 2021-11-04T09:17:26Z

Thanks, though when trying to confirm that (in a Debian 11 derived environment) by scanning 8 * /24 subnets with a huge limit of jobs, I did not get a crash and got many responses. Need to check with older systems...

It did not, however, work well on the first try: I needed to bump ulimit -n 131072 in the shell (default 1024), at least for net-snmp to be able to search for its global and per-host config files, probably for net sockets, etc. It did complain otherwise.

Notably, with ulimit -n 2048 and nut-scanner -T 2048 ... and a scan of those 8 * /24 subnets (so 2048 IPs) there were no stderr complaints. With ulimit -n 1536 there were, e.g.:

...
Failed to open SNMP session for 10.0.7.252.
/usr/share/snmp/hosts/10.0.7.254.conf: Too many open files
/usr/share/snmp/hosts/10.0.7.254.local.conf: Too many open files
/usr/share/snmp/hosts/10.0.7.253.local.conf: Too many open files
/var/lib/snmp/hosts/10.0.7.254.conf: Too many open files
/var/lib/snmp/hosts/10.0.7.254.local.conf: Too many open files
/var/lib/snmp/hosts/10.0.7.253.conf: Too many open files
/var/lib/snmp/hosts/10.0.7.253.local.conf: Too many open files
/etc/snmp/hosts/10.0.7.255.conf: Too many open files
/etc/snmp/hosts/10.0.7.255.local.conf: Too many open files
Failed to open SNMP session for 10.0.7.254.
/usr/share/snmp/hosts/10.0.7.255.conf: Too many open files
/usr/share/snmp/hosts/10.0.7.255.local.conf: Too many open files
/var/lib/snmp/hosts/10.0.7.255.conf: Too many open files
/var/lib/snmp/hosts/10.0.7.255.local.conf: Too many open files
Failed to open SNMP session for 10.0.7.255.
Failed to open SNMP session for 10.0.7.253.

... so as far as netsnmp constraints are concerned, whether files or sockets, a 1:1 ulimit/host ratio is okay.

NetXML scans, however, crashed with a jobs allowance over 1021, and failed to create sockets (Error creating socket) with a job allowance greater than ulimit - 3 (so also nearly 1:1):

*** buffer overflow detected ***: terminated
Aborted (core dumped)

...one "buffer overflow" (comes from libc AFAIK) for each thread above 1021 (so starting with nut-scanner --thread 1022); also with ulimit -n 1024 it fails to create sockets starting with --thread 1022 (again, ulimit minus 3 fires here).

It seems the next improvement here would be to not hardcode the default thread count limit, but to derive it from the detected current file descriptor ulimit (probably not the other way around, i.e. forcing a ulimit bump to match, since this should not generally run as root to be able to bump it).

And probably add a (hard? configurable?) limit on the thread count of individual scanners, as impacted by their tech :\
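A minimal sketch of that idea, assuming POSIX `getrlimit()` is available: take the requested thread count, cap it by the current `RLIMIT_NOFILE` soft limit minus some reserve, then cap it again by a per-scanner limit, so that the smallest constraint wins. The helper names and the `RESERVED_FDS` value are hypothetical and are not the values used in the PR.

```c
#include <stddef.h>
#include <sys/resource.h>

/* Hypothetical allowance for stdio, config files, etc.; not the PR's value. */
#define RESERVED_FDS 16

/* Pure clamping logic. `nofile_limit` of 0 means "unknown, do not constrain";
 * `scanner_cap` of 0 means "this scanner has no cap of its own". */
static size_t clamp_threads(size_t requested, size_t nofile_limit,
                            size_t scanner_cap)
{
    size_t limit = requested;

    if (nofile_limit > 0) {
        size_t usable = nofile_limit > RESERVED_FDS
                      ? nofile_limit - RESERVED_FDS : 1;
        if (limit > usable)
            limit = usable;
    }
    if (scanner_cap > 0 && limit > scanner_cap)
        limit = scanner_cap;
    return limit > 0 ? limit : 1; /* always allow at least one thread */
}

/* Same, but reads the current soft limit on open files from the OS. */
static size_t thread_limit_from_ulimit(size_t requested, size_t scanner_cap)
{
    struct rlimit nofile;
    size_t fds = 0; /* 0 = unknown or unlimited */

    if (getrlimit(RLIMIT_NOFILE, &nofile) == 0
        && nofile.rlim_cur != RLIM_INFINITY)
        fds = (size_t)nofile.rlim_cur;
    return clamp_threads(requested, fds, scanner_cap);
}
```

For example, `clamp_threads(2048, 131072, 1021)` yields 1021 when a scanner is capped at the 1021 jobs observed above to be safe for NetXML, while `clamp_threads(1024, 1024, 0)` yields 1008 under the default `ulimit -n 1024` with this sketch's reserve.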

jimklimov · 2021-11-04T17:54:15Z

After recent commits, I can no longer crash nut-scanner -M for NetXML scans alone (even for larger subnets); however when scanning with several protocols (e.g. oldnut and/or netxml + snmp), and using over 1000 threads, I get buffer overflow detected and core-dumps again.

Double-checked that a similar crash is possible in the 42ITy fork of NUT with its different mechanism of limiting the thread count.


jimklimov · 2021-11-06T13:35:44Z

TODO later: replicate 42ity approach with semaphores (where available) instead of simple counters (possibly not too thread-safe)?
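For comparison, a semaphore-based limiter could look roughly like the sketch below. It is not the 42ITy implementation (which is not shown in this thread); `scan_worker` and `scan_with_semaphore` are hypothetical names, the `usleep()` stands in for the real scan, and unnamed POSIX semaphores (`sem_init()`) are assumed to be available on the platform.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>
#include <unistd.h>

static sem_t scan_slots; /* one token per allowed concurrent worker */
static pthread_mutex_t stat_mutex = PTHREAD_MUTEX_INITIALIZER;
static int running = 0, peak = 0; /* bookkeeping for the demo only */

static void *scan_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&stat_mutex);
    if (++running > peak)
        peak = running;
    pthread_mutex_unlock(&stat_mutex);

    usleep(5000); /* stand-in for: send a request, wait for reply or timeout */

    pthread_mutex_lock(&stat_mutex);
    running--;
    pthread_mutex_unlock(&stat_mutex);
    sem_post(&scan_slots); /* hand the slot back as the last action */
    return NULL;
}

/* Returns the peak number of concurrently working threads, or -1 on error. */
static int scan_with_semaphore(size_t hosts, unsigned int max_threads)
{
    pthread_t *threads;
    size_t started = 0;

    if (max_threads == 0)
        return -1;
    threads = calloc(hosts ? hosts : 1, sizeof(*threads));
    if (threads == NULL)
        return -1;
    if (sem_init(&scan_slots, 0, max_threads) != 0) {
        free(threads);
        return -1;
    }
    running = peak = 0;
    for (size_t i = 0; i < hosts; i++) {
        /* Blocks while max_threads workers hold slots; retry on signals. */
        while (sem_wait(&scan_slots) == -1 && errno == EINTR)
            ;
        if (pthread_create(&threads[started], NULL, scan_worker, NULL) != 0) {
            sem_post(&scan_slots); /* give back the slot we just took */
            break;
        }
        started++;
    }
    for (size_t i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    sem_destroy(&scan_slots);
    free(threads);
    return peak;
}
```

The semaphore makes the check-and-reserve step atomic, which removes the accounting race of a separately checked counter. Note that it bounds the number of workers actively scanning; finished threads that have not been joined yet still hold some resources until the final join loop.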

jimklimov requested review from aquette, clepple and zykh November 3, 2021 21:15

aquette reviewed Nov 4, 2021


Comment thread tools/nut-scanner/scan_snmp.c Outdated

aquette approved these changes Nov 4, 2021


jimklimov force-pushed the fix-nut-scanner-threadcount branch 4 times, most recently from 860e455 to ee403d5 on November 4, 2021 17:18

jimklimov force-pushed the fix-nut-scanner-threadcount branch 2 times, most recently from b1ee48b to a5f4cf4 on November 4, 2021 20:16

jimklimov and others added 16 commits November 4, 2021 23:52

c81db41 tools/nut-scanner/scan_snmp.c: comment a FIXME for limiting the pthread count
bbc869c tools/nut-scanner/nut-scanner.c: add support for -j N (--jobs=N) option to limit simultaneous scanning threads
aa2f472 tools/nut-scanner/scan_snmp.c: handle max_threads, curr_threads
c93da96 tools/nut-scanner/nut-scanner.c: add a threadcount_mutex to use in all children
a7732eb tools/nut-scanner/scan_snmp.c: limit the thread count for parallelized scanning
10d6780 configure.ac, tools/nut-scanner/scan_snmp.c, tools/nut-scanner/nut-scanner.c: not all glibc versions HAVE_PTHREAD_TRYJOIN
c2a4660 tools/nut-scanner/scan_snmp.c: move locking of total thread count change when we add one
e851221 tools/nut-scanner/scan_snmp.c: update log-tracing of pthread herding
59a41c0 tools/nut-scanner/scan_snmp.c: update comments and messages
ace55a1 tools/nut-scanner/scan_snmp.c: hide debug logging for thread herding
84bb8db tools/nut-scanner/scan_snmp.c: only sleep after attempting pthread_tryjoin_np() if nothing got cleaned away
4a4e384 tools/nut-scanner/nut-scanner.c: bump default max_threads to not lag with /24 subnets
1896712 tools/nut-scanner/nut-scan.h + scan_snmp.c: (C) header and some cosmetic fixes
0b31bf9 tools/nut-scanner/scan_snmp.c: only loop to free the thread_array if any were allocated
7dbbe4d tools/nut-scanner/scan_snmp.c: comment "#endif // HAVE_PTHREAD" to navigate in code better
074e656 tools/nut-scanner/scan_xml_http.c: limit thread count like in scan_snmp.c

jimklimov and others added 13 commits November 4, 2021 23:52

bb73c88 tools/nut-scanner/scan_xml_http.c: define TRUE/FALSE
143dfdb tools/nut-scanner/scan_nut.c: limit thread count like in scan_snmp.c
574775b tools/nut-scanner/scan_eaton_serial.c: limit thread count like in scan_snmp.c
cd19301 tools/nut-scanner/nut-scanner.c: wrap long line
3de93f2 tools/nut-scanner/nut-scanner.c: whitespace fix
20af066 tools/nut-scanner/nut-scanner.c: constrain default or requested job count with current `ulimit -n` (minus known overhead)
8fd5594 tools/nut-scanner/nut-scanner.c + configure.ac: detect if getrlimit() is usable
7259852 tools/nut-scanner/nut-scanner.c + scan_xml_http.c + scan_snmp.c + scan_nut.c + nut-scan.h: Add a hard limit on netxml scanning thread count
c9ae525 tools/nut-scanner/nut-scanner.c: refine sanity-checks for nofile_limit.rlim_cur value range
b871c48 */Makefile.am: define dependencies on out-of-dir *.la helper libs: use AM_MAKEFLAGS (follows-up to PR networkupstools#1151)
6eedf12 tools/nut-scanner/Makefile.am: fix rule for NUT_SCANNER_DEPS to be sure
c28a011 tools/nut-scanner/nut-scanner.c: rename "--jobs" to "--thread"
    Inspired by 42ITy fork approach to similar issue
05b5b2b tools/nut-scanner/nut-scanner.c: revise parsing of --thread via strtol()
    Inspired by 42ITy fork approach to similar issue

jimklimov force-pushed the fix-nut-scanner-threadcount branch from 475534f to 05b5b2b on November 4, 2021 22:52

jimklimov changed the title ~~Fix nut-scanner thread count~~ Limit nut-scanner thread count Nov 5, 2021

jimklimov merged commit 2502ab8 into networkupstools:master Nov 6, 2021

jimklimov deleted the fix-nut-scanner-threadcount branch November 6, 2021 19:55

This was referenced Nov 6, 2021

Fix nut-scanner debuglevel #1160 (Merged)

Optimize nut-scanner multi-threaded loops #1166 (Open)

jimklimov merged 29 commits into networkupstools:master from jimklimov:fix-nut-scanner-threadcount
