Limit nut-scanner thread count #1158
side note:
Thanks, though trying to confirm that (in a Debian 11 derived environment) by scanning 8 * /24 subnets and a huge limit of jobs, I did not get a crash and got many responses. Need to check with older systems...

It did however not work well on the first try: needed to bump

Notably, with ... so as far as netsnmp constraints are concerned, whether files or sockets, a 1:1 ulimit/host ratio is okay.

NetXML scans however crashed with a jobs allowance over 1021, and failed to create sockets ( ...one "buffer overflow" (comes from

It seems the next improvement here would be to not hardcode the default thread count limit, but to derive it from the detected current file descriptor ulimit (probably not the other way around, i.e. forcing a ulimit bump to match, since this should not generally run as root to be able to bump it). And probably add a (hard? configurable?) limit on the thread count of individual scanners as impacted by their tech :\
860e455 to
ee403d5
After recent commits, I can no longer crash

Double-checked that a similar crash is possible in the 42ITy fork of NUT with its different mechanism of limiting the thread count.
b1ee48b to
a5f4cf4
…on to limit simultaneous scanning threads
…anner.c: not all glibc versions HAVE_PTHREAD_TRYJOIN
…nge when we add one
…yjoin_np() if nothing got cleaned away
…any were allocated
…vigate in code better
…ount with current `ulimit -n` (minus known overhead)
…n_nut.c + nut-scan.h: Add a hard limit on netxml scanning thread count
…t.rlim_cur value range
…e AM_MAKEFLAGS (follows-up to PR networkupstools#1151)
Inspired by 42ITy fork approach to similar issue
Inspired by 42ITy fork approach to similar issue
475534f to
05b5b2b
TODO later: replicate the 42ITy approach with semaphores (where available) instead of simple counters (possibly not too thread-safe)?
If extremely many threads are spawned, e.g. when scanning a very large IP address range, the `nut-scanner` program can crash due to resource exhaustion or by hitting constraints in the pthreads implementation (various effects were seen on different generations of OSes) or in third-party libraries used by particular scanners.

This PR adds a limit (CLI-configurable as `--thread N`) on the number of threads spawned by the different parallelized scanners (snmp, nut, netxml, serial). If the limit gets hit, it would `pthread_tryjoin_np()` to wait for some of the threads to complete their work, and only then proceed to spawn another.

Note that while there is locking to increase or decrease the overall thread counter, the checks are not mutex'ed, and I anticipate that small accounting errors are possible (e.g. launching up to +1 "extra" thread for each scan type).
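
For illustration only, here is a minimal sketch of the "spawn until the limit, then wait for a worker before spawning another" idea. It is not the code from this PR: `scan_one()` and `scan_with_limit()` are hypothetical names, and it swaps in a blocking, portable `pthread_join()` on the oldest worker instead of the non-portable `pthread_tryjoin_np()`. Because a single spawning thread does all the accounting, this sketch does not have the unlocked-check race described above.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-target job: a real scanner would probe one host here. */
static void *scan_one(void *arg)
{
    int *done = arg;
    *done = 1;
    return NULL;
}

/* Process `targets` items with at most `max_threads` workers alive at once.
 * Returns the number of items processed, or -1 on allocation failure. */
int scan_with_limit(size_t targets, size_t max_threads)
{
    assert(max_threads > 0);

    pthread_t *pool = calloc(max_threads, sizeof(*pool));
    int *done = calloc(targets > 0 ? targets : 1, sizeof(*done));
    if (pool == NULL || done == NULL) {
        free(pool);
        free(done);
        return -1;
    }

    size_t oldest = 0; /* ring-buffer index of the oldest live worker */
    size_t live = 0;   /* number of workers not yet joined */

    for (size_t i = 0; i < targets; i++) {
        if (live == max_threads) {
            /* Limit hit: wait for the oldest worker to finish first. */
            pthread_join(pool[oldest], NULL);
            oldest = (oldest + 1) % max_threads;
            live--;
        }
        size_t slot = (oldest + live) % max_threads;
        if (pthread_create(&pool[slot], NULL, scan_one, &done[i]) != 0)
            break; /* could not spawn: stop and reap what is running */
        live++;
    }

    /* Reap the workers that are still outstanding. */
    while (live > 0) {
        pthread_join(pool[oldest], NULL);
        oldest = (oldest + 1) % max_threads;
        live--;
    }

    int processed = 0;
    for (size_t i = 0; i < targets; i++)
        processed += done[i];

    free(pool);
    free(done);
    return processed;
}
```

A call such as `scan_with_limit(256, 64)` should return 256 provided every `pthread_create()` call succeeds. Joining the oldest worker is simpler than polling all workers, at the cost of sometimes waiting on a slow thread while faster ones have already finished.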
Given that scanning usually means sending a request and waiting if any reply comes within a timeout, the processing load of each item scan is negligible while considerable wall-clock time is spent. Lack of parallelism does add up: in tests, an SNMP scan of a /24 subnet took ~5 seconds without hitting a limit, 20 seconds with a limit of 64 threads, and ~1200 seconds with a limit of 1 thread.
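
As a rough back-of-the-envelope check of those numbers (a sketch, not a measurement): assume each probe mostly waits out a timeout of about 5 seconds (an assumption inferred from the unlimited-run figure), and that hosts are handled in batches of at most `limit` at a time. The helper name `estimate_scan_seconds()` is hypothetical.

```c
#include <assert.h>

/* Estimated wall-clock seconds when `hosts` probes run in batches of at
 * most `limit` threads and each batch waits roughly one timeout. */
unsigned long estimate_scan_seconds(unsigned long hosts,
                                    unsigned long limit,
                                    unsigned long timeout_s)
{
    assert(limit > 0);
    unsigned long batches = (hosts + limit - 1) / limit; /* ceiling division */
    return batches * timeout_s;
}
```

For the 256 addresses of a /24 this model gives 5 seconds with a limit of 1024, 20 seconds with a limit of 64, and 1280 seconds with a limit of 1, which is in the same ballpark as the observed timings.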
The default limit in this PR is arbitrarily set to 1024 threads overall, to accommodate fast scans of a typical /24 subnet with several protocols at once.
UPDATE1: The default can get reduced if the allowed file descriptor `ulimit` is smaller.

UPDATE2: Also, individual protocol scanners can be subjected to their own limits; whichever is smaller (and gets hit first) wins.
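
A minimal sketch of how the two updates could fit together, assuming POSIX `getrlimit()` is available. The function names and the `RESERVED_FDS` overhead value are hypothetical and are not taken from the PR; only the 1024 default comes from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

#define DEFAULT_THREAD_LIMIT 1024 /* overall default named in this PR */
#define RESERVED_FDS 16           /* hypothetical overhead: stdio, libraries, ... */

/* Reduce `wanted` so it fits under a soft file-descriptor limit,
 * keeping `reserved` descriptors aside; never returns less than 1. */
size_t clamp_thread_limit(size_t wanted, rlim_t fd_soft_limit, size_t reserved)
{
    assert(wanted > 0);
    if (fd_soft_limit == RLIM_INFINITY)
        return wanted;
    if (fd_soft_limit <= reserved)
        return 1;
    rlim_t usable = fd_soft_limit - reserved;
    return usable < (rlim_t)wanted ? (size_t)usable : wanted;
}

/* Whichever of the overall and per-scanner limits is smaller wins. */
size_t effective_limit(size_t overall, size_t per_scanner)
{
    return overall < per_scanner ? overall : per_scanner;
}

/* Default overall limit, reduced if the current `ulimit -n` is smaller. */
size_t default_thread_limit(void)
{
    struct rlimit nofile;
    if (getrlimit(RLIMIT_NOFILE, &nofile) != 0)
        return DEFAULT_THREAD_LIMIT; /* cannot query: keep the default */
    return clamp_thread_limit(DEFAULT_THREAD_LIMIT, nofile.rlim_cur, RESERVED_FDS);
}
```

For example, with a soft limit of 256 descriptors and 16 reserved, `clamp_thread_limit(1024, 256, 16)` yields 240, and a scanner with its own hypothetical cap of 100 would then run with `effective_limit(240, 100)`, i.e. 100 threads.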
PS: This is a separate issue from broken netmask processing, addressed in PR #1157