Skip to content

Support optional NUT_NOCONF_ALLOWED=true in upsd (and enable for 42ity by default)#102

Merged
jimklimov merged 9 commits into42ity:FTYfrom
jimklimov:FTY-remerge-20200314
Apr 8, 2020
Merged

Support optional NUT_NOCONF_ALLOWED=true in upsd (and enable for 42ity by default)#102
jimklimov merged 9 commits into42ity:FTYfrom
jimklimov:FTY-remerge-20200314

Conversation

@jimklimov
Copy link
Copy Markdown
Member

Adaptation of networkupstools#766

Equivalent code will be tested in featureimage/systemd-deps-agents to check for proper behavior

Copy link
Copy Markdown

@boricj boricj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea, but shouldn't the configuration be handled inside server/conf.c like any other global setting? Also I'll bike-shed the name: there is (possibly) a configuration but there are no devices, so the case-insensitive name should probably be startWithoutDevices or something like that.

Comment thread server/upsd.c
Comment thread server/upsd.c
@jimklimov
Copy link
Copy Markdown
Member Author

As for an option in NUT config - it was a big harder for me to implement so I left it as a reasonable future expansion. First gotta check the idea works at all, and no code elsewhere would explode looping over 0 UPSes etc. :)

Also, I think a valid possibility here can be initial absence (or emptiness) of an ups.conf, so where would the option to agree with such situation be?

A likely future would be the config option in the file indeed, and the envvar to override if set. Or a command-line option for the daemon...

@jimklimov
Copy link
Copy Markdown
Member Author

Practical test results surmised in networkupstools#766
in short: seems to work as desired
...(and can be improved in another effort, per JB's comments above)

Copy link
Copy Markdown
Member

@aquette aquette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jimklimov is right here, and @boricj found the light ;)
the "numbered" variant is to be preferred.
As for the implementation:

  • this should go into upsd.conf (not nut.conf), since upsd specific, and should be handled as a global arg / flag (I can help in implementing if needed Jim)
  • the name: good question :D ALLOW_NO_DEVICE would be good, with the inline comment that goes along

that said, good to know that the current PR already provides the expected behavior.

@jimklimov
Copy link
Copy Markdown
Member Author

jimklimov commented Apr 3, 2020

Note: during live-testing, a caveat was found that I am not readily sure how to address: the C++ nutclient library at e.g. https://github.com/42ity/nut/blob/FTY/clients/nutclient.cpp#L702 expects to find some devices in the response or throws an exception. There does not seem to be a problem for C upsclient.c library.

It should be somehow changed now that still an exception is not getting a response from upsd, or not finding the particular device name asked for. But listing all devices and getting a response with none is valid.

  • Maybe there should be a separate response for "we heard you but got nothing"?

This is a separate issue from this PR (per discussion, there may be several ways to reach the goal of running and having no device sections in the server config).

It is not "critical" (as in breaking stuff), but spews red errors in the product journal regularly:

Apr 03 10:44:55 jenkins2-ova6 fty-nut[7623]: 2020-04-03 10:44:55 fty-nut [ERROR] src/nut_device.cc:612 Major communication problem with NUT (Invalid device)
Apr 03 10:44:55 jenkins2-ova6 fty-nut[7623]: 2020-04-03 10:44:55 fty-nut [INFO ] src/nut_device.cc:634 Updated 0/0 devices in 0.000 seconds
Apr 03 10:45:25 jenkins2-ova6 fty-nut[7623]: 2020-04-03 10:45:25 fty-nut [ERROR] src/nut_device.cc:612 Major communication problem with NUT (Invalid device)
Apr 03 10:45:25 jenkins2-ova6 fty-nut[7623]: 2020-04-03 10:45:25 fty-nut [INFO ] src/nut_device.cc:634 Updated 0/0 devices in 0.000 seconds

Technically the protocol can be checked to have sent a response:

$ telnet localhost 3493
> LIST UPS
< BEGIN LIST UPS
< END LIST UPS

Though maybe could use some command to send a count of known devices (re-confirm that it is zero), and maybe do so atomically (respond with list of devices and a count in same operation, to make sure the shipping manifest corresponds to the crate).

@boricj
Copy link
Copy Markdown

boricj commented Apr 3, 2020

Regarding nutclient, I would remove the throw. It's inconsistent with the rest of the API (https://github.com/42ity/nut/blob/FTY/clients/nutclient.cpp#L433-L444 does not throw an exception on an empty driver list among many others) and while there is the possibility that this change will break clients, I doubt that it will be the case in practice.

@jimklimov
Copy link
Copy Markdown
Member Author

Actually, I see Invalid device at least 17 times in that file alone, and 23 throw NutException overall, so throwing the exception in confirmed error cases does seem reasonable. The point here was that before this PR, returning an empty LIST UPS was an error (the server should not have started); afterwards it is a valid case.

@jimklimov
Copy link
Copy Markdown
Member Author

Although thanks for discussion. I tried to trace up in source where it would throw in the call stack from the routine you pointed to, and realized that the line I pointed to above does seem like the only error case for an empty list that throws such an exception. I guess there we have an empty devs vector as an argument (perhaps coming from earlier empty response, which now I am not sure is directly treated as exception as I assumed first, so would have to check). By text and name of getDevicesVariableValues() it seems that for an empty list of devices we iterate nothing and so end up with the empty map and throw, if indeed it happens here. So just need to sanity-check the inputs about an empty devs and return the empty map in such case, no errors thrown. WYSIWYG :)

@boricj
Copy link
Copy Markdown

boricj commented Apr 3, 2020

As a side note, while upsd would not start without drivers, it can nevertheless be tricked into this state. It will return an empty LIST UPS once a configuration reload happens with no drivers.

@jimklimov
Copy link
Copy Markdown
Member Author

This PR adds the option to start without drivers from the get-go :)

Although your point is also valid, for talking to other NUT versions whose servers can end up in this state indeed.

@jimklimov
Copy link
Copy Markdown
Member Author

Testing such sanity check in the feature image branch...

@jimklimov
Copy link
Copy Markdown
Member Author

So, this removed the Invalid Device errors happening every half a minute while there are not devices.
During/just-after a discovery run, this message happened once amidst many devices found, so not sure which device the error related to and what it was missing or indeed not seeing :\

Hypothesizing, could it be with a device for which no dstate was yet collected by the server so vars are unknown, for example?

In any case, I guess we can live with that if this is indeed the only occurrence and seen only in such edge conditions.

@jimklimov
Copy link
Copy Markdown
Member Author

The decision was made to merge the feature to our common baseline. Further improvement (refine the config toggle for device-less startup, and possibly further review exceptions of nutclient) can be done in other PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants