Fix GetValue() to correctly extract values from a report when compiled in LP64 mode (usbhid-ups)#1040
Fix GetValue() to correctly extract values from a report when compiled in LP64 mode (usbhid-ups)#1040jimklimov merged 9 commits intonetworkupstools:masterfrom nbriggs:issue_1023_GetValue-LP64
Conversation
…d in LP64 mode Changes the strategy for removing potential garbage bits from values extracted from a report. Use the LogMin and LogMax fields to drive which bits are meaningful but avoid confusion when, for example, given a range of -1..2147483647
jimklimov
left a comment
There was a problem hiding this comment.
That's quite a binary-maths exercise! :)
Thanks for the proposed change and simplified code, noted one concern in the comments but you may convince me that it does not matter :)
| } | ||
| b = hibit(range-1); | ||
| /* calculate where the sign bit will be if needed */ | ||
| signbit = 1L << hibit(magMax > magMin ? magMax : magMin); |
There was a problem hiding this comment.
I guess 1L would undermine my question, but wouldn't there be some better case to use an explicit (u)int32_t instead of long? In other words, wondering if NUT might run on platforms where a long might be less than 4 bytes. Or wastefully more than that, for that matter, and in amounts that that would matter... (love the puns)
Alternately, could we "just assume that" 1ULL should certainly be at least 32 bits wide on any platform and shift that for the (u)int32_t types here?
Or just use a boring but presumably safe and portable uint32_t signbit = 1; (assume this zeroes out the other meaningful 31 bits) and signbit <<= ... ?
There was a problem hiding this comment.
There's a lot packed up in there to respond to but perhaps the place to start is
platforms where a
longmight be less than 4 bytes
the C standard, e.g, ISO C99 draft precludes that possibility in section 5.2.4.2.1 Sizes of integer types, where it says that the LONG_MAX value must be no less than +2147483647.
The two common size configurations we currently see are ILP32 (ints, longs, pointers are all 32-bit) and LP64 (int is 32-bit, but long and pointer are 64-bit). I haven't personally encountered an ILP64 system, but I gather some may exist.
We know that 1L is by definition the same size as long -- I think introducing 1ULL, that is unsigned long long to the mix would add to confusion and possible errors rather than reduce it.
I think that if the rest of the NUT code were converted to use explicit sizes rather than the existing short, int, long, long long, and even some "unsigned short int" and "unsigned long long int", then it would be time to do the same here. I also think that lacking test cases for both conforming and non-conforming (to the HID spec) UPSs there's not much chance of doing that and still producing correct results.
[edit to add missing "not"]
There was a problem hiding this comment.
BTW, I just tried running test cases for GetValue, with signbit and mask being uint32_t, and then with magMin and magMax either being unsigned long or uint32_t. In none of these cases did it produce the correct result when the program was compiled in 64-bit (LP64) mode. I haven't done the analysis to determine where things go wrong in these cases, but I'd argue that it's more important to get the correct answer...
There was a problem hiding this comment.
Thanks for the clarifications! I wonder if a version of that makes sense as a comment in the code, and a link to the issues and PRs you made on this point, so "future-we" (generally in the community) do not try to blindly "optimize" this and break stuff unwittingly.
Regarding the latter comment, I take it as that the current solution works for all architectures you could get hands on, and "similar" implementations as well as original one did not?
Also, do you have some test code to share (maybe into same comment, so future "wannabe optimizers" can test their fixes and/or put that into unit tests eventually)? It might help confirm non-regression on other platforms to be sure :)
|
I'll add an external test driver (in C) for the GetValue code in a separate PR if you don't mind. |
|
Thanks for the note, and thanks for thinking about the other PR for tests! Currently Travis went down, so I'll be finalizing the new CI that was brewing (too slowly) hopefully this weekend to take over the marathon stick. |
|
@jimklimov -- hold off on merging this for a bit. I got a test result I didn't expect which I want to investigate. I'll turn this back into a draft until I'm happy with the results. |
|
Problem resolved. It was my mistake in the data I was passing in the test driver because I had forgotten that the report items are presented to GetValue in little-endian order, so 00.00.08.00 is NOT 2048 (what I expected to see) -- it should be 00.08.00.00. |
|
@jimklimov -- would you happen to have access to any debug output from any UPSs from which I can extract data like
and
and so on -- it doesn't matter whether they're entries that NUT uses. I'm building up a collection of predefined tests, as well as making it easy to check values from the command line as in:
One of the predefined tests shows the problem with the original with the updated code the test passes for both compilation modes. |
|
Got an innotech (nutdrv_qx) UPS on USB of one openindiana (amd64) machine, reporting this sort of data points at startup: Do these help your test cases? :) |
|
Thanks... it's not quite enough info -- it's not reporting the offset, size, and value that it extracted from the bytes that it read. I'll go take a look at the nutdrv_qx driver later today to see if there's a way to get the full info out of it. |
|
Oops... answered without doing the necessary investigation! My change only affects UPS units that use the usbhid-ups driver, or, if I am reading it correctly, the mge-shut driver. From the Makefile, that would seem to include units like these which are subdrivers of the usbhid-ups driver:
The nutdrv_qx doesn't use the hidparser code. |
|
@nbriggs Thanks for digging into this.
#733 has several smaller fields, and one 24-bit field ( |
|
I will say that a place where you probably shouldn't trust the HID data is a CyberPower UPS. They seem to be relying on very strange interpretations of the HID stack description (which affects either Physical or Logical min/max), and I haven't had the time to figure out a decent structure for a HID descriptor patching system that doesn't involve including verbatim copies of all known buggy descriptors in NUT to match against. |
clepple
left a comment
There was a problem hiding this comment.
All good points in the discussion on this PR, but I would want to try either unit tests or build and run on actual hardware to give it an explicit thumbs-up. Not sure when I will have time for testing.
|
I've updated the CI farm to include containers with various OSes and platforms (as QEMU emulated on Linux), but so far they seem too slow and complicated to add into the main build iterations. So I hope to at least run some tests for codebase of #1055 with and without this PR added, to see if it behaves well everywhere, thanks :) I will also see if the different-endianness containers do work, some claimed errors during setup (not all instructions implemented in the vCPUs) |
|
FYI: With some recent development on CI side, I made a branch that should combine QEMU testing and proposed LP64 fix and test from #1040 and #1055 ... "so here goes nothing" : https://ci.networkupstools.org/job/nut/job/nut/job/issue_1023_GetValue_qemu_test/ |
|
QEMU on the already-VM CI farm is sloooow... but as of https://ci.networkupstools.org/blue/organizations/jenkins/nut%2Fnut/detail/issue_1023_GetValue_qemu_test/7/pipeline/ the tests went okay for Big-Endian s390x (64-bit) and mips (32-bit) as much as was possible to emulate. And I also checked that the getvaluetest (without the LP64 fix) did fail for s390x in that first item, same as x86 envs originally: Still waiting for mips result with that branch. |
|
Happy to see it's still making progress. When it's merged I'll be rebuilding NUT for the guy whose (SPARC 64-bit) system originally prompted this exercise. |
|
UPDATE: mips(32-bit) build did not expose the 64-bit bit-maths issues, passed the test for PR 1055 codebase (alone, without PR 1040 added). Now both PRs are merged, adding the test fault on master history and fixing it :) Thanks, for the find, fix, explanations and patience! |
Changes the strategy for removing potential garbage bits from values extracted from a report.
Use the LogMin and LogMax fields to drive which bits are meaningful but avoid confusion when,
for example, given a range of -1..2147483647
Closes #1023