
Fix GetValue() to correctly extract values from a report when compiled in LP64 mode (usbhid-ups) #1040

Merged
jimklimov merged 9 commits into networkupstools:master from
nbriggs:issue_1023_GetValue-LP64
Oct 15, 2021

Conversation

@nbriggs
Contributor

@nbriggs nbriggs commented Jun 3, 2021

Changes the strategy for removing potential garbage bits from values extracted from a report.
Use the LogMin and LogMax fields to determine which bits are meaningful, but avoid confusion when,
for example, given a range of -1..2147483647.

Closes #1023
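
As a rough illustration of the idea in the description (not the code from hidparser.c), the sketch below masks a raw field and sign-extends it using only the logical range. The names `bits_needed` and `apply_logical_range` are hypothetical, and the choice to size a negative minimum as `-(logmin + 1)` is an assumption made for this example. All bit work is done in `unsigned long`, so the same source behaves the same way for ILP32 and LP64 builds.

```c
#include <assert.h>

/* Hypothetical helper: number of bits needed to represent v (0 for v == 0). */
static unsigned bits_needed(unsigned long v)
{
    unsigned n = 0;

    while (v != 0) {
        n++;
        v >>= 1;
    }
    return n;
}

/*
 * Hypothetical sketch, not the hidparser.c implementation: interpret the raw
 * bits of a report field using only its logical minimum and maximum.
 */
static long apply_logical_range(unsigned long raw, long logmin, long logmax)
{
    unsigned long mag_min, mag_max, signbit, mask;

    assert(logmin <= logmax);

    if (logmin >= 0) {
        /* Unsigned field: keep just the bits that logmax can occupy. */
        mask = (1UL << bits_needed((unsigned long)logmax)) - 1;
        return (long)(raw & mask);
    }

    /* Signed field: size the magnitude, then place the sign bit above it. */
    mag_min = (unsigned long)(-(logmin + 1));      /* -1 -> 0, -128 -> 127 */
    mag_max = logmax > 0 ? (unsigned long)logmax : 0;
    signbit = 1UL << bits_needed(mag_max > mag_min ? mag_max : mag_min);
    mask = signbit | (signbit - 1);

    raw &= mask;
    if (raw & signbit)
        raw |= ~mask;   /* sign-extend to the full width of long */

    /* Converting an all-ones pattern to long yields -1 on two's-complement
     * platforms (implementation-defined in ISO C, but universal in practice). */
    return (long)raw;
}
```

With a raw value of 0xffffffff and the range -1..2147483647, this sketch returns -1 whether `long` is 32 or 64 bits wide, which is the case the description singles out.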

Member

@jimklimov jimklimov left a comment

That's quite a binary-maths exercise! :)

Thanks for the proposed change and simplified code. I noted one concern in the comments, but you may convince me that it does not matter :)

Comment thread drivers/hidparser.c
}
b = hibit(range-1);
/* calculate where the sign bit will be if needed */
signbit = 1L << hibit(magMax > magMin ? magMax : magMin);
Member

I guess 1L would undermine my question, but wouldn't there be a better case for using an explicit (u)int32_t instead of long? In other words, I am wondering if NUT might run on platforms where a long might be less than 4 bytes. Or wastefully more than that, for that matter, and in amounts that would matter... (love the puns)

Alternately, could we "just assume that" 1ULL should certainly be at least 32 bits wide on any platform and shift that for the (u)int32_t types here?

Or just use a boring but presumably safe and portable uint32_t signbit = 1; (assume this zeroes out the other meaningful 31 bits) and signbit <<= ... ?

Contributor Author

@nbriggs nbriggs Jun 12, 2021

There's a lot packed up in there to respond to but perhaps the place to start is

platforms where a long might be less than 4 bytes

The C standard (e.g., the ISO C99 draft) precludes that possibility: section 5.2.4.2.1, "Sizes of integer types", says that the LONG_MAX value must be no less than +2147483647.

The two common size configurations we currently see are ILP32 (ints, longs, pointers are all 32-bit) and LP64 (int is 32-bit, but long and pointer are 64-bit). I haven't personally encountered an ILP64 system, but I gather some may exist.

We know that 1L is by definition the same size as long. I think introducing 1ULL (that is, unsigned long long) into the mix would add to confusion and possible errors rather than reduce them.

I think that if the rest of the NUT code were converted to use explicit sizes rather than the existing short, int, long, long long, and even some "unsigned short int" and "unsigned long long int", then it would be time to do the same here. I also think that lacking test cases for both conforming and non-conforming (to the HID spec) UPSs there's not much chance of doing that and still producing correct results.

[edit to add missing "not"]
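
To make the size discussion above concrete, here is a small sketch that names the data model from the sizes of `int`, `long`, and a pointer, and reports the width of the constant `1L`. The helper names are hypothetical, and the classification assumes 8-bit bytes and covers only the three models mentioned in this comment.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* ISO C guarantees this; the check documents the point made above. */
#if LONG_MAX < 2147483647L
#error "long is narrower than ISO C allows"
#endif

/* Hypothetical helper: name the data model from three type sizes in bytes. */
static const char *data_model(size_t int_size, size_t long_size, size_t ptr_size)
{
    if (int_size == 4 && long_size == 4 && ptr_size == 4)
        return "ILP32";
    if (int_size == 4 && long_size == 8 && ptr_size == 8)
        return "LP64";
    if (int_size == 8 && long_size == 8 && ptr_size == 8)
        return "ILP64";
    return "other";
}

/* Name of the model this translation unit was compiled for. */
static const char *this_build_model(void)
{
    return data_model(sizeof(int), sizeof(long), sizeof(void *));
}

/* Width in bits of the constant 1L, which always has type long. */
static unsigned one_l_width(void)
{
    assert(sizeof(1L) == sizeof(long));
    return (unsigned)(sizeof(1L) * CHAR_BIT);
}
```

On an ILP32 build `one_l_width()` is 32, and on an LP64 build it is 64, so an expression such as `1L << n` changes width with the compilation mode even though the source text is identical.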

Contributor Author

BTW, I just tried running test cases for GetValue, with signbit and mask being uint32_t, and then with magMin and magMax either being unsigned long or uint32_t. In none of these cases did it produce the correct result when the program was compiled in 64-bit (LP64) mode. I haven't done the analysis to determine where things go wrong in these cases, but I'd argue that it's more important to get the correct answer...

Member

Thanks for the clarifications! I wonder if a version of that makes sense as a comment in the code, and a link to the issues and PRs you made on this point, so "future-we" (generally in the community) do not try to blindly "optimize" this and break stuff unwittingly.

Regarding the latter comment, I take it that the current solution works on all architectures you could get your hands on, while "similar" implementations, as well as the original one, did not?

Also, do you have some test code to share (maybe into same comment, so future "wannabe optimizers" can test their fixes and/or put that into unit tests eventually)? It might help confirm non-regression on other platforms to be sure :)

@nbriggs
Contributor Author

nbriggs commented Jun 16, 2021

I'll add an external test driver (in C) for the GetValue code in a separate PR if you don't mind.

@jimklimov
Member

Thanks for the note, and thanks for thinking about the other PR for tests!

Currently Travis went down, so I'll be finalizing the new CI that was brewing (too slowly) hopefully this weekend to take over the marathon stick.

@nbriggs
Contributor Author

nbriggs commented Jun 18, 2021

@jimklimov -- hold off on merging this for a bit. I got a test result I didn't expect which I want to investigate.

I'll turn this back into a draft until I'm happy with the results.

@nbriggs nbriggs marked this pull request as draft June 18, 2021 23:50
@nbriggs nbriggs marked this pull request as ready for review June 19, 2021 00:07
@nbriggs
Contributor Author

nbriggs commented Jun 19, 2021

Problem resolved. It was my mistake in the data I was passing in the test driver because I had forgotten that the report items are presented to GetValue in little-endian order, so 00.00.08.00 is NOT 2048 (what I expected to see) -- it should be 00.08.00.00.
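
The byte-order point above can be checked with a few lines of C. This is only a sketch; `le_bytes_to_value` is a hypothetical helper that combines data bytes least-significant first, which is the order the comment describes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: combine up to four bytes, least significant first. */
static unsigned long le_bytes_to_value(const unsigned char *bytes, size_t count)
{
    unsigned long value = 0;
    size_t i;

    assert(count <= 4);
    for (i = 0; i < count; i++)
        value |= (unsigned long)bytes[i] << (8 * i);
    return value;
}
```

Given the bytes 00 08 00 00 this returns 2048 (0x00000800), while 00 00 08 00 gives 524288 (0x00080000), which explains the unexpected test result.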

@nbriggs
Contributor Author

nbriggs commented Jun 20, 2021

@jimklimov -- would you happen to have access to any debug output from any UPSs from which I can extract data like

50.628467 [D3] Report[buf]: (5 bytes) => 16 0c 00 00 00
50.628522 [D2] Path: UPS.PowerSummary.PresentStatus.Charging, Type: Input, ReportID: 0x16, Offset: 0, Size: 1, Value: 0
50.628566 [D3] NUT doesn't use this HID object
50.628626 [D3] Report[buf]: (5 bytes) => 16 0c 00 00 00
50.628678 [D2] Path: UPS.PowerSummary.PresentStatus.Discharging, Type: Input, ReportID: 0x16, Offset: 1, Size: 1, Value: 0

and

58.659692 [D3] Report[buf]: (4 bytes) => 0c 64 11 0d
58.659745 [D2] Path: UPS.PowerSummary.RemainingCapacity, Type: Input, ReportID: 0x0c, Offset: 0, Size: 8, Value: 100
58.659785 [D3] NUT doesn't use this HID object
58.659835 [D3] Report[buf]: (4 bytes) => 0c 64 11 0d
58.659885 [D2] Path: UPS.PowerSummary.RunTimeToEmpty, Type: Input, ReportID: 0x0c, Offset: 8, Size: 16, Value: 3345

and so on -- it doesn't matter whether they're entries that NUT uses. I'm building up a collection of predefined tests, as well as making it easy to check values from the command line as in:

./getvaluetest "0c 64 11 0d" 8 16 0 65535 3345
Test #0 buf "0c 64 11 0d" offset 8 size 16 logmin 0 (0x0) logmax 65535 (0xffff) value 3345 PASS

One of the predefined tests shows the problem with the original GetValue() code between 32- and 64-bit compiles:

% file getvaluetest32
getvaluetest32: ELF 32-bit MSB executable SPARC32PLUS Version 1, V8+ Required, UltraSPARC3 Extensions Required, dynamically linked, not stripped
% ./getvaluetest32
Test #1 buf "00 ff ff ff ff" offset 0 size 32 logmin -1 (0xffffffff) logmax 2147483647 (0x7fffffff) value -1 PASS
[...]
% file getvaluetest 
getvaluetest:     ELF 64-bit MSB executable SPARCV9 Version 1, dynamically linked, not stripped
% ./getvaluetest
Test #1 buf "00 ff ff ff ff" offset 0 size 32 logmin -1 (0xffffffffffffffff) logmax 2147483647 (0x7fffffff) value 0 FAIL expected -1
[...]

with the updated code the test passes for both compilation modes.
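
For readers following the offset and size arguments in the examples above, here is a minimal sketch of pulling a field out of a report buffer. It is not the NUT implementation: `extract_field` is a hypothetical name, and the layout it assumes (first byte is the report ID, bit 0 is the least significant bit of the next byte) is inferred from the debug lines quoted in this comment.

```c
#include <assert.h>

/*
 * Hypothetical sketch: read `size` bits starting `offset` bits into the
 * report data. buf[0] is assumed to hold the report ID, so the data starts at
 * buf[1]. The caller must make sure the buffer covers offset + size bits.
 */
static unsigned long extract_field(const unsigned char *buf,
                                   unsigned offset, unsigned size)
{
    unsigned long value = 0;
    unsigned i;

    assert(size >= 1 && size <= 32);
    for (i = 0; i < size; i++) {
        unsigned bit = offset + i;
        /* Pick one bit from the data and place it at position i. */
        unsigned long b = (buf[1 + bit / 8] >> (bit % 8)) & 1u;

        value |= b << i;
    }
    return value;
}
```

With the buffer 0c 64 11 0d, offset 8 and size 16 select the bytes 11 0d, which read as 0x0d11 = 3345, and offset 0 with size 8 gives 100, matching the RunTimeToEmpty and RemainingCapacity lines above. The result is the raw, unsigned bit pattern; applying LogMin and LogMax is a separate step.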

@jimklimov
Member

Got an innotech (nutdrv_qx) UPS on USB of one openindiana (amd64) machine, reporting this sort of data points at startup:

   6.454593     [D3] send: Q1
   6.514069     [D5] read [  0]: (8 bytes) => 28 32 32 38 2e 34 20 32
   6.546635     [D5] read [  8]: (8 bytes) => 32 38 2e 34 20 32 32 38
   6.577904     [D5] read [ 16]: (8 bytes) => 2e 34 20 30 31 33 20 35
   6.611767     [D5] read [ 24]: (8 bytes) => 30 2e 33 20 31 33 2e 36
   6.674604     [D5] read [ 32]: (8 bytes) => 20 32 35 2e 30 20 30 30
   6.715857     [D5] read [ 40]: (8 bytes) => 30 30 31 30 30 31 0d 00
   6.715933     [D3] read: (228.4 228.4 228.4 013 50.3 13.6 25.0 00001001
   6.715985     [D5] send_to_all: SETINFO input.voltage "228.4"
  45.050434     [D1] upsdrv_updateinfo...
  45.050519     [D1] Quick update...
  45.052299     [D3] send: Q1
  45.104409     [D5] read [  0]: (8 bytes) => 28 32 32 39 2e 34 20 32
  45.168411     [D5] read [  8]: (8 bytes) => 32 39 2e 34 20 32 32 39
  45.200408     [D5] read [ 16]: (8 bytes) => 2e 34 20 30 31 33 20 35
  45.232386     [D5] read [ 24]: (8 bytes) => 30 2e 33 20 31 33 2e 36
  45.264416     [D5] read [ 32]: (8 bytes) => 20 32 35 2e 30 20 30 30
  45.296381     [D5] read [ 40]: (8 bytes) => 30 30 31 30 30 31 0d 00
  45.296442     [D3] read: (229.4 229.4 229.4 013 50.3 13.6 25.0 00001001
  45.296482     [D5] update_status: OL
  45.296522     [D5] update_status: !LB
  45.296563     [D5] update_status: !CAL
  45.296599     [D5] update_status: !FSD
 556.327876     [D1] Quick update...
 556.330645     [D3] send: Q1
 556.408327     [D5] read [  0]: (8 bytes) => 28 32 32 38 2e 30 20 32
 556.427190     [D5] read [  8]: (8 bytes) => 32 38 2e 30 20 32 32 38
 556.449794     [D5] read [ 16]: (8 bytes) => 2e 30 20 30 31 30 20 35
 556.537239     [D5] read [ 24]: (8 bytes) => 30 2e 33 20 31 33 2e 36
 556.545810     [D5] read [ 32]: (8 bytes) => 20 32 35 2e 30 20 30 30
 556.597602     [D5] read [ 40]: (8 bytes) => 30 30 31 30 30 31 0d 00
 556.597668     [D3] read: (228.0 228.0 228.0 010 50.3 13.6 25.0 00001001
 556.597707     [D5] update_status: OL
 556.597743     [D5] update_status: !LB
 556.597780     [D5] update_status: !CAL
 556.597814     [D5] update_status: !FSD
 589.313029     [D2] send_to_one: sending PONG
 589.313060     [D5] send_to_one: PONG
 589.313100     [D1] upsdrv_updateinfo...
 589.313136     [D1] Full update...
 589.315422     [D3] send: Q1
 589.430053     [D5] read [  0]: (8 bytes) => 28 32 32 39 2e 34 20 32
 589.469210     [D5] read [  8]: (8 bytes) => 32 39 2e 34 20 32 32 39
 589.472486     [D5] read [ 16]: (8 bytes) => 2e 34 20 30 31 30 20 35
 589.557763     [D5] read [ 24]: (8 bytes) => 30 2e 33 20 31 33 2e 36
 589.638961     [D5] read [ 32]: (8 bytes) => 20 32 35 2e 30 20 30 30
 589.696486     [D5] read [ 40]: (8 bytes) => 30 30 31 30 30 31 0d 00
 589.696551     [D3] read: (229.4 229.4 229.4 010 50.3 13.6 25.0 00001001
 589.696601     [D5] send_to_all: SETINFO input.voltage "229.4"
 589.696655     [D5] send_to_all: SETINFO input.voltage.fault "229.4"
 589.696698     [D5] send_to_all: SETINFO output.voltage "229.4"
 589.696738     [D5] send_to_all: SETINFO ups.load "10"
 589.696790     [D5] update_status: OL
 589.696824     [D5] update_status: !LB
 589.700088     [D5] update_status: !CAL
 589.700136     [D5] update_status: !FSD

Do these help your test cases? :)

@nbriggs
Contributor Author

nbriggs commented Jun 21, 2021

Thanks... it's not quite enough info -- it's not reporting the offset, size, and value that it extracted from the bytes that it read. I'll go take a look at the nutdrv_qx driver later today to see if there's a way to get the full info out of it.

@nbriggs
Contributor Author

nbriggs commented Jun 21, 2021

Oops... answered without doing the necessary investigation! My change only affects UPS units that use the usbhid-ups driver, or, if I am reading it correctly, the mge-shut driver. From the Makefile, that would seem to include units like these which are subdrivers of the usbhid-ups driver:

apc-hid.c belkin-hid.c cps-hid.c liebert-hid.c mge-hid.c powercom-hid.c
tripplite-hid.c idowell-hid.c openups-hid.c powervar-hid.c delta_ups-hid.c

The nutdrv_qx doesn't use the hidparser code.

@jimklimov jimklimov added the "ready / code review", "ready / gonna merge", and "USB" labels Jun 30, 2021
@jimklimov jimklimov requested review from aquette and clepple June 30, 2021 16:31
@jimklimov jimklimov changed the title from "Fix GetValue() to correctly extract values from a report when compiled in LP64 mode" to "Fix GetValue() to correctly extract values from a report when compiled in LP64 mode (usbhid-ups)" Jun 30, 2021
@clepple
Member

clepple commented Jul 1, 2021

@nbriggs Thanks for digging into this.

would you happen to have access to any debug output from any UPSs from which I can extract data like ...

#733 has several smaller fields, and one 24-bit field (UPS.PowerSummary.RuntimeToEmpty). We added the [D3]-style debug level prefix to the output fairly recently (compared to when usbhid-ups was first written), so unfortunately it's not a single search string across all history. I got a fair number of hits in Gmail when searching for "Report[get]" (and most of those should be archived online e.g.: https://alioth-lists.debian.net/pipermail/nut-upsuser/2015-August/009790.html )

@clepple
Member

clepple commented Jul 1, 2021

I will say that a place where you probably shouldn't trust the HID data is a CyberPower UPS. They seem to be relying on very strange interpretations of the HID stack description (which affects either Physical or Logical min/max), and I haven't had the time to figure out a decent structure for a HID descriptor patching system that doesn't involve including verbatim copies of all known buggy descriptors in NUT to match against.

Member

@clepple clepple left a comment

All good points in the discussion on this PR, but I would want to try either unit tests or build and run on actual hardware to give it an explicit thumbs-up. Not sure when I will have time for testing.

@nbriggs
Contributor Author

nbriggs commented Jul 1, 2021

@clepple -- Hi, I've pushed a branch (and created PR #1055) with the test harness that you can use to compare the old and new implementations of GetValue if you so choose. It's independent of this updated GetValue() code.

@jimklimov
Member

I've updated the CI farm to include containers with various OSes and platforms (as QEMU emulated on Linux), but so far they seem too slow and complicated to add into the main build iterations. So I hope to at least run some tests for codebase of #1055 with and without this PR added, to see if it behaves well everywhere, thanks :)

I will also see if the different-endianness containers do work, some claimed errors during setup (not all instructions implemented in the vCPUs)

@jimklimov
Member

FYI: With some recent development on CI side, I made a branch that should combine QEMU testing and proposed LP64 fix and test from #1040 and #1055 ... "so here goes nothing" : https://ci.networkupstools.org/job/nut/job/nut/job/issue_1023_GetValue_qemu_test/

@jimklimov
Member

jimklimov commented Oct 15, 2021

QEMU on the already-VM CI farm is sloooow... but as of https://ci.networkupstools.org/blue/organizations/jenkins/nut%2Fnut/detail/issue_1023_GetValue_qemu_test/7/pipeline/ the tests went okay for Big-Endian s390x (64-bit) and mips (32-bit) as much as was possible to emulate.

And I also checked that the getvaluetest (without the LP64 fix) did fail for s390x in that first item, same as x86 envs originally:

jim@jenkins-debian11-s390x:~/nut$ (cd tests && ./getvaluetest)
Test #1 buf "00 ff ff ff ff" offset 0 size 32 logmin -1 (0xffffffffffffffff) logmax 2147483647 (0x7fffffff) value 0 FAIL expected -1
Test #2 buf "00 ff" offset 0 size 8 logmin -1 (0xffffffffffffffff) logmax 127 (0x7f) value -1 PASS
Test #3 buf "00 ff" offset 0 size 8 logmin 0 (0x0) logmax 127 (0x7f) value 127 PASS
Test #4 buf "00 ff" offset 0 size 8 logmin 0 (0x0) logmax 255 (0xff) value 255 PASS
Test #5 buf "33 00 0a 08 80" offset 0 size 32 logmin 0 (0x0) logmax 65535 (0xffff) value 2560 PASS
Test #6 buf "00 00 08 00 00" offset 0 size 32 logmin 0 (0x0) logmax 65535 (0xffff) value 2048 PASS
Test #7 buf "06 00 00 08" offset 0 size 8 logmin 0 (0x0) logmax 255 (0xff) value 0 PASS
Test #8 buf "06 00 00 08" offset 8 size 8 logmin 0 (0x0) logmax 255 (0xff) value 0 PASS
Test #9 buf "06 00 00 08" offset 16 size 8 logmin 0 (0x0) logmax 255 (0xff) value 8 PASS
Test #10 buf "16 0c 00 00 00" offset 0 size 1 logmin 0 (0x0) logmax 1 (0x1) value 0 PASS
Test #11 buf "16 0c 00 00 00" offset 1 size 1 logmin 0 (0x0) logmax 1 (0x1) value 0 PASS
Test #12 buf "16 0c 00 00 00" offset 2 size 1 logmin 0 (0x0) logmax 1 (0x1) value 1 PASS
Test #13 buf "16 0c 00 00 00" offset 3 size 1 logmin 0 (0x0) logmax 1 (0x1) value 1 PASS
Test #14 buf "16 0c 00 00 00" offset 4 size 1 logmin 0 (0x0) logmax 1 (0x1) value 0 PASS
Test #15 buf "16 0c 00 00 00" offset 5 size 1 logmin 0 (0x0) logmax 1 (0x1) value 0 PASS
Test #16 buf "16 0c 00 00 00" offset 6 size 1 logmin 0 (0x0) logmax 1 (0x1) value 0 PASS
Test #17 buf "16 0c 00 00 00" offset 7 size 1 logmin 0 (0x0) logmax 1 (0x1) value 0 PASS
Test #18 buf "16 0c 00 00 00" offset 8 size 1 logmin 0 (0x0) logmax 1 (0x1) value 0 PASS
Test #19 buf "16 0c 00 00 00" offset 9 size 1 logmin 0 (0x0) logmax 1 (0x1) value 0 PASS
Test #20 buf "16 0c 00 00 00" offset 10 size 1 logmin 0 (0x0) logmax 1 (0x1) value 0 PASS

Still waiting for mips result with that branch.

@nbriggs
Contributor Author

nbriggs commented Oct 15, 2021

Happy to see it's still making progress. When it's merged I'll be rebuilding NUT for the guy whose (SPARC 64-bit) system originally prompted this exercise.

@jimklimov jimklimov merged commit e67d7aa into networkupstools:master Oct 15, 2021
@jimklimov
Member

UPDATE: mips(32-bit) build did not expose the 64-bit bit-maths issues, passed the test for PR 1055 codebase (alone, without PR 1040 added).

Now both PRs are merged, adding the test fault on master history and fixing it :)

Thanks for the find, fix, explanations, and patience!

Labels

ready / code review Author (and CI) consider the PR worthy of human rewievers' time ready / gonna merge The PR is in final cycles leading to merge unless someone logs an objection before we hit the button USB USB-HID encoding/LogMin/LogMax Issues and solutions (PRs) specifically about incorrect values in bitstream

Development

Successfully merging this pull request may close these issues.

hidparser.c: GetValue() can return incorrect result when compiled for 64-bit environment
