Problem with NUTDRV_QX, subdriver armac (add EOL support) by jimklimov · Pull Request #2003 · networkupstools/nut

jimklimov · 2023-07-31T10:45:26Z

Closes: #1978

Thanks @blaa for proposing the fix, and @convicte for testing it.

@blaa

…upstools#1978] Authored by Tomasz bla Fortuna @blaa at networkupstools#1978 (comment) Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>

Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>

blaa · 2023-07-31T11:11:54Z

Hey, waaaait a bit with this. ;-) As I wrote in the issue, this might break compatibility with other Armac/powermanagerii devices though.

The end-of-line test is certainly SAFE, but the -1 added to the copying looks rather suspicious. I want to test it with my UPS (I'm on vacation and can do this in September ;s), or get a traffic dump from @bdkacz, for whom the previous version worked.

So, blindly merging that can break compatibility. ;s

jimklimov · 2023-07-31T11:13:54Z

@blaa: I have a question about the logic in this proposed change:

previously, you received bytes_available bytes via tmpbuf populated by usb_interrupt_read(), and either copied them all to buf or saw a zero-sized reply and aborted.
now you copy one byte less to buf and drop out if the remaining byte is 0x0d; however, you do not copy it into buf if it is something else. And then I think you loop to usb_interrupt_read() again, and so lose the old tmpbuf contents.

Am I right with a gut feeling that you would lose one character in the middle of a reply from each loop here under the "right" conditions? Should there be an else to that comparison to 0x0d clause?

jimklimov · 2023-07-31T11:14:58Z

Converted to "draft" as asked by @blaa (to avoid merge before regression testing) ;)

blaa · 2023-07-31T11:15:47Z

+		memcpy(buf + bufpos, tmpbuf + 1, bytes_available - 1);
+		bufpos += bytes_available - 1;
+
+		if (tmpbuf[bytes_available - 1] == 0x0d) {


This should be safe in all cases... (although it uses the bytes_available as well and it's suspicious that it worked without -1 previously).

At least, it is unsigned and non-zero (per previous checks), so array-wise the expression is safe :)

blaa · 2023-07-31T11:17:32Z


-		memcpy(buf + bufpos, tmpbuf + 1, bytes_available);
-		bufpos += bytes_available;
+		memcpy(buf + bufpos, tmpbuf + 1, bytes_available - 1);


This doesn't look like a simple mistake in dropping -1. My UPS rather wouldn't work with it. Maybe it needs some kind of a detection on how this byte is coded. I need traces from both UPSes to make certain it will cover both cases.

Maaaybe the original issue where I wrote this driver has one...

That would be #1238 and #1239 I believe.

Otherwise, initially copying all of bytes_available in the original loop seemed reasonable. Say the usb method returned one new byte, here now you'd copy 1 - 1 == 0 bytes (and check if that one is end of line)...

There seems to be only a trace where the UPS sends 1 byte at a time during each interrupt transfer:

[D3] armac_command: Got a known command 'Q1\r'
2.121903 [D3] read: ret 6 buf c1: 28 30 30 30 30 >(0000<
2.126943 [D3] read: ret 6 buf 81: 32 30 30 30 30 >20000<
2.130978 [D3] read: ret 6 buf c1: 33 30 30 30 30 >30000<
2.133998 [D3] read: ret 6 buf 81: 38 30 30 30 30 >80000<
(...)

Here bytes_available (the low nibble: tmpbuf[0] & 0x0f) seems to equal 1, and each transfer conveys 1 byte. The interrupt transfers 6 bytes at a time, but the rest is filled with literal ASCII '0' (0x30). (It would be cool if they used a true zero, 0x00.)
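The framing described here can be sketched in C. This is only an illustration of the working assumption in this thread (low nibble of the first byte = number of valid payload bytes); `ARMAC_REPORT_SIZE` and `armac_chunk_len()` are hypothetical names, not identifiers from the driver.

```c
#include <stddef.h>

/* Assumed report layout: 1 control byte followed by 5 payload bytes. */
#define ARMAC_REPORT_SIZE 6

/*
 * Return the payload length announced by the low nibble of the control
 * byte. Anything after that many payload bytes is padding (ASCII '0' in
 * the trace above).
 */
static size_t armac_chunk_len(const unsigned char report[ARMAC_REPORT_SIZE])
{
	return (size_t)(report[0] & 0x0f);
}
```

For the first trace line (`c1: 28 30 30 30 30`) this yields 1, so only `(` is real data and the four `0` characters are fill.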

Current trace:

0.083301 [D4] armac command Q1
0.164847 [D4] read: ret 6 buf a6: 28 32 34 31 2e >(241.<
0.184839 [D4] read: ret 6 buf 86: 35 20 30 30 30 >5 000<
0.205851 [D4] read: ret 6 buf a6: 2e 30 20 32 33 >.0 23<
0.226849 [D4] read: ret 6 buf 86: 30 2e 33 20 30 >0.3 0<

The first nibble oscillates between a and 8 - previously c and 8.

But TBH, that seems like an easy fix. I'd assume this 6 means the UPS has 6 bytes available in its buffer. This interrupt transfer carried 5; it can't carry more. It's just FASTER than the previous model.

Instead of -1 one could:

bytes_available = (unsigned char)tmpbuf[0] & 0x0f;
if (bytes_available == 0) {
	/* End of transfer */
	break;
}
if (bytes_available > 5) {
	/* More bytes in UPS buffer, but this transfer only handles up to 5. */
	bytes_available = 5;
}

Or even use the 'ret' from usb interrupt in case it can transfer more later?

if (bytes_available > ret - 1) {
	/* More bytes in UPS buffer, but this transfer only handles up to 5. */
	bytes_available = ret - 1;
}
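To see how capping differs from the `- 1` variant, the two rules can be compared on the control bytes seen in the traces above. This is a rough sketch; both helper names are made up for the comparison and neither exists in the driver.

```c
#include <stddef.h>

/* Bytes copied by the variant under review: always one less than announced. */
static size_t copied_minus_one(unsigned char control)
{
	size_t n = (size_t)(control & 0x0f);

	return n > 0 ? n - 1 : 0;
}

/* Bytes copied when the announced count is only capped at the 5-byte payload. */
static size_t copied_capped(unsigned char control)
{
	size_t n = (size_t)(control & 0x0f);

	return n > 5 ? 5 : n;
}
```

With the older device's `c1`/`81` control bytes (nibble 1), the `- 1` rule copies nothing while the cap copies the single valid byte. With the newer device's `a6`/`86` control bytes (nibble 6), both rules copy 5 bytes.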

My current take is this one:

diff --git a/drivers/nutdrv_qx.c b/drivers/nutdrv_qx.c
index 350c555a0..550eb4071 100644
--- a/drivers/nutdrv_qx.c
+++ b/drivers/nutdrv_qx.c
@@ -1825,6 +1825,9 @@ static int armac_command(const char *cmd, char *buf, size_t buflen)
 		 * (which will happen after successfully writing a command
 		 * to the UPS) */
 		if (ret != 6) {
+			/* NOTE: If end condition is invalid for particular UPS we might make one
+			 * request more and get this error. If bufpos > (say) 10 this could be ignored
+			 * and the reply correctly read. */
 			upsdebugx(1, "interrupt read error: %s (%d)",
 				ret ? nut_usb_strerror(ret) : "timeout",
@@ -1838,15 +1841,32 @@ static int armac_command(const char *cmd, char *buf, size_t buflen)
 			tmpbuf[0], tmpbuf[1], tmpbuf[2], tmpbuf[3], tmpbuf[4], tmpbuf[5],
 			tmpbuf[1], tmpbuf[2], tmpbuf[3], tmpbuf[4], tmpbuf[5]);
 
+		/*
+		 * On most tested devices (including R/2000I/PSW) this was equal to the number of
+		 * bytes returned in the buffer, but on some newer UPS (R/3000I/PF1) it was 1 more
+		 * (1 control + 5 bytes transferred and bytes_available equal to 6 instead of 5).
+		 *
+		 * Current assumption is that this is number of bytes available on the UPS side
+		 * with up to 5 (ret - 1) transferred.
+		 */
 		bytes_available = (unsigned char)tmpbuf[0] & 0x0f;
 		if (bytes_available == 0) {
 			/* End of transfer */
 			break;
 		}
 
+		if (bytes_available > 5) {
+			/* Single interrupt transfer has 1 control + 5 data bytes */
+			bytes_available = 5;
+		}
+
 		memcpy(buf + bufpos, tmpbuf + 1, bytes_available);
 		bufpos += bytes_available;
 
+		if (tmpbuf[bytes_available] == 0x0d) {
+			/* End of line is an end of the message as well */
+			break;
+		}
+
 		if (bytes_available <= 2) {
 			/* Slow down, let the UPS buffer more bytes */
 			usleep(15000);

It compiles, but could use some testing. ;-) Hopefully it will handle both cases without additional -1 magic.
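Pending tests on real hardware, the per-report logic of that patch can be exercised off-device. The sketch below is a standalone approximation, not the driver code: `armac_append_chunk()` and the two macros are hypothetical, and it assumes the 1 control + 5 payload byte layout discussed above.

```c
#include <stddef.h>
#include <string.h>

#define ARMAC_REPORT_SIZE 6                        /* assumed: 1 control + 5 payload */
#define ARMAC_MAX_PAYLOAD (ARMAC_REPORT_SIZE - 1)

/*
 * Append the payload of one report to buf (kept NUL-terminated).
 * Returns 1 when the reply is complete (empty report, or the last copied
 * byte is '\r'), 0 when more reports are expected, -1 if buf is too small.
 */
static int armac_append_chunk(const unsigned char report[ARMAC_REPORT_SIZE],
	char *buf, size_t buflen, size_t *bufpos)
{
	size_t n = (size_t)(report[0] & 0x0f);

	if (n == 0)
		return 1;                       /* end of transfer */
	if (n > ARMAC_MAX_PAYLOAD)
		n = ARMAC_MAX_PAYLOAD;          /* more is buffered than one report holds */
	if (*bufpos + n >= buflen)
		return -1;                      /* keep room for the terminating NUL */

	memcpy(buf + *bufpos, report + 1, n);
	*bufpos += n;
	buf[*bufpos] = '\0';

	/* report[n] is the last payload byte that was just copied */
	return report[n] == 0x0d ? 1 : 0;
}
```

Feeding it the first two reports of the newer trace (`a6: 28 32 34 31 2e`, then `86: 35 20 30 30 30`) accumulates `(241.5 000`, while an older-style report such as `c1: 28 30 30 30 30` contributes only `(`. Like the patch, it only looks at the last copied byte for the end of line.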

Hm, looking at the data dumps in #1978 (comment) :

0.414746 [D4] read: ret 6 buf 86: 2e 30 20 30 30 >.0 00<
0.435756 [D4] read: ret 6 buf a6: 30 30 30 30 30 >00000<
1.939819 [D4] read: ret 6 buf 83: 31 0d 30 30 30 >1\r000<
1.940207 [D1] interrupt read error: Operation timed out (-7)

So for a line where we have lower nibble (first byte & 0x0f) equal to 6 and ret==6, we have 5 protocol content bytes. However in the last practical line, with the 0x0d, the nibble is 3 and there are 2 content bytes (and the UPS stops sending more, hence the timeout).

I believe that what the driver did originally, with your UPS where the nibble was 1, was to copy one byte into the buffer. Where the nibble was 6, it probably copied 6 bytes (starting from tmpbuf+1, so reading past the end of tmpbuf, which is defined with a size of exactly 6... wondering why compilers or run-times did not complain/segfault - not a write, so they don't care?) and probably added garbage into that extra byte of the resulting string on each loop :\

Yeah, I've dropped ret - 1 because 6 is hardcoded above. (The original driver always reads 6 bytes at a time.) But that's equivalent - if ret != 6 we bail out anyway.

In general we can introduce a constant: const int READ_SIZE = 6;
Use it in usb_interrupt_read, in the following if, and later instead of ret - it will be cleaner, and I can prepare a patch like that.
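A minimal sketch of that cleanup, assuming nothing about the real USB layer: `armac_read_fn` is a hypothetical callback standing in for the interrupt read, and `fake_read()` is a test double that replays the first report of the newer trace.

```c
#include <string.h>

#define READ_SIZE 6                     /* 1 control byte + 5 payload bytes */
#define MAX_PAYLOAD (READ_SIZE - 1)

/* Hypothetical transport callback standing in for the USB interrupt read. */
typedef int (*armac_read_fn)(unsigned char *dst, int size);

/* Read one report; returns 0 on success, -1 on short read, timeout or error. */
static int armac_read_report(armac_read_fn rd, unsigned char tmpbuf[READ_SIZE])
{
	return rd(tmpbuf, READ_SIZE) == READ_SIZE ? 0 : -1;
}

/* Test double: always delivers the first report of the newer trace. */
static int fake_read(unsigned char *dst, int size)
{
	static const unsigned char report[READ_SIZE] =
		{0xa6, 0x28, 0x32, 0x34, 0x31, 0x2e};

	if (size < READ_SIZE)
		return -1;
	memcpy(dst, report, READ_SIZE);
	return READ_SIZE;
}
```

The sketch uses a macro rather than `const int` because in C a `const int` object is not an integer constant expression, so it cannot size a fixed-length array such as `tmpbuf`; an `enum` constant would work equally well.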

Architecture-wise, I believe the UPS works like this:
([controller] --uart or rs232-- [usb bridge with buffer]) --usb-- [PC]
And this USB interface is a really poor USB Serial implementation. ;)

So bytes_available is not the length of the whole serial buffer, just the amount currently stored within the USB bridge. MAYBE. As far as we can tell. It's certainly important to differentiate TRUE data from fill ('0', 0x30) when there's less data than the read size. That would not be necessary if the fill was 0x00. I think deducting 5 is worthless, but we could store info like more_available = true if it's larger than READ_SIZE - 1.

Unsure what to do with this information though. If we find \x0d but more_available == true that's probably an error and we could print a message - not much more though.
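One way the `more_available` idea could look, purely as a sketch: `struct armac_chunk`, `armac_parse_control()` and `armac_eol_is_suspicious()` are invented names, and whether a `\r` with pending data is really an error remains the open question raised above.

```c
#include <stddef.h>

#define READ_SIZE 6   /* assumed: 1 control byte + 5 payload bytes */

struct armac_chunk {
	size_t len;          /* payload bytes to copy from this report */
	int more_available;  /* nonzero if the UPS announced more than fits */
};

/* Split the control byte into a copy length and a "more pending" flag. */
static struct armac_chunk armac_parse_control(unsigned char control)
{
	struct armac_chunk c;
	size_t announced = (size_t)(control & 0x0f);

	c.more_available = announced > READ_SIZE - 1;
	c.len = c.more_available ? READ_SIZE - 1 : announced;
	return c;
}

/* Nonzero when an end of line was seen although more data was announced. */
static int armac_eol_is_suspicious(const struct armac_chunk *c, int saw_eol)
{
	return saw_eol && c->more_available;
}
```

A caller could log a debug message when `armac_eol_is_suspicious()` returns nonzero and still accept the reply.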


Watch out for the \r that shifts the 0s in the output of those lines. But yes, the nibble is 3 and there are pretty much 2 bytes sent. I forgot about it today - and it suggests I should scan for \r and not just check the last byte.
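Scanning the valid part of the payload, rather than testing only its last byte, could be done with `memchr()`. The helper below is a hypothetical sketch (the name `armac_scan_eol()` is not from the driver) and assumes 0x0d always terminates a reply.

```c
#include <stddef.h>
#include <string.h>

/*
 * data points at the payload of one report and n is the number of bytes
 * the control nibble announced (already capped to the payload size).
 * Returns how many of those bytes belong to the reply, including the
 * terminating '\r' when present, and sets *done if the '\r' was found.
 */
static size_t armac_scan_eol(const unsigned char *data, size_t n, int *done)
{
	const unsigned char *cr = memchr(data, 0x0d, n);

	*done = (cr != NULL);
	return cr ? (size_t)(cr - data) + 1 : n;
}
```

For the `83: 31 0d 30 30 30` report quoted above, scanning 3 announced bytes finds the `\r` at the second position, so 2 bytes are kept and the trailing `0` is discarded as fill.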


When the nibble was 6, the driver could have been reading 1 byte past the end of tmpbuf. C compilers and run-times are quirky. :S In Rust that would certainly be a runtime panic, maybe a compilation error. tmpbuf is on the stack, so there's a big chance it was just reading 1 more byte from the stack.

Certainly adding garbage - that changed the meaning of this nibble in a way not observed in other UPSes.
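The index arithmetic behind that over-read is small enough to check directly. The helper below is a hypothetical illustration, not driver code: it asks whether copying `n` bytes starting at `offset` stays inside a buffer of `bufsize` bytes.

```c
#include <stddef.h>

/* Nonzero if bytes [offset, offset + n) lie entirely within bufsize bytes. */
static int armac_copy_in_bounds(size_t offset, size_t n, size_t bufsize)
{
	return offset <= bufsize && n <= bufsize - offset;
}
```

With a 6-byte tmpbuf and the copy starting at tmpbuf + 1, at most 5 bytes are in bounds; a nibble of 6 asks for tmpbuf[1] through tmpbuf[6], and tmpbuf[6] does not exist.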

I'll take a look at this final byte too, maybe it needs special handling to find \r correctly.

I think it will be more productive for you to cherry-pick the changes I've proposed so far and start another PR from your fork to override this one, to avoid me posting your patches from discussion into git :) WDYT?

Yeah, certainly. The problem is testing. I can probably get these changes tested on the new UPS. The trouble is checking on mine and the other ones. When my sister is watering the plants I'll ask her to power it on; maaaybe I could do it remotely. (;

jimklimov added 2 commits July 31, 2023 12:42

drivers/nutdrv_qx.c: improve armac subdriver for EOL support [network…

fdca09e

…upstools#1978] Authored by Tomasz bla Fortuna @blaa at networkupstools#1978 (comment) Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>

NEWS: improve armac subdriver for EOL support [networkupstools#1978]

591c454

Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>

jimklimov added the bug, Qx protocol driver, and impacts-release-2.8.0 labels Jul 31, 2023

jimklimov added this to the 2.8.1 milestone Jul 31, 2023

jimklimov mentioned this pull request Jul 31, 2023

Problem with NUTDRV_QX, subdriver armac - unable start driver #1978

Closed

jimklimov marked this pull request as draft July 31, 2023 11:14

blaa reviewed Jul 31, 2023


blaa mentioned this pull request Jul 31, 2023

drivers/nutdrv_qx.c: improve Armac subdriver #2005

Merged

jimklimov closed this Jul 31, 2023

jimklimov mentioned this pull request Oct 22, 2023

vULTECH UPS1050VA-PRO HOMEASSISTANT #1843

Open
