Skip to content

Problem with NUTDRV_QX, subdriver armac (add EOL support)#2003

Closed
jimklimov wants to merge 2 commits intonetworkupstools:masterfrom
jimklimov:issue-1978
Closed

Problem with NUTDRV_QX, subdriver armac (add EOL support)#2003
jimklimov wants to merge 2 commits intonetworkupstools:masterfrom
jimklimov:issue-1978

Conversation

@jimklimov
Copy link
Copy Markdown
Member

Closes: #1978

Thanks @blaa for proposing the fix, and @convicte for testing it.

…upstools#1978]

Authored by Tomasz bla Fortuna @blaa at networkupstools#1978 (comment)

Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
@jimklimov jimklimov added bug Qx protocol driver Driver based on Megatec Q<number> such as new nutdrv_qx, or obsoleted blazer and some others impacts-release-2.8.0 Issues reported against NUT release 2.8.0 (maybe vanilla or with minor packaging tweaks) labels Jul 31, 2023
@jimklimov jimklimov added this to the 2.8.1 milestone Jul 31, 2023
@blaa
Copy link
Copy Markdown

blaa commented Jul 31, 2023

Hey, waaaait a bit with this. ;-) As I've wrote in the issue - this might break compatibility with other Armac/powermanagerii devices though.

The end of line test is certainly SAFE, but the -1 added to the copying look rather suspicious. I want to test it with my UPS (I'm on vacation and can do this in september ;s), or get a traffic dump from the @bdkacz for whom the previous version worked.

So, blindly merging that can break compatibility. ;s

@jimklimov
Copy link
Copy Markdown
Member Author

@blaa: I have a question about the logic in this proposed change:

  • previously, you received the bytes_available amount of bytes via tmpbuf populated by usb_interrupt_read(), and either copied them all to buf or saw a zero-sized reply and aborted.
  • now you copy one byte less to buf and drop out if the remaining byte is 0x0d; however you do not copy it into buf if it is something else. And then I think you loop to usb_interrupt_read() again so lose the old tmpbuf contents.

Am I right with a gut feeling that you would lose one character in the middle of a reply from each loop here under the "right" conditions? Should there be an else to that comparison to 0x0d clause?

@jimklimov jimklimov marked this pull request as draft July 31, 2023 11:14
@jimklimov
Copy link
Copy Markdown
Member Author

Converted to "draft" as asked by @blaa (to avoid merge before regression testing) ;)

Comment thread drivers/nutdrv_qx.c
memcpy(buf + bufpos, tmpbuf + 1, bytes_available - 1);
bufpos += bytes_available - 1;

if (tmpbuf[bytes_available - 1] == 0x0d) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be safe in all cases... (although it uses the bytes_available as well and it's suspicious that it worked without -1 previously).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least, it is unsigned and non-zero (per previous checks), so array-wise the expression is safe :)

Comment thread drivers/nutdrv_qx.c

memcpy(buf + bufpos, tmpbuf + 1, bytes_available);
bufpos += bytes_available;
memcpy(buf + bufpos, tmpbuf + 1, bytes_available - 1);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look like a simple mistake in dropping -1. My UPS rather wouldn't work with it. Maybe it needs some kind of a detection on how this byte is coded. I need traces from both UPSes to make certain it will cover both cases.

Maaaybe the original issue were I did this driver has one...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be #1238 and #1239 I believe.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise, initially copying all of bytes_available in the original loop seemed reasonable. Say the usb method returned one new byte, here now you'd copy 1 - 1 == 0 bytes (and check if that one is end of line)...

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there seem to be only a trace where UPS send 1 byte at a time during each interrupt transfer:

                      [D3] armac_command: Got a known command 'Q1\r'                         
   2.121903     [D3] read: ret 6 buf c1: 28 30 30 30 30  >(0000<                  
   2.126943     [D3] read: ret 6 buf 81: 32 30 30 30 30  >20000<
   2.130978     [D3] read: ret 6 buf c1: 33 30 30 30 30  >30000<
   2.133998     [D3] read: ret 6 buf 81: 38 30 30 30 30  >80000<      
(...)

Here bytes available (low nibble: tmpbuf[0] & 0x0f) seems to be equal to 1 and convey a 1 byte. Interrupt transmits 6 bytes at a time, but the rest is filled with literal ASCII 0. (would be cool if they used ZERO 0x00).

Current trace:

   0.083301     [D4] armac command Q1
   0.164847     [D4] read: ret 6 buf a6: 28 32 34 31 2e  >(241.<
   0.184839     [D4] read: ret 6 buf 86: 35 20 30 30 30  >5 000<
   0.205851     [D4] read: ret 6 buf a6: 2e 30 20 32 33  >.0 23<
   0.226849     [D4] read: ret 6 buf 86: 30 2e 33 20 30  >0.3 0<

The first nibble oscillates between a and 8 - previously c and 8.

But TBH, that seems like an easy fix. I'd assume this 6 means this UPS has available 6 bytes in a buffer. This interrupt transfer transferred 5... it can't transfer more. It's just FASTER than the previous model.

Instead of -1 one could:

bytes_available = (unsigned char)tmpbuf[0] & 0x0f;
if (bytes_available == 0) {
    /* End of transfer */
    break;
}
if (bytes_available > 5) {
    /* More bytes in UPS buffer, but this transfer only handles up to 5. */
    bytes_available = 5;
}

Or even use the 'ret' from usb interrupt in case it can transfer more later?

if (bytes_available > ret - 1) {
    /* More bytes in UPS buffer, but this transfer only handles up to 5. */
    bytes_available = ret - 1;
}

Copy link
Copy Markdown

@blaa blaa Jul 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My current take is this one:

diff --git a/drivers/nutdrv_qx.c b/drivers/nutdrv_qx.c
index 350c555a0..550eb4071 100644
--- a/drivers/nutdrv_qx.c
+++ b/drivers/nutdrv_qx.c
@@ -1825,6 +1825,9 @@ static int        armac_command(const char *cmd, char *buf, size_t buflen)
                 * (which will happen after successfully writing a command
                 * to the UPS) */
                if (ret != 6) {
+                       /* NOTE: If end condition is invalid for particular UPS we might make one
+                        * request more and get this error. If bufpos > (say) 10 this could be ignored
+                        * and the reply correctly read. */
                        upsdebugx(1,
                                "interrupt read error: %s (%d)",
                                ret ? nut_usb_strerror(ret) : "timeout",
@@ -1838,15 +1841,32 @@ static int      armac_command(const char *cmd, char *buf, size_t buflen)
                        tmpbuf[0], tmpbuf[1], tmpbuf[2], tmpbuf[3], tmpbuf[4], tmpbuf[5],
                        tmpbuf[1], tmpbuf[2], tmpbuf[3], tmpbuf[4], tmpbuf[5]);
 
+               /*
+                * On most tested devices (including R/2000I/PSW) this was equal to the number of
+                * bytes returned in the buffer, but on some newer UPS (R/3000I/PF1) it was 1 more
+                * (1 control + 5 bytes transferred and bytes_available equal to 6 instead of 5).
+                *
+                * Current assumption is that this is number of bytes available on the UPS side
+                * with up to 5 (ret - 1) transferred.
+                */
                bytes_available = (unsigned char)tmpbuf[0] & 0x0f;
                if (bytes_available == 0) {
                        /* End of transfer */
                        break;
                }
+               if (bytes_available > 5) {
+                       /* Single interrupt transfer has 1 control + 5 data bytes */
+                       bytes_available = 5;
+               }
 
                memcpy(buf + bufpos, tmpbuf + 1, bytes_available);
                bufpos += bytes_available;
 
+               if (tmpbuf[bytes_available] == 0x0d) {
+                       /* End of line is an end of the message as well */
+                       break;
+               }
+
                if (bytes_available <= 2) {
                        /* Slow down, let the UPS buffer more bytes */
                        usleep(15000);

It compiles, but could use some testing. ;-) Hopefully it will handle both cases without additional -1 magic.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, looking at the data dumps in #1978 (comment) :

   0.414746     [D4] read: ret 6 buf 86: 2e 30 20 30 30  >.0 00<
   0.435756     [D4] read: ret 6 buf a6: 30 30 30 30 30  >00000<
000<.939819     [D4] read: ret 6 buf 83: 31 0d 30 30 30  >1
   1.940207     [D1] interrupt read error: Operation timed out (-7)

So for a line where we have lower nibble (first byte & 0x0f) equal to 6 and ret==6, we have 5 protocol content bytes. However in the last practical line, with the 0x0d, the nibble is 3 and there are 2 content bytes (and the UPS stops sending more, hence the timeout).

I believe, what the driver did originally, with your UPS where its nibble was 1 it copied one byte into the buffer. Where the nibble was 6 it probably copied 6 bytes (starting from tmpbuf+1 so overflowing the tmpbuf defined with size 6 exactly... wondering why compilers or run-times did not complain/segfault - not a write so don't care?) and probably adding garbage into that extra byte of the resulting string for each loop :\

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I've dropped ret - 1 because 6 is hardcoded above. (Original driver does reads always by 6 bytes). But that's equivalent - if ret != 6 we bail out anyway.

In general we can introduce a constant: const int READ_SIZE = 6;
Use it in usb_interrupt_read, in following if and later instead of ret - it will be cleaner and I can prepare a patch like that.

Architecture-wise, I believe the UPS works like this:
([controller] --uart or rs232-- [usb bridge with buffer]) --usb-- [PC]
And this USB interface is a really poor USB Serial implementation. ;)

So the bytes_available is not a length of a whole serial buffer, just the amount that currently is stored within the usb bridge. MAYBE. As far as we can tell. It's important to differentiate a TRUE data from a fill (0, 0x30), when there's less data then the read size certainly. Would not be necessary if the fill was 0x00. I think deducting 5 is worthless, but we could store info like more_available = true if it's larger than READ_SIZE - 1.

Unsure what to do with this information though. If we find \x0d but more_available == true that's probably an error and we could print a message - not much more though.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, looking at the data dumps in #1978 (comment) :

   0.414746     [D4] read: ret 6 buf 86: 2e 30 20 30 30  >.0 00<
   0.435756     [D4] read: ret 6 buf a6: 30 30 30 30 30  >00000<
000<.939819     [D4] read: ret 6 buf 83: 31 0d 30 30 30  >1
   1.940207     [D1] interrupt read error: Operation timed out (-7)

So for a line where we have lower nibble (first byte & 0x0f) equal to 6 and ret==6, we have 5 protocol content bytes. However in the last practical line, with the 0x0d, the nibble is 3 and there are 2 content bytes (and the UPS stops sending more, hence the timeout).

Watchout for the \r that shifts the 0 in the output in those lines. But yes, nibble is 3 and there are pretty much 2 bytes sent. I forgot about it today - and it suggest I should scan for \d and not just check the last byte.

I believe, what the driver did originally, with your UPS where its nibble was 1 it copied one byte into the buffer. Where the nibble was 6 it probably copied 6 bytes (starting from tmpbuf+1 so overflowing the tmpbuf defined with size 6 exactly... wondering why compilers or run-times did not complain/segfault - not a write so don't care?) and probably adding garbage into that extra byte of the resulting string for each loop :\

Driver when the nibble was 6 could be reading 1 byte over tmpbuf. C compilers and run-times are quirky. :S In Rust that would certainly be a runtime Panic, maybe compilation error. tmpbuf is on the stack, so there's a big chance it was reading 1 byte more from the stack just.

Certainly adding a garbage - that changed the meaning of this nibble in a way not observed in other UPSes.

I'll take a look at this final byte too, maybe it needs special handling to find \r correctly.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will be more productive for you to cherry-pick the changes I've proposed so far and start another PR from your fork to override this one, to avoid me posting your patches from discussion into git :) WDYT?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, certainly. The problem is - testing. I can probably get this changes tested on the new UPS. Trouble is, to check on mine/other ones. When my sister is watering the plants I'll ask her to power it on; maaaybe I could do it remotely. (;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug impacts-release-2.8.0 Issues reported against NUT release 2.8.0 (maybe vanilla or with minor packaging tweaks) Qx protocol driver Driver based on Megatec Q<number> such as new nutdrv_qx, or obsoleted blazer and some others

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Problem with NUTDRV_QX, subdriver armac - unable start driver

2 participants