Problem with NUTDRV_QX, subdriver armac (add EOL support)#2003
jimklimov wants to merge 2 commits into networkupstools:master
…upstools#1978] Authored by Tomasz bla Fortuna @blaa at networkupstools#1978 (comment)
Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
Hey, waaaait a bit with this. ;-) As I wrote in the issue, this might break compatibility with other Armac/powermanagerii devices though. The end-of-line test is certainly SAFE, but the -1 added to the copying looks rather suspicious. I want to test it with my UPS (I'm on vacation and can do this in September ;s), or get a traffic dump from @bdkacz, for whom the previous version worked. So blindly merging this could break compatibility. ;s
@blaa: I have a question about the logic in this proposed change:
Is my gut feeling right that you would lose one character in the middle of a reply on each loop iteration here, under the "right" conditions? Should there be an
Converted to "draft" as asked by @blaa (to avoid merging before regression testing) ;)
memcpy(buf + bufpos, tmpbuf + 1, bytes_available - 1);
bufpos += bytes_available - 1;

if (tmpbuf[bytes_available - 1] == 0x0d) {
This should be safe in all cases... (although it uses bytes_available as well, and it's suspicious that it worked without the -1 previously).
At least, it is unsigned and non-zero (per previous checks), so array-wise the expression is safe :)
memcpy(buf + bufpos, tmpbuf + 1, bytes_available);
bufpos += bytes_available;
memcpy(buf + bufpos, tmpbuf + 1, bytes_available - 1);
This doesn't look like a simple mistake of dropping the -1. My UPS probably wouldn't work with it. Maybe it needs some kind of detection of how this byte is encoded. I need traces from both UPSes to make sure it will cover both cases.
Maaaybe the original issue where I wrote this driver has one...
Otherwise, initially copying all of bytes_available in the original loop seemed reasonable. Say the USB method returned one new byte: here you'd now copy 1 - 1 == 0 bytes (and check whether that one is the end of line)...
There seems to be only one trace where the UPS sends 1 byte at a time in each interrupt transfer:
[D3] armac_command: Got a known command 'Q1\r'
2.121903 [D3] read: ret 6 buf c1: 28 30 30 30 30 >(0000<
2.126943 [D3] read: ret 6 buf 81: 32 30 30 30 30 >20000<
2.130978 [D3] read: ret 6 buf c1: 33 30 30 30 30 >30000<
2.133998 [D3] read: ret 6 buf 81: 38 30 30 30 30 >80000<
(...)
Here bytes_available (the low nibble, tmpbuf[0] & 0x0f) seems to equal 1, and each report conveys 1 byte. The interrupt transfer always carries 6 bytes, but the rest is filled with literal ASCII '0' (0x30). (It would be cool if they used a real zero, 0x00.)
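A minimal sketch of how the header byte of each 6-byte report can be split, based only on the traces above: the low nibble is treated as the data-byte count, and the high nibble as a toggling flag whose meaning is unknown. The struct and function names are hypothetical, not part of nutdrv_qx.c.

```c
/* Hypothetical helper, not from nutdrv_qx.c. Assumption (from traces):
 * low nibble of byte 0 = data-byte count, high nibble = toggling flag
 * (0xc/0x8 on the older model, 0xa/0x8 on the newer one). */
struct armac_report_hdr {
	unsigned int flags; /* high nibble, e.g. 0xc or 0x8 */
	unsigned int count; /* low nibble, e.g. 1 */
};

static struct armac_report_hdr armac_decode_hdr(const unsigned char report[6])
{
	struct armac_report_hdr hdr;

	hdr.flags = (report[0] >> 4) & 0x0f;
	hdr.count = report[0] & 0x0f;
	return hdr;
}
```

For the first report above (c1 28 30 30 30 30) this yields flags 0xc and count 1, so only the '(' is payload and the four 0x30 bytes are fill.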
Current trace:
0.083301 [D4] armac command Q1
0.164847 [D4] read: ret 6 buf a6: 28 32 34 31 2e >(241.<
0.184839 [D4] read: ret 6 buf 86: 35 20 30 30 30 >5 000<
0.205851 [D4] read: ret 6 buf a6: 2e 30 20 32 33 >.0 23<
0.226849 [D4] read: ret 6 buf 86: 30 2e 33 20 30 >0.3 0<
The high nibble alternates between 0xa and 0x8 (previously 0xc and 0x8).
But TBH, that seems like an easy fix. I'd assume this 6 means the UPS has 6 bytes available in its buffer. This interrupt transfer carried 5... it can't carry more. It's just FASTER than the previous model.
Instead of -1 one could:
bytes_available = (unsigned char)tmpbuf[0] & 0x0f;
if (bytes_available == 0) {
/* End of transfer */
break;
}
if (bytes_available > 5) {
/* More bytes in UPS buffer, but this transfer only handles up to 5. */
bytes_available = 5;
}
Or even use the ret from the USB interrupt read, in case it can transfer more later?
if (bytes_available > ret - 1) {
/* More bytes in UPS buffer, but this transfer only handles up to ret - 1. */
bytes_available = ret - 1;
}
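The ret-based variant can be sketched as a standalone helper. This assumes ret is the byte count returned by the interrupt read (header byte included) and that bytes_available is unsigned, as noted earlier in the review; armac_payload_len is a hypothetical name. Checking ret before subtracting avoids comparing an unsigned count with a possibly negative int, such as the -7 timeout code seen in the traces.

```c
#include <stddef.h>

/* Hypothetical helper: how many payload bytes to take from one report.
 * header: first byte of the report; ret: bytes returned by the read. */
static size_t armac_payload_len(unsigned char header, int ret)
{
	size_t bytes_available = (size_t)(header & 0x0f);
	size_t max_payload;

	if (ret < 2) {
		/* Error code, timeout, or no room for payload after the header */
		return 0;
	}
	max_payload = (size_t)ret - 1;
	if (bytes_available > max_payload) {
		/* More bytes in the UPS buffer than this transfer can carry */
		bytes_available = max_payload;
	}
	return bytes_available;
}
```

With the header bytes from the traces and ret 6, 0xa6 gives 5, 0xc1 gives 1, and 0x80 gives 0 (end of transfer).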
My current take is this one:
diff --git a/drivers/nutdrv_qx.c b/drivers/nutdrv_qx.c
index 350c555a0..550eb4071 100644
--- a/drivers/nutdrv_qx.c
+++ b/drivers/nutdrv_qx.c
@@ -1825,6 +1825,9 @@ static int armac_command(const char *cmd, char *buf, size_t buflen)
* (which will happen after successfully writing a command
* to the UPS) */
if (ret != 6) {
+ /* NOTE: If end condition is invalid for particular UPS we might make one
+ * request more and get this error. If bufpos > (say) 10 this could be ignored
+ * and the reply correctly read. */
upsdebugx(1,
"interrupt read error: %s (%d)",
ret ? nut_usb_strerror(ret) : "timeout",
@@ -1838,15 +1841,32 @@ static int armac_command(const char *cmd, char *buf, size_t buflen)
tmpbuf[0], tmpbuf[1], tmpbuf[2], tmpbuf[3], tmpbuf[4], tmpbuf[5],
tmpbuf[1], tmpbuf[2], tmpbuf[3], tmpbuf[4], tmpbuf[5]);
+ /*
+ * On most tested devices (including R/2000I/PSW) this was equal to the number of
+ * bytes returned in the buffer, but on some newer UPS (R/3000I/PF1) it was 1 more
+ * (1 control + 5 bytes transferred and bytes_available equal to 6 instead of 5).
+ *
+ * Current assumption is that this is number of bytes available on the UPS side
+ * with up to 5 (ret - 1) transferred.
+ */
bytes_available = (unsigned char)tmpbuf[0] & 0x0f;
if (bytes_available == 0) {
/* End of transfer */
break;
}
+ if (bytes_available > 5) {
+ /* Single interrupt transfer has 1 control + 5 data bytes */
+ bytes_available = 5;
+ }
memcpy(buf + bufpos, tmpbuf + 1, bytes_available);
bufpos += bytes_available;
+ if (tmpbuf[bytes_available] == 0x0d) {
+ /* End of line is an end of the message as well */
+ break;
+ }
+
if (bytes_available <= 2) {
/* Slow down, let the UPS buffer more bytes */
usleep(15000);
It compiles, but could use some testing. ;-) Hopefully it will handle both cases without the additional -1 magic.
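To reason about the patch without hardware, its read loop can be replayed offline over captured reports. This is a sketch, not the driver code: usb_interrupt_read() is replaced by an array of 6-byte reports, the function name is hypothetical, and the buflen check is an addition for safety.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical offline replay of the patched loop. Returns the number of
 * bytes assembled into buf, or -1 if the reports ran out before an end
 * condition (the driver would time out there) or buf would overflow. */
static long armac_assemble_patched(const unsigned char (*reports)[6],
	size_t nreports, char *buf, size_t buflen)
{
	size_t bufpos = 0;
	size_t i;

	for (i = 0; i < nreports; i++) {
		const unsigned char *tmpbuf = reports[i];
		size_t bytes_available = tmpbuf[0] & 0x0f;

		if (bytes_available == 0) {
			/* End of transfer */
			return (long)bufpos;
		}
		if (bytes_available > 5) {
			/* Single interrupt transfer has 1 control + 5 data bytes */
			bytes_available = 5;
		}
		if (bufpos + bytes_available > buflen) {
			/* Added for the sketch: do not overflow buf */
			return -1;
		}
		memcpy(buf + bufpos, tmpbuf + 1, bytes_available);
		bufpos += bytes_available;

		if (tmpbuf[bytes_available] == 0x0d) {
			/* End of line is an end of the message as well */
			return (long)bufpos;
		}
	}
	return -1;
}
```

Fed the old-model sequence c1 28 30 30 30 30, 81 32 30 30 30 30, c1 0d 30 30 30 30, it returns 3 with buf holding "(2\r". Fed a6 28 32 34 31 2e followed by 83 31 0d 30 30 30, it copies 31 0d 30 from the second report, but tmpbuf[3] is 0x30, so the end-of-line check does not fire and it returns -1.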
Hm, looking at the data dumps in #1978 (comment):
0.414746 [D4] read: ret 6 buf 86: 2e 30 20 30 30 >.0 00<
0.435756 [D4] read: ret 6 buf a6: 30 30 30 30 30 >00000<
0.939819 [D4] read: ret 6 buf 83: 31 0d 30 30 30 >1\r000<
1.940207 [D1] interrupt read error: Operation timed out (-7)
So for a line where the lower nibble (first byte & 0x0f) equals 6 and ret == 6, we have 5 protocol content bytes. However, in the last meaningful line, with the 0x0d, the nibble is 3 and there are 2 content bytes (and the UPS stops sending more, hence the timeout).
I believe that what the driver originally did with your UPS, where the nibble was 1, was copy one byte into the buffer. Where the nibble was 6, it probably copied 6 bytes starting from tmpbuf+1, thus reading past the end of tmpbuf, which is defined with exactly 6 bytes (I wonder why compilers or run-times did not complain or segfault; maybe because it is a read, not a write?), and so probably added a garbage byte to the resulting string on each loop :\
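The bounds arithmetic behind this can be stated as a small check, assuming tmpbuf is exactly 6 bytes as described: copying n bytes from tmpbuf + 1 touches tmpbuf[1] through tmpbuf[n], so it stays in bounds only when n <= 5. A nibble of 6 therefore reads one byte past the end, and since the nibble can reach 15, an unclamped copy could read up to 10 bytes past it. Both helper names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: would memcpy(dst, tmpbuf + 1, n) stay inside a
 * tmpbuf of tmpbuf_size bytes? The copy reads tmpbuf[1] .. tmpbuf[n]. */
static bool armac_copy_in_bounds(size_t tmpbuf_size, size_t n)
{
	return tmpbuf_size > 0 && n <= tmpbuf_size - 1;
}

/* How many bytes such a copy would read past the end of tmpbuf. */
static size_t armac_overread(size_t tmpbuf_size, size_t n)
{
	return armac_copy_in_bounds(tmpbuf_size, n) ? 0 : n + 1 - tmpbuf_size;
}
```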
Yeah, I've dropped ret - 1 because 6 is hardcoded above (the original driver always reads 6 bytes at a time). But that's equivalent: if ret != 6, we bail out anyway.
In general we could introduce a constant: const int READ_SIZE = 6;
and use it in usb_interrupt_read, in the following if, and later instead of ret. It would be cleaner, and I can prepare a patch like that.
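One caveat with const int READ_SIZE = 6; in C (unlike C++): a const-qualified int is not an integer constant expression, so unsigned char tmpbuf[READ_SIZE]; at block scope would become a variable-length array, and it cannot size an array at file scope at all. An enum constant (or a #define) avoids that. A sketch with hypothetical names:

```c
/* Hypothetical names, not from nutdrv_qx.c. Enum constants are integer
 * constant expressions in C, so they can size fixed arrays anywhere. */
enum {
	ARMAC_READ_SIZE = 6,                    /* bytes per interrupt read */
	ARMAC_MAX_PAYLOAD = ARMAC_READ_SIZE - 1 /* minus the header byte */
};

/* Fixed-size report buffer, valid at file or block scope. */
struct armac_report {
	unsigned char bytes[ARMAC_READ_SIZE];
};

/* Mirrors the driver's check: bail out unless a full report was read. */
static int armac_read_ok(int ret)
{
	return ret == ARMAC_READ_SIZE;
}
```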
Architecture-wise, I believe the UPS works like this:
([controller] --uart or rs232-- [usb bridge with buffer]) --usb-- [PC]
And this USB interface is a really poor USB Serial implementation. ;)
So bytes_available is not the length of the whole serial buffer, just the amount currently stored in the USB bridge. MAYBE. As far as we can tell. It's important to distinguish TRUE data from the fill ('0', 0x30), certainly when there is less data than the read size. This would not be necessary if the fill were 0x00. I think deducting 5 is worthless, but we could store something like more_available = true if it's larger than READ_SIZE - 1.
I'm unsure what to do with this information, though. If we find \x0d but more_available == true, that's probably an error and we could print a message, but not much more.
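The more_available idea could look roughly like this. It assumes, as guessed above, that a nibble larger than READ_SIZE - 1 means the bridge holds more bytes than the current report carries; the struct, field, and function names are hypothetical. Instead of printing, the sketch sets a suspicious flag that a caller could report, for example via upsdebugx().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define READ_SIZE 6 /* hypothetical; bytes per interrupt read */

struct armac_chunk {
	size_t len;          /* payload bytes taken from this report */
	bool more_available; /* nibble was larger than READ_SIZE - 1 */
	bool has_eol;        /* a 0x0d appears among those payload bytes */
	bool suspicious;     /* EOL seen although more bytes were reported */
};

static struct armac_chunk armac_parse_chunk(const unsigned char tmpbuf[READ_SIZE])
{
	struct armac_chunk c;
	size_t nibble = (size_t)(tmpbuf[0] & 0x0f);

	c.more_available = nibble > (size_t)(READ_SIZE - 1);
	c.len = c.more_available ? (size_t)(READ_SIZE - 1) : nibble;
	c.has_eol = c.len > 0 && memchr(tmpbuf + 1, 0x0d, c.len) != NULL;
	/* Probably an error: end of line while the UPS claims more data */
	c.suspicious = c.has_eol && c.more_available;
	return c;
}
```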
Watch out for the \r, which shifts the 0 in the output of those lines. But yes, the nibble is 3 and effectively 2 bytes are sent. I forgot about it today, and it suggests I should scan for \r (0x0d) and not just check the last byte.
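Scanning the payload for 0x0d, rather than checking only tmpbuf[bytes_available], could be sketched as below. This is an offline replay over captured 6-byte reports with a hypothetical function name; it also stops copying right after the \r, so fill bytes such as the trailing 0x30 in 83 31 0d 30 30 30 never reach buf.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical offline replay: scan each payload for 0x0d and keep only
 * the bytes up to and including it. Returns the message length, or -1 if
 * the reports ran out (driver would time out) or buf would overflow. */
static long armac_assemble_scan(const unsigned char (*reports)[6],
	size_t nreports, char *buf, size_t buflen)
{
	size_t bufpos = 0;
	size_t i;

	for (i = 0; i < nreports; i++) {
		const unsigned char *tmpbuf = reports[i];
		size_t n = tmpbuf[0] & 0x0f;
		const unsigned char *eol;

		if (n == 0)
			return (long)bufpos; /* End of transfer */
		if (n > 5)
			n = 5; /* 1 control + 5 data bytes per report */

		eol = memchr(tmpbuf + 1, 0x0d, n);
		if (eol != NULL)
			n = (size_t)(eol - (tmpbuf + 1)) + 1; /* keep the \r itself */

		if (bufpos + n > buflen)
			return -1;
		memcpy(buf + bufpos, tmpbuf + 1, n);
		bufpos += n;

		if (eol != NULL)
			return (long)bufpos; /* End of line ends the message */
	}
	return -1;
}
```

On the new-model pair a6 28 32 34 31 2e and 83 31 0d 30 30 30 it returns 7 with buf holding "(241.1\r"; on the old-model sequence c1 28 30 30 30 30, 81 32 30 30 30 30, c1 0d 30 30 30 30 it returns 3 with "(2\r".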
When the nibble was 6, the driver could be reading 1 byte past the end of tmpbuf. C compilers and run-times are quirky. :S In Rust that would certainly be a runtime panic, maybe even a compilation error. tmpbuf is on the stack, so there's a good chance it was just reading 1 extra byte from the stack.
It was certainly adding garbage, and that changed the meaning of this nibble in a way not observed with other UPSes.
I'll take a look at this final byte too; maybe it needs special handling to find \r correctly.
I think it will be more productive for you to cherry-pick the changes I've proposed so far and start another PR from your fork to override this one, to avoid me posting your patches from discussion into git :) WDYT?
Yeah, certainly. The problem is testing. I can probably get these changes tested on the new UPS; the trouble is checking on mine and other ones. When my sister is watering the plants, I'll ask her to power it on; maaaybe I could do it remotely. (;
Closes: #1978
Thanks @blaa for proposing the fix, and @convicte for testing it.