Skip to content

Comms refactor mk2#463

Closed
micolous wants to merge 10 commits intoProxmark:masterfrom
micolous:revert-450
Closed

Comms refactor mk2#463
micolous wants to merge 10 commits intoProxmark:masterfrom
micolous:revert-450

Conversation

@micolous
Copy link
Contributor

@micolous micolous commented Nov 4, 2017

This reverts #450, which reverted #371. As before, this is blocking other work that I'm doing on Android support. To address the issues raised post-merge in #450:

Flashing is much slower

After this commit, the Flasher uses the same code as the rest of proxmark3, which executes SendCommand on the Receiver thread (which is otherwise unmodified), and spins. Previously, the custom SendCommand in flasher would execute on the main thread entirely.

Flashing for me took 18.5 sec vs. 12.2 sec without the patch, so yep, performance regression.

I've since added another commit (c7376c3) which adds a special case for handling CMD_FINISH_WRITE on the main thread, such that the flash speed is 12.3 sec. Other commands break when running in this fashion, so I haven't applied this workaround to other commands.

hw status reports slower speeds.

I get the same speeds reported in the client with and without this patch. The speed is determined by the firmware, and there aren't firmware modifications in this commit.

hf search hangs if no card is present (#449)

I don't see the same issue (on Linux, on OSX...)

I also tried lf search, no change observed without a card on patches, though without a card both would require pressing the PM-button in order to continue any operation. This may be the root cause if you have a dual-frequency antenna setup.

ping @Fl0-0 @pwpiwi @cjbrigato

@marshmellow42
Copy link
Contributor

marshmellow42 commented Nov 4, 2017

The hf search issue will have no connection to the lf search hang. (Connected to the hitag/cotag/em4x05 detection)
Duel frequency antenna (or both antennas attached ) has no correlation. It would appear there is still a Windows related bug in this code change.

@micolous
Copy link
Contributor Author

micolous commented Nov 4, 2017

So you're saying that at present, this code hangs on Windows when doing hf search, regardless of the prior state of the hardware? This sounds like a similar issue to #434.

@Fl0-0
Copy link
Contributor

Fl0-0 commented Nov 4, 2017

micolous:revert-450 hf search works great with my cards on Linux.

@marshmellow42
Copy link
Contributor

i have finally had a little time to test this pull myself and have found no evidence of lingering issues (caused or exposed) from this PR on windows. hf search works fine on 14a, mifare + compatibles + magic, ntag, 15693, 14b, and when no tag is on the antenna.

i therefore cannot explain #449. user error?
@Fl0-0 have you tested this pull request?

@Fl0-0
Copy link
Contributor

Fl0-0 commented Nov 12, 2017

Yes i have tested it. Works great with my cards (14a, mifare + magic, 15693, legic prime).

@pwpiwi
Copy link
Contributor

pwpiwi commented Nov 12, 2017

Issue #449 was before latest commit from @micolous.

printf("uart_receiver: get lock\n");
#endif
// Lock up receives, in case they try to take it away from us.
pthread_mutex_lock(&conn->recv_lock);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this lock? There is only only one thread calling this function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also called in flash.c in CloseProxmark -- this ensures that when we do a serial port dance in enter_bootloader, that the receive thread can be stalled while we get a new serial port connection in place.

// We aren't normally trying to transmit in the flasher when the port would
// be reset, so we can just keep going at this point.
pthread_mutex_lock(&txcmd_lock);
if (txcmd_pending) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we eliminate txcmd_pending? It is kind of a self made signal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not entirely. I'll replace it with an if statement instead, because that makes a bit more sense.

The reason is the execution flow is as follows:

  1. SendCommand called on main thread.
  2. SendCommand waits for txcmd_lock.
  3. SendCommand checks to see if there is an existing command in the buffer, if not, then stuffs the command in the buffer then returns.
  4. SendCommand gets another command, gets a lock.
  5. SendCommand sees there's already a command in the buffer, calls pthread_cond_wait to wait for the txcmd to be sent, and releases the lock in the interim (and blocks the main thread)
  6. uart_receiver acquires the txcmd_lock
  7. uart_receiver sees a command in the buffer and sends it.
  8. uart_receiver clears txcmd_pending, and signals to txcmd_sig that it is ready for more data
  9. uart_receiver unlocks txcmd_lock
  10. SendCommand unblocks, locks txcmd_lock and stuffs a new command into the buffer.
  11. SendCommand unlocks txcmd_lock, and then returns.
  12. uart_receiver picks up the new buffered command on the next loop.

It could be replaced also if txcmd was a pointer instead, but many operations modify an existing UsbCommand immediately after sending it, only to send it again. This could lead to a race between the main and uart_receiver threads, leading to the command possibly being corrupted in flight.

The goal is to make SendCommand slightly asynchronous -- it'll only block if there's another command still to be sent out, or if it's a flasher command (in which case it is being sent immediately).

@pwpiwi
Copy link
Contributor

pwpiwi commented Nov 15, 2017

I've since added another commit (c7376c3) which adds a special case for handling CMD_FINISH_WRITE on the main thread, such that the flash speed is 12.3 sec. Other commands break when running in this fashion, so I haven't applied this workaround to other commands.

"special case" sounds like an ugly workaround. Why do other commands break? Makes me shiver...

Is there a reason for the msleep(100)? They may be the reason for the performance issue?

I get the same speeds reported in the client with and without this patch. The speed is determined by the firmware, and there aren't firmware modifications in this commit.

Correct. hw status doesn't even care if the data can be successfully received by the client code. There is no handshake or error correction.

@micolous
Copy link
Contributor Author

"special case" sounds like an ugly workaround. Why do other commands break? Makes me shiver...

I agree. But I'm trying to not rock the boat...

The existing PM3 client code does RX/TX in a receive thread, and then different callers to SendCommand (on the main thread) will wait for the txbuffer to be available, and then sit on that. However, this has some drawbacks, that if the receive thread stops then the main thread will spin, and that it makes sending commands to the device slower.

In most circumstances I don't think this is a big deal, but the specific complaint raised in the last PR was that flashing was quite a lot slower. The flasher did everything on one thread, and when unifying the code, then it meant that when my changes were introduced, flasher would occasionally spin for a bit while it waited for the command to be sent and the txbuffer to be cleared, so it took noticeably longer.

While I didn't run a profiler to find this out, when I pulled it into the main thread everything worked with equivalent wall-clock performance.

This appears to be introduced in f0ba634, but as this was pre-GitHub (2012-12), I think you'll find that the context of that change is buried somewhere in the Proxmark Forum archives. Going back further I see that SendCommand was always executed on the main thread.

Is there a reason for the msleep(100)? They may be the reason for the performance issue?

I tried tweaking the sleeps as well (they also make me lose sleep), and didn't see a significant improvement until I pulled the flashing command back into the main thread.

I think running a profiler over the flasher (and other components) would be good to do, but that would also require writing some isolated tests. I haven't done much C profiling though, and I haven't yet been motivated to spend the time here.

@micolous
Copy link
Contributor Author

micolous commented Dec 3, 2017

@pwpiwi I think I've addressed the issues you've outlined already, can we merge this?

@iceman1001
Copy link
Member

Some conflicts need to be fixed, then decision on merge.

@pwpiwi
Copy link
Contributor

pwpiwi commented Feb 9, 2018

I didn't even test yet. There is one issue mentioned by @micolous which rings my warning bells:

special case for handling CMD_FINISH_WRITE on the main thread, such that the flash speed is 12.3 sec. Other commands break when running in this fashion

If there is a special case handling required for a command to make it run faster and others break when treated the same, this indicates a race condition, hand shake problem or timing issue to me.

@micolous
Copy link
Contributor Author

micolous commented Feb 11, 2018

@iceman1001:

When @pwpiwi is satisfied I'll rebase this against master. :)

@pwpiwi:

If there is a special case handling required for a command to make it run faster and others break when treated the same, this indicates a race condition, hand shake problem or timing issue to me.

I agree there's an issue with multi-threaded access in PM3, but it has been a problem even before this patch. I addressed that issue with mutexes.

SendCommand already pushed a command into a buffer which is sent out by the Receiver thread. However, it never actually locks txcmd or provides any atomic guarantee. Without this patch, it was possible for two threads to concurrently update this buffer (a critical section), they both think they were successful, and you have a bad day.

We shouldn't be in a position where moving some methods around causes all the code to break (especially on single platforms).

flasher is special because it only executes commands in series. It doesn't allow user input when it is running, it doesn't allow scripting, it gains no advantages from non-blocking IO, and the states that the device can be in while flashing are limited (in comparison to normal device operation). The existing flasher had its own version of SendCommand that executed all of its commands on the main thread:

void SendCommand(UsbCommand* txcmd) {
// printf("send: ");
// cmd_debug(txcmd);
if (!uart_send(sp,(byte_t*)txcmd,sizeof(UsbCommand))) {
printf("Sending bytes to proxmark failed\n");
exit(1);
}
}

In light the performance complaints, flasher's special circumstances, and the existing behaviour, I special-cased CMD_FINISH_WRITE.

@pwpiwi
Copy link
Contributor

pwpiwi commented Feb 11, 2018

@micolous: thanks for the explanation. Makes sense and mutes the warning bell 😃

Let's start testing after rebase.

@micolous
Copy link
Contributor Author

@pwpiwi I rebased it against master, please take a look.

I tested this on Ubuntu 17.10 and OSX 10.13.

@iceman1001
Copy link
Member

Status on this one? @pwpiwi @marshmellow42

@marshmellow42
Copy link
Contributor

I've made several line comments. Do those not notify? There are some minor find and replace errors, and a potential infinite loop.

@marshmellow42
Copy link
Contributor

Hold that thought, I may be confusing this PR for another ..

@iceman1001
Copy link
Member

I belive this one should compensate for the slowdown of pm3, (booting from 6sec -> 12sec)
not the OSX bug where the client becomes slower and slower over time.

@pwpiwi
Copy link
Contributor

pwpiwi commented Feb 18, 2018

I believe the main purpose if this PR was to prepare for @micolous Android library. And some refactoring to concentrate all UART related code pieces. And some additional mutexes to prevent race conditions (but I am wondering how these could occur with the current "send command - wait for response" client)

@micolous
Copy link
Contributor Author

Slowness of PM3 booting is not something that I've touched on this PR. That is a firmware issue.

The purpose of this PR is where I've minimised the changes needed to get PM3 running as a library, and I can throw in a compatible uart implementation and it all just works. Getting these functions out of proxmark3.c means that I don't have a library trying to declare main(). This also insulates some of the global variable access with functions that can be exported, because I also care about getting/setting offline state.

This change would also allow some more robust tests to be run on the client as well. I have ported the hf mf hardnested test from the CI tests, which works on devices with >2GB RAM. This is also more robust, because externalising PrintAndLog means you can stash the log wherever you please and run expect-style tests. While the code there is Java, it would also be possible in C.

@micolous
Copy link
Contributor Author

Also tested with Windows now (using ProxSpace), seems to be happy.

client/comms.c Outdated
if (msclock() - start_time > 2000 && show_warning) {
// 2 seconds elapsed (but this doesn't mean the timeout was exceeded)
PrintAndLog("Waiting for a response from the proxmark...");
PrintAndLog("Don't forget to cancel its operation first by pressing on the button");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wording was changed in a recent PR. As it is now it can be confusing as it sounds like the user is supposed to press the button regardless.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e0977d8

while (!WaitForResponse(CMD_ANY, &resp)) {
// Keep waiting for a response
msleep(100);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if there is no response?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly what ReceiveCommand did before... it would wait forever. The only difference here is that msleep makes this less of a tight loop.

@pwpiwi
Copy link
Contributor

pwpiwi commented Feb 26, 2018

@marshmellow42, @iceman1001: Did you test this successfully? Which platforms?

Any Mac-Users here?

@ghost
Copy link

ghost commented Mar 19, 2018

Tested as described by doing as I did before repeatedly running lf search
I flashed each time to use the appropriate matching firmware. Also, timed the flashing.

I am not sure how to tell, but I know when I did the earlier tests I know I ran through the setup instructions on the wiki, so I would be using GNU Readline.

I'm using a non-GUI .

TLDR I was not able to reproduce my crash in these circumstances.

`$ time ./flasher /dev/tty.usbmodem881 ../armsrc/obj/fullimage.elf
Loading ELF file '../armsrc/obj/fullimage.elf'...
Loading usable ELF segments:
0: V 0x00102000 P 0x00102000 (0x0002d358->0x0002d358) [R X] @0x94
1: V 0x00200000 P 0x0012f358 (0x00001a3c->0x00001a3c) [RW ] @0x2d3ec
Note: Extending previous segment from 0x2d358 to 0x2ed94 bytes

Waiting for Proxmark to appear on /dev/tty.usbmodem881. Found.

Flashing...
Writing segments for file: ../armsrc/obj/fullimage.elf
0x00102000..0x00130d93 [0x2ed94 / 375 blocks]....................................................................................................................................................................................................................................................................................................................................................................................... OK

Resetting hardware...
All done.

Have a nice day!

real 0m6.510s
user 0m4.972s
sys 0m0.116s
$ pwd
../testing/proxmark3-r450/client
`

`$ time ./flasher /dev/tty.usbmodem881 ../armsrc/obj/fullimage.elf
Loading ELF file '../armsrc/obj/fullimage.elf'...
Loading usable ELF segments:
0: V 0x00102000 P 0x00102000 (0x0002d358->0x0002d358) [R X] @0x94
1: V 0x00200000 P 0x0012f358 (0x00001a3c->0x00001a3c) [RW ] @0x2d3ec
Note: Extending previous segment from 0x2d358 to 0x2ed94 bytes

Waiting for Proxmark to appear on /dev/tty.usbmodem881. Found.

Flashing...
Writing segments for file: ../armsrc/obj/fullimage.elf
0x00102000..0x00130d93 [0x2ed94 / 375 blocks]....................................................................................................................................................................................................................................................................................................................................................................................... OK

Resetting hardware...
All done.

Have a nice day!

real 0m6.333s
user 0m0.011s
sys 0m0.035s
$ pwd
../testing/proxmark3-upstream/client`

Attached is the compile from the micolous code base I did have a repeating warning when compiling that was not present in the main Proxmark code.
In file included from ../common/iso14443crc.h:11: ../include/common.h:19:23: warning: redefinition of typedef 'byte_t' is a C11 feature [-Wtypedef-redefinition] typedef unsigned char byte_t; ^ ../uart/uart.h:42:23: note: previous definition is here typedef unsigned char byte_t; ^ 1 warning generated.
mircolus-compile.txt

@micolous
Copy link
Contributor Author

Thanks @tuxthemadpenguin. I think in the absence of further crashes on a clean build environment, we can assert that the crashes that you saw were unrelated to the changes in this PR.

As for telling which readline was linked, you should be able to do it with otool -L.

For example, following the wiki instructions, you'll get a build using /usr/local/opt/readline/lib/libreadline.7.dylib (GNU readline from homebrew), and linked against Qt:

$ otool -L ./client/proxmark3
./client/proxmark3:
	/usr/local/opt/readline/lib/libreadline.7.dylib (compatibility version 7.0.0, current version 7.0.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.0.0)
	/usr/local/opt/qt/lib/QtWidgets.framework/Versions/5/QtWidgets (compatibility version 5.10.0, current version 5.10.1)
	/usr/local/opt/qt/lib/QtGui.framework/Versions/5/QtGui (compatibility version 5.10.0, current version 5.10.1)
	/usr/local/opt/qt/lib/QtCore.framework/Versions/5/QtCore (compatibility version 5.10.0, current version 5.10.1)
	/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 400.9.0)

This build is linked against /usr/lib/libedit.3.dylib, so is Apple's libedit (and has no GUI):

$ otool -L ./client/proxmark3
./client/proxmark3:
	/usr/lib/libedit.3.dylib (compatibility version 2.0.0, current version 3.0.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.0.0)
	/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 400.9.0)

As for the build warning, that byte_t definition is the same on both:

$ ag 'typedef unsigned char byte_t' proxmark3-r450/
proxmark3-r450/include/common.h
19:typedef unsigned char byte_t;

proxmark3-r450/uart/uart.h
42:typedef unsigned char byte_t;

$ ag 'typedef unsigned char byte_t' proxmark3-upstream/
proxmark3-upstream/include/common.h
19:typedef unsigned char byte_t;

proxmark3-upstream/uart/uart.h
42:typedef unsigned char byte_t;

@pwpiwi I'll rebase this on #585 (which has a better documented fix for libedit/readline issues than what is here). Please let me know if there are any other blockers.

@iceman1001
Copy link
Member

The rl_initialize call in this PR makes the client crash after a few times on ubuntu1704. I posted in another thread about the error message.
removing the call made client stable again,

…note on race condition ugliness.
- Remove unused rarg parameter.
- Make uart_receiver non-static, and add force_align_arg_pointer to header.
- Fix byte_t redefinition warning mentioned by @tuxthemadpenguin

- Pull serial port example across to uart.h

- Make all examples platform-specific

- Remove ModemManager rant on non-Linux platforms
@micolous
Copy link
Contributor Author

micolous commented Mar 21, 2018

Removed rl_initialize for now, updated #585 and rebased this against it. Also:

  • fixed the byte_t redefinition issue
  • pulled example serial port into a common include file
  • used a platform-native serial port everywhere
  • remove ModemManager note from Flasher on non-Linux platforms

@pwpiwi
Copy link
Contributor

pwpiwi commented Mar 28, 2018

In order to make progress, I have prepared PR #587 which just moves code around between files, some minor changes (like replacing byte_t by uint8_t) and the changes to the help texts. After this has been committed (and another rebase - sorry), your real changes will be much easier to find and review.

@pwpiwi
Copy link
Contributor

pwpiwi commented Apr 18, 2018

See PR #595 for a second part of this PR.

pwpiwi added a commit to pwpiwi/proxmark3 that referenced this pull request May 8, 2018
* make uart_receiver() static in comms.c
* move receiver thread creation and respective mutexes to comms.c
* use comms.c for flasher as well
pwpiwi added a commit to pwpiwi/proxmark3 that referenced this pull request May 14, 2018
* make uart_communication() static in comms.c
* move receiver thread creation and respective mutexes to comms.c
* add mutex and signal for tx buffer
* use comms.c for flasher as well
* remove comm functions from client/proxmark3.h
* this completes isolating all USB communication related functions in comms.c
pwpiwi added a commit that referenced this pull request Jun 3, 2018
* make uart_communication(), storeCommand() and getCommand() static in comms.c
* move receiver thread creation and respective mutexes to comms.c
* add mutex and signal for tx buffer
* use comms.c for flasher as well
* remove comm functions from client/proxmark3.h
* this completes isolating all USB communication related functions in comms.c
* don't assume a port to be defined by a name. Change parameter in OpenProxmark() to void*
* comms.c: set sp and serial_port_name to NULL when offline
@pwpiwi
Copy link
Contributor

pwpiwi commented Jun 3, 2018

Replaced by PR #587, #595, #596, and #607

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants