

@CV-Bowen
Contributor

Summary

This PR includes a series of critical fixes and improvements for the rpmsg_router driver to enhance multi-core communication stability and reliability. The changes address several race conditions, memory management issues, and edge cases that could cause system crashes or hangs in production environments.

Key Changes

  1. Fix race condition and use-after-free - Prevents concurrent endpoint destruction from causing use-after-free errors
  2. Fix double free issue - Resolves double free vulnerability in rpmsg_router_hub_unbind
  3. Avoid blocking when remote destroys endpoint - Uses rpmsg_send_offchannel_raw so the router thread no longer gets stuck in rpmsg_send() after the remote has reset the peer endpoint's dest_addr to RPMSG_ADDR_ANY
  4. Improve NS message handling - Properly distinguishes between NS_CREATE and NS_CREATE_ACK messages
  5. Fix binding failures - Prevents rpmsg:cpu from failing to bind to destination
  6. Improve endpoint state management - Correctly sets dest_addr to RPMSG_ADDR_ANY when receiving NS_DESTROY
  7. Add readiness checks - Verifies hub endpoint is ready before sending power management messages
  8. Fix return value handling - Corrects return value when receiving DESTROY command
  9. Code readability improvement - Renames dst_ept to peer_ept for better clarity

These fixes have been validated in production environments and resolve critical stability issues in multi-core communication scenarios.

Impact

Stability

  • High impact: Fixes multiple critical bugs that could cause system crashes
  • Resolves race conditions that lead to use-after-free and double free errors
  • Prevents deadlock scenarios during endpoint destruction
  • Improves overall system reliability in multi-core configurations

Compatibility

  • No breaking changes: External API and interfaces remain unchanged
  • All fixes are internal to the rpmsg_router driver
  • Existing applications will continue to work without modification

Code Quality

  • Improved: Better code readability with consistent naming conventions
  • Enhanced error handling and state management
  • More robust endpoint lifecycle management

Testing

Test Environment

  • Platform: QEMU ARMv8A (qemu-armv8a:v8a_server and qemu-armv8a:v8a_proxy)
  • Configuration: Multi-core setup with rpmsg_router enabled
  • Test Date: January 21, 2026
  • Test Scenarios: Normal operation, endpoint creation/destruction, concurrent operations, power management

Test Steps

  1. Build the system:

    cmake -B cmake_out/v8a_server -DBOARD_CONFIG=qemu-armv8a:v8a_server -GNinja
    cmake --build cmake_out/v8a_server
    
    cmake -B cmake_out/v8a_proxy -DBOARD_CONFIG=qemu-armv8a:v8a_proxy -GNinja
    cmake --build cmake_out/v8a_proxy
  2. Run:

qemu-system-aarch64 -cpu cortex-a53 -nographic \
-machine virt,virtualization=on,gic-version=3 \
-chardev stdio,id=con,mux=on -serial chardev:con \
-object memory-backend-file,discard-data=on,id=shmmem-shmem0,mem-path=/dev/shm/my_shmem0,size=4194304,share=yes \
-device ivshmem-plain,id=shmem0,memdev=shmmem-shmem0,addr=0xb \
-device virtio-serial-device,bus=virtio-mmio-bus.0 \
-chardev socket,path=/tmp/rpmsg_port_uart_socket,server=on,wait=off,id=foo \
-device virtconsole,chardev=foo \
-mon chardev=con,mode=readline -kernel ./nuttx/cmake_out/v8a_server/nuttx \
-gdb tcp::7775
[    0.000000] [ 0] [  INFO] [server] pci_register_rptun_ivshmem_driver: Register ivshmem driver, id=0, cpuname=proxy, master=1
[    0.000000] [ 3] [  INFO] [server] pci_scan_bus: pci_scan_bus for bus 0
[    0.000000] [ 3] [  INFO] [server] pci_scan_bus: class = 00000600, hdr_type = 00000000
[    0.000000] [ 3] [  INFO] [server] pci_scan_bus: 00:00 [1b36:0008]
[    0.000000] [ 3] [  INFO] [server] pci_setup_device: pbar0 set bad mask
[    0.000000] [ 3] [  INFO] [server] pci_setup_device: pbar1 set bad mask
[    0.000000] [ 3] [  INFO] [server] pci_setup_device: pbar2 set bad mask
[    0.000000] [ 3] [  INFO] [server] pci_setup_device: pbar3 set bad mask
[    0.000000] [ 3] [  INFO] [server] pci_setup_device: pbar4 set bad mask
[    0.000000] [ 3] [  INFO] [server] pci_setup_device: pbar5 set bad mask
[    0.000000] [ 3] [  INFO] [server] pci_scan_bus: class = 00000200, hdr_type = 00000000
[    0.000000] [ 3] [  INFO] [server] pci_scan_bus: 00:08 [1af4:1000]
[    0.000000] [ 3] [  INFO] [server] pci_setup_device: pbar0: mask64=fffffffe 32bytes
[    0.000000] [ 3] [  INFO] [server] pci_setup_device: pbar1: mask64=fffffff0 4096bytes
[    0.000000] [ 3] [  INFO] [server] pci_setup_device: pbar2 set bad mask
[    0.000000] [ 3] [  INFO] [server] pci_setup_device: pbar3 set bad mask
[    0.000000] [ 3] [  INFO] [server] pci_setup_device: pbar4: mask64=fffffffffffffff0 16384bytes
[    0.000000] [ 3] [  INFO] [server] pci_scan_bus: class = 00000500, hdr_type = 00000000
[    0.000000] [ 3] [  INFO] [server] pci_scan_bus: 00:58 [1af4:1110]
[    0.000000] [ 3] [  INFO] [server] pci_setup_device: pbar0: mask64=fffffff0 256bytes
[    0.000000] [ 3] [  INFO] [server] pci_setup_device: pbar1 set bad mask
[    0.000000] [ 3] [  INFO] [server] pci_setup_device: pbar2: mask64=fffffffffffffff0 4194304bytes
[    0.000000] [ 3] [  INFO] [server] pci_setup_device: pbar4 set bad mask
[    0.000000] [ 3] [  INFO] [server] pci_setup_device: pbar5 set bad mask
[    0.000000] [ 3] [  INFO] [server] ivshmem_probe: shmem addr=0x10400000 size=4194304 reg=0x10008000
[    0.000000] [ 3] [  INFO] [server] rptun_ivshmem_probe: shmem addr=0x10400000 size=4194304

NuttShell (NSH) NuttX-12.10.0
server> 
server> 
server> [    0.000000] [ 0] [  INFO] [proxy] pci_register_rptun_ivshmem_driver: Register ivshmem driver, id=0, cpuname=server, master=0
[    0.000000] [ 3] [  INFO] [proxy] pci_scan_bus: pci_scan_bus for bus 0
[    0.000000] [ 3] [  INFO] [proxy] pci_scan_bus: class = 00000600, hdr_type = 00000000
[    0.000000] [ 3] [  INFO] [proxy] pci_scan_bus: 00:00 [1b36:0008]
[    0.000000] [ 3] [  INFO] [proxy] pci_setup_device: pbar0 set bad mask
[    0.000000] [ 3] [  INFO] [proxy] pci_setup_device: pbar1 set bad mask
[    0.000000] [ 3] [  INFO] [proxy] pci_setup_device: pbar2 set bad mask
[    0.000000] [ 3] [  INFO] [proxy] pci_setup_device: pbar3 set bad mask
[    0.000000] [ 3] [  INFO] [proxy] pci_setup_device: pbar4 set bad mask
[    0.000000] [ 3] [  INFO] [proxy] pci_setup_device: pbar5 set bad mask
[    0.000000] [ 3] [  INFO] [proxy] pci_scan_bus: class = 00000200, hdr_type = 00000000
[    0.000000] [ 3] [  INFO] [proxy] pci_scan_bus: 00:08 [1af4:1000]
[    0.000000] [ 3] [  INFO] [proxy] pci_setup_device: pbar0: mask64=fffffffe 32bytes
[    0.000000] [ 3] [  INFO] [proxy] pci_setup_device: pbar1: mask64=fffffff0 4096bytes
[    0.000000] [ 3] [  INFO] [proxy] pci_setup_device: pbar2 set bad mask
[    0.000000] [ 3] [  INFO] [proxy] pci_setup_device: pbar3 set bad mask
[    0.000000] [ 3] [  INFO] [proxy] pci_setup_device: pbar4: mask64=fffffffffffffff0 16384bytes
[    0.000000] [ 3] [  INFO] [proxy] pci_scan_bus: class = 00000500, hdr_type = 00000000
[    0.000000] [ 3] [  INFO] [proxy] pci_scan_bus: 00:58 [1af4:1110]
[    0.000000] [ 3] [  INFO] [proxy] pci_setup_device: pbar0: mask64=fffffff0 256bytes
[    0.000000] [ 3] [  INFO] [proxy] pci_setup_device: pbar1 set bad mask
[    0.000000] [ 3] [  INFO] [proxy] pci_setup_device: pbar2: mask64=fffffffffffffff0 4194304bytes
[    0.000000] [ 3] [  INFO] [proxy] pci_setup_device: pbar4 set bad mask
[    0.000000] [ 3] [  INFO] [proxy] pci_setup_device: pbar5 set bad mask
[    0.000000] [ 3] [  INFO] [proxy] ivshmem_probe: shmem addr=0x10400000 size=4194304 reg=0x10008000
[    0.000000] [ 3] [  INFO] [proxy] rptun_ivshmem_probe: shmem addr=0x10400000 size=4194304
[    0.000000] [ 3] [  INFO] [proxy] rptun_ivshmem_probe: Start the wdog

server> 
server> 
server> ps
  TID   PID  PPID PRI POLICY   TYPE    NPX STATE    EVENT     SIGMASK            STACK    USED FILLED COMMAND
    0     0     0   0 FIFO     Kthread   - Ready              0000000000000000 0008160 0001792  21.9%  Idle_Task
    1     0     0 192 FIFO     Kthread   - Waiting  Semaphore 0000000000000000 0008096 0001344  16.6%  hpwork 0x40478c60 0x40478ce0
    2     0     0 100 FIFO     Kthread   - Waiting  Semaphore 0000000000000000 0008096 0001344  16.6%  lpwork 0x40478d10 0x40478d90
    5     0     0 224 FIFO     Kthread   - Waiting  Semaphore 0000000000000000 0008096 0002016  24.9%  rpmsg-uart-rx proxy2 0x404a4080
    6     0     0 224 FIFO     Kthread   - Waiting  Semaphore 0000000000000000 0008096 0001968  24.3%  rpmsg-uart-tx proxy2 0x404a4080
    7     7     0 100 FIFO     Task      - Running            0000000000000000 0008128 0004224  51.9%  nsh_main
    8     0     0 224 FIFO     Kthread   - Waiting  Semaphore 0000000000000000 0008096 0001824  22.5%  rpmsg-virtio proxy 0x40492ef8
server> 
server> 
server> uname -a
NuttX server 12.10.0 04a9df8e34f Jan 21 2026 21:05:59 arm64 qemu-armv8a
server> rpmsg dump all
[    0.000000] [ 7] [ EMERG] [server] Dump rpmsg info between cpu (master: yes)server <==> proxy:
[    0.000000] [ 7] [ EMERG] [server] rpmsg vq RX:
[    0.000000] [ 7] [ EMERG] [server] rpmsg vq TX:
[    0.000000] [ 7] [ EMERG] [server]   rpmsg ept list:
[    0.000000] [ 7] [ EMERG] [server]     ept NS
[    0.000000] [ 7] [ EMERG] [server]     ept rpmsg-sensor
[    0.000000] [ 7] [ EMERG] [server]     ept rpmsg-ping
[    0.000000] [ 7] [ EMERG] [server]     ept rpmsg-syslog
[    0.000000] [ 7] [ EMERG] [server]   rpmsg buffer list:
[    0.000000] [ 7] [ EMERG] [server]     RX buffer, total 8, pending 0
[    0.000000] [ 7] [ EMERG] [server]     TX buffer, total 8, pending 0
[    0.000000] [ 7] [ EMERG] [server] Remote: proxy2 state: 1
[    0.000000] [ 7] [ EMERG] [server] ept NS
[    0.000000] [ 7] [ EMERG] [server] ept rpmsg-sensor
[    0.000000] [ 7] [ EMERG] [server] ept rpmsg-ping
[    0.000000] [ 7] [ EMERG] [server] rpmsg_port queue RX: {used: 0, avail: 8}
[    0.000000] [ 7] [ EMERG] [server] rpmsg buffer list:
[    0.000000] [ 7] [ EMERG] [server] rpmsg_port queue TX: {used: 0, avail: 8}
[    0.000000] [ 7] [ EMERG] [server] rpmsg buffer list:
server> rpmsg ping all 1 1 1 1
[    0.000000] [ 7] [ EMERG] [server] ping times: 1
[    0.000000] [ 7] [ EMERG] [server] buffer_len: 1520, send_len: 17
[    0.000000] [ 7] [ EMERG] [server] avg: 0 s, 17090352 ns
[    0.000000] [ 7] [ EMERG] [server] min: 0 s, 17090352 ns
[    0.000000] [ 7] [ EMERG] [server] max: 0 s, 17090352 ns
[    0.000000] [ 7] [ EMERG] [server] rate: 0.007957 Mbits/sec
[    0.000000] [ 7] [ EMERG] [server] ping times: 1
[    0.000000] [ 7] [ EMERG] [server] buffer_len: 2024, send_len: 17
[    0.000000] [ 7] [ EMERG] [server] avg: 0 s, 7277024 ns
[    0.000000] [ 7] [ EMERG] [server] min: 0 s, 7277024 ns
[    0.000000] [ 7] [ EMERG] [server] max: 0 s, 7277024 ns
[    0.000000] [ 7] [ EMERG] [server] rate: 0.018688 Mbits/sec
server>

yintao and others added 2 commits January 21, 2026 10:35
It should return 0 if there are edges that have not been established
or have already been destroyed at unbind.

Signed-off-by: yintao <yintao@xiaomi.com>
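The unbind rule in this commit can be illustrated with a small model. This is a sketch under stated assumptions, not the rpmsg_router code: `struct demo_edge` and `demo_hub_unbind()` are hypothetical, and an edge that is NULL or was never established is assumed to mean "nothing left to tear down", which is reported as success.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one router edge (not the real rpmsg_router type). */
struct demo_edge
{
  bool established; /* set once the edge endpoint has been bound */
  int  teardowns;   /* how many times the edge was actually torn down */
};

/* Hypothetical unbind handler.  A NULL edge (already destroyed) or an
 * edge that was never established has nothing left to tear down, so it
 * is reported as success (0) instead of an error.
 */
static int demo_hub_unbind(struct demo_edge *edge)
{
  if (edge == NULL || !edge->established)
    {
      return 0;
    }

  edge->established = false;
  edge->teardowns++;
  return 0;
}
```

Calling `demo_hub_unbind()` twice on the same edge tears it down once and returns 0 both times.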
to avoid rpmsg_send() blocking when the rpmsg channel is not ready

Signed-off-by: ligd <liguiding1@xiaomi.com>
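A readiness check before sending can be sketched as follows. The names (`struct demo_ept`, `demo_ept_is_ready()`, `demo_hub_send_pm()`), the readiness rule (the remote address must be known) and the `-EAGAIN` return value are all assumptions made for illustration; the actual check in the driver may differ.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ADDR_ANY 0xffffffffu /* plays the role of RPMSG_ADDR_ANY */

/* Hypothetical endpoint model: only the fields the check needs. */
struct demo_ept
{
  uint32_t dest_addr; /* remote address, DEMO_ADDR_ANY until known */
  int      sent;      /* messages handed to the (imaginary) transport */
};

/* Assumed readiness rule: the remote address must be known. */
static bool demo_ept_is_ready(const struct demo_ept *ept)
{
  return ept != NULL && ept->dest_addr != DEMO_ADDR_ANY;
}

/* Send a power-management notification only when the hub endpoint is
 * ready; otherwise fail fast with -EAGAIN instead of blocking the caller.
 */
static int demo_hub_send_pm(struct demo_ept *hub_ept)
{
  if (!demo_ept_is_ready(hub_ept))
    {
      return -EAGAIN;
    }

  hub_ept->sent++;
  return 0;
}
```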
CV-Bowen requested a review from jerpelea as a code owner January 21, 2026 13:42
github-actions bot added the labels "Area: Drivers" (Drivers issues) and "Size: M" (The size of the change in this PR is medium) Jan 21, 2026
yintao added 7 commits January 21, 2026 21:46
when edge_ept receives an NS_DESTROY message, to keep the behavior in
sync with the other rpmsg transports

if (ns_msg.flags == RPMSG_NS_DESTROY) {
	if (_ept)
		_ept->dest_addr = RPMSG_ADDR_ANY;
	...
}

Signed-off-by: yintao <yintao@xiaomi.com>
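The NS_DESTROY handling described in this commit can be modelled in a few lines. `DEMO_ADDR_ANY` stands in for RPMSG_ADDR_ANY, and the enum values and `demo_edge_ns_cb()` are hypothetical; the point is only that the endpoint object survives while its remote address is forgotten.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_ADDR_ANY 0xffffffffu /* plays the role of RPMSG_ADDR_ANY */

/* Hypothetical NS flag values, for illustration only. */
enum demo_ns_flags
{
  DEMO_NS_CREATE  = 0,
  DEMO_NS_DESTROY = 1
};

struct demo_ept
{
  uint32_t dest_addr; /* remote address of this edge endpoint */
};

/* On NS_DESTROY, keep the local endpoint object but forget the remote
 * address, as the commit describes for other rpmsg transports; on
 * NS_CREATE, record the announced address.
 */
static void demo_edge_ns_cb(struct demo_ept *ept, int flags, uint32_t src)
{
  if (ept == NULL)
    {
      return;
    }

  if (flags == DEMO_NS_DESTROY)
    {
      ept->dest_addr = DEMO_ADDR_ANY;
    }
  else
    {
      ept->dest_addr = src;
    }
}
```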
Use rpmsg_send_offchannel_raw() to avoid getting stuck when the remote
destroys the endpoint.

While the rptun thread is sending messages from the AP to Android,
Android may already have sent NS_DESTROY through rpmsg_port, and
rpmsg_port_ns_callback() has then set the endpoint's dest_addr to
RPMSG_ADDR_ANY. The rptun thread can get stuck at this point, because
rpmsg_send() finds that dst_ept's dest_addr is RPMSG_ADDR_ANY.

rpmsg_virtio_thread(rptun audio):
rpmsg_virtio_rx_callback -> rpmsg_router_cb -> rpmsg_send(dst_ept)
send to android

rpmsg_port_thread (android send NS_DESTROY):
rpmsg_port_ns_callback -> dst_ept->dest_addr = RPMSG_ADDR_ANY

Signed-off-by: yintao <yintao@xiaomi.com>
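The difference between the two send paths can be modelled without any rpmsg code. This is a hedged sketch: `demo_send_wait_ready()` stands in for a send that waits for the endpoint to become ready, `demo_send_raw()` for a send with explicitly supplied source and destination addresses, and the polling loop, poll count and `DEMO_ERR_ADDR` are assumptions rather than the behaviour of a specific OpenAMP version.

```c
#include <stdint.h>

#define DEMO_ADDR_ANY 0xffffffffu /* plays the role of RPMSG_ADDR_ANY */
#define DEMO_ERR_ADDR (-1)        /* hypothetical error code */

struct demo_ept
{
  uint32_t addr;      /* local address */
  uint32_t dest_addr; /* remote address, reset to DEMO_ADDR_ANY on NS_DESTROY */
};

/* Models a send that first waits for the endpoint to become ready.  In a
 * real system every failed poll would be followed by a sleep, so if the
 * peer has just been destroyed the caller (here: the rptun thread) is
 * stuck for the whole timeout.  *polls reports how long it waited.
 */
static int demo_send_wait_ready(const struct demo_ept *ept, int max_polls,
                                int *polls)
{
  int i;

  for (i = 1; i <= max_polls; i++)
    {
      if (ept->dest_addr != DEMO_ADDR_ANY)
        {
          *polls = i;
          return 0;
        }
    }

  *polls = max_polls;
  return DEMO_ERR_ADDR;
}

/* Models the raw path: source and destination are passed explicitly and
 * nothing waits for readiness, so a destroyed peer fails immediately.
 */
static int demo_send_raw(uint32_t src, uint32_t dst)
{
  (void)src;
  return dst == DEMO_ADDR_ANY ? DEMO_ERR_ADDR : 0;
}
```

In this model, a destroyed peer costs the waiting path its full `max_polls`, while the raw path returns at once and leaves the rptun thread free to process the NS_DESTROY.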
First free, in rpmsg_port_unregister() on the rpmsg SPI thread:
<free+12>
<rpmsg_router_hub_ept_release+6>
<rpmsg_ept_decref+28>
<rpmsg_unregister_endpoint+120>
<rpmsg_destroy_ept+40>
<rpmsg_router_hub_unbind+22>
<rpmsg_device_destory+172>
<rpmsg_port_unregister+26>

This thread eventually frees both r:droid and r:audio. During this
process, however, rpmsg_destroy_ept() on "r:droid" sends NS_DESTROY to
audio, and audio replies with NS_DESTROY to the AP. If r:droid has not
yet been removed from the endpoint list, the rptun_audio thread on the
AP hits the following:

3  0x103016fa in kasan_check_report
4  0x103018b6 in __asan_store4_noabort
5  0x106ca352 in metal_list_del
6             in rpmsg_unregister_endpoint
7  0x106ca77a in rpmsg_destroy_ept
8  0x102db232 in rpmsg_router_hub_unbind
9  0x106cabb2 in rpmsg_virtio_ns_callback
10 0x106cad5c in rpmsg_virtio_rx_callback
11 0x106cc190 in virtqueue_notification
12 0x106ca064 in rproc_virtio_notified
13 0x106c9ad0 in remoteproc_get_notification
14 0x102dd4ba in rptun_worker at rptun/rptun.c:334
15 rptun_worker (arg=<optimized out>) at rptun/rptun.c:328
16 0x102dd974 in rptun_thread  rptun/rptun.c:353
17 0x102cd558 in nxtask_start () at task/task_start.c:1

This causes r:audio to be freed a second time.

Signed-off-by: yintao <yintao@xiaomi.com>
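One common way to make a teardown path idempotent is to detach the pointer before releasing it. The sketch below shows that pattern on hypothetical types (`struct demo_hub`, `struct demo_ept`, `demo_hub_destroy_side()`); it is single-threaded and is not claimed to be the exact fix applied in rpmsg_router_hub_unbind.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reference-counted endpoint. */
struct demo_ept
{
  int refcount;
};

static int g_demo_frees; /* counts real frees, for illustration only */

static void demo_ept_decref(struct demo_ept *ept)
{
  if (--ept->refcount == 0)
    {
      free(ept);
      g_demo_frees++;
    }
}

/* Hypothetical hub holding the two edge endpoints ("r:droid", "r:audio"). */
struct demo_hub
{
  struct demo_ept *edge_ept[2];
};

/* Detach the pointer from the hub before dropping the reference, so a
 * second teardown request for the same side (for example one triggered by
 * the NS_DESTROY echoed back by the remote) sees NULL and returns.  A real
 * driver must do the read-and-clear under a lock, because the two
 * requests can arrive on different threads.
 */
static void demo_hub_destroy_side(struct demo_hub *hub, int side)
{
  struct demo_ept *ept = hub->edge_ept[side];

  if (ept == NULL)
    {
      return;
    }

  hub->edge_ept[side] = NULL;
  demo_ept_decref(ept);
}
```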
…r_edge

If ns_msg->flags == RPMSG_NS_CREATE_ACK, the peer's address is already
known, so use usr_ept->dest_addr directly.

Signed-off-by: yintao <yintao@xiaomi.com>
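The address selection can be sketched as a small helper. The enum mirrors RPMSG_NS_CREATE, RPMSG_NS_DESTROY and RPMSG_NS_CREATE_ACK but its values are not meant to match; `demo_pick_dest()` is hypothetical, and taking the address from the NS message on NS_CREATE (`ns_addr`) is an assumption.

```c
#include <stdint.h>

#define DEMO_ADDR_ANY 0xffffffffu /* plays the role of RPMSG_ADDR_ANY */

/* Hypothetical NS flags; values are illustrative only. */
enum demo_ns_flags
{
  DEMO_NS_CREATE,
  DEMO_NS_DESTROY,
  DEMO_NS_CREATE_ACK
};

/* Pick the destination address for the user endpoint.
 * CREATE:     the peer just announced itself; take its address from the
 *             NS message (assumed to be ns_addr here).
 * CREATE_ACK: the peer's address is already known, so keep
 *             usr_dest_addr as is.
 */
static uint32_t demo_pick_dest(int flags, uint32_t ns_addr,
                               uint32_t usr_dest_addr)
{
  switch (flags)
    {
      case DEMO_NS_CREATE:
        return ns_addr;

      case DEMO_NS_CREATE_ACK:
        return usr_dest_addr;

      default:
        return DEMO_ADDR_ANY;
    }
}
```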
nxsig_clockwait
/home/work/data/miui_codes/build_home_rom/nuttx/sched/signal/sig_timedwait.c:329 (discriminator 1)
nxsig_clockwait
/home/work/data/miui_codes/build_home_rom/nuttx/sched/signal/sig_timedwait.c:320
nxsig_nanosleep
/home/work/data/miui_codes/build_home_rom/nuttx/sched/signal/sig_nanosleep.c:96
nxsig_usleep
/home/work/data/miui_codes/build_home_rom/nuttx/sched/signal/sig_usleep.c:84
__metal_sleep_usec
/home/work/data/miui_codes/build_home_rom/nuttx/drivers/../include/metal/system/nuttx/sleep.h:27
rpmsg_device_destory
/home/work/data/miui_codes/build_home_rom/nuttx/drivers/rpmsg/rpmsg.c:473
rpmsg_port_unregister
/home/work/data/miui_codes/build_home_rom/nuttx/drivers/rpmsg/rpmsg_port.c:751
rpmsg_port_spi_process_packet
/home/work/data/miui_codes/build_home_rom/nuttx/drivers/rpmsg/rpmsg_port_spi.c:572
nxtask_start
/home/work/data/miui_codes/build_home_rom/nuttx/sched/task/task_start.c:111

Signed-off-by: yintao <yintao@xiaomi.com>
Only the endpoint on one side should be destroyed when an NS destroy
message is received from that side.

Before:
edge1           hub                  edge2
destroy -->     edge1 destroy
                edge2 destroy
                (use-after-free)
                edge2 destroy    <-- destroy
                edge1 destroy

After:
edge1           hub                  edge2
destroy -->     edge1 destroy
                edge2 destroy    <-- destroy

Signed-off-by: yintao <yintao@xiaomi.com>
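The "After" behaviour can be modelled with a hub that tracks each side separately. `struct demo_hub` and `demo_hub_ns_destroy()` are illustrative names, not driver code:

```c
#include <stdbool.h>

/* Hypothetical hub state: index 0 is edge1, index 1 is edge2. */
struct demo_hub
{
  bool alive[2];
  int  destroyed[2]; /* how many times each side was torn down */
};

/* Fixed behaviour: an NS destroy from one edge tears down only that
 * edge's endpoint.  The other edge's endpoint stays valid until that edge
 * sends its own NS destroy, so no side is ever torn down twice.
 */
static void demo_hub_ns_destroy(struct demo_hub *hub, int from)
{
  if (hub->alive[from])
    {
      hub->alive[from] = false;
      hub->destroyed[from]++;
    }
}
```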
peer_ept is a better name than dst_ept

Signed-off-by: yintao <yintao@xiaomi.com>
edge = ept->priv;
if (edge == NULL)

if (edge)
Contributor

Suggested change
if (edge)
if (edge != NULL)

/* Retransmit data to dest edge core */

if (!dst_ept)
if (!peer_ept)
Contributor

Suggested change
if (!peer_ept)
if (peer_ept == NULL)



3 participants