@anchao
Contributor

@anchao anchao commented Mar 4, 2024

Summary

sched/group: move task group into task_tcb_s to improve performance

Move the task group into task_tcb_s so that the group no longer needs a separate trip to the allocator, which improves performance.

For task termination, the time consumed drops by about 2.4 us (TriCore TC397, 300 MHz):
15.97 us -> 13.55 us
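
The layout change can be pictured with a small host-side model. This is only a sketch under assumptions: `group_model`, `tcb_ptr_model`, `tcb_embedded_model`, and `counted_alloc` are hypothetical stand-ins rather than the NuttX definitions, and it assumes the previous layout reached the group through a pointer to a separately allocated block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the group data; not the NuttX definition. */

struct group_model
{
  int nmembers;
};

/* Assumed previous layout: the TCB only points at the group. */

struct tcb_ptr_model
{
  struct group_model *group;
};

/* Layout after this change: the group is embedded in the task TCB. */

struct tcb_embedded_model
{
  struct group_model group;
};

static int g_alloc_calls;  /* Counts trips to the allocator. */

static void *counted_alloc(size_t size)
{
  g_alloc_calls++;
  return calloc(1, size);
}

static struct tcb_ptr_model *create_with_pointer(void)
{
  struct tcb_ptr_model *tcb = counted_alloc(sizeof(*tcb));

  if (tcb != NULL)
    {
      tcb->group = counted_alloc(sizeof(*tcb->group));  /* Second call. */
      if (tcb->group == NULL)
        {
          free(tcb);
          return NULL;
        }
    }

  return tcb;
}

static struct tcb_embedded_model *create_embedded(void)
{
  /* One call covers both the TCB and its group. */

  return counted_alloc(sizeof(struct tcb_embedded_model));
}

static void destroy_with_pointer(struct tcb_ptr_model *tcb)
{
  assert(tcb != NULL);
  free(tcb->group);  /* Two frees mirror the two allocations. */
  free(tcb);
}

static void destroy_embedded(struct tcb_embedded_model *tcb)
{
  assert(tcb != NULL);
  free(tcb);  /* The embedded group goes away with the TCB. */
}
```

The embedded variant saves one allocation and one free per task, but it also ties the group's lifetime to the block that holds the TCB.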

In interval B: [image]

Signed-off-by: chao an <anchao@lixiang.com>

Impact

N/A

Testing

ci-check


/* Task Group *************************************************************/

struct task_group_s group; /* Pointer to shared task group data */
Contributor

But the main thread may exit before the other threads, in which case a use-after-free will happen.

Contributor Author

The task kills all pthreads in its group before exiting. Can the task exit earlier than its pthreads?

#0  nxsched_release_tcb (tcb=0x409f04 <nxtask_exithook+96>, ttype=255 '\377') at sched/sched_releasetcb.c:99
#1  0x000000000040a0e5 in nxtask_terminate (pid=5) at task/task_terminate.c:122
#2  0x00000000004087c7 in pthread_cancel (thread=5) at pthread/pthread_cancel.c:110
#3  0x000000000040867b in group_cancel_children_handler (pid=5, arg=0x4) at group/group_killchildren.c:113
#4  0x000000000040a342 in group_foreachchild (group=0x7ffff3db53e0, handler=0x408613 <group_cancel_children_handler>, arg=0x4) at group/group_foreachchild.c:70
#5  0x000000000040870e in group_kill_children (tcb=0x7ffff3db52a0) at group/group_killchildren.c:212
#6  0x0000000000407adb in _exit (status=0) at task/exit.c:67
#7  0x0000000000404222 in exit (status=0) at stdlib/lib_exit.c:126
#8  0x000000000040dffe in nxtask_startup (entrypt=0x444546 <hello_main>, argc=1, argv=0x7ffff3db58c0) at sched/task_startup.c:70
#9  0x00000000004076b4 in nxtask_start () at task/task_start.c:134
#10 0x0000000000413081 in pre_start () at sim/sim_initialstate.c:52

Contributor

exit terminates the whole process, but pthread_exit terminates only the current thread. You can reproduce the case as follows:

  1. Create a worker thread that calls printf in a loop
  2. Exit the main thread with pthread_exit
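
The two steps above could look roughly like the following. This is a sketch, not the reviewer's actual test: `worker`, `leader`, `repro_main`, and `WORKER_ITERATIONS` are hypothetical names, and the loop is bounded only so that the example terminates.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

#define WORKER_ITERATIONS 5  /* Bounded so the sketch terminates. */

static atomic_int g_iterations_done;

/* Step 1: the worker keeps calling printf() after its creator is gone. */

static void *worker(void *arg)
{
  (void)arg;

  for (int i = 0; i < WORKER_ITERATIONS; i++)
    {
      printf("worker: iteration %d\n", i);
      atomic_fetch_add(&g_iterations_done, 1);
      usleep(10 * 1000);
    }

  return NULL;
}

/* Step 2: create the worker, then end only the calling thread. */

static void *leader(void *arg)
{
  pthread_t tid;
  int ret;

  (void)arg;

  ret = pthread_create(&tid, NULL, worker, NULL);
  assert(ret == 0);
  (void)ret;
  pthread_detach(tid);

  pthread_exit(NULL);  /* Unlike exit(), this leaves the worker running. */
  return NULL;         /* Not reached. */
}

/* Hypothetical task entry point: the task's main thread runs leader(). */

int repro_main(int argc, char *argv[])
{
  (void)argc;
  (void)argv;

  leader(NULL);
  return 0;  /* Not reached: pthread_exit() does not return. */
}
```

With the group embedded in the main thread's TCB, the worker would keep using group data after the main thread's TCB has been released, which is the use-after-free described above.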

Contributor Author

Got it. I have an update that delays the release of the parent TCB: the parent TCB is now released only after all child threads have exited.
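
One way to picture the deferred release is a member count on the block that holds the parent TCB and its embedded group. This is a simplified model with hypothetical names (`leader_block_model`, `model_group_join`, `model_group_leave`), not the code in the update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model: the parent's TCB block embeds the group. */

struct leader_block_model
{
  int nmembers;  /* Threads (parent included) still in the group. */
};

static struct leader_block_model *model_leader_create(void)
{
  struct leader_block_model *blk = calloc(1, sizeof(*blk));

  if (blk != NULL)
    {
      blk->nmembers = 1;  /* The parent itself. */
    }

  return blk;
}

static void model_group_join(struct leader_block_model *blk)
{
  blk->nmembers++;
}

/* Called once by every exiting thread.
 * Returns true when the block was actually freed.
 */

static bool model_group_leave(struct leader_block_model *blk)
{
  assert(blk->nmembers > 0);

  if (--blk->nmembers > 0)
    {
      return false;  /* Others still use the embedded group: defer. */
    }

  free(blk);
  return true;
}
```

In this model the parent can leave first without freeing anything; the block is freed by whichever thread leaves last.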

@anchao anchao force-pushed the 24030401 branch 5 times, most recently from b4e2da5 to c64e763 Compare March 4, 2024 12:43
@xiaoxiang781216
Contributor

@anchao please rebase again; the latest master fixes the CI breakage.

@anchao anchao force-pushed the 24030401 branch 3 times, most recently from e8c56f0 to 79cdd63 Compare March 5, 2024 09:11
@anchao anchao force-pushed the 24030401 branch 3 times, most recently from d112a77 to 4349caa Compare March 6, 2024 10:26
@acassis acassis merged commit 29e50ff into apache:master Mar 10, 2024
@masayuki2009
Contributor

@anchao

I noticed the following crash with this PR.

rv-virt:ksmp64

$ /home/ishikawa/opensource/QEMU/qemu-8.2.0/build/qemu-system-riscv64 -semihosting -nographic -cpu rv64 -smp 8 -M virt,aclint=on -bios none -kernel nuttx

NuttShell (NSH) NuttX-12.0.0
nsh>
nsh>
nsh> uname -a
NuttX 12.0.0 29e50ffa73 Mar 11 2024 04:27:16 risc-v rv-virt
nsh> ps
  PID GROUP CPU PRI POLICY   TYPE    NPX STATE    EVENT     SIGMASK           STACK   USED  FILLED COMMAND
    0     0   0   0 FIFO     Kthread   - Assigned           0000000000000000 003056 001624  53.1%  CPU0 IDLE
    1     1   1   0 FIFO     Kthread   - Running            0000000000000000 003056 001136  37.1%  CPU1 IDLE
    2     2   2   0 FIFO     Kthread   - Running            0000000000000000 003056 001136  37.1%  CPU2 IDLE
    3     3   3   0 FIFO     Kthread   - Running            0000000000000000 003056 001136  37.1%  CPU3 IDLE
    4     4 --- 100 RR       Kthread   - Waiting  Semaphore 0000000000000000 001968 000704  35.7%  lpwork 0x80205ca0 0x80205cc8
    5     5   0 100 RR       Task      - Running            0000000000000000 003008 001416  47.0%  /system/bin/init
nsh> free
                   total       used       free    maxused    maxfree  nused  nfree
        Kmem:    2049024      15360    2033664      34384    2031824     32      3
        Page:    4194304     602112    3592192    3592192
nsh> hello
Hello, World!!
[    1.340000] [CPU1] riscv_exception: EXCEPTION: Load access fault. MCAUSE: 0000000000000005, EPC: 0000000080004bf6, MTVAL: 0000000000000008
[    1.340000] [CPU1] riscv_exception: PANIC!!! Exception = 0000000000000005

sabre-6quad:netnsh_smp

$ /home/ishikawa/opensource/QEMU/qemu-8.2.0/build/qemu-system-arm -semihosting -M sabrelite -m 1024 -smp 4 -kernel nuttx -nographic -net nic -net user,hostfwd=tcp::20023-:23,hostfwd=tcp::20021-:21,hostfwd=tcp::25001-:5001,hostfwd=tcp::25555-:5555
telnetd [7:100]

NuttShell (NSH) NuttX-12.0.0
nsh> uname -a
NuttX  12.0.0 29e50ffa73 Mar 11 2024 04:11:35 arm sabre-6quad
nsh> ps
  PID GROUP CPU PRI POLICY   TYPE    NPX STATE    EVENT     SIGMASK           STACK   USED  FILLED COMMAND
    0     0   0   0 FIFO     Kthread   - Assigned           0000000000000000 002032 000568  27.9%  CPU0 IDLE
    1     1   1   0 FIFO     Kthread   - Running            0000000000000000 002032 000664  32.6%  CPU1 IDLE
    2     2   2   0 FIFO     Kthread   - Running            0000000000000000 002032 000664  32.6%  CPU2 IDLE
    3     3   3   0 FIFO     Kthread   - Running            0000000000000000 002032 000664  32.6%  CPU3 IDLE
    4     4 --- 224 RR       Kthread   - Waiting  Semaphore 0000000000000000 001984 000432  21.7%  hpwork 0x10843830 0x10843844
    5     5 --- 100 RR       Kthread   - Waiting  Semaphore 0000000000000000 001984 000432  21.7%  lpwork 0x10843808 0x1084381c
    6     6   0 100 RR       Task      - Running            0000000000000000 003032 001264  41.6%  nsh_main
    7     7 --- 100 RR       Task      - Waiting  Semaphore 0000000000000000 002016 000760  37.6%  telnetd
nsh> free
                   total       used       free    maxused    maxfree  nused  nfree
        Umem: 1065033304      17520 1065015784      17616 1065015784     38      1
nsh> ifconfig
eth0	Link encap:Ethernet HWaddr 00:e0:de:ad:be:ef at UP mtu 1500
	inet addr:10.0.2.15 DRaddr:10.0.2.2 Mask:255.255.255.0

             IPv4   TCP   UDP  ICMP
Received     0000  0000  0000  0000
Dropped      0000  0000  0000  0000
  IPv4        VHL: 0000   Frg: 0000
  Checksum   0000  0000  0000  ----
  TCP         ACK: 0000   SYN: 0000
              RST: 0000  0000
  Type       0000  ----  ----  0000
Sent         0000  0000  0000  0000
  Rexmit     ----  0000  ----  ----
nsh> renew eth0
[    1.010000] [CPU0] arm_prefetchabort: Prefetch abort. PC: 00100000 IFAR: 00100000 IFSR: 0000000d
Traceback (most recent call last):

esp32:wifi_smp (dev board)

nsh> uname -a
NuttX  12.0.0 29e50ffa73 Mar 11 2024 04:01:32 xtensa esp32-devkitc
nsh> help
help usage:  help [-v] [<cmd>]

    .           cp          exit        kill        printf      true        
    [           cmp         expr        ls          ps          truncate    
    ?           dirname     false       md5         pwd         uname       
    alias       date        fdinfo      mkdir       rm          umount      
    unalias     dd          free        mkrd        rmdir       unset       
    arp         df          help        mount       set         uptime      
    basename    dmesg       hexdump     mv          sleep       usleep      
    break       echo        ifconfig    nfsmount    source      wget        
    cat         env         ifdown      nslookup    test        xd          
    cd          exec        ifup        pidof       time        

Builtin Apps:
    telnetd       ntpcstart     ping          taskset       
    wapi          ostest        getprime      hello         
    webserver     sh            renew         nsh           
    ntpcstatus    ntpcstop      iperf         smp           
nsh> ps
  PID GROUP CPU PRI POLICY   TYPE    NPX STATE    EVENT     SIGMASK           STACK   USED  FILLED COMMAND
    0     0   0   0 FIFO     Kthread   - Assigned           0000000000000000 002032 000720  35.4%  CPU0 IDLE
    1     1   1   0 FIFO     Kthread   - Running            0000000000000000 002024 000432  21.3%  CPU1 IDLE
    2     2 --- 100 RR       Kthread   - Waiting  Semaphore 0000000000000000 001976 000680  34.4%  lpwork 0x3ffb0c10 0x3ffb0c24
    3     3   0 100 RR       Task      - Running            0000000000000000 003024 001696  56.0%  nsh_main
    4     4 --- 255 RR       Kthread   - Waiting  Semaphore 0000000000000000 000696 000344  49.4%  spiflash_op 0x3ffe0f00
    5     5 --- 255 RR       Kthread   - Waiting  Semaphore 0000000000000000 000696 000344  49.4%  spiflash_op 0x3ffe0f00
    6     6 --- 223 RR       Kthread   - Waiting  Semaphore 0000000000000000 001992 000664  33.3%  rt_timer
    7     7 --- 253 RR       Kthread   - Waiting  MQ empty  0000000000000000 006624 001184  17.8%  wifi
    8     8 --- 100 RR       Task      - Waiting  Semaphore 0000000000000000 002008 001016  50.5%  telnetd
nsh> free
                   total       used       free    maxused    maxfree  nused  nfree
        Umem:     238756      63244     175512      63308     110000    117      2
nsh> hello
Hello, World!!
[    3.830000] [CPU0] xtensa_panic: Unhandled Exception 2 task: hello
[    3.830000] [CPU0] _assert: Current Version: NuttX  12.0.0 29e50ffa73 Mar 11 2024 04:01:32 xtensa
[    3.830000] [CPU0] _assert: Assertion failed panic: at file: common/xtensa_assert.c:84 task(CPU0): hello process: Kernel 0x40104d00
[    3.830000] [CPU0] up_dump_register:    PC: 00000000    PS: 00000016
[    3.830000] [CPU0] up_dump_register:    A0: 00000000    A1: 60035cf8    A2: 00000001    A3: 60035d60
[    3.830000] [CPU0] up_dump_register:    A4: 3ffd08c0    A5: 3ffb0a92    A6: 00000002    A7: 00000000
[    3.830000] [CPU0] up_dump_register:    A8: 800d88a0    A9: 3ffd0880   A10: 00000000   A11: 00000000
[    3.830000] [CPU0] up_dump_register:   A12: 00000000   A13: 3f000000   A14: 00000000   A15: 3ffe0e80
[    3.830000] [CPU0] up_dump_register:   SAR: 0000002b CAUSE: 00000014 VADDR: 00000000
[    3.830000] [CPU0] up_dump_register:  LBEG: 3ffb0c7c  LEND: 3ffcfda0  LCNT: 00060023

@masayuki2009
Contributor

@anchao
Let me revert this PR until the issues are resolved.

@anchao
Contributor Author

anchao commented Mar 11, 2024

> I noticed the following crash with this PR.

@masayuki2009 Sorry for the regression, and thanks again for verifying. I have a fix that is being verified internally, and I will send a PR later.

@husong2

husong2 commented Nov 29, 2025

@anchao @xiaoxiang781216
If there is a sleep call in a child thread, a crash will occur. The test case is as follows:

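#include <pthread.h>
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include "posixtest.h"		/* LTP header that defines PTS_PASS, PTS_FAIL, PTS_UNRESOLVED */

#define TIMEOUT 5		/* Timeout value of 5 seconds. */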
#define INTHREAD 0		/* Control going to or is already for Thread */
#define INMAIN 1		/* Control going to or is already for Main */

static int sem1;			/* Manual semaphore */

static void *a_thread_func(void *arg)
{
	printf("a_thread_func entry \n");

	/* Indicate to main() that the thread was created. */
	sem1 = INTHREAD;

	/* Wait for main to change the attribute object and try to detach this thread.
	 * Wait for TIMEOUT seconds before timing out if the thread could not
	 * be detached. */
	// sleep(TIMEOUT);
	printf("a_thread_func entry 2222\n");
	printf
	    ("Test FAILED: Did not detach the thread, main still waiting for it to end execution.\n");
	pthread_exit((void *)PTS_FAIL);
	return NULL;
}

int main(void)
{
	pthread_t new_th;
	pthread_attr_t new_attr;
	int ret_val;

	/* Initializing */
	sem1 = INMAIN;
	if (pthread_attr_init(&new_attr) != 0) {
		perror("Cannot initialize attribute object\n");
		return PTS_UNRESOLVED;
	}

	/* Create a new thread passing it the new attribute object */
	if (pthread_create(&new_th, &new_attr, a_thread_func, NULL) != 0) {
		perror("Error creating thread\n");
		return PTS_UNRESOLVED;
	}

	/* Wait for thread to indicate that the start routine for the thread has started. */
	while (sem1 == INMAIN)
		sleep(1);

	/* If pthread_detach fails, that means that the test fails as well. */
	ret_val = pthread_detach(new_th);

	if (ret_val != 0) {
		/* Thread is already detached. */
		if (ret_val == EINVAL) {
			printf("Test FAILED\n");
			return PTS_FAIL;
		}
		/* pthread_detach() failed for another reason. */
		else {
			printf("Error in pthread_detach(), error: %d\n",
			       ret_val);
			return PTS_UNRESOLVED;
		}
	}

	printf("Test PASSED\n");

	printf("Test PASSED 2222 \n");

	return PTS_PASS;

}

The specific sequence is as follows. When the process starts to exit by calling exit(), the main thread is removed from the group. The call chain is as follows:

#0  group_leave (tcb=tcb@entry=0x412a1470) at ../../nuttx/sched/group/group_leave.c:195
#1  0x0012a32e in nxtask_exithook (tcb=tcb@entry=0x412a1470, status=status@entry=0) at ../../nuttx/sched/task/task_exithook.c:481
#2  0x0012ad78 in _exit (status=112, status@entry=1093276784) at ../../nuttx/sched/task/exit.c:106
#3  0x001bb22c in exit (status=1093276784) at ../../nuttx/libs/libc/stdlib/lib_exit.c:126
#4  0x001b92bc in nxtask_startup (entrypt=entrypt@entry=0x412a1470, argc=<optimized out>, argv=<optimized out>) at ../../nuttx/libs/libc/sched/task_startup.c:66
#5  0x00129e14 in nxtask_start () at ../../nuttx/sched/task/task_start.c:104
#6  0x00000000 in ?? ()

Then, when the child thread calls sleep(), it also enters the exit path: sleep() is a cancellation point, and the call chain shows leave_cancellation_point() calling pthread_exit(). The child thread is removed from the group and its TCB is freed. Because the TCB and the group are allocated together, the group's memory is freed along with the TCB. The call chains are as follows:

#0  group_leave (tcb=tcb@entry=0x412a1a28) at ../../nuttx/sched/group/group_leave.c:195
#1  0x0012a32e in nxtask_exithook (tcb=tcb@entry=0x412a1a28, status=status@entry=0) at ../../nuttx/sched/task/task_exithook.c:481
#2  0x001209b0 in nx_pthread_exit (exit_value=exit_value@entry=0xffffffff) at ../../nuttx/sched/pthread/pthread_exit.c:123
#3  0x00116726 in pthread_exit (exit_value=exit_value@entry=0xffffffff) at ../../nuttx/libs/libc/pthread/pthread_exit.c:71
#4  0x00111084 in leave_cancellation_point () at ../../nuttx/libs/libc/sched/task_cancelpt.c:200
#5  0x00134238 in clock_nanosleep (clockid=clockid@entry=0, flags=flags@entry=0, rqtp=rqtp@entry=0x412a2ec8, rmtp=rmtp@entry=0x412a2ed8)
    at ../../nuttx/sched/signal/sig_nanosleep.c:208
#6  0x004256c0 in sleep (seconds=seconds@entry=5) at ../../nuttx/libs/libc/unistd/lib_sleep.c:112
#7  0x00211d0e in a_thread_func (arg=<error reading variable: value has been optimized out>)
    at ../../apps/testing/ltp/ltp/testcases/open_posix_testsuite/conformance/interfaces/pthread_attr_init/2-1.c:46
#8  0x004089e0 in pthread_startup (entry=<optimized out>, arg=<optimized out>) at ../../nuttx/libs/libc/pthread/pthread_create.c:61
#9  0x00445baa in pthread_start () at ../../nuttx/sched/pthread/pthread_create.c:147
#10 0x00000000 in ?? ()

#0  umm_delayfree (mem=mem@entry=0x4129fa28) at ../../nuttx/mm/umm_heap/umm_free.c:69
#1  0x00122c7e in nxsched_release_tcb (tcb=tcb@entry=0x4129fa28, ttype=1)
    at ../../nuttx/sched/sched/sched_releasetcb.c:213
#2  0x001d3a70 in nxtask_exit () at ../../nuttx/sched/task/task_exit.c:128
#3  0x0013b568 in up_exit (status=status@entry=0) at ../../nuttx/arch/arm/src/common/arm_exit.c:59
#4  0x00120808 in nx_pthread_exit (exit_value=exit_value@entry=0xffffffff)
    at ../../nuttx/sched/pthread/pthread_exit.c:131
#5  0x00116636 in pthread_exit (exit_value=exit_value@entry=0xffffffff)
    at ../../nuttx/libs/libc/pthread/pthread_exit.c:71
#6  0x00110f94 in leave_cancellation_point () at ../../nuttx/libs/libc/sched/task_cancelpt.c:204
#7  0x001340c0 in clock_nanosleep (clockid=clockid@entry=0, flags=flags@entry=0, rqtp=rqtp@entry=0x412a0ec8,
    rmtp=rmtp@entry=0x412a0ed8) at ../../nuttx/sched/signal/sig_nanosleep.c:208
#8  0x00424ef0 in sleep (seconds=seconds@entry=5) at ../../nuttx/libs/libc/unistd/lib_sleep.c:112
#9  0x0021153e in a_thread_func (arg=<error reading variable: value has been optimized out>)
    at ../../apps/testing/ltp/ltp/testcases/open_posix_testsuite/conformance/interfaces/pthread_attr_init/2-1.c:46
#10 0x00408210 in pthread_startup (entry=<optimized out>, arg=<optimized out>)
    at ../../nuttx/libs/libc/pthread/pthread_create.c:61
#11 0x004453da in pthread_start () at ../../nuttx/sched/pthread/pthread_create.c:147
#12 0x00000000 in ?? ()

Finally, the process continues its exit by calling up_exit(). This path also frees the TCB, resulting in a double free. The call chain is as follows:

#0  __assert (filename=filename@entry=0x4d1504 "../../nuttx/mm/mm_heap/mm_free.c", linenum=linenum@entry=67, msg=msg@entry=0x0) at ../../nuttx/libs/libc/assert/lib_assert.c:39
#1  0x00117e8e in add_delaylist (heap=heap@entry=0x4129e000, mem=mem@entry=0x4129f470, asan_check=false) at ../../nuttx/mm/mm_heap/mm_free.c:67
#2  0x001186e4 in mm_delayfree (heap=0x4129e000, mem=mem@entry=0x4129f470) at ../../nuttx/mm/mm_heap/mm_free.c:249
#3  0x00117c1e in umm_delayfree (mem=mem@entry=0x4129f470) at ../../nuttx/mm/umm_heap/umm_free.c:70
#4  0x00122cae in nxsched_release_tcb (tcb=tcb@entry=0x4129f470, ttype=0) at ../../nuttx/sched/sched/sched_releasetcb.c:213
#5  0x001d3aa0 in nxtask_exit () at ../../nuttx/sched/task/task_exit.c:128
#6  0x0013b598 in up_exit (status=status@entry=0) at ../../nuttx/arch/arm/src/common/arm_exit.c:59
#7  0x0012ac00 in _exit (status=112, status@entry=1093268592) at ../../nuttx/sched/task/exit.c:114
#8  0x001bb0e4 in exit (status=1093268592) at ../../nuttx/libs/libc/stdlib/lib_exit.c:126
#9  0x001b9174 in nxtask_startup (entrypt=entrypt@entry=0x4129f470, argc=<optimized out>, argv=<optimized out>) at ../../nuttx/libs/libc/sched/task_startup.c:66
#10 0x00129c74 in nxtask_start () at ../../nuttx/sched/task/task_start.c:104
#11 0x00000000 in ?? ()
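
The sequence described above can be condensed into a toy model. It is one reading of the report, not NuttX code: `block_model`, `model_release`, `model_thread_leaves_group`, and `model_task_exit` are hypothetical names, and a flag stands in for the real free so that the second release can be observed safely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum release_result
{
  RELEASE_OK,
  RELEASE_DOUBLE_FREE
};

/* Hypothetical model of the single block holding the main TCB and group. */

struct block_model
{
  bool live;     /* False once the block has been released. */
  int nmembers;  /* Threads still in the group. */
};

static enum release_result model_release(struct block_model *blk)
{
  assert(blk != NULL);

  if (!blk->live)
    {
      return RELEASE_DOUBLE_FREE;  /* A real heap may assert here. */
    }

  blk->live = false;  /* Stand-in for the actual free. */
  return RELEASE_OK;
}

/* Path 1: a thread leaves the group; the last one out releases the
 * storage that holds the group.
 */

static enum release_result model_thread_leaves_group(struct block_model *blk)
{
  if (--blk->nmembers == 0)
    {
      return model_release(blk);
    }

  return RELEASE_OK;
}

/* Path 2: the exiting task releases its own TCB unconditionally. */

static enum release_result model_task_exit(struct block_model *blk)
{
  return model_release(blk);
}
```

Starting with two members, the main thread leaves the group, the child thread leaves and triggers the release, and the later task exit then reports `RELEASE_DOUBLE_FREE`, mirroring the assertion in add_delaylist() above.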
