@GUIDINGLI
Contributor

Summary

This is an amendment to:
0169a51

which was caused by a wrong merge operation

related PR:
#5504

Impact

Taking the mm semaphore in IRQ context

Testing

VELA

@masayuki2009
Contributor

@GUIDINGLI

The issue still happens.

NuttShell (NSH) NuttX-3.6.1
nsh> uname -a
NuttX  3.6.1 3aef1a7012 Feb 18 2022 15:28:54 arm spresense
nsh> ps
  PID GROUP CPU PRI POLICY   TYPE    NPX STATE    EVENT     SIGMASK   STACK   USED  FILLED COMMAND
    0     0   0   0 FIFO     Kthread N-- Assigned           00000000 001000 000488  48.8%  CPU0 IDLE
    1     1   1   0 FIFO     Kthread N-- Running            00000000 001000 000228  22.8%  CPU1 IDLE
    2     2 --- 224 RR       Kthread --- Waiting  Signal    00000000 002016 000468  23.2%  hpwork 0x2d05d378
    3     3 ---  60 RR       Kthread --- Waiting  Semaphore 00000000 002016 000292  14.4%  lpwork 0x2d05d384
    5     5 --- 200 RR       Task    --- Waiting  MQ empty  00000000 001000 000496  49.6%  cxd56_pm_task
    6     6   0 100 RR       Task    --- Running            00000000 003048 001188  38.9%  spresense_main
nsh> free
                   total       used       free    largest  nused  nfree
        Umem:    1176144      42240    1133904    1131840    111      2
nsh> mount
  /mnt/spif type smartfs
  /proc type procfs
nsh> ifconfig
wlan0	Link encap:Ethernet HWaddr 00:00:00:00:00:00 at UP
	inet addr:0.0.0.0 DRaddr:0.0.0.0 Mask:0.0.0.0

nsh> gs2200m raspi3-g wifi-test-24g & 
gs2200m [7:50]
nsh> renew wlan0
nsh> ntpcstart
Started the NTP daemon as PID=24
nsh> ifconfig
wlan0	Link encap:Ethernet HWaddr 3c:95:09:00:89:96 at UP
	inet addr:192.168.10.22 DRaddr:192.168.10.1 Mask:255.255.255.0

nsh> mount
  /mnt/sd0 type vfat
  /mnt/spif type smartfs
  /proc type procfs
nsh> telnetd
nsh> webserver &
webserver [26:100]
nsh> Starting webserver
date
Fri, Feb 18 06:31:40 2022
nsh> nxplayer
NxPlayer version 1.05
h for commands, q to exit

nxplayer> play http://192.168.10.11/~ishikawa/audio/01-Technopolis-48k.wav
nxplayer> [   26.557831] [CPU1] arm_hardfault: Hard Fault escalation:
[   26.560272] [CPU1] arm_hardfault: PANIC!!! Hard Fault!:[   26.565308] [CPU1] arm_hardfault: 	IRQ: 3 regs: 0x2d095e44
[   26.570770] [CPU1] arm_hardfault: 	BASEPRI: 000000e0 PRIMASK: 00000000 IPSR: 00000003 CONTROL: 00000000
[   26.580139] [CPU1] arm_hardfault: 	CFSR: 00008200 HFSR: 40000000 DFSR: 00000000 BFAR: 02a9fa64 AFSR: 00000000
[   26.590057] [CPU1] arm_hardfault: Hard Fault Reason:
[   26.595001] [CPU1] up_assert: Assertion failed CPU1 at file:armv7-m/arm_hardfault.c line: 174 task: Telnet session
[   26.605315] [CPU1] arm_registerdump: R0: 2d0946d8 R1: 2d095f68 R2: 02a9fa5c  R3: 00001260
[   26.613464] [CPU1] arm_registerdump: R4: 2d0946d8 R5: 2d060c50 R6: 0d009355  FP: 2d094208
[   26.621642] [CPU1] arm_registerdump: R8: 2d095f68 SB: 2d060d98 SL: 00000048 R11: 0d049b08
[   26.629790] [CPU1] arm_registerdump: IP: 2d095f80 SP: 2d095f18 LR: 0d00952d  PC: 0d009384
[   26.637969] [CPU1] arm_registerdump: xPSR: 21000000 BASEPRI: 000000e0 CONTROL: 00000000
[   26.645934] [CPU1] arm_registerdump: EXC_RETURN: ffffffe9
[   26.651335] [CPU1] arm_dump_stack: IRQ Stack:
[   26.655669] [CPU1] arm_dump_stack: sp:     2d05ce78
[   26.660521] [CPU1] arm_dump_stack:   base: 2d05c6f8
[   26.665373] [CPU1] arm_dump_stack:   size: 00000800
[   26.670256] [CPU1] arm_stackdump: 2d05ce60: 2d05fea8 2d0706c0 2d05fea8 2d05ce78 2d070740 0d00a1a5 00000001 00000000
[   26.680662] [CPU1] arm_stackdump: 2d05ce80: 0d009384 0d008fe5 00000000 2d05c6f8 2d05cea8 40000000 00000003 2d095e44
[   26.691068] [CPU1] arm_stackdump: 2d05cea0: 00008200 0d0077a1 00000080 0d001f07 40000000 00000000 02a9fa64 00000000
[   26.701505] [CPU1] arm_stackdump: 2d05cec0: 0d001c25 00000000 2d05fea8 00000003 00000000 0d00406f 00000000 0d001e3f
[   26.711911] [CPU1] arm_dump_stack: User Stack:
[   26.716336] [CPU1] arm_dump_stack: sp:     2d095f18
[   26.721189] [CPU1] arm_dump_stack:   base: 2d095970
[   26.726041] [CPU1] arm_dump_stack:   size: 000007e8
[   26.730923] [CPU1] arm_stackdump: 2d095f00: 00000000 00000000 00000000 00000000 00000000 0d00919f 2d0946d8 0d00952d
[   26.741330] [CPU1] arm_stackdump: 2d095f20: 2d095f68 2d060c50 2d070e58 2d095f60 2d072098 0d0093f5 0d0093d1 000003b8
[   26.751736] [CPU1] arm_stackdump: 2d095f40: 2d072958 0d00e1f3 2d095f60 0d049af5 0d049aef 0d049afa 0d049af4 0d049aee
[   26.762173] [CPU1] arm_stackdump: 2d095f60: 00000000 00000000 00000000 00000005 00000100 00000130 00033668 000002c0
[   26.772579] [CPU1] arm_stackdump: 2d095f80: 00000018 2d072050 2d09467c 00000000 2d072050 00000003 00000000 0d058b06
[   26.782985] [CPU1] arm_stackdump: 2d095fa0: 00000000 0d03437b 00000003 00000400 00000400 2d072958 2d094210 0d034385
[   26.793422] [CPU1] arm_stackdump: 2d095fc0: 2d094210 0d02f229 2d094210 2d09467c 0d059741 2d094210 2d09605c 00000000
[   26.803829] [CPU1] arm_stackdump: 2d095fe0: ffffffff 00000001 00000000 00000004 00000000 0d02b90d 00000000 2d070740
[   26.814235] [CPU1] arm_stackdump: 2d096000: 2d05d75c 0d004cb5 00000001 0d004e11 00000001 0d004e11 2d05ff0d 2d05ff0c
[   26.824641] [CPU1] arm_stackdump: 2d096020: 2d05ff0d 2d094210 2d09467c 00000001 00000000 00000000 00000000 00000004
[   26.835078] [CPU1] arm_stackdump: 2d096040: 00000000 0d02c3c7 00000000 0d004e11 2d05ff0d 2d094681 00000000 2d09467c
[   26.845484] [CPU1] arm_stackdump: 2d096060: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   26.855891] [CPU1] arm_stackdump: 2d096080: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   26.866327] [CPU1] arm_stackdump: 2d0960a0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   26.876734] [CPU1] arm_stackdump: 2d0960c0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   26.887140] [CPU1] arm_stackdump: 2d0960e0: 00000000 00000000 00000000 00000000 00000000 00000000 2d09467c 2d094210
[   26.897577] [CPU1] arm_stackdump: 2d096100: 2d09467c 0d058b7d 00000000 00000000 00000000 00000000 00000000 0d02fea5
[   26.907983] [CPU1] arm_stackdump: 2d096120: 00000001 2d095958 0d02fe29 00000101 00000000 0d0079d3 00000001 2d095958
[   26.918450] [CPU1] arm_showtasks:    PID    PRI      USED     STACK   FILLED    COMMAND
[   26.926385] [CPU1] arm_showtasks:   ----   ----       480      2048    23.4%    irq
[   26.934075] [CPU1] arm_dump_task:      0      0       488      1000    48.8%    CPU0 IDLE
[   26.942223] [CPU1] arm_dump_task:      1      0       228      1000    22.8%    CPU1 IDLE
[   26.950402] [CPU1] arm_dump_task:      2    224       644      2016    31.9%    hpwork
[   26.958306] [CPU1] arm_dump_task:      3     60       804      2016    39.8%    lpwork
[   26.966149] [CPU1] arm_dump_task:      5    200       496      1000    49.6%    cxd56_pm_task
[   26.974724] [CPU1] arm_dump_task:      6    100      1580      3048    51.8%    spresense_main
[   26.983269] [CPU1] arm_dump_task:      7     50      1556      2000    77.8%    gs2200m
[   26.991233] [CPU1] arm_dump_task:     24    100      1764      1976    89.2%!   NTP daemon
[   26.999534] [CPU1] arm_dump_task:     25    100       624      2008    31.0%    Telnet daemon
[   27.008048] [CPU1] arm_dump_task:     26    100       628      2024    31.0%    webserver
[   27.016227] [CPU1] arm_dump_task:     27    100      1092      3048    35.8%    nxplayer
[   27.024314] [CPU1] arm_dump_task:     28    246       860      3072    27.9%    playthread
[   27.032462] [CPU1] arm_dump_task:     29    252       500      1024    48.8%    cxd56
[   27.040274] [CPU1] arm_dump_task:     30    100       556      1000    55.6%    telnet_io
[   27.048453] [CPU1] arm_dump_task:     63    100      1468      2024    72.5%    Telnet session
NuttShell (NSH) NuttX-3.6.1
nsh> uname -a
NuttX  3.6.1 3aef1a7012 Feb 18 2022 15:28:54 arm spresense
nsh> cat /proc/uptime
     24.74
nsh> ps
  PID GROUP CPU PRI POLICY   TYPE    NPX STATE    EVENT     SIGMASK   STACK   USED  FILLED COMMAND
    0     0   0   0 FIFO     Kthread N-- Running            00000000 001000 000488  48.8%  CPU0 IDLE
    1     1   1   0 FIFO     Kthread N-- Assigned           00000000 001000 000228  22.8%  CPU1 IDLE
    2     2 --- 224 RR       Kthread --- Waiting  Semaphore 00000000 002016 000644  31.9%  hpwork 0x2d05d378
    3     3 ---  60 RR       Kthread --- Waiting  Semaphore 00000000 002016 000804  39.8%  lpwork 0x2d05d384
   52    52   1 100 RR       Task    --- Running            00000000 002024 001468  72.5%  Telnet session
    5     5 --- 200 RR       Task    --- Waiting  MQ empty  00000000 001000 000496  49.6%  cxd56_pm_task
    6     6 --- 100 RR       Task    --- Waiting  Semaphore 00000000 003048 001580  51.8%  spresense_main
    7     7 ---  50 RR       Task    --- Waiting  Semaphore 00000000 002000 001556  77.8%  gs2200m raspi3-g wifi-test-24g
   24    24 --- 100 RR       Task    --- Waiting  Signal    00000000 001976 001764  89.2%! NTP daemon 0.pool.ntp.org;1.pool.ntp.org;
   25    25 --- 100 RR       Task    --- Waiting  Semaphore 00000000 002008 000624  31.0%  Telnet daemon 0x2d06bc60
   26    26 --- 100 RR       Task    --- Waiting  Semaphore 00000000 002024 000628  31.0%  webserver
   27    27 --- 100 RR       Task    --- Waiting  Semaphore 00000000 003048 001092  35.8%  nxplayer
   28    27 --- 246 RR       pthread --- Waiting  Semaphore 00000000 003072 000860  27.9%  playthread 0x2d06fbf0
   29    27 --- 252 RR       pthread --- Waiting  MQ empty  00000000 001024 000500  48.8%  cxd56 0x2d069ae0
   30    30 --- 100 RR       Kthread --- Waiting  Semaphore 00000000 001000 000556  55.6%  telnet_io
nsh> free
                   total       used       free    largest  nused  nfree
        Umem:    1176144     209424     966720     958112    253     12
nsh> ifconfig
wlan0	Link encap:Ethernet HWaddr 3c:95:09:00:89:96 at UP
	inet addr:192.168.10.22 DRaddr:192.168.10.1 Mask:255.255.255.0

nsh> exit
Connection closed by foreign host.
Trying 192.168.10.22...
Connected to 192.168.10.22.
Escape character is '^]'.

NuttShell (NSH) NuttX-3.6.1
nsh> uname -a
NuttX  3.6.1 3aef1a7012 Feb 18 2022 15:28:54 arm spresense
nsh> cat /proc/uptime
     26.36
nsh> ps
  PID GROUP CPU PRI POLICY   TYPE    NPX STATE    EVENT     SIGMASK   STACK   USED  FILLED COMMAND
    0     0   0   0 FIFO     Kthread N-- Running            00000000 001000 000488  48.8%  CPU0 IDLE
    1     1   1   0 FIFO     Kthread N-- Assigned           00000000 001000 000228  22.8%  CPU1 IDLE
    2     2 --- 224 RR       Kthread --- Waiting  Semaphore 00000000 002016 000644  31.9%  hpwork 0x2d05d378
    3     3 ---  60 RR       Kthread --- Waiting  Semaphore 00000000 002016 000804  39.8%  lpwork 0x2d05d384
    5     5 --- 200 RR       Task    --- Waiting  MQ empty  00000000 001000 000496  49.6%  cxd56_pm_task
    6     6 --- 100 RR       Task    --- Waiting  Semaphore 00000000 003048 001580  51.8%  spresense_main
    7     7 ---  50 RR       Task    --- Waiting  Semaphore 00000000 002000 001556  77.8%  gs2200m raspi3-g wifi-test-24g
   24    24 --- 100 RR       Task    --- Waiting  Signal    00000000 001976 001764  89.2%! NTP daemon 0.pool.ntp.org;1.pool.ntp.org;
   25    25 --- 100 RR       Task    --- Waiting  Semaphore 00000000 002008 000624  31.0%  Telnet daemon 0x2d06bc60
   26    26 --- 100 RR       Task    --- Waiting  Semaphore 00000000 002024 000628  31.0%  webserver
   27    27 --- 100 RR       Task    --- Waiting  Semaphore 00000000 003048 001092  35.8%  nxplayer
   28    27 --- 246 RR       pthread --- Waiting  Semaphore 00000000 003072 000860  27.9%  playthread 0x2d06fbf0
   29    27   0 252 RR       pthread --- Running            00000000 001024 000500  48.8%  cxd56 0x2d069ae0
   30    30 --- 100 RR       Kthread --- Waiting  Semaphore 00000000 001000 000556  55.6%  telnet_io
   63    63   1 100 RR       Task    --- Running            00000000 002024 001468  72.5%  Telnet session
nsh> free
Traceback (most recent call last):
  File "./expect_nuttx_telnetd_test.py", line 23, in <module>
    child.expect('nsh> ')
  File "/home/ishikawa/.local/lib/python3.6/site-packages/pexpect/spawnbase.py", line 344, in expect
    timeout, searchwindowsize, async_)
  File "/home/ishikawa/.local/lib/python3.6/site-packages/pexpect/spawnbase.py", line 372, in expect_list
    return exp.expect_loop(timeout)
  File "/home/ishikawa/.local/lib/python3.6/site-packages/pexpect/expect.py", line 181, in expect_loop
    return self.timeout(e)
  File "/home/ishikawa/.local/lib/python3.6/site-packages/pexpect/expect.py", line 144, in timeout
    raise exc
pexpect.exceptions.TIMEOUT: Timeout exceeded.

@masayuki2009
Contributor

masayuki2009 commented Feb 18, 2022

@GUIDINGLI
I noticed that we can reproduce the memory corruption issue with sabre-6quad:netnsh_smp (QEMU)

$ ~/opensource/QEMU/qemu-5.2/build/qemu-system-arm -net nic -net user,hostfwd=tcp:127.0.0.1:10023-10.0.2.15:23,hostfwd=tcp:127.0.0.1:10021-10.0.2.15:21,hostfwd=tcp:127.0.0.1:15001-10.0.2.15:5001 -M sabrelite -smp 4 -kernel ./nuttx -nographic

ABCDGHIJKNOPQ

NuttShell (NSH) NuttX-3.6.1
nsh> nfsmount 43.31.77.50 /mnt/nfs /exports-nuttx 
nsh> md5 -f /mnt/nfs/audio/xxx.wav 
[   37.200000] [CPU1] up_assert: Assertion failed CPU1 at file:inode/fs_files.c line: 75 task: Telnet daemon
[   37.200000] [CPU1] arm_registerdump: R0: 00000001 R1: 00000000 R2: 00000000  R3: 00000000
[   37.210000] [CPU1] arm_registerdump: R4: 108608a0 R5: 10853500 R6: 1086091c  R7: 10861380
[   37.210000] [CPU1] arm_registerdump: R8: 00000000 SB: 00000018 SL: 00000000  FP: 00000001
[   37.210000] [CPU1] arm_registerdump: IP: 00000001 SP: 10861380 LR: 10809ac0  PC: 1080a22c
[   37.210000] [CPU1] arm_registerdump: CPSR: 60000053
[   37.210000] [CPU1] arm_dump_stack: IRQ Stack:
[   37.210000] [CPU1] arm_dump_stack: sp:     10861380
[   37.210000] [CPU1] arm_dump_stack:   base: 10853d10
[   37.210000] [CPU1] arm_dump_stack:   size: 00000800
[   37.210000] [CPU1] arm_dump_stack: ERROR: IRQ Stack pointer is not within the stack
[   37.220000] [CPU1] arm_dump_stack: User Stack:
[   37.220000] [CPU1] arm_dump_stack: sp:     10861380
[   37.220000] [CPU1] arm_dump_stack:   base: 10860d30
[   37.220000] [CPU1] arm_dump_stack:   size: 000007d8
[   37.220000] [CPU1] arm_stackdump: 10861380: 00000001 10809ac0 1080a22c 108088d0 00000000 10864a30 10860a50 00000000
[   37.220000] [CPU1] arm_stackdump: 108613a0: 00000001 10806788 10864a30 1081c3e0 00000000 10860a50 00000000 10864a30
[   37.220000] [CPU1] arm_stackdump: 108613c0: 00000000 1081c6f8 00000000 00000000 00000000 10860c20 108649f0 10864880
[   37.220000] [CPU1] arm_stackdump: 108613e0: 10864a30 00000000 108608a0 10846a6d 00000064 00000800 10846a61 10803898
[   37.230000] [CPU1] arm_stackdump: 10861400: 00000000 10864880 00000000 00000000 10846a6d 10802cd4 10864880 10846a6d
[   37.230000] [CPU1] arm_stackdump: 10861420: 10846a6d 10864880 00000064 00000000 00000000 1081d8ac 00000000 10802bc4
[   37.230000] [CPU1] arm_stackdump: 10861440: 00000800 1080ac8c 00000000 00000000 10860880 00000003 1081c998 00000000
[   37.230000] [CPU1] arm_stackdump: 10861460: 10802c14 10802c0c 00000800 1080ac8c 00000000 1080ac8c 10860880 10802c24
[   37.230000] [CPU1] arm_stackdump: 10861480: 00000000 00000064 10860880 1080b078 00000000 deadbeef deadbeef 00000000
[   37.230000] [CPU1] arm_stackdump: 108614a0: 00000001 00000010 6ad60002 0202000a 00000000 00000000 00000000 7665642f
[   37.240000] [CPU1] arm_stackdump: 108614c0: 6c65742f 3074656e deadbe00 deadbeef deadbeef 1080ae94 00000002 10860d08
[   37.240000] [CPU1] arm_stackdump: 108614e0: 00000000 00000000 00000000 00000000 00000000 10806ab0 00000000 10803234
[   37.240000] [CPU1] arm_showtasks:    PID    PRI      USED     STACK   FILLED    COMMAND
[   37.240000] [CPU1] arm_showtasks:   ----   ----       128      2048     6.2%    irq
[   37.240000] [CPU1] arm_dump_task:      0      0       480      1000    48.0%    CPU0 IDLE
[   37.240000] [CPU1] arm_dump_task:      1      0       208      1000    20.8%    CPU1 IDLE
[   37.240000] [CPU1] arm_dump_task:      2      0       208      1000    20.8%    CPU2 IDLE
[   37.240000] [CPU1] arm_dump_task:      3      0       208      1000    20.8%    CPU3 IDLE
[   37.240000] [CPU1] arm_dump_task:      4    224       160      2016     7.9%    hpwork
[   37.240000] [CPU1] arm_dump_task:      5    100       520      2016    25.7%    lpwork
[   37.240000] [CPU1] arm_dump_task:      6    100      1640      3048    53.8%    nsh_main
[   37.240000] [CPU1] arm_dump_task:      7    100       824      2008    41.0%    Telnet daemon
[   37.240000] [CPU1] arm_dump_task:      8    100       336      1000    33.6%    telnet_io
$ telnet localhost 10023
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.

NuttShell (NSH) NuttX-3.6.1
nsh> uname -a
NuttX  3.6.1 3aef1a7012 Feb 18 2022 15:39:53 arm sabre-6quad
nsh> cat /proc/uptime
     36.29
nsh> ps
  PID GROUP CPU PRI POLICY   TYPE    NPX STATE    EVENT     SIGMASK   STACK   USED  FILLED COMMAND
    0     0   0   0 FIFO     Kthread N-- Running            00000000 001000 000480  48.0%  CPU0 IDLE
    1     1   1   0 FIFO     Kthread N-- Assigned           00000000 001000 000208  20.8%  CPU1 IDLE
    2     2   2   0 FIFO     Kthread N-- Assigned           00000000 001000 000208  20.8%  CPU2 IDLE
    3     3   3   0 FIFO     Kthread N-- Running            00000000 001000 000208  20.8%  CPU3 IDLE
    4     4 --- 224 RR       Kthread --- Waiting  Semaphore 00000000 002016 000160   7.9%  hpwork 0x10855d4c
    5     5   0 100 RR       Kthread --- Running            00000000 002016 000520  25.7%  lpwork 0x10855d58
    6     6 --- 100 RR       Task    --- Waiting  Semaphore 00000000 003048 001640  53.8%  nsh_main
    7     7 --- 100 RR       Task    --- Waiting  Semaphore 00000000 002008 000592  29.4%  Telnet daemon 0x10860880
    8     8 --- 100 RR       Kthread --- Waiting  Semaphore 00000000 001000 000336  33.6%  telnet_io
   28    28   2 100 RR       Task    --- Running            00000000 002024 001176  58.1%  Telnet session
nsh> free
                   total       used       free    largest  nused  nfree
        Umem: 1064972960      36112 1064936848 1064936816     93      2
nsh> ifconfig
eth0	Link encap:Ethernet HWaddr 00:e0:de:ad:be:ef at UP
	inet addr:10.0.2.15 DRaddr:10.0.2.2 Mask:255.255.255.0

             IPv4   TCP   UDP  ICMP
Received     55dc  55dc  0000  0000
Dropped      0000  0000  0000  0000
  IPv4        VHL: 0000   Frg: 0000
  Checksum   0000  0000  0000  ----
  TCP         ACK: 0000   SYN: 0000
              RST: 0000  0000
  Type       0000  ----  ----  0000
Sent         a4d9  a4d9  0000  0000
  Rexmit     ----  0005  ----  ----
nsh> exit
Connection closed by foreign host.
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Connection closed by foreign host.
Traceback (most recent call last):
  File "./expect_nuttx_telnetd_test.py", line 12, in <module>
    child.expect('nsh> ')
  File "/home/ishikawa/.local/lib/python3.6/site-packages/pexpect/spawnbase.py", line 344, in expect
    timeout, searchwindowsize, async_)
  File "/home/ishikawa/.local/lib/python3.6/site-packages/pexpect/spawnbase.py", line 372, in expect_list
    return exp.expect_loop(timeout)
  File "/home/ishikawa/.local/lib/python3.6/site-packages/pexpect/expect.py", line 179, in expect_loop
    return self.eof(e)
  File "/home/ishikawa/.local/lib/python3.6/site-packages/pexpect/expect.py", line 122, in eof
    raise exc

@GUIDINGLI
Contributor Author

GUIDINGLI commented Feb 18, 2022

@masayuki2009
I tried your method with sabre-6quad:netnsh_smp (QEMU).
After setting up the NFS server, mounting it on localhost succeeded.

But nfsmount fails with error 13. Do you have any ideas?

ligd@Opt:~/platform/mainline/nuttx$ sudo qemu-system-arm -net nic -net user,hostfwd=tcp:127.0.0.1:10023-10.0.2.15:23,hostfwd=tcp:127.0.0.1:10021-10.0.2.15:21,hostfwd=tcp:127.0.0.1:15001-10.0.2.15:5001 -M sabrelite -smp 4 -kernel ./nuttx -nographic

NuttShell (NSH) NuttX-10.2.0
nsh> 
nsh> nfsmount 192.168.31.12 /mnt/nfs /home/ligd/nfsserver
nsh: nfsmount: mount failed: 13
nsh> ping 192.168.31.12
PING 192.168.31.12 56 bytes of data
56 bytes from 192.168.31.12: icmp_seq=0 time=10 ms
56 bytes from 192.168.31.12: icmp_seq=1 time=10 ms
56 bytes from 192.168.31.12: icmp_seq=2 time=10 ms
56 bytes from 192.168.31.12: icmp_seq=3 time=10 ms
56 bytes from 192.168.31.12: icmp_seq=4 time=10 ms
56 bytes from 192.168.31.12: icmp_seq=5 time=20 ms
56 bytes from 192.168.31.12: icmp_seq=6 time=10 ms
56 bytes from 192.168.31.12: icmp_seq=7 time=10 ms
56 bytes from 192.168.31.12: icmp_seq=8 time=10 ms
56 bytes from 192.168.31.12: icmp_seq=9 time=10 ms
10 packets transmitted, 10 received, 0% packet loss, time 10100 ms
nsh> 

ligd@Opt:~/platform/miwear/ap/nuttx$ ifconfig
enp0s31f6: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.31.12  netmask 255.255.255.0  broadcast 192.168.31.255
        inet6 fe80::a808:4407:d74e:610d  prefixlen 64  scopeid 0x20<link>
        ether 48:4d:7e:ba:45:79  txqueuelen 1000  (Ethernet)
        RX packets 230444262  bytes 96898530572 (96.8 GB)
        RX errors 0  dropped 339  overruns 0  frame 0
        TX packets 208757218  bytes 182553456623 (182.5 GB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 19  memory 0xf7000000-f7020000  

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 36054974  bytes 2545359767 (2.5 GB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 36054974  bytes 2545359767 (2.5 GB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ligd@Opt:~/platform/miwear/ap/nuttx$ sudo showmount -e
Export list for Opt:
/home/ligd/nfsserver *

ligd@Opt:~/platform/miwear/ap/nuttx$ cat /etc/exports

/home/ligd/nfsserver *(rw,sync,no_root_squash,no_subtree_check)

@masayuki2009
Contributor

@GUIDINGLI

But nfsmount fails with error 13. Do you have any ideas?
...
ligd@Opt:~/platform/miwear/ap/nuttx$ cat /etc/exports

/home/ligd/nfsserver *(rw,sync,no_root_squash,no_subtree_check)

The following line is my setting in /etc/exports.
I think you need the insecure option.

/exports-nuttx 43.31.77.50/24(rw,sync,no_root_squash,no_subtree_check,insecure)

@GUIDINGLI
Contributor Author

GUIDINGLI commented Feb 18, 2022

It worked, thanks!
But I haven't hit the error.

ligd@Opt:~/platform/mainline/nuttx$ sudo qemu-system-arm -net nic -net user,hostfwd=tcp:127.0.0.1:10023-10.0.2.15:23,hostfwd=tcp:127.0.0.1:10021-10.0.2.15:21,hostfwd=tcp:127.0.0.1:15001-10.0.2.15:5001 -M sabrelite -smp 4 -kernel ./nuttx -nographic
[sudo] password for ligd: 

NuttShell (NSH) NuttX-10.2.0
nsh> 
nsh> nfsmount 192.168.31.12 /mnt/nfs /home/ligd/nfsserver
nsh> 
nsh> md5 -f /mnt/nfs/bl.mp3
5ca4b22ff848cd393e3e289368a5139f
nsh> md5 -f /mnt/nfs/bl.mp3
5ca4b22ff848cd393e3e289368a5139f
nsh> md5 -f /mnt/nfs/bl.mp3
5ca4b22ff848cd393e3e289368a5139f
nsh> md5 -f /mnt/nfs/bl.mp3
5ca4b22ff848cd393e3e289368a5139f
nsh> md5 -f /mnt/nfs/bl.mp3
5ca4b22ff848cd393e3e289368a5139f
nsh> md5 -f /mnt/nfs/bl.mp3
5ca4b22ff848cd393e3e289368a5139f
nsh> md5 -f /mnt/nfs/bl.mp3
5ca4b22ff848cd393e3e289368a5139f
nsh> ps
  PID GROUP CPU PRI POLICY   TYPE    NPX STATE    EVENT     SIGMASK   STACK   USED  FILLED COMMAND
    0     0   0   0 FIFO     Kthread N-- Assigned           00000000 001000 000464  46.4%  CPU0 IDLE
    1     1   1   0 FIFO     Kthread N-- Running            00000000 001000 000208  20.8%  CPU1 IDLE
    2     2   2   0 FIFO     Kthread N-- Running            00000000 001000 000208  20.8%  CPU2 IDLE
    3     3   3   0 FIFO     Kthread N-- Running            00000000 001000 000208  20.8%  CPU3 IDLE
    4     4 --- 224 RR       Kthread --- Waiting  Semaphore 00000000 002016 000152   7.5%  hpwork 0x1084f50c
    5     5 --- 100 RR       Kthread --- Waiting  Semaphore 00000000 002016 000520  25.7%  lpwork 0x1084f518
    6     6   0 100 RR       Task    --- Running            00000000 003048 001704  55.9%  nsh_main
    7     7 --- 100 RR       Task    --- Waiting  Semaphore 00000000 002008 000384  19.1%  Telnet daemon 0x10859f50
nsh> 

@GUIDINGLI
Contributor Author

GUIDINGLI commented Feb 18, 2022

Could you add an assert(0) in mm_takesemaphore() to see whether anyone calls it in IRQ context?

bool mm_takesemaphore(FAR struct mm_heap_s *heap)
{
#if defined(CONFIG_BUILD_FLAT) || defined(__KERNEL__)
  /* Check current environment */

  if (up_interrupt_context())
    {
#if !defined(CONFIG_SMP) && defined(CONFIG_DEBUG_MM)
      int val;

      /* Check the semaphore value, if held by someone, then return false.
       * Else, we can take it, return true.
       */

      _SEM_GETVALUE(&heap->mm_semaphore, &val);

      return val > 0;
#else
      /* Can't take semaphore in SMP interrupt handler */

      assert(0);  /* here */
      return false;
#endif
    }
  else
#endif

@masayuki2009
Contributor

masayuki2009 commented Feb 18, 2022

It worked, thanks!
But I haven't hit the error.

@GUIDINGLI
As I explained, you need to run some commands such as ps and free via telnet while md5 is executing on the nsh console.

@GUIDINGLI
Contributor Author

GUIDINGLI commented Feb 18, 2022

OK, I added telnet.
But which command should I use? I haven't run this before.

ligd@Opt:~$ sudo telnet 10.0.2.15
[sudo] password for ligd: 
Trying 10.0.2.15...


^C

ligd@Opt:~$ telnet 127.0.0.1
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection refused


ligd@Opt:~/platform/mainline/nuttx$ qemu-system-arm -net nic -net user,hostfwd=tcp:127.0.0.1:10023-10.0.2.15:23,hostfwd=tcp:127.0.0.1:10021-10.0.2.15:21,hostfwd=tcp:127.0.0.1:15001-10.0.2.15:5001 -M sabrelite -smp 4 -kernel ./nuttx -nographic

NuttShell (NSH) NuttX-10.2.0
nsh> nfsmount 192.168.31.12 /mnt/nfs /home/ligd/nfsserver
nsh> ifconfig
eth0	Link encap:Ethernet HWaddr 00:e0:de:ad:be:ef at UP
	inet addr:10.0.2.15 DRaddr:10.0.2.2 Mask:255.255.255.0

             IPv4   TCP   UDP  ICMP
Received     0013  0013  0000  0000
Dropped      0000  0000  0000  0000
  IPv4        VHL: 0000   Frg: 0000
  Checksum   0000  0000  0000  ----
  TCP         ACK: 0000   SYN: 0000
              RST: 0000  0000
  Type       0000  ----  ----  0000
Sent         0019  0019  0000  0000
  Rexmit     ----  0005  ----  ----
nsh> 
nsh> ps
  PID GROUP CPU PRI POLICY   TYPE    NPX STATE    EVENT     SIGMASK   STACK   USED  FILLED COMMAND
    0     0   0   0 FIFO     Kthread N-- Assigned           00000000 001000 000464  46.4%  CPU0 IDLE
    1     1   1   0 FIFO     Kthread N-- Running            00000000 001000 000208  20.8%  CPU1 IDLE
    2     2   2   0 FIFO     Kthread N-- Running            00000000 001000 000208  20.8%  CPU2 IDLE
    3     3   3   0 FIFO     Kthread N-- Running            00000000 001000 000208  20.8%  CPU3 IDLE
    4     4 --- 224 RR       Kthread --- Waiting  Semaphore 00000000 002016 000152   7.5%  hpwork 0x1084f54c
    5     5 --- 100 RR       Kthread --- Waiting  Semaphore 00000000 002016 000352  17.4%  lpwork 0x1084f558
    6     6   0 100 RR       Task    --- Running            00000000 003048 001624  53.2%  nsh_main
    7     7 --- 100 RR       Task    --- Waiting  Semaphore 00000000 002008 000384  19.1%  Telnet daemon 0x10859f50
nsh> 
nsh> ping 192.168.31.12
PING 192.168.31.12 56 bytes of data
56 bytes from 192.168.31.12: icmp_seq=0 time=10 ms
56 bytes from 192.168.31.12: icmp_seq=1 time=10 ms
56 bytes from 192.168.31.12: icmp_seq=2 time=10 ms
56 bytes from 192.168.31.12: icmp_seq=3 time=10 ms
56 bytes from 192.168.31.12: icmp_seq=4 time=10 ms
56 bytes from 192.168.31.12: icmp_seq=5 time=10 ms
56 bytes from 192.168.31.12: icmp_seq=6 time=10 ms
56 bytes from 192.168.31.12: icmp_seq=7 time=20 ms
56 bytes from 192.168.31.12: icmp_seq=8 time=10 ms
56 bytes from 192.168.31.12: icmp_seq=9 time=10 ms
10 packets transmitted, 10 received, 0% packet loss, time 10110 ms
nsh> 

@masayuki2009
Contributor

@GUIDINGLI

ligd@Opt:~$ telnet 127.0.0.1

You need to specify the port number 10023 which is included in the command line when starting qemu.

$ telnet 127.0.0.1 10023

@masayuki2009
Contributor

Could you add an assert(0) in mm_takesemaphore() to see whether anyone calls it in IRQ context?

@GUIDINGLI
I added the assert(0)

diff --git a/mm/mm_heap/mm_sem.c b/mm/mm_heap/mm_sem.c
index 0911a6d300..c3ad37ffe8 100644
--- a/mm/mm_heap/mm_sem.c
+++ b/mm/mm_heap/mm_sem.c
@@ -93,6 +93,7 @@ bool mm_takesemaphore(FAR struct mm_heap_s *heap)
 #else
       /* Can't take semaphore in SMP interrupt handler */
 
+      assert(0);
       return false;
 #endif
     }

However, it stopped at another place due to memory corruption.

nsh> i= 0
md5 -f /mnt/nfs/audio/xxx.wav
[   39.590000] [CPU2] up_assert: Assertion failed CPU2 at file:vfs/fs_read.c line: 69 task: Telnet session
[   39.590000] [CPU2] arm_registerdump: R0: 00000001 R1: 00000000 R2: 00000000  R3: 00000000
[   39.590000] [CPU2] arm_registerdump: R4: 10864900 R5: 10853500 R6: 1086497c  R7: 108652f0
[   39.590000] [CPU2] arm_registerdump: R8: 00000000 SB: 00000001 SL: 1086564b  FP: 00000000
[   39.590000] [CPU2] arm_registerdump: IP: 00000001 SP: 108652f0 LR: 10809ac0  PC: 1080a22c
[   39.590000] [CPU2] arm_registerdump: CPSR: 60000053
[   39.600000] [CPU2] arm_dump_stack: IRQ Stack:
[   39.600000] [CPU2] arm_dump_stack: sp:     108652f0
[   39.600000] [CPU2] arm_dump_stack:   base: 10854510
[   39.600000] [CPU2] arm_dump_stack:   size: 00000800
[   39.600000] [CPU2] arm_dump_stack: ERROR: IRQ Stack pointer is not within the stack
[   39.600000] [CPU2] arm_dump_stack: User Stack:
[   39.600000] [CPU2] arm_dump_stack: sp:     108652f0
[   39.600000] [CPU2] arm_dump_stack:   base: 10864bf0
[   39.600000] [CPU2] arm_dump_stack:   size: 000007e8

@GUIDINGLI
Contributor Author

@masayuki2009
I reproduced your crash!

It happens because many callers, such as mm_malloc, call mm_takesemaphore() in the idle thread and ignore its return value.
They still call mm_givesemaphore() afterwards, so the semaphore count grows to a large value.

So I followed the original design: mm_takesemaphore() does not check for the IDLE thread and simply calls sem_wait.
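
To illustrate the failure mode described above, here is a minimal sketch in plain C. It is not the NuttX code: toy_sem, toy_take, toy_give, careless_alloc and count_after_idle_calls are hypothetical names invented for this example, and the "semaphore" is just an integer counter. It only shows why dropping the result of a refused take while still posting inflates the count.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the heap lock (assumption: not the NuttX sem_t). */

struct toy_sem
{
  int count;                  /* 1 = free, 0 = held */
};

/* Hypothetical take: refuses when the caller must not block
 * (for example the idle thread) and reports that with false.
 */

static bool toy_take(struct toy_sem *sem, bool cannot_block)
{
  if (cannot_block)
    {
      return false;           /* Lock was NOT taken */
    }

  assert(sem->count > 0);     /* Toy model: no real waiting */
  sem->count--;
  return true;
}

static void toy_give(struct toy_sem *sem)
{
  sem->count++;
}

/* A caller that ignores the result of the take, as described above. */

static void careless_alloc(struct toy_sem *sem, bool cannot_block)
{
  (void)toy_take(sem, cannot_block);  /* Return value dropped */

  /* ... the heap would be modified here, possibly without the lock ... */

  toy_give(sem);                      /* Posts even if nothing was taken */
}

/* Returns the semaphore count after n careless calls from a context
 * in which the take is refused.
 */

int count_after_idle_calls(int n)
{
  struct toy_sem sem = { 1 };
  int i;

  for (i = 0; i < n; i++)
    {
      careless_alloc(&sem, true);
    }

  return sem.count;
}
```

In this toy model each refused take followed by a give raises the count by one, so a semaphore that started at 1 no longer provides mutual exclusion once several such calls have happened.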

@GUIDINGLI
Contributor Author

@masayuki2009
Can the newest version fix this problem?

@masayuki2009
Contributor

Can the newest version fix this problem?

@GUIDINGLI
Let me check the latest PR tomorrow.

@masayuki2009
Contributor

masayuki2009 commented Feb 18, 2022

Let me check the latest PR tomorrow.

@GUIDINGLI

Please correct the commit message.
I think this is a bug fix for the previous changes and NOT the result of a wrong merge operation.

It happens because many callers, such as mm_malloc, call mm_takesemaphore() in the idle thread and ignore its return value.
They still call mm_givesemaphore() afterwards, so the semaphore count grows to a large value.

So I followed the original design: mm_takesemaphore() does not check for the IDLE thread and simply calls sem_wait.

Author: ligd <liguiding1@xiaomi.com>
Date:   Fri Feb 18 13:26:08 2022 +0800

    mm: handle take mm sem in IRQ
    
    This is a amend of:
    0169a51220a68d8d3bed0c20b6d606f43497a9e9
    
    caused by wrong merge operation
    
    Signed-off-by: ligd <liguiding1@xiaomi.com>

@masayuki2009
Contributor

Let me check the latest PR tomorrow.

@GUIDINGLI
I confirmed that spresense:wifi_smp stress tests work for over 10hrs.
So, please amend the commit log messages as I pointed out.
After that, I will merge this PR.

@pkarashchenko
Contributor

Posting a question here because the thread was marked resolved but the question was not answered: should we bring back return _SEM_TRYWAIT(&heap->mm_semaphore) >= 0; at least for the else if (sched_idletask()) branch?

@pkarashchenko
Contributor

pkarashchenko commented Feb 20, 2022

Another question:
Should we rework all places that currently use DEBUGVERIFY(mm_takesemaphore(heap)); and change to

      if (!mm_takesemaphore(heap))
        {
          return;
        }

or similar?

@GUIDINGLI GUIDINGLI force-pushed the mine1 branch 2 times, most recently from b288a45 to b8e6d45 Compare February 21, 2022 08:25
@GUIDINGLI
Contributor Author

Let me check the latest PR tomorrow.

@GUIDINGLI I confirmed that spresense:wifi_smp stress tests work for over 10hrs. So, please amend the commit log messages as I pointed out. After that, I will merge this PR.

OK, done

@GUIDINGLI
Contributor Author

Posting a question from a thread that was marked resolved without the question being answered: should we restore return _SEM_TRYWAIT(&heap->mm_semaphore) >= 0; at least for the else if (sched_idletask()) branch?

The system does not allow the user to call _SEM_TRYWAIT in the idle thread.
See:
#5266
#5368

@GUIDINGLI
Contributor Author

Another question: Should we rework all places that currently use DEBUGVERIFY(mm_takesemaphore(heap)); and change to

      if (!mm_takesemaphore(heap))
        {
          return;
        }

or similar?

Yes, we can do this, but that will be another PR.

@pkarashchenko
Contributor

Posting a question from a thread that was marked resolved without the question being answered: should we restore return _SEM_TRYWAIT(&heap->mm_semaphore) >= 0; at least for the else if (sched_idletask()) branch?

The system does not allow the user to call _SEM_TRYWAIT in the idle thread. See #5266 and #5368.

This is true only if CONFIG_PRIORITY_INHERITANCE=y.

@pkarashchenko
Contributor

Posting a question from a thread that was marked resolved without the question being answered: should we restore return _SEM_TRYWAIT(&heap->mm_semaphore) >= 0; at least for the else if (sched_idletask()) branch?

The system does not allow the user to call _SEM_TRYWAIT in the idle thread. See #5266 and #5368.

This is true only if CONFIG_PRIORITY_INHERITANCE=y.

Again, this is wrong: only the priority is not boosted.

@pkarashchenko
Contributor

I already pointed out the code that is affected by the removal of _SEM_TRYWAIT(&heap->mm_semaphore) >= 0; from the else if (sched_idletask()) branch. The case will now always be hit. I'm not sure whether this will cause any issues; I just want to be sure that we find and review the impact in all possible places.

  if (mm_takesemaphore(heap) == false)
    {
      kasan_unpoison(mem, mm_malloc_size(mem));

      /* We are in IDLE task & can't get sem, or meet -ESRCH return,
       * which means we are in situations during context switching(See
       * mm_takesemaphore() & getpid()). Then add to the delay list.
       */

      mm_add_delaylist(heap, mem);
      return;
    }
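
The delay-list fallback in the snippet above can be illustrated with a minimal, single-threaded sketch. The names below (toy_heap_t, toy_free, toy_add_delaylist, toy_flush_delaylist, lock_available) are hypothetical and only loosely modeled on mm_add_delaylist(); real code must also protect the list against concurrent access, which this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* A freed block is reused as the list node, so every block passed to
 * toy_free() must be at least sizeof(toy_delaynode_t) bytes long.
 */

typedef struct toy_delaynode_s
{
  struct toy_delaynode_s *flink;
} toy_delaynode_t;

typedef struct
{
  bool lock_available;        /* Stands in for "the heap lock was taken" */
  toy_delaynode_t *delaylist; /* Blocks waiting to be released */
  int freed;                  /* Number of blocks actually released */
} toy_heap_t;

/* Push a block onto the delay list instead of freeing it now. */

static void toy_add_delaylist(toy_heap_t *heap, void *mem)
{
  toy_delaynode_t *node = mem;

  assert(heap != NULL && mem != NULL);
  node->flink = heap->delaylist;
  heap->delaylist = node;
}

/* Release every deferred block; returns how many were released. */

static int toy_flush_delaylist(toy_heap_t *heap)
{
  toy_delaynode_t *node = heap->delaylist;
  int released = 0;

  heap->delaylist = NULL;
  while (node != NULL)
    {
      toy_delaynode_t *next = node->flink;
      free(node);
      heap->freed++;
      released++;
      node = next;
    }

  return released;
}

/* Free immediately when the lock is available, otherwise defer. */

static void toy_free(toy_heap_t *heap, void *mem)
{
  if (mem == NULL)
    {
      return;
    }

  if (!heap->lock_available)
    {
      toy_add_delaylist(heap, mem);
      return;
    }

  free(mem);
  heap->freed++;
}
```

The point of the pattern is that a free which cannot take the lock never blocks and never loses the block; a later caller that does hold the lock is expected to drain the list.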

@GUIDINGLI
Contributor Author

GUIDINGLI commented Feb 21, 2022

@masayuki2009
Can I report a hang bug to you?
While running your test case, I found another problem.

Make cmd_md5 run in a loop, and make cmd_free run in a loop.
Run both of them in the background.

Then, after about half an hour, the system gets stuck.

ligd@Opt:~/platform/mainline/nuttx$ qemu-system-arm -net nic -net user,hostfwd=tcp:127.0.0.1:10023-10.0.2.15:23,hostfwd=tcp:127.0.0.1:10021-10.0.2.15:21,hostfwd=tcp:127.0.0.1:15001-10.0.2.15:5001 -M sabrelite -smp 4 -kernel ./nuttx -nographic

NuttShell (NSH) NuttX-10.2.0
nsh> nfsmount 10.221.68.177 /mnt/nfs /home/ligd/nfsserver
nsh> 
nsh> 
nsh> md5 -f /mnt/nfs/bl.mp3 &
md5 [8:100]
nsh> free &
free [9:100]
nsh>                    total       used       free    largest  nused  nfree
        Umem: 1064999840      36064 1064963776 1064963776     89      1
                   total       used       free    largest  nused  nfree
        Umem: 1064999840      35936 1064963904 1064963872     88      2
                   total       used       free    largest  nused  nfree
        Umem: 1064999840      35936 1064963904 1064963776     88      2
                   total       used       free    largest  nused  nfree
        Umem: 1064999840      35936 1064963904 1064963776     88      2
                   total       used       free    largest  nused  nfree
        Umem: 1064999840      36064 1064963776 1064963776     89      1
5ca4b22ff848cd393e3e289368a5139f
                   total       used       free    largest  nused  nfree
        Umem: 1064999840      35584 1064964256 1064963872     84      3
                   total       used       free    largest  nused  nfree
        Umem: 1064999840      36064 1064963776 1064963776     89      1
                   total       used       free    largest  nused  nfree
        Umem: 1064999840      35936 1064963904 1064963776     88      2
                   total       used       free    largest  nused  nfree
        Umem: 1064999840      36064 1064963776 1064963776     89      1



ligd@Opt:~/platform/mainline/apps$ git diff
diff --git a/nshlib/nsh_codeccmd.c b/nshlib/nsh_codeccmd.c
index 7f21accd3..a62da05a0 100644
--- a/nshlib/nsh_codeccmd.c
+++ b/nshlib/nsh_codeccmd.c
@@ -573,7 +573,11 @@ int cmd_base64decode(FAR struct nsh_vtbl_s *vtbl, int argc, char **argv)
 #ifdef HAVE_CODECS_HASH_MD5
 int cmd_md5(FAR struct nsh_vtbl_s *vtbl, int argc, char **argv)
 {
-  return cmd_codecs_proc(vtbl, argc, argv, CODEC_MODE_HASH_MD5, md5_cb);
+  while (1)
+    {
+      cmd_codecs_proc(vtbl, argc, argv, CODEC_MODE_HASH_MD5, md5_cb);
+      usleep(100 * 1000);
+    }
 }
 #endif
 
diff --git a/nshlib/nsh_mmcmds.c b/nshlib/nsh_mmcmds.c
index 898e127fd..d6ef0df87 100644
--- a/nshlib/nsh_mmcmds.c
+++ b/nshlib/nsh_mmcmds.c
@@ -39,7 +39,11 @@
 
 int cmd_free(FAR struct nsh_vtbl_s *vtbl, int argc, char **argv)
 {
-  return nsh_catfile(vtbl, argv[0], CONFIG_NSH_PROC_MOUNTPOINT "/meminfo");
+  while (1)
+    {
+      nsh_catfile(vtbl, argv[0], CONFIG_NSH_PROC_MOUNTPOINT "/meminfo");
+      usleep(100 * 1000);
+    }
 }

@masayuki2009
Contributor

Can I report a hang bug to you?
While running your test case, I found another problem.

Make cmd_md5 run in a loop, and make cmd_free run in a loop.
Run both of them in the background.

Then, after about half an hour, the system gets stuck.

@GUIDINGLI

Thanks for the report.
I think this might be a separate issue, because I've seen the same problem with sabre-6quad:netnsh before.

@GUIDINGLI
Contributor Author

I already pointed out the code that is affected by the removal of _SEM_TRYWAIT(&heap->mm_semaphore) >= 0; from the else if (sched_idletask()) branch. The case will now always be hit. I'm not sure whether this will cause any issues; I just want to be sure that we find and review the impact in all possible places.

  if (mm_takesemaphore(heap) == false)
    {
      kasan_unpoison(mem, mm_malloc_size(mem));

      /* We are in IDLE task & can't get sem, or meet -ESRCH return,
       * which means we are in situations during context switching(See
       * mm_takesemaphore() & getpid()). Then add to the delay list.
       */

      mm_add_delaylist(heap, mem);
      return;
    }

We always use malloc/free during system init in the idle thread (for example, in initialization code), and at that time we allow the user to call sem_wait/trywait.
But we don't let users call sem_wait/trywait in the idle loop, which is the real idle thread.

So, I committed another patch:

Author: ligd <liguiding1@xiaomi.com>
Date:   Mon Feb 21 18:14:10 2022 +0800

    os init_state: add new state OSINIT_OSIDLELOOP
    
    This is for distinguish we are in init idle or idle loop.
    Use for assertion for sem_trywait & sem_wait
    
    Signed-off-by: ligd <liguiding1@xiaomi.com>
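
One way to picture the distinction proposed above is a helper that treats the idle task differently before and after it enters its loop. This is a sketch only: the enum values, toy_may_block(), and toy_assert_can_wait() are hypothetical names, not the NuttX definitions, and the actual patch may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical init states, ordered by boot progress. */

enum toy_init_state_e
{
  TOY_OSINIT_BOOT = 0, /* Early boot, still running on the idle stack */
  TOY_OSINIT_OSREADY,  /* OS initialized, idle task still doing init work */
  TOY_OSINIT_IDLELOOP  /* Idle task has entered its idle loop */
};

/* Returns true if the caller may wait on a semaphore.
 * Non-idle tasks may always wait. The idle task may wait only while it
 * is still performing initialization, not once it is in the idle loop.
 */

static bool toy_may_block(enum toy_init_state_e state, bool is_idle_task)
{
  if (!is_idle_task)
    {
      return true;
    }

  return state < TOY_OSINIT_IDLELOOP;
}

/* Debug-style check that a wait/trywait entry point could perform. */

static void toy_assert_can_wait(enum toy_init_state_e state,
                                bool is_idle_task)
{
  assert(toy_may_block(state, is_idle_task));
}
```

With a separate state for the idle loop, allocations made by the idle task during initialization remain legal, while a wait attempted from the loop itself trips the assertion.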

@pkarashchenko
Contributor

We always use malloc/free during system init in the idle thread (for example, in initialization code), and at that time we allow the user to call sem_wait/trywait.
But we don't let users call sem_wait/trywait in the idle loop, which is the real idle thread.

There is up_idle, called from the IDLE thread, which can be platform / board specific, so we do not know whether an allocation or free happens there. I'm against leaving this case unhandled.

@GUIDINGLI
Contributor Author

There is up_idle, called from the IDLE thread, which can be platform / board specific, so we do not know whether an allocation or free happens there. I'm against leaving this case unhandled.

If you keep this unhandled, then how should up_idle() be implemented: in IDLE, can the user call wait or not?
If I haven't missed your point, you want:

  1. If CONFIG_PRIORITY_INHERITANCE=y, idle can't call wait.
  2. If CONFIG_PRIORITY_INHERITANCE is not set, idle can call wait.

Then you want the user who implements up_idle() to distinguish between these situations?

@pkarashchenko
Contributor

There is up_idle, called from the IDLE thread, which can be platform / board specific, so we do not know whether an allocation or free happens there. I'm against leaving this case unhandled.

If you keep this unhandled, then how should up_idle() be implemented: in IDLE, can the user call wait or not? If I haven't missed your point, you want:

1. If CONFIG_PRIORITY_INHERITANCE=y, idle can't call wait.

2. If CONFIG_PRIORITY_INHERITANCE is not set, idle can call wait.

Then you want the user who implements up_idle() to distinguish between these situations?

I reexamined the changes and see that you routed the IDLE task case to the _SEM_WAIT(&heap->mm_semaphore); call in the else branch. That is OK. I missed that return false; was removed together with the else if (sched_idletask()) condition.

@GUIDINGLI
Contributor Author

@pkarashchenko @masayuki2009 @xiaoxiang781216

I want to impose a limitation on the idle thread: the user can't call wait/trywait in the IDLE thread.
To achieve this, I added a new init_state, OSINIT_OS_IDLELOOP.
What do you think?

@xiaoxiang781216
Contributor

@pkarashchenko @masayuki2009 @xiaoxiang781216

I want to impose a limitation on the idle thread: the user can't call wait/trywait in the IDLE thread. To achieve this, I added a new init_state, OSINIT_OS_IDLELOOP. What do you think?

Let's move the change to a new PR? It isn't related to the heap at all.

@pkarashchenko
Contributor

@pkarashchenko @masayuki2009 @xiaoxiang781216
I want to impose a limitation on the idle thread: the user can't call wait/trywait in the IDLE thread. To achieve this, I added a new init_state, OSINIT_OS_IDLELOOP. What do you think?

Let's move the change to a new PR? It isn't related to the heap at all.

Agreed. This is a point for wider discussion. Common sense suggests that we could even assert if the IDLE thread tries to pend on a sync primitive, but maybe I'm missing something, so I do not want to mix things.

@GUIDINGLI
Contributor Author

OK, here is the new PR:
#5577

This is a fix of:
0169a51
This is caused by wrong memory sem operation in IDLE.

Fix:
Obey the original design, don't check the IDLE in mm_takesemaphore()

Signed-off-by: ligd <liguiding1@xiaomi.com>
@xiaoxiang781216 xiaoxiang781216 merged commit 419bc2f into apache:master Feb 21, 2022