@TakuyaMiyasita
Contributor

@TakuyaMiyasita TakuyaMiyasita commented Nov 7, 2024

Summary

Some armv7-m-based SoCs do not work with atomic instructions, even though armv7-m supports them.

To avoid the atomic instructions that gcc generates, this PR introduces CONFIG_LIBC_ARCH_ATOMIC; when it is enabled, arch_atomic.c is linked explicitly.

However, the function names must be changed to avoid build errors, because the functions declared in stdatomic.h are gcc built-ins and are inlined at compile time.
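
To illustrate the renaming described above, here is a minimal sketch, not the code from this PR. It assumes a hypothetical pair of critical-section helpers (`demo_enter_critical` / `demo_leave_critical`) in place of the platform's real interrupt masking, and an assumed signature for `nx_atomic_fetch_add_4`; the `nx_atomic_` prefix matches the symbols visible in the assertion log later in this thread. Because the name is not one of gcc's `__atomic_*` built-ins, the compiler emits an ordinary call instead of inlining exclusive load/store instructions.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the platform's critical-section primitives.
 * On a real target these would mask interrupts (and take a spinlock on
 * SMP); here they only track nesting so the sketch runs on a host.
 */

static int g_demo_critical_depth;

static void demo_enter_critical(void)
{
  g_demo_critical_depth++;
}

static void demo_leave_critical(void)
{
  g_demo_critical_depth--;
}

/* Out-of-line fetch-and-add under a name that gcc does not treat as a
 * built-in. The signature is an assumption made for this sketch.
 */

uint32_t nx_atomic_fetch_add_4(volatile void *ptr, uint32_t value,
                               int memorder)
{
  volatile uint32_t *p = (volatile uint32_t *)ptr;
  uint32_t old;

  (void)memorder; /* Ordering is implied by the critical section */

  demo_enter_critical();
  old = *p;
  *p = old + value;
  demo_leave_critical();

  return old; /* Value held before the addition */
}
```

On a host the helpers only count nesting, so the sketch shows the call path rather than real atomicity; a caller would normally reach such a function through a macro in the atomic header.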

Impact

CONFIG_ARCH_CHIP_CXD32XX.

Testing

For ARCH_CHIP_CXD32XX: the build passes and ostest passes, but the board is out-of-tree.
For raspberrypi-pico:nsh: the build passes (not executed).
For spresense:smp: the build passes (not executed).

Toolchain versions:

arm-none-eabi-gcc (GNU Arm Embedded Toolchain 10.3-2021.10) 10.3.1 20210824 (release)
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

and

arm-none-eabi-gcc (GNU Tools for ARM Embedded Processors) 5.3.1 20160307 (release) [ARM/embedded-5-branch revision 234589]
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

@github-actions github-actions bot added Arch: arm Issues related to ARM (32-bit) architecture Area: OS Components OS Components issues Size: M The size of the change in this PR is medium labels Nov 7, 2024
@nuttxpr

nuttxpr commented Nov 7, 2024

[Experimental Bot, please feedback here]

No, this PR does not fully meet the NuttX requirements. Here's why and how to fix it:

Missing Information/Improvements Needed:

  • Summary: While you explain what you changed, the why is not sufficiently clear. Why do these SoCs not work with atomic instructions even though armv7-m supports them? Is this a silicon bug? A missing coprocessor? More details are crucial. Also, be more specific about "what functional part of the code is being changed." Mention affected files/modules (e.g., arch_atomic.c, stdatomic.h).
  • Impact: You only mention CONFIG_ARCH_CHIP_CXD32XX. This is incomplete. Address all the impact points. Even if the answer is "NO," explicitly state it. For example:
    • Is new feature added?: NO
    • Is existing feature changed?: YES (atomic operation implementation for CXD32XX)
    • Impact on user: Potentially YES if users rely on atomics on CXD32XX. Explain what they might observe before/after the change.
    • Impact on build: YES (new config option CONFIG_LIBC_ARCH_NXATOMIC) Explain how to use the new option.
    • Impact on hardware: YES (CXD32XX). Explain why other armv7-m are not affected.
    • Impact on documentation: YES (at least for the new config option). State if documentation has been updated in the PR or if a separate documentation update is planned.
    • Impact on security: Possibly YES or NO. Justify your answer. Incorrect atomic operations can have security implications.
    • Impact on compatibility: Potentially YES. Does this break compatibility with existing code that uses atomics on CXD32XX?
  • Testing: This is insufficient.
    • "ostest is passed" is too vague. Which tests in ostest?
    • "Build is passed" for other platforms isn't real testing. You need to run tests on representative platforms to ensure no regressions. If you cannot run on actual hardware, explain why and consider using a simulator.
    • Provide relevant log snippets. Don't just dump entire logs. Focus on the sections that demonstrate the issue before and the fix after. Show that atomic operations are now working correctly on CXD32XX.
    • Testing on only one compiler version is not ideal. If feasible, test with other common toolchains.

Example of Improved Impact Section:

  • Is new feature added?: NO
  • Is existing feature changed?: YES (atomic operation implementation for CXD32XX)
  • Impact on user: YES. Previously, atomic operations on CXD32XX could lead to undefined behavior due to silicon errata [link to errata if available]. This change ensures correct atomic behavior. Users relying on atomic operations on CXD32XX should see improved stability and data consistency.
  • Impact on build: YES. A new Kconfig option CONFIG_LIBC_ARCH_NXATOMIC is introduced. To enable it for CXD32XX, add CONFIG_LIBC_ARCH_NXATOMIC=y to your defconfig.
  • Impact on hardware: YES (CXD32XX). This change specifically addresses the atomic operation issue on CXD32XX. Other armv7-m architectures are not affected because they correctly implement the required atomic instructions.
  • Impact on documentation: YES. The Kconfig documentation has been updated to describe the CONFIG_LIBC_ARCH_NXATOMIC option and its usage. (Provide path to updated documentation file.)
  • Impact on security: YES. Incorrect atomic operations could potentially lead to race conditions and data corruption, which could have security implications. This change mitigates these risks on CXD32XX.
  • Impact on compatibility: Potentially YES. If existing code on CXD32XX relied on the previous (incorrect) behavior of atomics, it might require adaptation. However, this is unlikely as the previous behavior was undefined.

By addressing these points, your PR will be much stronger and easier for reviewers to evaluate. Remember, clarity and thoroughness are key to a successful contribution!

@TakuyaMiyasita
Contributor Author

@nuttxpr

Why do these SoCs not work with atomic instructions even though armv7-m supports them? Is this a silicon bug? A missing coprocessor?

Please refer to the following comment:
4cec713#commitcomment-148460811

@lupyuen
Member

lupyuen commented Nov 7, 2024

Sorry I have to cancel the CI Check, there's a problem with CI Test:

@TakuyaMiyasita TakuyaMiyasita force-pushed the fix/1730-platform-atomic branch 2 times, most recently from e52d399 to 60e61b9 Compare November 8, 2024 04:01
@TakuyaMiyasita
Contributor Author

@xiaoxiang781216

About the CI test: sim/cxxtest was not successful.

I checked it with master and the result was the same.

I think this problem does not come from this PR;
it may be due to the URL https://git.busybox.net/uClibc++/snapshot/$@, which is used in nuttx/libs/libxx/uClibc++/Make.defs.

@lupyuen
Member

lupyuen commented Nov 8, 2024

SSL certificate expired 😂

Configuration/Tool: sim/cxxtest
curl: (60) SSL certificate problem: certificate has expired
curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it.

https://github.com/apache/nuttx/actions/runs/11735845781/job/32694187326?pr=14681#step:7:121

@lupyuen
Member

lupyuen commented Nov 8, 2024

Hi @TakuyaMiyasita I cloned your branch to my repo and enabled all builds. Let's wait for the results here thanks! https://github.com/nuttxpr2/nuttx/actions/runs/11740819831

Member

@lupyuen lupyuen left a comment

Looks great to me, everything builds OK except for the 2 SSL Cert Errors. Thanks! :-)
https://github.com/nuttxpr2/nuttx/actions/runs/11740819831

@TakuyaMiyasita TakuyaMiyasita force-pushed the fix/1730-platform-atomic branch from 60e61b9 to 66c8e3f Compare November 11, 2024 02:06
@TakuyaMiyasita TakuyaMiyasita force-pushed the fix/1730-platform-atomic branch 3 times, most recently from 903d1df to f1b2ab6 Compare November 11, 2024 07:31
@github-actions github-actions bot added Size: L The size of the change in this PR is large and removed Size: M The size of the change in this PR is medium labels Nov 11, 2024
@TakuyaMiyasita TakuyaMiyasita force-pushed the fix/1730-platform-atomic branch from f1b2ab6 to 0a2ae9d Compare November 11, 2024 07:56
@github-actions github-actions bot added Size: M The size of the change in this PR is medium and removed Size: L The size of the change in this PR is large labels Nov 11, 2024
@TakuyaMiyasita
Contributor Author

@xiaoxiang781216
Thank you for your valuable advice.

@xiaoxiang781216
Contributor

xiaoxiang781216 commented Nov 11, 2024

@xiaoxiang781216 Thank you for your valuable advice.

You are welcome. Thank you for sending the patch to fix the issue, which makes the atomic support more general than before.

@xiaoxiang781216 xiaoxiang781216 changed the title arch_atomic : Introduce CONFIG_LIBC_ARCH_NXATOMIC arch_atomic : Introduce CONFIG_LIBC_ARCH_ATOMIC Nov 11, 2024
@TakuyaMiyasita TakuyaMiyasita force-pushed the fix/1730-platform-atomic branch from cbc21c6 to 662cf5d Compare November 12, 2024 01:50
@xiaoxiang781216
Contributor

@TakuyaMiyasita but the new patch doesn't resolve my comment or @masayuki2009's.

@TakuyaMiyasita TakuyaMiyasita force-pushed the fix/1730-platform-atomic branch from 662cf5d to 06de36d Compare November 12, 2024 03:19
@github-actions github-actions bot added Size: L The size of the change in this PR is large and removed Size: M The size of the change in this PR is medium labels Nov 12, 2024
Some armv7-m-based SoCs do not work with atomic instructions,
even though armv7-m supports them.

To avoid using atomic instructions generated by gcc,
CONFIG_LIBC_ARCH_ATOMIC is newly introduced with which
arch_atomic.c is linked explicitly.

However, the function names need to be changed to avoid
build errors, since the functions described in stdatomic.h
are gcc built-in and inlined when the code is compiled.

Signed-off-by: Takuya Miyasita <Takuya.Miyashita@sony.com>
@TakuyaMiyasita TakuyaMiyasita force-pushed the fix/1730-platform-atomic branch from 06de36d to 14514dc Compare November 12, 2024 03:52
@TakuyaMiyasita
Contributor Author

@xiaoxiang781216 @masayuki2009
Thank you for your advice.
I have addressed the comments, please...

@xiaoxiang781216
Contributor

LGTM.

@masayuki2009 masayuki2009 merged commit 81e7b13 into apache:master Nov 12, 2024
@lupyuen
Member

lupyuen commented Nov 12, 2024

Sorry @TakuyaMiyasita: I think esp32s3-devkit/toywasm might be failing due to this PR?

Configuration/Tool: esp32s3-devkit/toywasm
In file included from /github/workspace/sources/apps/interpreters/toywasm/toywasm/lib/exec_context.h:2,
                 from toywasm/libdyld/dyld.c:26:
Error: /tools/xtensa-esp32s3-elf-gcc/lib/gcc/xtensa-esp32s3-elf/12.2.0/include/stdatomic.h:31:5: error: redeclaration of enumerator 'memory_order_relaxed'
   31 |     memory_order_relaxed = __ATOMIC_RELAXED,
      |     ^~~~~~~~~~~~~~~~~~~~
In file included from /github/workspace/sources/nuttx/include/nuttx/atomic.h:31,
                 from /github/workspace/sources/nuttx/include/nuttx/fs/fs.h:45,
                 from /github/workspace/sources/nuttx/include/stdio.h:36,
                 from toywasm/libdyld/dyld.c:18:
/github/workspace/sources/nuttx/include/nuttx/lib/stdatomic.h:170:5: note: previous definition of 'memory_order_relaxed' with type 'enum <anonymous>'
  170 |     memory_order_relaxed = __ATOMIC_RELAXED,
      |     ^~~~~~~~~~~~~~~~~~~~

https://github.com/NuttX/nuttx/actions/runs/11791606414/job/32843958478#step:7:19347

Update: There are 3 more failures at nuttx-dashboard.org: esp32s3-box/lvgl-3, esp32s3-devkit/cxx, esp32s3-devkit/qemu_debug
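
The error above arises because two headers each declare the same enumerator in one translation unit. One possible way to avoid that kind of clash, sketched here under assumptions and not the fix adopted for this PR, is to use the toolchain's header whenever it exists and declare a fallback only otherwise. The `DEMO_` macro and `demo_relaxed_order` are hypothetical names, and `__has_include` is a compiler extension in C that older toolchains may lack.

```c
/* Prefer the toolchain's <stdatomic.h> when it exists, so that the
 * memory_order enumerators are declared exactly once.
 */

#if defined(__has_include)
#  if __has_include(<stdatomic.h>)
#    include <stdatomic.h>
#    define DEMO_HAVE_TOOLCHAIN_STDATOMIC 1
#  endif
#endif

/* Fallback declarations, used only when no toolchain header was found.
 * The numeric values are an assumption that mirrors gcc's ordering.
 */

#ifndef DEMO_HAVE_TOOLCHAIN_STDATOMIC
typedef enum
{
  memory_order_relaxed = 0,
  memory_order_consume,
  memory_order_acquire,
  memory_order_release,
  memory_order_acq_rel,
  memory_order_seq_cst
} memory_order;
#endif

/* Either way, the enumerator is visible exactly once from here on. */

int demo_relaxed_order(void)
{
  return (int)memory_order_relaxed;
}
```

With this arrangement a later include of the toolchain's header from third-party code is absorbed by that header's own include guard instead of colliding with a private copy of the enum.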

@tmedicci
Contributor

There are more failures that we caught on our internal CI. I've just triggered the CI on this commit and on the one before it specifically to double-check. As soon as it finishes, I'll report here.

@tmedicci
Contributor

General reports:

  • Our CI builds the firmware with the following configs enabled:
diff --git a/boards/xtensa/esp32s3/esp32s3-devkit/configs/cxx/defconfig b/boards/xtensa/esp32s3/esp32s3-devkit/configs/cxx/defconfig
index 1ce9a56b6a..617a4b8737 100644
--- a/boards/xtensa/esp32s3/esp32s3-devkit/configs/cxx/defconfig
+++ b/boards/xtensa/esp32s3/esp32s3-devkit/configs/cxx/defconfig
@@ -22,8 +22,12 @@ CONFIG_BOARD_LOOPSPERMSEC=16717
 CONFIG_BUILTIN=y
 CONFIG_CXX_LOCALIZATION=y
 CONFIG_CXX_WCHAR=y
+CONFIG_DEBUG_ASSERTIONS=y
+CONFIG_DEBUG_ASSERTIONS_EXPRESSION=y
+CONFIG_DEBUG_FEATURES=y
 CONFIG_DEBUG_FULLOPT=y
 CONFIG_DEBUG_SYMBOLS=y
+CONFIG_ESP32S3_MERGE_BINS=y
 CONFIG_ESP32S3_UART0=y
 CONFIG_FS_PROCFS=y
 CONFIG_HAVE_CXX=y

That being said, we have the following build errors related to this PR:

  • esp32s3-devkit:cxx
  • esp32s3-devkit:qemu_debug
  • esp32s3-devkit:qemu_debug
  • esp32s3-devkit:toywasm

About HW testing, we found the following bugs so far:

  • esp32s3-devkit:blewifi fails to scan BT and ping a connected AP:
dump_assert_info: Current Version: NuttX  10.4.0 81e7b13a05 Nov 12 2024 12:42:27 xtensa
dump_assert_info: Assertion failed sem != ((void*)0) && (sizeof(*((( atomic_short *)&(sem)->semcount))) == 1 ? nx_atomic_load_1((( atomic_short *)&(sem)->semcount), 0) : sizeof(*((( atomic_short *)&(sem)->semcount))) == 2 ? nx_atomic_load_2((( atomic_short *)&(sem)->semcount), 0) : sizeof(*((( atomic_short *)&(sem)->semcount))) == 4 ? nx_atomic_load_4((( atomic_short *)&(sem)->semcount), 0) : nx_atomic_load_8((( atomic_short *)&(sem)->semcount), 0)) < 0: at file: semaphore/sem_waitirq.c:90 task: Idle_Task process: Kernel 0x42062eb0
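
The expanded expression in the assertion above suggests that the atomic load is a macro that picks an out-of-line function by operand size. A minimal sketch of that dispatch follows, assuming unsigned return types and plain loads in place of the real critical-section implementations; the macro name `demo_atomic_load` and the function bodies are hypothetical.

```c
#include <stdint.h>

/* Hypothetical out-of-line loads, one per operand size. A real
 * implementation would read the value inside a critical section.
 */

uint8_t nx_atomic_load_1(const volatile void *ptr, int memorder)
{
  (void)memorder;
  return *(const volatile uint8_t *)ptr;
}

uint16_t nx_atomic_load_2(const volatile void *ptr, int memorder)
{
  (void)memorder;
  return *(const volatile uint16_t *)ptr;
}

uint32_t nx_atomic_load_4(const volatile void *ptr, int memorder)
{
  (void)memorder;
  return *(const volatile uint32_t *)ptr;
}

uint64_t nx_atomic_load_8(const volatile void *ptr, int memorder)
{
  (void)memorder;
  return *(const volatile uint64_t *)ptr;
}

/* Select the function from sizeof(*obj). Only the chosen branch is
 * evaluated, but the expression as a whole takes the common (widest)
 * type of all four branches.
 */

#define demo_atomic_load(obj) \
  (sizeof(*(obj)) == 1 ? nx_atomic_load_1((obj), 0) : \
   sizeof(*(obj)) == 2 ? nx_atomic_load_2((obj), 0) : \
   sizeof(*(obj)) == 4 ? nx_atomic_load_4((obj), 0) : \
                         nx_atomic_load_8((obj), 0))
```

In this sketch the whole expression has the widest unsigned type, so a negative 16-bit count reads back as 65535 and a `< 0` check on the raw result is false unless the value is cast back to the operand's type. Whether that is what tripped the assertion reported here is not established in this thread.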

@masayuki2009
Contributor

Let me revert this PR until esp32s3-related issues have been resolved.

@acassis
Contributor

acassis commented Nov 12, 2024

@tmedicci CONFIG_DEBUG_ASSERTIONS and any other debug shouldn't be included in the board config (in the past Greg rejected all defconfigs with it, now we know there are some because we don't have same quality control). In the production firmware debug assertion and/or debug symbols shouldn't be enabled.

So, I think the issue needs to be fixed without this option.

@lupyuen
Member

lupyuen commented Nov 12, 2024

FYI If we need to Enable All Builds in our repo, please patch build.yml like this:

@xiaoxiang781216
Contributor

@tmedicci could you enable this option in some of the mainline defconfig?

@tmedicci
Contributor

tmedicci commented Nov 12, 2024

@tmedicci CONFIG_DEBUG_ASSERTIONS and any other debug shouldn't be included in the board config (in the past Greg rejected all defconfigs with it, now we know there are some because we don't have same quality control). In the production firmware debug assertion and/or debug symbols shouldn't be enabled.

So, I think the issue needs to be fixed without this option.

I didn't say it's enabled in the defconfig; I said our internal CI enables it to test for situations where it could assert. In fact, there are some defconfigs in NuttX (upstream) that contain this config (it may have been submitted accidentally): among other boards' defconfigs, there are a lot that enable it too. We need to fix that!

About the debug assertions, it doesn't matter. Ideally, the issue should be fixed for both situations, right? Nothing is expected to assert. If asserted, something may not be in the right place.

@tmedicci could you enable this option in some of the mainline defconfig?

Hi @xiaoxiang781216, I didn't get your question. Do you mean the configs I've sent? Our internal CI always "adds" them to the defconfigs to run the tests with debug assertions enabled (this helps quite a lot to catch bugs during execution).

@raiden00pl
Member

@tmedicci CONFIG_DEBUG_ASSERTIONS and any other debug shouldn't be included in the board config (in the past Greg rejected all defconfigs with it, now we know there are some because we don't have same quality control). In the production firmware debug assertion and/or debug symbols shouldn't be enabled.

So, I think the issue needs to be fixed without this option.

This is a bad approach. Upstream configurations aren't production firmware but rather examples of use. Without DEBUGASSERTs enabled it's much harder to catch bugs. What's more, a developer implementing a new feature may never enable DEBUGASSERT and thus contribute a non-working driver to the upstream (which crashes when DEBUG_ASSERTIONS=y in some of the internal OS functions). Another thing, having DEBUG_SYMBOLS disabled is very irritating when you are testing configurations.

@acassis
Contributor

acassis commented Nov 12, 2024

@tmedicci CONFIG_DEBUG_ASSERTIONS and any other debug shouldn't be included in the board config (in the past Greg rejected all defconfigs with it, now we know there are some because we don't have same quality control). In the production firmware debug assertion and/or debug symbols shouldn't be enabled.
So, I think the issue needs to be fixed without this option.

This is a bad approach. Upstream configurations aren't production firmware but rather examples of use. Without DEBUGASSERTs enabled it's much harder to catch bugs. What's more, a developer implementing a new feature may never enable DEBUGASSERT and thus contribute a non-working driver to the upstream (which crashes when DEBUG_ASSERTIONS=y in some of the internal OS functions). Another thing, having DEBUG_SYMBOLS disabled is very irritating when you are testing configurations.

I think there are some cases where having the DEBUG enabled by default helped us to spot some issues early. I don't remember why it wasn't accepted.

@xiaoxiang781216
Contributor

xiaoxiang781216 commented Nov 12, 2024

@tmedicci CONFIG_DEBUG_ASSERTIONS and any other debug shouldn't be included in the board config (in the past Greg rejected all defconfigs with it, now we know there are some because we don't have same quality control). In the production firmware debug assertion and/or debug symbols shouldn't be enabled.

So, I think the issue needs to be fixed without this option.

I don't believe that the mainline defconfigs are production quality, since many of them just demo how a single feature works. Why not enable CONFIG_DEBUG_ASSERTIONS in some toy defconfigs?

@xiaoxiang781216
Contributor

@tmedicci Since GitHub CI doesn't catch the compiler error you reported, I would expect us to change some mainline defconfig to trigger it, so GitHub CI can catch it in the future.

@tmedicci
Contributor

Oh, I see. Build errors are not related to any config (I really don't know why CI didn't get the issue. Those defconfigs throw the error even without any modification. Build farm was able to get it).

The other error (in HW testing) was caught by enabling CONFIG_DEBUG_ASSERTIONS. I don't see any problem with leaving them enabled for all defconfigs (we can spot errors easily). Anyway, we always enable them (using kconfig-merge) on our internal CI for all defconfigs.

@xiaoxiang781216
Contributor

@lupyuen could you look at why ci can't catch the compiler error before merging?

@lupyuen
Member

lupyuen commented Nov 12, 2024

@xiaoxiang781216 That's because the CI Check for this Complex PR doesn't include xtensa-02, which contains esp32s3-devkit. Due to cost-cutting, we can only run xtensa-01 for Complex PRs. (And we can't run arm-01 either)

That's why we need our NuttX Build Farm and nuttx-dashboard.org to catch these errors after merging (without incurring extra cost to NuttX Project).

For Complex PRs that might break the build: I highly recommend running a Full Build in the contributor's repo:

If we really need to run xtensa-01 and xtensa-02 (and arm-01) for every PR: We could set up a VM with Self-Hosted GitHub Runners, as recommended by ASF. But this will be expensive, because we need professional IT security to maintain our VM, to ensure that we don't run unauthorised code pushed down from GitHub. Yeah so we're kinda stuck :-(

Update: We could also swap some targets between xtensa-01 and xtensa-02. Move the riskier, failure-prone targets from xtensa-02 into xtensa-01. And move the less risky targets from xtensa-01 out to xtensa-02.

@xiaoxiang781216
Contributor

Thanks for the explanation, it's clear now.

Labels

Arch: arm Issues related to ARM (32-bit) architecture Area: OS Components OS Components issues Size: L The size of the change in this PR is large

8 participants