boards: spresense: Fix link errors in parallel build. #102
In a parallel build, the current build system can update the same '.a' files from several jobs at once, which corrupts those '.a' files. To avoid these errors, the ar command is serialized with the flock command.
Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
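The fix serializes every ar update of a shared archive behind an exclusive lock, which from a Makefile would look like `flock <lockfile> ar ...`. The C sketch below illustrates the same idea with the flock(2) system call. It is a hypothetical model, not code from this PR or from the NuttX build: `add_member_locked()` stands in for one ar run as a read-modify-write of a plain text file, `simulate_parallel_ar()` forks several writers the way `make -jN` would, and both function names and any file paths are assumptions for illustration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for one "ar" run: read the whole archive, add one
 * member name, and write everything back. Two unlocked runs of this
 * read-modify-write can interleave and lose one of the updates. */
static int add_member_locked(const char *lock_path, const char *archive_path,
                             const char *member)
{
  int lockfd = open(lock_path, O_RDWR | O_CREAT, 0644);
  if (lockfd < 0)
    return -1;

  /* Block until no other process holds the exclusive lock; this is the
   * same idea as running "flock <lockfile> ar ..." from a Makefile. */
  if (flock(lockfd, LOCK_EX) < 0)
    {
      close(lockfd);
      return -1;
    }

  char buf[4096] = {0};   /* toy size limit for this sketch */
  size_t len = 0;
  FILE *in = fopen(archive_path, "r");
  if (in != NULL)
    {
      len = fread(buf, 1, sizeof(buf) - 1, in);
      fclose(in);
    }

  /* Modify the in-memory copy, then rewrite the archive from scratch. */
  snprintf(buf + len, sizeof(buf) - len, "%s\n", member);
  int ret = -1;
  FILE *out = fopen(archive_path, "w");
  if (out != NULL)
    {
      fputs(buf, out);
      ret = (fclose(out) == 0) ? 0 : -1;
    }

  flock(lockfd, LOCK_UN);   /* closing lockfd would also release it */
  close(lockfd);
  return ret;
}

/* Fork n writers, as "make -jN" might, wait for all of them, and return
 * the number of member lines in the archive (or -1 on any failure). */
static int simulate_parallel_ar(const char *lock_path,
                                const char *archive_path, int n)
{
  assert(n > 0 && n <= 256);   /* keeps the archive under the toy limit */
  unlink(archive_path);        /* start from an empty archive */

  int failures = 0;
  for (int i = 0; i < n; i++)
    {
      pid_t pid = fork();
      if (pid < 0)
        {
          failures++;          /* could not start this writer */
          break;
        }
      if (pid == 0)
        {
          char member[32];
          snprintf(member, sizeof(member), "obj%d.o", i);
          _exit(add_member_locked(lock_path, archive_path, member) == 0 ? 0 : 1);
        }
    }

  /* Reap every child; any abnormal or non-zero exit counts as a failure. */
  int status;
  while (wait(&status) > 0)
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
      failures++;
  if (failures > 0)
    return -1;

  FILE *in = fopen(archive_path, "r");
  if (in == NULL)
    return -1;
  int lines = 0;
  int c;
  while ((c = fgetc(in)) != EOF)
    if (c == '\n')
      lines++;
  fclose(in);
  return lines;
}
```

With the lock in place, every writer's member survives, so the line count equals n. Removing the two flock() calls turns the update back into a race that may only occasionally drop a line, which is consistent with how hard the failure was to reproduce in this thread.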
@jerpelea I added the flock command for parallel builds. Do you have any concerns? As far as I have confirmed (please see the summary), it works well.
@MasayukiIshikawa Can it be replicated on other devices? I did not see this issue on my machine.
@jerpelea Although it's possible to apply these changes to other devices, how can I run build tests? Also, are we assuming the Ubuntu platform? I think we should wait for a new CI system.
@MasayukiIshikawa I am building with Ubuntu on a multicore machine. How did you spot the issue?
@jerpelea I can reproduce link errors for the spresense:wifi configuration with a -j4 build on my Ubuntu machine. Whether it happens depends entirely on host performance (CPU, memory, HDD or SSD); for example, with -j5 there is no problem. So you need to find a job count that triggers the parallel build issue on your machine.
Strange, since I can't reproduce it.
Is flock available on all platforms? Linux, Cygwin, MSYS, macOS, FreeBSD? I would think so. Windows native would be an issue.
We cannot merge this PR while there is ongoing discussion. We need you to advise us when the discussion reaches a conclusion. You may close the PR yourself if you don't want to merge it, or let us know if you do want to merge it. Alin... since you are a committer, you could also just "Rebase and merge" the change to master if you are happy with it.
I am trying to reproduce the issue before merging it, and so far I have tried every job count from -j1 to -j56 without success.
Hi all,
@xiaoxiang781216 Can you provide a way to replicate it?
No, we hit the problem randomly; why do we need to provide 100% reliable reproduction steps for a race condition? The problem is real: it is bad for multiple parallel jobs to write to the same library (libapps.a).
@xiaoxiang781216 Thanks, your comments are right.
@patacongo As far as I have checked, Linux, Cygwin, and FreeBSD provide the flock command. macOS does not provide it, but we can find a solution. However, I was not able to find a solution for MSYS/MinGW. http://www.polarhome.com/service/man/?qf=FLOCK&af=0&tf=2&of=Cygwin
flock() is available on both Cygwin and MSYS2. I checked Cygwin earlier. I am on MSYS2 now and I see: $ which flock
It appears that macOS does support flock(): https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/flock.2.html But that is odd, because flock() is part of util-linux. I also see at https://stackoverflow.com/questions/10526651/mac-os-x-equivalent-of-linux-flock1-command: "There is a cross-platform flock command here: https://github.com/discoteq/flock" Perhaps that is the flock() used on macOS?
If we build the cross-platform version, we would be good everywhere and the behavior would always be the same, even on Windows native. I think we should consider that.
@patacongo since:
I think we should keep Windows native support. It hasn't been used in some time, but there have been users in the past and there will be users in the future. It is an important component for some environments and some SDKs. It is the normal history of the Windows native build to languish for a year or two, then someone needs it and brings it up to date. That cycle has repeated many times. I would vote to keep the hooks in place and bring it up to date when next needed. Notice that I gave you a reference to a platform-independent version of flock() that can be used with Windows native. I think that abandoning a platform is a very serious thing and should be considered very carefully. There should be no quick decisions to do that. It would violate one of the basic principles of the Inviolables, and I think we would need a full discussion and a full vote before we did anything like that.
A simplifying thought occurs to me. Currently the Windows native build requires a few Unix-like tools that it historically gets from GnuWin32. But I think we could use the MSYS2 tools instead. So it seems to me that we could reduce the native build to a special-case MSYS2 build. The special case is that it does not use the Bash shell. Rather, it uses a CMD.com shell of some kind. This is necessary because the Windows native build environment (probably Visual Studio) executes in the context of CMD.com. I think if the few .bat files needed in the build were replaced with .c executables for Windows, then you should be able to build essentially shell-less (although some CMD.com executables would still be required). That is why, for example, there is a configure.c which is a work-alike for configure.sh (but much faster). There is also a configure.sh and a configure.bat. We should really get rid of both of those and unify on configure.c, which works in a POSIX environment as well as a Windows native environment. There are five other .bat files in nuttx/tools, and they are trivial. Let me experiment a little with the native Windows build. I have not tried it in a long time. Let me see where things are at. Let's not make any hasty decisions.
Let's raise this issue in a discussion thread. Let's discuss it for a while and see whether the group would like to remove Windows native support or not. I think there are reasons for keeping it as well as reasons for removing it. Let's get consensus, and then we should probably have a vote.
I am modifying all the configuration-related code to remove the hardcoded arch and board lists, and I found that configure.c calls opendir/readdir/closedir. I think Windows only supports the FindFirstFile/FindNextFile API?
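For reference, a minimal sketch of how a tool such as configure.c could enumerate a directory on both POSIX hosts and native Windows: opendir/readdir/closedir on the former, FindFirstFileA/FindNextFileA/FindClose on the latter. The helper names `for_each_entry()` and `count_entry()` are hypothetical, and this is not how configure.c is actually written. Visual C++ does not ship `<dirent.h>`, which is why a separate `_WIN32` branch is assumed here (MinGW does provide one).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#ifdef _WIN32
#  include <windows.h>
#else
#  include <dirent.h>
#endif

/* Callback type for visiting one directory entry (hypothetical). */
typedef void (*visit_fn)(const char *name, void *arg);

/* Hypothetical helper: call visit() for every entry in dir_path except
 * "." and "..". Returns the number of entries visited, or -1 if the
 * directory cannot be opened. */
static int for_each_entry(const char *dir_path, visit_fn visit, void *arg)
{
  assert(visit != NULL);
  int count = 0;

#ifdef _WIN32
  /* Native Windows: FindFirstFileA/FindNextFileA take a wildcard pattern. */
  char pattern[MAX_PATH];
  WIN32_FIND_DATAA data;
  snprintf(pattern, sizeof(pattern), "%s\\*", dir_path);
  HANDLE h = FindFirstFileA(pattern, &data);
  if (h == INVALID_HANDLE_VALUE)
    return -1;
  do
    {
      if (strcmp(data.cFileName, ".") != 0 &&
          strcmp(data.cFileName, "..") != 0)
        {
          visit(data.cFileName, arg);
          count++;
        }
    }
  while (FindNextFileA(h, &data));
  FindClose(h);
#else
  /* POSIX hosts (Linux, macOS, Cygwin, MSYS2): opendir/readdir/closedir. */
  DIR *dir = opendir(dir_path);
  if (dir == NULL)
    return -1;
  struct dirent *ent;
  while ((ent = readdir(dir)) != NULL)
    {
      if (strcmp(ent->d_name, ".") != 0 && strcmp(ent->d_name, "..") != 0)
        {
          visit(ent->d_name, arg);
          count++;
        }
    }
  closedir(dir);
#endif

  return count;
}

/* Example visitor: just counts the entries it is handed. */
static void count_entry(const char *name, void *arg)
{
  (void)name;
  ++*(int *)arg;
}
```

Keeping the platform split inside one helper would let the rest of a configure tool stay identical on both hosts; pointing it at the boards directory with a visitor that records names is one possible way to replace a hardcoded board list, but that use is an assumption here.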
Sure, this type of change definitely needs consensus in the community.
@xiaoxiang781216 Thanks for the clarification. I think that instead of avoiding the race by adding a dependency, we should fix it.
@patacongo Downstream projects use Windows native, and we should not remove support to work around bugs.
Hi,
I think that we should fix the issue with apps instead of avoiding it on each platform.
This has been open for a long time. Most people seem to be opposed to the change. @Ouss4 asked if we should close this PR without merging several days ago. No one responded. I am closing it now. If anyone has strong feelings, you should re-open it. Let's make the default state closed so that we do not have to see this every day as an open PR. |
Summary
Impact
Limitations / TODO
The flock command needs to be installed, but on Ubuntu it is installed by default.
There are no limitations for the current Spresense configurations. However, if you add apps/examples/posix_spawn, the parallel build fails. I think this is a problem with the Makefile in that directory.
Testing