Skip to content

Conversation

@masayuki2009
Copy link
Contributor

Summary

  • In pararell build, current build system updates some '.a' files simultaniously and thus '.a' files are corrupted. To aoid this erros, ar command is locked by flock command.

Impact

  • Spresense build is only affected.

Limitations / TODO

  • flock command needs to be installed but on Ubuntu platform the command is installed by default.

  • There is no limitations for the current Spresense configurations. However, if you add apps/examples/posix_spawn, parallel build would fail. I think this is a Makefile problem under the directory.

Testing

  • I tested all the current configurations for Spresense

In parallel build, current build system updates some '.a' files
simultaneously and thus '.a' files are corrupted. To avoid this
errors, ar command is locked by flock command.

Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
@masayuki2009
Copy link
Contributor Author

@jerpelea I added flock command for parallel build. Do you have any concerns? As far as I confirmed (please see the summary), it works well.

@jerpelea
Copy link
Contributor

@MasayukiIshikawa can it be replicated on other devices?

I did not see this issue on my machine

@masayuki2009
Copy link
Contributor Author

@MasayukiIshikawa can it be replicated on other devices?

I did not see this issue on my machine

@jerpelea Though it's possible to apply this changes to other devices, how can I do build tests? Also, do we assume to use Ubuntu platform? I think we should wait for a new CI system.

@jerpelea
Copy link
Contributor

@MasayukiIshikawa i am building with ubuntu on a multicore mashine

How did you spot the issue ?
Are you using mac? windows? linux?

@masayuki2009
Copy link
Contributor Author

@MasayukiIshikawa i am building with ubuntu on a multicore mashine

How did you spot the issue ?
Are you using mac? windows? linux?

@jerpelea I can reproduce link errors for spresense:wifi configuration with -j4 build with my Ubuntu machine. Actually it totally depends on host processor performance (CPU/Memory/HDD or SSD). For example if I specify -j5 it has no problem. So you need to find a number which causes parallel build issues.

@jerpelea
Copy link
Contributor

strange since I can't reproduce
I am building j40 and j56

@patacongo
Copy link
Contributor

Is flock available on all platforms? Linux, Cygwin, MSYS, macOS, FreeBSD? I would think so. Windows native would be an issue.

@patacongo
Copy link
Contributor

We cannot merge this PR while there is ongoing discussion. We need for you to advice us when the discussion reaches a conclusion. You make close the PR yourself if you don't want to merge, or lets us know if you do want to merge.

Alin... since you are committer, you could also just "Rebase and merge" the change to master if you are happy with it.

@jerpelea
Copy link
Contributor

jerpelea commented Jan 15, 2020

I am trying to reproduce the issue before merging it and for now i tried all combinations from j1 to j56 without success.

@xiaoxiang781216
Copy link
Contributor

xiaoxiang781216 commented Jan 15, 2020

Hi all,
This is a common issue, please find a global solution to fix it(at least the file lock should move to common place), otherwise it is hard to speed up the automation build without the reliable parallel build.

@jerpelea
Copy link
Contributor

@xiaoxiang781216 can you provide a way to replicate it ?

@xiaoxiang781216
Copy link
Contributor

No, we hit the problem randomly, why we need provide a 100% repo step for a race condition issue? The problem is there: t is a bad thing that multiple thread write to a same library(libapps.a).
We either:
1.Add file lock or
2.Change how libapps.a get generated
Either method is fine, but the fix need put to the common place.

@masayuki2009
Copy link
Contributor Author

No, we hit the problem randomly, why we need provide a 100% repo step for a race condition issue? The problem is there: t is a bad thing that multiple thread write to a same library(libapps.a).
We either:
1.Add file lock or
2.Change how libapps.a get generated
Either method is fine, but the fix need put to the common place.

@xiaoxiang781216 Thanks and your comments are right.

@masayuki2009
Copy link
Contributor Author

Is flock available on all platforms? Linux, Cygwin, MSYS, macOS, FreeBSD? I would think so. Windows native would be an issue.

@patacongo as long as I checked other platforms, Linux/Cygwin/FreeBSD provide flock command. macOS does not provide the command but we can find a solution. However I was not able to find a solution for MSYS|MinGW.

http://www.polarhome.com/service/man/?qf=FLOCK&af=0&tf=2&of=Cygwin
https://config9.com/linux/bash/mac-os-x-equivalent-of-linux-flock1-command/
https://www.freebsd.org/cgi/man.cgi?query=flock&amp;sektion=2

@patacongo
Copy link
Contributor

flock() is available on both Cygwin and MSYS2. I checked Cygwin earlier. I am on MSYS2 now an I see:

$ which flock
/usr/bin/flock
$ flock --version
flock from util-linux 2.33.1

@patacongo
Copy link
Contributor

It appears that macOS does support flock(): https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/flock.2.html

But that is odd because flock() is part of util-linux. I also see at https://stackoverflow.com/questions/10526651/mac-os-x-equivalent-of-linux-flock1-command: "There is a cross-platform flock command here: https://github.com/discoteq/flock"

Perhaps that is the flock() used on maOS?

@patacongo
Copy link
Contributor

If we build the cross-platform version, we would be good everywhere and the behavior would always be the same, even on Window native. I think we should consider that.

@xiaoxiang781216
Copy link
Contributor

xiaoxiang781216 commented Jan 15, 2020

@patacongo since:
1.There are many methods to build NuttX on Windows: Cygwin, MSYS2, Unbuntu on Windows.
2.Makefile.win for Windows native doesn't update for a long time, I suppose it is broken now.
Do you consider that it's fine to remove Window native to save the maintain cost? I am asking this question because it will give the user VERY BAD impression if we document that NuttX support Windows native but fail to build in reality.

@patacongo
Copy link
Contributor

I think we should keep windows native support. It hasn't been used in sometime, but there have been users in the past and there will be users in the future. It is an important component for some environments and some SDKs.

It is the normal history of the Windows native build to languish for a year or two, then some one needs it and brings it up to date. That cycle has repeated many times.

I would vote to keep the hooks in place and bring it up to date when next needed. Notice that I gave you a reference to a platform independent version of flock() that can be used with Windows native.

I think that abandoning a platform is a very serious thing and should be considered very carefully. There should be no quick decisions to do that. It is a violation of one of the basic principles of the Inviolables and, I think we would need a full discussion and full vote before we did anything like that.

@patacongo
Copy link
Contributor

A simplifying thought occurs to me. Currently the Windows native build requires a few Unix like tools that it historically gets from GnuWin32. But I think we could use the MSYS2 tools instead. So it seems to me that we could reduce the native build to a special case MSYS2 build.

The special case is that it does not use the Bash shell. Rather, it uses a CMD.com shell of some kind. This is necessary because the Windows native build environment (probably Visual Studio) executes in the context of CMD.com.

I think if the few .bat files needed in the build were replaced with .c executables for Windows, then you should be to build essentially shell-less (although some CMD.com executables would still be required).

That is why, for example, there is a configure.c which is a work-alike for configure.sh (but much faster). There is also a configure.sh and a configure.bat We should get rid of both of those really and unify to configure.c which works in a POSIX environment as well as a Windows native environment.

There are five other .bat file in nuttx/tools and they are trivial.

Let me experiment a little with the native Windows build. I have not tried it in a long time. Let me see where things are at. Let's not make any hasty decisions.

@patacongo
Copy link
Contributor

Let's raise this issues in a discussion thread. Let's discuss for awhile and see if the group would like to remove Windows native support of not. I think there are reasons for keeping it as well as reasons for removing it. Let's get consensus and then we should probably have a vote.

@xiaoxiang781216
Copy link
Contributor

I am modifying all config related stuff to remove the hardcode arch and board list, and find that configure.c call opendir/readdir/closedir, I think that Windows just support FindFirstFile/FindNextFile API?
Since the build need kconfig-backend and gmake, both tools need a Unix like environment? I can't understand Windows native has any addtional value if the essential tools need Cygwin or MSYS to build anyway.

@xiaoxiang781216
Copy link
Contributor

Let's raise this issues in a discussion thread. Let's discuss for awhile and see if the group would like to remove Windows native support of not. I think there are reasons for keeping it as well as reasons for removing it. Let's get consensus and then we should probably have a vote.

Sure, this type of change definitely need to make a consensus in community.

@jerpelea
Copy link
Contributor

@xiaoxiang781216 thanks for the clarification

I think that instead of avoiding the race by adding a dependency we should fix
how the apps are generated. This approach will fix it for all platforms and ensure less friction.

@patacongo downstream projects use Windows native and we should not remove support to work around bugs

@Ouss4
Copy link
Member

Ouss4 commented Jan 18, 2020

Hi,
Please advice, should this PR be closed without getting merged?

@jerpelea
Copy link
Contributor

I think that we should fix the issue with apps instead of avoiding it on each platform

@patacongo
Copy link
Contributor

This has been open for a long time. Most people seem to be opposed to the change. @Ouss4 asked if we should close this PR without merging several days ago. No one responded.

I am closing it now. If anyone has strong feelings, you should re-open it. Let's make the default state closed so that we do not have to see this every day as an open PR.

@patacongo patacongo closed this Jan 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants