Skip to content

Fix creation of documentation from libnutclient_misc.txt#418

Closed
svarshavchik wants to merge 2 commits intonetworkupstools:libusb-1.0from
svarshavchik:fix-git-checkout-makefile
Closed

Fix creation of documentation from libnutclient_misc.txt#418
svarshavchik wants to merge 2 commits intonetworkupstools:libusb-1.0from
svarshavchik:fix-git-checkout-makefile

Conversation

@svarshavchik
Copy link
Copy Markdown
Contributor

A parallel build of a fresh git checkout of the libusb-1.0 branch fails.

A simple .txt.3 rule doesn't understand that libnutclient_misc.txt
creates a bunch of .3.

A parallel build, say 'make -j 10' results in make trying to figure out
how to build nutclient_authenticate.3 before building libnutclient_misc.3,
and without a corresponding .txt in sight it gives up.

Signed-off-by: Sam Varshavchik mrsam@courier-mta.com

A simple .txt.3 rule doesn't understand that libnutclient_misc.txt
creates a bunch of .3.

A parallel build, say 'make -j 10' results in make trying to figure out
how to build nutclient_authenticate.3 before building libnutclient_misc.3,
and without a corresponding .txt in sight it gives up.

Signed-off-by: Sam Varshavchik <mrsam@courier-mta.com>
Comment thread docs/man/Makefile.am Outdated
nutscan_get_serial_ports_list.3 \
nutscan_init.3

$(LIBNUTCLIENT_MISC_DEPS): libnutclient_misc.3:
Copy link
Copy Markdown
Member

@jimklimov jimklimov Apr 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the trailing colon : correct here (I don't remember seeing such syntax; if it exists - is it portable to non-GNU makes)?

Also, doesn't this recipe define that each of those six files (in LVAL) depend on this one (RVAL) and after this prerequisite is met (RVAL was generated), the rule to make the LVALs is to just touch (thus creating just empty files)? Or is this what the trailing colon fixes somehow? Possibly, the correct rule might be to $(MAKE) $@ those targets...

DISCLAIMER: I did not (and in the near future likely won't) look into the full context code of the Makefile, so quite likely I am missing something.

@jimklimov
Copy link
Copy Markdown
Member

I did run a number of builds ranging from -j4 to -j8, although not on sufficiently parallel-CPU machines (e.g. on Travis regularly since the beginning of the year), and did not hit this issue with "original" NUT codebase.

If your setup is capable of really building and suffering with such level of parallelism, did you verify that a clean checkout of this patched branch no longer shows the issue, and that the .3 files in question are indeed generated with non-empty content?

@svarshavchik
Copy link
Copy Markdown
Contributor Author

The extra colon was, indeed, a harmless typo. I removed it.

I can reproduce the build failure every time simply by:

git clone into a clean directory.

./autogen.sh

./configure --with-dev

make -j 10

The patch does not affect the actual generation of the man pages. They are still generated by the existing rules. The new rule makes them dependent on libnutclient_misc.3, which is the actual rule that generates all of them. Because they're all dependent on libnutclient_misc.3, these build rules will not run until libnutclient_misc.3 gets created together with the remaining six .3 files; and they do get created just fine:

$ cat nutclient_authenticate.3
.so man3/libnutclient_misc.3

The touch is intentional, and serves merely to update the timestamp on the other six files. This is to prevent another harmless race condition. It is not specified in which order a2x is going to create all seven files. It is possible that, say, nutclient_authenticate.3 gets written out first, then the main libnutclient_misc.3 man page, with the timing just right and the system clock ticking to the next second right in between the two.

The end result of nutclient_authenticate.3's timestamp older than libnutclient_misc.3. With each subsequent make (if the dummy build rule does nothing), make will think it's out of date and attempt to rebuild it. An explicit touch, to update the timestamp of the target, avoids this race condition; guaranteeing that make will always see the target's timestamp after its fake dependency.

This is just the simplest way to resolve the build failure, without writing more complicated make rules.

@jimklimov jimklimov requested review from clepple and zykh April 9, 2017 13:40
@jimklimov
Copy link
Copy Markdown
Member

Ok, thank you very much for the thorough explanation (the context I was missing). In this case, LGTM.

👍

@jimklimov jimklimov mentioned this pull request Apr 9, 2017
@clepple
Copy link
Copy Markdown
Member

clepple commented Oct 7, 2017

Slowly working through my backlog. I still haven't had a chance to try this myself, but I appreciate the explanation.

Given the mess of the libusb-1.0 branch history, we might want to rebase these onto master or something, but either way, I'd like to get this into the next release.

@clepple clepple added this to the 2.7.5 milestone Oct 7, 2017
@jimklimov
Copy link
Copy Markdown
Member

Oh great, I've reinvented this particular wheel, and it took me a month to remember there is one already... here's my shot : 42ity@62a1ee5 :)

@jimklimov
Copy link
Copy Markdown
Member

@clepple : I can now confirm that the original problem (flaky parallel builds) does exist, required sufficient parallelism to trigger it in practice though. A fix of this sort (tested in our fork with my commit above) resolves the problem.

@clepple
Copy link
Copy Markdown
Member

clepple commented Nov 21, 2017

@jimklimov I have no strong feelings on the technical implementation, but I want to @zykh's rebased USB branch for #300 merged first, and then we can rebase either this set of commits or yours onto the new master.

@clepple
Copy link
Copy Markdown
Member

clepple commented Nov 27, 2017

Rebased onto master; about to push to CI, and then if all looks good, I will merge master into libusb-1.0+0.1 and do some testing with that.

@clepple
Copy link
Copy Markdown
Member

clepple commented Nov 27, 2017

@svarshavchik I think I might be hitting a different parallel build issue. In three separate cases (master, this pull request @ a3a3d2d, and this pull request rebased onto master @ c065ef33), a make -j10 causes errors like so:

make[2]: Leaving directory '/Users/clepple/Projects/nut/github/nut/docs'
a2x: ERROR: "xmllint" --nonet --noout --valid "/Users/clepple/Projects/nut/github/nut/docs/user-manual.xml" returned non-zero exit status 1
make[1]: *** [Makefile:855: user-manual.html] Error 1
make[1]: *** Waiting for unfinished jobs....
a2x: ERROR: "xmllint" --nonet --noout --valid "/Users/clepple/Projects/nut/github/nut/docs/user-manual.xml" returned non-zero exit status 1
a2x: ERROR: "xmllint" --nonet --noout --valid "/Users/clepple/Projects/nut/github/nut/docs/user-manual.xml" returned non-zero exit status 1
make[1]: *** [Makefile:861: user-manual.pdf] Error 1

This is with a2x version 8.6.9, Python 2.7.13 on macOS 10.12.6. I might have to hack up the a2x script to see the intermediate files, but I vaguely recall seeing something like this once before: if you try to build several documents with a2x at once, there is a temporary file with a constant name that can get deleted while other parallel jobs are using it. (I tend to only use -jN when rebuilding large swaths of code rather than documentation, so these are dusty neurons.)

@jimklimov Are you seeing this error as well, or is it the same as what Sam originally described?

@svarshavchik
Copy link
Copy Markdown
Contributor Author

This pull request fixes a race condition in docs/man.

This looks like a race condition in the docs directory.

I don't get a failure on a plain 'make -j10'. But 'make -j10 pdf', which appears to be what you did, definitely seems to have a race condition.

Taking one of the commands at random, an a2x on FAQ.txt, running strace on it I see it doing:

18625 openat(AT_FDCWD, "custom.xsl", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3

18625 openat(AT_FDCWD, "doclist.xsl", O_WRONLY|O_CREAT|O_TRUNC, 0666 <unfinished ...>

18628 openat(AT_FDCWD, "listings.xml", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3

18629 openat(AT_FDCWD, "FAQ.rtex", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3

and a bunch more.

Running a different individual pdf command under strace shows, once again:

18647 openat(AT_FDCWD, "custom.xsl", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3

18647 openat(AT_FDCWD, "doclist.xsl", O_WRONLY|O_CREAT|O_TRUNC, 0666 <unfinished ...>

18650 openat(AT_FDCWD, "listings.xml", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3

Not good. Yes, running a bunch of these suckers at the same time will result in them stepping on each other toes.

Looking at the strace closely, this is the dblatex command, the parent process, for the first two, which then forks and execs xsltproc to create listing.xml

The dblatex command gets kicked off by a2x.

Perusing dblatex's man page: when building a single file, it wants a '-o' option to specify the destination where it scribbles the files.

Perusing a2x.py's source, it does not appear to have any means of supply the -o option, so, by default, it dumps stuff into the current directory.

Ok, this is going to be a different issue. I think that the way to solve this one is to mess around with the .txt.pdf rule that builds each pdf in its own separate subdirectory, then moves the end result back up to the parent directory.

@jimklimov
Copy link
Copy Markdown
Member

No, I did not hit that one - but our primary packaging target does not involve PDF either...

@clepple clepple closed this in 4dc254c Nov 29, 2017
clepple added a commit that referenced this pull request Dec 3, 2017
Closes: #300

Notable commits from master:

 * Documentation
 * statepath environment variables (#473 / #507)
 * nut-scanner (issue #500 / pull #502)
 * Fix one parallel build problem (#418)
 * Compute output load if not provided (#484)
jimklimov pushed a commit to jimklimov/nut that referenced this pull request Aug 7, 2018
…eam to DMF-aware codebase)

A simple .txt.3 rule doesn't understand that libnutclient_misc.txt
creates a bunch of .3.

A parallel build, say 'make -j 10' results in make trying to figure out
how to build nutclient_authenticate.3 before building libnutclient_misc.3,
and without a corresponding .txt in sight it gives up.

Signed-off-by: Sam Varshavchik <mrsam@courier-mta.com>
Rebased onto master by Charles Lepple clepple+nut@gmail.com
Closes: networkupstools#418
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants