Fix IPFilter IPv6 detection, especially on NetBSD#596
Fix IPFilter IPv6 detection, especially on NetBSD#596sborrill wants to merge 2 commits intosquid-cache:masterfrom
Conversation
- include correct headers for autoconf to work - use correct #define value as output by autoconf
…erscores so if you have a line break and an indent, instead of HAVE_STRUCT_NATLOOKUP_NL_REALIPADDR_IN6 you get HAVE_STRUCT_NATLOOKUP_NL_REALIPADDR_IN6___
|
Can one of the admins verify this patch? |
|
OK to test |
|
This is a workaround for a specific situation Solaris. It was extensively tested with Solaris 10, 11, and 12 containing IPFilter with the Oracle modifications, and with IPFilter v4 and v5 official code on both BSD and Linux What is "broken" ? |
|
@sborrill, thank you for working on fixing these problems! Please interpret the question quoted below as an indication that the PR needs a more detailed description rather than an indication that your contribution is unwelcomed! I have updated the description now, but please review and adjust it to match your understanding of the problem. It will probably need further adjustments to detail the affected environments. The PR title/description will become the official commit message when the PR is auto-merged by Anubis.
@yadij, according to the PR commits:
FWIW, I have not checked all of the above claims except for a partial check of the last two: I indeed see /* Define to 1 if `nl_realipaddr.in6' is a member of `struct natlookup '. */
/* #undef HAVE_STRUCT_NATLOOKUP_NL_REALIPADDR_IN6___ */I suspect that some of the above claims should be interpreted in an "environment X" context. In other words, they may be irrelevant or even false in some contexts. |
|
We have another bug similar to item 4 in the PR description: configure.ac has AC_CHECK_MEMBERS( /* Define to 1 if `tm_gmtoff' is a member of `struct tm'. */
#define HAVE_STRUCT_TM_TM_GMTOFF 1There is also AC_CHECK_MEMBER( The above two problems are old (going back to 6a9f638 and b121884) and should probably be fixed in a dedicated PR. @yadij, would you be willing to volunteer a dedicated PR fixing them? Finally, there is also a |
|
@yadij, there is more information about NetBSD-specific failures at #177 (comment) which triggered this pull request. |
|
On 10/04/2020 20:24, Alex Rousskov wrote:
@sborrill <https://github.com/sborrill>, thank you for working on fixing
these problems! Please interpret the question quoted below as an
indication that the PR needs a more detailed description rather than an
indication that your contribution is unwelcomed! I have updated the
description
<#596 (comment)> now, but
please review and adjust it to match your understanding of the problem.
It will probably need further adjustments to detail the affected
environments.
The PR title/description will become the official commit message when
the PR is auto-merged by Anubis
<https://github.com/measurement-factory/anubis/blob/master/README.md#commit-message>.
What is "broken" ?
Broken is errors such as "ERROR: NAT/TPROXY lookup failed to locate
original IPs on local=127.0.0.1:3200 remote=192.168.1.102:50256 FD 22
flags=33". This is because none of the IPFilter 5 IPv6 changes have been
compiled in as the autoconf tests did not work for the reasons below.
@yadij <https://github.com/yadij>, according to the PR commits:
1. acinclude/os-deps.m4 was missing sys/param.h header
Should be non-contentious, omitted by accident I guess. OS-specific, but
requirement set by the author of IPFilter himself.
sys/param.h sets __NetBSD_Version__ which is needed to set USE_INET6 in
netinet/ip_compat.h (part of IPFilter):
https://nxr.netbsd.org/xref/src/sys/external/bsd/ipf/netinet/ip_compat.h#43
For FreeBSD there is the same requirement for sys/param.h (but on
__FreeBSD_Version__) unless USE_INET6 is set elsewhere.
Referencing @yadij: "It was extensively tested [...] with IPFilter v4
and v5 official code on both BSD and Linux". Darren Reed is a NetBSD
developer, he has been not a FreeBSD committer since 2009. OpenBSD and
DragonFly haven't included IPFilter since 2001 and 2011 respectively. So
as far as there official BSD IPFilter code, NetBSD is probably it.
2. acinclude/os-deps.m4 was incorrectly assuming that netinet/ip_nat.h
header is always present
3. acinclude/os-deps.m4 was missing netinet/ip_nat.h header where it
was used instead of ip_nat.h
2 & 3 should be non-contentious, the presence is checked earlier, but
the include is incorrectly non-conditional.
4. src/ip/Intercept.cc assumed that, AC_CHECK_MEMBERS(|struct X|)
outputs |HAVE_X| but it actually outputs |HAVE_STRUCT_X|.
5. acinclude/os-deps.m4: AC_CHECK_MEMBERS(|X |) converted trailing
whitespace into surprising trailing underscores: |HAVE_X__|
FWIW, I have not checked all of the above claims except for a partial
check of the last two: I indeed see
|HAVE_STRUCT_NATLOOKUP_NL_REALIPADDR_IN6___| on my Linux box:
/* Define to 1 if `nl_realipaddr.in6' is a member of `struct natlookup '. */
/* #undef HAVE_STRUCT_NATLOOKUP_NL_REALIPADDR_IN6___ */
I think this is a bug in the original patch that was just never seen.
The NAT IP address lookup appears to work even with with this not
defined (and also with it defined).
I suspect that some of the above claims should be interpreted in an
"environment X" context. In other words, they may be irrelevant or even
false in some other contexts.
1, 2 and 3 are using autoconf to correctly include files only if present.
4 and 5 appear to be generally reproducible.
…--
Dr. Stephen Borrill, Director and Solutions Architect
Precedence Technologies Ltd T: +44 (0) 8456 446 800
Technology House, 36a Union Lane F: +44 (0) 8456 446 899
Cambridge, CB4 1QB, United Kingdom W: http://www.precedence.co.uk/
Limited company registered in England and Wales. Company number 3725626
|
|
The issue I have with this is that the configure.ac checks were designed to pass in the Solaris build environment. The changes look okay for Linux - but we need confirmation that Solaris builds are not re-broken before this can be merged. |
While it would be nice to have a confirmation for Solaris, such a confirmation should not become an indefinite precondition for merging (IMO): Solaris does not have a priority over NetBSD -- both are officially untested platforms. If we do not know of any specific problems on Solaris, we should merge a NetBSD fix that is known to work. I now explicitly asked for Solaris feedback using bug 4828 report (that led to this problem). FWIW, 20 days ago, I also posted a link to this PR there. If no specific problems resurface for a week, I suggest removing "confirmation that Solaris builds are not re-broken" precondition from this PR. |
|
Since it is from Yuri I do. Marking as cleared. |
Has been broken since 4eaf432. 1. acinclude/os-deps.m4 was missing sys/param.h header 2. acinclude/os-deps.m4 was incorrectly assuming that netinet/ip_nat.h header is always present 3. acinclude/os-deps.m4 was missing netinet/ip_nat.h header where it was used instead of ip_nat.h 4. src/ip/Intercept.cc assumed that AC_CHECK_MEMBERS(`struct X`) outputs `HAVE_X`, but it actually outputs `HAVE_STRUCT_X`. 5. acinclude/os-deps.m4: AC_CHECK_MEMBERS(`X `) converted trailing whitespace into surprising trailing underscores: `HAVE_X__`
|
@kinkie, please check Jenkins build errors for the staged commit of this PR (5da6c77). The second one feels especially unrelated to this PR.
In general, do we need to change how/when Jenkins tests are vetted, added, and/or tested to decrease Jenkins instability? It feels like the situation is not getting better for quite a while. |
|
On Sat, May 2, 2020 at 6:06 PM Alex Rousskov ***@***.***> wrote:
@kinkie <https://github.com/kinkie>, please check Jenkins build errors
for the staged commit of this PR (5da6c77
<5da6c77>).
The second one feels especially unrelated to this PR.
1. 4.4.7/tr1/exp_integral.tcc:353:17: error: call to function
__expint_E1 error
<https://build.squid-cache.org/job/5-pr-auto/COMPILER=clang,OS=centos-6,label=docker-build-host/648/console>
2. ./bootstrap.sh: line 62: which: command not found error
<https://build.squid-cache.org/job/5-pr-auto/COMPILER=clang,OS=fedora-30,label=docker-build-host/648/console>
In general, do we need to change how/when Jenkins tests are vetted, added,
and/or tested to decrease Jenkins instability? It feels like the situation
is not getting better for quite a while.
It actually has; there was a moment when it was especially bad.
AFAICT error #2 has no impact on the build succeeding or failing. It causes
bootstrap.sh to fail to detect some tool, and to fall back to
# path for $tool not found. Not defining, and hoping for the best
which works because the tool is indeed present in $PATH
A source of unreliability we've had is around git checkouts; we were asking
it to do a shallow copy and than go to a specific revision identified by
hash; but due to the build backlog we had too many commits past the hash we
were asking for and therefore the git checkout was failing. I've increased
the depth of the clone op from 1 to 10 or 20.
Would it make sense to use instead a reference repository to update
periodically to obtain the same result with more reliability?
I am progressively tightening the screws on reliability and consistency of
the CI framework.
…--
Francesco
|
I am not sure what you mean by "a reference repository" in this context, and I do not know what exactly you are trying to optimize, but "update periodically" would be a step in the wrong direction IMHO. I see two reasonable approaches:
Each build should checkout the necessary commit from an up-to-date (as of build start time) repository. The entire Squid repository on disk is only 150MB in size. Master sources are 25MB. Nothing is worth optimizing here at the expense of reliability! Unreliability should only come from things we cannot control like GitHub and our hosting providers. If the current hosting limitations force our CI to be unreliable, we should migrate away and/or change what we test. |
Agreed. Unfortunately, Jenkins "console" output is "optimized" to hide the true error. Is there some URL where I can see the output from the failed test cases? |
It is https://git-scm.com/docs/git-clone#Documentation/git-clone.txt---reference-if-ableltrepositorygt . The approach you mention is the one we take, except that we also do https://git-scm.com/docs/git-clone#Documentation/git-clone.txt---depthltdepthgt and this has caused some problems due to queueing of commits to be tested.
You are right, except that we download that for every OS/compiler variant, plus once to check changes, per build. This means for intance that a trunk-matrix build ends up being over 5GB in downloads for the cloning alone. The shallow clone was introduced at first for the pr-test builds, and has since been added to other tiers. On top of this, we have the occasional docker image downloads from the docker hub, a full download is ~18GB. But yes, I agree with you, don't get me wrong. I'm looking for a way to improve things and I'd like to tap the shared mind |
AFAICT
LOL! The approach I mentioned is not in the spelling of a command name. It is in using a reliable command. When one wants to test a particular commit,
That does not scale, of course. The repository update has to be done per tested commit, not per test. If |
That is not a precise summary. Tests that cannot easily share a directory where the repository can be updated without the risk of conflicts (e.g., Semaphore CI and Jenkins) can do their own cloning, of course. |
The error has emerged, but it's not very helpful unfortunately: |
I am often confused by Jenkins, but I think you are pointing to a PR 623 branch error, not an error for the staged commit of this PR (i.e. not the auto-branch error for 5da6c77). As I suspected earlier, other PRs are probably suffering from the same problem. We can probably add PR 623 to that growing list of the affected unrelated PRs. @kinkie, do you think it is time to remove those unstable test nodes from regular PR and auto-branch tests? Their existence currently hurts more than it helps. They can still test, say, master and v4 branches so that you have use cases to verify their stability as you work on improving it. Testing official branches will not hurt PRs even if those tests fail. |
For PR tests, we currently test only:
These are the current stable (or in Fedora's case latest-until-3-days-ago) versions of the major distros we test. I'm afraid that the PR test failures we see are quite real |
Real or not, they are unrelated to the PRs they block. I see two primary possibilities:
@kinkie, what do you think we are dealing with? Would you be able to reproduce and triage these failures in an environment that can expose their true nature? |
|
@kinkie <https://github.com/kinkie>, what do you think we are dealing
with? Would you be able to reproduce and triage these failures in an
environment that can expose their true nature?
yes, I'll check it out
…--
Francesco
|
I believe that the problem is in the platform: testHttpRequest.cc:52 is: I've debugged it, at that point aRequest is not null in that platform/compiler choice, so there's some issue in how the compiler handles CPPUNIT_ASSERT I think the best way forward is to retire support for fedora-30/clang, also because version 32 was recently released |
Sounds good to me. We can always go back to testing on fedora-30/clang if we figure out how to fix the problem with that environment, of course. The same principle should apply to any other individual broken testing environment. |
|
.. and the problem is also present in Fedora 31. Same solution, I guess.
…On Mon, May 4, 2020 at 6:39 PM Alex Rousskov ***@***.***> wrote:
I think the best way forward is to retire support for fedora-30/clang,
also because version 32 was recently released
Sounds good to me. We can always go back to testing on fedora-30/clang if
we figure out how to fix the problem with that environment, of course.
The same principle should apply to any other individual broken testing
environment.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#596 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABHPVDHKCXEYNHCENUBKY7TRP34UDANCNFSM4MEYMESA>
.
--
Francesco
|
|
Fixed and re-kicked all failed builds. Also adjusted Anubis configuration
…On Mon, May 4, 2020 at 7:23 PM Francesco Chemolli ***@***.***> wrote:
.. and the problem is also present in Fedora 31. Same solution, I guess.
On Mon, May 4, 2020 at 6:39 PM Alex Rousskov ***@***.***>
wrote:
> I think the best way forward is to retire support for fedora-30/clang,
> also because version 32 was recently released
>
> Sounds good to me. We can always go back to testing on fedora-30/clang if
> we figure out how to fix the problem with that environment, of course.
>
> The same principle should apply to any other individual broken testing
> environment.
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <#596 (comment)>,
> or unsubscribe
> <https://github.com/notifications/unsubscribe-auth/ABHPVDHKCXEYNHCENUBKY7TRP34UDANCNFSM4MEYMESA>
> .
>
--
Francesco
--
Francesco
|
Has been broken since 4eaf432. 1. acinclude/os-deps.m4 was missing sys/param.h header 2. acinclude/os-deps.m4 was incorrectly assuming that netinet/ip_nat.h header is always present 3. acinclude/os-deps.m4 was missing netinet/ip_nat.h header where it was used instead of ip_nat.h 4. src/ip/Intercept.cc assumed that AC_CHECK_MEMBERS(`struct X`) outputs `HAVE_X`, but it actually outputs `HAVE_STRUCT_X`. 5. acinclude/os-deps.m4: AC_CHECK_MEMBERS(`X `) converted trailing whitespace into surprising trailing underscores: `HAVE_X__`
|
Do clang throw any warnings with the -fsanitize=undefined option enabled?
|
libtool: link: /usr/bin/ccache clang++ -Werror -Qunused-arguments -D_REENTRANT -I/usr/include/libxml2 -I/usr/include/p11-kit-1 -I/usr/include/p11-kit-1 -fsanitize=undefined -march=native -march=native -g -o VirtualDeleteOperator VirtualDeleteOperator.o stub_libmem.o test_tools.o stub_cbdata.o stub_debug.o stub_MemBuf.o stub_SBuf.o stub_tools.o stub_fatal.o ../src/globals.o ../src/time.o ../src/base/.libs/libbase.a ../lib/.libs/libmiscutil.a ../compat/.libs/libcompatsquid.a -lm -lresolv -lcap -ldl |
Has been broken since 4eaf432. 1. acinclude/os-deps.m4 was missing sys/param.h header 2. acinclude/os-deps.m4 was incorrectly assuming that netinet/ip_nat.h header is always present 3. acinclude/os-deps.m4 was missing netinet/ip_nat.h header where it was used instead of ip_nat.h 4. src/ip/Intercept.cc assumed that AC_CHECK_MEMBERS(`struct X`) outputs `HAVE_X`, but it actually outputs `HAVE_STRUCT_X`. 5. acinclude/os-deps.m4: AC_CHECK_MEMBERS(`X `) converted trailing whitespace into surprising trailing underscores: `HAVE_X__`
Has been broken since 4eaf432. 1. acinclude/os-deps.m4 was missing sys/param.h header 2. acinclude/os-deps.m4 was incorrectly assuming that netinet/ip_nat.h header is always present 3. acinclude/os-deps.m4 was missing netinet/ip_nat.h header where it was used instead of ip_nat.h 4. src/ip/Intercept.cc assumed that AC_CHECK_MEMBERS(`struct X`) outputs `HAVE_X`, but it actually outputs `HAVE_STRUCT_X`. 5. acinclude/os-deps.m4: AC_CHECK_MEMBERS(`X `) converted trailing whitespace into surprising trailing underscores: `HAVE_X__`
Has been broken since 4eaf432. 1. acinclude/os-deps.m4 was missing sys/param.h header 2. acinclude/os-deps.m4 was incorrectly assuming that netinet/ip_nat.h header is always present 3. acinclude/os-deps.m4 was missing netinet/ip_nat.h header where it was used instead of ip_nat.h 4. src/ip/Intercept.cc assumed that AC_CHECK_MEMBERS(`struct X`) outputs `HAVE_X`, but it actually outputs `HAVE_STRUCT_X`. 5. acinclude/os-deps.m4: AC_CHECK_MEMBERS(`X `) converted trailing whitespace into surprising trailing underscores: `HAVE_X__`
Has been broken since 4eaf432. 1. acinclude/os-deps.m4 was missing sys/param.h header 2. acinclude/os-deps.m4 was incorrectly assuming that netinet/ip_nat.h header is always present 3. acinclude/os-deps.m4 was missing netinet/ip_nat.h header where it was used instead of ip_nat.h 4. src/ip/Intercept.cc assumed that AC_CHECK_MEMBERS(`struct X`) outputs `HAVE_X`, but it actually outputs `HAVE_STRUCT_X`. 5. acinclude/os-deps.m4: AC_CHECK_MEMBERS(`X `) converted trailing whitespace into surprising trailing underscores: `HAVE_X__`
Has been broken since 4eaf432. 1. acinclude/os-deps.m4 was missing sys/param.h header 2. acinclude/os-deps.m4 was incorrectly assuming that netinet/ip_nat.h header is always present 3. acinclude/os-deps.m4 was missing netinet/ip_nat.h header where it was used instead of ip_nat.h 4. src/ip/Intercept.cc assumed that AC_CHECK_MEMBERS(`struct X`) outputs `HAVE_X`, but it actually outputs `HAVE_STRUCT_X`. 5. acinclude/os-deps.m4: AC_CHECK_MEMBERS(`X `) converted trailing whitespace into surprising trailing underscores: `HAVE_X__`
Has been broken since 4eaf432.
acinclude/os-deps.m4 was missing sys/param.h header
acinclude/os-deps.m4 was incorrectly assuming that netinet/ip_nat.h
header is always present
acinclude/os-deps.m4 was missing netinet/ip_nat.h header where it was
used instead of ip_nat.h
src/ip/Intercept.cc assumed that AC_CHECK_MEMBERS(
struct X)outputs
HAVE_X, but it actually outputsHAVE_STRUCT_X.acinclude/os-deps.m4: AC_CHECK_MEMBERS(
X) converted trailingwhitespace into surprising trailing underscores:
HAVE_X__