fix Issue 21114 - core.exception.AssertError@std/socket.d(1004): Asse… by WalterBright · Pull Request #7579 · dlang/phobos

WalterBright · 2020-08-05T00:20:13Z

…rtion failure

Using asserts to detect environmental errors is just wrong.

dlang-bot · 2020-08-05T00:20:16Z

Thanks for your pull request, @WalterBright!

Bugzilla references

Auto-close	Bugzilla	Severity	Description
✓	21114	normal	core.exception.AssertError@std/socket.d(1004): Assertion failure

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + phobos#7579"

…rtion failure

Geod24 · 2020-08-05T00:35:24Z

Using asserts to detect environmental errors is just wrong.

unittest that do IO are just wrong

CyberShadow · 2020-08-05T06:24:06Z

This is already in a softUnittest block.

CyberShadow · 2020-08-05T06:25:09Z

What does this PR fix? The error is informational and does not cause the test suite to fail.

CyberShadow

Not sure what this is supposed to fix, but if the problem is harmless but "scary" messages in the CI logs, I think a better solution would be to disable softUnittest blocks when running in CI, but leave them on when users run tests individually.

WalterBright · 2020-08-05T08:12:23Z

What does this PR fix? The error is informational and does not cause the test suite to fail.

Because when I get a thousand line log file that displays in a 20 line window with the last line that says "failed somewhere in the last thousand lines", and I scroll back in that miserably tiny window, I run into these seg fault traces with "don't mind me, I'm not important" and I wonder if they are the actual failure or not. I'm really tired of useless, misleading log files.

Assert failures for PROGRAMMING BUGS and NOT ENVIRONMENTAL ISSUES.

You can change it another way if you prefer, but the only reason a benign stack trace should ever appear in a log file is if the test is testing the stack trace. A stack trace is not informational at all for environmental errors, and has no place in the log file.

CyberShadow · 2020-08-05T08:18:48Z

Assert failures for PROGRAMMING BUGS and NOT ENVIRONMENTAL ISSUES.

The assert is correctly testing the module's code.. it's just that, unfortunately, there is no easy way to do this test without relying on the environment (in this case, access to the public DNS being set up and working). In principle, it is no different than std.file using real files on disk to test its code, except that generally we can expect the host OS's filesystem to work as expected.

You can change it another way if you prefer, but the only reason a benign stack trace should ever appear in a log file is if the test is testing the stack trace. A stack trace is not informational at all for environmental errors, and has no place in the log file.

I don't think there is a general solution to this problem. There will always be a class of tests where we can't reliably programmatically know if the test is failing because there is a problem with the environment that the test is running on, or if someone introduced a regression.

Instead of getting angry and typing in all caps, why not put that energy into adjusting this change as I suggested?

An alternative would be to hide the stack trace (i.e. write .message instead of .toString()), if that's the issue that you find is in need of improvement.

WalterBright · 2020-08-05T08:21:09Z

I tried to link to that hated log file, but now that environmental "informational" error didn't happen. Please, all random heisenbugs need to be removed from the test suite. There are so many tests being run now that have heisenbugs its becoming a miracle to get a PR to pass them all. Rerunning the test suite and it just fails again with another heisenbug in another test.

The current heisenbug is here: https://app.circleci.com/pipelines/github/dlang/dmd/9002/workflows/994d8911-0c38-40e5-a3e2-3615caccd480/jobs/36578

Patiently scrolling back in that miserable window eventually leads one to this seg fault:

 ... runnable/test18534.d           -O  -fPIC ()
test_results/runner(_D4core7runtime18runModuleUnitTestsUZ19unittestSegvHandlerUNbiPSQCk3sys5posix6signal9siginfo_tPvZv+0x55)[0x55918596f371]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x128a0)[0x7f73875638a0]
test_results/runner(_ZN12TypeFunction9isnothrowEb+0x1a)[0x559185834f2a]
test_results/runner(_ZN12TypeFunction10syntaxCopyEv+0xa2)[0x559185832f02]
test_results/runner(_ZN22DsymbolSemanticVisitor23funcDeclarationSemanticEP15FuncDeclaration+0x3dd)[0x5591857d4601]
test_results/runner(_ZN22DsymbolSemanticVisitor5visitEP15FuncDeclaration+0x1d)[0x5591857d7021]
test_results/runner(_ZN15FuncDeclaration6acceptEP7Visitor+0x22)[0x55918580f64a]
test_results/runner(_Z15dsymbolSemanticP7DsymbolP5Scope+0x42)[0x5591857cbe96]
test_results/runner(_ZN22DsymbolSemanticVisitor14attribSemanticEP17AttribDeclaration+0xab)[0x5591857cfce7]
test_results/runner(_ZN22DsymbolSemanticVisitor5visitEP17AttribDeclaration+0x1d)[0x5591857cfd4d]
test_results/runner(_ZN16ParseTimeVisitorI10ASTCodegenE5visitEP15LinkDeclaration+0x1f)[0x5591857f5ca7]
test_results/runner(_ZN15LinkDeclaration6acceptEP7Visitor+0x22)[0x5591857df946]
test_results/runner(_Z15dsymbolSemanticP7DsymbolP5Scope+0x42)[0x5591857cbe96]
test_results/runner(_ZN22DsymbolSemanticVisitor14attribSemanticEP17AttribDeclaration+0xab)[0x5591857cfce7]
test_results/runner(_ZN22DsymbolSemanticVisitor5visitEP17AttribDeclaration+0x1d)[0x5591857cfd4d]
test_results/runner(_ZN16ParseTimeVisitorI10ASTCodegenE5visitEP15ProtDeclaration+0x1f)[0x5591857f5d5b]
test_results/runner(_ZN15ProtDeclaration6acceptEP7Visitor+0x22)[0x5591857e0176]
test_results/runner(_Z15dsymbolSemanticP7DsymbolP5Scope+0x42)[0x5591857cbe96]
test_results/runner(_D3dmd10dsymbolsem22DsymbolSemanticVisitor5visitMRCQBx7dmodule6ModuleZ__T9__lambda2TCQDf7dsymbol7DsymbolZQBgMFQBaZv+0x21)[0x5591857d1731]
test_results/runner(_D3dmd7dsymbol14foreachDsymbolFPSQBf4root5array__T5ArrayTCQCeQCd7DsymbolZQxMDFQvZvZv+0x78)[0x559185783b04]
test_results/runner(_ZN22DsymbolSemanticVisitor5visitEP6Module+0x6c)[0x5591857d16c0]
test_results/runner(_ZN6Module6acceptEP7Visitor+0x22)[0x559185777ef6]
test_results/runner(_Z15dsymbolSemanticP7DsymbolP5Scope+0x42)[0x5591857cbe96]
test_results/runner(_ZN22DsymbolSemanticVisitor5visitEP6Import+0x42b)[0x5591857cf3af]
test_results/runner(_ZN6Import6acceptEP7Visitor+0x1f)[0x5591858494cb]
test_results/runner(_Z15dsymbolSemanticP7DsymbolP5Scope+0x42)[0x5591857cbe96]
test_results/runner(_D3dmd10dsymbolsem22DsymbolSemanticVisitor5visitMRCQBx7dmodule6ModuleZ__T9__lambda2TCQDf7dsymbol7DsymbolZQBgMFQBaZv+0x21)[0x5591857d1731]
test_results/runner(_D3dmd7dsymbol14foreachDsymbolFPSQBf4root5array__T5ArrayTCQCeQCd7DsymbolZQxMDFQvZvZv+0x78)[0x559185783b04]
test_results/runner(_ZN22DsymbolSemanticVisitor5visitEP6Module+0x6c)[0x5591857d16c0]
test_results/runner(_ZN6Module6acceptEP7Visitor+0x22)[0x559185777ef6]
test_results/runner(_Z15dsymbolSemanticP7DsymbolP5Scope+0x42)[0x5591857cbe96]
test_results/runner(_D3dmd8frontend12fullSemanticFCQBd7dmodule6ModuleZv+0x2f)[0x559185731677]
test_results/runner(_D7support8compilesFAyaQdZxSQBa17CompilationResult+0xad)[0x5591856b17c9]
test_results/runner(_D10interfaces27check_implementations_2086117__unittest_L32_C5FZv+0x43)[0x5591856b0cdf]
test_results/runner(_D6runner14unitTestRunnerFZS4core7runtime14UnitTestResult+0x16a)[0x5591856df312]
test_results/runner(runModuleUnitTests+0xaf)[0x55918596f0df]
test_results/runner(_D2rt6dmain212_d_run_main2UAAamPUQgZiZ6runAllMFZv+0x26)[0x5591859452ce]
test_results/runner(_D2rt6dmain212_d_run_main2UAAamPUQgZiZ7tryExecMFMDFZvZv+0x31)[0x559185945249]
test_results/runner(_d_run_main2+0x2ca)[0x5591859451ae]
test_results/runner(_d_run_main+0x10b)[0x559185944edb]
test_results/runner(main+0x22)[0x5591856e585e]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f73869c3b97]
test_results/runner(_start+0x2a)[0x5591856b0ada]
 ... runnable/test19891.d            -fPIC ()

which makes me want to scream because the log file gives me NO FRACKIN IDEA which file it was trying to compile when it seg faulted because it runs the tests in multiple threads and randomly interleaves the results.

CyberShadow · 2020-08-05T08:23:33Z

Well, that doesn't have anything to do with std.socket, but it seems that disabling concurrency is the obvious solution there. It would mean tests would take several times longer to run, but the first real failure will stop the test suite, so it should always be at the bottom of the log.

WalterBright · 2020-08-05T08:23:58Z

Instead of getting angry and typing in all caps, why not put that energy into adjusting this change as I suggested?

A test where the results are ignored is COMPLETELY USELESS and should be deleted, as this PR does.

I'm angry because I have about 5 PRs none of which pass the test suite because of random heisenbugs that appear and disappear.

CyberShadow · 2020-08-05T08:26:15Z

A test where the results are ignored is COMPLETELY USELESS and should be deleted, as this PR does.

No, it's definitely useful for people working on std.socket directly.

If someone changes the name resolution code, they would have no way of knowing that they broke it unless they tested it themselves.

Currently, the failure will be immediately obvious to anyone running the module unittests directly.

It would be reasonable to disable the tests in CI however, for the reasons you mention, which is exactly what I suggested above.

WalterBright · 2020-08-05T08:27:04Z

Well, that doesn't have anything to do with std.socket,

It's all part of the same problem we have with tolerating heisenbugs and useless log files in the test suite.

but it seems that disabling concurrency is the obvious solution there. It would mean tests would take several times longer to run, but the first real failure will stop the test suite, so it should always be at the bottom of the log.

Even better would be to fix the test runner to not interleave the results, or at least when a process fails it adds to the stack trace what file it was trying to work on.

CyberShadow · 2020-08-05T08:30:14Z

Even better would be to fix the test runner to not interleave the results, or at least when a process fails it adds to the stack trace what file it was trying to work on.

This sounds doable, but only if the test runner (the program starting multiple threads/processes) is under our control (i.e. a D program and not e.g. GNU Make).

I think a better solution would be to disable softUnittest blocks when running in CI, but leave them on when users run tests individually.

@wilzbach @Geod24 Is there some environment variable or other mechanism that we can detect that the unittests are executing under CI, no matter what CI it is (CircleCI, Brad's auto-tester, etc.)?

WalterBright · 2020-08-05T08:32:25Z

If someone changes the name resolution code, they would have no way of knowing that they broke it unless they tested it themselves.

Or they could do a better job of writing the test so it only asserts if no environmental errors were present. I.e. check the error codes returned by the network functions that are called.

CyberShadow · 2020-08-05T08:42:47Z

Or they could do a better job of writing the test so it only asserts if no environmental errors were present. I.e. check the error codes returned by the network functions that are called.

I don't think this is possible in general, and actually this particular failure looks like one instance of that. The assertion failure is probably due to the return value of toHostNameString being null, which isn't associated with any particular error code.

wilzbach · 2020-08-05T08:53:25Z

@wilzbach @Geod24 Is there some environment variable or other mechanism that we can detect that the unittests are executing under CI, no matter what CI it is (CircleCI, Brad's auto-tester, etc.)?

No, they all use different set-ups and environment variables. We would need to set an environment variable for all CIs in question. Though it could be set in the Phobos Makefile which would allow users to locally redefine the variable.
Something like D_PHOBOS_SOFTFAIL_SHOW_STACK ?= 0.

CyberShadow · 2020-08-05T08:54:51Z

OK, then I suggest changing the message and making the stack trace opt-in. I will submit a PR in a few minutes.

WalterBright · 2020-08-05T08:55:50Z

I don't think this is possible in general,

Network errors can't be detected separately from programming bugs? That should not be the case.

and actually this particular failure looks like one instance of that. The assertion failure is probably due to the return value of toHostNameString being null, which isn't associated with any particular error code.

toHostNameString is implemented in socket.d, so it's probably ignoring the error code, which would be a buggy design of socket.d.

CyberShadow · 2020-08-05T08:58:27Z

toHostNameString is implemented in socket.d, so it's probably ignoring the error code, which would be a buggy design of socket.d.

I would need access to a machine on which the failure is reproducible, but what's probably happening is that the configured DNS server for that machine, or its upstream server if it is resolving recursively, is unable to resolve PTR requests, and is returning an empty response. The implementation is definitely checking the error code (see Address.toHostString).

WalterBright · 2020-08-05T09:04:42Z

I looked at toHostString and it is definitely returning null on error, although it fails to document this. It isn't clear if it may return null for other reasons. It also throws for other undocumented reasons. What its argument numeric is for is undocumented.

But it's at least private so an attempt can be made to make it robust.

CyberShadow · 2020-08-05T09:05:59Z

I looked at toHostString and it is definitely returning null on error, although it fails to document this. It isn't clear if it may return null for other reasons. It also throws for other undocumented reasons. What its argument numeric is for is undocumented.

The documentation is in the public functions which call toHostString, which is implementation detail.

WalterBright · 2020-08-05T09:10:56Z

The documentation is in the public functions which call toHostString

Partially true. toAddrString omits mention of the null return.

In any case, I'll amend the PR to check for a null return, and not assert if it is null.

WalterBright · 2020-08-05T09:13:22Z

Eh, I take that back. toAddrString never returns null.

CyberShadow · 2020-08-05T09:14:21Z

I will submit a PR in a few minutes.

#7580

In any case, I'll amend the PR to check for a null return, and not assert if it is null.

I think that will weaken the test, and #7580 is a better (more general) improvement.

CyberShadow · 2020-08-05T09:15:03Z

Partially true. toAddrString omits mention of the null return.

toAddrString never returns null.

WalterBright · 2020-08-05T09:16:47Z

Closing in favor of #7580

Thank you, @CyberShadow

WalterBright requested a review from CyberShadow as a code owner August 5, 2020 00:20

dlang-bot added the Severity:Bug Fix label Aug 5, 2020

fix Issue 21114 - core.exception.AssertError@std/socket.d(1004): Asse…

297329e

…rtion failure

WalterBright force-pushed the fix21114 branch from 0c05ff7 to 297329e Compare August 5, 2020 00:30

WalterBright mentioned this pull request Aug 5, 2020

fix regression in passing test suite dlang/dmd#11508

Merged

CyberShadow requested changes Aug 5, 2020

View reviewed changes

CyberShadow mentioned this pull request Aug 5, 2020

Improve behavior of std.socket "soft" unittests #7580

Merged

WalterBright closed this Aug 5, 2020

WalterBright deleted the fix21114 branch August 5, 2020 09:17

Uh oh!

Comments

Conversation

WalterBright commented Aug 5, 2020

Uh oh!

dlang-bot commented Aug 5, 2020

Bugzilla references

Testing this PR locally

Uh oh!

Geod24 commented Aug 5, 2020

Uh oh!

CyberShadow commented Aug 5, 2020

Uh oh!

CyberShadow commented Aug 5, 2020

Uh oh!

CyberShadow left a comment

Choose a reason for hiding this comment

Uh oh!

WalterBright commented Aug 5, 2020

Uh oh!

CyberShadow commented Aug 5, 2020

Uh oh!

WalterBright commented Aug 5, 2020

Uh oh!

CyberShadow commented Aug 5, 2020

Uh oh!

WalterBright commented Aug 5, 2020

Uh oh!

CyberShadow commented Aug 5, 2020

Uh oh!

WalterBright commented Aug 5, 2020

Uh oh!

CyberShadow commented Aug 5, 2020

Uh oh!

WalterBright commented Aug 5, 2020

Uh oh!

CyberShadow commented Aug 5, 2020

Uh oh!

wilzbach commented Aug 5, 2020

Uh oh!

CyberShadow commented Aug 5, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

WalterBright commented Aug 5, 2020

Uh oh!

CyberShadow commented Aug 5, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

WalterBright commented Aug 5, 2020

Uh oh!

CyberShadow commented Aug 5, 2020

Uh oh!

WalterBright commented Aug 5, 2020

Uh oh!

WalterBright commented Aug 5, 2020

Uh oh!

CyberShadow commented Aug 5, 2020

Uh oh!

CyberShadow commented Aug 5, 2020

Uh oh!

WalterBright commented Aug 5, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

CyberShadow commented Aug 5, 2020 •

edited

Loading

CyberShadow commented Aug 5, 2020 •

edited

Loading