fix Issue 21114 - core.exception.AssertError@std/socket.d(1004): Asse…#7579
WalterBright wants to merge 1 commit into dlang:master
Thanks for your pull request, @WalterBright!
Bugzilla references
Testing this PR locally
If you don't have a local development environment setup, you can use Digger to test this PR: dub run digger -- build "master + phobos#7579"
This is already in a |
What does this PR fix? The error is informational and does not cause the test suite to fail. |
CyberShadow
left a comment
Not sure what this is supposed to fix, but if the problem is harmless but "scary" messages in the CI logs, I think a better solution would be to disable softUnittest blocks when running in CI, but leave them on when users run tests individually.
Because when I get a thousand line log file that displays in a 20 line window with the last line that says "failed somewhere in the last thousand lines", and I scroll back in that miserably tiny window, I run into these seg fault traces with "don't mind me, I'm not important" and I wonder if they are the actual failure or not. I'm really tired of useless, misleading log files. Assert failures are for PROGRAMMING BUGS and NOT ENVIRONMENTAL ISSUES. You can change it another way if you prefer, but the only reason a benign stack trace should ever appear in a log file is if the test is testing the stack trace. A stack trace is not informational at all for environmental errors, and has no place in the log file.
The assert is correctly testing the module's code. It's just that, unfortunately, there is no easy way to do this test without relying on the environment (in this case, access to the public DNS being set up and working). In principle, it is no different than
I don't think there is a general solution to this problem. There will always be a class of tests where we can't reliably programmatically know if the test is failing because there is a problem with the environment that the test is running on, or if someone introduced a regression. Instead of getting angry and typing in all caps, why not put that energy into adjusting this change as I suggested? An alternative would be to hide the stack trace (i.e. write |
I tried to link to that hated log file, but now that environmental "informational" error didn't happen. Please, all random heisenbugs need to be removed from the test suite. There are so many tests being run now that have heisenbugs that it's becoming a miracle to get a PR to pass them all. Rerun the test suite and it just fails again with another heisenbug in another test. The current heisenbug is here: https://app.circleci.com/pipelines/github/dlang/dmd/9002/workflows/994d8911-0c38-40e5-a3e2-3615caccd480/jobs/36578 Patiently scrolling back in that miserable window eventually leads one to this seg fault: which makes me want to scream because the log file gives me NO FRACKIN IDEA which file it was trying to compile when it seg faulted because it runs the tests in multiple threads and randomly interleaves the results.
Well, that doesn't have anything to do with std.socket, but it seems that disabling concurrency is the obvious solution there. It would mean tests would take several times longer to run, but the first real failure will stop the test suite, so it should always be at the bottom of the log. |
A test where the results are ignored is COMPLETELY USELESS and should be deleted, as this PR does. I'm angry because I have about 5 PRs none of which pass the test suite because of random heisenbugs that appear and disappear. |
No, it's definitely useful for people working on std.socket directly. If someone changes the name resolution code, they would have no way of knowing that they broke it unless they tested it themselves. Currently, the failure will be immediately obvious to anyone running the module unittests directly. It would be reasonable to disable the tests in CI however, for the reasons you mention, which is exactly what I suggested above. |
It's all part of the same problem we have with tolerating heisenbugs and useless log files in the test suite.
Even better would be to fix the test runner to not interleave the results, or at least when a process fails it adds to the stack trace what file it was trying to work on. |
This sounds doable, but only if the test runner (the program starting multiple threads/processes) is under our control (i.e. a D program and not e.g. GNU Make).
@wilzbach @Geod24 Is there some environment variable or other mechanism that we can detect that the unittests are executing under CI, no matter what CI it is (CircleCI, Brad's auto-tester, etc.)? |
Or they could do a better job of writing the test so it only asserts if no environmental errors were present. I.e. check the error codes returned by the network functions that are called. |
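One way to read the suggestion above is sketched below: the test asserts only once the lookup has actually produced a result, and reports a skip otherwise. This is a minimal sketch, not the std.socket code. `lookupHostName`, `checkReverseLookup` and the table argument are hypothetical stand-ins for a real resolver call.

```d
import std.stdio : writeln;

// Hypothetical stand-in for a resolver call (not a std.socket API):
// returns null when the lookup could not be completed.
string lookupHostName(string addr, string[string] table)
{
    if (auto p = addr in table)
        return *p;
    return null;
}

// Returns true if the check ran, false if it was skipped because the
// lookup itself failed (treated here as an environmental problem).
bool checkReverseLookup(string addr, string expected, string[string] table)
{
    auto name = lookupHostName(addr, table);
    if (name is null)
    {
        writeln("skipping: could not resolve ", addr);
        return false;
    }
    // Only a successful lookup with the wrong answer counts as a bug.
    assert(name == expected, "resolver returned an unexpected host name");
    return true;
}
```

The sketch assumes a failed lookup can be told apart from a wrong answer, which is the point under dispute in this thread.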
I don't think this is possible in general, and actually this particular failure looks like one instance of that. The assertion failure is probably due to the return value of |
No, they all use different set-ups and environment variables. We would need to set an environment variable for all CIs in question. Though it could be set in the Phobos Makefile which would allow users to locally redefine the variable. |
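A minimal sketch of the environment-variable idea, assuming every CI setup (or the Makefile) exports one agreed-upon variable. The name `PHOBOS_TEST_IN_CI` and the helper `runningUnderCI` are hypothetical; nothing in this thread settles on a name.

```d
import std.process : environment;

// Hypothetical helper: reports whether the agreed-upon variable is set
// to "1". The variable name is an assumption made for illustration.
bool runningUnderCI(string varName = "PHOBOS_TEST_IN_CI")
{
    // environment.get returns the default when the variable is unset.
    return environment.get(varName, "") == "1";
}
```

A test that depends on the network could consult such a helper and skip itself under CI, while still running for someone who executes the module's unittests locally.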
OK, then I suggest changing the message and making the stack trace opt-in. I will submit a PR in a few minutes. |
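What an opt-in stack trace could look like is sketched below. This is not the change submitted in the follow-up PR: `softTest` and its `verbose` flag are hypothetical names, and the message wording is invented for illustration.

```d
import std.stdio : writefln;

// Hypothetical wrapper: runs a test and reports a failure instead of
// propagating it. Returns true if the test passed.
bool softTest(void delegate() test, bool verbose = false, int line = __LINE__)
{
    try
    {
        test();
        return true;
    }
    catch (Throwable e) // also catches AssertError, as a soft test must
    {
        if (verbose)
            // Formatting the Throwable itself includes the stack trace.
            writefln("Ignoring test failure at line %d: %s", line, e);
        else
            // Default: one short line, no stack trace in the log.
            writefln("Ignoring test failure at line %d: %s", line, e.msg);
        return false;
    }
}
```

With the default arguments the log gets a single line per ignored failure; passing `verbose` as true restores the full trace for someone debugging the module directly.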
Network errors can't be detected separately from programming bugs? That should not be the case.
toHostNameString is implemented in socket.d, so it's probably ignoring the error code, which would be a buggy design of socket.d. |
I would need access to a machine on which the failure is reproducible, but what's probably happening is that the configured DNS server for that machine, or its upstream server if it is resolving recursively, is unable to resolve PTR requests, and is returning an empty response. The implementation is definitely checking the error code (see |
I looked at toHostString and it is definitely returning null on error, although it fails to document this. It isn't clear if it may return null for other reasons. It also throws for other undocumented reasons. What its argument But it's at least |
The documentation is in the public functions which call |
Partially true. toAddrString omits mention of the null return. In any case, I'll amend the PR to check for a null return, and not assert if it is null. |
Eh, I take that back. toAddrString never returns null. |
Closing in favor of #7580. Thank you, @CyberShadow
…rtion failure
Using asserts to detect environmental errors is just wrong.