Skip to content

Comments

fix regression in passing test suite#11508

Merged
Geod24 merged 1 commit intodlang:masterfrom
WalterBright:testMaster
Aug 6, 2020
Merged

fix regression in passing test suite#11508
Geod24 merged 1 commit intodlang:masterfrom
WalterBright:testMaster

Conversation

@WalterBright
Copy link
Member

Random failings in test suite. Checking to see if it's my PRs or master.

@dlang-bot
Copy link
Contributor

Thanks for your pull request, @WalterBright!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + dmd#11508"

@WalterBright
Copy link
Member Author

Looks like bugs in the test suite because master is failing.

@MoonlightSentinel
Copy link
Contributor

MoonlightSentinel commented Aug 4, 2020

The unit target aborted due to a segfault. I guess it's the same issue as seen on some FreeBSD64/Linux_64 runs (segfaulting d_do_test built with current master`), so it's probably a codegen regression.

@WalterBright
Copy link
Member Author

@MoonlightSentinel I don't know if the test runners (like d_do_test) are compiled with the compiler under test, but they shouldn't be. Compiling them with the compiler under test means the test suite that is designed to isolate bugs in the compiler is rendered useless when the test runner doesn't work.

@WalterBright
Copy link
Member Author

Another problem with the test suite is the compiler is built with the internal asserts OFF. This is just terrible, as those asserts are designed to check for internal faults. The test suite needs to be run with asserts ON.

@MoonlightSentinel
Copy link
Contributor

@MoonlightSentinel I don't know if the test runners (like d_do_test) are compiled with the compiler under test, but they shouldn't be. Compiling them with the compiler under test means the test suite that is designed to isolate bugs in the compiler is rendered useless when the test runner doesn't work.

They are usually built with a different host compiler except a few machines wich don't have an appropriate host dmd (too old). You can look for a HOST_DMD lines to see which compiler is used (usually the first few lines of the test dmd for the auto-tester).

Another problem with the test suite is the compiler is built with the internal asserts OFF. This is just terrible, as those asserts are designed to check for internal faults. The test suite needs to be run with asserts ON.

Time to finish #10679 then?

@WalterBright
Copy link
Member Author

@WalterBright
Copy link
Member Author

This was the last codegen pull before problems appeared:

72d4899#diff-14cad335df0c70943a59c3fe5a22595c

I'm going to try unwinding it to see if things improve.

@WalterBright WalterBright force-pushed the testMaster branch 3 times, most recently from de483bf to a134f64 Compare August 6, 2020 01:29
@WalterBright
Copy link
Member Author

Now it's failing in semaphore? What's going on here?

@Geod24
Copy link
Member

Geod24 commented Aug 6, 2020

Ignore the old semaphore failure. Semaphore was disabled for a while (some token expired and no one noticed).
I eventually noticed, but obviously it had broken in the meantime (because it's the only CI testing DMD as a library) so I fixed it and re-enabled semaphore. What you're seeing is tests being run on a commit before that fix was made.

@WalterBright
Copy link
Member Author

Ok, thanks for clearing that up. Do I need to rebase this PR?

@WalterBright WalterBright force-pushed the testMaster branch 4 times, most recently from 31fbcbd to e2408bf Compare August 6, 2020 08:36
@WalterBright WalterBright changed the title see if master passes test suite fix regression in passing test suite Aug 6, 2020
@Geod24
Copy link
Member

Geod24 commented Aug 6, 2020

Ah, did you find the regression ?

@WalterBright
Copy link
Member Author

WalterBright commented Aug 6, 2020

Yes.

@Geod24
Copy link
Member

Geod24 commented Aug 6, 2020

The one I filled is OSX-specific (the assert failure). I believe this one is about the random SEGV that is predominantly seen on FreeBSD (but also on Linux sometimes) ?

@WalterBright
Copy link
Member Author

yes, the random FreeBSD and Linux fault.

@Geod24
Copy link
Member

Geod24 commented Aug 6, 2020

Then it would be great if we could get:

  • A test;
  • An explanation of what went wrong;

@WalterBright
Copy link
Member Author

It would, but I don't have one. I simply changed things between the working and non-working version until I found the trigger, which was changing tyml to TYoffset. I wasn't able to isolate a test case.

It would be helpful if:

  1. asserts in the compiler are turned on
    https://issues.dlang.org/show_bug.cgi?id=20457

  2. the coverage tester worked on the backend code
    https://issues.dlang.org/show_bug.cgi?id=21119

  3. the test runner was compiled with the previous release, not the compiler under test
    https://issues.dlang.org/show_bug.cgi?id=21053

I've filed bug reports for those issues. All will help in isolating bugs and crafting better test cases.

@WalterBright WalterBright added the Review:Blocking Other Work review and pulling should be a priority label Aug 6, 2020
@WalterBright
Copy link
Member Author

This is pretty much preventing anybody else from passing the test suite.

@Geod24
Copy link
Member

Geod24 commented Aug 6, 2020

It would, but I don't have one. I simply changed things between the working and non-working version until I found the trigger, which was changing tyml to TYoffset. I wasn't able to isolate a test case.

And that is not really inspiring. Most of your backend PRs are pulled on faith, because we assume you know what you do. Sure there's reviewing going on, but without a deep understanding of the backend that reviewing will not catch the most serious (logic bug).

I really wish we didn't have to deal with the DMD backend. It's a burden to most contributors, gives a bad first impression to users, and occasionally gives us bugs like this.

@Geod24 Geod24 merged commit 4368106 into dlang:master Aug 6, 2020
@wilzbach
Copy link
Contributor

wilzbach commented Aug 6, 2020

I really wish we didn't have to deal with the DMD backend. It's a burden to most contributors, gives a bad first impression to users, and occasionally gives us bugs like this.

Walter's response recently was:

You don't have to spend your time on DMD's backend if you don't want to.

I fully agree with @Geod24 and I really don't feel compelled at all to put my time into anything towards DMD's backend, e.g. from this thread:

The coverage tester worked on the backend code
https://issues.dlang.org/show_bug.cgi?id=21119


the test runner was compiled with the previous release, not the compiler under test

-> #11519

@Geod24
Copy link
Member

Geod24 commented Aug 6, 2020

You don't have to spend your time on DMD's backend if you don't want to.

That's only true in theory. If you touch the frontend, runtime, or Phobos, you're going to run into backend issues eventually.

@MoonlightSentinel
Copy link
Contributor

  1. the coverage tester worked on the backend code
    https://issues.dlang.org/show_bug.cgi?id=21119

Workaround: #11522

@MoonlightSentinel
Copy link
Contributor

Please use a proper commit title in the future, this PR fixes a regression in the backend, not the test suite.

@WalterBright WalterBright deleted the testMaster branch August 6, 2020 19:40
@WalterBright
Copy link
Member Author

@wilzbach I share your concern in not wasting developer time. I suspect the biggest timewaster of developer time are the numerous problems with heisenbugs in the test suite due to networking problems. Most of them could likely be corrected by detecting networking errors as networking errors (not failures), going to sleep for a minute, and then trying again. The current behavior of halting the entire test to wait for someone to come along and restart it wastes everyone's time who works with github. More and more of the test suite has been relying on networking, meaning it gets harder and harder for any PRs to get a clean run through it.

I've made a collection of these problems:

https://issues.dlang.org/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&keywords=TestSuite%2C%20&keywords_type=allwords&list_id=232610&query_format=advanced

Some have been corrected, but those remain.

@andralex
Copy link
Member

andralex commented Aug 7, 2020

FWIW this automation of detecting transient errors and resuming automatically would greatly help any project using CI. CIs have very many moving pieces so they tend to fail often. Wherever I worked, a good fraction of efficiency was lost due to such issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Review:Blocking Other Work review and pulling should be a priority

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants