Skip to content

Comments

dmd.mars: Always produce COFF when targeting Windows#12825

Closed
ibuclaw wants to merge 1 commit intodlang:masterfrom
ibuclaw:disable-omf
Closed

dmd.mars: Always produce COFF when targeting Windows#12825
ibuclaw wants to merge 1 commit intodlang:masterfrom
ibuclaw:disable-omf

Conversation

@ibuclaw
Copy link
Member

@ibuclaw ibuclaw commented Jul 6, 2021

Testing whether Azure pipelines will pass. Optlink is causing issues, again...

@dlang-bot
Copy link
Contributor

Thanks for your pull request, @ibuclaw!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + dmd#12825"

@MoonlightSentinel MoonlightSentinel added the Review:Needs Changelog A changelog entry needs to be added to /changelog label Jul 6, 2021
@MoonlightSentinel
Copy link
Contributor

MoonlightSentinel commented Jul 6, 2021

Should probably come with a way to revert to OMF if anybody actually wants/needs to use it (maybe -m32omf?).
Also needs a changelog entry

@thewilsonator
Copy link
Contributor

see also https://issues.dlang.org/show_bug.cgi?id=18964

@Geod24
Copy link
Member

Geod24 commented Jul 7, 2021

lib\druntime.lib: Error: Object module `errno_c_32.obj` is 32 bit OMF, but it should be 64 bit MS-Coff
lib\druntime.lib: Error: Object module `minit.obj` is 32 bit OMF, but it should be 64 bit MS-Coff

@ibuclaw
Copy link
Member Author

ibuclaw commented Jul 7, 2021

lib\druntime.lib: Error: Object module `errno_c_32.obj` is 32 bit OMF, but it should be 64 bit MS-Coff
lib\druntime.lib: Error: Object module `minit.obj` is 32 bit OMF, but it should be 64 bit MS-Coff

Thanks, just trialing things at the moment (I guess I will probably end with removing the OMF pipelines?).

Tah @thewilsonator - looks like what @ManuEvans has previously done would be a good baseline to begin from.

@Geod24
Copy link
Member

Geod24 commented Jul 7, 2021

CC @TurkeyMan

@ibuclaw ibuclaw force-pushed the disable-omf branch 3 times, most recently from 531c3ca to cd0c687 Compare July 7, 2021 09:33
@ibuclaw
Copy link
Member Author

ibuclaw commented Jul 7, 2021

Only had a cursory blast through the codebase, I have no means of testing, so just waiting for the CI results.

@ibuclaw
Copy link
Member Author

ibuclaw commented Jul 7, 2021

This can be split up into two logical PRs (well, three to remove the dependency on target variables). No point doing that unless together they all work.

global.params.libfiles.push(phobosLibname.xarraydup.ptr);

if (target.mscoff)
if (target.omfobj)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if you just negated the condition and left the associated branches as they are to reduce unnecessary diff.

@kinke
Copy link
Contributor

kinke commented Jul 7, 2021

I guess I will probably end with removing the OMF pipelines?

Yes, unless they are still supposed to be tested with -m32omf.

@ibuclaw
Copy link
Member Author

ibuclaw commented Jul 7, 2021

I guess I will probably end with removing the OMF pipelines?

Yes, unless they are still supposed to be tested with -m32omf.

I better add -m32omf as a deprecated switch then. ;-)

@ibuclaw ibuclaw marked this pull request as draft July 7, 2021 20:54
Copy link
Member

@WalterBright WalterBright left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing 18 files and 200 lines just to change a default? DMD is not ready to be packagized. But I digress.

I've hitched my wagon to Microsoft Link before. The troubles are:

  1. Versions. Every customer has a different version of that linker. Some would work, some wouldn't. I couldn't just ship someone with a non-working MS-Link a linker that did work.
  2. Bugs. Microsoft placed no priority whatsoever on fixing bugs. I was faced with being dead in the water for indeterminate times. Obviously, this was untenable.
  3. The compiler didn't work out of the box. I had to tell customers they had to buy some MS dev system to get a linker.
  4. When MS-Link didn't work, or crashed, etc., I always got blamed.

Having a linker I had control over resolved these issues.

Nothing has changed with those points. It's important that DMD work out of the box. I know that we rely on MS-Link for 64 bits, but at least they can get their code working by default out of the box.

We can fix Optlink bugs if we need to. I have fixed many. We cannot do jack about MS-Link bugs, except desperately search for some workaround. No attempt has been made even to figure which line of code triggered the Optlink error. Maybe it is trivial, maybe it isn't. That's not really a reason to just abandon it.

I oppose this PR.

@ibuclaw
Copy link
Member Author

ibuclaw commented Jul 8, 2021

Maybe also add it to a list of bug proofs, but that takes 5 minutes, and does not really happen that often, I believe.

I have counted 3 issues in the just as many weeks related to OMF pipeline failures. One of which caused the contributor to spend almost a month going round in circles trying to isolate and work around it. This is not an acceptable use of anyone's time, irrespective of whether they did this on their company time or not.

@DoctorNoobingstoneIPresume

One phrase that crops up a lot at beerconf is "we need to drain the swamp". OMF is but one of the many things that need either urgent irrigation or draining, because OOB experience on 32bit is terrible for reasons already given here (and many more, including PIC and PIE support), and telling users to use ms32coff to fix their issues is not acceptable.

Linker bugs (hypothetical or not) are being thrown a lot round here as a reason why they are right. These excuses should be struck down by the general discourse. What if there were no bugs in the linker? People will still have a terrible UX on 32bit with OMF for the same non-linker related reasons given here by others and we'll still be telling users to use ms32coff anyway.

So even with bugs fixed, IIUC, the main problem is that OMF is not used by other toolchains and users are surprised. (Or are there other UX problems with OMF ?)

On Linux and OSX, a lot of stress is emphasised on supporting the toolchain and ABI that comes installed on the system. Why then must we take the NIH stance on Windows then?

Partial answers:

I believe that the multiple toolchains (and the two object file formats) in Windows have been a driving force behind COM, and that COM has proven useful not just for fixing compatibility (e.g.: it is useful for fine-grained versioning, including in open-source projects, not just for Windows -- 7zip and, until recently anyway, Mozilla).

Also, are we not glad, as users, that there are two competitive open-source C++ toolchains ? And that Visual C++ is also available and competitive ? The opposite of a monopoly (such as Chrome's engine in the world of browsers).

@WalterBright : If OPTLINK is significantly faster than ld, gold and lld, than fixing bugs and making it scalable could encourage both users and contributors. Some projects suffer from long link times and high RAM usage. Currently, I just wish to ask whether it is indeed significantly faster. If so, is that still attributed to it being written in Assembler ? Or is there something in OMF which is not in COFF which makes it faster ?

@WalterBright
Copy link
Member

PIC and PIE support

AFAIK that has no relevance to Win32.

One of which caused the contributor to spend almost a month going round in circles trying to isolate and work around it.

AFAIK it was just assumed to be an optlink bug, because it didn't present with mscoff. But there are other differences between mscoff and omf code generation which could very well be the problem (i.e. a codegen bug). There could be other sources of the problem, too, like the system import libs for OMF were built wrong. I suspect that because the very small number of lines of code in that project were using unusual Windows API calls.

It just seems peculiar to me that adding a hundred lines of code to dmd suddenly causes Optlink failures, when we add hundreds of lines of code all the time to dmd and there are no such failures. So what's different about it? The different Windows API calls.

Another common source of strange errors when changing the toolchain is inadvertent use of undefined behavior, like uninitialized stack variables (though that shouldn't happen with D). I've certainly encountered many such coding bugs in the past, and one reason why supporting multiple tool chains is a good idea is it exposes latent bugs.

It's too easy to dismiss it as Optlink, as the actual problem was never found. Now, if Optlink does produce the Unexpected Termination display, that is definitely an Optlink bug.

@WalterBright
Copy link
Member

I have counted 3 issues in the just as many weeks

I filed one of them. Were bugzilla issues filed for the other two?

@WalterBright
Copy link
Member

On Linux and OSX, a lot of stress is emphasized on supporting the toolchain and ABI that comes installed on the system. Why then must we take the NIH stance on Windows then?

  1. the Microsoft toolchain does not come with Windows.
  2. on Linux and OSX, it's trivial for users to apt-get the latest toolchain, so dealing with out-of-date parts of the toolchain is not necessary.

@thewilsonator
Copy link
Contributor

the Microsoft toolchain does not come with Windows.

I would like to stress again that we ship lld (LLVM based linker) and minGW import libraries on windows. Lack of toolchain is not a problem.

@ibuclaw
Copy link
Member Author

ibuclaw commented Jul 8, 2021

PIC and PIE support

AFAIK that has no relevance to Win32.

Fraying 32-bit support in general is blocking meaningful progress on all platforms. 32-bit DMD Linux releases have had the compiler segfaulting for a few years now (unless you are running on an EOL system). The fix is easy, but PIC bugs in the DMD backend are blocking it.

One of which caused the contributor to spend almost a month going round in circles trying to isolate and work around it.

AFAIK it was just assumed to be an optlink bug, because it didn't present with mscoff. But there are other differences between mscoff and omf code generation which could very well be the problem (i.e. a codegen bug). There could be other sources of the problem, too, like the system import libs for OMF were built wrong. I suspect that because the very small number of lines of code in that project were using unusual Windows API calls.

Another reason to sunset OMF. With COFF, if a problem occurs in DMD pipelines, you can try compiling the same code with either GDC or LDC to see if the same crash occurs. If it doesn't, you know the fault is DMD. Because neither GCC or LLVM supports OMF we have no way to track down, verify, or debug problems.

It's too easy to dismiss it as Optlink, as the actual problem was never found. Now, if Optlink does produce the Unexpected Termination display, that is definitely an Optlink bug.

People have been complaining for years in bugzilla, on forums, in other D projects repos. It would be harder to dismiss if this was just a one-off, or a rare occurrence. But in reality, the backlog of evidence is too damn high, and first-time contributors continually run into heisenbugs that manifest only on DMD 32-bit.

@DoctorNoobingstoneIPresume
Copy link

DoctorNoobingstoneIPresume commented Jul 8, 2021

Maybe also add it to a list of bug proofs, but that takes 5 minutes, and does not really happen that often, I believe.

I have counted 3 issues in the just as many weeks related to OMF pipeline failures. One of which caused the contributor to spend almost a month going round in circles trying to isolate and work around it. This is not an acceptable use of anyone's time, irrespective of whether they did this on their company time or not.

Can such unacceptable spending of time and effort not be avoided (or at least reduced) by having the DMC/OPTLINK/OMF check temporarily be marked as not required for merging of checked dlang/druntime/phobos PRs (but still performed by our Github pipelines) ?

@ibuclaw
Copy link
Member Author

ibuclaw commented Jul 8, 2021

Can such unacceptable spending of time and effort not be avoided (or at least reduced) by having the DMC/OPTLINK/OMF check temporarily be marked as not required for merging of checked dlang/druntime/phobos PRs (but still performed by our Github pipelines) ?

We release OMF binaries, so unless that stops...

@DoctorNoobingstoneIPresume
Copy link

DoctorNoobingstoneIPresume commented Jul 8, 2021

it's about ownership. optlink is not under our control: it lives in a different organization, and very few people here have merge access to it.

If that's blocking anyone, please let me know and I'll fix it.

For me personally, it is my own laziness and lack of ability which is blocking me from learning to work on OPTLINK. If I had been able to fix the bug (edit: or at least understand it) I am sure I could convince the maintainer to merge that fix. No, what I really want is support for this toolchain, but at the expense of effort made by others. Why do I want it ? I have tried explaining, but got "How many others want support for DMC/OPTLINK/OMF ?". It is difficult for me to answer that.

@DoctorNoobingstoneIPresume
Copy link

DoctorNoobingstoneIPresume commented Jul 8, 2021

Can such unacceptable spending of time and effort not be avoided (or at least reduced) by having the DMC/OPTLINK/OMF check temporarily be marked as not required for merging of checked dlang/druntime/phobos PRs (but still performed by our Github pipelines) ?

We release OMF binaries, so unless that stops...

IIUC, you want support for OMF to be eliminated completely, and you have been using the argument of time and effort needed to investigate fails of OMF checks.

I am proposing, as a compromise, from the point-of-view of a non-contributor and of possibly a weird user, that such failures are simply not investigated. I feel this leaves at least some form of support for OMF.

Regarding changes of default target from OMF to COFF, FWMOIW (for whatever my opinion is worth): I would vote against, but if you feel strongly for that, I can live with that just fine (edit: plus I am not important).

Why have I dared answer to this conversation ? I felt that the following question was being asked: "Is anyone still using OPTLINK or OMF [or interoperating with DMC] [edit: or Embarcadero] [or happy for small size of toolchain and independence] [or appreciative of testing with multiple toolchain versions] ?"

So I have answered.

Have I been desconsiderate of other people's time and effort ? Maybe. I apologize if so. Until today, I had not been aware that OPTLINK had bugs. Even so, dropping it feels too harsh to me, especially when a compromise of ignoring that pipeline (until OPTLINK is fixed, if ever) can IMO be made (sad as it may seem to me) (but, then again, I am too lazy to learn OPTLINK).

@DoctorNoobingstoneIPresume

In the future, we might run into similar situations even with a linker supported by a big Company or by a big Community. Maybe such situations have happened in the past. We might find out that bugs are not quickly fixed even by those larger teams.

On one hand, we have optlink bugs, which happens frequently, we have evidence of that. On the other hand, we have hypothetical "some linker" bug. Did we ever hit such case of a non-optlink bug ? Yes, ldc had issues with older versions of ld.bfd (you can see it mentioned in each release under "Known issues"). What did they do ? Switched to gold by default on Linux. And that's the only instance I can think of. The bug has since been fixed (and I'd wager, was reported before LDC hit it), and using gold has other benefits anyways (it's faster).

gold is faster than ld, but is it faster than lld or vice-versa ?

@rikkimax
Copy link
Contributor

  • and no-one depends critically on it. If people run into problems and request assistance on the forums/discord/whatever the first thing anyone will suggest is use -m32mscoff

I can't remember the last time I or anyone else recommended anyone to make a 32bit build on Discord.

Straight to 64bit.

Ever since dub switched to coff by default for 32bit, I don't remember having to question this at all.

thewilsonator added a commit to thewilsonator/dmd that referenced this pull request Sep 30, 2021
This rebases dlang#12825 but doesn't attempt to rename phobos libraries and doesn't change the build infra where possible(and doesn't have a large diff in `link.d`). See if this fixes the CI issues.
thewilsonator added a commit to thewilsonator/dmd that referenced this pull request Sep 30, 2021
This rebases dlang#12825 but doesn't attempt to rename phobos libraries and doesn't change the build infra where possible(and doesn't have a large diff in `link.d`). See if this fixes the CI issues.
thewilsonator added a commit to thewilsonator/dmd that referenced this pull request Sep 30, 2021
This rebases dlang#12825 but doesn't attempt to rename phobos libraries and doesn't change the build infra where possible(and doesn't have a large diff in `link.d`). See if this fixes the CI issues.
thewilsonator added a commit to thewilsonator/dmd that referenced this pull request Sep 30, 2021
This rebases dlang#12825 but doesn't attempt to rename phobos libraries and doesn't change the build infra where possible(and doesn't have a large diff in `link.d`). See if this fixes the CI issues.
thewilsonator added a commit to thewilsonator/dmd that referenced this pull request Sep 30, 2021
This rebases dlang#12825 but doesn't attempt to rename phobos libraries and doesn't change the build infra where possible(and doesn't have a large diff in `link.d`). See if this fixes the CI issues.
thewilsonator added a commit to thewilsonator/dmd that referenced this pull request Oct 6, 2021
This rebases dlang#12825 but doesn't attempt to rename phobos libraries and doesn't change the build infra where possible(and doesn't have a large diff in `link.d`). See if this fixes the CI issues.
RazvanN7 pushed a commit that referenced this pull request Jan 21, 2022
* Produce MS Coff by default when targetting windows

This rebases #12825 but doesn't attempt to rename phobos libraries and doesn't change the build infra where possible(and doesn't have a large diff in `link.d`). See if this fixes the CI issues.

* remove OMF piplines

* Update glue.d

* Update target.d

* Update target.d

* Update frontend.h
@ibuclaw ibuclaw marked this pull request as ready for review January 23, 2022 08:13
@ibuclaw ibuclaw closed this Jan 23, 2022
@ibuclaw ibuclaw deleted the disable-omf branch January 23, 2022 08:13
@TurkeyMan
Copy link
Contributor

Reading through this discussion invokes an overwhelming and visceral sense of PTSD.
I'll climb back in my hole...

@ibuclaw
Copy link
Member Author

ibuclaw commented Jan 23, 2022

Reading through this discussion invokes an overwhelming and visceral sense of PTSD.
I'll climb back in my hole...

@TurkeyMan at least optlink is now bowing out and coff is the default for 32bit now too. :-)

@thewilsonator
Copy link
Contributor

@TurkeyMan this was closed because I rebased it and that got merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.