Skip to content
This repository was archived by the owner on Oct 12, 2022. It is now read-only.
/ druntime Public archive

Comments

Fix issue 16211 - Cyclic dependencies broken again#1602

Merged
MartinNowak merged 11 commits intodlang:masterfrom
schveiguy:fixcycles
Aug 21, 2016
Merged

Fix issue 16211 - Cyclic dependencies broken again#1602
MartinNowak merged 11 commits intodlang:masterfrom
schveiguy:fixcycles

Conversation

@schveiguy
Copy link
Member

@schveiguy schveiguy commented Jun 28, 2016

Note: you may want to look just at the first commit. I left the old algorithm there to prevent the diff from comparing two unrelated functions. The second commit removes the old algorithm from the source. No longer the case, there are a lot of commits, and the code has changed significantly since the first commit.

The new algorithm finds cycles that have an intermediate module in the cycle twice. For example:
a -> b -> c -> b -> a, where only a and c have module ctors, but b does not. the old algorithm would short circuit the evaluation at b the second time, thinking it was already visited. This unit test has been added.

Essentially, the real graph of modules is between only the modules with static ctors (relevant modules), and the "edges" are really which other relevant modules can be reached. The fact that you have to go through non-relevant modules isn't important, and the non-relevant modules can be used more than once. For this reason, the algorithm needs to be very different than a straight cycle detection algorithm.

The new algorithm is a copy of the original algorithm I wrote in 2010, but enhanced to deal with the current module system.

I also implemented a crude dijkstra search to try and print the shortest cycle possible given the two offending modules. Works rather well, actually.

Unfortunately, there is still an ugly wart of a linear search whenever a module index has to be looked up, because the module info is now all immutable (previous version stored the index inside the module info). Also note that a module can import another module multiple times (see https://issues.dlang.org/show_bug.cgi?id=16208), so I try to minimize the looking up, but I can't avoid it there without allocating an edge map.

Finally, I added a bit better mechanism for figuring out which unit test fails when doing module sorting, because it's all done within a wrapper (and the stack trace doesn't always show line numbers).

pinging @MartinNowak since you are most familiar with this code (and wrote the old version).

This requires dlang/phobos#4493

Note: even though this is a regression fix, this will likely cause code breakage outside phobos, which is why I did not target stable.

@dlang-bot
Copy link
Contributor

Fix Bugzilla Description
16211 [REG 2.058] Cyclic dependencies broken again

@JackStouffer
Copy link
Contributor

Why was this closed? The cycle detection is still broken.

@schveiguy schveiguy reopened this Jul 6, 2016
@schveiguy schveiguy removed the Blocked label Jul 6, 2016
@schveiguy
Copy link
Member Author

@JackStouffer correct. This should now pass

@dnadlinger
Copy link
Contributor

Apparently, GitHub auto-closed this when the auto-tester merged the Phobos PR. Note all the "auto"s – I'm totally innocent. ;P

@schveiguy
Copy link
Member Author

Yay a windows specific cycle! I'll have to fix that first

src/rt/minfo.d Outdated
{
// save to the error message. Note that if we are throwing an
// exception, we don't care about being careful with using
// stack memory. Just use the GC/runtime.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you be sure that everything is already brought up at this point when using druntime as a shared library itself?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect that @MartinNowak already solved that issue when he changed the code to using module groups. I think cycle detection is run on each shared objects' group, which can't have upstream dependencies.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, I get what you are saying, the GC may not have been set up yet? In the old version, when it was decided to throw an Error with the cycle, the code used GC concatenation, so I assumed this was OK to use. Isn't the GC inside druntime, so I would think it is available by now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this will work. Modules are only initialized/sorted after rt_init, i.e. shared libraries aren't constructed in dso_registry until the runtime is initialized.

@dnadlinger
Copy link
Contributor

dnadlinger commented Jul 6, 2016

Seems like Win32 is still broken.

My go-to codebase to get an idea about the impact of this on real-world code would usually be Weka's, but they have disabled cycle detection anyway since the issues with "uncontrollable" template emission were impractical for them to circumvent – they opted for disabling the checks entirely over basically making sure not to use static constructors with templates ever (this depends of course also on the dependencies between your modules – if you strictly avoid any cycles, working around any template issues would be manageable too).

In other words, this definitely fixes a real bug (and should go in), but I'm curious how many more projects will discover issues with cycles now.

@schveiguy
Copy link
Member Author

but I'm curious how many more projects will discover issues with cycles now.

I expect there will be some. But it may be fixing bugs in their code! However, based on how blunt a tool cycle detection is, it's likely not helping at all (most cycle detection problems are frivolous). This is going to be controversial, but it's still a bug fix. Don't know how to do it any better, open to ideas.

@schveiguy
Copy link
Member Author

GitHub auto-closed this

I suspected so, because the comment was weird. I wonder how that happened?

@schveiguy
Copy link
Member Author

I'm concerned here -- this cycle doesn't seem to be windows specific. Need to investigate the reason why it doesn't fail on OSX or others.

@schveiguy
Copy link
Member Author

schveiguy commented Jul 6, 2016

There appears to be a bug in dmd or druntime. On OSX, std.internal.math.biguintcore only lists core.cpuid as a depencency, but it clearly imports many other modules.

I can fix the cycle pretty easily I think. But we need to figure out how to reduce this other bug...

@schveiguy
Copy link
Member Author

Needs dlang/phobos#4571

@schveiguy schveiguy removed the Blocked label Jul 6, 2016
@schveiguy
Copy link
Member Author

Nuts. New shiny object to chase. I'm going to get a proper Windows test build going before going any further

@dnadlinger
Copy link
Contributor

I suspected so, because the comment was weird. I wonder how that happened?

Maybe fix (dlang/druntime#1602) in your PR description was enough? No idea.

@schveiguy
Copy link
Member Author

OK, I think dlang/phobos#4580 will make this pass.

@MartinNowak
Copy link
Member

MartinNowak commented Jul 9, 2016

I'm a bit concerned about this going in the wrong direction. The cycle detection is already overly conservative and annoys people, b/c it forces them through weird hoops, even though their code just works fine. Just look through bugzilla how many issues we have related to this topic.
I think we need more support from the compiler to detect actual cycles, i.e. figure out symbols used in a ctor/dtor and just get their module dependencies.
In fact this feature is so overcomplicated for the little it does, that we should spend some time to come up with completely different approaches.

@MartinNowak
Copy link
Member

MartinNowak commented Jul 9, 2016

There were two reasons to replace the old cycle detector. The code was very hard to maintain, and the algorithm had a fairly bad runtime (O(N^2) IIRC). We shouldn't go back to that if possible.
Shouldn't it just suffice to tweak the conditions here

druntime/src/rt/minfo.d

Lines 180 to 195 in 90bd014

if (m.flags & mask)
{
if (m.flags & MIstandalone || !m.importedModules.length)
{ // trivial ctor => sort in
ctors[cidx++] = m;
bts(ctordone, bitnum);
}
else
{ // non-trivial ctor => defer
bts(ctorstart, bitnum);
}
}
else // no ctor => mark as visited
{
bts(ctordone, bitnum);
}
, e.g. marking a module w/o ctor but with imports as ctorstart?
That should definitely work, the only addition we need to make this work is accepting cycles between modules without dtors. This could either be done by the backtracking code that wants to print the cycle, or if that is too slow, by adding more state.

@schveiguy
Copy link
Member Author

schveiguy commented Jul 9, 2016

I had a realization last night -- imports in unit tests shouldn't play any role in cycles, because they cannot be called from static ctors. If we remove unit test imports from the cycle detection, then we can work around most of the problems I have fixed in Phobos. We still need to handle imports inside functions, because static ctors can call them.

In terms of how complex the algorithm has to be, it just has to be. Otherwise the cycle detection is annoying and incorrect.

The O(n^2) requirement is expected, as we need to construct a reachability graph between all nodes. My algorithm avoids the O(n^2) space requirements as much as possible (it still takes O(n^2) space, but only for reachability between relevant nodes).

@schveiguy
Copy link
Member Author

marking a module w/o ctor but with imports as ctorstart?

No, this is how it was before I fixed it in 2010. It doesn't find cycles that involve a non-ctor-containing node twice. Such as a -> b -> c -> b -> a, where a and c have ctors. You can't count the irrelevant nodes during cycle detection, they are just a way to get to the next relevant node.

@schveiguy
Copy link
Member Author

I think we need more support from the compiler to detect actual cycles, i.e. figure out symbols used in a ctor/dtor and just get their module dependencies.

this would be awesome.

@MartinNowak
Copy link
Member

I'm fine for now with just printing that there is a cycle and not causing a fatal crash.

I think we need to figure this out anyhow, an option to disable this being fatal was requested a couple of times (and can be supported via the -DRT- options, e.g. -DRT-nocyclecheck).
Adding such an option might be enough for the transition phase (not sure).

If we just print the cycle, how do we decide the order of the static ctors? In other words, what static ctor gets run first in the cycle above, ddmd.traits or ddmd.cond?

That's a tricky question and I haven't yet thought long enough about this to give an answer. My intuition says we should run ddmd.traits first b/c it's at the bottom of the depedency tree (before the cycle ascends back up). Though a much simpler resolution would be to mark the end of the cycle ddmd.cond as resolved and continue where we left of.
OTOH we can't guarantee any working order and know nothing about the importance, so any order is a good as any other, and using the simplest implementation would be a reasonable solution, i.e. print the cycle, ignore the module at the end (ddmd.cond) and go on (replace throw w/ printf).

@schveiguy
Copy link
Member Author

My intuition says we should run ddmd.traits first b/c it's at the bottom of the depedency tree

That's only because the module order happened to put ddmd.cond first in the list of modules :)

If it was the other way around, the cycle would be traits -> cond -> traits. I wonder if it would be worth sorting the module names first to create a deterministic ordering, even in the event of cycles. We are already putting the names into a hash table, instead if we just put it in a sorted list, and then use binary search to take the place of the hashing, we could get similar performance and more determinism.

The fundamental problem here is that the cycle detection means there is a real determinism problem, where there is no right answer. All we can do is run the constructors in whichever order we pick (I think from an algorithm standpoint, it makes sense to print the cycle and prune that branch as normal, so in the cycle above ddmd.traits would run first), and hope that it will not create issues. We for sure need a switch to enable this behavior.

@MartinNowak
Copy link
Member

I wonder if it would be worth sorting the module names first to create a deterministic ordering, even in the event of cycles.

Not to keen on presorting the modules, its add even more complexity, startup time, and code into every binary and we already print the cycle, so people know that sth. doesn't work as expected.
Can we recommend a simple workaround for people w/ harmless cycles?

@MartinNowak MartinNowak added this to the 2.072.0 milestone Sep 25, 2016
@MartinNowak
Copy link
Member

Don't want to push, but we should include such a switch in 2.072.0.

@schveiguy
Copy link
Member Author

Don't want to push, but we should include such a switch in 2.072.0.

I will try to get to it by tomorrow.

@schveiguy
Copy link
Member Author

I will try to get to it by tomorrow.

Obviously didn't get there, I've been crazy busy. I am traveling today, and will not get to this until at least Saturday, and if not, it will be next week. If you start releasing betas, I'll put it into stable branch.

@MartinNowak
Copy link
Member

MartinNowak commented Oct 1, 2016

Let's use -DRT-cyclecheck=printonly, -DRT-cyclecheck=no (still best-effort sorting), and -DRT-cyclecheck=yes (default, but also explicit to override other cyclecheck settings).

@schveiguy
Copy link
Member Author

I had settled on this scheme:

-DRT-onCycle=abort -> print cycle and abort (current behavior)
-DRT-onCycle=print -> print cycle but make best effort to sort ctors
-DRT-onCycle=ignore -> ignore cycles, don't print anything

I can switch easily to your scheme. Almost done with the PR.

@schveiguy
Copy link
Member Author

@MartinNowak having issues building stable branch of dmd:

CC=clang++ dmd -ofdmd -m64 -vtls -J. -L-lstdc++ -version=MARS -wi access.d aggregate.d aliasthis.d apply.d argtypes.d arrayop.d arraytypes.d attrib.d builtin.d canthrow.d clone.d complex.d cond.d constfold.d cppmangle.d ctfeexpr.d dcast.d dclass.d declaration.d delegatize.d denum.d dimport.d dinifile.d dinterpret.d dmacro.d dmangle.d dmodule.d doc.d dscope.d dstruct.d dsymbol.d dtemplate.d dversion.d entity.d errors.d escape.d expression.d func.d globals.d hdrgen.d id.d identifier.d impcnvtab.d imphint.d init.d inline.d intrange.d json.d lexer.d lib.d link.d mars.d mtype.d nogc.d nspace.d opover.d optimize.d parse.d sapply.d sideeffect.d statement.d staticassert.d target.d tokens.d traits.d utf.d visitor.d typinf.d utils.d statementsem.d safe.d objc.d libmach.d scanmach.d irstate.d toelfdebug.d toctype.d glue.d gluelayer.d todt.d tocsym.d toir.d dmsc.d backend/bcomplex.d backend/cc.d backend/cdef.d backend/cgcv.d backend/code.d backend/dt.d backend/el.d backend/global.d backend/obj.d backend/oper.d backend/outbuf.d backend/rtlsym.d backend/ty.d backend/type.d tk/dlist.d root/aav.d root/array.d root/ctfloat.d root/file.d root/filename.d root/man.d root/outbuffer.d root/port.d root/response.d root/rmem.d root/rootobject.d root/speller.d root/stringtable.d newdelete.o glue.a backend.a
Undefined symbols for architecture x86_64:
  "symboldata(unsigned long long, unsigned int)", referenced from:
      el_ptr(Symbol*) in backend.a(el.o)
      el_convstring(elem*) in backend.a(el.o)
      out_readonly_sym(unsigned int, void*, int) in backend.a(out.o)
      Obj::sym_cdata(unsigned int, char*, int) in backend.a(machobj.o)
  "_align(unsigned long long, unsigned long long)", referenced from:
      codgen() in backend.a(cgcod.o)
      stackoffsets(int) in backend.a(cgcod.o)
      outjmptab(block*) in backend.a(cod3.o)
      outswitab(block*) in backend.a(cod3.o)
      type_paramsize(TYPE*) in backend.a(type.o)
      alignOffset(int, unsigned long long) in backend.a(out.o)
      cdfunc(elem*, unsigned int*) in backend.a(cod1.o)
      ...
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
--- errorlevel 1
make[1]: *** [dmd] Error 1
make: *** [all] Error 2

Any ideas? I've tried several versions of dmd, including 2.069 and 2.071

@dnadlinger
Copy link
Contributor

@MartinNowak
Copy link
Member

MartinNowak commented Oct 4, 2016

I can switch easily to your scheme. Almost done with the PR.

Looks great, let's not start with camel cases for command lines and stick to -DRT-oncycle=.

Any ideas? I've tried several versions of dmd, including 2.069 and 2.071

You don't need to build dmd-stable yourself, do you?
curl -fsS https://dlang.org/install.sh | bash -s dmd-beta should get you a recent 2.072.0-b1 dmd binary against which you can test druntime. Once you sourced the activate env file, try to run make -f posix.mak unittest DMD=$DMD.

@schveiguy
Copy link
Member Author

@MartinNowak I created the PR: #1668. I didn't see your message about downloading dmd stable. I'll try that.

@MartinNowak
Copy link
Member

Unfortunately this change did broke more projects than initially tested, at least Higgs, vibe.d/libasync, and an intermediate state of ddmd. I'd expect that it hit a lot more projects not tested by us.
We had the idea of temporarily adding back the replaced implementation as a fallback, for when the new code reports an error, so we can properly deprecate this cycle detection change.
Flags were a nice supplemental addition (already requested a couple of times), but don't really replace a soft-fault deprecate-then-error transition.

@schveiguy
Copy link
Member Author

I may be able to "run" the old algorithm, or at least adapt the new algorithm to run like the old one did, if errors are seen. Alternatively, we can resurrect the entire original code, and just call it if the corrected algorithm fails.

It's an unfortunate perception that this is "breaking" projects -- those projects are already broken in terms of having a cycle. Whether the cycle is harmless or not, the truth is that the compiler doesn't give us enough information to determine whether it's truly broken or not.

@CyberShadow
Copy link
Member

It's an unfortunate perception that this is "breaking" projects -- those projects are already broken in terms of having a cycle.

No, that's a fallacy. You could say "broken" about a lot of things that are written in ways that under some circumstances can end up being buggy, but that's just not a useful definition. Relying on that makes for an unusable language because each version makes most existing code unusable as-is. If a piece of software has been tested, known to work correctly for all foreseeable inputs and circumstances, and has been successfully put in production, it is not broken, it's working as expected. A compiler update that causes correct software to stop working is broken.

I don't want to start a debate, but this is an argument that has been brought up a few times. D needs to be useful foremost, facilitating creation of correct programs is just one side of that.

@schveiguy
Copy link
Member Author

@CyberShadow There is no need to debate, we definitely can go through deprecation state.

D claims to handle the proper initialization of globals via dependency sorting at runtime. The sorting was broken. The answer to running the program should have been (and is now), "I don't know how to properly intialize things". Essentially, we can embrace undefined behavior or we can reject it. Which is more useful to a user? In my experience a broken compiler that spits out UB is far worse than one that doesn't compile (or in this case run) my broken code.

The most unfortunate thing in this whole mess is the fact that cycles are left for runtime with minimal information (the import dependencies) to go on. It's really a problem that should be solved by the compiler/linker.

@CyberShadow
Copy link
Member

Which is more useful to a user? In my experience a broken compiler that spits out UB is far worse than one that doesn't compile (or in this case run) my broken code.

Yes - but only for new programs. Making working software stop working is the problem. This is why we can't reinvent things every year and are likely stuck with the http://wiki.dlang.org/Language_issues list forever.

@schveiguy
Copy link
Member Author

We can agree to disagree. I personally don't want my programs blowing up at some future date due to a forseeable initialization ambiguity. Others might be OK with that I guess.

This is not a reinvention. I fixed this problem in 2010, and it resurfaced within a year. I should have put a test in for it at the time, we are much better about that now.

In any case, we can stop the pointless philosophical discussion and just put in the deprecation mechanism.

@CyberShadow
Copy link
Member

We can agree to disagree.

Just to clarify, this is not just about a debate of personal preferences. It became clear a long time ago that the point where we can break working code willy-nilly in D is long past. As such, this is not negotiable: we all must take code breakage seriously.

A method of notifying the user without actually breaking their code is the correct solution here. For this we usually have deprecation paths, or in this particular case, a message printed on startup is good too.

@schveiguy
Copy link
Member Author

Again, not willy-nilly. The compiler was generating code that creates undefined behavior. I think we can all agree that any fix to any broken code of this nature has the chance to "break" code that depended on the brokenness that's being fixed. It's a question of where we draw the line. This is why it's not a winnable argument either way. Either you have code that crashes the mars lander, or you have fixes that set the project back months. Which is worse depends on what your job is.

The issues surrounding this fix are very woolly, and we have a path forward, let's just implement it.

@ibuclaw
Copy link
Member

ibuclaw commented Dec 17, 2016

The most unfortunate thing in this whole mess is the fact that cycles are left for runtime with minimal information (the import dependencies) to go on. It's really a problem that should be solved by the compiler/linker.

@schveiguy - what information would be useful to you?

@schveiguy
Copy link
Member Author

Well, an actual dependency map for the ctors. Right now you have things that are listed as dependencies simply because they are imported somewhere in the module. But the compiler knows which variables depend on which other modules because it can see what functions are called in the ctor.

@schveiguy
Copy link
Member Author

With that, the cycle detection would be closer to reality. Most cycles are harmless.

@ibuclaw
Copy link
Member

ibuclaw commented Dec 18, 2016

Right now you have things that are listed as dependencies simply because they are imported somewhere in the module. But the compiler knows which variables depend on which other modules because it can see what functions are called in the ctor.

Well, it is just an array of imported modules, which may not be the best indicator of dependencies between them. I can't think of any technical blockers that would prevent adding new information about which imported modules are used in the construction/destruction phases, it could even be an int that just masks which indexes in the importes modules array are used by the ctor.

@MartinNowak
Copy link
Member

All the ideas and thoughts on how to improve this are fairly spread over multiple Bugzilla tickets, forum entries, and github comments. Can we please aggregate all of this in a single Bugzilla ticket (or didn't we already have one).
Even the Trello card isn't a good starting place (https://trello.com/c/V8WAxAlD/61-redesign-of-cycle-detection-issue-16211).

One related idea was that ctors only using private variables can be marked as MIstandalone, that would work for just initializing immutable variables.

MartinNowak added a commit to MartinNowak/druntime that referenced this pull request Dec 18, 2016
- only print a deprecation warning for the fixed cycle detection
  cases (cycles over modules without ctor/dtor)
@MartinNowak
Copy link
Member

See #1717

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Regression PRs that fix regressions

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants