Skip to content

Move lib and scan modules into a package#12766

Merged
Geod24 merged 1 commit intodlang:masterfrom
thewilsonator:libpackage
Jul 7, 2021
Merged

Move lib and scan modules into a package#12766
Geod24 merged 1 commit intodlang:masterfrom
thewilsonator:libpackage

Conversation

@thewilsonator
Copy link
Contributor

@thewilsonator thewilsonator commented Jun 27, 2021

The only symbols declared in these modules used outside them are declared in src/lib.d.

This commit changes:

  • lib{elf,mach,mscoff,omf}.d => lib/{elf,mach,mscoff,omf}.d: Those files are only imported by a single module, dmd.lib;
  • lib.d => lib/package.d: The single module importing lib*.d, which provides the single entry point for this package;
    scan*.d => lib/scan*.d: Those files are never imported by modules except for the corresponding lib/{elf,mach,mscoff,omf}.d;

This increases encapsulation by:

  • reducing the visibility of symbols in scan*.d to dmd.lib, so the context one needs looking at the module in isolation is reduced;
  • it reduces the visibility of symbols in {elf,mach,...}.d;
  • It signals to the reader that package.d is the entry point, based on a language feature, not coding convention

I hope I have all the doc links correct.

@dlang-bot
Copy link
Contributor

Thanks for your pull request and interest in making D better, @thewilsonator! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + dmd#12766"

@WalterBright
Copy link
Member

We've extensively discussed this. Not doing packages.

@thewilsonator
Copy link
Contributor Author

Nope

Copy link
Contributor

@wilzbach wilzbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's zero harm in them and it drastically helps newcomers to get around in the code base.

@12345swordy
Copy link
Contributor

We've extensively discussed this. Not doing packages.

Walter, as a newcomer to the dmd codebase, I legitimately don't understand why you are so adamant of being against organizing the files in a way that promotes encapsulation, by using packages here. I have talk about it in discord where the expresstionsem file seemed to be importing modules from everywhere with no regards to being organized.

@WalterBright
Copy link
Member

We've been over that many times, including yet another discussion about it recently in the n.g.

Copy link
Member

@WalterBright WalterBright left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not approved. See many discussions about this, in the n.g. and at DConf.

@Geod24
Copy link
Member

Geod24 commented Jun 27, 2021

We've been over that many times, including yet another discussion about it recently in the n.g.

Well you have been known to change your mind from time to time, I'm sure @thewilsonator had hope this was one of those cases :)

The discussion on the n.g. boiled down to encapsulation. My takeaway from it - correct me if I'm wrong - was that packaging should be done to increase encapsulation, and that spilling modules into package that imports each other is a waste of time.

But what @thewilsonator chose as a starting point is a textbook example of packages.
The file changes are:

  • lib{elf,mach,mscoff,omf}.d => lib/{elf,mach,mscoff,omf}.d: Those files are only imported by a single module, dmd.lib;
  • lib.d => lib/package.d => The single module importing lib*.d, which provides the single entry point for this package;
  • scan*.d => Those files are never imported by modules except for the corresponding lib{elf,mach,mscoff,omf}.d;

In that regard, making lib a package increases encapsulation, because:

  • It reduces the visibility of symbols in scan*.d to dmd.lib, so one looking at the module in isolation needs a reduced context;
  • Likewise, it reduces the visibility of symbols in {elf,mach,...}.d;
  • It signal to the reader that package.d is the entry point, based on a language feature, not coding convention;

I'm still very doubtful of some of the packages structures that were floated around in the newsgroup, but this is an obvious case where packages can be used.

@thewilsonator
Copy link
Contributor Author

I'm sure @thewilsonator had hope this was one of those cases :)

On the contrary #12744 (comment) the number of files in src/dmd is getting even sillier.

The discussion on the n.g. boiled down to encapsulation. My takeaway from it - correct me if I'm wrong - was that packaging should be done to increase encapsulation, and that spilling modules into package that imports each other is a waste of time.

Indeed, there are only two sets of modules that are close to being as encapsulated as they can possibly be and fit for becoming packages at the moment:

  • lib*/scan* which depend on dmd.root, dmd.utils (for path stuff) and dmd.{global, errors} for error reporting
  • the various visitor modules, which depend on the AST only

@PetarKirov
Copy link
Member

@thewilsonator can you incorporate @Geod24's explanation of the changes in the commit message and PR description? Otherwise, obviously LGTM.

@WalterBright
Copy link
Member

Absolutely no code in D (that uses packages) respects it.

That's mostly right. I've attempted to prevent a lot of it, such as repeated attempts to have the backend code import from the front end, and root importing from the dmd level.

dmd level modules each import every other dmd module (more or less). There is no hierarchy. At one time cparse.d imported target.d, which transitively imports everything else in dmd. @ibuclaw and I worked together to successfully remove that dependency. We could go further and eliminate nearly all dependencies from target.d. There's no reason target.d should be importing statement.d.

That sort of work will be worth while. Not having a fake package hierarchy that the code implementation does not remotely respect.

For people who want an overview of what the files are, see https://github.com/dlang/dmd/blob/master/src/dmd/README.md which does a more readable and thorough job than any misleading, fake package hierarchy. It could be improved with links to the generated Ddoc documentation files for each module. (Which is why I harangue PR submitters to add proper Ddoc function comments.)

I've said all this before, repeatedly.

BTW, something that would make dmd much easier to work on would be:

  1. removing the obvious cut/paste code that was put into astbase.d. I did some of it already with astenums.d, which saved us from hundreds of lines of stinky copy/pasta code.
  2. auto-generate the .h files, like symbol.h, instead of doing them manually. Inconsistencies have been causing bugs constantly for 10-15 years now. @ibuclaw just submitted another fix for one in scope.h. This isn't just a code smell, it's a code sewer.

@Geod24
Copy link
Member

Geod24 commented Jun 29, 2021

I've attempted to prevent a lot of it, such as repeated attempts to have the backend code import from the front end, and root importing from the dmd level.

Something we can all agree on. I can't recall a seasoned DMD contributors arguing against this rule. I do recall having broken this rule myself, by mistake. This was corrected by @ibuclaw in #10720 and as you can see, there was no disagreement that it was the correct course of action.

But that does not in any way justify "Never import a file from an uplevel directory". It doesn't refute any of the arguments that were made, by me or other contributors.

dmd level modules each import every other dmd module (more or less). There is no hierarchy. [...]

There is no apparent hierarchy to the modules. But there is an inherent hierarchy to DMD's code, otherwise it would be an unworkable mess. There are obvious separation between different components of DMD: the driver (mars.d & affiliates), frontend, IR, backend. There is also an inherent separation within those components: the AST class hierarchy is the most obvious example, but we also have a hierarchy of visitors.

What us contributors are asking for is to be able to express this separation using D's native unit of encapsulation, a.k.a. the module. (While I'm using the term "module" here, it encompass packages as well - Modules can only scale when used with packages).

Not having a fake package hierarchy that the code implementation does not remotely respect.

What is fake about dmd.lib ? I gave a thorough explanation about why it can be a package here.

For people who want an overview of what the files are [..]

READMEs and conventions are no substitutes for using languages features for their intended use.
To make an analogy, why do we need the deprecated attribute if we can just mark functions as deprecated in the documentation ? And even if we agree that deprecated is needed, why do we need to add a message to it, users can just look up the documentation when they see the message ?
Documentation is a supplement, not a replacement.

removing the obvious cut/paste code that was put into astbase.d. I did some of it already with astenums.d, which saved us from hundreds of lines of stinky copy/pasta code.

That's good. And can be pursued in parallel of packaging. The two do not conflict, nor is one a substitute for the other.

auto-generate the .h files, like symbol.h, instead of doing them manually.

This has been worked on by contributors for years now, under the dtoh name.
This led to the dreaded frontend.hwhich you wanted to remove. However we can't fully switch to using it until it is completely stable and usable, and auto-generating .h files that can accurately represent all the constructs that are used in a codebase as large and diverse as DMD is no small task.
For ref, just over the past 2 weeks, there have been 9 PRs raised or merged in support of this.

@WalterBright
Copy link
Member

This led to the dreaded frontend.h which you wanted to remove.

This is a misunderstanding. I wanted to remove it from the git source code base, and have it be auto-generated as part of the build process instead.

dtoh

I'm very glad to see progress on this front. All the .h files need to be removed and auto-generated. It's the flip side of ImportC, which will hopefully eliminate most of the .h => .d conversion nuisance and consequent errors. We can't get to 100% with ImportC due to endless C extensions and things in C that are not representable in D. But we should be able to do 100% for .h file emission for extern(C) and extern(C++) functions.

@ibuclaw
Copy link
Member

ibuclaw commented Jun 30, 2021

This led to the dreaded frontend.h which you wanted to remove.

This is a misunderstanding. I wanted to remove it from the git source code base, and have it be auto-generated as part of the build process instead.

dtoh

I'm very glad to see progress on this front. All the .h files need to be removed and auto-generated.

I still have occasional doubts about whether integrating it into the compiler was the right choice. I originally had it as a dedicated external tool aimed specifically at translating dmd codebase by having inside knowledge about what equivalent special types are used internally, etc. By the time I stopped working on it, it was able to produce something that was 100% valid and compiled with the test C++ front-end.

Whilst generalising it has no doubt made it better for all, it has regressed somewhat on its ability to translate dmd. I do fear that the choice to move it inside the compiler is holding it back though. For instance, why translate a non-trivially templated type (and all its forward referenced dependencies) when a void* will do just fine for its intended purpose? If only we were able to drop in hints about what to generate via a crafted code comment format (/// dtoh: void*) or maybe provide a sort of map file to tell it what to do when it encounters something like Identifier[Array!const(char)[]]*.

Alas, I digress. :-)

@MoonlightSentinel
Copy link
Contributor

MoonlightSentinel commented Jun 30, 2021

For instance, why translate a non-trivially templated type (and all its forward referenced dependencies) when a void* will do just fine for its intended purpose?

dtoh already uses forward declarations whenever possible to skip/defer the actual declaration. It will only emit TemplateDeclaration's if either an instance is used by value (e.g. void foo(Array!int) ) or the template is explicitly marked as extern(C++). The latter could even be ignored because dtoh can resolve the dependendency on first use.

Identifier[Array!const(char)[]]*

Emitted as void** because it's a pointer to an AA.

Alas, I digress. :-)

Indeed. The whole discussion about astbase.d and header files is moot because they imply some maintenance that is either automated or done by copy + pasting some declarations. This can be a nuisance but implies that you already made significant changes (worthy of a PR) and hence already did >95% of the work.

This PR improves the structure of the code base to make the 95% easier for contributors who have not yet been working on DMD for decades.

Copy link
Contributor

@MoonlightSentinel MoonlightSentinel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rationale outlined in #12766 (comment)

@thewilsonator
Copy link
Contributor Author

@thewilsonator can you incorporate @Geod24's explanation of the changes in the commit message and PR description? Otherwise, obviously LGTM.

Done

@PetarKirov
Copy link
Member

@WalterBright I think it's about time we embrace the D way of doing things in dmd :P

We wouldn't be in the worst company:

Clang C++ compiler: https://github.com/llvm/llvm-project/tree/llvmorg-12.0.0/clang/lib
C# compiler: https://github.com/dotnet/roslyn/tree/main/src/Compilers/CSharp/Portable
Dart compiler: https://github.com/dart-lang/sdk/tree/2.13.0/pkg
Rust compiler: https://rustc-dev-guide.rust-lang.org/compiler-src.html#compiler https://github.com/rust-lang/rust/tree/master/compiler
Swift compiler: https://github.com/apple/swift/tree/main/lib

@WalterBright
Copy link
Member

But that does not in any way justify "Never import a file from an uplevel directory".

Good question. What dmd has is the package hierarchy literally upside down. May I suggest the chapter "The Dependency-Inversion Principle" in Robert Martin's book "Agile Software Development."

https://www.amazon.com/Software-Development-Principles-Patterns-Practices/dp/0135974445

This PR improves the structure of the code base to make the 95% easier for contributors who have not yet been working on DMD for decades.

https://github.com/dlang/dmd/blob/master/src/dmd/README.md

Copy link
Member

@Geod24 Geod24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

/**
* A module defining an abstract library.
* Implementations for various formats are in separate `libXXX.d` modules.
* Implementations for various formats are in `src/dmd/lib/` modules.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well technically that's not quite accurate:

  • There is a 1-to-1 mapping between modules and files, but they are not the same concept. When referring to modules, we use the name in the ModuleDeclaration, e.g. dmd.lib.elf;
  • dmd.lib still contains scan* which are not implementation, more like helpers;

But since it's consistent with the previous wording, we can fix it later. I think we have an opportunity to remove the function-backed factory now that we are going with proper encapsulation.

@Geod24 Geod24 added the Merge:72h no objection -> merge The PR will be merged if there are no objections raised. label Jul 2, 2021
Comment on lines +186 to +194
| [lib/package.d](https://github.com/dlang/dmd/blob/master/src/dmd/lib/package.d) | Abstract library class |
| [lib/elf.d](https://github.com/dlang/dmd/blob/master/src/dmd/lib/elf.d) | Library in ELF format (Unix) |
| [lib/mach.d](https://github.com/dlang/dmd/blob/master/src/dmd/lib/mach.d) | Library in Mach-O format (macOS) |
| [lib/mscoff.d](https://github.com/dlang/dmd/blob/master/src/dmd/lib/mscoff.d) | Library in COFF format (32/64-bit Windows) |
| [lib/omf.d](https://github.com/dlang/dmd/blob/master/src/dmd/lib/omf.d) | Library in OMF format (legacy 32-bit Windows) |
| [lib/scanelf.d](https://github.com/dlang/dmd/blob/master/src/dmd/lib/scanelf.d) | Extract symbol names from a library in ELF format |
| [lib/scanmach.d](https://github.com/dlang/dmd/blob/master/src/dmd/lib/scanmach.d) | Extract symbol names from a library in Mach-O format |
| [lib/scanmscoff.d](https://github.com/dlang/dmd/blob/master/src/dmd/lib/scanmscoff.d) | Extract symbol names from a library in COFF format |
| [lib/scanomf.d](https://github.com/dlang/dmd/blob/master/src/dmd/lib/scanomf.d) | Extract symbol names from a library in OMF format |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can remove this entire table and simply add the 'lib' folder to 'Directory structure' at the start of the README.

The only symbols declared in these modules used outside them are declared in `src/lib.d`.

This commit changes:
* `lib{elf,mach,mscoff,omf}.d` => `lib/{elf,mach,mscoff,omf}.d`: Those files are only imported by a single module, dmd.lib;
* `lib.d` => `lib/package.d`: The single module importing `lib*.d`, which provides the single entry point for this package;
`scan*.d` => `lib/scan*.d`: Those files are never imported by modules except for the corresponding `lib/{elf,mach,mscoff,omf}.d`;

This increases encapsulation by:
* reducing the visibility of symbols in `scan*.d` to `dmd.lib`, so the context one needs looking at the module in isolation is reduced;
* it reduces the visibility of symbols in `{elf,mach,...}.d`;
* It signals to the reader that `package.d` is the entry point, based on a language feature, not coding convention;
@RazvanN7
Copy link
Contributor

RazvanN7 commented Jul 6, 2021

@WalterBright Given that this PR has received 6 approvals from community members, are you inclined on accepting this?

@Geod24 Geod24 dismissed WalterBright’s stale review July 7, 2021 02:12

There is a clear consensus that this is what contributors want.

@Geod24 Geod24 merged commit 69a3cce into dlang:master Jul 7, 2021
@thewilsonator thewilsonator deleted the libpackage branch July 7, 2021 04:05
@nordlow
Copy link
Contributor

nordlow commented Jul 7, 2021

@WalterBright what are the pros of not encapsulating these files into packages? Are you worried it's gonna hamper yours and others productivity?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Merge:72h no objection -> merge The PR will be merged if there are no objections raised.

Projects

None yet

Development

Successfully merging this pull request may close these issues.