Skip to content

Comments

[refactor] Move Type::ctype into an external AA#4569

Open
yebblies wants to merge 17 commits intodlang:masterfrom
yebblies:ctypeaa
Open

[refactor] Move Type::ctype into an external AA#4569
yebblies wants to merge 17 commits intodlang:masterfrom
yebblies:ctypeaa

Conversation

@yebblies
Copy link
Contributor

@yebblies yebblies commented Apr 7, 2015

The idea is that backend-specific passes shouldn't need to add methods to the frontend ast classes. Using a hash scales much better, if the performance is acceptable.

@yebblies
Copy link
Contributor Author

yebblies commented Apr 7, 2015

I can't see any performance difference on the autotester.

@MartinNowak
Copy link
Member

It's a logical consequence to do this, though I wonder if other alternatives have been evaluated.
In the long-run we'd need a better AA implementation for this.

One alternative idea would be to have a backend header, declaring a backend specific payload, that gets embedded in the frontend types. This could remain an opaque type for the frontend.

@yebblies
Copy link
Contributor Author

It's a logical consequence to do this, though I wonder if other alternatives have been evaluated.

Other alternatives?

In the long-run we'd need a better AA implementation for this.

I thought the compiler's AA was pretty fast.

One alternative idea would be to have a backend header, declaring a backend specific payload, that gets embedded in the frontend types. This could remain an opaque type for the frontend.

The ultimate goal is compiler-as-a-library, which cannot (reasonably) work like that. We need a way to add passes without any modification of the ast types.

@MartinNowak
Copy link
Member

I thought the compiler's AA was pretty fast.

The string table is fast, the aav is better than a few years ago, but not great.

The ultimate goal is compiler-as-a-library, which cannot (reasonably) work like that.

Injecting them via template argument?
I'm just worried about the overall performance. Double dispatch already comes at quite some price, if we now start to turn field accesses into hash lookups, we'll add another source of uniform slowdown.

How often is the ctype looked up, e.g. when compiling Phobos.

@yebblies
Copy link
Contributor Author

Injecting them via template argument?

I'd consider that unreasonable.

I'm just worried about the overall performance. Double dispatch already comes at quite some price, if we now start to turn field accesses into hash lookups, we'll add another source of uniform slowdown.

Yeah, that's true. Although it doesn't looks like it's big enough to worry about.

How often is the ctype looked up, e.g. when compiling Phobos.

Let's find out.

@MartinNowak
Copy link
Member

It would help, if you can provide those numbers and also explain what other fields you want to move out of the AST.

@ibuclaw
Copy link
Member

ibuclaw commented Apr 14, 2015

also explain what other fields you want to move out of the AST.

aggregate.h:

  • AggregateDeclaration
    • Symbol *stag
    • Symbol *sinit
  • ClassDeclaration
    • Symbol *vtblsym

declaration.h:

  • FuncDeclaration
    • Symbol *shidden

dsymbol.h:

  • Dsymbol
    • Symbol *csym
    • Symbol *isym

enum.h:

  • EnumDeclaration
    • Symbol *sinit

expression.h:

  • StructLiteralExp
    • Symbol *sinit
    • Symbol *sym

module.h:

  • Module
    • int doppelganger
    • Symbol *cov
    • unsigned *covb
    • Symbol *sictor
    • Symbol *sdtor
    • Symbol *ssharedctor
    • Symbol *sshareddtor
    • Symbol *stest
    • Symbol *sfilename
    • Symbol *massert
    • Symbol *munittest
    • Symbol *marray

mtype.h:

  • Type
    • type *ctype

statement.h:

  • LabelStatement
    • block *lblock
    • Blocks fwdrefs
  • DefaultStatement
    • block *cblock (gdc only)
  • CaseStatement
    • block *cblock

@yebblies - Did I miss any other transparent BE types?

@yebblies
Copy link
Contributor Author

@ibuclaw The list from backend.d is

struct Symbol;
struct TYPE;
alias type = TYPE;
struct code;
struct block;

Would be nice for that list to be empty.

It would help, if you can provide those numbers and also explain what other fields you want to move out of the AST.

@MartinNowak Building phobos on win32 results in ~10^6 ctype lookups. I want to move all backend-specific types out of the ast. I might look at doing the same with other passes (eg ctfe) sometime in the future.

@Safety0ff
Copy link
Contributor

Keeping backend fields in the AST also eagerly allocates memory for fields that are never used.
E.g. I moved csym and isym out of Dsymbol and it resulted in a ~1.8% overall memory reduction for building phobos with no performance impact.

src/toctype.c Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happened here? Seems unrelated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a minor refactoring.
In the old version: If the unqualified version of this type already has a ctype, we copy it and add the qualifiers. Otherwise we create a new Classsym/ctype.

In the new version, we generate a new ctype if this is the unqualified version, otherwise recurse to get the unqualified ctype then copy it.

As far as I can tell, the old code would generate a symbol with the wrong qualifiers when called with a qualified before being called on the unqualified symbol, because both would be generated and set to the same value but the 'add modifiers' code would never be reached.

@WalterBright
Copy link
Member

The autotester isn't a great guide for performance testing, as it only deals with small programs.

A compiler slows down bit by bit, from barnacles accumulating on the hull one by one. Each one doesn't do much, but the accumulation does.

Perhaps use an opaque pointer instead?

@yebblies
Copy link
Contributor Author

The autotester isn't a great guide for performance testing, as it only deals with small programs.

The win32 phobos build is fairly big.

A compiler slows down bit by bit, from barnacles accumulating on the hull one by one. Each one doesn't do much, but the accumulation does.

I know, but sometimes it's worth it.

Perhaps use an opaque pointer instead?

A pointer to what, though? Different passes need to store different data, and having a pointer for each is exactly the problem I'm hoping to avoid.

There were similar concerns about the minor but systemic slowdowns from switching to the double-dispatch visitor implementation, but I think that has payed off in making the code more maintainable. My goal it to make it as easy as possible to maintain glue layers/backends, and that means removing all backend-specific code and data from the frontend ast.

@MartinNowak
Copy link
Member

There were similar concerns about the minor but systemic slowdowns from switching to the double-dispatch visitor implementation, but I think that has payed off in making the code more maintainable.

It was a requirement for ddmd, because D doesn't support implementing classes in multiple files.
The visitor might also have helped gdc/ldc, and we separated a few things in the compiler
better. That said, it still slows down the compiler and is slightly more difficult to maintain (try
to jump to CondExp::toIR or debug something).
Moving data to hashtables will also make things slightly slower and slightly harder to maintain.

A pointer to what, though? Different passes need to store different data, and having a pointer for each is exactly the problem I'm hoping to avoid.

  • You can statically declare "opaque" (as in ptr) backend types (in the glue layer) and embed them in the AST.
  • It's still possible to track additional data for optional passes in a hash table.
  • Using a hashtable by default is a pessimization for non-sparse data.

Can you please try to find out what LLVM et.al. do?

@ibuclaw
Copy link
Member

ibuclaw commented Apr 20, 2015

A pointer to what, though? Different passes need to store different data, and having a pointer for each is exactly the problem I'm hoping to avoid.

I think Walter meant void*? If so, still not pleasant.

@yebblies
Copy link
Contributor Author

It was a requirement for ddmd, because D doesn't support implementing classes in multiple files.

Yeah, sort of. There were other options but that was by far the best.

The visitor might also have helped gdc/ldc, and we separated a few things in the compiler
better. That said, it still slows down the compiler and is slightly more difficult to maintain (try
to jump to CondExp::toIR or debug something).

The fact that you no longer need to update headers when adding a pass more than makes up for any maintenance burden IMO.

Moving data to hashtables will also make things slightly slower and slightly harder to maintain.

There's no guarantee it makes it slower. When running e2ir, the ctype hash table is much more likely to be in cache than the Type classes themselves. (Not so useful in the current AA but I have a version with one less indirection that is much more cache-friendly.)
The pointer to the Type never needs to be dereferenced for the hash table access.

You can statically declare "opaque" (as in ptr) backend types (in the glue layer) and embed them in the AST.

ie the current system. Requires knowing which passes need cached data at compile time, potential wasted memory if they're not needed. Some backends require multiple values per class, and this either means lots of ugly or an extra indirection.

It's still possible to track additional data for optional passes in a hash table.

It's not really about optional passes, it's about not polluting the frontend code with backend-specific data members.

Using a hashtable by default is a pessimization for non-sparse data.

Not always. More cache friendly, less space allocated when not all classes are codegen'd, etc

Can you please try to find out what LLVM et.al. do?

I tried, didn't get very far.

Is this just the usual knee-jerk reaction to anything that might hurt performance? Are there any concrete goals that would make this acceptable? I honestly would accept a fairly big performance hit if it got us closer to having a 100% unified frontend.

@yebblies
Copy link
Contributor Author

@ibuclaw Would you accept using this kind of approach in GDC? @klickverbot @redstar Would you use this in LDC? I assume you guys have access to fast hash tables from your backends' support libraries.

@ibuclaw
Copy link
Member

ibuclaw commented Apr 20, 2015

@yebblies - I'm considering a future of GDC without any IRState, Symbol, or dt_t baggage from DMD. So far the design is looking much cleaner, but then again it is not complete either. ;-)

@ibuclaw
Copy link
Member

ibuclaw commented Apr 20, 2015

I assume you guys have access to fast hash tables from your backends' support libraries.

Yes.

@MartinNowak
Copy link
Member

less space allocated when not all classes are codegen'd

That's why I said non-sparse data ;), might indeed make sense for rare backend data.

Is this just the usual knee-jerk reaction to anything that might hurt performance? Are there any concrete goals that would make this acceptable? I honestly would accept a fairly big performance hit if it got us closer to having a 100% unified frontend.

The last release came with a 10-25% slowdown (Issue 14431).

When running e2ir, the ctype hash table is much more likely to be in cache than the Type classes themselves.

That's an interesting argument, but it doesn't quite work out for ctype. The slowdown is neglectable but measureable.
We cannot decide this on the basis of such a tiny change. How about we convert most if not all backend types and maybe tweak the AA first to have a meaningful comparison.
I'm generally in favor of this change.

@ibuclaw
Copy link
Member

ibuclaw commented Apr 20, 2015

How about we convert most if not all backend types and maybe tweak the AA first to have a meaningful comparison.

+1 - I agree, let's convert all and benchmark.

@yebblies
Copy link
Contributor Author

We cannot decide this on the basis of such a tiny change. How about we convert most if not all backend types and maybe tweak the AA first to have a meaningful comparison.
I'm generally in favor of this change.

Thanks, I can work with that.

@redstar
Copy link
Contributor

redstar commented Apr 20, 2015

Yes, let's try it.

@redstar
Copy link
Contributor

redstar commented Apr 21, 2015

Considering the performance impact: Why not using a factory class for AST nodes? In this case each backend could provide decorated AST nodes. The factory itself could be made configurable to support a library solution. The trade-off would be a cast to the new AST types if access to new members is required.

@yebblies
Copy link
Contributor Author

Considering the performance impact: Why not using a factory class for AST nodes? In this case each backend could provide decorated AST nodes. The factory itself could be made configurable to support a library solution. The trade-off would be a cast to the new AST types if access to new members is required.

The big downside of that is that it is a very invasive change. It also only supports a single backend at a time.

@ibuclaw
Copy link
Member

ibuclaw commented Apr 21, 2015

It also only supports a single backend at a time.

Why would we want to be interchanging backends during the same compilation process? Correct me if I misread this. :-)

@yebblies
Copy link
Contributor Author

Why would we want to be interchanging backends during the same compilation process? Correct me if I misread this. :-)

I mean that CTFE is sort of another backend, etc.

@ibuclaw
Copy link
Member

ibuclaw commented Jul 25, 2015

Any update on this? I'd like to push for removing struct block from arraytypes.h

@yebblies
Copy link
Contributor Author

I haven't touched it since dconf. I had a big argument with Walter about this, and he basically said that the only way he would accept this is if I can show that with all of the similar changes implemented dmd's performance improves in some measurable way. i.e. showing that the performance difference is negligible is not enough. Also that we should beware of crustaceans and their ill effects on the movement speed of ships.

I did some work on toSymbol but didn't get it to stop segfaulting, and haven't gone near it since dconf.

@ibuclaw
Copy link
Member

ibuclaw commented Jul 26, 2015

If speed is a problem, maybe convert the AA hash implementation into a template? You could also allow overriding how keys are hashed to be naive for speed (eg: integer and pointer keys don't need hashing).

@ibuclaw
Copy link
Member

ibuclaw commented May 4, 2016

@MartinNowak @yebblies ping.

So, lets get benchmarks on this, and decide if it's the best approach? The only other approach I can think of is to have a synthetic struct pointer, which each visitor in the glue defines locally.

E.g:
toctype.c

struct X {
  type *ctype;
}

tocsym.c

struct X {
  Symbol* stag;
  Symbol* sinit;
}

And so on.

However I think this is the most agreeable suggestion so far.

@andralex
Copy link
Member

There's been controversy on this, and @WalterBright all but vetoed this approach. However the context has changed since, what with the more widespread use of visitors by @RazvanN7 and others. Thoughts on reviving or reframing this? I'm thinking an opaque pointer is a simple technique that other compilers use as well.

@ibuclaw
Copy link
Member

ibuclaw commented Nov 20, 2017

It's still useful on my side to have all dmd-specific fields removed.

@ibuclaw
Copy link
Member

ibuclaw commented May 1, 2022

It also saves memory on frontend AST nodes (#13808).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants