
Conversation

@StephanTLavavej
Member

Our recursive tuple implementation is a challenge for compiler throughput. By profiling the compiler, @xiangfan-ms identified something that we were previously unaware of. Name lookup within enable_if_t constraints can be expensive in certain situations (enormous tuples, where the recursive inheritance and pack expansion lead to quadratic behavior within the compiler, combined with permissive mode). This affects the names of structs/classes, but doesn't affect alias/variable templates for a reason which was novel to me - struct/class names become injected-class-names, so the compiler remembers that they can be member names, but alias templates and variable templates aren't declared as member names. Within a large tuple with many bases, name lookup for is_meow has to consider whether it could be an injected-class-name (which will consider the many base classes in permissive mode), but name lookup for is_meow_v can be optimized because the compiler knows that it has never been seen as a member.
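
For illustration, here's a minimal sketch of the pattern (placeholder names like `is_meow`/`is_meow_v` and a `sketch` namespace standing in for `std`; this is not the actual `<tuple>` code):

```cpp
#include <type_traits>

namespace sketch { // stand-in for namespace std

    // Stand-ins for a library trait: a struct template and the matching variable template.
    template <class T, class U>
    struct is_meow : std::is_constructible<T, U> {};

    template <class T, class U>
    inline constexpr bool is_meow_v = is_meow<T, U>::value;

    template <class... Ts>
    struct Tup; // primary template

    template <>
    struct Tup<> {}; // empty base case

    template <class T, class... Rest>
    struct Tup<T, Rest...> : Tup<Rest...> { // recursive inheritance: one base per element
        T value{};

        Tup() = default;

        // The constraint names a struct template: under /permissive, unqualified
        // lookup of is_meow must rule out an injected-class-name in each of this
        // specialization's (potentially many) base classes.
        template <class U, std::enable_if_t<is_meow<T, U>::value, int> = 0>
        Tup(U&& u, int) : value(static_cast<U&&>(u)) {}

        // The constraint names a variable template: is_meow_v can never be an
        // injected-class-name, so the compiler skips the member-name search.
        template <class U, std::enable_if_t<is_meow_v<T, U>, int> = 0>
        explicit Tup(U&& u) : value(static_cast<U&&>(u)) {}
    };

} // namespace sketch
```

In an enormous `Tup<...>`, the lookup work for the first constraint grows with the number of base classes in permissive mode, while the second constraint stays cheap regardless of tuple size.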

This is significant enough to motivate an exception to our usual convention, where we don't use _STD qualification for non-functions (because only functions are vulnerable to ADL). I've gone through <tuple> and looked at each of its enable_if_t constraints, and I've qualified all of the mentioned structs/classes with _STD (including tuple itself when forming another specialization). I didn't bother modifying the !_HAS_CONDITIONAL_EXPLICIT codepaths (they are used for CUDA only, and hopefully not forever). Also, plain tuple still refers to the tuple's own injected-class-name; AFAIK this is not a bottleneck (because it should be found immediately, preventing any lookup into base classes in permissive mode).
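
Roughly, the before/after looks like this (an illustrative sketch with a placeholder trait in a `sketch` namespace, not the actual diff; in the real headers, `_STD` expands to `::std::`):

```cpp
#include <type_traits>

namespace sketch { // stand-in for namespace std

    // Placeholder struct-template trait.
    template <class T, class U>
    struct is_meow : std::is_constructible<T, U> {};

    template <class... Types>
    class tuple {
    public:
        tuple() = default;

        // Pre-PR spelling: is_meow and the converting tuple specialization are
        // unqualified, so permissive-mode lookup has to consider the bases of
        // the real recursively-inheriting tuple for each of those names.
        template <class... Other,
            std::enable_if_t<std::conjunction_v<is_meow<Types, const Other&>...>, int> = 0>
        tuple(const tuple<Other...>&, int) {}

        // Post-PR spelling: the struct/class names are qualified (sketch:: here,
        // _STD in <tuple>), so lookup goes straight to namespace scope.
        template <class... Other,
            std::enable_if_t<std::conjunction_v<sketch::is_meow<Types, Other&&>...>, int> = 0>
        tuple(sketch::tuple<Other...>&&) {}
    };

} // namespace sketch
```

In the real <tuple>, enable_if_t, conjunction_v, and the other alias/variable templates stay unqualified per the reasoning above; they're spelled `std::` here only because the sketch lives outside namespace std.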

@StephanTLavavej StephanTLavavej added the throughput Must compile faster label Nov 20, 2020
@StephanTLavavej StephanTLavavej requested a review from a team as a code owner November 20, 2020 05:40
@StephanTLavavej
Member Author

@xiangfan-ms has profiled this PR and confirmed that it improves tuple throughput, especially when there are more than 50 or so template arguments. (100-ish arguments have been encountered in real-world code.) Also, no throughput regressions were observed even for smaller tuples; a 32-argument tuple actually sped up slightly, from 299 ms to 290 ms.

Contributor

@CaseyCarter left a comment

Just a request for more documentation.

@StephanTLavavej StephanTLavavej merged commit debeb6f into microsoft:master Dec 3, 2020
@StephanTLavavej StephanTLavavej deleted the tuple_throughput branch December 3, 2020 23:24
@StephanTLavavej
Member Author

Thanks again for writing the explanation here, @CaseyCarter - it's the difference between a good codebase and an excellent codebase. 😻
