Add _STD to improve tuple throughput #1490
Merged
Our recursive `tuple` implementation is a challenge for compiler throughput. By profiling the compiler, @xiangfan-ms identified something that we were previously unaware of. Name lookup within `enable_if_t` constraints can be expensive in certain situations (enormous tuples, where the recursive inheritance and pack expansion lead to quadratic behavior within the compiler, combined with permissive mode). This affects the names of structs/classes, but doesn't affect alias/variable templates for a reason which was novel to me: struct/class names become injected-class-names, so the compiler remembers that they can be member names, but alias templates and variable templates aren't declared as member names. Within a large tuple with many bases, name lookup for `is_meow` has to consider whether it could be an injected-class-name (which will consider the many base classes in permissive mode), but name lookup for `is_meow_v` can be optimized because the compiler knows that it has never been seen as a member.

This is significant enough to motivate an exception to our usual convention, where we don't use `_STD` qualification for non-functions (because only functions are vulnerable to ADL). I've gone through `<tuple>` and looked at each of its `enable_if_t` constraints, and I've marked all of the mentioned structs/classes as `_STD` (including `tuple` itself when forming another specialization). I didn't bother modifying the `!_HAS_CONDITIONAL_EXPLICIT` codepaths (they are used for CUDA only, and hopefully not forever). Also, plain `tuple` still refers to the tuple's own injected-class-name; AFAIK this is not a bottleneck (because it should be found immediately, preventing any lookup into base classes in permissive mode).
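
For readers who want to see the three spellings side by side, here is a minimal sketch, assuming C++17. Everything in it is hypothetical and for illustration only: the `demo` namespace, `toy_tuple`, `derived_probe`, and the `DEMO_STD` macro (which stands in for `_STD`, assumed here to expand to a globally qualified namespace prefix) are not taken from `<tuple>`. It shows that a struct name is visible as a member name through a base class, while a qualified name or a variable template never needs member lookup. It does not measure throughput.

```cpp
#include <type_traits>

// Hypothetical stand-in for the STL's internal _STD macro (assumption:
// a globally qualified namespace prefix), applied to a demo namespace.
#define DEMO_STD ::demo::

namespace demo {
    // Hypothetical trait struct, playing the role of "is_meow".
    template <class T>
    struct is_meow : std::bool_constant<std::is_integral_v<T>> {};

    // Variable template counterpart; it is never declared as a class member.
    template <class T>
    inline constexpr bool is_meow_v = is_meow<T>::value;

    // A struct name becomes an injected-class-name, so it can be found by
    // member lookup through a base class.
    struct derived_probe : is_meow<int> {
        using found_by_member_lookup = is_meow; // inherited injected-class-name
    };
    static_assert(std::is_same_v<derived_probe::found_by_member_lookup, is_meow<int>>);

    // Toy recursive tuple: each specialization inherits from the next one.
    template <class... Ts>
    struct toy_tuple;

    template <>
    struct toy_tuple<> {};

    template <class First, class... Rest>
    struct toy_tuple<First, Rest...> : private toy_tuple<Rest...> {
        // Unqualified struct name: could in principle be a member name, so a
        // compiler that looks into bases has to consider every base class.
        template <class U, std::enable_if_t<is_meow<U>::value, int> = 0>
        static constexpr int unqualified(U) { return 1; }

        // Qualified struct name: lookup goes straight to the namespace.
        template <class U, std::enable_if_t<DEMO_STD is_meow<U>::value, int> = 0>
        static constexpr int qualified(U) { return 2; }

        // Variable template: never seen as a member name.
        template <class U, std::enable_if_t<is_meow_v<U>, int> = 0>
        static constexpr int variable_template(U) { return 3; }
    };

    // All three spellings select the same constraint; only the lookup differs.
    static_assert(toy_tuple<int, char>::unqualified(0) == 1);
    static_assert(toy_tuple<int, char>::qualified(0) == 2);
    static_assert(toy_tuple<int, char>::variable_template(0) == 3);
} // namespace demo
```

The constraints are semantically identical in all three members, which is why the change described above affects compile time without changing behavior.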