
Conversation

@kunalspathak
Contributor

@kunalspathak kunalspathak commented Jul 22, 2022

Create a hash table to store the mapping of AssertionDsc to AssertionIndex.

Design:

  1. Since vnBased is passed into the Equals() method of AssertionDsc, it was not possible to use it in the hash table's KeyFuncs type. I tried various approaches such as creating a template, function pointers, inheritance, etc., but none of them solved the problem. In the end, I simply included the bool vnBased in AssertionDsc. This field is redundant, since the value is already stored on the Compiler object, and it increases the size of AssertionDsc from 48 bytes to 56 bytes. However, since the most common maximum number of assertions is 64, this works out to about 512 extra bytes per method during jitting. I also tried making it a static field on AssertionDscKeyFuncs, but if multiple methods are compiled at the same time, the value of vnBased could become stale for one of the threads. (A sketch of this design follows the list.)
  2. The hash map will be used only for checking already-added assertions. There is scope for having separate hash maps for other usage sites where we try to find an assertion, but in the instrumentation I did, I didn't see them iterating too many times.
  3. Hash code description: TODO
  4. Debug check against existing linear approach.
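
Below is a minimal, purely illustrative sketch of the workaround described in item 1, using std::unordered_map and made-up field names rather than the JIT's actual AssertionDsc and hash table: because a stateless KeyFuncs-style functor has nowhere to hold per-compilation state, the vnBased flag is carried inside the key itself so that hashing and equality can see it.

```cpp
// Purely illustrative sketch -- not the JIT's real AssertionDsc or hash table.
#include <cstddef>
#include <cstdint>
#include <unordered_map>

struct AssertionDscSketch
{
    int      assertionKind; // e.g. equal / not-equal / subtype (made-up encoding)
    unsigned op1Kind;
    unsigned op2Kind;
    intptr_t op1Value;
    intptr_t op2Value;
    bool     vnBased; // redundant copy of the compiler-wide flag; in the real
                      // Equals(), this selects which fields get compared

    bool operator==(const AssertionDscSketch& other) const
    {
        return (vnBased == other.vnBased) && (assertionKind == other.assertionKind) &&
               (op1Kind == other.op1Kind) && (op2Kind == other.op2Kind) &&
               (op1Value == other.op1Value) && (op2Value == other.op2Value);
    }
};

struct AssertionDscSketchHash
{
    size_t operator()(const AssertionDscSketch& d) const
    {
        // Simple field mixing; the real hash would be tuned for distribution.
        size_t h = static_cast<size_t>(d.assertionKind);
        h = h * 31 + d.op1Kind;
        h = h * 31 + d.op2Kind;
        h = h * 31 + static_cast<size_t>(d.op1Value);
        h = h * 31 + static_cast<size_t>(d.op2Value);
        h = h * 31 + (d.vnBased ? 1u : 0u);
        return h;
    }
};

// Maps an assertion descriptor to its 1-based index in the assertion table.
using AssertionIndexMapSketch = std::unordered_map<AssertionDscSketch, unsigned, AssertionDscSketchHash>;
```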

Fixes: #10592

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 22, 2022
@ghost ghost assigned kunalspathak Jul 22, 2022
@ghost

ghost commented Jul 22, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.


@kunalspathak kunalspathak changed the title from "Assertion prop" to "Hash the AssertionDsc struct for faster lookup" Jul 23, 2022
@AndyAyersMS
Member

Some thoughts:

  • I wonder if you should revisit the analysis I did in JIT: look at cost impact of assertion dup detection #10592 and see if it still seems to ring true.
  • Might be interesting to extract MC's with high assertion counts and just look at the TP impact on those. There's no easy way to do this sort of extraction currently.
  • Did you verify the hash functions gave decent hash distributions?
  • There are other places where we search the assertion table where it would be nice to do this same sort of fast lookup.

@kunalspathak
Contributor Author

kunalspathak commented Jul 27, 2022

Some thoughts:

I did some instrumentation, which is not exactly the same as what you had.

Here is how I instrumented it. I added instrumentation for the 5 types of lookups we do today: AddAssertion, SubType, SubRange, EqualOrNotEqual and NoNull. For each of the 5 categories, I added two counters: one that measures the number of times we found a match (counter name ending with Count) and one that measures the number of iterations we did to find that match (counter name ending with Iter). From those two counters, Iter / Count gives the average number of iterations before we found a match. Correct me if that is not an accurate metric to measure.
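
A rough sketch of what the two counters per category could look like (hypothetical names; this is not the actual instrumentation from the linked commit):

```cpp
// Hypothetical per-category counters, illustrating the "...Count" / "...Iter" pair.
struct AssertionLookupStatsSketch
{
    unsigned long long matchCount = 0; // lookups that found a match ("...Count")
    unsigned long long matchIter  = 0; // assertions scanned across those lookups ("...Iter")

    void RecordMatch(unsigned iterationsScanned)
    {
        matchCount++;
        matchIter += iterationsScanned;
    }

    double AvgItersPerMatch() const
    {
        return (matchCount == 0) ? 0.0 : double(matchIter) / double(matchCount);
    }
};

// One instance each for AddAssertion, SubType, SubRange, EqualOrNotEqual, NoNull.
```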

With that instrumentation, I ran superpmi benchmarks, libraries.pmi and asp.net.

benchmarks

| Type | Count | Iter | Avg. (iteration / match) |
| --- | --- | --- | --- |
| AddAssertion | 304815 | 3238908 | 10.62581566 |
| SubType | 1226 | 5262 | 4.292006525 |
| SubRange | 66 | 671 | 10.16666667 |
| EqualOrNotEqual | 2718 | 4476 | 1.646799117 |
| NoNull | 5957 | 8479 | 1.423367467 |

And here is a graph that maps the number of methods (Y-axis) to the number of iterations / match.

[graph: number of methods (Y-axis) vs. iterations / match, benchmarks collection]

libraries.pmi

| Type | Count | Iter | Avg. (iteration / match) |
| --- | --- | --- | --- |
| AddAssertion | 2170199 | 14038080 | 6.46856809 |
| SubType | 5503 | 16903 | 3.071597311 |
| SubRange | 1182 | 21236 | 17.96615905 |
| EqualOrNotEqual | 26052 | 48974 | 1.879855673 |
| NoNull | 52621 | 83920 | 1.594800555 |

[graph: number of methods (Y-axis) vs. iterations / match, libraries.pmi collection]

asp

| Type | Count | Iter | Ratio |
| --- | --- | --- | --- |
| AddAssertion | 447038 | 4735534 | 10.59 |
| SubType | 2848 | 7971 | 2.80 |
| SubRange | 23 | 577 | 25.09 |
| EqualOrNotEqual | 2108 | 3578 | 1.70 |
| NoNull | 6893 | 9866 | 1.43 |

Essentially, if I am interpreting the data correctly, there is not much benefit in hashing categories other than AddAssertion for faster lookup.

I have not instrumented the places where we iterate through all the assertions and try to find the best matching assertion (e.g. optAssertionIsNonNullInternal, optGlobalAssertionIsEqualOrNotEqualZero, etc.), because more detailed filtering happens there that I cannot generalize inside a hash function. I might have to create separate hash table(s) to do the lookups for each of these categories, as well as others like SubType and SubRange. We can certainly brainstorm how to come up with such a hash function; IMO that would unblock removing the assertion count limit.

  • Might be interesting to extract MC's with high assertion counts and just look at the TP impact on those.

There is some noise, but overall I see improvements. Below are the results of running superpmi 5 times on a handful of method contexts that have a high assertion count/iteration (last 2 columns).

| BEFORE | AFTER | Diff | MethodID | AddAssertionCount | AddAssertionIter |
| --- | --- | --- | --- | --- | --- |
| 0.7833 | 0.7959 | 1.61% | 1288 | 1005 | 23305 |
| 0.8283 | 0.8188 | -1.15% | 1288 | 1005 | 23305 |
| 0.8287 | 0.8628 | 4.11% | 1288 | 1005 | 23305 |
| 0.8139 | 0.8304 | 2.03% | 1288 | 1005 | 23305 |
| 0.891 | 0.8553 | -4.01% | 1288 | 1005 | 23305 |
| 0.8704 | 0.8336 | -4.23% | 6792 | 954 | 30966 |
| 1.0456 | 0.848 | -18.90% | 6792 | 954 | 30966 |
| 0.7753 | 0.7897 | 1.86% | 6792 | 954 | 30966 |
| 0.7818 | 0.8297 | 6.13% | 6792 | 954 | 30966 |
| 0.7933 | 0.8978 | 13.17% | 6792 | 954 | 30966 |
| 0.8098 | 0.7786 | -3.85% | 7322 | 1619 | 163731 |
| 0.8415 | 0.7977 | -5.20% | 7322 | 1619 | 163731 |
| 0.7746 | 0.7546 | -2.58% | 7322 | 1619 | 163731 |
| 0.8081 | 0.7753 | -4.06% | 7322 | 1619 | 163731 |
| 0.7757 | 0.7963 | 2.66% | 7322 | 1619 | 163731 |
| 0.8003 | 0.7895 | -1.35% | 10982 | 588 | 20762 |
| 0.7804 | 0.7331 | -6.06% | 10982 | 588 | 20762 |
| 0.8053 | 0.9039 | 12.24% | 10982 | 588 | 20762 |
| 0.7354 | 0.7928 | 7.81% | 10982 | 588 | 20762 |
| 0.8371 | 0.8674 | 3.62% | 10982 | 588 | 20762 |
| 0.8473 | 1.2294 | 45.10% | 14206 | 669 | 19263 |
| 0.8092 | 0.8866 | 9.57% | 14206 | 669 | 19263 |
| 0.8201 | 0.8509 | 3.76% | 14206 | 669 | 19263 |
| 0.8275 | 0.7842 | -5.23% | 14206 | 669 | 19263 |
| 0.8763 | 0.777 | -11.33% | 14206 | 669 | 19263 |
| 0.7974 | 0.7645 | -4.13% | 16249 | 1142 | 94749 |
| 0.7687 | 0.8726 | 13.52% | 16249 | 1142 | 94749 |
| 0.8487 | 0.8185 | -3.56% | 16249 | 1142 | 94749 |
| 0.8022 | 0.8126 | 1.30% | 16249 | 1142 | 94749 |
| 0.8002 | 0.7818 | -2.30% | 16249 | 1142 | 94749 |

There's no easy way to do this sort of extraction currently.

If you think this instrumentation will be helpful in the future, I can send a PR for the instrumentation I have already done in kunalspathak@b549c14.

  • Did you verify the hash functions gave decent hash distributions?

I do see 13 entries sometimes...let me try to narrow down.

  • There are other places where we search the assertion table where it would be nice to do this same sort of fast lookup

I think AddAssertion is good enough for now. We should discuss how we can fit the lookups done during assertion prop into it.

@kunalspathak
Contributor Author

I do see 13 entries sometimes...let me try to narrow down.

In the benchmarks superpmi collection, I see 3-4 methods that have 13 entries, but the number of iterations we did for the match was 18497 across 122 matches, so roughly 150 iterations to find a match. So I guess something < 15 still looks good.
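
For reference, here is one way a bucket/chain-length distribution can be inspected on a standard container; this is illustrative only, since the JIT's own hash table does not necessarily expose the same accessors.

```cpp
#include <cstdio>
#include <map>
#include <unordered_map>

// Prints how many buckets have each chain length for an unordered container,
// which is one way to eyeball hash distribution (illustrative only).
template <typename Map>
void PrintBucketDistribution(const Map& map)
{
    std::map<size_t, size_t> lengthHistogram; // chain length -> number of buckets
    for (size_t b = 0; b < map.bucket_count(); b++)
    {
        lengthHistogram[map.bucket_size(b)]++;
    }
    for (const auto& entry : lengthHistogram)
    {
        std::printf("chain length %zu: %zu buckets\n", entry.first, entry.second);
    }
}
```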

@kunalspathak
Contributor Author

kunalspathak commented Jul 28, 2022

because more detailed filtering happens there that I cannot generalize inside a hash function.

In addition to that, we are essentially scanning through a bunch of AssertionDscs to find the best match rather than extracting a particular entry, which makes me feel that we could have some sort of 1-to-many data structure (again, a hashmap of key to list of values), where we iterate through only, e.g., the entries whose assertionKind == OAK_EQUAL.
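
A hypothetical sketch of that 1-to-many idea: group assertion indices under a coarse key such as the assertion kind, and run the existing detailed filtering only over the matching bucket. Names and types here are illustrative, not the JIT's real data structures.

```cpp
#include <unordered_map>
#include <vector>

// Illustrative coarse key.
enum class AssertionKindSketch
{
    Equal,
    NotEqual,
    Subtype,
    SubRange
};

// 1-to-many structure: coarse key -> indices of candidate assertions.
using AssertionBucketsSketch = std::unordered_map<AssertionKindSketch, std::vector<unsigned>>;

// Record a newly added assertion under its kind.
inline void RecordAssertion(AssertionBucketsSketch& buckets, AssertionKindSketch kind, unsigned index)
{
    buckets[kind].push_back(index);
}

// "Best match" style lookup: walk only the OAK_EQUAL-like bucket and apply the
// detailed, category-specific predicate that cannot be folded into a hash key.
template <typename Predicate>
unsigned FindBestEqualAssertion(const AssertionBucketsSketch& buckets, Predicate matches)
{
    auto it = buckets.find(AssertionKindSketch::Equal);
    if (it != buckets.end())
    {
        for (unsigned index : it->second)
        {
            if (matches(index))
            {
                return index; // first acceptable candidate; real code may rank candidates
            }
        }
    }
    return 0; // 0 used as a "no assertion" sentinel in this sketch
}
```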

Here is the data for the benchmarks collection; on average it takes 4 iterations to find a match, so I am not sure it will be worth doing. On the other hand, if we increase the assertion limit, we might see these average iterations / match increase.

| Type | Count | Iter | Ratio |
| --- | --- | --- | --- |
| PropLclVar | 1277662 | 5621756 | 4.40 |
| PropEqualOrNot | 2102 | 7371 | 3.51 |
| PropEqualZero | 109 | 2028 | 18.61 |
| PropNonNull | 133846 | 338982 | 2.53 |
| PropBndChk | 5 | 18 | 3.60 |

Here are the numbers for libraries.pmi:

| Type | Count | Iter | Ratio |
| --- | --- | --- | --- |
| PropLclVar | 2573831 | 10841450 | 4.21 |
| PropEqualOrNot | 18970 | 69778 | 3.68 |
| PropEqualZero | 411 | 8143 | 19.81 |
| PropNonNull | 804509 | 2387847 | 2.97 |
| PropBndChk | 5 | 39 | 7.80 |

@AndyAyersMS
Member

Correct me if that is not an accurate metric to measure

I'm not sure I fully understand what you are measuring. Is this right?

Seems like good metrics would be

  • number of times we look for an assertion (broken out say by the 5 categories of lookup)
  • number of times we match (which you have: count)
  • number of assertions scanned when we match (which you have: iter)
  • number of times we fail to match
  • number of assertions scanned when we fail to match

The interesting cases to me are the non-AddAssertion cases when we don't find a match -- in those cases we are doing work for nothing and would end up doing even more work for nothing if we made the table bigger.

@kunalspathak
Contributor Author

Correct me if that is not an accurate metric to measure

I'm not sure I fully understand what you are measuring. Is this right?

Seems like good metrics would be

  • number of times we look for an assertion (broken out say by the 5 categories of lookup)
  • number of times we match (which you have: count)

That's right.

  • number of assertions scanned when we match (which you have: iter)

That's right.

  • number of times we fail to match
  • number of assertions scanned when we fail to match

Yes, I realized that I should include these two and will do it today.

The interesting cases to me are the non-AddAssertion cases when we don't find a match -- in those cases we are doing work for nothing and would end up doing even more work for nothing if we made the table bigger.

Agree.

@kunalspathak
Contributor Author

Here is the aggregated information for the benchmarks collection, collected on windows/x64 using kunalspathak@f98c74b and sorted by call count.

  • Category: Call sites at which measurements were collected.
  • Call count: Number of times the method was called.
  • Calls / method: There are 37835 methods in the collection; this column represents the average calls per method.
  • Match count: Number of times we found a match.
  • Matched iterations: Number of iterations performed before we found a match.
  • Iteration / match: Average number of iterations it took to find the matching assertion.
  • Missed count: Number of times we did not find a match.
  • Missed iterations: Number of iterations performed before we realized there is no match.
  • Iteration / missed: Average number of iterations it took before we realized there is no match.

| Category | Call count | Calls / method | Match count | Matched iterations | Iteration / match | Missed count | Missed iterations | Iteration / missed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| optAssertionProp_LclVar | 2059107 | 54.42 | 114601 | 473927 | 4.14 | 1944506 | 8676449 | 4.46 |
| AddAssertion | 1218385 | 32.20 | 304815 | 3238908 | 10.63 | 913570 | 7026107 | 7.69 |
| optGlobalAssertionIsEqualOrNotEqualZero | 220768 | 5.84 | 109 | 2028 | 18.61 | 220659 | 1061317 | 4.81 |
| optAssertionIsNonNullInternal | 199981 | 5.29 | 139803 | 347461 | 2.49 | 60178 | 303178 | 5.04 |
| SubType | 96002 | 2.54 | 1226 | 5262 | 4.29 | 94776 | 424824 | 4.48 |
| optLocalAssertionIsEqualOrNotEqual | 68958 | 1.82 | 2718 | 4476 | 1.65 | 66240 | 149151 | 2.25 |
| optGlobalAssertionIsEqualOrNotEqual | 47196 | 1.25 | 2102 | 7371 | 3.51 | 45094 | 233733 | 5.18 |
| optAssertionProp_BndsChk | 34049 | 0.90 | 5 | 18 | 3.60 | 24860 | 199594 | 8.03 |
| SubRange | 5571 | 0.15 | 66 | 671 | 10.17 | 5505 | 26797 | 4.87 |

Observations:

  • AddAssertion seems to be the 2nd hottest lookup site, and having a hash table should improve its performance.
  • We should optimize the lookups made in optAssertionProp_LclVar, probably by adding another hash table for assertions that have (assertionKind == OAK_EQUAL) && (op1.kind == O1K_LCLVAR) (see the sketch after this list).
  • For the others, given that the calls / method is low and the query criteria differ for each of them, it is better not to touch them. I am, however, thinking of doing smaller optimizations for them, such as keeping a flag for each of those categories and short-circuiting the method if we never created those assertions.
  • Since I have put in a lot of work to add the instrumentation, I will probably open a separate PR to get that in.
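
As a rough illustration of the second observation above (all names are hypothetical), a side table could map a local variable number to the indices of its (assertionKind == OAK_EQUAL) && (op1.kind == O1K_LCLVAR) assertions, so optAssertionProp_LclVar would only scan that local's candidates:

```cpp
#include <unordered_map>
#include <vector>

// Hypothetical side map: local variable number -> indices of assertions of the
// shape (assertionKind == OAK_EQUAL) && (op1.kind == O1K_LCLVAR).
using LclVarAssertionMapSketch = std::unordered_map<unsigned, std::vector<unsigned>>;

// Called when such an assertion is added (illustrative).
inline void RecordLclVarAssertion(LclVarAssertionMapSketch& map, unsigned lclNum, unsigned assertionIndex)
{
    map[lclNum].push_back(assertionIndex);
}

// Propagation over a given local then scans only that local's candidates
// instead of the full assertion table (illustrative).
inline const std::vector<unsigned>* GetLclVarAssertionCandidates(const LclVarAssertionMapSketch& map,
                                                                 unsigned                        lclNum)
{
    auto it = map.find(lclNum);
    return (it == map.end()) ? nullptr : &it->second;
}
```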

@kunalspathak
Contributor Author

I am inclined to merge this as is and I will have follow-up PRs for:

  • Hash table to improve optAssertionProp_LclVar
  • Instrumentation for assertion stats
  • Micro-optimizations for other assertion lookups.

@kunalspathak kunalspathak marked this pull request as ready for review July 28, 2022 23:43
@kunalspathak
Contributor Author

@dotnet/jit-contrib

@kunalspathak
Contributor Author

The spmi asmdiffs and replay errors are related to re-publishing the log files.

@JulieLeeMSFT JulieLeeMSFT added this to the 8.0.0 milestone Aug 1, 2022
@kunalspathak
Contributor Author

@dotnet/jit-contrib

Comment on lines +2027 to +2028
if (optAssertionDscMap->Set(*newAssertion, optAssertionCount + 1, AssertionDscMap::SetKind::SkipIfExist,
&fastAnswer))
Contributor

It is not obvious: why do we need SetKind::SkipIfExist if we have LookupPointer?

Contributor Author

Because you would call GetHashCode() twice: first through LookupPointer and then during Set. With SkipIfExist, you call it just once.
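
To illustrate the difference (using std::unordered_map as a stand-in for the JIT's hash table, so this is only an analogy): a lookup followed by an insert hashes and probes the key twice, while a single insert-if-absent does it once and reports whether the key already existed, which is what SetKind::SkipIfExist provides here.

```cpp
#include <unordered_map>

// Illustrative only: std::unordered_map stands in for the JIT's hash table.

// Lookup-then-insert: the key is hashed and probed twice.
inline bool AddIfAbsentTwoProbes(std::unordered_map<int, unsigned>& map, int key, unsigned value)
{
    if (map.find(key) != map.end()) // hash/probe #1
    {
        return false; // already present
    }
    map.emplace(key, value); // hash/probe #2
    return true;
}

// Insert-if-absent: the key is hashed once and the call reports whether it was
// already there -- the behavior SkipIfExist is providing above.
inline bool AddIfAbsentOneProbe(std::unordered_map<int, unsigned>& map, int key, unsigned value)
{
    auto result = map.try_emplace(key, value); // single hash/probe
    return result.second;                      // true if newly inserted
}
```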

@AndyAyersMS
Member

SPMI doesn't see this as a TP win. Are there reasons to believe this is not accurate?

@kunalspathak
Contributor Author

SPMI doesn't see this as a TP win. Are there reasons to believe this is not accurate?

Not sure what the noise range is, but I agree that for this PR we should expect to see at least some TP improvements. I will think about it.

@SingleAccretion
Contributor

Not sure what the noise range is

The instrumentation is very precise.

@kunalspathak
Contributor Author

Not sure what the noise range is

The instrumentation is very precise.

Thanks for the reminder. In that case, it could be that the hash function itself is expensive, or that the time is consumed in traversing the chain.

@SingleAccretion
Contributor

FWIW, I would also expect some slowdowns from the introduced copying of AssertionDscs, especially in the "few assertions" cases (common during local propagation).

@kunalspathak
Contributor Author

I will close this PR for now and get back to it when I have time.

@ghost ghost locked as resolved and limited conversation to collaborators Oct 26, 2022
