
Use MlasComputeSoftmax in traditional ML ops#3892

Merged
skottmckay merged 4 commits into master from skottmckay/MlasSoftmaxInTraditionalML
May 11, 2020
Conversation

@skottmckay
Contributor

Description:
Use new MlasComputeSoftmax in traditional ML ops where possible.

Motivation and Context
Improve performance.

Performance testing
Tested a range of batch sizes, selected based on a) the total-work values at which the MLAS implementation is selected, and b) roughly 32K, 64K and 128K amounts of work beyond that. The test used LinearRegressor, where Softmax is applied as a post transform, so Softmax is only one part of the work done.

The input to softmax as a post transform in LinearRegressor is n * num_targets (i.e. num_targets is the number of items in each batch). In each group below, the first row is a batch size for which the non-MLAS version is selected. When the batch size is 8 or larger we always pick the MLAS version. Average times are in nanoseconds.
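For context, the post transform computes a row-wise softmax over the n * num_targets scores. The sketch below is a pure-Python illustration of that computation (including the standard subtract-the-max trick for numerical stability); it is not the vectorized MlasComputeSoftmax implementation, just what it computes.

```python
import math

def softmax_rows(data, num_targets):
    """Row-wise softmax over a flat buffer of n * num_targets scores.

    Illustrative sketch only; MlasComputeSoftmax is a vectorized C++
    implementation of the same computation.
    """
    out = []
    for start in range(0, len(data), num_targets):
        row = data[start:start + num_targets]
        m = max(row)  # subtract the row max so exp() cannot overflow
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.extend(e / total for e in exps)
    return out
```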

3 runs each for master and the new code, with 2 seconds of execution time per run. Each per-run value is the average time per execution over that 2-second period. The final column is avg master / avg new across runs, so 2 == 2x faster. Linux results were similar.
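The master/new column is the ratio of averaged per-run times, as described above. A minimal sketch of that calculation (the numbers in the assertion-style example are illustrative, not taken from the table):

```python
def speedup(master_runs, new_runs):
    """avg(master) / avg(new) across runs; 2.0 means the new code is 2x faster."""
    master_avg = sum(master_runs) / len(master_runs)
    new_avg = sum(new_runs) / len(new_runs)
    return master_avg / new_avg
```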

| Input | master run 1 | master run 2 | master run 3 | new run 1 | new run 2 | new run 3 | master/new |
|---|---|---|---|---|---|---|---|
| features_16.targets_2.SOFTMAX,n=23000 | 448982 | 451875.3 | 432642 | 442810 | 446423.4 | 429215.6 | 1.013072 |
| features_16.targets_2.SOFTMAX,n=24000 | 465973.7 | 459945 | 461949.4 | 464034.3 | 460854.7 | 453677.6 | 1.001113 |
| features_16.targets_2.SOFTMAX,n=25000 | 473395.2 | 472742.6 | 459392.5 | 399560.3 | 402024.4 | 389493.4 | 1.180334 |
| features_16.targets_2.SOFTMAX,n=32000 | 576963.4 | 578380.1 | 575228.4 | 500998.6 | 489242.2 | 463194.1 | 1.16673 |
| features_16.targets_2.SOFTMAX,n=64000 | 1071973 | 1072847 | 1019504 | 918359 | 889334.6 | 888642.1 | 1.186496 |
| features_16.targets_3.SOFTMAX,n=10000 | 301081.5 | 306380.7 | 289473.7 | 307732.5 | 298687.8 | 302376.3 | 1.001718 |
| features_16.targets_3.SOFTMAX,n=11000 | 318970 | 332627.3 | 315077.4 | 256501.7 | 257715.9 | 252776.9 | 1.267163 |
| features_16.targets_3.SOFTMAX,n=12000 | 343273.7 | 342019.6 | 335848.5 | 267333.4 | 266701.9 | 260125 | 1.283236 |
| features_16.targets_3.SOFTMAX,n=22000 | 568464.1 | 568528.6 | 563422.3 | 362459.2 | 377903.6 | 345894.5 | 1.535724 |
| features_16.targets_3.SOFTMAX,n=44000 | 1040520 | 1036304 | 1003151 | 642363.4 | 659211.4 | 615101.7 | 1.595625 |
| features_16.targets_4.SOFTMAX,n=3500 | 153982.3 | 150097.5 | 153416 | 153409.8 | 152959.9 | 151115.5 | 0.992526 |
| features_16.targets_4.SOFTMAX,n=4000 | 175244.3 | 171058.3 | 172434.6 | 172652.1 | 173280.3 | 165167.8 | 1.00107 |
| features_16.targets_4.SOFTMAX,n=4500 | 199226.5 | 182921.9 | 193766.8 | 173506.5 | 171111.1 | 167755.9 | 1.108906 |
| features_16.targets_4.SOFTMAX,n=8000 | 315076.2 | 307685.6 | 312659.6 | 256893.7 | 257927.2 | 247279.9 | 1.209667 |
| features_16.targets_4.SOFTMAX,n=16000 | 556979.2 | 553207.1 | 536730.9 | 286389 | 281830.9 | 276319.4 | 1.953797 |
| features_16.targets_4.SOFTMAX,n=32000 | 1020921 | 1020345 | 964487.5 | 493103.1 | 494206 | 471529.6 | 2.067504 |
| features_16.targets_5.SOFTMAX,n=3000 | 160645.4 | 157089 | 157173 | 161582 | 158110.9 | 157579.8 | 0.993874 |
| features_16.targets_5.SOFTMAX,n=3500 | 176031.6 | 178724.4 | 181176.5 | 153923.1 | 154797.9 | 148884.1 | 1.149115 |
| features_16.targets_5.SOFTMAX,n=4000 | 196756.7 | 204055.3 | 201570 | 174770.6 | 171350.1 | 159191.9 | 1.158012 |
| features_16.targets_5.SOFTMAX,n=6500 | 312779.8 | 315171.7 | 310754.6 | 232474.6 | 222440.7 | 216966.8 | 1.38037 |
| features_16.targets_5.SOFTMAX,n=13000 | 552782.6 | 554088 | 527806.1 | 252329.4 | 248745.6 | 243786.9 | 2.208992 |
| features_16.targets_5.SOFTMAX,n=26000 | 1014811 | 999032.3 | 974045.9 | 420113.4 | 416247.4 | 405137.8 | 2.407864 |
| features_16.targets_6.SOFTMAX,n=2500 | 167084.4 | 161764.1 | 159420.2 | 161435.8 | 158809.4 | 152242.5 | 1.026865 |
| features_16.targets_6.SOFTMAX,n=3000 | 177917.9 | 181826.4 | 178418.7 | 144516.4 | 141215.3 | 136841.9 | 1.259028 |
| features_16.targets_6.SOFTMAX,n=3500 | 198159.1 | 224026 | 212525.9 | 152667.7 | 150450.4 | 147249.2 | 1.392808 |
| features_16.targets_6.SOFTMAX,n=5500 | 313479.8 | 315698.8 | 306726.2 | 167460.5 | 163482 | 160465.1 | 1.901172 |
| features_16.targets_6.SOFTMAX,n=11000 | 552945.8 | 553834.7 | 554031.9 | 226696.7 | 223263.1 | 220118.9 | 2.459732 |
| features_16.targets_6.SOFTMAX,n=22000 | 998033.6 | 1011797 | 978502.3 | 371195.5 | 375280.5 | 360230.6 | 2.692425 |
| features_16.targets_7.SOFTMAX,n=2000 | 153247.2 | 151969.8 | 152624.9 | 158968.2 | 157016 | 147119.2 | 0.965925 |
| features_16.targets_7.SOFTMAX,n=2500 | 181857.1 | 180047.3 | 177754.7 | 130593.6 | 129476.5 | 128154.5 | 1.391565 |
| features_16.targets_7.SOFTMAX,n=3000 | 206375.2 | 222446.5 | 222885.3 | 141463.4 | 141513.5 | 139533.2 | 1.515395 |
| features_16.targets_7.SOFTMAX,n=5000 | 327053.1 | 323482.3 | 328910.4 | 161937 | 158788.3 | 156604.9 | 2.028326 |
| features_16.targets_7.SOFTMAX,n=10000 | 572551.3 | 576297.6 | 579633.8 | 214545.9 | 212872.3 | 206796.6 | 2.68788 |
| features_16.targets_7.SOFTMAX,n=20000 | 1054120 | 1049391 | 1021747 | 361782.7 | 354491.1 | 342637.4 | 2.936741 |
| features_16.targets_8.SOFTMAX,n=2000 | 165727.8 | 168602.6 | 162398.7 | 95920.71 | 97612.02 | 93970.05 | 1.727513 |
| features_16.targets_8.SOFTMAX,n=4000 | 310651.2 | 289640.9 | 295280 | 126852.6 | 125706.8 | 122745.5 | 2.376835 |
| features_16.targets_8.SOFTMAX,n=8000 | 528216.2 | 518139.2 | 514104.4 | 158730.3 | 156406.5 | 150919.3 | 3.320322 |
| features_16.targets_8.SOFTMAX,n=16000 | 1016263 | 971832.8 | 947382.7 | 223523 | 226598.8 | 208073.5 | 4.416795 |
| features_16.targets_9.SOFTMAX,n=2000 | 187535.1 | 182628.9 | 186176 | 121355.4 | 119658.1 | 117175.8 | 1.535864 |
| features_16.targets_9.SOFTMAX,n=4000 | 336355.9 | 342210 | 329880.6 | 146983.9 | 152511.5 | 145906.2 | 2.265697 |
| features_16.targets_9.SOFTMAX,n=8000 | 599245.2 | 595339.6 | 586062.1 | 200299.8 | 201305.1 | 194958.1 | 2.974527 |
| features_16.targets_9.SOFTMAX,n=16000 | 1102796 | 1089709 | 1049027 | 324696.4 | 317685 | 328472.9 | 3.413089 |

@skottmckay skottmckay requested a review from a team as a code owner May 11, 2020 04:55
@skottmckay skottmckay merged commit d7e3956 into master May 11, 2020
@skottmckay skottmckay deleted the skottmckay/MlasSoftmaxInTraditionalML branch May 11, 2020 06:29
stevenlix pushed a commit that referenced this pull request May 12, 2020
* Use MlasSoftmax in ML ops

* Refine when mlas is used based on perf testing.