Use MlasComputeSoftmax in traditional ML ops by skottmckay · Pull Request #3892 · microsoft/onnxruntime

skottmckay · 2020-05-11T04:55:37Z

Description:
Use new MlasComputeSoftmax in traditional ML ops where possible.

Motivation and Context
Improve performance.

Performance testing
Tested a range of batch sizes chosen to cover a) the total-work values around which the MLAS implementation is selected and b) roughly 32K, 64K and 128K amounts of work beyond that point. The tests used LinearRegressor, where Softmax is applied as a post transform, so softmax is only one part of the work being measured.

The input to softmax as a post transform in LinearRegressor has n * num targets elements (i.e. num targets is the number of items in each batch). For each group where the batch size is less than 8, the first row uses the non-MLAS version. When the batch size is 8 or larger we always pick the MLAS version. Average times are in nanoseconds.
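To make the input layout concrete, here is a minimal reference sketch of softmax applied row by row to a flat buffer of n * num targets scores. It is plain Python for illustration only, not the MLAS kernel or the onnxruntime code; the function names `softmax_row` and `softmax_post_transform` are hypothetical.

```python
import math


def softmax_row(scores):
    """Numerically stable softmax over one batch (one row of scores)."""
    peak = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_post_transform(flat_scores, n, num_targets):
    """Apply softmax independently to each of the n rows of a flat buffer.

    flat_scores holds n * num_targets values; num_targets is the number of
    items in each batch. Hypothetical helper, not an onnxruntime API.
    """
    if len(flat_scores) != n * num_targets:
        raise ValueError("expected n * num_targets scores")
    out = []
    for row in range(n):
        begin = row * num_targets
        out.extend(softmax_row(flat_scores[begin:begin + num_targets]))
    return out
```

Each output row sums to 1, and the total amount of work grows with n * num targets, which is the quantity the batch sizes in the table were chosen around.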

3 runs each for master and the new code, with 2 seconds of execution time per run. Per-run output is the average time per execution over the 2 second period. The final column is avg master / avg new across runs, so 2 == 2x faster. Linux results were similar.
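The measurement scheme described above can be sketched roughly as follows. This is an assumed reconstruction in Python, not the harness that produced the numbers; `average_ns_per_call` and `speedup` are hypothetical names, and the values in the usage note are made up for illustration.

```python
import time


def average_ns_per_call(fn, duration_s=2.0):
    """Call fn repeatedly for about duration_s seconds.

    Returns the average time per call in nanoseconds. fn is always called
    at least once, so the division below is safe.
    """
    budget_ns = int(duration_s * 1e9)
    start = time.perf_counter_ns()
    calls = 0
    while True:
        fn()
        calls += 1
        elapsed = time.perf_counter_ns() - start
        if elapsed >= budget_ns:
            return elapsed / calls


def speedup(master_runs, new_runs):
    """Ratio of the mean master time to the mean new time (2.0 == 2x faster)."""
    avg_master = sum(master_runs) / len(master_runs)
    avg_new = sum(new_runs) / len(new_runs)
    return avg_master / avg_new
```

For example, with illustrative per-run averages of 200, 220 and 180 ns on master and 100, 110 and 90 ns on the new code, `speedup` returns 2.0.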

Input	master run 1 (ns)	master run 2 (ns)	master run 3 (ns)	new run 1 (ns)	new run 2 (ns)	new run 3 (ns)	master/new
features_16.targets_2.SOFTMAX,n=23000	448982	451875.3	432642	442810	446423.4	429215.6	1.013072
features_16.targets_2.SOFTMAX,n=24000	465973.7	459945	461949.4	464034.3	460854.7	453677.6	1.001113
features_16.targets_2.SOFTMAX,n=25000	473395.2	472742.6	459392.5	399560.3	402024.4	389493.4	1.180334
features_16.targets_2.SOFTMAX,n=32000	576963.4	578380.1	575228.4	500998.6	489242.2	463194.1	1.16673
features_16.targets_2.SOFTMAX,n=64000	1071973	1072847	1019504	918359	889334.6	888642.1	1.186496

features_16.targets_3.SOFTMAX,n=10000	301081.5	306380.7	289473.7	307732.5	298687.8	302376.3	1.001718
features_16.targets_3.SOFTMAX,n=11000	318970	332627.3	315077.4	256501.7	257715.9	252776.9	1.267163
features_16.targets_3.SOFTMAX,n=12000	343273.7	342019.6	335848.5	267333.4	266701.9	260125	1.283236
features_16.targets_3.SOFTMAX,n=22000	568464.1	568528.6	563422.3	362459.2	377903.6	345894.5	1.535724
features_16.targets_3.SOFTMAX,n=44000	1040520	1036304	1003151	642363.4	659211.4	615101.7	1.595625

features_16.targets_4.SOFTMAX,n=3500	153982.3	150097.5	153416	153409.8	152959.9	151115.5	0.992526
features_16.targets_4.SOFTMAX,n=4000	175244.3	171058.3	172434.6	172652.1	173280.3	165167.8	1.00107
features_16.targets_4.SOFTMAX,n=4500	199226.5	182921.9	193766.8	173506.5	171111.1	167755.9	1.108906
features_16.targets_4.SOFTMAX,n=8000	315076.2	307685.6	312659.6	256893.7	257927.2	247279.9	1.209667
features_16.targets_4.SOFTMAX,n=16000	556979.2	553207.1	536730.9	286389	281830.9	276319.4	1.953797
features_16.targets_4.SOFTMAX,n=32000	1020921	1020345	964487.5	493103.1	494206	471529.6	2.067504

features_16.targets_5.SOFTMAX,n=3000	160645.4	157089	157173	161582	158110.9	157579.8	0.993874
features_16.targets_5.SOFTMAX,n=3500	176031.6	178724.4	181176.5	153923.1	154797.9	148884.1	1.149115
features_16.targets_5.SOFTMAX,n=4000	196756.7	204055.3	201570	174770.6	171350.1	159191.9	1.158012
features_16.targets_5.SOFTMAX,n=6500	312779.8	315171.7	310754.6	232474.6	222440.7	216966.8	1.38037
features_16.targets_5.SOFTMAX,n=13000	552782.6	554088	527806.1	252329.4	248745.6	243786.9	2.208992
features_16.targets_5.SOFTMAX,n=26000	1014811	999032.3	974045.9	420113.4	416247.4	405137.8	2.407864

features_16.targets_6.SOFTMAX,n=2500	167084.4	161764.1	159420.2	161435.8	158809.4	152242.5	1.026865
features_16.targets_6.SOFTMAX,n=3000	177917.9	181826.4	178418.7	144516.4	141215.3	136841.9	1.259028
features_16.targets_6.SOFTMAX,n=3500	198159.1	224026	212525.9	152667.7	150450.4	147249.2	1.392808
features_16.targets_6.SOFTMAX,n=5500	313479.8	315698.8	306726.2	167460.5	163482	160465.1	1.901172
features_16.targets_6.SOFTMAX,n=11000	552945.8	553834.7	554031.9	226696.7	223263.1	220118.9	2.459732
features_16.targets_6.SOFTMAX,n=22000	998033.6	1011797	978502.3	371195.5	375280.5	360230.6	2.692425

features_16.targets_7.SOFTMAX,n=2000	153247.2	151969.8	152624.9	158968.2	157016	147119.2	0.965925
features_16.targets_7.SOFTMAX,n=2500	181857.1	180047.3	177754.7	130593.6	129476.5	128154.5	1.391565
features_16.targets_7.SOFTMAX,n=3000	206375.2	222446.5	222885.3	141463.4	141513.5	139533.2	1.515395
features_16.targets_7.SOFTMAX,n=5000	327053.1	323482.3	328910.4	161937	158788.3	156604.9	2.028326
features_16.targets_7.SOFTMAX,n=10000	572551.3	576297.6	579633.8	214545.9	212872.3	206796.6	2.68788
features_16.targets_7.SOFTMAX,n=20000	1054120	1049391	1021747	361782.7	354491.1	342637.4	2.936741

features_16.targets_8.SOFTMAX,n=2000	165727.8	168602.6	162398.7	95920.71	97612.02	93970.05	1.727513
features_16.targets_8.SOFTMAX,n=4000	310651.2	289640.9	295280	126852.6	125706.8	122745.5	2.376835
features_16.targets_8.SOFTMAX,n=8000	528216.2	518139.2	514104.4	158730.3	156406.5	150919.3	3.320322
features_16.targets_8.SOFTMAX,n=16000	1016263	971832.8	947382.7	223523	226598.8	208073.5	4.416795

features_16.targets_9.SOFTMAX,n=2000	187535.1	182628.9	186176	121355.4	119658.1	117175.8	1.535864
features_16.targets_9.SOFTMAX,n=4000	336355.9	342210	329880.6	146983.9	152511.5	145906.2	2.265697
features_16.targets_9.SOFTMAX,n=8000	599245.2	595339.6	586062.1	200299.8	201305.1	194958.1	2.974527
features_16.targets_9.SOFTMAX,n=16000	1102796	1089709	1049027	324696.4	317685	328472.9	3.413089

* Use MlasSoftmax in ML ops
* Refine when mlas is used based on perf testing.

skottmckay added 4 commits May 8, 2020 08:56

Use MlasSoftmax in ML ops

2b1144b

Merge remote-tracking branch 'origin/master' into skottmckay/MlasSoftmaxInTraditionalML

4b896cb

Refine when mlas is used based on perf testing.

1aa15b4

Merge remote-tracking branch 'origin/master' into skottmckay/MlasSoftmaxInTraditionalML

5e738c3

skottmckay requested a review from a team as a code owner May 11, 2020 04:55

pranavsharma approved these changes May 11, 2020

skottmckay merged commit d7e3956 into master May 11, 2020

skottmckay deleted the skottmckay/MlasSoftmaxInTraditionalML branch May 11, 2020 06:29

tracysh mentioned this pull request May 11, 2020

MLAS: tune softmax kernels for partial vectors #3906

Merged

stevenlix pushed a commit that referenced this pull request May 12, 2020

Use MlasComputeSoftmax in traditional ML ops (#3892)

77a3f92

* Use MlasSoftmax in ML ops
* Refine when mlas is used based on perf testing.

stevenlix mentioned this pull request May 12, 2020

Cherry pick PRs to release branch rel-1.3.0 #3911

Closed

skottmckay merged 4 commits into master from skottmckay/MlasSoftmaxInTraditionalML