From 98965387f31c50a90b01686821d69bd1c3423e8a Mon Sep 17 00:00:00 2001 From: feifei14119 Date: Fri, 12 Dec 2025 12:47:26 +0800 Subject: [PATCH 1/7] add fmoe co with tilesize 32x128 --- ...6_blockscaleFp8_g1u1_novs_gelu_1tg_32x128.co | Bin 0 -> 29280 bytes ...f16_blockscaleFp8_g1u1_vs_gelu_1tg_32x128.co | Bin 0 -> 29320 bytes ...6_blockscaleFp8_g1u1_novs_silu_1tg_32x128.co | Bin 0 -> 28768 bytes ...f16_blockscaleFp8_g1u1_vs_silu_1tg_32x128.co | Bin 0 -> 28808 bytes 4 files changed, 0 insertions(+), 0 deletions(-) create mode 100644 hsa/gfx942/fmoe/gelu/fmoe_bf16_blockscaleFp8_g1u1_novs_gelu_1tg_32x128.co create mode 100644 hsa/gfx942/fmoe/gelu/fmoe_bf16_blockscaleFp8_g1u1_vs_gelu_1tg_32x128.co create mode 100644 hsa/gfx942/fmoe/silu/fmoe_bf16_blockscaleFp8_g1u1_novs_silu_1tg_32x128.co create mode 100644 hsa/gfx942/fmoe/silu/fmoe_bf16_blockscaleFp8_g1u1_vs_silu_1tg_32x128.co diff --git a/hsa/gfx942/fmoe/gelu/fmoe_bf16_blockscaleFp8_g1u1_novs_gelu_1tg_32x128.co b/hsa/gfx942/fmoe/gelu/fmoe_bf16_blockscaleFp8_g1u1_novs_gelu_1tg_32x128.co new file mode 100644 index 0000000000000000000000000000000000000000..4f0525b335e7274ee99cbc391e25ad68e3e4ccc0 GIT binary patch literal 29280 zcmeHQ4R}-4`9Dq4v`zX2w2;=OMvM#?#FDmHyTHn)=nx7mSbUfvG`CHqwEZ_laex2y zHvJF~5D^6(ZVqKmH#hg;hTAAXK|w*Oh=UC#DkwPcW6BgK=Kp)oxwpNEag~PcanIBE zgx`JN^SDaQmQ%JI=UaWFlebVV!?|E98J zW`vr2OGNV*FsKAmF;TH2wR%oP6$T|%A*#sIg-m`(fw5d6GeXb0OB5+@pP^+kR zGXd(~qs<1KaP8!)O6M>Z__=4X*XQ>*7DiKo`> zsjQUz{>K_6kJo*3V@;EU7O%GvI=8>UQz`wpy0(6yr*^ZW&a+th(ac92^;K1VsflfN z_-no=CH~ZLlc#oxovooC{7nT!^vufB~eb+YlhRJI9=cjH&x71Q!I^gwq8@Lh~jjiQ@pA>#RXB$)@zEs zC{7nT#mU_%HbgmFuPH8#;&g#ie5mZ&?i}~XQO?(El6#g#@w(tymY1sYanG74=j%Po z4N<%KeJ;|R$@w&iCK2$cPXDFYDa=zZP+!Dp>OS4q(qu!du-GdVp z0SxLrlkUE~2GHDy_w0x@z^+&W?CC>*hke>X*w=>$+SSOOS7HrtFxCKv`w(FBLQhkr zuX`#S?L&yLQ8*TBfD^F>IMs&$4@(V=P2Gd=ejg%)4Z=sU2KXe_0H5_Cz@`enIs|*; zSRVpt$6#-qA?65)F-J)5ON6WDXvQJEFA+4gac^8k%n>XxN66_zgiVt*!(i(}fbi*H zk2%8dm?MnpLxfE;HN%kChXCQjP!My3qL?GN`Viq^DeST+=|hO{ftVa~glREHnBJEN 
zlJ>eN?MsBP@=zXggo>CW%%k9ART0B0Snu--O4)P1Txd@n9d~gpbC>$C&E*X>c;vd4Q{_wy;!Pia#z(XmAr0x{Qn~DG5wbBFJNP#+rU#> zQ(b3x@(X%$*&7$!;#lli>aMEwG`VlCZ(O7V=WR@E8XOwuzJ5lb2fGoC1y@!ru9w^k ztHz9VFRZPvT;#9x)Jjtuirm#>mW*-N)!*cIS4*`^++&)m-TBU?W1K}({@`e6thve4 zBz0%hlQR2_qfz=F>~(~l`EGoFii`WH!(ZLd=&r0^QrFb*h{J!&;)V6Khe~3~#j(hX zr!?JR_v+X$95;JzlByc(F_bhSCF(ZKO;V%3roL{ix`!lodcyyuwt;%=_Ne=1O7pjY z189d!-EUL$WUs!naZKAuI=j%-E=#`sn-|YF>Y<}I*{R!2IP7`FGyp6nZwy~g9{=)) zve&!@XpSQeX{>J3C$lzmTHWW1F!`?&<3e#Ap%Bv68G}6ecw>-neHx8H8x(S~F_=dU zdSkGJFu@osB}_C17Z4hZLB98yWDLqAn~lLWgeGI~L4XYT36j%w!62XnxDO};j{<|h z_knDHE=Uup1egw#fi_?emoGdE+_+6Fr96M&R`1}48ubB zoI*OCK*l(BFr#f7j&&^1sb{U72G-W8k56sBEx2-9ekh-h#eK=h1+USe(OkYfbUDYQ zP!h+9lU-{kwvAisV)u6LWB22|8t=#O{sG>b@cuF0Ydb@1eJ5QRA@+R*?^N)o3jR>R zhZKB7!MizThBEp2vO-xLQ$i^mM}$Ujw1%u4F9}`3@v_im90!C3aD@Nuv?~mMj_~IQ ze~$3y2!D?7=Lmm}-!3_{UuMiA>ps7Y#z+64Ypu&^*!J|mwrw)o*IB~4I`wVPIUKTl z`Q?G@;>5U-U7O3b_D<#<7VvuQF0a=o>#CZWTzc5aMkNFxE60gso8WTEWbYa^N|r~D z4&0j|;=Bq>$LBCO6LSGL7nldU7gz#(6<7)!JQ#BUI2YIeq+A2cYo8sOjd2LE+4#I{ ztf0$z3u9%WGOjCw?h-{;9xCU$a_ELCx;ddaTsH^0OBLPR&|I#Y3!PojT^G8J>#l?D zGDTMrs^Gc`=!Pk}d7*h+HxIgRC_0|wRj#$AiY)6kHJ3r2-7WvPxg2tNxBOW19LRIK z<)1gtgq$k%tvziqC7JilB1ee*oXH+0LL%?Tll4T_oUa>^lRzAW1rYhzwp zz*}V(pP$dx;q%!Gs;hjh_&h^iUveUfJ;{M8gIx!I8Ds96;#wO&ZQR;%_$*x(i!5hl z1rpZaGljm5mD*jUrM@7JU8S>q7IwHZiyi6AW`FIxh`rW1Wdii`=gabUzZ;-IM><~jGY@qI!+#Ev&r%$mjt*i;_~)3*IJ_z--!6eDDjPK zIlT^P5hc-k1<# zPx5|;*l%HPz&QL4s569EN9VU5+w$!Y+uNBubnR9r`>Yc+v3bnIJ=;RM8=5g zr*F}(PG}j|F|Nh9O>S|uJJHt!^d%pdNPVvM`keYyKHsQ6Z3$cDaiMX1?zzTImjldb zXX&L!%|jAp!lWz{M^iT8tfMv-kNJ74Oui&vqRJQtl834?YCg%AsxsyR$#zvH@(|?9 zRGG*&kcX*qf{?!f**P8OLt6q%kmdYa*x4(_dpl$kHGQ3&St7N zo4kq1;@YfA&QD78QC(VwkHVFh=nEt!+FQ7dDY=DJ4a+y1dUVAK}2wexBcmg7q@V z8ISsT`{2j@sh{S0O+!lKLQZ{Evt07T{xy;=Hvn8KnR1r_H%LQsR{-yqM&^EM>#=Qv zZk;qRw*z>ul%2a0xKVQEt_J=GOuS2c4yJoBo^;&}UoVqB`93kusSixz3Lv1#0IFwHBAGOtQ7CrTxqbs})KWU@{JR!Bpw*8s1VMp|iJ zskW1$E0+dZrvm3n+1Ba6`I6H*19*c}Y^CuTj(I>h@*a$nQqM^qb)OhN#{-j6*ndRhP=xq|#rI%5XrAHzK(gyTF;3$jfPMsz 
z{|WWK2=NJvpTzjke8YW&WY<$-e8z9Z_z3&YXx>?rdDrW>VBV>=y^RZ>cc^1D@2-@u zY}d>BcCD^hE^d#!SE~2A>y>pLi!1$I=`6#xIdKv7bh}bdcT!!%^{K6=UvmFngZ>+K zEA@0Q`Tgtlz&hBo*OhvD9Cd*1-`aXg=h48QmLn>;iuM9YS+}$$aOgfkcJw1c*(ALwlNtDc$QAWgSBGAP?sybnZkwP}Z3w7wlH*>E6l| zb-kmmTc|GN;e3S7;}P}rq*6~$`%~2Qk-E;Id8O9KUh4+tm1^5t-Jo$Aj(I>h@&Lw3 zTTc%u_4EkzVm-Ex#%Va_1L4Rs5%u(}QcureyoCKLG#*8WPgr~aXgz^B zHMYA%@IYc8B{V>6xin}$}|$>O@4$K&7^*Od<}02TlX zfrUWl6!=Yn-xT;wxv^Q6n7O4|XAaaD%&m)zX1N7&E97>_w;RnAcT{4l+?FJn+RfFb zj+7eH?P*`3Kd{e-eF5wXx&8U~XLNF_JVSY|alZX(lM>DC)h2UCO|tp+MXBbt<&f`y zybAK2spjs_D!o-+Hb62hA5?8xky&H9W5`$N8|?F8UjX|;Zh!uL`)0=Ze^$$+bo27+ zf#wx88Rk0{4K^>k3-aBN?}5C2u=uPjt@6q&$+YUCYSU_Kjp@#dze0atpAY*2*cWp9 z^Y72-ob&I`O37kgRh?~KU6W(Jb5X8&<$aL95BUMeKg>0M9iIEqJ<2%5Acu5C&0DYN zC=BU}Ip*m1cgcFcfiRHZH&}sf=9@Woq}|N*%La+|6+=Y(%8Nw%>WgnS;2EJklz?Y( zv_7QgyjS0U(3vBi$&=b%MG%?MmB@XO|rn7*i0@jl?8r}m2&x3Ss;JjPkmglvyu1B^W06mZ{EE^mR=IF z?|_hN{v_nuBSK#Mnvffg3;BQE7IM>RAuoAP$V<-%`Ie7`eCs(O|6d65l%i(ql;Y-) z&Wq=74U?Q#7MYyiDmLZB#hG*TdUH;Eyg4T!!7Q^)FUxv8lS?+e97xbJYf_TgYBrm# zDJf=aTAH~9?QLc~Ye9QQnx5qh7+}sBG{~HjnQ6`$G6eH&)62^S=~)2nD~7;7E6Z%X z=pwV#YBgIgzS!J~_LUc5Hw5jgFNXgV7xEh4+>U21?RXCSLwsg!UZ0Jg`O@cy=c0Be zfBvg{)_BI@RGulbS=k4jG9KK_1GWI?00Y490$Y3L0X_5SP@XYs|`fQ?BVr`ZU6UDN4?HBIoq1 zL1~opKPY&oucGgByzQ6m8DUx*;bKj zPEsY$IX6Yg)zvCfZaG>0G{Rh~KkZXj&v%Pp$uIXiTlAKv3ETCSU4(b(Ezc9)qqlI| zS>jp9KGOY2Z+V$;v)*!$@Rxeap9!CY?f9O!J4m-&Z+Vt*mEN+4@IJleMZyOZTP5x* zqSz2z|BQ+mr^3AgGk+VxhET+}{8p3y#2p4lEbce3=}<}5s`>a9<`^&=Z` zvJt1ZKJ=D%9-Fo=*rLZ4eyA&0Y1E;9b~!vxGtzHw75eSa=PUXkJKdEpz0BL6ZWrxO zNBBK+yU;(Q@!PKP+p%1dO^R^#{L1ET%K z2)~zpDD*FB{GQYJ?fa2v-xuN6^^nkaY5aC+{9fKH+Fy?F+y98r@7MV4*7&{hsAzvB z!tcOOh5mrX?|F^itB;HJS0nuX^b4W?lg94_jo-mviuQvMeusW7^oKNldo+H3ep0mm zIl}KRzZ3euX#DnS{0=`Q+7Czg9r=UMAJO=|sPX&jR?+^~2*2013H@stzn2t0=DGi% zA&7ffEAVOTnbrVz0Pg@k3w1+Gt8GHQ#wO(Nln8lxiIA^dAmkYfggle>$1~zS z?_~0{Bc&yY%#LF&w?LjCr~?XxEd*`l=GxmO7} zM`^E^rjgNJakY?hmGs})q#eL0&doJeU`ObPf0Ns3zcYSD?XV|w 
zWPZrk2$XAHQfnHaGeyZ6-vKGtysW#^2%Xlu(E~Q{ZxeC>Awj5X^r1o8ozgsi1v3Q{N8&_=-<=$y{++k|8>#+euUo#Zwmbf z8ov`7zrP{nW1;`2#_zPo@3XU_{j&(abN>|j za~i*QHGZF;6YZZz_=Q43AJX`}r}*(T!ck?7@P@KRIHs%--cr^GCzLh9JIWg2l(I&6 zw||YmaF71Ft`X?j`ssfL!2OWe4=qdw0oAs3E44^&x$pIlfaYN>lpK?Bx145h-(i0aXk^@3|;)eXY zxggp-OX7sk_z)LF<$#b(oDk|iaY6X}0{4$(;)DNjygw0oAs2_Zdk zK~xS1$;1hv{t_32&o_kwLYxrN6Bk6~fRId_kpEUL$QSA<)g)~_C0QppbvkfH)}X({ z1?jG*)UK_kB!g2&c1CbT)&YqN@`ZXzcG`MMvRY5U86iIdxFFs2l-f1*lsF({V^nw| zDu*M1xO)bE_H5f>z^o`MrX*FSMV)OyMyI3UzM3>=YiAaOy$>M1xO zq_=|$QX>3zs9aRi4+A%Zu7Bc!gw<1ULP$?s5Lx)`QaK=`9}8{>)dAvy@Oh+g>WC9U z`T}r4g2Hc)$^jw$Sa3uB-CU5cdJ0Ymjn61>K~xS1$qsNrsQ<(T39F~zgphs|xF9MA zgk%ReA=Gc;f`rvma6(A$0vAN(fRH>MoDk|SaY4fBDL5gdcYzC{azIEP4^Buw7o?vH z;_T;w{A=D$KNqCAp9|8@1?lI4e7(6Kw3ZqP4oDF9cH)HW01_93)^NlDA(=QK#IGPO z2-y<{gk<7`>>)ku3AJ1hvLg=2KIn-P@(M5-7liDz9FT*sBW}oH;Fobhc+TK}P|m~& zp&X-eL3pmHL}9Q(N-J^Yq_E{Ncy z^m9R`?d#`)h<%cNE{L+v)6WIrTocX>`5N!@+_!`)DUl$Vb3vTo8WW)%<@0pTLf|A)f)i zj0?hZ1_wmtgiwyrxF9@Nd{?7fi4#J(M&p9;T)_dMT!|AxxklrH@La(Gp&W@5LOJ$x zL3;Qt{ag^SM(F2)OxxGb1rck6elCczM(F2){8y|Inm<;#AoTy0v~0uHAb;(hRZ@if#CtO}6RNC)=EP z4*X{uetB(uUA5HcZtyhu=r1q1{q^o9AAT{hv0<^_UFoTF`X^x~8rg z?Gk>ha^aF{cTHVYy}PQu(e0_Nb=Ujd4UP3xHMNrea(`px<+ZgpEglWcjZ$S37i@W6 zuP5K*ttuRQ#n`b@p2t&}=grF>b47lowsILZPD}6` zLE2X`7oeY~)iO3j(HC41t{hSZC%u!vp#NqS!eN6e6^i7S-(OB>G zGvRcnVsn$Iv zzc5cZ%$qQ|`_Nf1S=AOy9jhLWoj5@~oG_`|VDc4Ls@kdflVkhU#@M{+Hy_=T=gpXK z?bYIA4F5>_B+>k&ZGBJ-#7jL;Q2l#qYO{LXsxtXfEl}Om`kw}7RZru}K^;gs^3kMWi^;heOiZ;biZCCSG@nWz9hM;;(|Z zo>hslZ{U!0s=xX>qbip4nDV1Y`EqEQw;IH&hvYUn!h@KZh&6RUE|J{Ef$j6f3;0LuPXa+5Y7J+ N#s78HP{Bz5{|98vYgqsQ literal 0 HcmV?d00001 diff --git a/hsa/gfx942/fmoe/gelu/fmoe_bf16_blockscaleFp8_g1u1_vs_gelu_1tg_32x128.co b/hsa/gfx942/fmoe/gelu/fmoe_bf16_blockscaleFp8_g1u1_vs_gelu_1tg_32x128.co new file mode 100644 index 0000000000000000000000000000000000000000..a6717e27c9c4f3184ff1155d4ee5c3c9f5593b37 GIT binary patch literal 29320 
zcmeHw4S18)+5c&hrft$MpoJ9M8nLp%AeOYn+67kRLv#p*7A#)O5Sp~a_oA{~zYX9aOZE{nnZ+bp;%xA-o&Ud~qIu}o!5~f!ec4Y{) zifT9GpneK%R^X)Trd(Y*mofjZT}#|vpUbw`(^%)J_1(2bk z)jD03l^&n(u||)}?Yz0MrpbdAx4RKKr?0_P>G?%B$JSMy^}{I6^`xoVesoQrGf+>Z-W=c1)mRi4In;qp*}%dI-aM{xS(qFbCigc_;o zmIlv5HTm}N6d&a&*4NeE@-H*2P*0W`;beVgXo=u-p)g42c0aEh8?NraR2nc?&ZP8T@C%@woN6iXwVt>6tC_{abbkB z^_ikKg42agaY|2$4H3@PXNt=rI9=crA1b@9C&w-&!uk44vTJz+uM3`Kd8s-dyVgcH zU*B17jNo;Fv)o*EeNU1PMmSyHN&Yf|*9A`Up|YvHL-|C6^YxwO(-FMBHcRzB>aJPR zGdKYez@XkU>F(QS0L_iKYkRZ-c19bZs~-U#_G$-VPd_4PS0i2fq785$+5m_85n#(A zS5u|8XDS@&M~IM7cqiHb$D$2zq8|Ys_B1p$^$fzt{fH1U2%kk8;EQMjoask^%@sa% z2zJM?egx2t!R{DC)DhyNj*!@&2v^V5j6-UFB4}#k?wGWwBUqx2klBw2o2O`o!P<`i zq0=EJ>Ik_}M;OzO2%BeVh9R#X0YZnNAnFK3QAcp}Bf`U;kjtW^A0a{qVoKBzrbiuN zMt>rBwAV#xeKGI>N@NBW&tNgh!j|oA6k;sai8F9_&Y)(9w7}>IlD#I>LYUBf=vMKBwDLSKm7` zp6ExE(6RV^)DfPJI>LXSPlRV|)s6K_8=O8*^%75Ale4O3naAy<$N#U;9@A5Ne*qf{ zJqE7Yn(8{kljrEkWp_;AY1jm+&KqVHy09D3Sa3zv zl6sGGQPsHd&PBELm5Y6qu3FEuh9YP6xTWKqH~E~^p4z3(aZT0EeEYI-_M)kOvNbf; z+~jKV^yJfo9~t-i|T6+PK+WG z+hRAK(DVe|tz*Bj-R!!_Q`K0HVWiP0QMYJr@-+Ht>g(32`$uAz=iGm(eXt(8J?eg$ zG?@S#L_1vSew);jz4{(bx_E`LfY5#T`D0G=2RClhLrrgWtLdB^q8I`eg)fE=S$)%2 zSk{BS-tYTpx=o+R+R$lrpYQ9f|2Vv|uxeX=FrSabeTm2gud%_gT)sSbImd)x z0>`VSIM!X&HesEE-P^f`-H-PgydT5+r+9D1`xkhx>kP6Dopf~s*-s?iDe+ele3(qf|qc-EO;5mLBT;B;eR{r3d5fx{5is( zBm6nSpCkM^!k^>!OAhW;7_%t4uWqC9(Ld-|=dc^Ly)w9Mo5J>Vmay(leH(N(o1$EP zx&QhYF>Ykn=5VaLletIw-R>NR+wE0!Rn1H(J!EHN;sTK6aXi_^IUEYvJI0Jrl(A#| z_oj(BuLD!@ISft5Tma4k<^k^omH=M|mI8+k#asZ+12zCDS3h&-%n8oHI0V@od|p;o z&~3kkv9e$p*OftciPV(`%ek%`x)D-0H#nE;=0bO=)XfXd0*Hu6_QtIXh=X2eB=)NO$JjbgY>q@1p=r%W(L7vki|Dw4Za(R#ZSo2)S^Lpf8 zH_wAy(Ifv?^YxH#=#iglu7JFtNB(2;e8@NU$eqn`L5vNGnc}c3L3_TUNUTcTfn^RC-$21;5s(5A>B+L!_3UsZT^3Z*-4IzVJ&XAwax8LG<$}6 zCz0(0$GRkpof-3JIqHV41vx+$doATUL78B+D#|67__;0O^7b~zI-`tlM0{g}_(rzr zHKRwzI3jlJ4-hvsRZ#{F@(0;7ouXfV>?AIWf5NPFLH4IkloeLY{(>6w`%azS7#Czu z@_q-|Q?NH+9R2{*8G@{%^Lvjy{rw=@-I+LI-Bvq0(}|kcJnpKlZ9(0Q%^0y@P`8lq 
zwxDiNbMQ8Nm~7RyU~=+Ex^{x8sT}>u$vOVi)SS2$$DLRC^=lk@$GS{MrlMqJ`SqxK z`WF3~xRwbW6IzVhlom(39es^MU-E(R)aPoq*RD_I^Nsq`7PnQI5S+m0o@2rc#m|g6 zEVcBAc^DQNz=RAFN7Hb^*+;A_7W4B~g?vfAM3pfPB#%&K)O?aJRb|Wtl5o%fkT1!0xiIez~R9AftkQfz--`8fmYyyz!AXBz#QPiz>&aT z0CR!A1dawi1{?$YPoNF>YhWJm37{SLU%&$3?}6iiPXUX7PXmjAe*`*!{|%f7>};lb zv)LV=D6Y+_#QcPKFV&^x_$VCl@m_y?d`=6uF(tOJs*(9-lQ+(fYs6&o`c0;sWhtw0 zOJ9liR|j=i5|jwst_a*25ZJ{GynRoJzx2>lJ{>^-UhBuTTz)z`k|{UlQ)qa@*^C)#mDm-Rj@%J zIqgv&Zy)-&FZs)Cw`o{uOwg{cYF0{~*t^!F%k~4;c}&^Mfg3#|vR4A{_l(Z|(%Ng= z2Hkqk;Oq|Iy`JIOtALw4_UtvlpL&Y3X&mG45}$+V9*ielcexuB(kI?0#yRgvH{y*d68+A+PxL$c0nu*@;!!_` zKZ$-L-YAFYcjhyq-`QKy?-A(BoNw6goL<|$`W-BF@cHGKZQ5o~4DAMa|7nodLn-pI z;`1XM(LOmztyd)HteBicIC3@VMcq|$`1NswP37}_NICiAb0r+Hr7TIUb0p_HT9!mO z^6|2yfbe^VP2+xP70~D5yoC1lN`79dizFAUs7xXpzZ&|>`1y{ptGQqL0`kN82<=;Z z{CreLNiKNQmqa-JabHr2@H@e#bHAZ9uPkz2m0(VkdURP=0q1y3S<`_Po)KBs0&nn) z&Z2px+D?J4+%q_98gQOxc-9Qy0*^gwCh$g2aTbkFF6IH@=zB0uvYwMX<~}igwg)CB zDY?8KAJcf`Vm=U#eiGv)>psb2o)P0^+luiL_Mg!>6d^ug@jVz1nrFB_knFfmjMKyi zpdZEKe?k2(LVUvFCow)W-*6ux+3}1RpNU&BKEnPC%{z;ncYTfv=ACNW*SPR`hdM^{ z?n>p#cDc*3;eO_pjFj>tWB{l=bu|>Hyupwe^(FqphbTtM&A_tfwd8r`A(y*VNPSbwfx! 
zMgKP9{DXI(uF!fxt*5(WJ>4zqDe~Jyc1KZHkY7(dJud6%Nz|2*y*0AWdYap7+gDwo z@r>`5^>jbRljglzPY=p^dRW%ePh>qkh4IkV)6=q^o|W|!@u=U!_oLr5&((T*P}bAK zvYsLy^?Ud!^jlj`Ps@6G7X2RGTO<4I_n2PWzWV)*>M1@yy8cJ(Ks}J_Kaz8H$$Gk* z^l}|5*CXn>g6e=3pDW>rqha;*xU8oq%aVH5FLE71^&k)DCA9BAJ&@~6k_&dpdb+zZ zNnP)#>lUgDc{m@T{b*P{Jud6%Nnet>K2p~?G_TYe*=OCryi#rZsv9&;xtIrpqxWN+ zwDt6$tfz;e7wfS-G)}pg4}_ynh1Jv3vYwvBcnSM`G#*8WPguMkI zO--ix&c)g0Rrf*u3FHSL|18`5ZFn9?_b%fwgA&vgHE+G5qcEr|=9sD9+pXw*210+F z&yWRdGvCayBjskUUp_>%uN)@YS6w99*IazF0nZL|f^m38N9#s<{(H^+2ke>RSuK5+ zrtki(cvj4=IdQZ=3kZ(OJJ|XC)<>vrsbrh zJ5uy4bI>4j=8z%g%=C0~=CEOyZ<}9RK19#_XkR%D{uvqOtcxx(XJuuXvo5~a+=}*9 z7h#tK?Q1TE|5OL^n%LZqXEE(~KKwI$W^Hb-m7WFD=ZEK|Id=YBSbo-c_FiXNmF$OH zUZn_JQzYx^EcslO_Alr;C$Ec`e*vrkSd898sEgI7@cWvzRoPDZL3$IRE`12}xIQcf zWhZ0<>=CmqZq=!ABF+D$J4=^5fgr?ZZ+-^*6tFn*uup>03m!%v4Cc=&| zIkuA9jUCf_KR86~uqRAT_oXbwy_a%LP0*(h4o;Fe=ZTzCGlry4&V#dtq)?7&372vl znj~{A5V@vhFE8=5h(fXUVDq)@*H8i-m;VMF1_U?!h7@>!e{iB zzY%WLTeR!2BBiK(rZTgAmNKh7eC}lE-OU+zhSgV}`s&AU#2Jn_ef6QQyz|)fJ%JWI zw)TVFfl8we^|RaNdXAC)*;b){7W#at53rNn`JUH!`*ZE0{kbr|=WiGK=QV!MYW%jZ z5bfK;{C3_lfqdFu&bD5&GR4zZW!quRI{yUkUSj^=Cr=s>bg{jo+T1 zi}pQXe%%iVeYeJMr^fHKEu#IkFu%Qz2>o7--!6^czDGs-zA(T2zY_ZW8o!q`ey=|+ z+FuXz`}1#v{?8h}mo<~|4W$PU;iNVf7STy*7zNI zMzkLa^E><}p+Btgdqv~-x2>Z6Z()9KfbBxp*BctYSEV0w-G9Ikz`d*$_#F09Yk=E< zcK~02z7e<+xE}Zt^gjl60XG3(>Am+cSM32jb}C_w94A?Ye3ezmlS_m=r9{Y67YcdW zLLpzxTDjkJww~iPRv}+&74i>Cggm1}$k#0t^2~)oo<;lSX-eugrc6IvS`yC`9CNt^ z@+2v@i1s;GX=Jp|nJi_$XfL0lkDqeM2uE;V zv*I32I3l){-=pcf?>nT&y_s-C`rXirHS{saIj|=j8T$~or|+yMNDq6$k?Frn`52fB zJHpYi&vQHa4*D7CVMjPReQ(Maz%j5Rw8g&7?X=$+&rmz;32o`0@-+hGnwQX;LTFEt zIpaGZ<(ijqcM73B>+TfFu^{0gj^mSLuK3PIITmF6E`@M>mdLdz;d!2Oagxjx-@z!y zqKv&MgvD7RS4YCzJm-l?GFN2yu|}}Sd-=ztdsA=uJK<5iu8|bjS6Ue-NJ3Th0)k)?2LSKF8D{op09C=pMup?o$?ak+f z{!LA7JEEzHZ|xB6Z-x22{i4vnt?_$P5e-`F<>I0!arSW@TP<99;i_vI073LQV z3Vl%H_mTADYlI_mjqsLSBfKNm2=B@@!ZEo!xFKXm9Fp}w;)tBHmq~uw zy-bqjUM6u#$d5Q8J$sqduGz~ZZV1_^{1MKDqWw;+akLx}>YuuoNnDYQ@QK6`(e7mu 
zS7Z~}{U35fw0oJv6`}Eq#1YZ%WfE6}#vl?$M7x(sToLMbB#wx7FO#?;)bDQ-M}*H$ zaM?vr1aYXpMGju2<6IX5c1CbW z)&q$na!x%ZJ8eBBS*@qwl8~PP9Fd-SO6{6@O56~#F-rc3$}J74r_bWN+IkAE$VT|2 zMc{~p)KhRpHj&-`Ax9*no`NevM6J) z)bGr15l1AXo`Nev*FSMY)OyOoxFOU&65Nq;AaO)O>M6J)q|X6Iq(t~_S2?Pr9|_I~ zUH`-p38|;xijbZ-B8u?asd7U|KOUSBssqFk;qypx?T9Nv`T}r70>ZCL<%W=cJUAo& z+Z>UQdJ3)xjn5cxL{x4F$u@9BsQ<(f38|;xijaN`I3g-Hgk&4IBGhl0pXAv@xR?17%RBKv@mI3i@H<%S%99dSku0l$tT!gB^UgmNaX2;~@wBf@h9H-vH} zt_bBAi6g>u1vi9pC9VkN8i^yqa|JhqawM(@E*o)a6|-GWq>0xea`?#MC_jo za75((&j3e6a8QUd@-5!~X*qo^M}*&R!3`na1#v})XF(hhzD5AQnq=aN5buIGBK*D! zZV1W56(PPwB#sEb?}8ihG1`eM@)6E z>3aq^B4Uj&z!8yagaMAozr`A1+2<-pgnm1uWgE5(`D@Qb0?kskNTe-BdM&{JBSkNh z0CuayQ(p!7U*xjLc5GepmqW@@?133Ib@*#e#a8@{C+mzEQ>^wp8~(HAUS3;YSM6zZ zHn^I+^p~2PzItbq7r(IB*s#RstaQ~meM=h}>KmJ2QMuSvT~k+$b`O5sa?#RiXH8vI zy|b#m(dnwKb=Lcw4UP3xHMJh!<-W$s%WG?IS~3=zA9*U9xM0n5yIuJ%cU9r|E5?ua z`#Yw|2>;?}Mu zHI-0%Y8&hEk-FXZK$)$wehK|TC!-&)Y;r9Edf_dO1hQJ!B2TSzTz=tXd6+k8%6RqA zUNA*HEST1#9X~lw)lQm{uO3dhVw!q5t>DUNe&;b7U;2$m=al&~CtY`q_y|Kkj`Nmi zUeY!{C;8wwOy?vDq5wX+OFoW;>BnuI)&P<)-4s)x+TN#KB&ip-@%*urTVMC zN2+28+O)3{t$^dp@g{$@uJ-)SX|^!z`Fnsu6vpZ?un=$Zr9RM0y*|~y-$%bINH4W6 ztK)~CWaF=i_?L95|1o)eACuR=MI9(%eop_dgP!uwlkF;~{=ZQb63LccD$c`k&)D%6 z@z2IufmUgV|COHKOVs?;`Ew)mYVN{A{QLA0folKNHub!!?8QMO|J*uZaac8!81DZ+ Dfs0Z} literal 0 HcmV?d00001 diff --git a/hsa/gfx942/fmoe/silu/fmoe_bf16_blockscaleFp8_g1u1_novs_silu_1tg_32x128.co b/hsa/gfx942/fmoe/silu/fmoe_bf16_blockscaleFp8_g1u1_novs_silu_1tg_32x128.co new file mode 100644 index 0000000000000000000000000000000000000000..1624d5aef6b2a995f4bf4076899951d5ce3d1011 GIT binary patch literal 28768 zcmeHQ4R}-4`9Dq4^dtQO3Z}KK5hDWzv7`lS7g+d4MJTjj@nM$Gq)nx?{WnE%fB*G1 z{SXik5d|H748^a_&3(AxHcC)XaG)UKV1tPY3J(03GR1-U|K4-XO>bgcwPAbQ^E5u; zci;Ct?|a{Ka_)Oi?)jPCI&tbGU0j^gz{E?>j<7g+OyHEqM{32v)OgYru|)iv#!{FO zYVvgo^OrCngQ=J(+mTv5r&NW3GgpWzvJ4@UA5vf}Psq#`j#I7{vWi+i%5_SOv@Z?E 
z1|hOd)QDd4d^#L{9Y57y?cdB}O>XMn*FB#)<}=`@ra)*>l{6BLobbEa++X7Exou}4!=kkp`n;L7HJg~UkjnGy28eEm0N2_b==eue*+Ui^jJwKlQn6195%I9ff z8*RRtA9@mhV!P2*yU0_qpr+3KlxSKpe^FJHr?FkQJlx=Nt4@hgoF1Qla|I8fMryjb z!SirUfju(CCwPkWb+tGD%M8oalVwCXS+^OQqd1-K3@57@TB4k++YARsaXQ}_&QdeX zk8-kZGc1bYbiOm3pl0ZdaP3d^qNqPyOb#B>o&<4W-wHl6)x2>AFwy@hD#BImw4hCw2|x(^1aXeU_V|cztD->V4E*voJI` z0TIBU-ZSa$+id{Njd<6#SOe^cHNdVO1bD=&9fUnSh@f4K?0PNM00&|XaHt0XHqLi7 zReD2H;YbfcgpI<{SOXl7HNeRp1bD>L(AX3jgb#ZVA#4yni8a7yu?9HXg8&=Ked-YG zj$=ItpdEwVafX;9B*q*er6&<4&(e%TMo%JWYUA#>%$OsXV~&v1g9sZYXokVkg8<>v z!5VXf!7)b|)`JKerfY^FzXt)rhoLa$2t_eRaP%O;Bc8C!!r6lm;R7)t<_J?_jxe<+ z5j@)KqNFDg!pcKg%n{0Cjxe_e5jJ8%%+OpK^Lh{?d`7rqj^K?sLTwKsJmQ(49f*b= zga{vqrkEovjyb|DJ&3T;gKw{zDWUWrM)*jy#T;Qt%n_FNAi^U}^$iQU7E`Nw5F&ga z*2ElPUCa^I_aMR(P4!K9EZkJBnHCTAAWrybJQ8z+$77E0(;h^4tie~|_SDsP&5Wme z5G8ypejRg!O)*FK!#9cWtgX7Seo;e(&r`k7Q`b~cRkPURuAs;NFVh~=Z}|QKHWoq+ zT(vdTb%tlYq$ii%ae+;?g|5XFRkg0Biks>i7s%k;#>A$n_+9(BJ= zn!gY1LpxmRew)-&d-a`-W4X6vCEeNh#~=ND<41{l9MfBUsasAYQ#%&Z0L12v;iC+` z#*1g#)viz1VdN2w)vfvz)`m{2`+SjY{@diZU|dHqh^%$S0M9+%7~orf3^)lBjDZruL}Oqcp}`p7d!I?hfI_m#7+6J^Yz#aEP#`}|a)vGt062krfC}&k zFaZ1z$ol94G-;f`44?wE00Y2$AnU6OILQtyAvy{PA8BtjvdTs*@9yo^LOf5Yo~#=b?W2Onr{s(-%=1P;A3%b3Ua|~cyKtEFAHAA zF)5hDaohyQ>TzwOS3B4}oqO1Qc(26!NxUDxdjsB&;=Q^v$kuk!Mr@G%Na7t5e=Q^a{4n1Rn>K!3~y;A~(%@E)KO_&TrzIA8$g0&q640Z6&}ncF%uI1}R#WHa%3 zSy@x=W;Pc5pV=&4$h@b=L;3<+^L3 zyHx7RgXLUT4&5NBn-iSFb#tKmj@0oSCp%V`NLkTsXfB03GbBISTn4!;BtO|a3-auc z{IllSkjq2zFPpE0d|gQXO>;Tqxgq&?&2u1MACfzp6M`5U6f?zPSAzBeMX@s0#@rUa zyUHp)Kd+_3>$MhES9x3Uc?R9yltdJIlKoW%s}BAO#@sQ{u{wUr=+&d~Svt&SMajzY zC#=F}3Vj`-SGJKUMg{@QsVd!uvWSm@`@Rg@q6z)$h0xW->8 zW2K}hN@}WzHHl)CSsi6%-T;mrWwX7G+2vkG`E}k&o4Ved=os^v(1-Z*$z?z;$e5{q{0_95XTFUeo_FW+yozj`a(POHem-Eyw}7_^T+_(aLCxMNuxk*w5P{F1NQiRvTq}BjOvQ#5c0_ zuNpl%#u2gOuSVRA3`ObF#~)mV?vNU z!}}d%zd?Hg#^JX>ogv6NI=}birtb&Y?#`5ft2f)(*-q5N<`Ls|Z3*hGZ^noPgSvTy zw+40dn}fIF!(_|11k=(6(X|uI$l&NtOSAejGOP(LjyuNr^(!5E$LbtMj-uq|`t_)L 
z`WF4lgqG1Aqg#wylom(39eqteUkZSU)aPoq*RD_F^Nsq`mathF9URT)o@4Y>#m|gZ zmQixVbU~s*n3R>wF*%!X#t{pP$NaoSAzzX&R%MI>$pcjxHJ{{5R2g%DWUDF@c?j~Q zs!U`X$b(clLCD{MY@dqr!Ir=h6s6!6<%-NqMd{z)ug?+vW*Kvj3}zh)%LE#LkcVUl zc}OPE41FRn2bdz{A!Z>D$>~sv<#nRy*sY3AX;ySPpdV-ewgHX64qzg12{0MB44493 z4om~C1ZDtN0s8`112chZfCGT{0L{R4z--`sz#QOuU>@)Rpau94a3F94&0XPQO*-Z6j zgF7)rT$@!X1xblss!L1oQ8*G4z5c{RYYT6aoYKOo1{Ii+y$OC?Bgx5Le{!;Qar$!H z(wE`=#X%jG1Wtk56oESe0(UV3x9@QZd`%H}ARzFNVyQiV>65Np%rQe5$gz)d3CF$( ze#4gWiEUdNaDCAER!-f7j4?eymR>)NL$H{eTsfY3r#S@+0iK(Z}-}Qn*$j zIr9l0w-0#Am-cy{JNbf=xS(BM)vP$5-nYu5%ku+Qdy?~(0M~g2<}Cx>=NX#!xuvUZ z8+2f7y4V--tJ)Nc21BUeWKo`$fOah)4a* zeg^$Uyde(J@0@2vzweQ^*dPN;PcBdBYBHKF|-@x{ii`*4<*RU zg3pg|VEgz~wO)~IT{=FMaL@|Ui@K{=`So#xP2}^vUm5x1b0r+Ou{2e!b0k}zC`~0C z^i*kTK=>VHlek}IIrRBBFQL7?lAo9ABFTkID^m$at$_Yge!k;uGWY90m;7)(LiK>dHUu~0?zhi=S~IA_1JT#0k8KI=hFBL#ylV#dN;;N)^n1F-7Ch=cK`TP zWiao@hcq68F&_wrK7;X+b)V#6&x-M~ZN_+s_Mgx=6d^ug@!c2?nrFB_knFfujMJF= zp&!EIe@6W;LVUvFXD~iA-*6ux+3~CxpD~*;KBE0uns;V7@46iq%saJhcjLn69qJg( zyDO9{+VzUQU8^frirXXamFm6jI=RkcaV5VknPJ#6D=wm*Zk6?PJJm&8pW1r*75D#D z=)Ym7tf#xl?_aM6)}TFmQ`XaCr~`EW*49%xkG7tYtk%;LvYwuTpIT4JuBoSy>xQs; zivF#``TK51U7_`YT2FV%db(TIQ{=av+8sk(L4KindP3IIQ>ZJ0x@u&%^>lDo+wSTL zjc4LsSx@(4JZavm_4J^ur-x-d{Yci+(-;qJJv}4q={Z?X5s&(vy&wIid9K#egR-6; zmh}|zsNdPA(Qj=%JtOPsIrMvISB>np-^04vcGvH(RZsEx(e*!YJL-X4|B-CnDeLKO z(#v(MT#u;h3aSGZe6EB8k44ne6SAJ3DoqWoU*tN3>OnrvOK9JYdLY-CBp2?K^>lY- zs=D4$*DX{R@^L;w`>}|6dP3IIQ@&JneWb2)XkMu`vfH|Wd8M}Pu5Qpc4aPhm9J(Ll zq^+k1Wj#F%y;zU!p>Z0F`9L`IbVNNpBkSoojF)Kt8jVL0;u99{$2e*0=|Ndf4`ZA} z`vWvCMTk#Wd^)0@o{{zR9L7hqKSc9Rt&!c13+A2Lw!3lB*3MY#X&h_Ya_u?)maXiO z&aG@?rxN#v&Y9zqw^$u(+Dkd!(_Y4LUHdGK_qEUFxW4^bjt{h#b9|_M4#y4cO|+iC zoEyqo6+4UHtE4hZ0k8#F2=oI-0$YLhvF_%Au~p55W4+BI$KKFPe&02aU))~%u{eAm zjN4lPECdz;M*>Fz?GxcQ5q=ZlH}QsM#luW3)jE^E#$ak)U^FQ$kXs?QL%z*uqPW8n zMLumZpLUr~hs@`;ugGV3N~JL{X?0)IvYJfO?F$B&mfQ*XF35L7UOPaH zm09GoT;{Vv=Ce}fbH`WYGd$;;=d;{nHm#`6Hm$75G2O8s&$Rqr$UlO7Kja7VOy7p* zUR0xv7Z{YFuBdtQiI-=YZQ&%3FQ^R_qg 
zzPVn!k@wBLTgaYQgzVif>XMocMTCPC|l7VH;jm^m?W^ zH@xak(6ij6BvY=*WXerVHRYzKn_6IRGwE3i>>cTPmeZ$?DW_jQQ%?W>rko2dzXLqX=7)XR1@Ot|!{wY6ee8;!TXR2&Q_5r(s2N!dIEx=hoKkx^@)~Y4`gC5`Ev?En()ZCP6YBc+gC5t1*`VxzY(RTLWBet&{dcXc$}ZBQJ)yCG zLHZtGBH9rq$B*ajh7E32UL!r)5hnL9O+NrkK|8{<_)6Yx_^_^ee2DC5Png!Wv+!H*UYSfbjo!=uE^D#G@j?2 zohoxJ61keQO4BLV>|Bv+PEsY$IWJY_>S$Fcx120rI$>U}Fa2{z*V;PZ{KY7LM=UnhJ@Z$3o$tls=r!p(ZKcD+%g6tz!NrnOI3rng7V zoh-e(ISbEfy6aPS{m4d~Y{coV58dUR&!+4NwCJ&g9o!qJH0n@4_u5>~G16~s75c5v z7f5}8o!VRAd6nCrYZvzCBK)4eP3WK3_-)nrZCfhr+amn7-!AmqHGa=&{9afg>@P(4 zy?BSvzo_whUgNjpPGR2>;kWZHq2H26_vDZ=lMYlZ%g8o%utzg_nV`>qJT z-9Hlg-5S3aG=4ANFYGTz_`UL=(7&Sbdr{-J=f}dnC&F*K8y z-(y0*Pvf^!-~OKn{eF$#OB%n|pAz=hBmDmKbD{r}#_x|BzXQJz_5%@q z2Y)5>2Q_}XG=6`6M%e!x;rEx{3jJR+e!De(hn^MoLlJ(5e=qchHGVH^{QkOG*#8>g z_r?~Xe?#NP9aa6C*(=f#ZJv;))BbQ~+!vipnR2+qnaHd-=5h<yg9BD6~qLE=Qzf#C~(th1l8X5NMt`@RI+UHKy$gt17M#uxD{rYJd z8TRX^3;BDAiHeexw%J0$i-R~&rac?FZ*#9o*#TxoJ zWGmVe4vK%6x2NxXCrOX?goFD3D*Z#?V6-C~8vi_RN8jN-AwAj=4(-1${WIV&v?H{| zzs=iezq6esJK7W4`hU#V2$XAnQfoS)Jyqt6?_iW`e%4*-g!bIK(kaKnq=z|C@y?XO|g!{!B!7T6PACm4(z4`Bi$MojU2tPs_ z!qeiuuf{z}x}AFSJA}LS=97d6_2%~p56iYP?kA)>p*R17@RZ(smhg<;`~~5;Z@xz8 zu20?dLtP_u*N5)%rnSJ4t)hkldYHHgNO-+1jyRg3%;rI57LjShL?@f*0 z(VfD6G{W!LABFyy#_uhS-#fd7{hbKEcV7|ucQt-*Yy6Jy753v1ekb+`{RxfVQH|ew z`-T0z2*1DmN$CHk@jIsRJ9$vpPe%Bi`isz?()hij@q7QUu)iPS_rV)N|AEHuU5(#| zZwmW|5q=-NE%YC0{Eln<{(el@{~qD@@w-C*vBvL&#_y98!v0Bw-|4>z{b`Ngdm6ug zoD%kbMEHIBfzW@d@%x*`@3W7D{j&(aGan268I9jbjo;^|h5hpgzkhxz^#9cOoznQ7 zJtORABmB<&Q|QlW{NC62eQ{3MzliV)28BMT@%upf@ioE`xkh+Pt`UyPHNrb`jc{D9 z5#Ez)gp+cO@P6+af#DwgZCxYKv*lC&^n?4L@V7J3|0OO2Jx>VRvm>s-S8yPRGqDQy zN8&<+_Ux#A6b=NnBd)|6AaNnSv}Z?t+C4jx<(?gJCdiMt5TQLgvTOG2hyy`wRGtLq z{LsE0);L-&1ocndvm;K#I`~B6LTLBwh!e4%+WiN)5ZXOE;zZE+MdLzf_w0xhL1Pe& z3!&Y!BTfYMI~o^4yJtt72iNdCQbzPpSTcweu4W&GI1gZi3_1}AV?-o1ofM^ z5ZXOE;zW?1xDYA_f@I=EP=ARF!RMRgKoBQ_^u&cwIS?chC*r@A3-P6TN;OGaPf6AZ zPMHpziB;$?aUnwWlin0>giUTS6ffPiC71p%qU!luzCtk#CmG?ALK%W)l+aHX#7&5a3R9# 
zDL4@{1}Rav5MlKcoCxZ7P82RgSUm+Ng8H5FE#g9i)l+aH==vuvgj!Eo1P6lbgTRp} z0}>Y^te%1sL3%5=5KiH@P32;eeh|13bo~<-BCMW*6G3|7LMXy-hsuE<{U~rFs16Vp zg3lw#DI-n<=?lSy2nfGjDhGn}qri>$cXJ`a>M1x8G(N+?g-|&VB-_A=p#BpVBCMW* z6G8f6;6kVz2$F5!L{Ptp3lUaN!HFQf16&A|13~f_a3ZL`#Dxf}r{F}8-T^Lz%7Gwx z3^)$ zoCwOXmkZIw@95=1h&4hl7h=kuUM_@KBlL11BPkx8Xm_;LB?3>#9AC6%DQ?Fa7163SWIilNZ03*VwSoS5fJztMDyqXsB;&LW{}; zuIie)YS=yavBLR_sw-;hs_HAM>KiLuwY3%XzKVv%`l^~*kMA;HW94PFwKpyt4$Tdo z$|f#Y^4)G%fy-Sra@6IcMtSmGuF8COe!+;#3o1SCg3Cu<;TeHH=u#`f{z7u?wWW@bYl|K&d}4He5ek>W@zj*B42BVMP4#Qh#)Qc>8Hm z|A{wTe}F_?BAc=_Tt5e5wBOrX3w^ZT+hex6`X&#vxo@t+&j&JFb=@MH*X8pv`7i04 z8kw!J-tB5~FHOwqNaw&6|Gn zuVTWSX=AUsQhbczA4!iSnxC|-jV*TZQV%4me@{y_tJke6lP}c*)lIDrDo&9NsXoxT z5~XaGm=63$Fj%kptMx%et8Ay*Rez~wSva8Bs$H!oDq3Vi)vo5R;zh6%eL%IV^-D!X z`l}$WXH{bCJ2)hr>aYIJq>9e2ru-;Uz6@IOSL5)*P{|F40Xbos&@`d9V; zc}OD;RDTs`Lme7BOUmY!i-TzX OpGp5W)rJxy{r?|IdG8kh literal 0 HcmV?d00001 diff --git a/hsa/gfx942/fmoe/silu/fmoe_bf16_blockscaleFp8_g1u1_vs_silu_1tg_32x128.co b/hsa/gfx942/fmoe/silu/fmoe_bf16_blockscaleFp8_g1u1_vs_silu_1tg_32x128.co new file mode 100644 index 0000000000000000000000000000000000000000..04953c1a46808d3eeb41cdcdbe7bc643c240e811 GIT binary patch literal 28808 zcmeHQ3wTu3wceSTOdgXL5O9J?95K>BVGNUiF&$`-ho}gF1Pp%EFk~i~NJw%sL9y2x zPbLom0TEHqVrx-qwb$0(R;%qLj0y@iC#V(JpLO<}{X4mR%Jiv*n3!S{6E7n>!eZnxL9slJUn~x$$C9p)#pB-; zmc-0ZlW(ywpMwDzOviZHj@0@&r78r9vxKN4OA|8rAqB>=h0N@sxb#{f>!|usu1jmA zeR(K02$3zLM)Z>BGvM&+`04(7|7IU;a#R1l?)mgFp9McX1xn-VzUW+hJSky%g<)5Q zQ0u6VeH_$J!)60cyl(PUC36__|I)SC?e)3ri#&~Wo?74CD>vI+jn%$Gti0owc2{N7 z5?5`7tFqGL^F7|^ak(pQX{>4Tz~Xi{LRaBya8-JKR$W`a&{ezHUguiu`RU9@?e$ev zK2Hgt&=da)`^~P}C7y~!HFfSMMAM3eORB0ojqSqa;RcsmcZ!eT^z((cR`3vd zq^4UNJP+68JHk_ZjHg&%S9|OKnqj$qva|>%>or4b1gG3pYnRd2=*Xc4-mL*K3kp%OZH4_bkgw^!eDe zCc^o8&vHWquk)Pcrqb)XlYA(`>3UD{=MlWlbCM62PU#uSCnKD%_bi`@;PsVR>i1E1 z&EoFC35WnD{hmp8-(CYKH{z~s(FWKNZGf&m1bD=&4#Ms}L{L{FUHhU9a3IPv*H<|yNk)|UuMZQK=;9(4q3)DbfK5Mk3~Wf*LI z2oO3Qa-xoq8+C+HeTcAWrZNn9eFzXb32h-b1o 
z5Dk3@5jqe}QAb!Bb%fjc5Mi?i-(Hm|q4gm~=t#6h9bsA25mxpg!Xr)f4U2jfQ)~JV zB6J|uMIB*7)DbrJA;M!#^-Xvz+*GYhi--CUCv-F(i8{j1qmJ;OeTeXAgRjEvsjKgq z8Bg{hO6XYpHtGn^L>=M3ze$8=?bVI-OByPCp6bP(x~7V%nx!6h1wHVo?Ldt1fH=kb}g-_s&zG0+*04TNCxLNCN>QYey`$&8Dm}8jc6>m zylQd1r($8%m~j;gYwIf)`6^wto~aFm71d*wjH$R8+Y~jmODe`RRafLYmX2{0PWgkq zp|R#>SCgkZpPq!-Z|#kqf5T2k$a(LuA4nojrABt8jWIoi{@reqpzmEZmqt5BzAeu{g>-HNj8-!?Ehmzx9gAWJqVdJ> z(b>P|i)Y%?c0kWz_|f;FxZRk<+R$lzpD(=4znUBqjOhplk+s1b;JL?|1AOb#Y!29< z(2~r7JTe%~fnvfqbD)GU-W*s!Xfg-*-e-b2ppk4b2i6cKngb63G{{eqoMs3F0L8%F zKn-{V7yy0*WCIKVnl#10G@u5w0RzB1ARA~16q6lTLU!N+vI85)KFAQzfGe26w!&bt z1x=>mL3~a@gFzr;96OlVwiU+)=I=DJ)=m>^>omruG~XUrxivqS&&T5aB;2)g#Az8|AC-6y$HBqD{CpX~435dcWR4?(BRFOSvp8NHyqM#q!Am&~2oB%~|2t?` z82%jL&k_C{;m;BN9O2Iq{v5wwd~mPEm{l`;c{`1d@ge70r^B>$#=y3%8r$7j%=UB| z+n}@CHSMy?{MW~baiex^PUqUYn0vV2?apz!-CoU5)y%Y#Lk>17E&y2`$5Y!lr&FW$ z&QYT@ZS-jWed!|3>%cU84nqcGE&%5O^MLmOi-E5LOMpX$U@idX0vmvotDm`ZW(Q|u z9D-~%J}(<9*yFgBvC?2E*OfwdvDB3X%ebx#x?xf`Cpd@e=0JCe)Xfdf<+{1hNUR#IPn^RC-tx|n)ZVq z_$eM0*VwCNtfVAOOHLNCrc$i39A{aXH-KYj*<7!4Zn@W4euH-^`QiExvR67W<=KoD zT%Y5g(e%0K%+1xb5hMJfPXluT1Jk|5cR25I;ySjlL3@}nhFO?-kLBMPbC4Vt!&=;K zTbtXRWbq8~PNcTuookaZb{5Q|WvCm57UTe3>@}3@cx}ARrfC;n?B{I}m)qN%Yt1sg z8S%{#;+xsV*UTORDmdVrE&D9q~!S1(sJTjoOfO6H?DRXooh3l znVOcBsb2zf-BkVm8gtnd3>;T3C zmjM%jD}YJBmB19>YG4|04R9cEEifIp4mbpOAJ7Wi02~T@0GJ8f2+Ril7-$1N1RMt3 z1k3?G0vrzf888?4bKpqeH=Eq?N#fe9O3F`&_flP2hL6G-AMf?Y$LF;0Hi=0stZI0^CD9w_$2F3e z==CQi<}6KJiCg*#yuUnXz>=U?;5JR*j)1@}X5#kU#RB(f0uKZP9@1>J2QYn7wTn5X zX~Q@U&@SOPFwSqRM~yry>Qye1j*n%bKlD@`FZ zKLI~|o|9df=lSY9r)vz?U$Qk`+W@~b^p9|0dnNDRh`jY&pT4{@h4e#KRi9z(VtxYm=Hy$ra)Gc0=r z@Bz=r>@RFRZQGz*=NXvY0ld#MG~~I2+ur&eEOGMr<(!qc)ufr) zP4fQJAg_lKM>wo~QnFsJNX}V4DVcEiD$y%&UvgfnQ-_MrO5%|_YRxN{nE>!&%=2M9qpC;yi^xSE?8cfOgL^8^q2DU9b;E< zzrpj#59cFvZ1(ZBY6DJssDwDPgwL6#)sw`?js~SpB3XXVGG7bv_C`h&MN0!uj7Jw zr?>5GT==|09iw@7g?2@|Q8Tuyx+1NpJ^Ws&-|KFW>pT`y^1G5*rmb^g!s_XBvYu|I zx`^vjt*2jc|6hatn|8{2x{Lh&_w~R!v}bS1dU_OffbQRFJ*D%g^^|12o*tL=^d$WB 
zdP;Vso`$a*Lh32{w*luLxE*zc)(d()-6`woE?G~J-$rV86mwSx--4Jk)x6TGrFEvYsLy^?T@k z^qc0nUQZ9odU{yaQ^ccw4?Tr`tM&A>tfyzu?~y$S^0szrR*J#pg%Y|FG?- z2Xg&Ka?Va!Pj``Cu4CnTL|<1>9kAhZB^-7%tezg1_4H(Ea`*a0u0yCEm7aFLUkbz=Oc6+4XdZeWj#IVOV-y%`Z|Z^m0lxzts9tEdfVRW z28~lL<^kcz{TL^;o*tC-^f2^dJ+_<1DHrpBaOA15dU{&c)3X>a(S9F|MjMdU_V)BibLLd8gONUdILVPH)@WxTv)=+Ikwp z+O}SQ){o~i>pJIMnaID3r90Etmn+BDE5~z{<2lN4nQ~mJ9M4vcn`oVZxj2%wY7Q2= zM@we5d|(T(0O$vf1-1em6Wz`E6RVmFCVHF4PQ0m^{Jv`l zjs=bdI;OyH3jC(PZ^}*0nul3hstp!@jmgrw$ZXMCAh$wphkS?GLUBjMi+tK-KJ7A} z4w=s#Uy;w~q*iT)eBSU)^J??NTiUA=EgdyUmOB=uSlX6Dz7z5)$ake!y2mO_e1S;| z8VZ}YT;4G@Xei>CY23R8=sYc+v&>^d$tXaXDA3@$fm0&iwBxx7>p_(N90<=ZrY{5d%Fal!UR-Z$4vH}k%^ zcL~|^s*t_=gN*ekSDG&I>a zHfK#KY7QFQcrMj2$#F$tqT_o-iJ37mmQ173k{KIo$&8D$Xl&DKn$gI#;!Uslb2|lwnx7S9` zLh19v^THele=aINYkc>2$Y-r=*3bhE4G%Qt0b77`fPUZ)fUQ0AfS!eP$Y-x=JT{Ae zPMRjr&*pHsO%vF`=5cwMCU6D&0hd>50#_Hxx;9fjSET3n^qh~^CCtA7?tv`Em`G@d zHKy{qZfn)Hk$!+NkV zF?JGfH!8PP+edn|BTO7zntA}3gm#1}v6Z~t=utiO_z>CAo-k#wFLep-J(O!&f-#kF zV6x0PPvo4IF({RC9+)*Km2yl^xP;@7WSMJ$$TdA9Kb3MFk|lDrCQRZv4^5W27K&W0 z8KtR|>(DHbYi2?v&pA6;=IU(KD7VZEUn*gCmM`@SXV02DQ2gt?juxZ!dBS$1bqC?y zM(fLj_ZqFd?JV((U^nT0YP7yaxY=kuK=^B;_0NP)q3wj8xZ6m#+-QA~aFx;8MR>o_ z`U>HLvaO7}k93b2t*;Y4VYD71eAa0FE8!NSRb7V^YK84Hv>EL)wVCbVb0@>-ZqC3n zoZkA>TR(;(&QQeZtq;BBoyVr_4zw7twH@3Os5Bc;Klj*O&ok0L*DCbSL7y-60d{gv zzUMV=f4*JVpAYkU;SQmHLGgP|@!Pgs*tdoGZNF3Kw<~_nD}FDo680Cv{9d|C=wDL& zUQqmY+%4=o!u)pLBlJ5Jzio=&%l8WV%VBt{(~gF2(Of z#qX5|h5eN#c!wL zx9>4w-xub${})2PU-5fc@q7IVVShc$?@zxH`adate^mSq{94!#g!vu(jnE%d{JIps zKR+exe-88e%kPB#FN)tT#qZFw!hR^s@9-al{;=ZrisJXzEyDiSFuymp3jG_3->cG( zxgI!R3gBMW3Va@Wi8a7&z&n92Lf;770bB=s8TuaryMP;kuk_sen5*^x9=jB?MvfD0 zLcY=_oqggj$`kZ01q zak`eam1)xsmlVe{4aZz=fjm*lEy6zgN=1f!_9Q9$g}rRDBEw!bMar$hK6k1j!#?*a zDYprG`7}j_z5Hq^w+s6X*C;aVH(V>_4q=}^U6Em*f1Q-?5cV5qC^GCf&J^Y%9M<(|5mjNRN9n;jqE?KrhzN#~|mRJ>l@!hk1MY z&Ub?JXiqqN@NZH-0_LI};mFt*csu$I_bKVoj&S7Qy{VrAN1+{|J@#$hPW{eyhU{oh zXdnCuUn5Yic?qqlgpOpHGrogSu6Y^vq!K!^?n$K_3lbjYI4)V{itk*MV?oAmQVGXp 
ziCha4Uf?+wCCgm#9g1=+%-EYsSd=Albtb&cbDoeabH#Ti%FUVaNh;xltWWqFL4SU7 zgzm%78Le*-ZZ}$w5$-Ws-zVHJ)(BR4FaL;iZyK$CCp>Djeopu?+7O-+_kBI?JEYrb zw7yHY%V<48c+hD5fbg(vE8~7jy5mOcKL}46t!D^N8?9dwp8e))gx>npTR-$QLT`QO zEpJ*2906mE)(A(!YTKJH2>qK%Z9AgW#J9E!`&(gtZ@(n;Z!3OpDt_-~J@@ ze^dO9Dt;#p3j2vLzmtCv`jd*^yNcfjhlTxvFuxDq5c&@lzxNcskKPpakHY*uep~22 zR{V}Bet$nI?0*mQ`{X^L|3vXSuK0a=T-ZMi^E>r7p+BYgy|4KF# z4EN}7>l%TcIiLKeADjq{zo*9g&*4bW^MsJSJmMgH1vi4Y6l-vQB#uP)ULLiNz>T1G z#GzORB#y*6dwJxi?&Xmz_wtBKL4L%M=-$gCyRw%@+z4u;^CvhLi1z!i#!)#E)IWVM zkGK*W;1h`>q3-1oS7IZz`+wv}sC#+Dm7wv9#F0?<@`x)zV-SfWq3-1oSAzN-i6f!z z{WiBCZ4>aU|$-CQe#7H-hZMohSnmM}p5gQ-?+}aU}?eBN5Jx zAU$y=%7Mg@;PXauBZw3DOfsLgz-1 zOk4@-FL5OJe3RS=;!2R7I1)NHf@I=K{Ht;#&Z(zVlhk@jvO#do4B%3%L4S!O(Opl; zuGUkM!8N0HW^gFh0f{4VPCcb|YCR=cuczQrke>-0iSBwzcBP&YH-g%jC4WNaR)*Bm z=Wt%No`Nf}0Y2#wI1(ZC6kLgo)b9V0BN0+h!Ihx#ONzje2&t#wO3)Z2Mc_z;)KhRJ zsNb0pI1(ZC6kG}FcjmWM6Jqq$iGqCj55j+z8T-180Kj0C6PvJd#{9;!2Rd033;c z@axjK5u_gn&cy#UM29b5_OKXD{N>M6Jqq#p&2gwBm1*$%D* z^_w^nA@vko3DP^kkt2h!oXK*7ZXW~jwj*&PL zJXdfdC|BZ2P>zu}5HwmI1)Tpa3d&3;!04C{TzuN-bX)2LU1+uITF)$ z_j4q~{y{%SLhgU`b0h=@gE$l4;{A{3PM^z>;P+c_BZzlETnXYC5J!Tq5x_4dnYa?f zJ0Ok(zwd$@K{9bAh;I;yBf;;x;6{7|J8>mG1xDgX@cXX9Xa5}Sh%<2p_*EPUo-?=+ zI#+^njKq=Pxq=%(xe`}`a*f21;JJbuLAerFf^v<-k>I(48$me|SAufv=ScMMKKeNl zVvW$xk(jo-pCcjG2>l!hxkl*cNc>Bz5uW)>=Sa|RN3?9kmKlHTzEGe=%2tWA#YV6B z_zUqz*>uEMvZp`pI92`wrYxvFdGs$uuw#|;-Q zsjjH0tE#W4s&A}t)z((j`zjh5>#J&NJ-*9)jg^I*Ip)en>U0~15_qotlr3)L4&{q(T-lcfI4!qE0Jq&}x6RKJfzLp)o$ zJXAjqVx-^ayBGRMzt6|)b@fdiW_RCOhaVhd_UgJNcCX9lW%6IrH#IVQW4+te3>800;R2=^! 
z7_8U*^}3*A4sYEPU-y@4mVpC`t^4bBL`R!!sN41Yb-W06q7&$Ly>98K*DV=__d!1< z{0`pKFWq1NJxU#mVN+kls(|Cm@Fst~u6F;ra>z_ECfFZF?5`t_;* z{WkhtIeO`JSsy?ABo}{8z`vx^{g27(`C^`b^l+5>2-e{ z=fd7Sc1qg~HwnQe8zw`fdx@UEK7Ve6PS0I Date: Sat, 13 Dec 2025 11:50:22 +0800 Subject: [PATCH 2/7] add ps co --- ...blockscaleFp8_g1u1_novs_gelu_1tg_32x128.co | Bin 29280 -> 29280 bytes ...ckscaleFp8_g1u1_novs_gelu_1tg_ps_32x128.co | Bin 0 -> 29728 bytes ...6_blockscaleFp8_g1u1_vs_gelu_1tg_32x128.co | Bin 29320 -> 29320 bytes ...lockscaleFp8_g1u1_vs_gelu_1tg_ps_32x128.co | Bin 0 -> 29776 bytes ...blockscaleFp8_g1u1_novs_silu_1tg_32x128.co | Bin 28768 -> 28768 bytes ...ckscaleFp8_g1u1_novs_silu_1tg_ps_32x128.co | Bin 0 -> 29216 bytes ...6_blockscaleFp8_g1u1_vs_silu_1tg_32x128.co | Bin 28808 -> 28808 bytes ...lockscaleFp8_g1u1_vs_silu_1tg_ps_32x128.co | Bin 0 -> 29264 bytes 8 files changed, 0 insertions(+), 0 deletions(-) create mode 100644 hsa/gfx942/fmoe/gelu/fmoe_bf16_blockscaleFp8_g1u1_novs_gelu_1tg_ps_32x128.co create mode 100644 hsa/gfx942/fmoe/gelu/fmoe_bf16_blockscaleFp8_g1u1_vs_gelu_1tg_ps_32x128.co create mode 100644 hsa/gfx942/fmoe/silu/fmoe_bf16_blockscaleFp8_g1u1_novs_silu_1tg_ps_32x128.co create mode 100644 hsa/gfx942/fmoe/silu/fmoe_bf16_blockscaleFp8_g1u1_vs_silu_1tg_ps_32x128.co diff --git a/hsa/gfx942/fmoe/gelu/fmoe_bf16_blockscaleFp8_g1u1_novs_gelu_1tg_32x128.co b/hsa/gfx942/fmoe/gelu/fmoe_bf16_blockscaleFp8_g1u1_novs_gelu_1tg_32x128.co index 4f0525b335e7274ee99cbc391e25ad68e3e4ccc0..cbfe528762ef94b27eba9611ec2dcb72e902681e 100644 GIT binary patch delta 137 zcmaFxgz>=>#tj@AlVvmlL>L4DAFwm9H#8X4G%yGRJm8h!XxN;i@rGq`jxLg{F_P>a p-3Xb<8eT}UNO~vdXe4aD;nl-3xxybw1(N>Bdvp^vYXn#*0RZX8E1LiS delta 127 zcmaFxgz>=>#tj@A!VCg|57-&l8ybvi8W;ou9`H(VOlH>z+?=EFhGnvXDi1qMFkClq v^B&y@84)Zx8YYW)1x(J-NZ5SCtA_=l!WeF5!{ikIfXRDw6E&hpi diff --git a/hsa/gfx942/fmoe/gelu/fmoe_bf16_blockscaleFp8_g1u1_novs_gelu_1tg_ps_32x128.co b/hsa/gfx942/fmoe/gelu/fmoe_bf16_blockscaleFp8_g1u1_novs_gelu_1tg_ps_32x128.co new file mode 
100644 index 0000000000000000000000000000000000000000..9174945af01771b433a5de6eccbef62ed3a6ce64 GIT binary patch literal 29728 zcmeHQ4Rlo1wLY0UnItp$0}`0TB#szqpfH9>f-xQF@Q)TDkbuEO4MS#<34|ms6BPS= z@n-TPARtCmu-H=i(RFp%{#7knwS(I{2;82dY)Qf|e(WEP6vG_NU z#WMrc=gv_kLxMh}*Rn+=Xu3Ku9 z_BFw%bdN2PgkJU2LHs&?s=wO5xhET4)W5HLK6T9Jz>m&%u`fCoA5Th{ULkleO{i5= z$36yXKiVw7i8oA{UObO6-!GiYTpq8}y42lJ>#p(Mvu2ys*-+&@%F5b*VRcqCE_c?H zJ1Z*OUhm@#Zl|mK)`sdvH(Fe-2I$JY_09_SPpfL`7CURUS!v)o<2w7S;ygfJ~%yu7l~-Owgn9%M8oZlO=~aS)Uo2!#G{)45zCZTEd*H&kS?II9=)t=cyUy zhB;ZE85V|dy3`p?Q8RReIa!|>&Isdli8I__E_I4idQz+pbGANHToK0U5~uiZ$qhX@?vcZsug@g+tPJCI$+IjiR_EiM^xkv#_^4RAQp07v@~VB2D6 zV}++@Dje@eh@eq85ov&TBMor69|0b5*Eclw48qxdL~b5Z|8&)8}c%A-O*hG_`SWR7%7V%n?V(=tqRDQ#8Y1=|_Oz z>5v_9gq(;YjP6H-t+O@5klT*{!NZUrafHH%BiQ>9;SqPx$HLK%5Wxd6CE^G(B91V# zKM~y8&qZ;6A_SF((ugCJMI2#aKO$_ygqWlGXe{bSjNlpJia3HN;s`bUi13JeigqCC z`w=2|AQ~f%up;6JxA!B$HaEV#YNmwTj~Kxt(He1tl@Ui+(~k&`G}hHG?OjZ*??;H> zf!GjngpCnL*xZi@k2Tgc;<0dJm1bIO=|`O4(Rd`{2tSKB!hiQ8!lU)xa+kZdu6JfU z*^el}WAU4aBRm~(gg;(PglDW(4Ry=w%f0TZW$xO>^2+KJZdW-y{(pt`n10Lm7qGF= zW8kc*uBz2P^(8&I?2Yn2ZC&PEQC?Z&Y%IUEu3@PXoVPKtX>i0*e$%W1Cw3zm3T&0j z>fGgvE60v2UtCjHvD91PtZ`4PFD$PbyL@bUZQU*2@+x=D^765bRpq!^dA1c}ZG}_+ zV6AVczQx(-?n$aQYxWy!gZn?Q-4S&1d+`-2D(dG}Z&iImc}3mw+Q#}vt=`*~Ev~CM zGC9(8tV>;ZR?`!Cua5oFdaLsmcV$Bz29(Cdp>EdP;%@L(*VT5YJ4s@{C-h%x9V}tj zN8LA53?=~w(N34T@22RD05s;_VkcMbvQ@iSLDdr0?5gDYDAGZefCbSz znh2YbbAQfVp*eTOMW5Ti&n-Fa9Tj$;yx3*$knB>kI%Z5~@bkG6ZSeE0ZG*vYfkKWq z_;aa2GWZ>YF$RAzVXVQwh){3v^S$OcgI^}uWbm&iG#dO{05aq!Nlw=J{Qw7WAJ7jx z3uJ?Ie)>>4fXP4^XaV|xxj;5p=XX#$u$bC`i>Mt~Pwhi=ei^8p4|I+50(pEcY{Gf; zFxwQ5^Wa5oV*+D%+tq=qImQLzI8K^k@0ip&zQfM$@7%{8#CsjykK_F#ytm^0Q@lGm z1MGoLI%9zSP{F$u{JDZZR`6j3A64*Pj;VoEe!jFo8pnh{0>_boksLDvnH;YOT*2|G zz*QUv1qN}1|DCkE4u6jD=Lmm}@aG7Bj_~IQe~#aE962B}W|nnd+(F|ZZLxRQZThYk z2Df&}Y+t8?b$3dw&{?greD&478>7V7kzK3Z-f=f`jqtf#*>;!9BkL-gm|T3+#zx2Z zAuGqRWE*3*%VcjKJzAE>jPc!{BI3LTOvbesnu==;oDa+e-VbyDUjr5chYrQH2F?f8 z11VP@b7jvB%*8ka*j!vM3(N1e-Nsl+poHs6pu0lRl?F<=t`xfAif&$D9@ou-?n*^B 
zKQN!`=0lgQ=xz+$$aObDca@?m3zTtP8FV8Q-GaaZu3G@zw-g=Eak{;uSdnGj)}|83 zb9>~UHkCpy?U5gEng@A)kNk_K`H;(c> z_Nn%c=o#ZX#^YMr&1P9nOY_C7$2En%l@({(i;F#e9NUZMdd%!-XBs=!na=*&c^P}X zbLvFs7cP|L?|8Kl4GJ!n_ZSxmn+`n9_pD$w&U#`2^c#Q=Fv*JpEM%}=%TNuT*u4fEf!h6;tC(P zMO@zAYVR;8@ePP?2ovAHHt#pMb&MloM}H4-lapn6&>&xcJ<}=r_4`iZ_V~un?g+3y zbfT=VBKBw0nBR2jBtuMqJ;nPSV84aE9^>#kpiUoP?VaCw{ORuo*xt_g;T_v;?DJ0C zE1Skn+S3)#-Q0u`3j}nF2=55!7B>a%z{O;1x&n!bBj~deNKWSHOH9o6B`0UcG~4f< z#|+slx%jwgSgcGKmuBQ>OedUk+`^(UKW~@Gm*gu{8RJ0ma8;Is ze5ER5E>L^6Dia9{@>Qx#q$S8BR5?b---2wLiSwZ?hQ-Ko-tF?WDJim?n(C7>M88?` z!s9utU1ljjJrMH9WFe1C0h*zY1!e%_g*?(M5%T9r#D!1Yl$Lxs5TtnN^O+GZ{TG zK72-uMvu>E%wCbS21~uwcz^tlbrIHm$wgn0($jxL!ViEo)gTm@@H8tF&1XTqrRsSo!x8OSHAL8t?mzX)=b;#RS#V7Whqp~mQ0o=R*{dce5RO<&dQo@fZ2tK;&ZhGDKBScV zajgl5Z!1Yq>m141kCh}4j(DOZ!7uzyuxZ>cr40I9oR`qnR>99pb&=%!RTT+@gAo2*1;82KO6E^UAEuD+lI8v0Ilp z2{_kn%$xx%a}UqF9(a>`R3^6tTu3*ENNS-_j!MVT}{IhY58 zqwd2vDfOJ>(VN8hSs$95Am{LYoTc%|!F(Vb^%TZSsrw|4enyO!bvwpO*ndppP>A@1 zMfYJmXrAHzK(c+47^evjK|hkm|CIV)i1>s>Phos$zTrMXvi%t`J`=WMe1!eyH1EvH zyz6sZFz-~`zQ%>mJJd0nch}0-wn?(orqvb6MQx$?O7&iMld{fZQN_P2o}=%Y7Zp-Z zpH=GVPO6Led}{0ISKR;CqyPHdN-{}x3-?rd9?MEWVN22 zQtIg$_^I`j+BNl*)(snRj%4(OaPUsl5Lz3k^>nvVPxmVI6uE3ByOXFP$fc*Ao>J=R z8R*q|O0`?9rz3i6WS{jkr`NWxxyMK*GK9) z=QGSJwMO3OA| zUch(>`-3zdg@{jBbO__5t*1wndU_1wBB6qOF~g*3&4~+I8avU)QtjkV!T4XuOZETjRjK1hl+NQ#9D*UFxZ|agJ+09JNRXUTeT5oDu zYB0&okXsbFGqF4zzHi@7x%g|f(w%HtRW;bOx;n*l*V3V;mG?lt7xI0O9~dgGmAOS; zljb(Ay{yW(F0r10vp8BG(sSPH9z1Nz5YJ@kJ2QR9Z^1KRcHQa2 zHpUjo0vEIEx$Ke!RF!e(;0MiyAhZs2mgEU0=DZ0mkm!j6z*>waI1#4_XJOqnK=DKjC#l$n%dYDRmjNn*`tZ%>j~#-Kr_j3Glz z8L6qJjA6qt-?r{wIYeSUw67ip|Fkqy=4F?eGBYzxnU`N~YC-#&%dks^_H~!Tf2tjM zO=xPvGnY0z2mUdxS*y!qp=Z8y{qS5g+s2>&D%Tp%IBd!@Wi}`MuuaB;n+3pT;5?uY z_wMqKJCggW^Ft+hhHaI7U*O1xZElWY-bC&yiyjpnth+kYh;1z3YEG#TX}v; z`xNv%vvIa~X5+`bo<&JULS3|!#P4gC7I_Eh2T4XkUFs0%@%b?8{29^We-ONt9zs+?5=MCMdb)i(FIE@{%am zp_w9AbKGQ}b9#c3YoW;1oK})Vxu$1|Tr=V-c+Ob~O0M=6nR3fW^Cl5yWqOl7v-f_t z@H>8Wz}768pCfFO%)1Ehk<2d<-Y1#4?HuteWFP5%BANFSZj;Q134bM-|4jH4Y$x=_ 
z-9frllKBO~wUT)c;U>xa3gJVFtrGVj=^m5JuMs{WnU4}aBbon7xLq=9*IR{hVcRTu zR@-cOc3bG&Nt0YnX?RxESD*UoM>^u9BTip^=qvABHe;W^Spo$j(Cx1<=ukhqt7Q*8`e&ifQ}ljzraRBQpSM5PCfc71@q7MGp?_ZE_pHWm$12gjBgAj#T|&Q8t1493@#&4&_ zZ_g&tz9+y!dJM?p*Kcw+{N#pn06QcdK5WheDQt1Dr@q1b0 zclcML{cwoikzWh_5slv-jo+W2677Eu@%zi~g#IrYzr7m2qtA%;qal9B{vh3W&s4qI1JmXlgBbH_3n9I$OCklC}(mr>RMn?PG$wD?O z?WI#RGTKY03OQYApFd3_qkaB#A!jJw_M2x5`8%<(vK$xZQ=dZ`b*(*XgyXocS#Xag93I`m@6q(#_XO#2 zZzdd`dN1^14gD_UY}gZyh<=#c(|6X>q=!A>h}2&vodxE=j&M};^W2WUgMLhU*b$CO zJ&^P%a5U@)tL=ei!~J^cOUK?`!eG}o4Q7zXX|JF=>zve=5IOH_a!a}Jx>VQ zv!wmHui$_XXJkF@kHiJ(*|Q}3FdPuFBd*8>AaOyyv}Z|v+C58>l|4)1jF2C3L3;Kq zsa>;YNgNQeA)d%aAaOycFY2BpaY9HRjtipQvm{Q)X0rQ#=7MPVEQu3B;}?z#qTRD3 zP6&-bI4+2G&yqMHG#25wAlf}k;)Kvxe3Q5!e13wnR*bxe6GBK_5OvQolmkNT#1Sb4 z5*LKeJAJ!MGI2r(i3<|S0Uti^;eZe)g!IG(3FUy0p12_k$xk_-$^jux z2wvKXF0$`~vrnWa5Mn5*I|}fRId_ z5b8H^L9~08#0eoiaY0lL2+70=q5cvVgwHpH145h-(i0a%<$#b(oREK4F36Ya>A&EB z=)f6SkNy%Dq^F*ey|$i`3{D-{8Ne0U03@kf3@BP6&-(d>AfBP(1}FgvKB~ z3>PG*o`MrXW04Vt3ldaM!3m+U$oK|vL4xWjI3e`;CoYIuPgw{DgxW`dBT@<^E=W*4 z1t)~`+2DdWgx?O8i%R+t;D*rWpSU1F^%R^C(i0a%7Jj=_4hZSTfg3_~fVd!hZYi8P z;)IYsA6yW>@Y|zuKuA9h+>rlkE=W*41t)~YXEeAVDhGsQD>xz4f8v4!)l+anNIx1} z5S0T$vK5>V>Njygg6b(aA*8p13!-vBNS**r2=$k^AVKvMoDkC6!39w{AS6!!CuD#N zGQb6~4RAsJHE(Bt3(_>e1sULi3~)id*<27>ON{~t#E*MBaYA+gi3_4~Ku9J|2=Ob3 z3qtn90U?<ZJAo;%NfnG3@2x8Q&fzk)a+#HS!Gh{^#W znK&WDuOKc6zwd$rLNakeh*uGg3&QWa;DDS(J8?oj28QE;@cXXj{~P!ecEk<&9QajS z5S}wQASx$>aty}>;kn|w8s$oy5Xvgdp-v}4PXzosa%BKE+{>RSBO zt0D{j7M5k^%qbRIt`+}Ta;~natF3Z3l-D~OJ@gl?%Dr{vjUN0GWkdZkZ+V5Yw%ohC zzP_%Z5f&9oomJJfRcLqPM?e=ZuPU#ut*k4rtZOKD*3^{OdCTh?>ME;i+}^9b4HZ|{ z)ZDUc3^YsJ6^&f5eZMFDqPiC#EU2gR_y&k6gOZvtJW^Jf*IUAkK+UQ=<$gQ2rsw<#&*EG~& zYPejO7|dExw~T%fmC^5nHaZsrJ@6Js0$Gi7vAd=`H@9GN&tcxR%Hi0&iQ`o5B%4h= zv|T$*JuJvCR1dGo?TIjHtZIM=J#JGbPf~5C6;6xzw>BdVKtI-5K4rnIi8ovqGKqpO z;n$Mrb3k+goocNfD5(BDJhfS^52{SQRC81}wGOE`LouW}MCXcCWVJpeLHZ^&oA%|<3OJ_nL;h+V@A=)?EMeI5cN&E#jMZac0p8?GeV~{6IadF!BK=Mw zz0~`FI)3&4Gl&$4GaPS4|pXw8YWNhO4xixqlaa3jxLf4 
pV(!30`klCQtB3l10+L`GNlmB>>9&ECB!j delta 127 zcmeBp%GmLgaRY~jFoQtg19k@Xh6baW1_ptE2fPv-li4)_Cr|K7*nCE#hh?&YDi1qM vB3w6c@&o^b%`$odG9p-XHcZy>3fMfs>kSJ+));PX!{i$OfXxs5S117h&4MTi diff --git a/hsa/gfx942/fmoe/gelu/fmoe_bf16_blockscaleFp8_g1u1_vs_gelu_1tg_ps_32x128.co b/hsa/gfx942/fmoe/gelu/fmoe_bf16_blockscaleFp8_g1u1_vs_gelu_1tg_ps_32x128.co new file mode 100644 index 0000000000000000000000000000000000000000..0a77a3e60011be1754e8d8933e14df19f0f7e3e0 GIT binary patch literal 29776 zcmeHQ4Rlo1wLY0UnItp$0}_~E5=V>_D2!o}U`z)Z{?Q@?5-_-^VaQA}fso{7f?}U9 z-b{W3YY zJTpK|zJ;Rs0`w`tRE$;ZNUfezQ3XL^h7eU`$wDSSq`+9FkeM|YyJiVlMXev@+EuNz zuL(w_dn{WLdeu({@tgRm{%Zf`o@j7V|Gw$@)G?m}KRVyV{?NJjcv8aj6M|i7Lam}Y z_AyZZ32heO#2cneFPg`g_g9W(PPfNlUFxc@aaDWnUbD^WsIT-KVWn-qvO3BemOHA; z9OdOMkLU4vm%~|hYkgIN3oTA(J#=NBI!C$d=atp9iyhV5tTm2huAj|*)LL6n;c+#v zZB|d!k6f|8wBF*VUhXPeT2!kn$w z6y0H*E_I4i0x8yoIa{wOt_b6FiBo*I_=Z4^d*v|a>ov)}E5mqQ@+?b=)cLq~LzwgR zp5^^vye@H;TZ?ZDB>7O7)AgR@FT!|T;v^p~p4u~%Plh>P?^!+*#_MadRG*{Hs%3$} z@reL>^_fXe-(CY~9>jZhMjBvuqyhH!A;2SU?I3jZA%b=_viG$}100Gpz>z)#*tXcw zQ0@*)g=2jP5i|w%MH=ACJ_OiW>QRSa zUli*@0PPs;i_%9NAvWR&@qLLfeV%3RT9C3v7K1A3$MKcVRJ_HD! 
z4p|XL7!h%V(S3-pb+%?0vilGqco=dcj*uU51bZJMJmLzvEeiV(B6uLCL>ys8#1Urp zC4x(PUljEvLQr`qi8w-O#1R(uA;LCHh&h^DV^JSs1kVU(#1Y&PN2u;YghyOcv;$Gs zhY-O7(GYQj6%j|cy$=z#x$x;#GbQ9c#0VaVmWU&)j5xxYK16t=p|);m&thsrA3_8V z#Kwpt+#hj-Eq#daSVL_CUJEx=YNo|QeTWl08jnOA;TI7{`0qYMc(l$_=5*E6_RNeY z`w%5~EPfktgl8g-@TZH3u-#f&U%R}n%;TzD=BjBZtEgJxa+cBS|5s>_>34j80UHYe z14ngLWsUyn3-si&FUt3fb(v#DSw*#@q3qV$`lU*6-p0hH!QlyIH_gg(U^k*ZFS}w{ zt*dNt#n^FWi>qtPmwL(_)vjrE`DK-3mya#G#Zy-4s$O0;wxO~N*DA-hVyrEH>L0Cj z^;Nex8eD;tdU9sJwbr}-1DhQ|r@jXtprWFFY4udr)t8mmF0W~*d(`T=ZQ0`5>cbNw zOUAm?iT5;txchYM*VbDdx40_mYcZHKCWY!o%`L8aPgQMAySj@c_IkAcQp-RIyFKcD znbKGY96&o<>VBJ|CwujGD2|h{VYETft$u1%@^{uxxtq4i=vUopy5Q`Q^`AxNj^RUY zz5j${J>cs-zK^CmqhwFE#;^Q$%(f*t8*^Cee!El0a#-7bTcwWKDxGc{ju&xV z>weqwj4i5IvZTrBb~~2b_g{`B71+s@J8e}?mRGriH90G|K8kdZC176k&PKu}le$nSP@N-KJdwaRvD=&81+adTW8GL+(L>qj3YujM(S)h>P4Zdt@kPN;; z!We_Eh%naRTSTZg`05Db3_h9AWbkbuG#Y#l0c6NelANsb`2dB$E}##12FM2Jd~{`&&R0n7z#?h~E~0i|9kmbA`DCDWKF~GJ@#pZluo>sk!)$Xr&VwIn z8{;3t+phLs%`wg&$8pjWd;6r8@$GhYUq=^v0Ke<-`#64og5Rz9{W*TyJN#@@2c6N+ zeyren1%IjFPZfMv!ABLmhhwTgm7g!opT;r4pTKdXe!3lATV88geeukN7nkRGzP+im*J{R3M%W!BYE z$hteE7U-;2S-$#e?~PGn?8vUgZf{@DoWs3NXO`XRbj!MmMkW^>v9ZxHKFG>(EZN4` z?K0WhM~{}}F=M>wuK2mpQZM z`sZRC{A@0+mxbka+iqj5*k8M`Mg1V}pl+Y`4jN zTaGMeG1kJI7O%4+OI$yI@Ru>>_Nn&v=o#bN z$KzVs&1P9nOY_ETz%_-wg%xGli;CPn9NUZLy3OoJM;bfYF_itS<1+R}$JB|?FI*_g zKm4JW;?YAY`Z^_6e7r0tB#2nkC{{_9y`;qL!?C?&zS};()NL=l$vutyaDV&RD;=2f zY*rKQ&+*U5>Rhys7$M6eM|wq{24?vNrnn35w6C}0J~pvI-AsyNCT8e1{U>8Kl4GJ! 
zo1K;xr!(H<8tk4(w&U&X2^c#Q=Fv)eo-`o`=%TNuT*u4fEf!h6;tDUfMO@zAVsAGn z@ePP?2ovAHw(K{!bc`cnNB;nElapn6zyPnGZSN5M`a=hCd%WXkxBJ;2J5W|w0s9MT z%x^n%k|D;=p631bv){p9k8$`tP^b5^wvO*T{>=CNY+pzGu=X7`_GJg2m5pO3?d|mI zZf?Yg`Te>@gm?ILiyQrS;9{~ho&Lnc;dJl#lao1m6BDz%$;nwUP4@MZywWz%{@`;5uM3a0754upO8J+z1>Dybovw-VYoKd;pjZ+ycx5{sd?NJ_H;F+zQMB zJ^~yL{5fz0@E5>Qz{i23f&UG(0)GX}20jV20sjY>3;Zo`9PoF*eBd*{0^lEjcHp0Y z6M!9!RByIAW8=lWSrMNT7we|Fv=W!X9vkcS#>QqfaT{ZN6RQ}WV=}s9ytqe@vGe%G+f)xZZ_ zqcT6Y^w_pQx6w5)vkiEkYiQ;g;1-uHa~<#}u7XS&$Jo2YbuiwG@ud51#3q^a@tegs zCq6itk3;NJVjPW6V?5}-9KA`V@rmCq#wl?}G4HDx@u<&3@5NXk-pG707U`SCSY$pZ z#=?wv)c2uJqtA#p(k{j#eY+To%pIP@&xfHebH8D~vwCcM>$kti&gYkXjs$58 z^QTU^ABvEd1=o~tSnK2jwO)~&wQ6z#;qbMj7j;+8;`hffHkHr!K_%pmYfU(8TXBL~ z=Sa?atT=&i_!GqmKH+zqP2+wkrO;>Nyo9#aa(-T_izMf+Do-FBw-)-V`1#&r)45;j zLh{4;2yNRu{CreLNzQ%DlR!A`2~R?y@H@q3aKFJcuguE4D#V;9a_KTA0q44m88d*T zu3;J118;JT%Ak3r+D?J4#5FKu8gRaAXvR$7LYFOL7Vu_QK?aS_2+RY*QTJk;lzL9` z=*?pMtPf63kVo)-oT2d;f%!l<>S>IZQuj$7yns;Vp-t{^zn0Kme zZ{xz}9qJg(yKCiZTP0a))#?i5g4WPyrTVP9Nm=KysG{E&&Cz$xiwdcy&noqF7u7}F zpW1r*HP8PI=)ZoCQcw4h-@jfDY=k{~Q>mvXPzUJwt*xhY9&J4(S*@ohm3n#_eri3X zc1=B{b;JEQM>6_CIB*wg2(1m&db&rcr~8z8id?pk-3inXF}N!*=s!=(PP_NU7_)e?N;jPL5wHOd$pb(R_f_drJjDI)YG#V4{bd?r_|H)N)p~kZsi#MkdWv{77DLaX&)Rx=PN}EoQBOzp)W}}@J-WxXw|;-4dWvgG z_y4e6s0YgWkL0X9NOnTn zOK97LdZ4T`NzUD))YE-BUAItO$j12yZ6`wN=}D!Yp7tcD>mzlY^Eu{~S|fX{ z8<SQm}c2+Rk~mnB3;R57AKh(B zpwElZ-8sNqU@kBZm-qgI*V3M04H$!fPe5b)gaYx5C%Pny(W2>ps z*p^Ubyff(=^au7iu+N2k9=E^v{)~xlmS-vNJ1(|gEv{HoYo*cDRuyl$b7`WfWfkPR zAg_hIKG77oR>{rs$^kCpszH^;)u~m+yM}y&zQH~R_PMan0r~!yCL5L`CiDI28(NDZkE@ixr}Qst2D05s4}j<{2TNK_BpW6g?%2k zzxe))Nx%61tZ|u5Yb%GE)>WmO)-TO8t=SCu$B-X{{L@U+x8Z#tJ-ZA;^s--<-?-zN zwmiSCfMdFJpj(zadO~lEN1p*~G2P0sE$LRSUpYv$uO1@W*IXvr*IjJ9Ws$*dkfrVs<^3owC3R_5&`vWr0iBOfFZ; z0&Ca}T&|M^*0Wh$Zjc4u!e(=Mg)Hz#tcc6E%K{te-7)oX(XM*lH^+;&@V+_s3EB0k zklhD`T=i!mS05GfvNwcWcS6Yj@ve{?P78VYheBR)R>-$~D&*VG3;DnNkf-K1W=t(; z^y_H9scy3E+I*w!dj-bys3=pqB$?8qqfO~CF(#R9-7ibnDJtB$-y0*bjJP;chRI~g 
zNJubcBqf=e(B5K_SQFaYk|dTsV1Oxo&>&NKYN{!H$Pmo8t@~FFl9(6mtB1fpEzOj1 z*=44Tj0{u8<(He9(Z1#~?4F^0-R1C~YDZoZ8e8!$rWNmpe~N3?;&fZ+T`*lgyf4kN z@%O^YwZ^*-oAR!i%^7;gCgX+90$>wx9?%Q?A+Wh;9?-j>Hs#&3HXfV9uahhb^s;$e zZjlAHu?1XSDGOZ9e#qrDvcPruN?o0;yjP|D3wqDV>muf#4{HDxB^e2I(NYqBu34Jp zounTi83}c%gP_O#Vb;sLA?smJXo$X&+kZE!S>8)}*b^F3bCSA%v9KdFMo;E;qenE$ zuaO>hgvQk3q(i`X*byd1mvg%@qkEnQN2ne4go&x1q~&<_Qm)ByQWD|71SRKek#lm| zpd`w9V8);%$}uJGN{)jQlw5N~t|@6bNtEm043VojZZgk#Xo8Y!zR1;_R-8n+4$Tm` zrpJ}@oHG-YT`~u-x$-I|vvt)jS@Il2^iTfJq z9+S+k6FwoCj}UH`%zq=?A(^%7uzWecb(TDg|(vng%H0N*9-lN8o%c?e!Icip#JR+ z@!NBc(C^Xs?bP_abgyWCDa7yPO+x>&#&4I#Z|`Q&zBj~g-;afUpT_S6jo&K|iuPAR z{9gU3(7&qjdr{-p^)u1l72?n_HPsI`$PN=JSy}DG=6(Dey=?y+FuLt zJNQeXKdAA0N#pnW6Qcd~5WheFTIm0*@q1b0cjzh6ekjE6@Nb0vu*Pq%#_umri}t^S z`2F?wLjPBd-#(4sk?o@WNQmFjKMMU(jo&L8zrXDe?SBjLd!tk6-_ZEIs`xR-1BdiJ zJj>w&w08-XuD|0CdD;1=L3J8DtjU2DD2>E)8kbh7py=00;MtjLrMQ#@D^QUQKw9lWe$StD1 zbcRMod+BwG+$!2{x?Urr{iYu%a+_#hI8!5|ec=s?e5Yu?d6q^-`^~e3{Jq#%S&obI zs_!R_x|YBi;TWE47CfT~hebE@XEc5GJx+Q&n+b=d-UGc@L%#<(3-*M=qaWt>^qKV( z>0wVeJoPt8XMiJMM>s0_d2UCaK|diq>1m(&h>LTl>B ze2qZ4X2&%r5!w=zobefua?MV=CyCIOaZeKEm>c&n$8iZtuK3JGIp(JQCW&xdhR8KP z?s=YbL4uMiK7&z?`Dq7|2n#YquJ*XMdCn6Olw9$diE^{2eVjx%A>(7dMo`}$9i!*) zvy%BO!d;U2J;H9u{6679u|_Z}&+;>*ds8z1gYbl8{*3S=*btr-&wVxSankLP%3`2#+eZO59IKcTzI{lkl`;{*v&VWd4fq{Kc;kdh1hf{ZQ8kz4f8DylE|P z>{(I6j)l~=H=h^!H#N2Gn5HJawM(?W72@~yi$edl#_vsy-|;=7{dkDqiI;``gvReJ zjo&-_MEg4-e(%02^zUl?-q!fN*Dc!L3-LR7K5bgg6@%#8)q5oLpcT(f`$w|@vNr>OszYG0Yjo%q47Jd@%!?eX#XH$5I?_P==~bM4;4SYMmVOd5#CbP2*;H*!aK?u;XP%I@V>G}IHjx+KImT~ zFg&Bbt!o5&=YIMNFE}AG|80x)UceEd_X$CJnY54hHQW&5l5D{9kvJlOy-czX!wn%j z;*e|v5=Z2My-f1c?q!m!>}3*{g#3sj64=Y6cFkTUaYM+4_#^iNi6cUNQTH;5D?<8k z91-naCUHf!klp_?M?|}qNn8;czi=E8?OrBvMQ9AdaYVFxnZy;Lu?WWz(e7muSA@pm z+r$y!^AlXQBIHF}5klgKsC${A+z@If?nnucI3j%B>Dy$Ii7P@#9Fb6N2M7Z2>nX`c46;)n#*Q*cE{pAv>65>!vY z71=^||IZwepn3|f2#sHS7>-C#Jq1^U#vncnM!vY6``?6{}ypX zg6b)_B6R-~M?|fsEQA|E?Zd$xDFG5kB&eQ(D?<7#a6}4)-%gdIO8Vj8jL`j09Fd@U 
z3a$w0i6bHlzuhV~g!JRU8KF8r91%XZ6s{d{MM$3uj)+hA?NzxUq#p;)$p1A*B&eQ( zD?;Nl8XOUo8$z-bToLL&aYTaZDYzn}9}SL($_*je3a$wCn>Zpt^%PtY(%ZoiQMn-` zPXJeh`b!*2vX}I*C)9F8$d0%n zUCY9EmGJIreizdU!AW91+1)>F0>d=<4T)i2aj(j)=1V)6WqR z92DY=e2e#gn$BIw5#i5Ua6^c9L0l2ySrA7=<%W<TgmNXW2;~}%Bf@h9H-vH|t_bDW&k^b2z4UWL#2TTWBQm3_pCcmH2>l!p zWsT6!5&5@RBdqvT<%q25bT&C<`fj;9PqlzALf<9Q)QK%b{?m1tK$9Yy6{Iak`dNto zXNrD|0+?69vtRjZ>CmjSV=I&Y*cDk312D6y27e{1z=FS>Wtlm1ip7>~#ebF&S6A28 zRJ!WR>KqMj`U_ZPp4zeoH@>;DzHXVPtlUvk=2>1>S6klzi}Iz8%Bq@5w7c-7p^KMS zmQ~eM)RtA$)|WY|tIKLVWp(wn6;;(P&()s#@~f+>Z&@}5nkBCC1}<2#olZxN! x z?wWDqT-gpsdA2h?%RX)!xa5PHwEj*DEiy;gY`m;VV23kdLhQJ8P^BvXDM+uhQwc>=${G6$EoP2 zxPs&Gi;VtOR|o6AAW;{~y4D2ie*`hy_h)X0KHT?bGHXq3gNs?6x7Fa=K$*3&X1Uev z@VJ@sFX1c2;YlCY=1Gjc8t15@uRb5|;so`{DVlZoY?K1iXRYu<|+Td6W zbi-R531rod#jfhI?CiYBfy10@mBX<)lWeMXk}Y37v|T$*J!*wAODERVyV~Oqqq8aE^5A{Gn_3!7Y&1!v6 zW%8w(qq?beNW~e7A=M!|SF9qd^(hJRKmGn%)nBbcDrPD6YP;&Ms972gD7Mi_G2MjWXAD$a*Ge9-0} literal 0 HcmV?d00001 diff --git a/hsa/gfx942/fmoe/silu/fmoe_bf16_blockscaleFp8_g1u1_novs_silu_1tg_32x128.co b/hsa/gfx942/fmoe/silu/fmoe_bf16_blockscaleFp8_g1u1_novs_silu_1tg_32x128.co index 1624d5aef6b2a995f4bf4076899951d5ce3d1011..8a462d0c0d9f0ce3257f94cdd13a3c6d75fab305 100644 GIT binary patch delta 137 zcmaFxfbqct#tj@AlVvmlL>L4DAFwm9H#8X4G%yGRJm8h!XxN;i@rGq`jxLg{F_P>a p-3aN)8lFh9NO~vdXe4aD;n~A7xxx=g1(N>Bdvp^vYxr9z0s!R0D~$jE delta 127 zcmaFxfbqct#tj@A!VCg|57-&l8ybvi8W;ou9`H(VOlH>z+?=EFhGnvXDi1qMFkClq v^B&y@X%Q?s8YYW)229S;NZ5SCvxfzt!WeF5!{ijdfXRDw6E-$?F-(9)I?&;biVz@R@S}zyGsy%(lA8p@UT-{^ zJOl*9hzb^4ic+hst$o;PTar;h!GiL%&_YE86$^@0s#tOEzxHEtW*8gIaKCH6FXK1) z&sux0b=F?9&pLa~{vGa^Jbj8GF3x3Q;$>vVS)4p3aLMD%mEvG}Jn8aTBL1ebWM+n% zY+a)HA`D1p3KL~NQmb=HRT#Lkgs70E3z_T?0%O@iX7+I0JYC2N^>!5N=EbsoRXA1) zkuBuTA;|L?aQJ2Hl)W0??2~mK8sC>apUU|x*wOjE>Wj|B^GN~ID*_K@2(?0$`vj;1 zXtM#w-!N%v!Cc1tKX)(n_-ftuCEl7!Z$<4rtGC+SHD$F&Sy9W+?e3DgW$ub%cS(u2 zw)XKFuiI06YfX8b7cCx74Rpn|)$S7SkIO2m7P%|7+AG~ly+4}ysJ*JRwANe4w%TjU 
zzvoT-sr?pr#WHX4lJZK=6QXPJqGhF}-kN4%@^H1=ql^-x82x0?ZN=P$@>F+QwfEuj zJV#`RkMR(zDl2aLw-FYplch&FS*H~4TZQc%Go*%(HF(& zQinJx6k>IhvvnHc@+d}^IK+qN+z^U!w-)7mod&skMHH(`9%W&Hnvc8JMLA#RQErH0 zb%~?gGUvunkej2NuJa&&62R@NA4t96J!6QC>683h%gMKx#`(UgN4#(=?Xcru8UF5DS@r90R;Q)n69{ zUGWfB9tvajP!zL=1zqs46%%5X?$TJ;1s~xv!V|LxU(6mVy5Qjv?<9R9s=MGKJP~y< zdsrT`hugd0VXGJ4UUgGK>w=H)Of<#pVMWXyR(HX}BXw2POF9-)>$>0~JQ3?-_OKyl z51YE+;jy}^Iy@GxE7MJj&0X*lo{dLh_VAOKJ^ZW-9v-c(E%taTt2$=JlU?uMl2gqTv>GsHY&<1mKBevD=WsQmFHMK!jV7u5BBPs@>|?> z-cV2-QM2FJYrOx3?T)aM-;1wMadAJj*Opb+6qi&jtE{Vj)Lwhr(nVDjM<&J^j(v#- z&uT)R_ZZkO?6kr9VP&K z(oUD!cawVRuigW=j&_b*X3{;~|G%&Q_|u2pHR71w^itbSLK3zt)-1&A0BKYKU4%3e z-K(eWIO6a-knb=ivnC8y?ej%+`HvGH zlCOMjGe5V{?QAV^`n5$KXRFa^Y*0C426KQ6e^0M7zhPeXt%XcsUY zr~z%j05BKGdKm&PY6lijJ8&Vj1FNaMw;`Yb_49!~pt9uvB4OvZWeqPF3| z;k@nY;ME*^1bc9tFv;0Ep=oTZlilC8mpzF0TD%{}`v-V$!TZN}x3&e@18sE1Ap5?= zyCwcq;twT0Eb&o^_j2qT?90!W5zOG25=`MZBshd)Rxpd>6~QYwUKPBGW6xkuj*hjtrPR#}&f~gy(B(+o zjlmna?ndaYlDeW`5!V$#H%RK{2j_F$eCWO*bv(wY&ej4cYlbcLb0E(S$v>_ygj^Vs zAFrPad0t5VdHp=dMIrfD^*2JkDJ1{4z6kPyko^1l`H*i8$!+xsLF5KE1%ET|H{?5Ce4a*HNrvwXrsvrd@G`pZA4d-rnSFHB0|y z_%}!KZ)ThJo4p3c5x(QEhu`#cP3zgyA7sz8iE;hDjkrDju`^qP>LXbVh#~oz9MSm0W@H?Qv6l5)J-+uh*?*!SNw&a1W+Z^n(Hry-gM@-n=9yHus zkBkL_hJ}Q81PzPogLmM=WUJeQsi}kL+6ktobM&XC=J?aoa}pYycTeye*E)^P)=X!n zre$UMji`IZ2IJaDc_4Y9l8r*XQpuPL)Sjbc z;z>ciO36f8f;>pc2}1q`WXE)z4{Zr7LDTYX*RJW)N7MTD^&2zAxLNvwDW&neLGq5`}gI2E@3_w571Z)DDfi1v9;0j<8 za1}5axEh!WTnkJGt^@W0wgUSA*8}?j?+03e8-N3V4+1lRn}FHC9{_E@&A@@cEx;V$ zBfvqx9|H#ie*zo|d>l9o_%om#_;X+`@JXNp_$%N@;BSDVfxiXj1D^(t0sbE71pW~? z4%k*t^=6AFFCMNp)iHSK4yiZbc11lYrXG!uU_;HOSCHeeG zNjb~YR%5BR3h&R47;wvX3EZv;+!+wKo0)j~UYEdwn!v*Wfk!o4#bHdJH0=tG>DoY! 
zJ+&)2_Db-Z+9xJ9wO8Z%p!2O`A0A*0#w0>R^Tbp>A8ZfsIm72G<_y!%SCSDR^*61Y zm`Z3~Lw0ihQoDZsl8pJgp3d_lV-lfZ>zq_R-*Uux_`Ifm`gu(<<~8*qt9d_W_) z&ttW`z26hitLuiYSzOO4)`gS(%V1bj*FXybJc9Ujm zHp%->wY(k*5SI;~DdE88i7BdHk({$~VhZ7)HKZ4HSIgno$8k29&-dPiWRK69aNyQC zDXPwqob%Y66v9DI%t;9dyAx~*x9d{`eJ;*R=x8qC=cT$x^2n7XDTJffKz|iK-#ct7 zx9hur>~KCp$JSbYKB}W6k9@2)g>du}wJ9!PcbZM(cKv8xS>?QPVNMiy4OtU_v%N`K z(||?Zfmzo9Z}JY!qIsqIPJ*t`+be4daGrNS)^y+kuOn*)@MiCrEXvPd%mc!q_aRTR zo|8OmqsX8Ap@}KlU_Oqsl#juf4}?RXLcV0(CwbU2B474x$d~B<5#=Et{t3t2hkVdH z!~KC|=SGpIaSuU1g!})5#-9)WgkzpUerUeoK0>nd8IhlH+mIj8|1+9*RyprF%?sw8 z>f2df_`E|Mqj`6Yc1^QUGdAmWW3(~Nk@re5Zd z5!a`_o_@joe;vke+9m7h9w)#?&t8}H^d#y4-M{tql+L5CrzER-dP>&QGq6+j zl-hOml-3O!aE^40g|OES)DT)5sCv3f*3&(*o+6e_)bAu}2x1A<(^ImZo`GJ~Q>xvn zo(}4$k)77l!5w`&t1Fbx#C@`!9zs58-m7|gMAp+|vYvh*>*+bHVS zTnspbT+lpM_4J6Wr^jSHg+I#0fO8nLzMh_!_4ESj>ClcE+3C24b@c6Q++V7m;xnb| zf8Y+(1G)YqIcJxwr+Y{**RgUvqSh5u2Wkz63xi~MO zV+ZPiTxXIza+j>9drDH&dPl8Ws4nE#@C*r@@#HghS6o)YJ2_o?bw{ME`@7k9_zi z9CHYH($~`?vYsA8o<#q{l$U(?CmeGwqMn|Y_4ESrBl;hud8cY*r+LA=Q++$ji@tWo zT2JFxQ~Qk<{O!-ON7|lcTidj_Keo-Dkkp>zT;Dv04e8S=TpSBA?AfDlwUp=1HUaMtI*#OJh@=VL!OR_Di zH$whCtFx&EVc^mgn@;?i?;+T+^z9!`AlS2Nlw}o7H zM##(F7xMCRLcZ-|A>V#M$o~_BJUPEUYx0=-pn>*PswXEeWLYehtdtZ>R$7{+0qso|BWpl=OPY~o_UviN z?A_av*|)DHvwwfgw=Mfu^fod-+E?|5eMW{Q>$1x%Sy@?@tjjOAG@^a=W!NP```XK4 zKiP@6#??3DnL{(4gZ>boS(C?Sqi3G<`Qf=>j)On{l%F-e<2&RtRW@tDVTXnX7xRG) zz_~y_@Vmgqj(I@OJX+*4R=qzqi+@gd4(o$75griS8D>-=F7S^ zQ$9bW=k)YEuWqJzW)Z->j>Q?12o3SZG+x(jjoNn7_cSIE8v6Ez9@mG}r0s-kLVrSY z{FS`_H**@b-K0lWxfmWb8bV^-n?{(O zRh#yyvtw-?aQ*Utqrqr>jen0jT7L5kZASAND`Hrn511HjV-CGXQ=%8$)Mt=dx*g0}Vz{0D}7h zC1wNa=RUjpIY#v~Q2F z+i{oB@6g#jr?Y!ujc9)%!tTYph5kjI-SaxTo%e|Loe_4s?iKo7I=k&UyO-`0?Jq^x z{pkUr|C7#cht6*IM$x`I!fwy^g?^9D?ggFQ%MXe6mm}<6`JvFiqO*HZXSer9qJ3|K z-M)u~exJ^6r_OHwR?)sc!tTJMLVrMKw@YVt@G;STFv9N8Plf)F&h90h-K$TC_E#h9 z{`?D}|Fh2SPddB9zZC6~`zy{_>P)|4W43UwRZxME{wF~`gI=fe-9dkc;*c8CMtP%Je_6*B`+ktlhUx2;_ zxD&V@_!9Kr1MUWH0>0dF?_=(Y!+5;nVl^Dc+k`y9Cgh1OAy0A%dGbObPgyABsjQLP 
zO=IghUTYKbbv7Yi?-KHKmymB*DC8Lng*=niK7HaoZ)4iDV+F26mV;w1H$WaQG}PDcAp z*9+Mu+ZRmN$!K42gOCTx_M2zuWVGKrQ^?;=Ow_a-~CRI9`|O#fqn0VUaXgoENA=KbkA-)YjLKjEOhzfL;~9E^U1L*t+4 z{pdT~N2EtT!l8W+q$`-9 z#2UdW@8xGn_qx&g55kj1>nDUCpbz0Wao<;dPmpex(fStQ9;5X%;SrnE2%U|gv$$z3aQs(w}t*g zo!u#&-AAWH`$rLW=l(AA=X7@O>g@h`MzsGk!tUeuh5lom-QRU~pL`(NKZ&qA|Dn*I z*V&!c*?oFWw0|05_pgtI{$Dz~GdjD^&WrZXBJ3{wOXx4??B3JaeSSf-e;#2M3<`Zv zXZOCe<74^&w$$=m}aU=e(xe)q2JK{u8euxX9I1nTgCxXULTnIkDzWBFL5FGe3Kjq;zW?1xDbj1K{9b7 z{!_US7uC~$!+|h>GqDciB`!p$o>G5(JtY~OGU{grS7JSoxDXfBQ|hO$rzER-3eE)C znZShz)l+KM)l=d?P#-gR5*vWTg$S#s;6#wVPZTahSUm+NViWcIf967j)l+aHD1XUO zxDa9W6r2dkL2?u>L|8oqCxUX38HEcGR!_l+pj>2rjkpkD^%R^4y8ej^q3S7%;6PCO zAaEoKfy9LftEb>ZkUj@o2$!(iuDF<_9|Ud$UH`;|2&<>yM3A1i5Sp;tsW=d%9}R8< z)dAu{@VO;9WyFaf{YY>j0>W;$;y{poG`JD}*IbCOdJ0YiQn+ws+h4^)IA8{b|LQkBCgTQE92kJNhz@>7Hy1*z5xTh$)An|AA;cP?n+qY=2;E$W|A;k0+sBFvvAW&U;L+%}!g`-+ z0xyF8KZ}NTY?blX;mZVCq->Q)TWj?4;QtMxSCRnMBJtek!74hm$#!hn@z*3NOECn~ z%PaA(9*wc#-$JrYpFYXv$hG4?+u*A!sw&I8HO1BLIv@RuM#Z&N#dSXX5@AjC(%Rw@ zcV%(yvg+!pnmY6-S>i4$uPj5m7e4~IXjxftd1Yx;acNaevAd$8xT?0ex~8hMyuw?1 zb!|<_)fE-DEFBKbVsA+u7i_s6k2}xpDIGQXTcbyNbKUNeTu*M^h;QYUcs+UF8g-3# z1pc9vU$Ql2TF<$55kcuA!>8~M?ZC(|wC%4gl z2k{TkNBbQ_X0NQO^D?{Vwo3fABeR!PF0=dGwLT{QlD@8n*=wpi?m9QK*Lj!M@!syG zT1vl&$>?_?>)eZgK3Iz*fvm#4$XijIn>%V^=rHdZc{n1^ zFTvYLkcaV;R0ntnnN7-{p!!b9of7kJEynDCeyp*0()=0Y zZ@4yM5`}++UrM6ufVR^Os<%3jsQx`P)vW4+lF628jxtkqNZ~Zuk?IhgD^bd-KBYna zXE0c$>{T67m?QhEc4aTsECUDRTeYjYrO+ljs&*B>!pqT4v<20!>ZC$dC#4%c26asM z4ZLYw%3l3Gs6rRo^e>lQz%i8{vR8FH^t+?kqGRaa=@X*ptd4=B@FrUt1HIIByjfo7 z^gDI*QuhIsKm5cOe~rUm(kc7&yM#V{iNLceQ9`_E{5L>P@#o5R1sMh!ENT E0<3?5*Z=?k literal 0 HcmV?d00001 diff --git a/hsa/gfx942/fmoe/silu/fmoe_bf16_blockscaleFp8_g1u1_vs_silu_1tg_32x128.co b/hsa/gfx942/fmoe/silu/fmoe_bf16_blockscaleFp8_g1u1_vs_silu_1tg_32x128.co index 04953c1a46808d3eeb41cdcdbe7bc643c240e811..b7f973d6c9e9174f3392354aca10ed813fa20965 100644 GIT binary patch delta 137 zcmeBp$k_3aaRZ0OWEqVB5e9+42kZ>&4Gl&$4GaPS4|pXw8YWNhOxS!zqlaa3jxLf4 
pV(!37&6QCQtA~l10+L`GMaGMF7QUEAapT delta 127 zcmeBp$k_3aaRY~jFoQtg19k@Xh6baW1_ptE2fPv-li4)_Cr|K9*nCE#hh?&YDi1qM vB3w6c@&mtw%`$od(jr)NHcZy>4A?xu^9>6^));PX!{i#jfXxs5Rwx1h$Z;q6 diff --git a/hsa/gfx942/fmoe/silu/fmoe_bf16_blockscaleFp8_g1u1_vs_silu_1tg_ps_32x128.co b/hsa/gfx942/fmoe/silu/fmoe_bf16_blockscaleFp8_g1u1_vs_silu_1tg_ps_32x128.co new file mode 100644 index 0000000000000000000000000000000000000000..4019df3d6b9c6ce5a698be4a953b3c637ed2a401 GIT binary patch literal 29264 zcmeHQ4R}-4`9Dq4^dtQO3Z~fBh?N0@Skl637g+cxphGCMVDVv=(4AyKBY4%vjPDuz38N!jhN~ zYO-~R=FgyCI#V%T_9L}=PN@n3N0ty(WNAVsJA}Ykwvd@M6uV~%Sw*cK#oE0@wyzGw zdLgo0?i_+VpALs##7?zW$2a#xql?D(MbD?^d=BjBd|&oO=i>RKfaw*6-5ElyqMG|S zsQ-jE3vkjkQ>PctW6bvp=Q5Ya>$EO)H`KaoymzeGYIQbLd5^HN_Fq_?6^+ZCHRaBV z3b)t$NQ2wyD!;Lzy3vgmm#YD~a&Nt}!u^w~n!3f#nyuDa=Q8(?XFp`EtE}|88`)N? zxB7eT_@7yCaMmn$moKfZbv-J&mM>mjS?O+Q6DAMTJ6)<#d<3JPF21RpyHGtf-c;{? zpgPYM9^%71#Jbv=oBnNtW$MY&BAl$(2+a|UE_8&`)d(#SPS$ILBO(}G=m_Vj5#~lX zS+5ZmL@>J05l&SjbVN8=uMy6OV03{a+)_424Y4@F*?JAJEP~O64sm)g#6=O#)@z8K z2u2q=#Hqm$>m!`4*AQ1kFuK4YK2UN^FvdMfg!AUKf>vH5AvrGtS)en50n)3r1G%{=j%PnCn8vVZj|bM)K$GKm>j=w zU{LRwbocGmf#ycMXGgRSc17!8PahmS=+P#jyAK|;tC2k~M(f~Ev<{B+!NJzW&c+H) za4H<@gNu+X9FNw)o6$Nr)dvR;y6YPngGo5k2M-}hct2VPA4coolRh}uQsz}tus4SF z!GSghdt(eydx($PLsDNnOrNL8Lt0-vXlmo$nDnSUn4|WP*#{3>rfSk)>4SsN>5vn( zhY?YG7~KaCTV`w0klP0bp=rpE+CxFq9_)Sa@Sr>7vT*dlMQ9?XM(trn)E;K`#e-XW zT@?4lLr8fjjoL$5)E*Z0!NXQeh&h@|V^JS`gw6<8)E+!hd#LGyhX>tLwTY3*! 
zJyCnu+y@U2H`X=cv2bIRW?J0e2S1_Ncra=YKaJYM&->uvp?Yt*%UxU7Gcz9RgO|`; z{5EP2PekqEPhZBvHfvQw-SYZ!ue)lQySA~svU-KvRZfrppQk;h-|_tgY%ByjIBTk_ zY7LKnMo%VtWBgB8mpNCISJpTi%Wte}SSp?KHYPR=4o@tzaZO#tQg4N`#yzdRpuB4A^0DPNV4I@4W_kJ8#;S6BR(ZA+V{HXRf3(&&RNvri zbO%G~iJAS@+Ti{!HakL2eJ8#^#l-x~>aD78D6gnnUfWpzkkxzBvc+{ZhbKiBjCH9C z&uM~w_v+X$tv5Pva91|eA(@mDhq_U7gS)|7U02(o?jni39__!>GEk4*9(BJ=HadU< zXopMPZh??1e4oF005t6NPt48C|I(T$%)XO8p*qxp9S zJkp*H19}d_kG=)PE&3$Zib1RUeBpim*W}eqXIZT?SONya@BqdY(CGv+#<86lEnPU) zF<+;iwR9R-Yp0IA)~Sz8X^IKNv=nF^cQUiie?L;{06qr1#lNO&kz(wc z#M-)?NvwCc$`b_{=RcbWexeTRlC@@swJ%1Rmt@+q=T#nj*H#VMA(d&`*Q9I z&ABVS^tp}v+=SWNA}Mbr+gr}jZQzXH_G2l|Zj0(pEcY{GdA=(Z^d=fR8G z#stRjwkrZxa7+j!aGX5V-Z8m#LWiB*-Pz6V#d{szkKp}7ytm-}6TCY*18id_oiV_E zAn|sIKa=<)i4RD8NaCFw2L}f8^JN4wI3@>@IgSjB8OJ*mHzEN7!?OJxADcggr;tbNrU$@BxJ}v!eU-7RrbIetU=AX6Twbu(eBJ-JK4$ zuT$R&ozF-vOJEbzHxTD zLjCQdM=Q#hF}}Ogg`byzY4|LL48~^-oDa+e-VJmBUjh~bhYZ1I4V(|G2U4s)=E|8H zn2S6F*j#*G7M8!yb`xVIffBAOf$mbND-D!#T`6?Kq;6hd9@ou-?lP&HADGW|^P$U; zx@!a1a^1DiT`qNHfikWugKoIgEeI^&x&_dEP3m}z)9oF_QdV?Zno1zg4az@hDurAc zlpkrD2YG%_{zcP#$YnwK*G<<#zAh;LuBi<2!l3+zrUj6%56YcQaRKB8HwDFRQv$X; zMaf~TmANcFS7na){5+O+k0&R;s?yVf&okiiB*o*#NwTlfkfVdWf}Gon>>aT)CUi`| zXK6Q^6(u9X7q=dtDfF$ZILBUG?D6B+UOd-hW=A?R*wN0R?C+fyvzI%ICPBY&p`v`} zJ3jJ9H>udGq_3nTMM+K;zNV3{(j0qfsmG6Fd+B_SeSVq8UUr>l8rk9c4zTAtG3D8; zW?Y{Wo>0`eXdf{`QAUpRi7^e#@efS*IBvDyZpU?OVuSWEeGD@(<37{>F=it-?i{*7$&exqB*IKp@Ax8XM}O;H96@CDemPBE@ObP~76H(_>1fc>!(WrY>8zoN$c zwo|7!#s%2peB1%{JM=do55EWM3<1{O`OQb3_*Q`J?MxchvE9Z#>BPOVY3$@ZT>;(o zO~_avpj$+EOF*}{DR2uuOtz*gkdiW-uAM+y8b@DBN{%lrEhnzoe*0vfew|%!@5r=g zDoR$CPmj8%Z`QAiYo5?Pq1o7_G~3&37;7BHk_U{ZG1GlfpTg%Gji)tkyD}j#fzLhr zgqezu8FN@#@iEgy@d{x=Mk2?=p@egeSy(LQ=gndv2l-M}Mjohrm@4aqe3>d^E>L@p zDico%^5v>bq$SA1RXI+`UxRF$iSwZ?j>RcT-p$ID>FJ6xc(6~ODaOsx79JbH+7*@# zGyoxwOcV0Rbf6jfcwi3it?cH1Ov@EASVUBfcEUxEdbG-6Vwkz?X;+TLKf{gI`oZ!2^g+yk@Qfp@!yX0HKmcH6So0e|Q&%%(iX-yuGS#JiABy6#47 zR7jt+N#r@@z9~Ep@xKyzOne;qpzCt~j*k z42q%6An!l*@_HynTo!z$gu~jVB&+p`-v^bFJw9u~ 
zVOvX*)jCIV&ch|igu@>#N%jl7<7^tYOD}^y7w09kwN>!*Qe7lDe^o^?;rO-CU(V0> zCY#Rf1}`K#oR849)yvOEb(G}%hrP*!;~(`VJA~aSHiO#@p?PJN^U8raQS8=bO$N?& zCuYq6mbr&zT@Ad>Jt~XlmD+bIbfxZrS<`^?-9xix0vEb%S+juGy9=`@KO-;?2uIz8 zJjr@a^5{(>f7bh^Br7BMIL=T$MqoY=j(QyVl69Zt(c476tlN<<(f@tQLjn8~7T$$? z&^*Kafn@t8k*A6GK|hlF|B%LC0RM!Ak0U=c-*6ux*}hHWXX19`NA&-M=ABv2yI%8x zd8hX6EiZiDp^nkKyHdHbO|R(Nw7Np2ur2&vsov|Zlj}SdQ~dklIfkxzF=6%eNm);K zQeDLLsja7*;CO zsr8iFHT9I%4fo(2X&4LPz@4Zev^G%d>26t1_sV*TST<9?6R07GC0I{S%6fVldbOTX z?N;mQ@SYmkYdsy&)3>*}Livo}C+q1!*+aJPtVJG z3V)P~p$Cx*n&)agJuK_#QCUynk8&~eEXJ&@r{`onJ&$@is;5TwI_}XueR~`C7pkZD zOzHX`wiES0uK!5R*)8koUee2TtXz+%>k6s^7JSx(!%l?N)048Eo-Roau3zLjgz7;q z&P!<9iFzQ{nIz}$mi2URMY6iyQP(Y07jkhvLfeV3dU{gU)6?E$b$z6+b3VenQfp+d zbp!KC?b}=3pgfJhJRlr(5P8zp)5Ef!9)(`4$GRy`BQPHbN1Y9;r{`onJ&$~e{x4EK z3gDlx@F4P}t*3`&Jw1v%iT;NuF9q;VSa>$9o}QES^gQw-`X8Zrr`E_`^MZM&_U$b% z+S(ayJ&j?lUDuxX;rYyl&IOYb`FF8&d-}#Q&GEIGxUlw@Yf12zNmfj;0kU<=ST3C~I=RW{{M@-&T`w4{mbzHT79n0>YrG5DSs zvo8;r56lOS1C9gQieOg+yCT>XEooBR%+y?^Gx@3wrk15flhO>i1#%nYTa702J33y( z(<lkv36cy9fic*Z2PD6{1AhA$get2^G*R+VUKuTC=Ex-`Yqx(f1bkk>-KJ;fBv zRho!xrHpNrjBT}y?Y7T}?d$1ZK36N@0Qo-1Kgu?J6`qGt z?KWOyPy)JwrtRNo9~aOSa?I2p*r({d20~w)*N_ElHQmUuJ@rPeUpYv$uf9mMuen&X zue;<%1D>7b1mf_Fiq=i^{PU`N57{!svr76dN#A{2@T`yA*+y z?Au)SC<2$TnOv?>1lF=^xLmIYY+$pv+^7h=fz9Uf3Ps@eSTUDxRs=TDvr`)5qMZ$V zY|dwI;A3;`6|(yUA$txAx%w|ct~n~?WiJc4{)CXf`-YGkPYZeZJ3?M@R>(JfAmp3R z3;Fv2$VCNBSw)3S0UhnP)K9TpS&(S^W?^DxOpGa0uQz4J#+owY;!FzLvR~0-r^K;k zzb{VDvJw(ZStgSyD>>Pem6~d5MtiGC&zjNRo~mb=0|uBf2Msc14jydEyyzm#w=Mfu z4$?Cp+E-r$`-}`z*2Nc_va+&FS(jX5YC-#&i?Mrz_H~!QzQ~TaCN{Oi*C+zl709|aTRvB$=lArSkJly4KR@n) zEJmM5sEgI7^15zmQFf4ifIg8>H+T^AxIWAVWfx=v`V$&sFXR2cp3|c2AwBvN8VBd4 zb_3(lk1#QI3hy_1M2qqw>Culcad1iMAz%{v5vIge@P1=P_tfJf)Qu8&k^3GH}k%8#500!(*0O(-cPtyZ$3o$E4}%zgpZ@|#2&vpNViIFeui+Z z-n@r!livJ1;eE2N^!p;|9@d**B79VDK0>%nZ~i;scD-4<4k=Iy+GZ)U+GZ=W+rsBg zhThebfoC|qjj6YB427Se@YCBEdW$=k&FJp>9+?DJO`b*P{Ftj?zx>7Q&7`X`~! 
zlX^cpy)VzbpSM5NCfc70vwQkhp?_Lq_oT*d$12gjBg}5+Z9>0OWA~KC?wPfs{h2Vk zXKxq!XEk1p?^VR_pHXQ`^TcaJIrq1146$~W4B9Vw|}c>-yddo z;31(upt0Mnv3v1h(f(qX-NBy;{Xvc0a~ivs9u@5`h1vb(mqPy+joqI$c87i?+7E@< z9sZ5bAJ*9I(b)aS$JF-o*9|^NN`bVKZsVa^xp&S0d599-*fL{ z&YDAb?BZYz94A?XJlP`TDGnh|bqKj=k&ve?67qD`!tG|T4IHns2>EJ@kiYE^@=S-2 zuURDIS&M``o7O(*N?I3FW*jYc#4`oQTyBOuNy^QleePt9jP|)xr0f&zrBgLB+DnV1 z+#=fNPt(X~pFds7t)jhbhDJtv*;P_*6YbYst&!1w-M6LOF4`B))W~RGc#V{A746s0 z(#UAPezuUm86U4G2?;*+`COu|HMmANhWnZY_h`alu`T=_P2c^FlOFeG!eN8&gkG$n z--Mil{)EG0AK?AzJKrhNqd(#B!M{m80~~>Vgrj1g=Kbh9-20?QKf+Oi52Stw9F2a2 z*4Wp0Kkaw6PpBRJ39Wu#!qYtF!ekjMzC%%r1sMlY2@A7CtoDT0c+3-%WvuwlM6ua3 z-b*E%nDriCBdE_$j?sPiNxk`1!kv2an}qxH=C=qBiZy~+-pkLB?iIcHpM)p$<_`(q zMIXYm;=Zr?Jx;padh_dqd-dj1gopL!w+WBRzS8geq&ulM|BLXn-uwySIlcK)!t-B# zjnLbedK-tjM(Aw}y~Ryyfn!gK8g?wKw!QMS(7&RoZO1e<@ztH8{naqL*Pa#n*EDvo zXzY&f7VXEw>`wey=uc?uUe(yWzE`xr9%lE(3qt>f#_lzZ-JAPF`ab`(6=rw(Z$f`sWB0no?(L(Z{p~QjcU~6ycQkfy zXzb3sBHGV{*}eOk(7&s(dsAcg&l95kpJ8_Iy&?4PY3xpF?A|{q+TRbeJNplzKdZ5O zOJn!1)1v)fVRj$9BlI6=?Eay#`|w@S{$ZHix%Y(roW|~y#_prDqWz;VyN^E*`j0hs zr!{t;oD=PzgxQ_{Sm@7d?B3SceR^KBe;Q^N2nc;ZWA~1<<7;or4;qPhjzRz$Z=y^iOULNgZeGWH* zxD@MgeGPlawCWM?&RBkW5?&8ZU7q_KBtBD5{|z@n2QI~WjF&hP z!Fo#lwe^%_aLuTn5gdvQK;lSzrk+wiZ9OGft*78pkevY>iC{gYc1=AcZUpr)f#nW zyB#V=lk~&EnV{>RI1(ZC6kG|?6GuW3cDqz=1nI|vGeLELI1+qrNv;`jB}ktSj)Y&> z?NPZAq#qB?#DAJ25mHaVm7x5L21i2WMv!a;SAxb*9Ep&63a$j{M}s4wawAB#f-6De zCXPf%Jq1^S^mcG0RBi;x6Ty|B@e)TOq@IE+L3%ql5-K->K1+BSCA7QQ$`SanB>J#10^FBvft$$;6c)-T`qWs6TNdNG7ht z9@3*fp_U^-{fHaU4LxxsUIa$sNKij5H{uZb5oh8E@bfqlJZ5ksC}!eHP>hi{5GfEI1)Toa3d&I;!04gkvI}OR&XOIM&e3PjQt#m9^OYkM?!El`Z*Fay8Ag2 zV*j9@BO&)c`Z*GUgF&2$ukilIljlCmk>K}Ra3hF!KwJsp84yQ88=0Z%H(FVGE7Fx-S-J zlCoJMZMo5FA^x8sdL;^AK8a^P4b;)0S+-*+9LtSNcjoW*Lx1r*SnwlGyje%x~yP}Z` zmRy(1ndfv>jvN1t@#EdOPG?1~D>rZKH}WdnuDoxIyV5-te^{wvamC1B*q=Ed3JJpo zWBnDOO-WLJ+#0ICNa}ZeFH}EF>hE6=svjfuHy4EJg&4;yQ$qDZjAJvd3f0e&el~>p z7h)WHA}k)4>_1f;1fB5@8RM;45>)YnkLh1sJnLQ^swcORem`*=^pSo)ky&f&8r{t5 
zx~Uew2Fa{dwacv@r`N;ezoc($VAh5@m$T8ytc~s!jl8#WS#<@}?wW==ObwR{6N6bR z>Xy+jXfpa`$wud5pa<6CNFb|mE_TfyNj zpmzLJP2X`r54Iw;@6;>Ds)y6YPLA?d8lyC)A9*aFx?t9%Ypx2LK%pPs7nbNcplx=Y zYM~xTRR4aO+N{hFrG=;-On zkHTdtKV+}g@!;>3X7lbW_;>w;C_1ahz;Sq!EscR*>UFIC9X|S9J9?@20X2X4NihDJ zh<{0^+B-SLe_5(5bO&%vsqoA*th6+tl-_vIhr|>~~3fLzqFB$o?OhJZ7c< literal 0 HcmV?d00001 From 73264153ef2fedeb286cf24900de0cf60a86ac74 Mon Sep 17 00:00:00 2001 From: feifei14119 Date: Sat, 13 Dec 2025 13:16:26 +0800 Subject: [PATCH 3/7] fix pertoken co bug --- ...f16_pertokenFp8_g1u1_vs_gelu_1tg_32x128.co | Bin 30232 -> 28768 bytes ...f16_pertokenFp8_g1u1_vs_silu_1tg_32x128.co | Bin 29720 -> 28256 bytes 2 files changed, 0 insertions(+), 0 deletions(-) mode change 100755 => 100644 hsa/gfx942/fmoe/gelu/fmoe_bf16_pertokenFp8_g1u1_vs_gelu_1tg_32x128.co mode change 100755 => 100644 hsa/gfx942/fmoe/silu/fmoe_bf16_pertokenFp8_g1u1_vs_silu_1tg_32x128.co diff --git a/hsa/gfx942/fmoe/gelu/fmoe_bf16_pertokenFp8_g1u1_vs_gelu_1tg_32x128.co b/hsa/gfx942/fmoe/gelu/fmoe_bf16_pertokenFp8_g1u1_vs_gelu_1tg_32x128.co old mode 100755 new mode 100644 index 46d316a2cb05f72cd93f91fa232ada9695999c7b..91b3430d78a403c0baef399f5be727aec232a499 GIT binary patch delta 3548 zcmeHKT})eL82(Nv&^FjHw8DmNM@A6mghHi7x9BOf3y(Qwf- zx+ZE^2Kxvhgpj#Rz31?TEhT{b7oc^F@GdOqnqeNFHQZ^M)DdikM5$x_U z%H8kn#4x33l{Ar!&1qoyZ|2xMc5!>F{`nl~Y3}9$jlC@W8sAr-`xAYUL2#n<;-N(i z=S?@%JYU>(3th|HLLX*iZave}V#rn>n@^oqa3R&J;9}|x1(#BN3NEM4D7cd9SMW;8 zqu|w4dewDIC&=h-2BDd3+%=O$=>6<0dUOwQ1qOne!d4?H9oVemlkz;#_ZAokYD}k% zi-20_!)9AW^D>YkQ`C3`PzNGGLt#=GuC7hW*N6_o1Pvx^{1~u=3~$(Y!zVzDrnE8U zO8;u>DM}#E(u0Coc!#F|FC!#Nfb)P!zy;990T%&h0GB|I04@X01Fj&H*r0cUJi!NA zgakCMpq;0@g{Qn#pxhx)?i`{#(KbYd<7i45Y)3N+cJP#+;3@AEDDM&|KRHBs_YmbS zav@J~e`Ju{{;gZkp=www{5)_wZ>m|)Mh+w*BTm!6z27)z(-zg1; zv`KlL=wK*B&8J0FMxHRNxI?0jV=t_JGT-3aZOinai#- zm)&44`;574jk)Y5bJ;p`8GK`QT}DjuzjYahE@UtMHVu&`EBlacIva+VBM~Qq71A%a zSNsQYYS@lboasMMnxC4u#6#0k3+ByFO%k;vE!FhQPffh+;mb%`mcVdMYX0C+{k)r+ zk85h)|K;zgsr)pfq)gHi%aT?}O&+|FpPrf_ 
zVorK446AP2P0z>GZ7V;k=}8jvJnZ!UF2OsM2;pETmS{BB!Hci04~8a3{1GuM#YQKk z3BMQ(iLp^BCPu;&QPC?+iqUvD9E!xC!+TB|8Jiq|x?lA9hvOsS*km9i20{^03I@ec zR18N#fw7=JS|5#g>x03!CSC$%-0zJkgg5(ql11_bnyd$HR;%AENnW$hY-zAryndg> z*7SCM)idKJ?L@TqWl;-PbA zwV74d@{ZQZ^zR3E_IES=?U9}OT|zZFq&|VDRSzu63W^b`QxyDTrJlq1xi>Gl8XXNN zC=L6AqSK@+@Ho94o;+{C g_5|z(oQl7wpFO7tD9p~I$M@j~C`i?>Rh2`30niR;umAu6 delta 4932 zcmeHLTWl0n7(TOYm)%X-w$e=zNu89YHU*lcEz3%4+3jUZOE23N+od5eASy!ba;Zj} z>P#)DNQ@6AKJcL81Ab@Dna%8}3C3uB zkWIe*{@cvX?)m52|IGZnGWC<<)tlR6E=7rCo%BBLnmTl{;;gQ96YFI&$>+WTasyhD z``YnLKEPcvBiKI91^=%cCyu9GNGV~|tVwl(-nW328-VO`<-35VV9u=^bL(++#9!|rw1Jq|l{Zf%`M*JqEKguG=M zE>$;N8c+i4212ExEAz(!yM zr~^iIywwA*>!kJct*jIHvG;@DA;jngp8_8SzlZn$cnUlUK8^Tp@EPzVcp6ORt`Tz7 zdnfALFFZlr&vj#*FOc$ z#wj8^-xT5bW)^qnJY$_=Rq2FrhSe2h`7~Y^(|BDAi@O7d{WD+>yOjObC2S_67Pqpv zFK2@Gh6;iz&$X#Mx8*~Y3sC<#a4x$Bo^Oxvdk}0F11bM-UTQkj z`(3u-+c!|-~?i4wmQ@6I+?yC%~ea{<4e%9iPv=$L8VF?7CuxvuhH zgUW-A7g1aixaq$PE@kV(^Gy++Z)WkboIAWnvL}qItd8{vuZwBCu7$uVHO z2CDPWBjcMM_Q)d(dxXxk6fXVv_i@OGVPPb7z2*NT%Dw-BDASStVraa$vxF8ip(VH} zY{8X<*cM=>)l6t@;gtp17GPU&Wg!*pS|zJ56WUxH zWDBk=#JIUC5)W^#@`X2-EWn@s#srw|(29ey?lOs0@gv((D@!mvsx8-L-L-Yr)>T<= z(>jZmD)melLSOS_-DTQZjt{prR@Ppjx%7_qXqBwHOk&sISX*jk2}XDLcRlP5{lVQa z`OBoql(^RLn)VQ5PppyqZj)fDA10IL-x?lu$Mk5QR8}Zik!0nOl}1(#b(x1gYz7)DNQZM#W=++3HIWjNiQs{gS zkTEhZT#B?{AHzQD0`_0$!JlC?dKY{VdFHYM zYKj9^u^l={pqse_s7m>QFcNLmla#(91OQ(CXK&fxYva% zgOx%0fwF?ehuqamNJ7U-I$pV!y@Zq}96^{xB3?tU?<>+1Pi1D;itLS8W;`t7E2XnF zig^A=W__oKx9`l1ZxivWdo5#QI=e?C$Y?rb89%UhmxtI7f9!0Me*ajJ@)3P?|MBN* z)`!U8uHDZL?hHo5QQNEvHrZx%Q@S8jU1dk3;SIuER~@XeqmApUZL>KVOBaNLgwCya zlD@sNp3V)nrPLQ^6qnXbqeI1Xekk8(U!Rtw_l80RbK+UmoOpH>MB(F9OD^}?ks>;l l+z|Lt4FNwJUs(-b>bd&mFYlj!9GG7LYBEqf-k6w$ z)0{lt^StkSe%hx`dlC=1%?F$@*4)ve<2a{&-grp2IX-{*<5)*KQa24zKII%h4(#3$ z%DwOHM43{wVqgn4rvv(L=GZ*;a0S+XK1XtzyLC{vQII^&_hcFVL|>*5oG6(nP3XAN zWOezAxg8tmdTIlGlv230RF}(?E<3gmKcnJeyj#UY{0$YC;yo%Z$9q-067N&-YP?^? 
zYw_et=doO;g6^genoUo7X443Lke)-2?jbJAL{OLAWJdV|n^t~CSs;2rmWiOwa>kqh z)I%S(Ig0C-f#g^s=Bt1P5DA*HXVl@ciW%iP(P5aN$%4(F0Opb5bq8X@D$)>gcK2Q0dNLz5%h7u1mG;-66j&TWxxf%D+tBb>7AhdzdvW9O~yhlpi%1 z6vb$qMx6Hx?`eJdb=`$6v#r2K&w~#LTC=}W{eqW|=hfdNg5{oh6|f8%2$pC6pf;51 zXOuOfgP{nPTOOI$0V{~1_=e*A+dvLmp5Au{@CXq{8+g;#Kq@Wvec-Zd4VueVn9FW3 zm)&G8`<%IKmAUK|bJ-em8GK{*Tt-atzjYahE~GF1HVu)cs5@_%PRkH;B;pjXLh{wt z;{PB{P1|vbQ#}Xscc&)K>!oR_2lMVuO%k;vEw%LAotk*LmoFq~nFqresrgf{=I6cC zd{R@h=<(lEQ~ha1GYUyhtO$BFHF@yH?(`&$BsH}Z-JPEJcY&s61q^HHiQPWFkfi1j z5Hr$q(Wkj>FFl`Bw|)PsmYyUr&%;jt?-IOIf)EZ&L}N9!N_g=VRe?}&L<$SC7#$6Y z6Os@K3DHq8Dum^Uh~O84LS#~wL*XcN_}>yo#)2bImxLi{aB@T#3l4{b;ZRr*0|6lv z5#(@acq|}Asv=>3RUq*8#4DhTOa7=zc-zpBXcvcuYaNH`91h7QihkRW&0bw+_e(?e zy4u&I>fvF@Kj?o6f5;uU95l4M-yv7Mz2&LPTy(oQ<%?;M2jP*pKd= z)p}N4-F37kraydhXMZQt-<5XicL~)QkoE+kRr_I4n3aoArK;ecON<=GFN7|6YHF)d zKpd0;g00cMwXCr?mAkPfF|T$;oNJQ{m@vQDnbSlRi^tJ0n($JW6keC(m23 fJq5c7$0IKr=f+e4$?QBv{0$rdjZ^hoP36#EgG*<- delta 4929 zcmeHLTWAzl7(O#eHoGBi63r58B@Qv9A!=AR7gH0L&23|1E^A_946<&mRja#IYFez= z&g9lYiB)@S8$?11MJz2SU8!QliQt0|Ed+$thYBe|@u_|Ap{;$e-S$AHkaM=&$s{C`R|pf2Z~2;?F_4m63RI3S*1=LK3#EMkM$7iWi!d=z5;Rqu>|*H zC(`*acgc(@3~?ND|H^UVc-o^Blsx$w$x-E6u^L{M3~B=znFjglM)KV1DOK5&tSNgg zdyhfR+}P2SV;Un36NL?h9}PO}?GAguVfQ=iZ4SH7VfQ-hCe7CX z_JK~3hj)mQkweb(HYW%t1Iogtp@qQf=XiWm=e$9S;ka};}hN$+r8a=ND1 zrAHonOo9e}Twg!IxX{QqiLv=!B;+OrY|9^k-zfYMp6_jP_{jG*3xAa7x3xHYD^FtjDANipz!hf6RcXm2_ zb~8VWPsjo&i1zS;dbis!Jf2aaM6N`(W}0i%@l?lL!ndcfWjY}`W}0^lo!_)v*Lbi= zX&vi0HlmJrXkvUpk6ecmIPW5#t>$9jaA2%T6PaS-x=chE;<-QkMLl3j0bzZMscOD zDi1v}8hqL#OBePCoe38%`Rw;m$cSNKWAbMB|0K$N|AHvfWBtX@c=2QjEoMSX@KD%- zD+{qLz)Y)|(AvT)3$iW1w&2P_ECiT-*I(e5MVJY#8~0!ft}Mj#*M2P^t1lDUTpVNz zt}MiOxGDyoK3wGsA1+ydfBKCHFpWlw{j%;diPi8W+fpk_FrADp*Ja(cb=KBZS#MK1 ziMQN{3YVk;H%(S@HNEGgRg^cgRwJ9Jz=ToTIb`n zEbFqO%L*@!^J>ni<~ ze)_Spf=1%5s%$d&CRWn%%C+olQoh|0gjpow)%526B7JaEdUl=2-fm5g*NON_>1@p+ zo_{pGzFWjQqv`P-B7S|ZWo%4k_lN{JmI_(MkL+D_6Z_8}J9~)!@NALtDShMRll!V# zYDxTs-7m#=`5UVnZ8P9+u}w!oO*mjj8>=@6bA6z$+Kx7_ud>b7K-ey*^7{#$Tk$-7 
zZ)K3q9q34E2WJ#D) Date: Mon, 15 Dec 2025 10:38:40 +0800 Subject: [PATCH 4/7] add co to csv --- csrc/py_itfs_cu/asm_fmoe.cu | 18 +++++++++--------- .../gelu/fmoe_bf16_blockscaleFp8_g1u1_gelu.csv | 4 ++++ .../silu/fmoe_bf16_blockscaleFp8_g1u1_silu.csv | 4 ++++ 3 files changed, 17 insertions(+), 9 deletions(-) diff --git a/csrc/py_itfs_cu/asm_fmoe.cu b/csrc/py_itfs_cu/asm_fmoe.cu index d22daf3058..d3c2eca940 100755 --- a/csrc/py_itfs_cu/asm_fmoe.cu +++ b/csrc/py_itfs_cu/asm_fmoe.cu @@ -260,8 +260,8 @@ FMoeKernel* get_heuristic_kernel( uint32_t tg_num = 0; uint32_t num_persistent_tgs = 0; uint32_t round = 0xffffffff; - std::string arch_id = get_gpu_arch(); - std::string selectedKl = kernel_name.empty() ? "" : arch_id + kernel_name; + std::string arch_id = get_gpu_arch(); + std::string selectedKl = kernel_name.empty() ? "" : arch_id + kernel_name; int vskip = 1; static std::unordered_map> impl_ptr_map; @@ -272,8 +272,8 @@ FMoeKernel* get_heuristic_kernel( { for(const auto& el : *cfgs) { - if (el.first.find(arch_id) != 0) - continue; + if(el.first.find(arch_id) != 0) + continue; const auto& cfg = el.second; if(cfg.vskip == vskip && cfg.smf == smf) { @@ -675,8 +675,8 @@ void fmoe_g1u1_tkw1(torch::Tensor& out, // [token_cnt, dim] const int token_cnt = input.size(0); const int block_m = 32; // fmoe sorting kernel and fmoe kernel only support 32 for now const int estimated_sub_X_cnt = (token_cnt * topk + block_m - 1) / block_m; - int model_dim = down.size(1); - int inter_dim = down.size(2); + int model_dim = down.size(1); + int inter_dim = down.size(2); inter_dim *= model_dim / gate.size(2); if(fc2_smooth_scale.has_value()) @@ -839,7 +839,7 @@ void fmoe_fp8_blockscale_g1u1(torch::Tensor& out, // [token_cnt, d int sub_X_cnt = sorted_expert_ids.size(0); const char* enable_vskip = std::getenv("AITER_ENABLE_VSKIP"); - if(out.dtype() == at::ScalarType::BFloat16 && inter_dim % 256 == 0 && fc_scale_blkn == 128 && + if(out.dtype() == at::ScalarType::BFloat16 && inter_dim % 128 == 0 
&& fc_scale_blkn == 128 &&
  fc_scale_blkk == 128)
  {
  if(activation == ActivationType::Silu)
@@ -850,8 +850,8 @@ void fmoe_fp8_blockscale_g1u1(torch::Tensor& out, // [token_cnt, d
  TORCH_CHECK(
  false, __func__, "Unsupported activation type for fmoe_fp8_blockscale_g1u1");
 
- impl_ptr = get_heuristic_kernel(inter_dim, sorted_expert_ids.size(0), config_map, 0, kernel_name);
-
+ impl_ptr =
+ get_heuristic_kernel(inter_dim, sorted_expert_ids.size(0), config_map, 0, kernel_name);
  impl_ptr->launch_kernel(out,
  input,
  gate,
diff --git a/hsa/gfx942/fmoe/gelu/fmoe_bf16_blockscaleFp8_g1u1_gelu.csv b/hsa/gfx942/fmoe/gelu/fmoe_bf16_blockscaleFp8_g1u1_gelu.csv
index f25b1fa86f..1e06c66b40 100644
--- a/hsa/gfx942/fmoe/gelu/fmoe_bf16_blockscaleFp8_g1u1_gelu.csv
+++ b/hsa/gfx942/fmoe/gelu/fmoe_bf16_blockscaleFp8_g1u1_gelu.csv
@@ -1,5 +1,9 @@
 knl_name,co_name,atm,vskip,smf,tg_num_perCU,ps,subGU_m,subGU_n
+_ZN5aiter47fmoe_bf16_blockscaleFp8_g1u1_vs_gelu_1tg_32x128E,fmoe_bf16_blockscaleFp8_g1u1_vs_gelu_1tg_32x128.co,0,1,0,1,0,32,128
 _ZN5aiter47fmoe_bf16_blockscaleFp8_g1u1_vs_gelu_1tg_32x256E,fmoe_bf16_blockscaleFp8_g1u1_vs_gelu_1tg_32x256.co,0,1,0,1,0,32,256
+_ZN5aiter50fmoe_bf16_blockscaleFp8_g1u1_vs_gelu_1tg_ps_32x128E,fmoe_bf16_blockscaleFp8_g1u1_vs_gelu_1tg_ps_32x128.co,0,1,0,1,1,32,128
 _ZN5aiter50fmoe_bf16_blockscaleFp8_g1u1_vs_gelu_1tg_ps_32x256E,fmoe_bf16_blockscaleFp8_g1u1_vs_gelu_1tg_ps_32x256.co,0,1,0,1,1,32,256
+_ZN5aiter49fmoe_bf16_blockscaleFp8_g1u1_novs_gelu_1tg_32x128E,fmoe_bf16_blockscaleFp8_g1u1_novs_gelu_1tg_32x128.co,0,0,0,1,0,32,128
 _ZN5aiter49fmoe_bf16_blockscaleFp8_g1u1_novs_gelu_1tg_32x256E,fmoe_bf16_blockscaleFp8_g1u1_novs_gelu_1tg_32x256.co,0,0,0,1,0,32,256
+_ZN5aiter52fmoe_bf16_blockscaleFp8_g1u1_novs_gelu_1tg_ps_32x128E,fmoe_bf16_blockscaleFp8_g1u1_novs_gelu_1tg_ps_32x128.co,0,0,0,1,1,32,128
 _ZN5aiter52fmoe_bf16_blockscaleFp8_g1u1_novs_gelu_1tg_ps_32x256E,fmoe_bf16_blockscaleFp8_g1u1_novs_gelu_1tg_ps_32x256.co,0,0,0,1,1,32,256
diff --git a/hsa/gfx942/fmoe/silu/fmoe_bf16_blockscaleFp8_g1u1_silu.csv b/hsa/gfx942/fmoe/silu/fmoe_bf16_blockscaleFp8_g1u1_silu.csv
index 401f30905c..bbc4374b75 100644
--- a/hsa/gfx942/fmoe/silu/fmoe_bf16_blockscaleFp8_g1u1_silu.csv
+++ b/hsa/gfx942/fmoe/silu/fmoe_bf16_blockscaleFp8_g1u1_silu.csv
@@ -1,5 +1,9 @@
 knl_name,co_name,atm,vskip,smf,tg_num_perCU,ps,subGU_m,subGU_n
+_ZN5aiter47fmoe_bf16_blockscaleFp8_g1u1_vs_silu_1tg_32x128E,fmoe_bf16_blockscaleFp8_g1u1_vs_silu_1tg_32x128.co,0,1,0,1,0,32,128
 _ZN5aiter47fmoe_bf16_blockscaleFp8_g1u1_vs_silu_1tg_32x256E,fmoe_bf16_blockscaleFp8_g1u1_vs_silu_1tg_32x256.co,0,1,0,1,0,32,256
+_ZN5aiter52fmoe_bf16_blockscaleFp8_g1u1_novs_silu_1tg_ps_32x128E,fmoe_bf16_blockscaleFp8_g1u1_novs_silu_1tg_ps_32x128.co,0,0,0,1,1,32,128
 _ZN5aiter52fmoe_bf16_blockscaleFp8_g1u1_novs_silu_1tg_ps_32x256E,fmoe_bf16_blockscaleFp8_g1u1_novs_silu_1tg_ps_32x256.co,0,0,0,1,1,32,256
+_ZN5aiter50fmoe_bf16_blockscaleFp8_g1u1_vs_silu_1tg_ps_32x128E,fmoe_bf16_blockscaleFp8_g1u1_vs_silu_1tg_ps_32x128.co,0,1,0,1,1,32,128
 _ZN5aiter50fmoe_bf16_blockscaleFp8_g1u1_vs_silu_1tg_ps_32x256E,fmoe_bf16_blockscaleFp8_g1u1_vs_silu_1tg_ps_32x256.co,0,1,0,1,1,32,256
+_ZN5aiter49fmoe_bf16_blockscaleFp8_g1u1_novs_silu_1tg_32x128E,fmoe_bf16_blockscaleFp8_g1u1_novs_silu_1tg_32x128.co,0,0,0,1,0,32,128
 _ZN5aiter49fmoe_bf16_blockscaleFp8_g1u1_novs_silu_1tg_32x256E,fmoe_bf16_blockscaleFp8_g1u1_novs_silu_1tg_32x256.co,0,0,0,1,0,32,256

From 1af69280c492ed50866e3497175d5322e4127afe Mon Sep 17 00:00:00 2001
From: zufayu
Date: Mon, 15 Dec 2025 13:42:09 +0800
Subject: [PATCH 5/7] add 128ntile logic for one stage asm

---
 aiter/fused_moe.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/aiter/fused_moe.py b/aiter/fused_moe.py
index 6319c48334..9705de4bb9 100644
--- a/aiter/fused_moe.py
+++ b/aiter/fused_moe.py
@@ -642,7 +642,7 @@ def FinalFunc():
  doweight_stage1,
  ) in fused_moe_1stage_dict[get_gfx()]:
  if q_type == QuantType.per_1x128:
- run_1stage = True and (inter_dim % 256
== 0) + run_1stage = True and (inter_dim % 128 == 0) elif q_type == QuantType.per_Token and q_dtype_w == dtypes.i8: run_1stage = token > 32 elif q_type == QuantType.per_Token and q_dtype_w == dtypes.fp8: From 18424b6876d3fd2bc1f83148b41d4f40fb268670 Mon Sep 17 00:00:00 2001 From: feifei14119 Date: Wed, 17 Dec 2025 11:13:51 +0800 Subject: [PATCH 6/7] fix mem fault during perf turn --- ...blockscaleFp8_g1u1_novs_gelu_1tg_32x128.co | Bin 29280 -> 29288 bytes ...ckscaleFp8_g1u1_novs_gelu_1tg_ps_32x128.co | Bin 29728 -> 29736 bytes ...6_blockscaleFp8_g1u1_vs_gelu_1tg_32x128.co | Bin 29320 -> 29328 bytes ...lockscaleFp8_g1u1_vs_gelu_1tg_ps_32x128.co | Bin 29776 -> 29784 bytes ...f16_pertokenFp8_g1u1_vs_gelu_1tg_32x128.co | Bin 28768 -> 28832 bytes ...blockscaleFp8_g1u1_novs_silu_1tg_32x128.co | Bin 28768 -> 28776 bytes ...ckscaleFp8_g1u1_novs_silu_1tg_ps_32x128.co | Bin 29216 -> 29224 bytes ...6_blockscaleFp8_g1u1_vs_silu_1tg_32x128.co | Bin 28808 -> 28816 bytes ...lockscaleFp8_g1u1_vs_silu_1tg_ps_32x128.co | Bin 29264 -> 29272 bytes ...f16_pertokenFp8_g1u1_vs_silu_1tg_32x128.co | Bin 28256 -> 28320 bytes 10 files changed, 0 insertions(+), 0 deletions(-) diff --git a/hsa/gfx942/fmoe/gelu/fmoe_bf16_blockscaleFp8_g1u1_novs_gelu_1tg_32x128.co b/hsa/gfx942/fmoe/gelu/fmoe_bf16_blockscaleFp8_g1u1_novs_gelu_1tg_32x128.co index cbfe528762ef94b27eba9611ec2dcb72e902681e..f6ad84f26ba8569ec6e9fbc094bbec2c7694535a 100644 GIT binary patch delta 454 zcmaFxgz?1_#t9mXFD7a(cMR}l00S7!AOPVr0x32i4#55aX}1K zlPww56){AiX2A7sZe-jn$9QD3slt6GmIfxVy2&ES1wckh-R2f$7Eei!YN#zxiU&-y zGk8p1m?JJ}V~#4T1eJX-`65vEg9oat1yt5%GGnf|)n;%YgERY8Pv-nz! 
delta 460 zcmaFygz>=>#t9mX4<>3Zcl7XO00S7!AOPVr0x32i_Q;0tJ-Sip0tlC3$7Df9aX}1K zlPww56){AiX2A7sZe-jn$9QP7slt7x$r8#1n|qX5JQ*D(Ps|aQbclr70HwsiG&_UI z5x=`5{lRpAwP0;i@LuH>#R?HQb{E~&L_Xdc|#KGV(IWTvpzX3x|Vp3{O zyn&+&oM~j@0uwcHGlMbBoE%|HM`ySK7jsLPsGG41!{iH6wv%VcIBmX>_koeoWAe!Y zeNK2-PG&4L7uo?0l^w8nM6 IfgB?P0050%kpKVy diff --git a/hsa/gfx942/fmoe/gelu/fmoe_bf16_blockscaleFp8_g1u1_novs_gelu_1tg_ps_32x128.co b/hsa/gfx942/fmoe/gelu/fmoe_bf16_blockscaleFp8_g1u1_novs_gelu_1tg_ps_32x128.co index 9174945af01771b433a5de6eccbef62ed3a6ce64..eb8a1339dded3948b915897df7c85b77644847e0 100644 GIT binary patch delta 470 zcmZ4Rf^o$Q#t9mXD<*0#cU)&M1X653ydnp}U(tg~7eKfS5t9WO#RV}` zO}1oISHuv7ngQ3lxsma=JmZPYwTkzdSQ?nb>L$mi6aX11b(_zqoN#BHxOr+$H6!DK z$%1*}k`tn!RzfLVFwM@;GT9L*tKp0)3*oXeSWGSi%C?~CjfCo*FnJ+R)+7~GZ$4Dk zW%9*5amE#sU*?^Uv|z|dOiIm(H*hjCfiaC74Pi`E6Gs@++`tvabTWb|Ff?|AYjbgS zfhlmaG~6s$kk16Pq_Ie!6CTKucNUooO7udUC6NoEUrc5!HfNNWY*}p2cw%xTkTjUQ M6G$$Y{Iggc06>mp$N&HU delta 493 zcmZ4Sf^oqM#t9mX3npqVcU<7d00uCcK>)&M1X653ydVd{U(ka}7eKfS36ljG#RV}` zO}1oISHuv7ngQ3lxsma=Jmay=wTkzdCMT#AY`&s$!kuyQ=D9i5jFK~IpjJRBB{0p- z&@tICPh7Ib0#z2mWoJ;BoCuUP@IsY^c#@r=X7WU!YzLajDNvJjCLaXK+CY5_GvPmo z!NkF^VDiVj^HF9DIf+TBIq?QgMn*8Ek)<1qX=-8tW11T{!q0>V`5t1H~C_&=;ZlDlAQ2hpM0>$Tu7i7;wXV!2>pR+a(9vJWc6Z6 aMuEwW#rBK`CN}~}jmZarL6!A`%IZu`%oS&Bm|U5=)6bP5 zCow5CC*Hu(#RA4OGI4`3P2Aw3X3j=1QAZb37}LeV5XN*faoXIFm%+r;;5&I?j_Bn7 z1(KZb5S^@8XfE^t8a5xG;Vr;2`E`NphpzyL-w2tfFZK#C2BE3zT{if&Z80K#SHm@LRBE{LIO zvL&OsB8Di`47lFSjf|V+7|(4sRk+VISwgvBbC0r#Cu70ngE``o5s^?Epp-b6W@j*& z{1GU7!x2?h7b?pzSut1K(+5qjGgS6ZGO8{=sJIH$lQ7vUAO;f$L&fCA+@1c;3^|EO zsX6fmjxJ^}rjdyYjA`Nq7d3M>fQdS~7{iz@=58>in~5dEPlV`{{Y-Y)S!^l`M z`D1}TCp>f~YZjUdzJP|)3#hXsCKnc(GrpL-ve2G!#^jeka>r!NB707aJcxtuPA)8v F2LSI|TYmrm diff --git a/hsa/gfx942/fmoe/gelu/fmoe_bf16_blockscaleFp8_g1u1_vs_gelu_1tg_ps_32x128.co b/hsa/gfx942/fmoe/gelu/fmoe_bf16_blockscaleFp8_g1u1_vs_gelu_1tg_ps_32x128.co index 0a77a3e60011be1754e8d8933e14df19f0f7e3e0..2c51b59c2b8366bc5eea92996fa0c2b257757714 100644 GIT binary 
patch delta 470 zcmcccg7L-+#t9mXHzsN>cf8@p00uCcK>)&M1X653d?N?KztMwA7eKfS8j}SX#RV}` zO}1oISHuv7ngQ3lxsma=JmZzkwTkzdSQ?nb>L$mi6aX11b(_zq{BUP%-F!5snvwCs zWXC*l$rVvhE1{Gwm}X~anOq2z)o@0Yg>cy!0wym6%FaR48wu5WVDd$vtWPSc-h8NR z$z;ZSamE{yCG*cmnlR)fCZ*=Y8#oy`!k9*mhA^h730&0N5J{~$RDq$flL1Vvi;FRg z>1OG&*|8v>324d2B7IJHAWwc-WG?8@3vrf5E`;7O*|6B0(PMIEu|4CD$t!_m!sM4g K^1@`z5_JGzw`r;X delta 497 zcmccdg7Lx&#t9mX7ba>hcf8=o00uCcK>)&M1X653d?5$IztDq97eKfS29pID#RV}` zO}1oISHuv7ngQ3lxsma=JmaO!wTkzdCMT#AY`&uM!=15x^VytgM#&R3P%EI65}0OZ zSTH#;Ph7Ib0#z2mWoJ;BJP|1C;e{#-@gzILl*tEyvOCaBPJx>2GWjD=HV5ismeS3zHS|&qoB)Z7qBttnK2p|PVYOs$KHA&lu} z>BKPkf|T9l*)qJNlRpB<6O#=~)B$=*Zx;Xn diff --git a/hsa/gfx942/fmoe/gelu/fmoe_bf16_pertokenFp8_g1u1_vs_gelu_1tg_32x128.co b/hsa/gfx942/fmoe/gelu/fmoe_bf16_pertokenFp8_g1u1_vs_gelu_1tg_32x128.co index 91b3430d78a403c0baef399f5be727aec232a499..d7b2e5c9a54e08259c4468ab049b85cfa1750bee 100644 GIT binary patch delta 517 zcmaFxfN{Y?#t9ls3b_-tmO1u#Gk^h%W)Ohz8G#fV5KqX2@F#Sl(ghGML&0Q0MsYz5 zRg*0l)fF*Bp=QAKZf<1UEyozR*;av_S)_#_LZK-lN~ND6LaiwxMxik#Ms2c)vd(4~ zWgmCOmdO*d#3flGpw>buaWKu!P%-&nmU#UfTU1$HsH~0~s<;JI{6{#dxHD8-BOX=U z4=TPR9aa1Sh|9#m01cbT;n|z~Tn!m=5|dJM;td=v;7lV!SD2`=t2vBmX5a*4I-0{3 zIGZ@aL|x4$Pt38N+$UqV`9jVIM#i4W2lMnf;W07!XP&trLl@LdSx|biW4<{PL)PS~ dd^^SslXvFZGhUedGvA)mAqS%1=48h_c>viCsVZ&rWMsYz5 zRg*0l)fF*Bp=QAKZf<1UEyw7%*;av_d2@@hk2|Bw69)q{2qsU@-sIvO^be6nS}x!?h4&>etAhr{H?d~>D)nUi{CpYHF F0|3@bSQ`KU diff --git a/hsa/gfx942/fmoe/silu/fmoe_bf16_blockscaleFp8_g1u1_novs_silu_1tg_32x128.co b/hsa/gfx942/fmoe/silu/fmoe_bf16_blockscaleFp8_g1u1_novs_silu_1tg_32x128.co index 8a462d0c0d9f0ce3257f94cdd13a3c6d75fab305..8a01439ea80f35631b0f253ff6c82d77bc35ac13 100644 GIT binary patch delta 454 zcmaFxfbqox#t9mXFD7a(cMR}m00S7!AOPVr0x32i4#55aX}1K zlPww56){AiX2A7sZe-jn$LO-zRN+1oO9PWw-DDBv0w5!$ZgYz=i@T&pCDaxu#RI0< z89XL0%o3NhF-MhEg33Oad=V)7!5vlB0xD}WnK4^j@=X$|tRGbN&t$`FamIkjp4mJ7 zj2LnflTvfy4IC}uOd}IFn5c=H1&nEC?hIo(TEG>!7(2m4U7g%E2jmnm0Zl%br_Tuw 
z%*j9V%moiXgXI7;s2wId=9@Dfm|U4}&sZ>dCy<;l`Dea8CqoXz%?~F#=E(y9&p27J delta 460 zcmaFyfbqct#t9mX4<>3Zcl7XP00S7!AOPVr0x32i_Q-_rJvve80tlC3$7Df9aX}1K zlPww56){AiX2A7sZe-jn$LO@#RN+3;WC`Vh%{|I2?u-tTCuWIDIz&KifKuXMnw`O9 z@n^#*E1afn>*I#sYiJAK4JMKb(9j IPmYlR02KjUwg3PC diff --git a/hsa/gfx942/fmoe/silu/fmoe_bf16_blockscaleFp8_g1u1_novs_silu_1tg_ps_32x128.co b/hsa/gfx942/fmoe/silu/fmoe_bf16_blockscaleFp8_g1u1_novs_silu_1tg_ps_32x128.co index 8ca34dbe457e8606464dcbd1f4af55302edc9b7a..e6446caa0b38da6de4d8eb6cdca9e4fb3b8a25eb 100644 GIT binary patch delta 470 zcmZ4RgmJ|a#t9mXD<*0#cU)&M1X653ydn$2U(tn17eKfS5t9WO#RV}` zO}1oISHuv7ngQ3lxsma=Jfp|vTE+WJEDcOzb(3RM3V@81y3J=)PPj4(Zl0P|&B(Z5 zvS5z5KO<+tTM?)CX)Wi|SG;?-^F&$lD3Ji^n;M!a) zTwn^^3=KC6=H)X1Eom&!=Y$9HjXGvs3=ogb23(XlNCR-NTGoF}S2_y|B N?*x(yCjTr{2LROJW0wE` delta 493 zcmZ4SgmJ+W#t9mX3npqVcU<7Z00uCcK>)&M1X653ydVq0U(kh07eKfS36ljG#RV}` zO}1oISHuv7ngQ3lxsma=Jfqv@TE+WJlM_@5HeXRW;mRnyd2UuUqvVV#s1;C32~4vy zbWAqP5tpp7K$V4X*%=fjCjw;+JW*x+pn7X2PXx+#pqZQkmDQPi5GZQ{^)bwZ{~!hv z2g8ENA9Kz}nK9%fCZ*=Y8#uZe!I(ytZZM{)i3N;l=Ijh(I=a9V7#bVFwYgY0!4$X| zxG_w=Amy+*Do>AzX@Sq=i`k-+=NCwF!h?PC!2)w3fo_PS1hOIY2bRg*1+tUX3ndu^ ZCOa0|Gai`S2qZNo9|V#!CNmbP0{{rJWo7^X diff --git a/hsa/gfx942/fmoe/silu/fmoe_bf16_blockscaleFp8_g1u1_vs_silu_1tg_32x128.co b/hsa/gfx942/fmoe/silu/fmoe_bf16_blockscaleFp8_g1u1_vs_silu_1tg_32x128.co index b7f973d6c9e9174f3392354aca10ed813fa20965..910e3553fdda70ca7a8d07d606868d7be00d57e2 100644 GIT binary patch delta 456 zcmeBp$T$IrG#CXYYA$zd@MZu57|kF6;WGj$HXv@ugzy_WQRxB*m!V;@Afvb-hN{Vy zjOvOQqEIv7dN(&RZkA&V*lenBpNXY`Nvv+Nh;jjtky5w0McKq%vZ4}d3zXsk)9efh zlP_k8OWK&D$|^x+A53P<7MJAkK$W$C%7#of1j@1`qssb0WpySeW{Wd6Os>q{>F3Ij zlbDp66K~*XX#ryznYh82CT?(1GYcb_sH3GRjOk)x2xGcBJ8f>r$zWn?@SeOdOLX%8 zJV{P?h)!0_Hy8Q<4Vw?p@D^a3{5nr|a(lic5PcicVCz0K#SHm@LRBE{LIO zvL&OsB8Di`47lFSjf|V+82vVzD%@w9ETLSmxkuT=ov~o@!7OpfhzO_+P)Zz3von}X z{s@%4;fN}$3zcP4T=%87liH300RLR9pq>Nto;v5QB+>p<;4l_D+9ihMdHt z)SP$&M@usp)5ydH#x!w*i<(&&z(gG_jbTg|V>cMn)!C9^@&zgD$undeHnZfuVPveB 
z{4r0T6CS#gHS^5{UqHj@1=Lv*lMD0B8DC6ZnQzZHWAaNNxnr_sfjy^24#dHCCl}_) F0|5S!TBQI0 diff --git a/hsa/gfx942/fmoe/silu/fmoe_bf16_blockscaleFp8_g1u1_vs_silu_1tg_ps_32x128.co b/hsa/gfx942/fmoe/silu/fmoe_bf16_blockscaleFp8_g1u1_vs_silu_1tg_ps_32x128.co index 4019df3d6b9c6ce5a698be4a953b3c637ed2a401..31cf559926dd7f2d42ddeb3998df3535ecfc7907 100644 GIT binary patch delta 470 zcmcccgz?4`#t9mXHzsN>cf8@l00uCcK>)&M1X653d?O3OztM$C7eKfS8j}SX#RV}` zO}1oISHuv7ngQ3lxsma=JY&e_TE+WJEDcOzb(3RM3V@81y3J=)ez-ESZa$h-&B%CR zvSW_8cy!4otoXl=VqL)te7B zxnwe9t~le3$&$I}BTX1`5|dJM;td>K9brr(M?)CX)C4YS=7OZw9IC+3*w_H3*2U5o z#&k1u+3c8?&jhq&V}U*=Jdh{9EHD@J=!Q7UBO5~Rm~2>R&gd~Yv(TRL$K;hjGGX#d LAbDZ3W|2Ao^$KWt delta 497 zcmccdgz>@?#t9mX7ba>hcf8=k00uCcK>)&M1X653d?5?MztDwB7eKfS29pID#RV}` zO}1oISHuv7ngQ3lxsma=JY&%2TE+WJlM_@5HeXTs;mXLq`D|7-qvVMys1;C32~4vy zESQ{_BQ9BEfhr5(vNI@5o(Pon@I;mMgX*0!`5;hs2b#$#P+6DBAAzztP#?og_zhw( zaWGt%teAT~%7`H+F)1}C-oVk-0>(75bb~QXP2i$tE=X!kp$ZI*ja*@BT`Ub@OgBR( zhRGMC>?Y5aao&77?=2(K1)s@`Iii!#7f5o#gMRYI0&^jUZivGivLW;VmdU#dWG9Cg cN-{c3E-bWX{4jYVkc^o85lEhxY*?fY0Oy=--~a#s diff --git a/hsa/gfx942/fmoe/silu/fmoe_bf16_pertokenFp8_g1u1_vs_silu_1tg_32x128.co b/hsa/gfx942/fmoe/silu/fmoe_bf16_pertokenFp8_g1u1_vs_silu_1tg_32x128.co index 001c7c9cf6dc50769d11aa4654b501365e23a520..4738f2aa08c951d32354d9985cd6313d9ef77ee0 100644 GIT binary patch delta 518 zcmaEGhjGDO#t9ls3fU92mO1u#GJpY$W)Ohz8G#fV5Kl;l@F%pR(ghGML&0Q0MsYz5 zRg*0l)fF*Bp=QAKZf<1UEys9qv#kO5U;$WJcpVADac8KwMjWcR zA5?rtDysMe5SNLA0U9=w!!tMenHe(VBqpWi#2Yx8z?nvdt}sz!S92KC)Y%EfbTo!5 za5iv+iMqH>o|t7lxlhJ!^M$MrjEp^#59a7|!ee6c&m40>h7PEkGNANi$6WKt@_DS2 eYjdp`H%#7{YtMLL^3Pm*PKPXrikp)ibL0WOC}Fh# delta 492 zcmZ2*m+`?J#t9ls53(j|EpznoWB>yg%^(2bGXg0#APz`}@B`XW=>iCsVZ&rWMsYz5 zRg*0l)fF*Bp=QAKZf<1UEys9pv#kO<^X3+1A6G_~$rm%kC69zbO@~tAV49u5Vlrc< zczuj5s;n+l_KgdwxCK;v2byWlQ1KVBsOtTo;uWc=;uk<%CJqK@5KNw)xyjGekRc~A zDK#hFz|k1aG%|37i5k0@!&bmGcAEvVKQJ=-OlHj0 z=Y$9NWXoK0!2{5sI{=Lihslk(=1d3DC-2I&V=S2bGuNJR!eq-ldrpQdh~sZgZp@Jf E05 Date: Wed, 17 Dec 2025 12:27:13 +0800 
Subject: [PATCH 7/7] en vs for pertoken kernel --- ...f16_pertokenFp8_g1u1_vs_gelu_1tg_32x128.co | Bin 28832 -> 28880 bytes ...f16_pertokenFp8_g1u1_vs_silu_1tg_32x128.co | Bin 28320 -> 28368 bytes 2 files changed, 0 insertions(+), 0 deletions(-) diff --git a/hsa/gfx942/fmoe/gelu/fmoe_bf16_pertokenFp8_g1u1_vs_gelu_1tg_32x128.co b/hsa/gfx942/fmoe/gelu/fmoe_bf16_pertokenFp8_g1u1_vs_gelu_1tg_32x128.co index d7b2e5c9a54e08259c4468ab049b85cfa1750bee..8ecdf7ea30f3d0f09536eca17aacc9014fab83c4 100644 GIT binary patch delta 1748 zcmZ4RknzGp#t9mX0TVTsJMQsj00S7!AOPVr0x32iK9C9FALvA-3m{wuhslDB;({2e zCR;M9D`JR3&4BCO+{n0Fjxlz#t%5llW9H^kHFfrs28Jp7BAvMQH#FL`ir9!uk3eFF zBeBDf*r7=55F~am5<3Wqz4?W%D+eS0WIeO>jQpGLnaQ))Hwf%!Xz;ij;K0Gq0A!1R z_>DlmBb46=kq>f0;sfP_oT2i138+TOsmMUPyePe3Uno-v*J7@j>DP5Jpa(vAH%gn2~YIbA$rH18^0+YMBqpWi#2YwTm`@JNkykM^M2MKcn9imK zFs7@yBg5niQkIkZWb8Hz>&h1)(pnPHxIKV>Fn& fGvA)^!Q`L$_Dl>pldTHuI16$hDsE10%##NI!a69Z delta 1696 zcmcccka597#t9mX3KKP#JN9@pfB}qV5P9H1rRPn!DK;3aX}1K zlPww56){AiX2A7sZe-jo#~8TTR>7Q&F?n;TnmW7BL53y!4l*p=*TA@EUyu{meifT3 zt-)Y=T5AZHp3xc#rf0Q=f$2G|;b3}RYXq2H&>9J*7qxEEbLC+4pX_F~p3#5vH#2$m zdd7eI7#ck81~_mqr~ug_Abul|?+D?GLFI#-koZ9PAZG|)0xBQkg2V^PhqyxcQc(FY zHzYn#KFl4$ua|);i12_2fcZcL5uOmf98^Ba3yBYukMf4_6`=AlK1h6^e2g!IuLPBk z^F!hT<>UOpeDQtO92^Y|oSaPzi~-vN0$KK3F}3V3^`ER|4+^4N_M<_>xNq_gcLm0T z&5|CpNsX_nK4_Ok!7-E zwmYNFpw sTA2l<*(PtwH)CX&{4(F3al>TI0(+(l*^{#h>^L2AAS!N7-k2v300%`ow*UYD diff --git a/hsa/gfx942/fmoe/silu/fmoe_bf16_pertokenFp8_g1u1_vs_silu_1tg_32x128.co b/hsa/gfx942/fmoe/silu/fmoe_bf16_pertokenFp8_g1u1_vs_silu_1tg_32x128.co index 4738f2aa08c951d32354d9985cd6313d9ef77ee0..c49e0a61a8d1c987a81bd856078acac963257268 100644 GIT binary patch delta 1748 zcmZ2*m+`_~#t9mX0TVTsJMQsh00S7!AOPVr0x32iK9COKA81FV3m{wuhslDB;({2e zCR;M9D`JR3&4BCO+{n0Fj`8khTLp7AM$OHoYU=DM4GdHEMLKcqZ)mh>6|oVS9)ZLT zM`DK|u|tvAAxP|CBz6!Id-DrjS9V7J$$F;i8TmKgGnHqrZxGne(BN@5z=4CI0mv2s z@f(4BM<~A$A|K?0#0Sa;IYaqP5cv=nBtB3+#1+bKhRBDxA@PCoVeSxqy+{j0L4*fX 
z0f-M&5a9{sw?gEjypZ@n`6zEFzYQWEbA$rCeo^0+YMBqpWi#2Yx8m`@JNl2>stM2MKcn9ha< zFs6%}Bg5niQkIkZWb8HzWPe~}+%uUmSDzDJu}rqiH5W4IfLLmf0iiFjPj1RJV>Fn& fGuNK+!Q`L0_Dl>}ldbaXI192MDsE10%#jBGE5s-1 delta 1698 zcmca`mvO;e#t9mX3KKP#JN9@ofB}qV5PI9> z^^E`aF*JDG4RGLKPyw<S7^-x0zWgUSavA@PCoLCz4q1XMo61&I%o4{?R?rJ(X* zZb*Efe3(0gUoQhy5a9t40P}$gB0M2{IjDS;7ZM*RALR|OWb_4irSU>_&r#ao^-0ZVHSF znnK4tGk!7-E zraPm~pu uTA2Z*IVNw)HJjX@$2$3Kt~KL^$(nigOcydIXXV*(I%GlA+?>2IM;-tO3OtGc