MPI Ops and lowering passes #1829
This reverts commit f668b66.
strattr to allreduce also)
    StrAttr:$datatype,
    StrAttr:$op
Thoughts on this solution to handle Datatype (and Op) for Isend/Irecv/Send/Recv (Allreduce)? We still register the symbol and get its name on the Reactant side as before, then pass the symbol name into the Op via a StrAttr. This gives us all the support for the various Datatype x Op combinations that we had before.
I commented on datatype above. For op, we can create an enum and put add/max/min/etc. in it [and do the corresponding lowering].
Could you expand on what you mean here? What do you want the op arg to be instead?
Something like this:

    def EnzymeXLA_LapackUplo : I32EnumAttr<"LapackUplo",
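By analogy with that `I32EnumAttr` pattern, a reduction-op enum might look roughly like the sketch below. The enum name `MPIReduceOp`, the `Sketch_` prefix, the `::sketch` namespace, and the exact case list are assumptions made for illustration; they are not taken from the PR.

```tablegen
include "mlir/IR/EnumAttr.td"

// Hypothetical enum of reduction ops; names and cases are placeholders.
// "Sum" corresponds to "add" in the comment above (MPI_SUM in MPI terms).
def Sketch_MPIReduceOp : I32EnumAttr<"MPIReduceOp",
    "reduction applied by an MPI allreduce",
    [
      I32EnumAttrCase<"Sum", 0>,
      I32EnumAttrCase<"Max", 1>,
      I32EnumAttrCase<"Min", 2>
    ]> {
  let cppNamespace = "::sketch";
}
```

With an enum like this, the op would carry an enum attribute in place of `StrAttr:$op`, and the lowering would need one branch per case to pick the matching MPI reduction.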
    TensorOf<[I32]> : $count,
    TensorOf<[I32]> : $dest,
    TensorOf<[I32]> : $tag,
    StrAttr:$datatype
cc @ftynse We can have this be a TypeAttr [and use the elementtype]
We could do it that way too, although that would mean duplicating a lot of logic that we currently rely on MPI.jl for. Would the elementtype approach be better?
Could you expand on what you mean here as well?
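One possible reading of the `TypeAttr` suggestion, as a hedged sketch: the op stores the element type itself (for example `f64`) and leaves the choice of MPI datatype to the lowering. The dialect and op names below are hypothetical, and the `$buf` operand is an assumption, since the quoted hunk does not show the buffer argument.

```tablegen
include "mlir/IR/OpBase.td"

// Hypothetical dialect, used only to make the sketch self-contained.
def Sketch_Dialect : Dialect {
  let name = "sketch";
  let cppNamespace = "::sketch";
}

// Hypothetical send op: the datatype is a type attribute, not a symbol name.
def Sketch_SendOp : Op<Sketch_Dialect, "send"> {
  let arguments = (ins
    AnyTensor : $buf,          // assumed operand, not shown in the hunk
    TensorOf<[I32]> : $count,
    TensorOf<[I32]> : $dest,
    TensorOf<[I32]> : $tag,
    TypeAttr : $datatype       // element type, e.g. f64
  );
}
```

The trade-off raised in the thread still applies: the lowering would then have to map element types to MPI datatypes itself, which is the logic currently delegated to MPI.jl.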
    ins AnyTensor : $inbuf,
    TensorOf<[I32]> : $count,
    TensorOf<[I32]> : $source,
    TensorOf<[I32]> : $tag,
    TensorOf<[I64]> : $inrequest,
    StrAttr:$datatype
    );

    let results = (
    outs AnyTensor : $outbuf,
    TensorOf<[I64]> : $outrequest
I'm unsure about this inbuf/outbuf and inrequest/outrequest design, here and in other Ops. It allows us to reproduce exactly the same IR that we had before (i.e., via manual lowering from Reactant). But the more I've thought about this, the more it seems like a weird design for certain Ops/arguments. E.g., here it maybe makes sense for the buffers, but not the requests? And for Comm_rank, it seems like it would make more sense for it to be a pure function, where we don't take in a rank, only output one. However, doing that would require some changes to the way we call this on the Reactant side.
This should only return a request, not take one as an input [since it's write-only].
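A hedged sketch of what that could look like for the nonblocking receive: the request disappears from the operands and remains only as a result. The dialect and op names (`Sketch_Dialect`, `Sketch_IrecvOp`) and the result name `$request` are hypothetical placeholders; the remaining operands are copied from the hunk under review. This is not the PR's final definition.

```tablegen
include "mlir/IR/OpBase.td"

// Hypothetical dialect, used only to make the sketch self-contained.
def Sketch_Dialect : Dialect {
  let name = "sketch";
  let cppNamespace = "::sketch";
}

// Hypothetical Irecv: the request is produced by the op, never consumed.
def Sketch_IrecvOp : Op<Sketch_Dialect, "irecv"> {
  let arguments = (ins
    AnyTensor : $inbuf,
    TensorOf<[I32]> : $count,
    TensorOf<[I32]> : $source,
    TensorOf<[I32]> : $tag,
    StrAttr : $datatype
  );

  let results = (outs
    AnyTensor : $outbuf,
    TensorOf<[I64]> : $request   // write-only, so result only
  );
}
```

The same reasoning would make a rank query take no rank operand and only return one, at the cost of the call-site changes on the Reactant side mentioned above.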
EnzymeJAX Benchmarks
| Benchmark suite | Current: 2607329 | Previous: d71d865 | Ratio |
|---|---|---|---|
| actmtch / JaXPipe / cpu / Primal | 0.000007801339961588383 s | 0.000006897379980728147 s | 1.13 |
| actmtch / Jax / cpu / Primal | 0.000006982140002946835 s | 0.0000069136199908825804 s | 1.01 |
| actmtch / HLOOpt / cpu / Primal | 0.000009364600009575952 s | 0.00000841961998048646 s | 1.11 |
| actmtch / PartOpt / cpu / Primal | 0.000008105779961624649 s | 0.000006819640047979192 s | 1.19 |
| actmtch / IPartOpt / cpu / Primal | 0.000008338160005223472 s | 0.000007617820037921774 s | 1.09 |
| actmtch / DefOpt / cpu / Primal | 0.00000885656003447366 s | 0.000007364539997070097 s | 1.20 |
| actmtch / IDefOpt / cpu / Primal | 0.000008580240046285325 s | 0.000008136800051943282 s | 1.05 |
| actmtch / JaXPipe / cpu / Forward | 0.00001248672004294349 s | 0.0000118703200314485 s | 1.05 |
| actmtch / Jax / cpu / Forward | 0.000011506779974297388 s | 0.00001041201993757568 s | 1.11 |
| actmtch / HLOOpt / cpu / Forward | 0.000013279659942782018 s | 0.000012178740007584564 s | 1.09 |
| actmtch / PartOpt / cpu / Forward | 0.000012227019979036414 s | 0.000011013440016540698 s | 1.11 |
| actmtch / IPartOpt / cpu / Forward | 0.0000126099599947338 s | 0.000011710939988915924 s | 1.08 |
| actmtch / DefOpt / cpu / Forward | 0.000012218400024721632 s | 0.00001178708000225015 s | 1.04 |
| actmtch / IDefOpt / cpu / Forward | 0.00001247872005478712 s | 0.000011781220046032103 s | 1.06 |
| actmtch / JaXPipe / cpu / PreRev | 0.000011843380016216542 s | 0.00001191142000607215 s | 0.99 |
| actmtch / JaXPipe / cpu / PostRev | 0.000012083959991286974 s | 0.000011489859980429174 s | 1.05 |
| actmtch / JaXPipe / cpu / BothRev | 0.000013053219918219838 s | 0.000012797200070053804 s | 1.02 |
| actmtch / Jax / cpu / BothRev | 0.000010819220024131938 s | 0.000011320319972583092 s | 0.96 |
| actmtch / HLOOpt / cpu / PreRev | 0.000012794480044249212 s | 0.000012538219962152651 s | 1.02 |
| actmtch / HLOOpt / cpu / PostRev | 0.000014550819960277297 s | 0.000013989540020702408 s | 1.04 |
| actmtch / HLOOpt / cpu / BothRev | 0.000012774460028595058 s | 0.000012801459997717756 s | 1.00 |
| actmtch / PartOpt / cpu / PreRev | 0.000011288400019111576 s | 0.000011282919986115304 s | 1.00 |
| actmtch / PartOpt / cpu / PostRev | 0.000011578120047488482 s | 0.000010915479970208251 s | 1.06 |
| actmtch / PartOpt / cpu / BothRev | 0.00001327068002865417 s | 0.000012874100011686096 s | 1.03 |
| actmtch / IPartOpt / cpu / PreRev | 0.000012146540084359004 s | 0.000012087639988749289 s | 1.00 |
| actmtch / IPartOpt / cpu / PostRev | 0.000011811599961220054 s | 0.00001114202000280784 s | 1.06 |
| actmtch / IPartOpt / cpu / BothRev | 0.000012769280056090791 s | 0.00001213371994708723 s | 1.05 |
| actmtch / DefOpt / cpu / PreRev | 0.000012098220013285754 s | 0.000011616179999691667 s | 1.04 |
| actmtch / DefOpt / cpu / PostRev | 0.000012647039984585715 s | 0.000012520880018200842 s | 1.01 |
| actmtch / DefOpt / cpu / BothRev | 0.000012569259997690096 s | 0.000012740499987557997 s | 0.99 |
| actmtch / IDefOpt / cpu / PreRev | 0.000011571579962037504 s | 0.000011899540022568544 s | 0.97 |
| actmtch / IDefOpt / cpu / PostRev | 0.0000132195000333013 s | 0.000012510440046753502 s | 1.06 |
| actmtch / IDefOpt / cpu / BothRev | 0.000012532540022220929 s | 0.000012450659996829926 s | 1.01 |
| actmtch / JaXPipe / cuda / Primal | 0.0000024 s | | |
| actmtch / Jax / cuda / Primal | 0.0000024 s | | |
| actmtch / HLOOpt / cuda / Primal | 0.0000024 s | | |
| actmtch / PartOpt / cuda / Primal | 0.0000024 s | | |
| actmtch / IPartOpt / cuda / Primal | 0.0000024 s | | |
| actmtch / DefOpt / cuda / Primal | 0.0000024 s | | |
| actmtch / IDefOpt / cuda / Primal | 0.0000024 s | | |
| actmtch / JaXPipe / cuda / Forward | 0.000010688 s | | |
| actmtch / Jax / cuda / Forward | 0.000010208 s | | |
| actmtch / HLOOpt / cuda / Forward | 0.000010688 s | | |
| actmtch / PartOpt / cuda / Forward | 0.000010816 s | | |
| actmtch / IPartOpt / cuda / Forward | 0.000010944 s | | |
| actmtch / DefOpt / cuda / Forward | 0.000010784 s | | |
| actmtch / IDefOpt / cuda / Forward | 0.000010943 s | | |
| actmtch / JaXPipe / cuda / PreRev | 0.000013408 s | | |
| actmtch / JaXPipe / cuda / PostRev | 0.000011103 s | | |
| actmtch / JaXPipe / cuda / BothRev | 0.000010784 s | | |
| actmtch / Jax / cuda / BothRev | 0.000013312 s | | |
| actmtch / HLOOpt / cuda / PreRev | 0.000010944 s | | |
| actmtch / HLOOpt / cuda / PostRev | 0.000013568 s | | |
| actmtch / HLOOpt / cuda / BothRev | 0.000011071 s | | |
| actmtch / PartOpt / cuda / PreRev | 0.000010976 s | | |
| actmtch / PartOpt / cuda / PostRev | 0.000010753 s | | |
| actmtch / PartOpt / cuda / BothRev | 0.000010976 s | | |
| actmtch / IPartOpt / cuda / PreRev | 0.000010528 s | | |
| actmtch / IPartOpt / cuda / PostRev | 0.000010816 s | | |
| actmtch / IPartOpt / cuda / BothRev | 0.000010816 s | | |
| actmtch / DefOpt / cuda / PreRev | 0.000010945 s | | |
| actmtch / DefOpt / cuda / PostRev | 0.000010944 s | | |
| actmtch / DefOpt / cuda / BothRev | 0.000010527 s | | |
| actmtch / IDefOpt / cuda / PreRev | 0.0000112 s | | |
| actmtch / IDefOpt / cuda / PostRev | 0.00001104 s | | |
| actmtch / IDefOpt / cuda / BothRev | 0.00001072 s | | |
| actmtch / JaXPipe / tpu / Primal | 5.63425e-7 s | 5.63175e-7 s | 1.00 |
| actmtch / Jax / tpu / Primal | 5.967999999999999e-7 s | 5.968500000000001e-7 s | 1.00 |
| actmtch / HLOOpt / tpu / Primal | 0.000002093375 s | 0.0000020960500000000005 s | 1.00 |
| actmtch / PartOpt / tpu / Primal | 5.9695e-7 s | 5.96575e-7 s | 1.00 |
| actmtch / IPartOpt / tpu / Primal | 5.527e-7 s | 5.53025e-7 s | 1.00 |
| actmtch / DefOpt / tpu / Primal | 0.0000021629 s | 0.0000021589500000000003 s | 1.00 |
| actmtch / IDefOpt / tpu / Primal | 0.0000020946250000000003 s | 0.000002115 s | 0.99 |
| actmtch / JaXPipe / tpu / Forward | 0.00000382735 s | 0.00000382905 s | 1.00 |
| actmtch / Jax / tpu / Forward | 0.000001215425 s | 0.000001215 s | 1.00 |
| actmtch / HLOOpt / tpu / Forward | 0.00000392895 s | 0.0000039365 s | 1.00 |
| actmtch / PartOpt / tpu / Forward | 0.0000039112 s | 0.000003928025 s | 1.00 |
| actmtch / IPartOpt / tpu / Forward | 0.0000039404 s | 0.0000039343 s | 1.00 |
| actmtch / DefOpt / tpu / Forward | 0.000003917225 s | 0.000003908975 s | 1.00 |
| actmtch / IDefOpt / tpu / Forward | 0.000003931325 s | 0.000003937574999999999 s | 1.00 |
| actmtch / JaXPipe / tpu / PreRev | 0.0000034757 s | 0.000003483325 s | 1.00 |
| actmtch / JaXPipe / tpu / PostRev | 0.0000016487 s | 0.000001633125 s | 1.01 |
| actmtch / JaXPipe / tpu / BothRev | 0.000003484175 s | 0.0000034899249999999995 s | 1.00 |
| actmtch / Jax / tpu / BothRev | 0.00000164205 s | 0.000001637175 s | 1.00 |
| actmtch / HLOOpt / tpu / PreRev | 0.00000349265 s | 0.0000034652250000000004 s | 1.01 |
| actmtch / HLOOpt / tpu / PostRev | 0.000003425475 s | 0.00000341055 s | 1.00 |
| actmtch / HLOOpt / tpu / BothRev | 0.000003471825 s | 0.000003485675 s | 1.00 |
| actmtch / PartOpt / tpu / PreRev | 0.000003418675 s | 0.00000340875 s | 1.00 |
| actmtch / PartOpt / tpu / PostRev | 0.000001597675 s | 0.00000160075 s | 1.00 |
| actmtch / PartOpt / tpu / BothRev | 0.00000341515 s | 0.0000034111000000000003 s | 1.00 |
| actmtch / IPartOpt / tpu / PreRev | 0.000003481825 s | 0.00000347 s | 1.00 |
| actmtch / IPartOpt / tpu / PostRev | 0.000001635875 s | 0.000001652325 s | 0.99 |
| actmtch / IPartOpt / tpu / BothRev | 0.000003471125 s | 0.000003472225 s | 1.00 |
| actmtch / DefOpt / tpu / PreRev | 0.0000034099 s | 0.000003425575 s | 1.00 |
| actmtch / DefOpt / tpu / PostRev | 0.00000342105 s | 0.0000034118 s | 1.00 |
| actmtch / DefOpt / tpu / BothRev | 0.0000034019500000000005 s | 0.000003401575 s | 1.00 |
| actmtch / IDefOpt / tpu / PreRev | 0.00000347575 s | 0.0000034789750000000004 s | 1.00 |
| actmtch / IDefOpt / tpu / PostRev | 0.0000034126 s | 0.000003421675 s | 1.00 |
| actmtch / IDefOpt / tpu / BothRev | 0.00000346505 s | 0.0000034823 s | 1.00 |
| actmtch / JaXPipe / cpu / Primal | 0.00001341 s | 0.000006897379980728147 s | 1.94 |
| actmtch / Jax / cpu / Primal | 0.000013679 s | 0.0000069136199908825804 s | 1.98 |
| actmtch / HLOOpt / cpu / Primal | 0.000014212 s | 0.00000841961998048646 s | 1.69 |
| actmtch / PartOpt / cpu / Primal | 0.000013195 s | 0.000006819640047979192 s | 1.93 |
| actmtch / IPartOpt / cpu / Primal | 0.000013553 s | 0.000007617820037921774 s | 1.78 |
| actmtch / DefOpt / cpu / Primal | 0.000014326 s | 0.000007364539997070097 s | 1.95 |
| actmtch / IDefOpt / cpu / Primal | 0.000014025 s | 0.000008136800051943282 s | 1.72 |
| actmtch / JaXPipe / cpu / Forward | 0.000019546 s | 0.0000118703200314485 s | 1.65 |
| actmtch / Jax / cpu / Forward | 0.000018245 s | 0.00001041201993757568 s | 1.75 |
| actmtch / HLOOpt / cpu / Forward | 0.000019313 s | 0.000012178740007584564 s | 1.59 |
| actmtch / PartOpt / cpu / Forward | 0.000019215 s | 0.000011013440016540698 s | 1.74 |
| actmtch / IPartOpt / cpu / Forward | 0.000019384 s | 0.000011710939988915924 s | 1.66 |
| actmtch / DefOpt / cpu / Forward | 0.000019143 s | 0.00001178708000225015 s | 1.62 |
| actmtch / IDefOpt / cpu / Forward | 0.000019595 s | 0.000011781220046032103 s | 1.66 |
| actmtch / JaXPipe / cpu / PreRev | 0.000019679 s | 0.00001191142000607215 s | 1.65 |
| actmtch / JaXPipe / cpu / PostRev | 0.000017996000000000002 s | 0.000011489859980429174 s | 1.57 |
| actmtch / JaXPipe / cpu / BothRev | 0.000019959 s | 0.000012797200070053804 s | 1.56 |
| actmtch / Jax / cpu / BothRev | 0.000018301 s | 0.000011320319972583092 s | 1.62 |
| actmtch / HLOOpt / cpu / PreRev | 0.000019392 s | 0.000012538219962152651 s | 1.55 |
| actmtch / HLOOpt / cpu / PostRev | 0.000020011 s | 0.000013989540020702408 s | 1.43 |
| actmtch / HLOOpt / cpu / BothRev | 0.000019514 s | 0.000012801459997717756 s | 1.52 |
| actmtch / PartOpt / cpu / PreRev | 0.000019284 s | 0.000011282919986115304 s | 1.71 |
| actmtch / PartOpt / cpu / PostRev | 0.000018081 s | 0.000010915479970208251 s | 1.66 |
| actmtch / PartOpt / cpu / BothRev | 0.000019988 s | 0.000012874100011686096 s | 1.55 |
| actmtch / IPartOpt / cpu / PreRev | 0.000019595 s | 0.000012087639988749289 s | 1.62 |
| actmtch / IPartOpt / cpu / PostRev | 0.000017902000000000002 s | 0.00001114202000280784 s | 1.61 |
| actmtch / IPartOpt / cpu / BothRev | 0.000019791 s | 0.00001213371994708723 s | 1.63 |
| actmtch / DefOpt / cpu / PreRev | 0.000019403 s | 0.000011616179999691667 s | 1.67 |
| actmtch / DefOpt / cpu / PostRev | 0.000019267 s | 0.000012520880018200842 s | 1.54 |
| actmtch / DefOpt / cpu / BothRev | 0.000019727000000000003 s | 0.000012740499987557997 s | 1.55 |
| actmtch / IDefOpt / cpu / PreRev | 0.000019631 s | 0.000011899540022568544 s | 1.65 |
| actmtch / IDefOpt / cpu / PostRev | 0.000019881 s | 0.000012510440046753502 s | 1.59 |
| actmtch / IDefOpt / cpu / BothRev | 0.000019304 s | 0.000012450659996829926 s | 1.55 |
| add_one / JaXPipe / cpu / Primal | 0.000009446800013392932 s | 0.00000706440000612929 s | 1.34 |
| add_one / Jax / cpu / Primal | 0.000009122679985011928 s | 0.000007601719935337314 s | 1.20 |
| add_one / HLOOpt / cpu / Primal | 0.000007943339933262904 s | 0.000007250780045069405 s | 1.10 |
| add_one / PartOpt / cpu / Primal | 0.000008322859939653426 s | 0.000007323659974645125 s | 1.14 |
| add_one / IPartOpt / cpu / Primal | 0.000008701859951543155 s | 0.000007363139993685763 s | 1.18 |
| add_one / DefOpt / cpu / Primal | 0.000007943080054246821 s | 0.000007430599980580155 s | 1.07 |
| add_one / IDefOpt / cpu / Primal | 0.00000771577999330475 s | 0.000007264319974638056 s | 1.06 |
| add_one / JaXPipe / cpu / Forward | 0.000011471960042399588 s | 0.000011343019978085068 s | 1.01 |
| add_one / Jax / cpu / Forward | 0.000012413799995556472 s | 0.00001087398001800466 s | 1.14 |
| add_one / HLOOpt / cpu / Forward | 0.000011387340036890236 s | 0.0000115157799791632 s | 0.99 |
| add_one / PartOpt / cpu / Forward | 0.000011812899992946769 s | 0.000010771899987958022 s | 1.10 |
| add_one / IPartOpt / cpu / Forward | 0.000011171239984832937 s | 0.000011192560004928963 s | 1.00 |
| add_one / DefOpt / cpu / Forward | 0.000011936559985770144 s | 0.000011124239945274894 s | 1.07 |
| add_one / IDefOpt / cpu / Forward | 0.000011112559968751155 s | 0.000011266279998380925 s | 0.99 |
| add_one / JaXPipe / cpu / PreRev | 0.00001307148000705638 s | 0.000013437640000120154 s | 0.97 |
| add_one / JaXPipe / cpu / PostRev | 0.000013430580020212802 s | 0.00001323504003266862 s | 1.01 |
| add_one / JaXPipe / cpu / BothRev | 0.000013528300005418714 s | 0.000013404940018517664 s | 1.01 |
| add_one / Jax / cpu / BothRev | 0.000012836819932999787 s | 0.000012968259970875809 s | 0.99 |
| add_one / HLOOpt / cpu / PreRev | 0.000013550359999499052 s | 0.000013323479979590047 s | 1.02 |
| add_one / HLOOpt / cpu / PostRev | 0.00001762362000590656 s | 0.000014851399982944714 s | 1.19 |
| add_one / HLOOpt / cpu / BothRev | 0.000012584479954966809 s | 0.000012748400004056747 s | 0.99 |
| add_one / PartOpt / cpu / PreRev | 0.000013045459982095051 s | 0.000012426560033418356 s | 1.05 |
| add_one / PartOpt / cpu / PostRev | 0.000012626800089492462 s | 0.00001276427994525875 s | 0.99 |
| add_one / PartOpt / cpu / BothRev | 0.000012988099988433532 s | 0.00001322353997238679 s | 0.98 |
| add_one / IPartOpt / cpu / PreRev | 0.000013442439976643071 s | 0.00001235531998645456 s | 1.09 |
| add_one / IPartOpt / cpu / PostRev | 0.000012876960081484869 s | 0.000012512420062193996 s | 1.03 |
| add_one / IPartOpt / cpu / BothRev | 0.000013156900058675091 s | 0.000013071919993308256 s | 1.01 |
| add_one / DefOpt / cpu / PreRev | 0.00001256018002095516 s | 0.00001339427999482723 s | 0.94 |
| add_one / DefOpt / cpu / PostRev | 0.000013207299907662671 s | 0.000013905179985158611 s | 0.95 |
| add_one / DefOpt / cpu / BothRev | 0.000013874879969080213 s | 0.000012657760025831522 s | 1.10 |
| add_one / IDefOpt / cpu / PreRev | 0.000013266940022731431 s | 0.000012838639995607084 s | 1.03 |
| add_one / IDefOpt / cpu / PostRev | 0.00001307259999521193 s | 0.000013451439972413937 s | 0.97 |
| add_one / IDefOpt / cpu / BothRev | 0.000012516119950305438 s | 0.00001288153997847985 s | 0.97 |
| add_one / JaXPipe / cuda / Primal | 0.000002304 s | | |
| add_one / Jax / cuda / Primal | 0.000002304 s | | |
| add_one / HLOOpt / cuda / Primal | 0.000002335 s | | |
| add_one / PartOpt / cuda / Primal | 0.000002335 s | | |
| add_one / IPartOpt / cuda / Primal | 0.000002335 s | | |
| add_one / DefOpt / cuda / Primal | 0.000002335 s | | |
| add_one / IDefOpt / cuda / Primal | 0.000002335 s | | |
| add_one / JaXPipe / cuda / Forward | 0.00001056 s | | |
| add_one / Jax / cuda / Forward | 0.00001088 s | | |
| add_one / HLOOpt / cuda / Forward | 0.000010752 s | | |
| add_one / PartOpt / cuda / Forward | 0.000010752 s | | |
| add_one / IPartOpt / cuda / Forward | 0.000010783 s | | |
| add_one / DefOpt / cuda / Forward | 0.000010848 s | | |
| add_one / IDefOpt / cuda / Forward | 0.00001088 s | | |
| add_one / JaXPipe / cuda / PreRev | 0.000026944 s | | |
| add_one / JaXPipe / cuda / PostRev | 0.000026208 s | | |
| add_one / JaXPipe / cuda / BothRev | 0.000026431 s | | |
| add_one / Jax / cuda / BothRev | 0.000026368 s | | |
| add_one / HLOOpt / cuda / PreRev | 0.000027552 s | | |
| add_one / HLOOpt / cuda / PostRev | 0.000026016 s | | |
| add_one / HLOOpt / cuda / BothRev | 0.000026368 s | | |
| add_one / PartOpt / cuda / PreRev | 0.00002624 s | | |
| add_one / PartOpt / cuda / PostRev | 0.00002608 s | | |
| add_one / PartOpt / cuda / BothRev | 0.000026048 s | | |
| add_one / IPartOpt / cuda / PreRev | 0.000026304 s | | |
| add_one / IPartOpt / cuda / PostRev | 0.000027232 s | | |
| add_one / IPartOpt / cuda / BothRev | 0.00002704 s | | |
| add_one / DefOpt / cuda / PreRev | 0.000026624 s | | |
| add_one / DefOpt / cuda / PostRev | 0.000026752 s | | |
| add_one / DefOpt / cuda / BothRev | 0.000027872 s | | |
| add_one / IDefOpt / cuda / PreRev | 0.000026433000000000003 s | | |
| add_one / IDefOpt / cuda / PostRev | 0.000026304 s | | |
| add_one / IDefOpt / cuda / BothRev | 0.000027616 s | | |
| add_one / JaXPipe / tpu / Primal | 0.0000014321999999999998 s | 0.0000014288 s | 1.00 |
| add_one / Jax / tpu / Primal | 0.00000141655 s | 0.000001405625 s | 1.01 |
| add_one / HLOOpt / tpu / Primal | 0.00000142245 s | 0.0000014295749999999998 s | 1.00 |
| add_one / PartOpt / tpu / Primal | 0.00000140345 s | 0.000001402225 s | 1.00 |
| add_one / IPartOpt / tpu / Primal | 0.0000014394499999999998 s | 0.000001428775 s | 1.01 |
| add_one / DefOpt / tpu / Primal | 0.0000014021 s | 0.0000014075499999999998 s | 1.00 |
| add_one / IDefOpt / tpu / Primal | 0.000001436775 s | 0.000001429925 s | 1.00 |
| add_one / JaXPipe / tpu / Forward | 0.000001788725 s | 0.0000018106 s | 0.99 |
| add_one / Jax / tpu / Forward | 0.000001840975 s | 0.00000185235 s | 0.99 |
| add_one / HLOOpt / tpu / Forward | 0.000001812725 s | 0.00000179575 s | 1.01 |
| add_one / PartOpt / tpu / Forward | 0.000001855 s | 0.000001853475 s | 1.00 |
| add_one / IPartOpt / tpu / Forward | 0.0000018029 s | 0.000001793625 s | 1.01 |
| add_one / DefOpt / tpu / Forward | 0.00000183755 s | 0.000001843375 s | 1.00 |
| add_one / IDefOpt / tpu / Forward | 0.0000017965749999999998 s | 0.000001797975 s | 1.00 |
| add_one / JaXPipe / tpu / PreRev | 0.0000022323 s | 0.000002233925 s | 1.00 |
| add_one / JaXPipe / tpu / PostRev | 0.000002181675 s | 0.000002186125 s | 1.00 |
| add_one / JaXPipe / tpu / BothRev | 0.00000223695 s | 0.000002238875 s | 1.00 |
| add_one / Jax / tpu / BothRev | 0.000002190475 s | 0.0000021898500000000003 s | 1.00 |
| add_one / HLOOpt / tpu / PreRev | 0.000002232975 s | 0.00000224045 s | 1.00 |
| add_one / HLOOpt / tpu / PostRev | 0.000002189075 s | 0.0000021848500000000004 s | 1.00 |
| add_one / HLOOpt / tpu / BothRev | 0.000002239175 s | 0.0000022363750000000003 s | 1.00 |
| add_one / PartOpt / tpu / PreRev | 0.000002181425 s | 0.000002183875 s | 1.00 |
| add_one / PartOpt / tpu / PostRev | 0.0000022419 s | 0.00000223835 s | 1.00 |
| add_one / PartOpt / tpu / BothRev | 0.0000021854000000000003 s | 0.000002182175 s | 1.00 |
| add_one / IPartOpt / tpu / PreRev | 0.00000224525 s | 0.00000224795 s | 1.00 |
| add_one / IPartOpt / tpu / PostRev | 0.0000021774 s | 0.000002180375 s | 1.00 |
| add_one / IPartOpt / tpu / BothRev | 0.0000022437250000000003 s | 0.000002235325 s | 1.00 |
| add_one / DefOpt / tpu / PreRev | 0.000002196525 s | 0.0000021844250000000003 s | 1.01 |
| add_one / DefOpt / tpu / PostRev | 0.0000022391 s | 0.00000223515 s | 1.00 |
| add_one / DefOpt / tpu / BothRev | 0.000002184025 s | 0.000002192675 s | 1.00 |
| add_one / IDefOpt / tpu / PreRev | 0.00000224075 s | 0.0000022315 s | 1.00 |
| add_one / IDefOpt / tpu / PostRev | 0.00000219735 s | 0.0000021845000000000004 s | 1.01 |
| add_one / IDefOpt / tpu / BothRev | 0.00000224315 s | 0.00000223215 s | 1.00 |
| add_one / JaXPipe / cpu / Primal | 0.00001333 s | 0.00000706440000612929 s | 1.89 |
| add_one / Jax / cpu / Primal | 0.000013246000000000002 s | 0.000007601719935337314 s | 1.74 |
| add_one / HLOOpt / cpu / Primal | 0.000013108 s | 0.000007250780045069405 s | 1.81 |
| add_one / PartOpt / cpu / Primal | 0.000013103 s | 0.000007323659974645125 s | 1.79 |
| add_one / IPartOpt / cpu / Primal | 0.000013169 s | 0.000007363139993685763 s | 1.79 |
| add_one / DefOpt / cpu / Primal | 0.000013203 s | 0.000007430599980580155 s | 1.78 |
| add_one / IDefOpt / cpu / Primal | 0.000013018 s | 0.000007264319974638056 s | 1.79 |
| add_one / JaXPipe / cpu / Forward | 0.000018053 s | 0.000011343019978085068 s | 1.59 |
| add_one / Jax / cpu / Forward | 0.000017839 s | 0.00001087398001800466 s | 1.64 |
| add_one / HLOOpt / cpu / Forward | 0.000018105 s | 0.0000115157799791632 s | 1.57 |
| add_one / PartOpt / cpu / Forward | 0.000017794 s | 0.000010771899987958022 s | 1.65 |
| add_one / IPartOpt / cpu / Forward | 0.000017837 s | 0.000011192560004928963 s | 1.59 |
| add_one / DefOpt / cpu / Forward | 0.000017947000000000003 s | 0.000011124239945274894 s | 1.61 |
| add_one / IDefOpt / cpu / Forward | 0.000018022 s | 0.000011266279998380925 s | 1.60 |
| add_one / JaXPipe / cpu / PreRev | 0.000020671 s | 0.000013437640000120154 s | 1.54 |
| add_one / JaXPipe / cpu / PostRev | 0.00001965 s | 0.00001323504003266862 s | 1.48 |
| add_one / JaXPipe / cpu / BothRev | 0.000019692 s | 0.000013404940018517664 s | 1.47 |
| add_one / Jax / cpu / BothRev | 0.000019872 s | 0.000012968259970875809 s | 1.53 |
| add_one / HLOOpt / cpu / PreRev | 0.000019866000000000003 s | 0.000013323479979590047 s | 1.49 |
| add_one / HLOOpt / cpu / PostRev | 0.000020458 s | 0.000014851399982944714 s | 1.38 |
| add_one / HLOOpt / cpu / BothRev | 0.000020226000000000003 s | 0.000012748400004056747 s | 1.59 |
| add_one / PartOpt / cpu / PreRev | 0.000020005 s | 0.000012426560033418356 s | 1.61 |
| add_one / PartOpt / cpu / PostRev | 0.000020179 s | 0.00001276427994525875 s | 1.58 |
| add_one / PartOpt / cpu / BothRev | 0.000019981 s | 0.00001322353997238679 s | 1.51 |
| add_one / IPartOpt / cpu / PreRev | 0.000020751 s | 0.00001235531998645456 s | 1.68 |
| add_one / IPartOpt / cpu / PostRev | 0.00001964 s | 0.000012512420062193996 s | 1.57 |
| add_one / IPartOpt / cpu / BothRev | 0.000020067000000000003 s | 0.000013071919993308256 s | 1.54 |
| add_one / DefOpt / cpu / PreRev | 0.000019539 s | 0.00001339427999482723 s | 1.46 |
| add_one / DefOpt / cpu / PostRev | 0.000020039 s | 0.000013905179985158611 s | 1.44 |
| add_one / DefOpt / cpu / BothRev | 0.00002019 s | 0.000012657760025831522 s | 1.60 |
| add_one / IDefOpt / cpu / PreRev | 0.00002022 s | 0.000012838639995607084 s | 1.57 |
| add_one / IDefOpt / cpu / PostRev | 0.000019706 s | 0.000013451439972413937 s | 1.46 |
| add_one / IDefOpt / cpu / BothRev | 0.000019612 s | 0.00001288153997847985 s | 1.52 |
| add_two / JaXPipe / cpu / Primal | 0.000008228919978137127 s | 0.000007585460025438806 s | 1.08 |
| add_two / Jax / cpu / Primal | 0.000007852960061427439 s | 0.000007901919934738543 s | 0.99 |
| add_two / HLOOpt / cpu / Primal | 0.00000757889998567407 s | 0.000007734699966022163 s | 0.98 |
| add_two / PartOpt / cpu / Primal | 0.000007924899855424882 s | 0.0000077645399778703 s | 1.02 |
| add_two / IPartOpt / cpu / Primal | 0.000007571559890493517 s | 0.00000801171999228245 s | 0.95 |
| add_two / DefOpt / cpu / Primal | 0.000008030860008148012 s | 0.0000074785999822779556 s | 1.07 |
| add_two / IDefOpt / cpu / Primal | 0.00000777446000938653 s | 0.000007569339968540589 s | 1.03 |
| add_two / JaXPipe / cpu / Forward | 0.000011113740001746918 s | 0.000011130499970022357 s | 1.00 |
| add_two / Jax / cpu / Forward | 0.000011730279911716934 s | 0.000011560640014067755 s | 1.01 |
| add_two / HLOOpt / cpu / Forward | 0.000011978540060226806 s | 0.000011478699998406229 s | 1.04 |
| add_two / PartOpt / cpu / Forward | 0.000011723240040737436 s | 0.000011006600025211813 s | 1.07 |
| add_two / IPartOpt / cpu / Forward | 0.000011729160014510852 s | 0.00001155640001343272 s | 1.01 |
| add_two / DefOpt / cpu / Forward | 0.000011317320004309294 s | 0.000011182519965586837 s | 1.01 |
| add_two / IDefOpt / cpu / Forward | 0.000012071959918102949 s | 0.000011505340007715858 s | 1.05 |
| add_two / JaXPipe / cpu / PreRev | 0.000015901680053502788 s | 0.000015082180016179335 s | 1.05 |
| add_two / JaXPipe / cpu / PostRev | 0.00001510661992142559 s | 0.000015239040012602345 s | 0.99 |
| add_two / JaXPipe / cpu / BothRev | 0.000016034640084399142 s | 0.000015677139981562504 s | 1.02 |
| add_two / Jax / cpu / BothRev | 0.000015171480063145282 s | 0.000015563599990855437 s | 0.97 |
| add_two / HLOOpt / cpu / PreRev | 0.000015356760050053707 s | 0.000015294399981939933 s | 1.00 |
| add_two / HLOOpt / cpu / PostRev | 0.000017783100029191702 s | 0.00001706452000689751 s | 1.04 |
| add_two / HLOOpt / cpu / BothRev | 0.0000157942800251476 s | 0.000015444639993802413 s | 1.02 |
| add_two / PartOpt / cpu / PreRev | 0.000015446899997186847 s | 0.000015105920001587946 s | 1.02 |
| add_two / PartOpt / cpu / PostRev | 0.00001575201999003184 s | 0.000015631119968020357 s | 1.01 |
| add_two / PartOpt / cpu / BothRev | 0.000015933240065351127 s | 0.000015236720055327168 s | 1.05 |
| add_two / IPartOpt / cpu / PreRev | 0.000015364900027634574 s | 0.000015926079968267003 s | 0.96 |
| add_two / IPartOpt / cpu / PostRev | 0.000016114860081870574 s | 0.00001560540000355104 s | 1.03 |
| add_two / IPartOpt / cpu / BothRev | 0.000015513699981966056 s | 0.000014749879992450588 s | 1.05 |
| add_two / DefOpt / cpu / PreRev | 0.000015790779962117085 s | 0.00001555791996906919 s | 1.01 |
| add_two / DefOpt / cpu / PostRev | 0.00001619841994397575 s | 0.00001548297998851922 s | 1.05 |
| add_two / DefOpt / cpu / BothRev | 0.000015951639998093015 s | 0.000015098959956958423 s | 1.06 |
| add_two / IDefOpt / cpu / PreRev | 0.000016616000029898713 s | 0.000015209419980237724 s | 1.09 |
| add_two / IDefOpt / cpu / PostRev | 0.000015404000114358494 s | 0.000015477779952561833 s | 1.00 |
| add_two / IDefOpt / cpu / BothRev | 0.000015376459941762732 s | 0.00001620710004317516 s | 0.95 |
| add_two / JaXPipe / cuda / Primal | 0.000002431 s | | |
| add_two / Jax / cuda / Primal | 0.000002432 s | | |
| add_two / HLOOpt / cuda / Primal | 0.000002431 s | | |
| add_two / PartOpt / cuda / Primal | 0.000002431 s | | |
| add_two / IPartOpt / cuda / Primal | 0.000002431 s | | |
| add_two / DefOpt / cuda / Primal | 0.000002432 s | | |
| add_two / IDefOpt / cuda / Primal | 0.000002431 s | | |
| add_two / JaXPipe / cuda / Forward | 0.00001088 s | | |
| add_two / Jax / cuda / Forward | 0.000010752 s | | |
| add_two / HLOOpt / cuda / Forward | 0.00001088 s | | |
| add_two / PartOpt / cuda / Forward | 0.00001072 s | | |
| add_two / IPartOpt / cuda / Forward | 0.0000104 s | | |
| add_two / DefOpt / cuda / Forward | 0.00001088 s | | |
| add_two / IDefOpt / cuda / Forward | 0.00001088 s | | |
| add_two / JaXPipe / cuda / PreRev | 0.000034208 s | | |
| add_two / JaXPipe / cuda / PostRev | 0.000034016 s | | |
| add_two / JaXPipe / cuda / BothRev | 0.000034623000000000004 s | | |
| add_two / Jax / cuda / BothRev | 0.000033888 s | | |
| add_two / HLOOpt / cuda / PreRev | 0.00003488 s | | |
| add_two / HLOOpt / cuda / PostRev | 0.000034144000000000004 s | | |
| add_two / HLOOpt / cuda / BothRev | 0.000034687 s | | |
| add_two / PartOpt / cuda / PreRev | 0.00003504 s | | |
| add_two / PartOpt / cuda / PostRev | 0.000033184 s | | |
| add_two / PartOpt / cuda / BothRev | 0.000033569 s | | |
| add_two / IPartOpt / cuda / PreRev | 0.000034144000000000004 s | | |
| add_two / IPartOpt / cuda / PostRev | 0.000033439 s | | |
| add_two / IPartOpt / cuda / BothRev | 0.000034784000000000004 s | | |
| add_two / DefOpt / cuda / PreRev | 0.000035232 s | | |
| add_two / DefOpt / cuda / PostRev | 0.000033665000000000004 s | | |
| add_two / DefOpt / cuda / BothRev | 0.00003408 s | | |
| add_two / IDefOpt / cuda / PreRev | 0.000034976 s | | |
| add_two / IDefOpt / cuda / PostRev | 0.000034399 s | | |
| add_two / IDefOpt / cuda / BothRev | 0.000034592 s | | |
| add_two / JaXPipe / tpu / Primal | 0.0000014380999999999998 s | 0.0000014355 s | 1.00 |
| add_two / Jax / tpu / Primal | 0.00000143245 s | 0.0000014217 s | 1.01 |
| add_two / HLOOpt / tpu / Primal | 0.000001430075 s | 0.000001438575 s | 0.99 |
| add_two / PartOpt / tpu / Primal | 0.00000142945 s | 0.0000014199 s | 1.01 |
| add_two / IPartOpt / tpu / Primal | 0.000001433875 s | 0.0000014303750000000005 s | 1.00 |
| add_two / DefOpt / tpu / Primal | 0.0000014306499999999998 s | 0.0000014258 s | 1.00 |
| add_two / IDefOpt / tpu / Primal | 0.0000014299500000000002 s | 0.00000144245 s | 0.99 |
| add_two / JaXPipe / tpu / Forward | 0.000001819825 s | 0.00000182285 s | 1.00 |
| add_two / Jax / tpu / Forward | 0.000001832275 s | 0.000001825575 s | 1.00 |
| add_two / HLOOpt / tpu / Forward | 0.00000183045 s | 0.00000182335 s | 1.00 |
| add_two / PartOpt / tpu / Forward | 0.000001824275 s | 0.0000018401 s | 0.99 |
| add_two / IPartOpt / tpu / Forward | 0.000001830575 s | 0.000001820525 s | 1.01 |
| add_two / DefOpt / tpu / Forward | 0.000001826475 s | 0.0000018333 s | 1.00 |
| add_two / IDefOpt / tpu / Forward | 0.000001841075 s | 0.00000182775 s | 1.01 |
| add_two / JaXPipe / tpu / PreRev | 0.0000028336250000000003 s | 0.0000028411500000000003 s | 1.00 |
| add_two / JaXPipe / tpu / PostRev | 0.000002766825 s | 0.0000027745500000000004 s | 1.00 |
| add_two / JaXPipe / tpu / BothRev | 0.000002842175 s | 0.0000028418 s | 1.00 |
| add_two / Jax / tpu / BothRev | 0.000002752 s | 0.00000276245 s | 1.00 |
| add_two / HLOOpt / tpu / PreRev | 0.000002842225 s | 0.000002845725 s | 1.00 |
| add_two / HLOOpt / tpu / PostRev | 0.0000027545 s | 0.0000027546250000000003 s | 1.00 |
| add_two / HLOOpt / tpu / BothRev | 0.000002840625 s | 0.00000285825 s | 0.99 |
| add_two / PartOpt / tpu / PreRev | 0.000002768925 s | 0.0000027701500000000005 s | 1.00 |
| add_two / PartOpt / tpu / PostRev | 0.000002835 s | 0.0000028395 s | 1.00 |
| add_two / PartOpt / tpu / BothRev | 0.00000274145 s | 0.00000275855 s | 0.99 |
| add_two / IPartOpt / tpu / PreRev | 0.0000028288000000000003 s | 0.00000283355 s | 1.00 |
| add_two / IPartOpt / tpu / PostRev | 0.0000027467000000000003 s | 0.000002756025 s | 1.00 |
| add_two / IPartOpt / tpu / BothRev | 0.0000028326 s | 0.0000028466000000000004 s | 1.00 |
| add_two / DefOpt / tpu / PreRev | 0.0000027463500000000004 s | 0.0000027504 s | 1.00 |
| add_two / DefOpt / tpu / PostRev | 0.00000284445 s | 0.0000028446500000000004 s | 1.00 |
| add_two / DefOpt / tpu / BothRev | 0.0000027459499999999995 s | 0.000002745525 s | 1.00 |
| add_two / IDefOpt / tpu / PreRev | 0.00000284615 s | 0.0000028409 s | 1.00 |
| add_two / IDefOpt / tpu / PostRev | 0.000002758375 s | 0.000002753225 s | 1.00 |
| add_two / IDefOpt / tpu / BothRev | 0.00000283725 s | 0.000002842725 s | 1.00 |
| add_two / JaXPipe / cpu / Primal | 0.00001351 s | 0.000007585460025438806 s | 1.78 |
| add_two / Jax / cpu / Primal | 0.000013249 s | 0.000007901919934738543 s | 1.68 |
| add_two / HLOOpt / cpu / Primal | 0.000013496 s | 0.000007734699966022163 s | 1.74 |
| add_two / PartOpt / cpu / Primal | 0.00001362 s | 0.0000077645399778703 s | 1.75 |
| add_two / IPartOpt / cpu / Primal | 0.000013305 s | 0.00000801171999228245 s | 1.66 |
| add_two / DefOpt / cpu / Primal | 0.000013436 s | 0.0000074785999822779556 s | 1.80 |
| add_two / IDefOpt / cpu / Primal | 0.000013282 s | 0.000007569339968540589 s | 1.75 |
| add_two / JaXPipe / cpu / Forward | 0.000018373 s | 0.000011130499970022357 s | 1.65 |
add_two / Jax / cpu / Forward |
0.000018196 s |
0.000011560640014067755 s |
1.57 |
add_two / HLOOpt / cpu / Forward |
0.000018221 s |
0.000011478699998406229 s |
1.59 |
add_two / PartOpt / cpu / Forward |
0.000018771 s |
0.000011006600025211813 s |
1.71 |
add_two / IPartOpt / cpu / Forward |
0.000018178 s |
0.00001155640001343272 s |
1.57 |
add_two / DefOpt / cpu / Forward |
0.000018114000000000003 s |
0.000011182519965586837 s |
1.62 |
add_two / IDefOpt / cpu / Forward |
0.000018006 s |
0.000011505340007715858 s |
1.57 |
add_two / JaXPipe / cpu / PreRev |
0.000023699 s |
0.000015082180016179335 s |
1.57 |
add_two / JaXPipe / cpu / PostRev |
0.000023686 s |
0.000015239040012602345 s |
1.55 |
add_two / JaXPipe / cpu / BothRev |
0.000023366000000000003 s |
0.000015677139981562504 s |
1.49 |
add_two / Jax / cpu / BothRev |
0.000022854 s |
0.000015563599990855437 s |
1.47 |
add_two / HLOOpt / cpu / PreRev |
0.000022941 s |
0.000015294399981939933 s |
1.50 |
add_two / HLOOpt / cpu / PostRev |
0.00002342 s |
0.00001706452000689751 s |
1.37 |
add_two / HLOOpt / cpu / BothRev |
0.000023987000000000003 s |
0.000015444639993802413 s |
1.55 |
add_two / PartOpt / cpu / PreRev |
0.000023106 s |
0.000015105920001587946 s |
1.53 |
add_two / PartOpt / cpu / PostRev |
0.000023603 s |
0.000015631119968020357 s |
1.51 |
add_two / PartOpt / cpu / BothRev |
0.000024219 s |
0.000015236720055327168 s |
1.59 |
add_two / IPartOpt / cpu / PreRev |
0.000024302 s |
0.000015926079968267003 s |
1.53 |
add_two / IPartOpt / cpu / PostRev |
0.000024582 s |
0.00001560540000355104 s |
1.58 |
add_two / IPartOpt / cpu / BothRev |
0.000023517 s |
0.000014749879992450588 s |
1.59 |
add_two / DefOpt / cpu / PreRev |
0.000024343 s |
0.00001555791996906919 s |
1.56 |
add_two / DefOpt / cpu / PostRev |
0.000023704 s |
0.00001548297998851922 s |
1.53 |
add_two / DefOpt / cpu / BothRev |
0.000023488 s |
0.000015098959956958423 s |
1.56 |
add_two / IDefOpt / cpu / PreRev |
0.000024819 s |
0.000015209419980237724 s |
1.63 |
add_two / IDefOpt / cpu / PostRev |
0.0000255 s |
0.000015477779952561833 s |
1.65 |
add_two / IDefOpt / cpu / BothRev |
0.000023716 s |
0.00001620710004317516 s |
1.46 |
cache / JaXPipe / cpu / Primal |
0.000006778320021112449 s |
0.000007056179956634878 s |
0.96 |
cache / Jax / cpu / Primal |
0.000008608680091128917 s |
0.000007446979971064138 s |
1.16 |
cache / HLOOpt / cpu / Primal |
0.000008067280050454429 s |
0.000006615619986405363 s |
1.22 |
cache / PartOpt / cpu / Primal |
0.000007780860069033223 s |
0.000006719259981764481 s |
1.16 |
cache / IPartOpt / cpu / Primal |
0.0000078753999696346 s |
0.000006971860020712484 s |
1.13 |
cache / DefOpt / cpu / Primal |
0.000007817879941285355 s |
0.000007025120012258412 s |
1.11 |
cache / IDefOpt / cpu / Primal |
0.000007880059947638073 s |
0.000007174279980972642 s |
1.10 |
cache / JaXPipe / cpu / Forward |
0.000014821899894741363 s |
0.000014488840042758966 s |
1.02 |
cache / Jax / cpu / Forward |
0.000015384719990834128 s |
0.000015192699947874644 s |
1.01 |
cache / HLOOpt / cpu / Forward |
0.000016243100035353563 s |
0.000015517559959334903 s |
1.05 |
cache / PartOpt / cpu / Forward |
0.000014748579960723872 s |
0.000014401319958778911 s |
1.02 |
cache / IPartOpt / cpu / Forward |
0.00001569882002513623 s |
0.000015117139928406687 s |
1.04 |
cache / DefOpt / cpu / Forward |
0.000015263340046658413 s |
0.000014252099936129523 s |
1.07 |
cache / IDefOpt / cpu / Forward |
0.0000152645599519019 s |
0.000015947860019878136 s |
0.96 |
cache / JaXPipe / cpu / PreRev |
0.000017022980009642198 s |
0.000017153660010080785 s |
0.99 |
cache / JaXPipe / cpu / PostRev |
0.000020436460054042983 s |
0.00002086701995722251 s |
0.98 |
cache / JaXPipe / cpu / BothRev |
0.000017449760052841155 s |
0.000016654460014251528 s |
1.05 |
cache / Jax / cpu / BothRev |
0.00002214977999756229 s |
0.000020643239995479235 s |
1.07 |
cache / HLOOpt / cpu / PreRev |
0.000016472999977850122 s |
0.000017784900019250928 s |
0.93 |
cache / HLOOpt / cpu / PostRev |
0.0000197194800057332 s |
0.000021892259992455367 s |
0.90 |
cache / HLOOpt / cpu / BothRev |
0.000017747519996191842 s |
0.000018382259986537976 s |
0.97 |
cache / PartOpt / cpu / PreRev |
0.000015947099946060917 s |
0.000016693739980837562 s |
0.96 |
cache / PartOpt / cpu / PostRev |
0.000022589079999306702 s |
0.000021782079984404844 s |
1.04 |
cache / PartOpt / cpu / BothRev |
0.000016057699904195035 s |
0.000017551979999552715 s |
0.91 |
cache / IPartOpt / cpu / PreRev |
0.00001659378007389023 s |
0.00001678380000157631 s |
0.99 |
cache / IPartOpt / cpu / PostRev |
0.000021509379948838613 s |
0.000021706779980377176 s |
0.99 |
cache / IPartOpt / cpu / BothRev |
0.000016132279924931935 s |
0.000017438879986002576 s |
0.93 |
cache / DefOpt / cpu / PreRev |
0.000016353860082745087 s |
0.00001758438000251772 s |
0.93 |
cache / DefOpt / cpu / PostRev |
0.00001690259996394161 s |
0.000016727100019124918 s |
1.01 |
cache / DefOpt / cpu / BothRev |
0.00001693030000751605 s |
0.000016970419956123806 s |
1.00 |
cache / IDefOpt / cpu / PreRev |
0.00001601755995579879 s |
0.000017113839985540834 s |
0.94 |
cache / IDefOpt / cpu / PostRev |
0.000015976459926605458 s |
0.000016802999980427557 s |
0.95 |
cache / IDefOpt / cpu / BothRev |
0.000015905940072116208 s |
0.000017829540001912392 s |
0.89 |
cache / JaXPipe / cuda / Primal |
0.000002336 s |
||
cache / Jax / cuda / Primal |
0.000002336 s |
||
cache / HLOOpt / cuda / Primal |
0.000002335 s |
||
cache / PartOpt / cuda / Primal |
0.000002335 s |
||
cache / IPartOpt / cuda / Primal |
0.000002335 s |
||
cache / DefOpt / cuda / Primal |
0.000002335 s |
||
cache / IDefOpt / cuda / Primal |
0.000002335 s |
||
cache / JaXPipe / cuda / Forward |
0.0000023670000000000004 s |
||
cache / Jax / cuda / Forward |
0.0000023670000000000004 s |
||
cache / HLOOpt / cuda / Forward |
0.0000023670000000000004 s |
||
cache / PartOpt / cuda / Forward |
0.000002336 s |
||
cache / IPartOpt / cuda / Forward |
0.000002336 s |
||
cache / DefOpt / cuda / Forward |
0.0000023670000000000004 s |
||
cache / IDefOpt / cuda / Forward |
0.0000023670000000000004 s |
||
cache / JaXPipe / cuda / PreRev |
0.000011616 s |
||
cache / JaXPipe / cuda / PostRev |
0.000011425 s |
||
cache / JaXPipe / cuda / BothRev |
0.000011424 s |
||
cache / Jax / cuda / BothRev |
0.000011423 s |
||
cache / HLOOpt / cuda / PreRev |
0.000013727 s |
||
cache / HLOOpt / cuda / PostRev |
0.000013696 s |
||
cache / HLOOpt / cuda / BothRev |
0.000013728 s |
||
cache / PartOpt / cuda / PreRev |
0.000011072 s |
||
cache / PartOpt / cuda / PostRev |
0.000011296 s |
||
cache / PartOpt / cuda / BothRev |
0.000010911 s |
||
cache / IPartOpt / cuda / PreRev |
0.000011168 s |
||
cache / IPartOpt / cuda / PostRev |
0.00001104 s |
||
cache / IPartOpt / cuda / BothRev |
0.000010656 s |
||
cache / DefOpt / cuda / PreRev |
0.000010944 s |
||
cache / DefOpt / cuda / PostRev |
0.000011263 s |
||
cache / DefOpt / cuda / BothRev |
0.000011104 s |
||
cache / IDefOpt / cuda / PreRev |
0.000010976 s |
||
cache / IDefOpt / cuda / PostRev |
0.0000112 s |
||
cache / IDefOpt / cuda / BothRev |
0.00001056 s |
||
cache / JaXPipe / tpu / Primal |
0.000002471575 s |
0.000002457375 s |
1.01 |
cache / Jax / tpu / Primal |
0.000002457325 s |
0.0000024826 s |
0.99 |
cache / HLOOpt / tpu / Primal |
0.000002477925 s |
0.0000024602 s |
1.01 |
cache / PartOpt / tpu / Primal |
0.0000024645500000000004 s |
0.00000245655 s |
1.00 |
cache / IPartOpt / tpu / Primal |
0.0000024774 s |
0.000002473975 s |
1.00 |
cache / DefOpt / tpu / Primal |
0.000002461075 s |
0.000002445925 s |
1.01 |
cache / IDefOpt / tpu / Primal |
0.00000247445 s |
0.000002467875 s |
1.00 |
cache / JaXPipe / tpu / Forward |
0.0000035455750000000004 s |
0.0000035509 s |
1.00 |
cache / Jax / tpu / Forward |
0.00000354205 s |
0.00000355365 s |
1.00 |
cache / HLOOpt / tpu / Forward |
0.00000353565 s |
0.000003554675 s |
0.99 |
cache / PartOpt / tpu / Forward |
0.0000035289749999999995 s |
0.000003536275 s |
1.00 |
cache / IPartOpt / tpu / Forward |
0.000003556375 s |
0.0000035529500000000004 s |
1.00 |
cache / DefOpt / tpu / Forward |
0.00000352405 s |
0.00000352805 s |
1.00 |
cache / IDefOpt / tpu / Forward |
0.00000355235 s |
0.00000355375 s |
1.00 |
cache / JaXPipe / tpu / PreRev |
0.00000495065 s |
0.0000049691500000000005 s |
1.00 |
cache / JaXPipe / tpu / PostRev |
0.00000497545 s |
0.000004967775 s |
1.00 |
cache / JaXPipe / tpu / BothRev |
0.000004979374999999999 s |
0.000004972925 s |
1.00 |
cache / Jax / tpu / BothRev |
0.00000498505 s |
0.000004984625 s |
1.00 |
cache / HLOOpt / tpu / PreRev |
0.000003948575 s |
0.000003951 s |
1.00 |
cache / HLOOpt / tpu / PostRev |
0.00000414235 s |
0.000004137575 s |
1.00 |
cache / HLOOpt / tpu / BothRev |
0.000003937375 s |
0.000003938075 s |
1.00 |
cache / PartOpt / tpu / PreRev |
0.000004981525 s |
0.000005003675 s |
1.00 |
cache / PartOpt / tpu / PostRev |
0.000004992675 s |
0.00000496145 s |
1.01 |
cache / PartOpt / tpu / BothRev |
0.00000498985 s |
0.000004965275 s |
1.00 |
cache / IPartOpt / tpu / PreRev |
0.000004974799999999999 s |
0.000004991749999999999 s |
1.00 |
cache / IPartOpt / tpu / PostRev |
0.000004970899999999999 s |
0.00000496835 s |
1.00 |
cache / IPartOpt / tpu / BothRev |
0.000004947625 s |
0.0000049548 s |
1.00 |
cache / DefOpt / tpu / PreRev |
0.000004987875 s |
0.0000049717 s |
1.00 |
cache / DefOpt / tpu / PostRev |
0.0000049723 s |
0.000004986725 s |
1.00 |
cache / DefOpt / tpu / BothRev |
0.0000049649 s |
0.00000496795 s |
1.00 |
cache / IDefOpt / tpu / PreRev |
0.00000497315 s |
0.00000498765 s |
1.00 |
cache / IDefOpt / tpu / PostRev |
0.000004976274999999999 s |
0.000004972625 s |
1.00 |
cache / IDefOpt / tpu / BothRev |
0.000004969375 s |
0.00000497715 s |
1.00 |
cache / JaXPipe / cpu / Primal |
0.00001281 s |
0.000007056179956634878 s |
1.82 |
cache / Jax / cpu / Primal |
0.000012678 s |
0.000007446979971064138 s |
1.70 |
cache / HLOOpt / cpu / Primal |
0.000012639 s |
0.000006615619986405363 s |
1.91 |
cache / PartOpt / cpu / Primal |
0.000012657 s |
0.000006719259981764481 s |
1.88 |
cache / IPartOpt / cpu / Primal |
0.000012754 s |
0.000006971860020712484 s |
1.83 |
cache / DefOpt / cpu / Primal |
0.000012962 s |
0.000007025120012258412 s |
1.85 |
cache / IDefOpt / cpu / Primal |
0.000013084 s |
0.000007174279980972642 s |
1.82 |
cache / JaXPipe / cpu / Forward |
0.000017829 s |
0.000014488840042758966 s |
1.23 |
cache / Jax / cpu / Forward |
0.000018526 s |
0.000015192699947874644 s |
1.22 |
cache / HLOOpt / cpu / Forward |
0.000017856 s |
0.000015517559959334903 s |
1.15 |
cache / PartOpt / cpu / Forward |
0.000018085 s |
0.000014401319958778911 s |
1.26 |
cache / IPartOpt / cpu / Forward |
0.000017978 s |
0.000015117139928406687 s |
1.19 |
cache / DefOpt / cpu / Forward |
0.000018006 s |
0.000014252099936129523 s |
1.26 |
cache / IDefOpt / cpu / Forward |
0.000017606 s |
0.000015947860019878136 s |
1.10 |
cache / JaXPipe / cpu / PreRev |
0.000018113 s |
0.000017153660010080785 s |
1.06 |
cache / JaXPipe / cpu / PostRev |
0.000020813 s |
0.00002086701995722251 s |
1.00 |
cache / JaXPipe / cpu / BothRev |
0.000019192 s |
0.000016654460014251528 s |
1.15 |
cache / Jax / cpu / BothRev |
0.000030859000000000004 s |
0.000020643239995479235 s |
1.49 |
cache / HLOOpt / cpu / PreRev |
0.000027816 s |
0.000017784900019250928 s |
1.56 |
cache / HLOOpt / cpu / PostRev |
0.000026897 s |
0.000021892259992455367 s |
1.23 |
cache / HLOOpt / cpu / BothRev |
0.000026521 s |
0.000018382259986537976 s |
1.44 |
cache / PartOpt / cpu / PreRev |
0.000032443 s |
0.000016693739980837562 s |
1.94 |
cache / PartOpt / cpu / PostRev |
0.000032167 s |
0.000021782079984404844 s |
1.48 |
cache / PartOpt / cpu / BothRev |
0.000018749 s |
0.000017551979999552715 s |
1.07 |
cache / IPartOpt / cpu / PreRev |
0.00002821 s |
0.00001678380000157631 s |
1.68 |
cache / IPartOpt / cpu / PostRev |
0.000024467 s |
0.000021706779980377176 s |
1.13 |
cache / IPartOpt / cpu / BothRev |
0.000027721 s |
0.000017438879986002576 s |
1.59 |
cache / DefOpt / cpu / PreRev |
0.000027143 s |
0.00001758438000251772 s |
1.54 |
cache / DefOpt / cpu / PostRev |
0.000027852 s |
0.000016727100019124918 s |
1.67 |
cache / DefOpt / cpu / BothRev |
0.000024003 s |
0.000016970419956123806 s |
1.41 |
cache / IDefOpt / cpu / PreRev |
0.000033742 s |
0.000017113839985540834 s |
1.97 |
cache / IDefOpt / cpu / PostRev |
0.00002855 s |
0.000016802999980427557 s |
1.70 |
cache / IDefOpt / cpu / BothRev |
0.000030528 s |
0.000017829540001912392 s |
1.71 |
Concat / JaXPipe / cpu / Primal |
0.00000853854004162713 s |
0.000007423600009133225 s |
1.15 |
Concat / Jax / cpu / Primal |
0.000008292900038213702 s |
0.0000074543200298649024 s |
1.11 |
Concat / HLOOpt / cpu / Primal |
0.000008743599992158124 s |
0.000007162399979279143 s |
1.22 |
Concat / PartOpt / cpu / Primal |
0.000008038120049604914 s |
0.000007042840034046094 s |
1.14 |
Concat / IPartOpt / cpu / Primal |
0.000008651279968034942 s |
0.000007088260008458746 s |
1.22 |
Concat / DefOpt / cpu / Primal |
0.000007556819946330507 s |
0.000006931860007171053 s |
1.09 |
Concat / IDefOpt / cpu / Primal |
0.00000806989995908225 s |
0.000006992560029175365 s |
1.15 |
Concat / JaXPipe / cpu / Forward |
0.000011782979981944663 s |
0.000010997740000675549 s |
1.07 |
Concat / Jax / cpu / Forward |
0.00001173520002339501 s |
0.00001100831997973728 s |
1.07 |
Concat / HLOOpt / cpu / Forward |
0.000012193139973533108 s |
0.00001094812000701495 s |
1.11 |
Concat / PartOpt / cpu / Forward |
0.000011208300056750886 s |
0.000010536919990045136 s |
1.06 |
Concat / IPartOpt / cpu / Forward |
0.00001135069998781546 s |
0.000011385640036678523 s |
1.00 |
Concat / DefOpt / cpu / Forward |
0.000011503539990371792 s |
0.00001138815999183862 s |
1.01 |
Concat / IDefOpt / cpu / Forward |
0.000011685179979394888 s |
0.000011273079962847987 s |
1.04 |
Concat / JaXPipe / cpu / PreRev |
0.0000139744999796676 s |
0.000012589880006999013 s |
1.11 |
Concat / JaXPipe / cpu / PostRev |
0.000013585540018539178 s |
0.00001297502001762041 s |
1.05 |
Concat / JaXPipe / cpu / BothRev |
0.000013054519949946551 s |
0.000012389000021357788 s |
1.05 |
Concat / Jax / cpu / BothRev |
0.000013700719991902587 s |
0.000012437199993655667 s |
1.10 |
Concat / HLOOpt / cpu / PreRev |
0.000013570220035035164 s |
0.00001298888004384935 s |
1.04 |
Concat / HLOOpt / cpu / PostRev |
0.0000151305399958801 s |
0.000014651680039605708 s |
1.03 |
Concat / HLOOpt / cpu / BothRev |
0.000013433439908112631 s |
0.000013292440025907126 s |
1.01 |
Concat / PartOpt / cpu / PreRev |
0.000013654339963977691 s |
0.000012485759962146405 s |
1.09 |
Concat / PartOpt / cpu / PostRev |
0.000012869299989688443 s |
0.00001300403999266564 s |
0.99 |
Concat / PartOpt / cpu / BothRev |
0.00001365785992675228 s |
0.000013027780014454036 s |
1.05 |
Concat / IPartOpt / cpu / PreRev |
0.000013559300004999386 s |
0.000011711039987858384 s |
1.16 |
Concat / IPartOpt / cpu / PostRev |
0.000012565239921968896 s |
0.000013076560016997973 s |
0.96 |
Concat / IPartOpt / cpu / BothRev |
0.000013256500042189143 s |
0.000013247099941509076 s |
1.00 |
Concat / DefOpt / cpu / PreRev |
0.000013137820042175008 s |
0.000012478100006774183 s |
1.05 |
Concat / DefOpt / cpu / PostRev |
0.000013935439965280238 s |
0.000013085759965179024 s |
1.06 |
Concat / DefOpt / cpu / BothRev |
0.0000132373799533525 s |
0.000013082300038149697 s |
1.01 |
Concat / IDefOpt / cpu / PreRev |
0.000013549500035878736 s |
0.000012021520005873751 s |
1.13 |
Concat / IDefOpt / cpu / PostRev |
0.000012924419988848968 s |
0.000012947859986525145 s |
1.00 |
Concat / IDefOpt / cpu / BothRev |
0.000013414599980023924 s |
0.000013117840017002892 s |
1.02 |
Concat / JaXPipe / cuda / Primal |
0.000002464 s |
||
Concat / Jax / cuda / Primal |
0.000002464 s |
||
Concat / HLOOpt / cuda / Primal |
0.000002463 s |
||
Concat / PartOpt / cuda / Primal |
0.000002463 s |
||
Concat / IPartOpt / cuda / Primal |
0.000002463 s |
||
Concat / DefOpt / cuda / Primal |
0.000002464 s |
||
Concat / IDefOpt / cuda / Primal |
0.000002463 s |
||
Concat / JaXPipe / cuda / Forward |
0.000012032 s |
||
Concat / Jax / cuda / Forward |
0.000011039 s |
||
Concat / HLOOpt / cuda / Forward |
0.000011712 s |
||
Concat / PartOpt / cuda / Forward |
0.000011104 s |
||
Concat / IPartOpt / cuda / Forward |
0.000011392 s |
||
Concat / DefOpt / cuda / Forward |
0.000011136 s |
||
Concat / IDefOpt / cuda / Forward |
0.000010688 s |
||
Concat / JaXPipe / cuda / PreRev |
0.00001728 s |
||
Concat / JaXPipe / cuda / PostRev |
0.000017664 s |
||
Concat / JaXPipe / cuda / BothRev |
0.000017152 s |
||
Concat / Jax / cuda / BothRev |
0.000017024 s |
||
Concat / HLOOpt / cuda / PreRev |
0.000019392 s |
||
Concat / HLOOpt / cuda / PostRev |
0.000017344 s |
||
Concat / HLOOpt / cuda / BothRev |
0.000017344 s |
||
Concat / PartOpt / cuda / PreRev |
0.000017056 s |
||
Concat / PartOpt / cuda / PostRev |
0.000017919999999999998 s |
||
Concat / PartOpt / cuda / BothRev |
0.000017184 s |
||
Concat / IPartOpt / cuda / PreRev |
0.000017536 s |
||
Concat / IPartOpt / cuda / PostRev |
0.000017632 s |
||
Concat / IPartOpt / cuda / BothRev |
0.000017503999999999997 s |
||
Concat / DefOpt / cuda / PreRev |
0.000017472 s |
||
Concat / DefOpt / cuda / PostRev |
0.000017119 s |
||
Concat / DefOpt / cuda / BothRev |
0.000016993 s |
||
Concat / IDefOpt / cuda / PreRev |
0.000017503999999999997 s |
||
Concat / IDefOpt / cuda / PostRev |
0.000017344 s |
||
Concat / IDefOpt / cuda / BothRev |
0.000017824 s |
||
Concat / JaXPipe / tpu / Primal |
0.000001482075 s |
0.0000014889749999999998 s |
1.00 |
Concat / Jax / tpu / Primal |
0.0000014892 s |
0.000001478325 s |
1.01 |
Concat / HLOOpt / tpu / Primal |
0.000001480825 s |
0.0000014868749999999998 s |
1.00 |
Concat / PartOpt / tpu / Primal |
0.000001482 s |
0.0000014743999999999998 s |
1.01 |
Concat / IPartOpt / tpu / Primal |
0.00000148455 s |
0.0000014854999999999998 s |
1.00 |
Concat / DefOpt / tpu / Primal |
0.000001482225 s |
0.0000014769 s |
1.00 |
Concat / IDefOpt / tpu / Primal |
0.000001487575 s |
0.0000014824 s |
1.00 |
Concat / JaXPipe / tpu / Forward |
0.000001541525 s |
0.0000015397500000000002 s |
1.00 |
Concat / Jax / tpu / Forward |
0.0000015307 s |
0.0000015126500000000002 s |
1.01 |
Concat / HLOOpt / tpu / Forward |
0.0000015294499999999998 s |
0.00000154435 s |
0.99 |
Concat / PartOpt / tpu / Forward |
0.00000152005 s |
0.000001529425 s |
0.99 |
Concat / IPartOpt / tpu / Forward |
0.0000015528249999999998 s |
0.000001542375 s |
1.01 |
Concat / DefOpt / tpu / Forward |
0.000001526125 s |
0.0000015228500000000002 s |
1.00 |
Concat / IDefOpt / tpu / Forward |
0.0000015443000000000002 s |
0.0000015567 s |
0.99 |
Concat / JaXPipe / tpu / PreRev |
0.000001965475 s |
0.000001959925 s |
1.00 |
Concat / JaXPipe / tpu / PostRev |
0.000002038725 s |
0.0000020423 s |
1.00 |
Concat / JaXPipe / tpu / BothRev |
0.000001965125 s |
0.0000019545 s |
1.01 |
Concat / Jax / tpu / BothRev |
0.0000020292 s |
0.00000202485 s |
1.00 |
Concat / HLOOpt / tpu / PreRev |
0.00000197145 s |
0.00000195365 s |
1.01 |
Concat / HLOOpt / tpu / PostRev |
0.000002020575 s |
0.0000020227 s |
1.00 |
Concat / HLOOpt / tpu / BothRev |
0.000001960375 s |
0.000001956225 s |
1.00 |
Concat / PartOpt / tpu / PreRev |
0.0000020289 s |
0.000002033975 s |
1.00 |
Concat / PartOpt / tpu / PostRev |
0.0000019727500000000003 s |
0.00000196165 s |
1.01 |
Concat / PartOpt / tpu / BothRev |
0.000002029 s |
0.000002028325 s |
1.00 |
Concat / IPartOpt / tpu / PreRev |
0.000001954825 s |
0.00000196235 s |
1.00 |
Concat / IPartOpt / tpu / PostRev |
0.000002022225 s |
0.0000020308750000000003 s |
1.00 |
Concat / IPartOpt / tpu / BothRev |
0.00000195565 s |
0.000001966275 s |
0.99 |
Concat / DefOpt / tpu / PreRev |
0.000002025225 s |
0.0000020217 s |
1.00 |
Concat / DefOpt / tpu / PostRev |
0.00000196195 s |
0.0000019582 s |
1.00 |
Concat / DefOpt / tpu / BothRev |
0.000002034975 s |
0.000002024975 s |
1.00 |
Concat / IDefOpt / tpu / PreRev |
0.000001968425 s |
0.0000019659 s |
1.00 |
Concat / IDefOpt / tpu / PostRev |
0.0000020218 s |
0.000002021575 s |
1.00 |
Concat / IDefOpt / tpu / BothRev |
0.000001962975 s |
0.000001958225 s |
1.00 |
Concat / JaXPipe / cpu / Primal |
0.000012891 s |
0.000007423600009133225 s |
1.74 |
Concat / Jax / cpu / Primal |
0.000013159 s |
0.0000074543200298649024 s |
1.77 |
Concat / HLOOpt / cpu / Primal |
0.000012781 s |
0.000007162399979279143 s |
1.78 |
Concat / PartOpt / cpu / Primal |
0.000013001 s |
0.000007042840034046094 s |
1.85 |
Concat / IPartOpt / cpu / Primal |
0.000012987 s |
0.000007088260008458746 s |
1.83 |
Concat / DefOpt / cpu / Primal |
0.000013385 s |
0.000006931860007171053 s |
1.93 |
Concat / IDefOpt / cpu / Primal |
0.000012923 s |
0.000006992560029175365 s |
1.85 |
Concat / JaXPipe / cpu / Forward |
0.000018111 s |
0.000010997740000675549 s |
1.65 |
Concat / Jax / cpu / Forward |
0.000018073 s |
0.00001100831997973728 s |
1.64 |
Concat / HLOOpt / cpu / Forward |
0.000017422 s |
0.00001094812000701495 s |
1.59 |
Concat / PartOpt / cpu / Forward |
0.000017912 s |
0.000010536919990045136 s |
1.70 |
Concat / IPartOpt / cpu / Forward |
0.000017475 s |
0.000011385640036678523 s |
1.53 |
Concat / DefOpt / cpu / Forward |
0.000018222 s |
0.00001138815999183862 s |
1.60 |
Concat / IDefOpt / cpu / Forward |
0.000018261 s |
0.000011273079962847987 s |
1.62 |
Concat / JaXPipe / cpu / PreRev |
0.000020716 s |
0.000012589880006999013 s |
1.65 |
Concat / JaXPipe / cpu / PostRev |
0.000020142 s |
0.00001297502001762041 s |
1.55 |
Concat / JaXPipe / cpu / BothRev |
0.000020054 s |
0.000012389000021357788 s |
1.62 |
Concat / Jax / cpu / BothRev |
0.000020273 s |
0.000012437199993655667 s |
1.63 |
Concat / HLOOpt / cpu / PreRev |
0.000020265 s |
0.00001298888004384935 s |
1.56 |
Concat / HLOOpt / cpu / PostRev |
0.000020156 s |
0.000014651680039605708 s |
1.38 |
Concat / HLOOpt / cpu / BothRev |
0.000019525 s |
0.000013292440025907126 s |
1.47 |
Concat / PartOpt / cpu / PreRev |
0.000020162 s |
0.000012485759962146405 s |
1.61 |
Concat / PartOpt / cpu / PostRev |
0.000020088 s |
0.00001300403999266564 s |
1.54 |
Concat / PartOpt / cpu / BothRev |
0.000019946 s |
0.000013027780014454036 s |
1.53 |
Concat / IPartOpt / cpu / PreRev |
0.000020458 s |
0.000011711039987858384 s |
1.75 |
Concat / IPartOpt / cpu / PostRev |
0.000019953 s |
0.000013076560016997973 s |
1.53 |
Concat / IPartOpt / cpu / BothRev |
0.000019306 s |
0.000013247099941509076 s |
1.46 |
Concat / DefOpt / cpu / PreRev |
0.000020191 s |
0.000012478100006774183 s |
1.62 |
Concat / DefOpt / cpu / PostRev |
0.000020127 s |
0.000013085759965179024 s |
1.54 |
Concat / DefOpt / cpu / BothRev |
0.000019954 s |
0.000013082300038149697 s |
1.53 |
Concat / IDefOpt / cpu / PreRev |
0.000020334 s |
0.000012021520005873751 s |
1.69 |
Concat / IDefOpt / cpu / PostRev |
0.000019702 s |
0.000012947859986525145 s |
1.52 |
Concat / IDefOpt / cpu / BothRev |
0.00001972 s |
0.000013117840017002892 s |
1.50 |
const_scatter / JaXPipe / cpu / Primal |
0.000008534140015399316 s |
0.000006994200020926655 s |
1.22 |
const_scatter / Jax / cpu / Primal |
0.000008727180047571891 s |
0.000006940659932297421 s |
1.26 |
const_scatter / HLOOpt / cpu / Primal |
0.000009245859891962028 s |
0.000007303180018425337 s |
1.27 |
const_scatter / PartOpt / cpu / Primal |
0.000007525980072387028 s |
0.000006964520007386455 s |
1.08 |
const_scatter / IPartOpt / cpu / Primal |
0.000007913200061011594 s |
0.000007651199975953204 s |
1.03 |
const_scatter / DefOpt / cpu / Primal |
0.000008409620058955624 s |
0.000007847279985071509 s |
1.07 |
const_scatter / IDefOpt / cpu / Primal |
0.000008935640071285889 s |
0.000007382480007436243 s |
1.21 |
const_scatter / JaXPipe / cpu / Forward |
0.000012267279889783824 s |
0.000011513020008351304 s |
1.07 |
const_scatter / Jax / cpu / Forward |
0.000011286559874861269 s |
0.000010891859983530592 s |
1.04 |
const_scatter / HLOOpt / cpu / Forward |
0.000012388600007398054 s |
0.000011693440001181443 s |
1.06 |
const_scatter / PartOpt / cpu / Forward |
0.000011982700052612926 s |
0.000011965339990638312 s |
1.00 |
const_scatter / IPartOpt / cpu / Forward |
0.000012739320081891492 s |
0.000012006939978164154 s |
1.06 |
const_scatter / DefOpt / cpu / Forward |
0.00001239565997821046 s |
0.000011777100016843178 s |
1.05 |
const_scatter / IDefOpt / cpu / Forward |
0.00001212765993841458 s |
0.000011975979950875628 s |
1.01 |
const_scatter / JaXPipe / cpu / PreRev |
0.0002929876999587 s |
0.0002884649599764 s |
1.02 |
const_scatter / JaXPipe / cpu / PostRev |
0.0002850323799975 s |
0.0002808212600211 s |
1.01 |
const_scatter / JaXPipe / cpu / BothRev |
0.0002856764399803 s |
0.0002820423199955 s |
1.01 |
const_scatter / Jax / cpu / BothRev |
0.0002854121200471 s |
0.0002806260200213 s |
1.02 |
const_scatter / HLOOpt / cpu / PreRev |
0.00028734857995 s |
0.0002817600599973 s |
1.02 |
const_scatter / HLOOpt / cpu / PostRev |
0.0002885129200694 s |
0.0002844262800044 s |
1.01 |
const_scatter / HLOOpt / cpu / BothRev |
0.0003000291000353 s |
0.0002816947199698 s |
1.07 |
const_scatter / PartOpt / cpu / PreRev |
0.0002900109400252 s |
0.0002816291600174 s |
1.03 |
const_scatter / PartOpt / cpu / PostRev |
0.0002845576200525 s |
0.0002827880800123 s |
1.01 |
const_scatter / PartOpt / cpu / BothRev |
0.0002947422400029 s |
0.0002830636999988 s |
1.04 |
const_scatter / IPartOpt / cpu / PreRev |
0.0002870086599614 s |
0.0002845580800021 s |
1.01 |
const_scatter / IPartOpt / cpu / PostRev |
0.0003003155798978 s |
0.0002829547199962 s |
1.06 |
const_scatter / IPartOpt / cpu / BothRev |
0.0002866614999948 s |
0.0002849848400182 s |
1.01 |
const_scatter / DefOpt / cpu / PreRev |
0.0002873075599563 s |
0.0002842204000171 s |
1.01 |
const_scatter / DefOpt / cpu / PostRev |
0.0002864068999951 s |
0.0002823248199911 s |
1.01 |
const_scatter / DefOpt / cpu / BothRev |
0.0002866443600032 s |
0.0002831766800045 s |
1.01 |
const_scatter / IDefOpt / cpu / PreRev |
0.0002865423200091 s |
0.0002833324000312 s |
1.01 |
const_scatter / IDefOpt / cpu / PostRev |
0.0002881191400228 s |
0.000286672539969 s |
1.01 |
const_scatter / IDefOpt / cpu / BothRev |
0.00028581397999 s |
0.0002846314399539 s |
1.00 |
const_scatter / JaXPipe / cuda / Primal |
0.000002463 s |
||
const_scatter / Jax / cuda / Primal |
0.000002463 s |
||
const_scatter / HLOOpt / cuda / Primal |
0.000002463 s |
||
const_scatter / PartOpt / cuda / Primal |
0.000002463 s |
||
const_scatter / IPartOpt / cuda / Primal |
0.000002463 s |
||
const_scatter / DefOpt / cuda / Primal |
0.000002463 s |
||
const_scatter / IDefOpt / cuda / Primal |
0.000002464 s |
||
const_scatter / JaXPipe / cuda / Forward |
0.000010944 s |
||
const_scatter / Jax / cuda / Forward |
0.000010816 s |
||
const_scatter / HLOOpt / cuda / Forward |
0.00001088 s |
||
const_scatter / PartOpt / cuda / Forward |
0.000011072 s |
||
const_scatter / IPartOpt / cuda / Forward |
0.000011104 s |
||
const_scatter / DefOpt / cuda / Forward |
0.0000112 s |
||
const_scatter / IDefOpt / cuda / Forward |
0.000013472 s |
||
const_scatter / JaXPipe / cuda / PreRev |
0.000017984 s |
||
const_scatter / JaXPipe / cuda / PostRev |
0.000017663 s |
||
const_scatter / JaXPipe / cuda / BothRev |
0.000017792 s |
||
const_scatter / Jax / cuda / BothRev |
0.000017888000000000002 s |
||
const_scatter / HLOOpt / cuda / PreRev |
0.000017663 s |
||
const_scatter / HLOOpt / cuda / PostRev |
0.0000176 s |
||
const_scatter / HLOOpt / cuda / BothRev |
0.000017503999999999997 s |
||
const_scatter / PartOpt / cuda / PreRev |
0.000018176 s |
||
const_scatter / PartOpt / cuda / PostRev |
0.000020191 s |
||
const_scatter / PartOpt / cuda / BothRev |
0.000017632 s |
||
const_scatter / IPartOpt / cuda / PreRev |
0.000017696 s |
||
const_scatter / IPartOpt / cuda / PostRev |
0.000017632 s |
||
const_scatter / IPartOpt / cuda / BothRev |
0.000017312 s |
||
const_scatter / DefOpt / cuda / PreRev |
0.000017984 s |
||
const_scatter / DefOpt / cuda / PostRev |
0.000017216 s |
||
const_scatter / DefOpt / cuda / BothRev |
0.000017088 s |
||
const_scatter / IDefOpt / cuda / PreRev |
0.000017824 s |
||
const_scatter / IDefOpt / cuda / PostRev |
0.000017664 s |
||
const_scatter / IDefOpt / cuda / BothRev |
0.000017536 s |
||
const_scatter / JaXPipe / tpu / Primal |
0.0000037999 s |
0.00000379185 s |
1.00 |
const_scatter / Jax / tpu / Primal |
0.000003812825 s |
0.00000380515 s |
1.00 |
const_scatter / HLOOpt / tpu / Primal |
0.0000037971 s |
0.0000037833 s |
1.00 |
const_scatter / PartOpt / tpu / Primal |
0.000003828325 s |
0.000003808975 s |
1.01 |
const_scatter / IPartOpt / tpu / Primal |
0.000003802575 s |
0.00000381 s |
1.00 |
const_scatter / DefOpt / tpu / Primal |
0.000003825025 s |
0.000003826425 s |
1.00 |
const_scatter / IDefOpt / tpu / Primal |
0.00000379495 s |
0.00000378795 s |
1.00 |
const_scatter / JaXPipe / tpu / Forward |
0.000006450725000000001 s |
0.000006485174999999999 s |
0.99 |
const_scatter / Jax / tpu / Forward |
0.000006507800000000001 s |
0.000006492449999999999 s |
1.00 |
const_scatter / HLOOpt / tpu / Forward |
0.000006491 s |
0.000006475 s |
1.00 |
const_scatter / PartOpt / tpu / Forward |
0.000006496775 s |
0.000006491749999999999 s |
1.00 |
const_scatter / IPartOpt / tpu / Forward |
0.000006465375 s |
0.000006475925 s |
1.00 |
const_scatter / DefOpt / tpu / Forward |
0.00000650065 s |
0.000006485675 s |
1.00 |
const_scatter / IDefOpt / tpu / Forward |
0.00000648715 s |
0.000006472849999999999 s |
1.00 |
const_scatter / JaXPipe / tpu / PreRev |
0.000006646025 s |
0.000006628175 s |
1.00 |
const_scatter / JaXPipe / tpu / PostRev |
0.0000066257 s |
0.0000066119 s |
1.00 |
const_scatter / JaXPipe / tpu / BothRev |
0.000006633050000000001 s |
0.000006620675 s |
1.00 |
const_scatter / Jax / tpu / BothRev |
0.000006648625 s |
0.000006623175 s |
1.00 |
const_scatter / HLOOpt / tpu / PreRev |
0.000006609075 s |
0.000006610975 s |
1.00 |
const_scatter / HLOOpt / tpu / PostRev |
0.000006633525 s |
0.00000663025 s |
1.00 |
const_scatter / HLOOpt / tpu / BothRev |
0.0000066106 s |
0.000006617225 s |
1.00 |
const_scatter / PartOpt / tpu / PreRev |
0.00000663915 s |
0.000006633499999999999 s |
1.00 |
const_scatter / PartOpt / tpu / PostRev |
0.000006620575 s |
0.0000066066 s |
1.00 |
const_scatter / PartOpt / tpu / BothRev |
0.000006626075000000001 s |
0.0000066218 s |
1.00 |
const_scatter / IPartOpt / tpu / PreRev |
0.0000066152 s |
0.000006618175 s |
1.00 |
const_scatter / IPartOpt / tpu / PostRev |
0.00000663185 s |
0.000006642824999999999 s |
1.00 |
const_scatter / IPartOpt / tpu / BothRev |
0.000006631550000000001 s |
0.000006603299999999999 s |
1.00 |
const_scatter / DefOpt / tpu / PreRev |
0.000006621325 s |
0.000006625575 s |
1.00 |
const_scatter / DefOpt / tpu / PostRev |
0.0000066306250000000006 s |
0.000006600549999999999 s |
1.00 |
const_scatter / DefOpt / tpu / BothRev |
0.000006641025 s |
0.000006618275 s |
1.00 |
const_scatter / IDefOpt / tpu / PreRev |
0.000006634975 s |
0.00000661355 s |
1.00 |
const_scatter / IDefOpt / tpu / PostRev |
0.000006633125 s |
0.000006639000000000001 s |
1.00 |
const_scatter / IDefOpt / tpu / BothRev |
0.000006616975 s |
0.000006601775 s |
1.00 |
const_scatter / JaXPipe / cpu / Primal |
0.000013002 s |
0.000006994200020926655 s |
1.86 |
const_scatter / Jax / cpu / Primal |
0.000012712 s |
0.000006940659932297421 s |
1.83 |
const_scatter / HLOOpt / cpu / Primal |
0.000013778 s |
0.000007303180018425337 s |
1.89 |
const_scatter / PartOpt / cpu / Primal |
0.000012548 s |
0.000006964520007386455 s |
1.80 |
const_scatter / IPartOpt / cpu / Primal |
0.000012967 s |
0.000007651199975953204 s |
1.69 |
const_scatter / DefOpt / cpu / Primal |
0.00001321 s |
0.000007847279985071509 s |
1.68 |
const_scatter / IDefOpt / cpu / Primal |
0.000013255 s |
0.000007382480007436243 s |
1.80 |
const_scatter / JaXPipe / cpu / Forward |
0.000018548 s |
0.000011513020008351304 s |
1.61 |
const_scatter / Jax / cpu / Forward |
0.000016902000000000002 s |
0.000010891859983530592 s |
1.55 |
const_scatter / HLOOpt / cpu / Forward |
0.0000183 s |
0.000011693440001181443 s |
1.56 |
const_scatter / PartOpt / cpu / Forward |
0.000017987 s |
0.000011965339990638312 s |
1.50 |
const_scatter / IPartOpt / cpu / Forward |
0.000018016 s |
0.000012006939978164154 s |
1.50 |
const_scatter / DefOpt / cpu / Forward |
0.000017943 s |
0.000011777100016843178 s |
1.52 |
const_scatter / IDefOpt / cpu / Forward |
0.000018372 s |
0.000011975979950875628 s |
1.53 |
const_scatter / JaXPipe / cpu / PreRev |
0.000520496 s |
0.0002884649599764 s |
1.80 |
const_scatter / JaXPipe / cpu / PostRev |
0.000504577 s |
0.0002808212600211 s |
1.80 |
const_scatter / JaXPipe / cpu / BothRev |
0.000522641 s |
0.0002820423199955 s |
1.85 |
const_scatter / Jax / cpu / BothRev |
0.000499801 s |
0.0002806260200213 s |
1.78 |
const_scatter / HLOOpt / cpu / PreRev |
0.000505684 s |
0.0002817600599973 s |
1.79 |
const_scatter / HLOOpt / cpu / PostRev |
0.0004927369999999 s |
0.0002844262800044 s |
1.73 |
const_scatter / HLOOpt / cpu / BothRev |
0.000514085 s |
0.0002816947199698 s |
1.82 |
const_scatter / PartOpt / cpu / PreRev |
0.0005327159999999 s |
0.0002816291600174 s |
1.89 |
const_scatter / PartOpt / cpu / PostRev |
0.0005184569999999 s |
0.0002827880800123 s |
1.83 |
const_scatter / PartOpt / cpu / BothRev |
0.000538382 s |
0.0002830636999988 s |
1.90 |
const_scatter / IPartOpt / cpu / PreRev |
0.000520047 s |
0.0002845580800021 s |
1.83 |
const_scatter / IPartOpt / cpu / PostRev |
0.000520562 s |
0.0002829547199962 s |
1.84 |
const_scatter / IPartOpt / cpu / BothRev |
0.000524372 s |
0.0002849848400182 s |
1.84 |
const_scatter / DefOpt / cpu / PreRev |
0.000524334 s |
0.0002842204000171 s |
1.84 |
const_scatter / DefOpt / cpu / PostRev |
0.000521496 s |
0.0002823248199911 s |
1.85 |
const_scatter / DefOpt / cpu / BothRev |
0.000517248 s |
0.0002831766800045 s |
1.83 |
const_scatter / IDefOpt / cpu / PreRev |
0.000543275 s |
0.0002833324000312 s |
1.92 |
const_scatter / IDefOpt / cpu / PostRev |
0.0005287699999999 s |
0.000286672539969 s |
1.84 |
const_scatter / IDefOpt / cpu / BothRev |
0.0005252099999999 s |
0.0002846314399539 s |
1.85 |
GenDot / JaXPipe / cpu / Primal |
0.000008958719990914688 s |
0.000008582359987485688 s |
1.04 |
GenDot / Jax / cpu / Primal |
0.00000837783993119956 s |
0.000008547680045012384 s |
0.98 |
GenDot / HLOOpt / cpu / Primal |
0.000009342199937236727 s |
0.000009023879938467871 s |
1.04 |
GenDot / PartOpt / cpu / Primal |
0.000008772699984547217 s |
0.000007385940016320091 s |
1.19 |
GenDot / IPartOpt / cpu / Primal |
0.00000909473998035537 s |
0.000007326580034714425 s |
1.24 |
GenDot / DefOpt / cpu / Primal |
0.000009483880076004423 s |
0.000008422580012847902 s |
1.13 |
GenDot / IDefOpt / cpu / Primal |
0.00000911121989702224 s |
0.000008287680002467823 s |
1.10 |
GenDot / JaXPipe / cpu / Forward |
0.000012151399969297927 s |
0.00001278026004001731 s |
0.95 |
GenDot / Jax / cpu / Forward |
0.000011724580035661347 s |
0.000011590759977480048 s |
1.01 |
GenDot / HLOOpt / cpu / Forward |
0.000012378199917293388 s |
0.000012356199958958311 s |
1.00 |
GenDot / PartOpt / cpu / Forward |
0.000012167280019639292 s |
0.000012064479988111996 s |
1.01 |
GenDot / IPartOpt / cpu / Forward |
0.000012694980068772563 s |
0.0000124142200093047 s |
1.02 |
GenDot / DefOpt / cpu / Forward |
0.000012157339951954784 s |
0.000011938679990635138 s |
1.02 |
GenDot / IDefOpt / cpu / Forward |
0.000012216939958307193 s |
0.000012493860003814916 s |
0.98 |
GenDot / JaXPipe / cpu / PreRev |
0.000012620739962585505 s |
0.000012100460007786751 s |
1.04 |
GenDot / JaXPipe / cpu / PostRev |
0.000011619399992923718 s |
0.000011199239979760025 s |
1.04 |
GenDot / JaXPipe / cpu / BothRev |
0.000013036560067121171 s |
0.000013475320001816726 s |
0.97 |
GenDot / Jax / cpu / BothRev |
0.000012043740025546868 s |
0.00001128350001636136 s |
1.07 |
GenDot / HLOOpt / cpu / PreRev |
0.000012961479951627552 s |
0.00001207449995490606 s |
1.07 |
GenDot / HLOOpt / cpu / PostRev |
0.000014781340014451417 s |
0.000013877620003768243 s |
1.07 |
GenDot / HLOOpt / cpu / BothRev |
0.000012422619984135962 s |
0.000012398079988997778 s |
1.00 |
GenDot / PartOpt / cpu / PreRev |
0.000012376739960018313 s |
0.000011878660006914288 s |
1.04 |
GenDot / PartOpt / cpu / PostRev |
0.00001177330002974486 s |
0.000011188879989276755 s |
1.05 |
GenDot / PartOpt / cpu / BothRev |
0.000013029579986323367 s |
0.00001272072001484048 s |
1.02 |
GenDot / IPartOpt / cpu / PreRev |
0.000012492900023062248 s |
0.000012142459981987483 s |
1.03 |
GenDot / IPartOpt / cpu / PostRev |
0.000011422539992054223 s |
0.00001083865999135014 s |
1.05 |
GenDot / IPartOpt / cpu / BothRev |
0.000012697400015895256 s |
0.00001177546005237673 s |
1.08 |
GenDot / DefOpt / cpu / PreRev |
0.000012311920090724016 s |
0.000012637859963433585 s |
0.97 |
GenDot / DefOpt / cpu / PostRev |
0.0000126659199486312 s |
0.000012783999991370366 s |
0.99 |
GenDot / DefOpt / cpu / BothRev |
0.000012756619998981476 s |
0.000013002139994569006 s |
0.98 |
GenDot / IDefOpt / cpu / PreRev |
0.000012268939972273074 s |
0.00001264439996703004 s |
0.97 |
GenDot / IDefOpt / cpu / PostRev |
0.0000127633200645505 s |
0.000011830120001832256 s |
1.08 |
GenDot / IDefOpt / cpu / BothRev |
0.000011982239921053403 s |
0.00001169350002783176 s |
1.02 |
GenDot / JaXPipe / cuda / Primal |
0.000002527 s |
||
GenDot / Jax / cuda / Primal |
0.000002528 s |
||
GenDot / HLOOpt / cuda / Primal |
0.000002527 s |
||
GenDot / PartOpt / cuda / Primal |
0.00000256 s |
||
GenDot / IPartOpt / cuda / Primal |
0.000002559 s |
||
GenDot / DefOpt / cuda / Primal |
0.000002528 s |
||
GenDot / IDefOpt / cuda / Primal |
0.000002527 s |
||
GenDot / JaXPipe / cuda / Forward |
0.0000128 s |
||
GenDot / Jax / cuda / Forward |
0.000012128 s |
||
GenDot / HLOOpt / cuda / Forward |
0.000010944 s |
||
GenDot / PartOpt / cuda / Forward |
0.000010848 s |
||
GenDot / IPartOpt / cuda / Forward |
0.000011935 s |
||
GenDot / DefOpt / cuda / Forward |
0.000012352 s |
||
GenDot / IDefOpt / cuda / Forward |
0.00001056 s |
||
GenDot / JaXPipe / cuda / PreRev |
0.00001088 s |
||
GenDot / JaXPipe / cuda / PostRev |
0.000010753 s |
||
GenDot / JaXPipe / cuda / BothRev |
0.000011936 s |
||
GenDot / Jax / cuda / BothRev |
0.000010784 s |
||
GenDot / HLOOpt / cuda / PreRev |
0.000010816 s |
||
GenDot / HLOOpt / cuda / PostRev |
0.00001104 s |
||
GenDot / HLOOpt / cuda / BothRev |
0.000010848 s |
||
GenDot / PartOpt / cuda / PreRev |
0.000010656 s |
||
GenDot / PartOpt / cuda / PostRev |
0.00001072 s |
||
GenDot / PartOpt / cuda / BothRev |
0.000011008 s |
||
GenDot / IPartOpt / cuda / PreRev |
0.000010752 s |
||
GenDot / IPartOpt / cuda / PostRev |
0.000011104 s |
||
GenDot / IPartOpt / cuda / BothRev |
0.00001184 s |
||
GenDot / DefOpt / cuda / PreRev |
0.000011008 s |
||
GenDot / DefOpt / cuda / PostRev |
0.000012031 s |
||
GenDot / DefOpt / cuda / BothRev |
0.000010752 s |
||
GenDot / IDefOpt / cuda / PreRev |
0.000010976 s |
||
GenDot / IDefOpt / cuda / PostRev |
0.000011008 s |
||
GenDot / IDefOpt / cuda / BothRev |
0.000010784 s |
||
GenDot / JaXPipe / tpu / Primal |
9.302e-7 s |
9.2965e-7 s |
1.00 |
GenDot / Jax / tpu / Primal |
9.258e-7 s |
9.25425e-7 s |
1.00 |
GenDot / HLOOpt / tpu / Primal |
0.00000158495 s |
0.000001571425 s |
1.01 |
GenDot / PartOpt / tpu / Primal |
9.255e-7 s |
9.26075e-7 s |
1.00 |
GenDot / IPartOpt / tpu / Primal |
9.3035e-7 s |
9.3045e-7 s |
1.00 |
GenDot / DefOpt / tpu / Primal |
0.000001496475 s |
0.0000014878 s |
1.01 |
GenDot / IDefOpt / tpu / Primal |
0.000001575875 s |
0.0000015664 s |
1.01 |
GenDot / JaXPipe / tpu / Forward |
0.0000031652750000000004 s |
0.0000031493500000000006 s |
1.01 |
GenDot / Jax / tpu / Forward |
0.000002319425 s |
0.00000232515 s |
1.00 |
GenDot / HLOOpt / tpu / Forward |
0.0000031127 s |
0.00000311025 s |
1.00 |
GenDot / PartOpt / tpu / Forward |
0.0000032258750000000003 s |
0.000003215475 s |
1.00 |
GenDot / IPartOpt / tpu / Forward |
0.00000311305 s |
0.0000031061 s |
1.00 |
GenDot / DefOpt / tpu / Forward |
0.000003216475 s |
0.000003208275 s |
1.00 |
GenDot / IDefOpt / tpu / Forward |
0.0000031163 s |
0.00000311325 s |
1.00 |
GenDot / JaXPipe / tpu / PreRev |
0.000002963075 s |
0.000002957625 s |
1.00 |
GenDot / JaXPipe / tpu / PostRev |
0.00000241275 s |
0.000002414325 s |
1.00 |
GenDot / JaXPipe / tpu / BothRev |
0.0000029551250000000004 s |
0.0000029555 s |
1.00 |
GenDot / Jax / tpu / BothRev |
0.0000024067 s |
0.000002399625 s |
1.00 |
GenDot / HLOOpt / tpu / PreRev |
0.000002965875 s |
0.0000029610750000000003 s |
1.00 |
GenDot / HLOOpt / tpu / PostRev |
0.000002945975 s |
0.000002922925 s |
1.01 |
GenDot / HLOOpt / tpu / BothRev |
0.0000029622 s |
0.000002958475 s |
1.00 |
GenDot / PartOpt / tpu / PreRev |
0.00000293435 s |
0.00000294585 s |
1.00 |
GenDot / PartOpt / tpu / PostRev |
0.0000023909500000000004 s |
0.00000239025 s |
1.00 |
GenDot / PartOpt / tpu / BothRev |
0.000002936775 s |
0.000002932625 s |
1.00 |
GenDot / IPartOpt / tpu / PreRev |
0.000002952925 s |
0.0000029512749999999995 s |
1.00 |
GenDot / IPartOpt / tpu / PostRev |
0.0000024127250000000003 s |
0.00000239925 s |
1.01 |
GenDot / IPartOpt / tpu / BothRev |
0.000002964375 s |
0.0000029439500000000004 s |
1.01 |
GenDot / DefOpt / tpu / PreRev |
0.00000293365 s |
0.000002926975 s |
1.00 |
GenDot / DefOpt / tpu / PostRev |
0.000002962375 s |
0.000002960325 s |
1.00 |
GenDot / DefOpt / tpu / BothRev |
0.00000294765 s |
0.000002949375 s |
1.00 |
GenDot / IDefOpt / tpu / PreRev |
0.000002959975 s |
0.00000296815 s |
1.00 |
GenDot / IDefOpt / tpu / PostRev |
0.0000029458250000000003 s |
0.0000029295 s |
1.01 |
GenDot / IDefOpt / tpu / BothRev |
0.000002960425 s |
0.0000029574000000000003 s |
1.00 |
GenDot / JaXPipe / cpu / Primal |
0.000015907000000000002 s |
0.000008582359987485688 s |
1.85 |
GenDot / Jax / cpu / Primal |
0.000015581 s |
0.000008547680045012384 s |
1.82 |
GenDot / HLOOpt / cpu / Primal |
0.000014864 s |
0.000009023879938467871 s |
1.65 |
GenDot / PartOpt / cpu / Primal |
0.000015055 s |
0.000007385940016320091 s |
2.04 |
GenDot / IPartOpt / cpu / Primal |
0.000015185 s |
0.000007326580034714425 s |
2.07 |
GenDot / DefOpt / cpu / Primal |
0.000014223 s |
0.000008422580012847902 s |
1.69 |
GenDot / IDefOpt / cpu / Primal |
0.000014034 s |
0.000008287680002467823 s |
1.69 |
GenDot / JaXPipe / cpu / Forward |
0.00001935 s |
0.00001278026004001731 s |
1.51 |
GenDot / Jax / cpu / Forward |
0.000020922 s |
0.000011590759977480048 s |
1.81 |
GenDot / HLOOpt / cpu / Forward |
0.000019004 s |
0.000012356199958958311 s |
1.54 |
GenDot / PartOpt / cpu / Forward |
0.000019408 s |
0.000012064479988111996 s |
1.61 |
GenDot / IPartOpt / cpu / Forward |
0.000019388 s |
0.0000124142200093047 s |
1.56 |
GenDot / DefOpt / cpu / Forward |
0.000020075 s |
0.000011938679990635138 s |
1.68 |
GenDot / IDefOpt / cpu / Forward |
0.000019623 s |
0.000012493860003814916 s |
1.57 |
GenDot / JaXPipe / cpu / PreRev |
0.000020583 s |
0.000012100460007786751 s |
1.70 |
GenDot / JaXPipe / cpu / PostRev |
0.000021388 s |
0.000011199239979760025 s |
1.91 |
GenDot / JaXPipe / cpu / BothRev |
0.00002031 s |
0.000013475320001816726 s |
1.51 |
GenDot / Jax / cpu / BothRev |
0.000021784 s |
0.00001128350001636136 s |
1.93 |
GenDot / HLOOpt / cpu / PreRev |
0.000019542 s |
0.00001207449995490606 s |
1.62 |
GenDot / HLOOpt / cpu / PostRev |
0.000019826 s |
0.000013877620003768243 s |
1.43 |
GenDot / HLOOpt / cpu / BothRev |
0.000020097 s |
0.000012398079988997778 s |
1.62 |
GenDot / PartOpt / cpu / PreRev |
0.000020538 s |
0.000011878660006914288 s |
1.73 |
GenDot / PartOpt / cpu / PostRev |
0.00002045 s |
0.000011188879989276755 s |
1.83 |
GenDot / PartOpt / cpu / BothRev |
0.000020327 s |
0.00001272072001484048 s |
1.60 |
GenDot / IPartOpt / cpu / PreRev |
0.000020169 s |
0.000012142459981987483 s |
1.66 |
GenDot / IPartOpt / cpu / PostRev |
0.000021024 s |
0.00001083865999135014 s |
1.94 |
GenDot / IPartOpt / cpu / BothRev |
0.000019929 s |
0.00001177546005237673 s |
1.69 |
GenDot / DefOpt / cpu / PreRev |
0.000019617 s |
0.000012637859963433585 s |
1.55 |
GenDot / DefOpt / cpu / PostRev |
0.000020167 s |
0.000012783999991370366 s |
1.58 |
GenDot / DefOpt / cpu / BothRev |
0.000020332 s |
0.000013002139994569006 s |
1.56 |
GenDot / IDefOpt / cpu / PreRev |
0.000019705 s |
0.00001264439996703004 s |
1.56 |
GenDot / IDefOpt / cpu / PostRev |
0.000020243 s |
0.000011830120001832256 s |
1.71 |
GenDot / IDefOpt / cpu / BothRev |
0.000019639 s |
0.00001169350002783176 s |
1.68 |
hlo_ffi / JaXPipe / cpu / Primal |
0.000010924319958576234 s |
0.000010208559961029096 s |
1.07 |
hlo_ffi / Jax / cpu / Primal |
0.00001144100000601611 s |
0.000009593339964339977 s |
1.19 |
hlo_ffi / HLOOpt / cpu / Primal |
0.000010675880075723398 s |
0.00001168801994936075 s |
0.91 |
hlo_ffi / PartOpt / cpu / Primal |
0.000010641139997460414 s |
0.000009252899981220252 s |
1.15 |
hlo_ffi / IPartOpt / cpu / Primal |
0.000011026320007658795 s |
0.000009737240015965653 s |
1.13 |
hlo_ffi / DefOpt / cpu / Primal |
0.000010748799995781155 s |
0.000009807599963096435 s |
1.10 |
hlo_ffi / IDefOpt / cpu / Primal |
0.000010324019949621289 s |
0.000009679939939815083 s |
1.07 |
hlo_ffi / JaXPipe / cpu / Forward |
0.000015504019993386466 s |
0.000013769279967164038 s |
1.13 |
hlo_ffi / Jax / cpu / Forward |
0.000015015400076663357 s |
0.000013657240015163551 s |
1.10 |
hlo_ffi / HLOOpt / cpu / Forward |
0.000015290279970940902 s |
0.00001393840000673663 s |
1.10 |
hlo_ffi / PartOpt / cpu / Forward |
0.000014986740025051404 s |
0.000013640720017065176 s |
1.10 |
hlo_ffi / IPartOpt / cpu / Forward |
0.00001530562003608793 s |
0.00001365507996524684 s |
1.12 |
hlo_ffi / DefOpt / cpu / Forward |
0.000015445160115632462 s |
0.000013563320017055958 s |
1.14 |
hlo_ffi / IDefOpt / cpu / Forward |
0.000014876340046612312 s |
0.000013545380024879703 s |
1.10 |
hlo_ffi / JaXPipe / cpu / PreRev |
0.000015881700073805403 s |
0.000014215600040188291 s |
1.12 |
hlo_ffi / JaXPipe / cpu / PostRev |
0.00001554423995912657 s |
0.000014195959975040753 s |
1.09 |
hlo_ffi / JaXPipe / cpu / BothRev |
0.000014963959984015672 s |
0.000013981700012664078 s |
1.07 |
hlo_ffi / Jax / cpu / BothRev |
0.00001568682004290167 s |
0.000013890259988329487 s |
1.13 |
hlo_ffi / HLOOpt / cpu / PreRev |
0.000016117940085678128 s |
0.000014438219986914191 s |
1.12 |
hlo_ffi / HLOOpt / cpu / PostRev |
0.00001703290003206348 s |
0.000016117239983941544 s |
1.06 |
hlo_ffi / HLOOpt / cpu / BothRev |
0.000014804940055910264 s |
0.00001423425995199068 s |
1.04 |
hlo_ffi / PartOpt / cpu / PreRev |
0.000015164779979386369 s |
0.000013936400055172271 s |
1.09 |
hlo_ffi / PartOpt / cpu / PostRev |
0.000014396179940376897 s |
0.000014319900019472698 s |
1.01 |
hlo_ffi / PartOpt / cpu / BothRev |
0.000015006060002633604 s |
0.000013974980001876249 s |
1.07 |
hlo_ffi / IPartOpt / cpu / PreRev |
0.000015746199933346363 s |
0.000014112100025158723 s |
1.12 |
hlo_ffi / IPartOpt / cpu / PostRev |
0.000014658080053777668 s |
0.000014140760013106048 s |
1.04 |
hlo_ffi / IPartOpt / cpu / BothRev |
0.000014772880076634465 s |
0.000014128419979897445 s |
1.05 |
hlo_ffi / DefOpt / cpu / PreRev |
0.000014747459972568324 s |
0.000014014620001034928 s |
1.05 |
hlo_ffi / DefOpt / cpu / PostRev |
0.00001416151997545967 s |
0.000014004400009071104 s |
1.01 |
hlo_ffi / DefOpt / cpu / BothRev |
0.00001534862007247284 s |
0.000014273060005507431 s |
1.08 |
hlo_ffi / IDefOpt / cpu / PreRev |
0.00001533333990664687 s |
0.000014238399980968095 s |
1.08 |
hlo_ffi / IDefOpt / cpu / PostRev |
0.000014773519978916738 s |
0.000014124179979262409 s |
1.05 |
hlo_ffi / IDefOpt / cpu / BothRev |
0.000014627340024162547 s |
0.000014526620025208104 s |
1.01 |
hlo_ffi / JaXPipe / cuda / Primal |
0.0000023670000000000004 s |
||
hlo_ffi / Jax / cuda / Primal |
0.0000023670000000000004 s |
||
hlo_ffi / HLOOpt / cuda / Primal |
0.0000023670000000000004 s |
||
hlo_ffi / PartOpt / cuda / Primal |
0.000002368 s |
||
hlo_ffi / IPartOpt / cuda / Primal |
0.0000023670000000000004 s |
||
hlo_ffi / DefOpt / cuda / Primal |
0.0000023670000000000004 s |
||
hlo_ffi / IDefOpt / cuda / Primal |
0.0000023670000000000004 s |
||
hlo_ffi / JaXPipe / cuda / Forward |
0.000002463 s |
||
hlo_ffi / Jax / cuda / Forward |
0.000002463 s |
||
hlo_ffi / HLOOpt / cuda / Forward |
0.000002463 s |
||
hlo_ffi / PartOpt / cuda / Forward |
0.000002463 s |
||
hlo_ffi / IPartOpt / cuda / Forward |
0.000002463 s |
||
hlo_ffi / DefOpt / cuda / Forward |
0.000002463 s |
||
hlo_ffi / IDefOpt / cuda / Forward |
0.000002463 s |
||
hlo_ffi / JaXPipe / cuda / PreRev |
0.000002463 s |
||
hlo_ffi / JaXPipe / cuda / PostRev |
0.000002431 s |
||
hlo_ffi / JaXPipe / cuda / BothRev |
0.000002463 s |
||
hlo_ffi / Jax / cuda / BothRev |
0.000002463 s |
||
hlo_ffi / HLOOpt / cuda / PreRev |
0.000002432 s |
||
hlo_ffi / HLOOpt / cuda / PostRev |
0.000002431 s |
||
hlo_ffi / HLOOpt / cuda / BothRev |
0.000002432 s |
||
hlo_ffi / PartOpt / cuda / PreRev |
0.000002463 s |
||
hlo_ffi / PartOpt / cuda / PostRev |
0.000002463 s |
||
hlo_ffi / PartOpt / cuda / BothRev |
0.000002463 s |
||
hlo_ffi / IPartOpt / cuda / PreRev |
0.000002432 s |
||
hlo_ffi / IPartOpt / cuda / PostRev |
0.000002431 s |
||
hlo_ffi / IPartOpt / cuda / BothRev |
0.000002432 s |
||
hlo_ffi / DefOpt / cuda / PreRev |
0.000002433 s |
||
hlo_ffi / DefOpt / cuda / PostRev |
0.000002463 s |
||
hlo_ffi / DefOpt / cuda / BothRev |
0.000002463 s |
||
hlo_ffi / IDefOpt / cuda / PreRev |
0.000002463 s |
||
hlo_ffi / IDefOpt / cuda / PostRev |
0.000002432 s |
||
hlo_ffi / IDefOpt / cuda / BothRev |
0.000002463 s |
||
hlo_ffi / JaXPipe / tpu / Primal |
9.342e-7 s |
9.284e-7 s |
1.01 |
hlo_ffi / Jax / tpu / Primal |
9.50775e-7 s |
9.51775e-7 s |
1.00 |
hlo_ffi / HLOOpt / tpu / Primal |
9.1165e-7 s |
9.051e-7 s |
1.01 |
hlo_ffi / PartOpt / tpu / Primal |
9.59075e-7 s |
9.53875e-7 s |
1.01 |
hlo_ffi / IPartOpt / tpu / Primal |
9.09725e-7 s |
9.071e-7 s |
1.00 |
hlo_ffi / DefOpt / tpu / Primal |
9.50875e-7 s |
9.5405e-7 s |
1.00 |
hlo_ffi / IDefOpt / tpu / Primal |
9.04775e-7 s |
9.11875e-7 s |
0.99 |
hlo_ffi / JaXPipe / tpu / Forward |
9.49475e-7 s |
9.48725e-7 s |
1.00 |
hlo_ffi / Jax / tpu / Forward |
9.819e-7 s |
9.8115e-7 s |
1.00 |
hlo_ffi / HLOOpt / tpu / Forward |
9.73875e-7 s |
9.74025e-7 s |
1.00 |
hlo_ffi / PartOpt / tpu / Forward |
9.34475e-7 s |
9.341e-7 s |
1.00 |
hlo_ffi / IPartOpt / tpu / Forward |
9.74525e-7 s |
9.73775e-7 s |
1.00 |
hlo_ffi / DefOpt / tpu / Forward |
9.348e-7 s |
9.3325e-7 s |
1.00 |
hlo_ffi / IDefOpt / tpu / Forward |
9.74175e-7 s |
9.7405e-7 s |
1.00 |
hlo_ffi / JaXPipe / tpu / PreRev |
9.379e-7 s |
9.37975e-7 s |
1.00 |
hlo_ffi / JaXPipe / tpu / PostRev |
9.6555e-7 s |
9.65375e-7 s |
1.00 |
hlo_ffi / JaXPipe / tpu / BothRev |
9.62075e-7 s |
9.62075e-7 s |
1.00 |
hlo_ffi / Jax / tpu / BothRev |
9.6515e-7 s |
9.65025e-7 s |
1.00 |
hlo_ffi / HLOOpt / tpu / PreRev |
9.63175e-7 s |
9.62025e-7 s |
1.00 |
hlo_ffi / HLOOpt / tpu / PostRev |
9.6495e-7 s |
9.64725e-7 s |
1.00 |
hlo_ffi / HLOOpt / tpu / BothRev |
9.627e-7 s |
9.619e-7 s |
1.00 |
hlo_ffi / PartOpt / tpu / PreRev |
9.65e-7 s |
9.6455e-7 s |
1.00 |
hlo_ffi / PartOpt / tpu / PostRev |
9.625e-7 s |
9.621e-7 s |
1.00 |
hlo_ffi / PartOpt / tpu / BothRev |
9.6525e-7 s |
9.646e-7 s |
1.00 |
hlo_ffi / IPartOpt / tpu / PreRev |
9.628499999999998e-7 s |
9.615e-7 s |
1.00 |
hlo_ffi / IPartOpt / tpu / PostRev |
9.6515e-7 s |
9.649e-7 s |
1.00 |
hlo_ffi / IPartOpt / tpu / BothRev |
9.62675e-7 s |
9.61625e-7 s |
1.00 |
hlo_ffi / DefOpt / tpu / PreRev |
9.6535e-7 s |
9.6435e-7 s |
1.00 |
hlo_ffi / DefOpt / tpu / PostRev |
9.62425e-7 s |
9.61925e-7 s |
1.00 |
hlo_ffi / DefOpt / tpu / BothRev |
9.6475e-7 s |
9.644e-7 s |
1.00 |
hlo_ffi / IDefOpt / tpu / PreRev |
9.6255e-7 s |
9.619e-7 s |
1.00 |
hlo_ffi / IDefOpt / tpu / PostRev |
9.65025e-7 s |
9.64225e-7 s |
1.00 |
hlo_ffi / IDefOpt / tpu / BothRev |
9.62975e-7 s |
9.62225e-7 s |
1.00 |
hlo_ffi / JaXPipe / cpu / Primal |
0.000018172 s |
0.000010208559961029096 s |
1.78 |
hlo_ffi / Jax / cpu / Primal |
0.00001771 s |
0.000009593339964339977 s |
1.85 |
hlo_ffi / HLOOpt / cpu / Primal |
0.000017718999999999998 s |
0.00001168801994936075 s |
1.52 |
hlo_ffi / PartOpt / cpu / Primal |
0.000018361 s |
0.000009252899981220252 s |
1.98 |
hlo_ffi / IPartOpt / cpu / Primal |
0.000018365 s |
0.000009737240015965653 s |
1.89 |
hlo_ffi / DefOpt / cpu / Primal |
0.000018169 s |
0.000009807599963096435 s |
1.85 |
hlo_ffi / IDefOpt / cpu / Primal |
0.000018386 s |
0.000009679939939815083 s |
1.90 |
hlo_ffi / JaXPipe / cpu / Forward |
0.000025315 s |
0.000013769279967164038 s |
1.84 |
hlo_ffi / Jax / cpu / Forward |
0.000024976000000000003 s |
0.000013657240015163551 s |
1.83 |
hlo_ffi / HLOOpt / cpu / Forward |
0.000025217 s |
0.00001393840000673663 s |
1.81 |
hlo_ffi / PartOpt / cpu / Forward |
0.000025763 s |
0.000013640720017065176 s |
1.89 |
hlo_ffi / IPartOpt / cpu / Forward |
0.000025313 s |
0.00001365507996524684 s |
1.85 |
hlo_ffi / DefOpt / cpu / Forward |
0.00002555 s |
0.000013563320017055958 s |
1.88 |
hlo_ffi / IDefOpt / cpu / Forward |
0.000024594 s |
0.000013545380024879703 s |
1.82 |
hlo_ffi / JaXPipe / cpu / PreRev |
0.0000247 s |
0.000014215600040188291 s |
1.74 |
hlo_ffi / JaXPipe / cpu / PostRev |
0.000023574 s |
0.000014195959975040753 s |
1.66 |
hlo_ffi / JaXPipe / cpu / BothRev |
0.000023816 s |
0.000013981700012664078 s |
1.70 |
hlo_ffi / Jax / cpu / BothRev |
0.000024213 s |
0.000013890259988329487 s |
1.74 |
hlo_ffi / HLOOpt / cpu / PreRev |
0.000024655 s |
0.000014438219986914191 s |
1.71 |
hlo_ffi / HLOOpt / cpu / PostRev |
0.000024373 s |
0.000016117239983941544 s |
1.51 |
hlo_ffi / HLOOpt / cpu / BothRev |
0.000024773 s |
0.00001423425995199068 s |
1.74 |
hlo_ffi / PartOpt / cpu / PreRev |
0.000024723 s |
0.000013936400055172271 s |
1.77 |
hlo_ffi / PartOpt / cpu / PostRev |
0.000024967 s |
0.000014319900019472698 s |
1.74 |
hlo_ffi / PartOpt / cpu / BothRev |
0.00002438 s |
0.000013974980001876249 s |
1.74 |
hlo_ffi / IPartOpt / cpu / PreRev |
0.000024695000000000003 s |
0.000014112100025158723 s |
1.75 |
hlo_ffi / IPartOpt / cpu / PostRev |
0.000024251 s |
0.000014140760013106048 s |
1.71 |
hlo_ffi / IPartOpt / cpu / BothRev |
0.000025008 s |
0.000014128419979897445 s |
1.77 |
hlo_ffi / DefOpt / cpu / PreRev |
0.000024347 s |
0.000014014620001034928 s |
1.74 |
hlo_ffi / DefOpt / cpu / PostRev |
0.000025511 s |
0.000014004400009071104 s |
1.82 |
hlo_ffi / DefOpt / cpu / BothRev |
0.000024834 s |
0.000014273060005507431 s |
1.74 |
hlo_ffi / IDefOpt / cpu / PreRev |
0.000024958 s |
0.000014238399980968095 s |
1.75 |
hlo_ffi / IDefOpt / cpu / PostRev |
0.00002497 s |
0.000014124179979262409 s |
1.77 |
hlo_ffi / IDefOpt / cpu / BothRev |
0.000025864 s |
0.000014526620025208104 s |
1.78 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / cpu / Primal |
0.0009349355999802 s |
0.0008971354000095 s |
1.04 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / Jax / cpu / Primal |
0.0009274594001908 s |
0.0008899171998564 s |
1.04 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / cpu / Primal |
0.0010079368003061 s |
0.0009708640001008 s |
1.04 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / cpu / Primal |
0.0009351313998195 s |
0.0009020605999467 s |
1.04 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / cpu / Primal |
0.0009221920001436 s |
0.0008823370000754 s |
1.05 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / cpu / Primal |
0.00108485260007 s |
0.0009529861999908 s |
1.14 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / cpu / Primal |
0.0010123527998075 s |
0.0009414723999725 s |
1.08 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / cpu / Forward |
0.0023791528001311 s |
0.0021872678000363 s |
1.09 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / Jax / cpu / Forward |
0.0025144418001218 s |
0.0022974508001425 s |
1.09 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / cpu / Forward |
0.0023345342000538 s |
0.0021531899999899 s |
1.08 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / cpu / Forward |
0.0024414441999397 s |
0.0022659975999886 s |
1.08 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / cpu / Forward |
0.0022626698000749 s |
0.0022205513998414 s |
1.02 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / cpu / Forward |
0.002243314000043 s |
0.0021808726000017 s |
1.03 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / cpu / Forward |
0.0025117023998973 s |
0.0022005146000083 s |
1.14 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / cpu / PreRev |
0.0060882047999257 s |
0.0053413053999065 s |
1.14 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / cpu / PostRev |
0.0058263736000299 s |
0.0055230883999684 s |
1.05 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / cpu / BothRev |
0.0055735834001097 s |
0.0064637317998858 s |
0.86 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / Jax / cpu / BothRev |
0.0061264169999049 s |
0.003398336400005 s |
1.80 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / cpu / PreRev |
0.0065767152000262 s |
0.0054967704000773 s |
1.20 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / cpu / PostRev |
0.0038909652001166 s |
0.0053526630000305 s |
0.73 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / cpu / BothRev |
0.0062376900001254 s |
0.0051467443998262 s |
1.21 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / cpu / PreRev |
0.0040834241997799 s |
0.0055261032000998 s |
0.74 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / cpu / PostRev |
0.0063477446001343 s |
0.0050601578001078 s |
1.25 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / cpu / BothRev |
0.0039601960001164 s |
0.0053215985999486 s |
0.74 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / cpu / PreRev |
0.0064511211998251 s |
0.0049230481999984 s |
1.31 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / cpu / PostRev |
0.0040804615999149 s |
0.0054744204000598 s |
0.75 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / cpu / BothRev |
0.0063872664002701 s |
0.0050823348001358 s |
1.26 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / cpu / PreRev |
0.0042009329998109 s |
0.004144794000058 s |
1.01 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / cpu / PostRev |
0.0061575753999932 s |
0.0054135484001562 s |
1.14 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / cpu / BothRev |
0.0040519577998566 s |
0.0056405348000225 s |
0.72 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / cpu / PreRev |
0.0058569333999912 s |
0.0054345575999832 s |
1.08 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / cpu / PostRev |
0.0058035177999045 s |
0.0056602189999466 s |
1.03 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / cpu / BothRev |
0.0072535874000095 s |
0.004959446400062 s |
1.46 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / cuda / Primal |
0.000295583 s |
||
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / Jax / cuda / Primal |
0.000296254 s |
||
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / cuda / Primal |
0.000302366 s |
||
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / cuda / Primal |
0.000296286 s |
||
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / cuda / Primal |
0.000295583 s |
||
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / cuda / Primal |
0.000303326 s |
||
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / cuda / Primal |
0.000302462 s |
||
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / cuda / Forward |
0.0005823009999999 s |
||
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / Jax / cuda / Forward |
0.000567517 s |
||
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / cuda / Forward |
0.000582397 s |
||
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / cuda / Forward |
0.000582877 s |
||
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / cuda / Forward |
0.0005835489999999 s |
||
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / cuda / Forward |
0.000582909 s |
||
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / cuda / Forward |
0.000583358 s |
||
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / cuda / PreRev |
0.001056795 s |
||
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / cuda / PostRev |
0.001012763 s |
||
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / cuda / BothRev |
0.001052122 s |
||
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / Jax / cuda / BothRev |
0.001005499 s |
||
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / cuda / PreRev |
0.00103753 s |
||
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / cuda / PostRev |
0.001059515 s |
||
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / cuda / BothRev |
0.001037339 s |
||
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / cuda / PreRev |
0.001052057 s |
||
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / cuda / PostRev |
0.00100089 s |
||
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / cuda / BothRev |
0.001052858 s |
||
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / cuda / PreRev |
0.0010513859999999 s |
||
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / cuda / PostRev |
0.001000602 s |
||
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / cuda / BothRev |
0.00105321 s |
||
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / cuda / PreRev |
0.001051547 s |
||
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / cuda / PostRev |
0.000985946 s |
||
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / cuda / BothRev |
0.00105321 s |
||
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / cuda / PreRev |
0.00105417 s |
||
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / cuda / PostRev |
0.00105545 s |
||
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / cuda / BothRev |
0.001054043 s |
||
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / tpu / Primal |
0.0001243749999999 s |
0.000130709 s |
0.95 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / Jax / tpu / Primal |
0.00012636525 s |
0.0001240825 s |
1.02 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / tpu / Primal |
0.0001526847499999 s |
0.000160036 s |
0.95 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / tpu / Primal |
0.00013420475 s |
0.0001310375 s |
1.02 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / tpu / Primal |
0.0001313835 s |
0.00013850275 s |
0.95 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / tpu / Primal |
0.0001476569999999 s |
0.0001452479999999 s |
1.02 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / tpu / Primal |
0.0001507925 s |
0.000158184 s |
0.95 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / tpu / Forward |
0.00021233175 s |
0.0002136285 s |
0.99 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / Jax / tpu / Forward |
0.0002606585 s |
0.00026264175 s |
0.99 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / tpu / Forward |
0.0002125989999999 s |
0.0002197907499999 s |
0.97 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / tpu / Forward |
0.000218329 s |
0.00021473625 s |
1.02 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / tpu / Forward |
0.00021235775 s |
0.0002155807499999 s |
0.99 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / tpu / Forward |
0.000218641 s |
0.0002177224999999 s |
1.00 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / tpu / Forward |
0.00021244225 s |
0.0002154529999999 s |
0.99 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / tpu / PreRev |
0.00035597575 s |
0.00035606725 s |
1.00 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / tpu / PostRev |
0.0002567705 s |
0.00025604325 s |
1.00 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / tpu / BothRev |
0.00035548375 s |
0.00035553875 s |
1.00 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / Jax / tpu / BothRev |
0.00025771575 s |
0.00025739475 s |
1.00 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / tpu / PreRev |
0.0003557859999999 s |
0.0003556455 s |
1.00 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / tpu / PostRev |
0.0002914995 s |
0.0002916115 s |
1.00 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / tpu / BothRev |
0.0003558902499999 s |
0.00035595175 s |
1.00 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / tpu / PreRev |
0.0003577695 s |
0.00035638875 s |
1.00 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / tpu / PostRev |
0.00027321475 s |
0.0002722449999999 s |
1.00 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / tpu / BothRev |
0.000358045 s |
0.00035633325 s |
1.00 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / tpu / PreRev |
0.0003558115 s |
0.0003561175 s |
1.00 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / tpu / PostRev |
0.0002736744999999 s |
0.0002719079999999 s |
1.01 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / tpu / BothRev |
0.00035566225 s |
0.0003558575 s |
1.00 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / tpu / PreRev |
0.0003600397499999 s |
0.0003581802499999 s |
1.01 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / tpu / PostRev |
0.000284063 s |
0.000283912 s |
1.00 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / tpu / BothRev |
0.00035972725 s |
0.0003585855 s |
1.00 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / tpu / PreRev |
0.000358039 s |
0.00035803675 s |
1.00 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / tpu / PostRev |
0.00030212025 s |
0.00030181725 s |
1.00 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / tpu / BothRev |
0.00035780175 s |
0.00035818825 s |
1.00 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / cpu / Primal |
0.002271689 s |
0.0008971354000095 s |
2.53 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / Jax / cpu / Primal |
0.0025739139999999 s |
0.0008899171998564 s |
2.89 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / cpu / Primal |
0.002548352 s |
0.0009708640001008 s |
2.62 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / cpu / Primal |
0.002345908 s |
0.0009020605999467 s |
2.60 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / cpu / Primal |
0.0024799419999999 s |
0.0008823370000754 s |
2.81 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / cpu / Primal |
0.00268944 s |
0.0009529861999908 s |
2.82 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / cpu / Primal |
0.00210795 s |
0.0009414723999725 s |
2.24 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / cpu / Forward |
0.005897662 s |
0.0021872678000363 s |
2.70 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / Jax / cpu / Forward |
0.00620977 s |
0.0022974508001425 s |
2.70 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / cpu / Forward |
0.006260465 s |
0.0021531899999899 s |
2.91 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / cpu / Forward |
0.005660108 s |
0.0022659975999886 s |
2.50 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / cpu / Forward |
0.006034746 s |
0.0022205513998414 s |
2.72 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / cpu / Forward |
0.005741135 s |
0.0021808726000017 s |
2.63 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / cpu / Forward |
0.006221135 s |
0.0022005146000083 s |
2.83 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / cpu / PreRev |
0.009134504 s |
0.0053413053999065 s |
1.71 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / cpu / PostRev |
0.010198764 s |
0.0055230883999684 s |
1.85 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / cpu / BothRev |
0.009020979 s |
0.0064637317998858 s |
1.40 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / Jax / cpu / BothRev |
0.010675008 s |
0.003398336400005 s |
3.14 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / cpu / PreRev |
0.009596225 s |
0.0054967704000773 s |
1.75 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / cpu / PostRev |
0.008347126 s |
0.0053526630000305 s |
1.56 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / cpu / BothRev |
0.009270003 s |
0.0051467443998262 s |
1.80 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / cpu / PreRev |
0.008113154 s |
0.0055261032000998 s |
1.47 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / cpu / PostRev |
0.009179662 s |
0.0050601578001078 s |
1.81 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / cpu / BothRev |
0.010006134 s |
0.0053215985999486 s |
1.88 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / cpu / PreRev |
0.009365746 s |
0.0049230481999984 s |
1.90 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / cpu / PostRev |
0.009646417 s |
0.0054744204000598 s |
1.76 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / cpu / BothRev |
0.008469473 s |
0.0050823348001358 s |
1.67 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / cpu / PreRev |
0.010134985 s |
0.004144794000058 s |
2.45 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / cpu / PostRev |
0.008348219 s |
0.0054135484001562 s |
1.54 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / cpu / BothRev |
0.0098972 s |
0.0056405348000225 s |
1.75 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / cpu / PreRev |
0.00951859 s |
0.0054345575999832 s |
1.75 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / cpu / PostRev |
0.009772599 s |
0.0056602189999466 s |
1.73 |
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / cpu / BothRev |
0.010182318 s |
0.004959446400062 s |
2.05 |
scatter_sum / JaXPipe / cpu / Primal |
0.000010183399990637551 s |
0.000009197599974868353 s |
1.11 |
scatter_sum / Jax / cpu / Primal |
0.000009889619886962465 s |
0.00000863658000525902 s |
1.15 |
scatter_sum / HLOOpt / cpu / Primal |
0.000009029760003613771 s |
0.000008636440006739577 s |
1.05 |
scatter_sum / PartOpt / cpu / Primal |
0.00000970557988694054 s |
0.000008683640025992645 s |
1.12 |
scatter_sum / IPartOpt / cpu / Primal |
0.000009167279949906515 s |
0.000008986519978861907 s |
1.02 |
scatter_sum / DefOpt / cpu / Primal |
0.000008687619938427816 s |
0.00000832479997370683 s |
1.04 |
scatter_sum / IDefOpt / cpu / Primal |
0.000009356920108984924 s |
0.000008516199995938223 s |
1.10 |
scatter_sum / JaXPipe / cpu / Forward |
0.000013841940053680443 s |
0.000013116119998812791 s |
1.06 |
scatter_sum / Jax / cpu / Forward |
0.000013263760029076366 s |
0.00001274550000744057 s |
1.04 |
scatter_sum / HLOOpt / cpu / Forward |
0.000014280699942901264 s |
0.000012919219998366315 s |
1.11 |
scatter_sum / PartOpt / cpu / Forward |
0.00001395991997924284 s |
0.000012863720012319393 s |
1.09 |
scatter_sum / IPartOpt / cpu / Forward |
0.000014041120011825117 s |
0.000012910679997730768 s |
1.09 |
scatter_sum / DefOpt / cpu / Forward |
0.000014347019932756666 s |
0.000012142800014771638 s |
1.18 |
scatter_sum / IDefOpt / cpu / Forward |
0.000013847979935235345 s |
0.000012415180026437157 s |
1.12 |
scatter_sum / JaXPipe / cpu / PreRev |
0.00001375821990222903 s |
0.000012810599982913118 s |
1.07 |
scatter_sum / JaXPipe / cpu / PostRev |
0.00001326634001088678 s |
0.000013005880018681636 s |
1.02 |
scatter_sum / JaXPipe / cpu / BothRev |
0.000013837100141245172 s |
0.000013457019986162776 s |
1.03 |
scatter_sum / Jax / cpu / BothRev |
0.00001333199992586742 s |
0.000012451120019250084 s |
1.07 |
scatter_sum / HLOOpt / cpu / PreRev |
0.00001337258005150943 s |
0.0000133911399916542 s |
1.00 |
scatter_sum / HLOOpt / cpu / PostRev |
0.00001552734005599632 s |
0.000015021919998616796 s |
1.03 |
scatter_sum / HLOOpt / cpu / BothRev |
0.00001398595994032803 s |
0.000012903119986731329 s |
1.08 |
scatter_sum / PartOpt / cpu / PreRev |
0.000014097860039328224 s |
0.000013085659957141617 s |
1.08 |
scatter_sum / PartOpt / cpu / PostRev |
0.00001340291999440524 s |
0.00001335955999820726 s |
1.00 |
scatter_sum / PartOpt / cpu / BothRev |
0.0000141941800393397 s |
0.000013487699989127578 s |
1.05 |
scatter_sum / IPartOpt / cpu / PreRev |
0.000012951199969393202 s |
0.00001300502000049164 s |
1.00 |
scatter_sum / IPartOpt / cpu / PostRev |
0.000013985779951326549 s |
0.000013540099971578456 s |
1.03 |
scatter_sum / IPartOpt / cpu / BothRev |
0.000013978120095998747 s |
0.000013024959953327198 s |
1.07 |
scatter_sum / DefOpt / cpu / PreRev |
0.000013238460051070431 s |
0.000012345180020929547 s |
1.07 |
scatter_sum / DefOpt / cpu / PostRev |
0.00001396254001519992 s |
0.000013244880028651096 s |
1.05 |
scatter_sum / DefOpt / cpu / BothRev |
0.00001370358011627104 s |
0.00001294892000260006 s |
1.06 |
scatter_sum / IDefOpt / cpu / PreRev |
0.000013782300029561155 s |
0.000012458519977371909 s |
1.11 |
scatter_sum / IDefOpt / cpu / PostRev |
0.00001319270000749384 s |
0.000013432480009214488 s |
0.98 |
scatter_sum / IDefOpt / cpu / BothRev |
0.00001325865992839681 s |
0.000013218240028436412 s |
1.00 |
scatter_sum / JaXPipe / cuda / Primal |
0.000011072 s |
||
scatter_sum / Jax / cuda / Primal |
0.000011552 s |
||
scatter_sum / HLOOpt / cuda / Primal |
0.000011968 s |
||
scatter_sum / PartOpt / cuda / Primal |
0.000010592 s |
||
scatter_sum / IPartOpt / cuda / Primal |
0.000010849 s |
||
scatter_sum / DefOpt / cuda / Primal |
0.000011455999999999998 s |
||
scatter_sum / IDefOpt / cuda / Primal |
0.000011072 s |
||
scatter_sum / JaXPipe / cuda / Forward |
0.00001808 s |
||
scatter_sum / Jax / cuda / Forward |
0.000017856 s |
||
scatter_sum / HLOOpt / cuda / Forward |
0.000017632 s |
||
scatter_sum / PartOpt / cuda / Forward |
0.000017696 s |
||
scatter_sum / IPartOpt / cuda / Forward |
0.000018368 s |
||
scatter_sum / DefOpt / cuda / Forward |
0.000017664 s |
||
scatter_sum / IDefOpt / cuda / Forward |
0.00001808 s |
||
scatter_sum / JaXPipe / cuda / PreRev |
0.000019744000000000003 s |
||
scatter_sum / JaXPipe / cuda / PostRev |
0.000017823 s |
||
scatter_sum / JaXPipe / cuda / BothRev |
0.000017825 s |
||
scatter_sum / Jax / cuda / BothRev |
0.000016992 s |
||
scatter_sum / HLOOpt / cuda / PreRev |
0.000017664 s |
||
scatter_sum / HLOOpt / cuda / PostRev |
0.000017503999999999997 s |
||
scatter_sum / HLOOpt / cuda / BothRev |
0.00001824 s |
||
scatter_sum / PartOpt / cuda / PreRev |
0.000018304 s |
||
scatter_sum / PartOpt / cuda / PostRev |
0.000017856 s |
||
scatter_sum / PartOpt / cuda / BothRev |
0.000017406999999999998 s |
||
scatter_sum / IPartOpt / cuda / PreRev |
0.000018176 s |
||
scatter_sum / IPartOpt / cuda / PostRev |
0.000017760000000000003 s |
||
scatter_sum / IPartOpt / cuda / BothRev |
0.000017888000000000002 s |
||
scatter_sum / DefOpt / cuda / PreRev |
0.000019552 s |
||
scatter_sum / DefOpt / cuda / PostRev |
0.000017632 s |
||
scatter_sum / DefOpt / cuda / BothRev |
0.000018047 s |
||
scatter_sum / IDefOpt / cuda / PreRev |
0.000019648 s |
||
scatter_sum / IDefOpt / cuda / PostRev |
0.000017984 s |
||
scatter_sum / IDefOpt / cuda / BothRev |
0.000017952 s |
||
scatter_sum / JaXPipe / tpu / Primal |
0.0000013508999999999998 s |
0.000001350125 s |
1.00 |
scatter_sum / Jax / tpu / Primal |
0.0000014046500000000002 s |
0.0000014033000000000002 s |
1.00 |
scatter_sum / HLOOpt / tpu / Primal |
0.0000013516 s |
0.0000013502249999999998 s |
1.00 |
scatter_sum / PartOpt / tpu / Primal |
0.0000014050250000000002 s |
0.0000014037250000000005 s |
1.00 |
scatter_sum / IPartOpt / tpu / Primal |
0.00000135135 s |
0.0000013500249999999998 s |
1.00 |
scatter_sum / DefOpt / tpu / Primal |
0.0000014046500000000002 s |
0.0000014043 s |
1.00 |
scatter_sum / IDefOpt / tpu / Primal |
0.00000135045 s |
0.0000013501750000000002 s |
1.00 |
scatter_sum / JaXPipe / tpu / Forward |
0.000002710075 s |
0.0000027027 s |
1.00 |
scatter_sum / Jax / tpu / Forward |
0.000002737625 s |
0.000002726925 s |
1.00 |
scatter_sum / HLOOpt / tpu / Forward |
0.000002701125 s |
0.0000027038000000000003 s |
1.00 |
scatter_sum / PartOpt / tpu / Forward |
0.00000269255 s |
0.00000269375 s |
1.00 |
scatter_sum / IPartOpt / tpu / Forward |
0.000002701375 s |
0.00000270735 s |
1.00 |
scatter_sum / DefOpt / tpu / Forward |
0.0000026921 s |
0.000002696475 s |
1.00 |
scatter_sum / IDefOpt / tpu / Forward |
0.0000027053500000000003 s |
0.0000027051750000000003 s |
1.00 |
scatter_sum / JaXPipe / tpu / PreRev |
0.00000269 s |
0.0000026953 s |
1.00 |
scatter_sum / JaXPipe / tpu / PostRev |
0.000002685 s |
0.00000268745 s |
1.00 |
scatter_sum / JaXPipe / tpu / BothRev |
0.00000271375 s |
0.0000027069 s |
1.00 |
scatter_sum / Jax / tpu / BothRev |
0.000002739075 s |
0.0000027392 s |
1.00 |
scatter_sum / HLOOpt / tpu / PreRev |
0.000002709875 s |
0.000002704325 s |
1.00 |
scatter_sum / HLOOpt / tpu / PostRev |
0.000002749075 s |
0.000002741175 s |
1.00 |
scatter_sum / HLOOpt / tpu / BothRev |
0.0000027067249999999995 s |
0.0000027063 s |
1.00 |
scatter_sum / PartOpt / tpu / PreRev |
0.00000275155 s |
0.000002745575 s |
1.00 |
scatter_sum / PartOpt / tpu / PostRev |
0.000002707175 s |
0.000002701075 s |
1.00 |
scatter_sum / PartOpt / tpu / BothRev |
0.0000027399 s |
0.00000274695 s |
1.00 |
scatter_sum / IPartOpt / tpu / PreRev |
0.000002713825 s |
0.00000270495 s |
1.00 |
scatter_sum / IPartOpt / tpu / PostRev |
0.0000027442250000000004 s |
0.000002741025 s |
1.00 |
scatter_sum / IPartOpt / tpu / BothRev |
0.000002705475 s |
0.000002712075 s |
1.00 |
scatter_sum / DefOpt / tpu / PreRev |
0.000002743525 s |
0.000002741375 s |
1.00 |
scatter_sum / DefOpt / tpu / PostRev |
0.0000027045750000000003 s |
0.000002704725 s |
1.00 |
scatter_sum / DefOpt / tpu / BothRev |
0.000002741375 s |
0.000002746225 s |
1.00 |
scatter_sum / IDefOpt / tpu / PreRev |
0.000002705375 s |
0.000002709225 s |
1.00 |
scatter_sum / IDefOpt / tpu / PostRev |
0.000002738425 s |
0.0000027389999999999995 s |
1.00 |
scatter_sum / IDefOpt / tpu / BothRev |
0.00000270845 s |
0.000002711825 s |
1.00 |
scatter_sum / JaXPipe / cpu / Primal |
0.000016272999999999998 s |
0.000009197599974868353 s |
1.77 |
scatter_sum / Jax / cpu / Primal |
0.000015834 s |
0.00000863658000525902 s |
1.83 |
scatter_sum / HLOOpt / cpu / Primal |
0.000016011 s |
0.000008636440006739577 s |
1.85 |
scatter_sum / PartOpt / cpu / Primal |
0.000015360000000000002 s |
0.000008683640025992645 s |
1.77 |
scatter_sum / IPartOpt / cpu / Primal |
0.000015768000000000002 s |
0.000008986519978861907 s |
1.75 |
scatter_sum / DefOpt / cpu / Primal |
0.000015929999999999998 s |
0.00000832479997370683 s |
1.91 |
scatter_sum / IDefOpt / cpu / Primal |
0.000015848999999999997 s |
0.000008516199995938223 s |
1.86 |
scatter_sum / JaXPipe / cpu / Forward |
0.000024807 s |
0.000013116119998812791 s |
1.89 |
scatter_sum / Jax / cpu / Forward |
0.000024616 s |
0.00001274550000744057 s |
1.93 |
scatter_sum / HLOOpt / cpu / Forward |
0.000024219 s |
0.000012919219998366315 s |
1.87 |
scatter_sum / PartOpt / cpu / Forward |
0.000024722 s |
0.000012863720012319393 s |
1.92 |
scatter_sum / IPartOpt / cpu / Forward |
0.000026084 s |
0.000012910679997730768 s |
2.02 |
scatter_sum / DefOpt / cpu / Forward |
0.000023484 s |
0.000012142800014771638 s |
1.93 |
scatter_sum / IDefOpt / cpu / Forward |
0.000023217 s |
0.000012415180026437157 s |
1.87 |
scatter_sum / JaXPipe / cpu / PreRev |
0.000024878 s |
0.000012810599982913118 s |
1.94 |
scatter_sum / JaXPipe / cpu / PostRev |
0.000022709 s |
0.000013005880018681636 s |
1.75 |
scatter_sum / JaXPipe / cpu / BothRev |
0.00002331 s |
0.000013457019986162776 s |
1.73 |
scatter_sum / Jax / cpu / BothRev |
0.000023879 s |
0.000012451120019250084 s |
1.92 |
scatter_sum / HLOOpt / cpu / PreRev |
0.000024609 s |
0.0000133911399916542 s |
1.84 |
scatter_sum / HLOOpt / cpu / PostRev |
0.000023197 s |
0.000015021919998616796 s |
1.54 |
scatter_sum / HLOOpt / cpu / BothRev |
0.000023532 s |
0.000012903119986731329 s |
1.82 |
scatter_sum / PartOpt / cpu / PreRev |
0.000024254 s |
0.000013085659957141617 s |
1.85 |
scatter_sum / PartOpt / cpu / PostRev |
0.000024451 s |
0.00001335955999820726 s |
1.83 |
scatter_sum / PartOpt / cpu / BothRev |
0.000024654 s |
0.000013487699989127578 s |
1.83 |
scatter_sum / IPartOpt / cpu / PreRev |
0.000024829 s |
0.00001300502000049164 s |
1.91 |
scatter_sum / IPartOpt / cpu / PostRev |
0.000023344 s |
0.000013540099971578456 s |
1.72 |
scatter_sum / IPartOpt / cpu / BothRev |
0.000023167 s |
0.000013024959953327198 s |
1.78 |
scatter_sum / DefOpt / cpu / PreRev |
0.00002406 s |
0.000012345180020929547 s |
1.95 |
scatter_sum / DefOpt / cpu / PostRev |
0.000024225 s |
0.000013244880028651096 s |
1.83 |
scatter_sum / DefOpt / cpu / BothRev |
0.000024111 s |
0.00001294892000260006 s |
1.86 |
scatter_sum / IDefOpt / cpu / PreRev |
0.00002274 s |
0.000012458519977371909 s |
1.83 |
scatter_sum / IDefOpt / cpu / PostRev |
0.000023018 s |
0.000013432480009214488 s |
1.71 |
scatter_sum / IDefOpt / cpu / BothRev |
0.000023566 s |
0.000013218240028436412 s |
1.78 |
slicing / JaXPipe / cpu / Primal |
0.000008707579945621547 s |
0.000006967759964027209 s |
1.25 |
slicing / Jax / cpu / Primal |
0.000007284799994522473 s |
0.000007308279982680688 s |
1.00 |
slicing / HLOOpt / cpu / Primal |
0.000007880920111347223 s |
0.000007030099968687864 s |
1.12 |
slicing / PartOpt / cpu / Primal |
0.000007727299962425605 s |
0.000006606380047742277 s |
1.17 |
slicing / IPartOpt / cpu / Primal |
0.000008197519928216935 s |
0.0000070093600061227335 s |
1.17 |
slicing / DefOpt / cpu / Primal |
0.000007657079986529425 s |
0.000007347559958361672 s |
1.04 |
slicing / IDefOpt / cpu / Primal |
0.000007732699959888123 s |
0.000007295080013136612 s |
1.06 |
slicing / JaXPipe / cpu / Forward |
0.000011202780096937204 s |
0.00001058188000570226 s |
1.06 |
slicing / Jax / cpu / Forward |
0.000010673479991964995 s |
0.000010111599976880823 s |
1.06 |
slicing / HLOOpt / cpu / Forward |
0.000011764920036512197 s |
0.000010369259971412248 s |
1.13 |
slicing / PartOpt / cpu / Forward |
0.000010967979978886433 s |
0.000010642100014592873 s |
1.03 |
slicing / IPartOpt / cpu / Forward |
0.000010802180077007506 s |
0.000011122879968752386 s |
0.97 |
slicing / DefOpt / cpu / Forward |
0.000010347040060878498 s |
0.000010585239997453756 s |
0.98 |
slicing / IDefOpt / cpu / Forward |
0.000011701300009008264 s |
0.000010901799969360582 s |
1.07 |
slicing / JaXPipe / cpu / PreRev |
0.000011341839981469092 s |
0.000010843380023288772 s |
1.05 |
slicing / JaXPipe / cpu / PostRev |
0.000012217339954077031 s |
0.000010892160007642817 s |
1.12 |
slicing / JaXPipe / cpu / BothRev |
0.000011399119939596858 s |
0.00001148246002230735 s |
0.99 |
slicing / Jax / cpu / BothRev |
0.000011811219937953864 s |
0.00001101181999729306 s |
1.07 |
slicing / HLOOpt / cpu / PreRev |
0.000011596399999689311 s |
0.000011168119945068613 s |
1.04 |
slicing / HLOOpt / cpu / PostRev |
0.00001327369996943162 s |
0.00001565627999298158 s |
0.85 |
slicing / HLOOpt / cpu / BothRev |
0.000011081120046583235 s |
0.000011009500012733042 s |
1.01 |
slicing / PartOpt / cpu / PreRev |
0.000011381300064385868 s |
0.000011034819972337572 s |
1.03 |
slicing / PartOpt / cpu / PostRev |
0.0000119806199472805 s |
0.000011052660020141048 s |
1.08 |
slicing / PartOpt / cpu / BothRev |
0.000011821760072052712 s |
0.00001162962001217238 s |
1.02 |
slicing / IPartOpt / cpu / PreRev |
0.00001134347994593554 s |
0.000010544719989411531 s |
1.08 |
slicing / IPartOpt / cpu / PostRev |
0.000011537000009411712 s |
0.000011078819979957187 s |
1.04 |
slicing / IPartOpt / cpu / BothRev |
0.00001193932001115172 s |
0.000011282959994787236 s |
1.06 |
slicing / DefOpt / cpu / PreRev |
0.000011067499945056624 s |
0.000010709039997891525 s |
1.03 |
slicing / DefOpt / cpu / PostRev |
0.000011694160002662102 s |
0.000011488679992908146 s |
1.02 |
slicing / DefOpt / cpu / BothRev |
0.000011757460069929949 s |
0.00001066072003595764 s |
1.10 |
slicing / IDefOpt / cpu / PreRev |
0.000011742820006475086 s |
0.000010654759971657768 s |
1.10 |
slicing / IDefOpt / cpu / PostRev |
0.00001131686003645882 s |
0.000011159800033055945 s |
1.01 |
slicing / IDefOpt / cpu / BothRev |
0.000011514720008563017 s |
0.000010670940046111356 s |
1.08 |
slicing / JaXPipe / cuda / Primal |
0.000002304 s |
||
slicing / Jax / cuda / Primal |
0.000002303 s |
||
slicing / HLOOpt / cuda / Primal |
0.000002303 s |
||
slicing / PartOpt / cuda / Primal |
0.000002303 s |
||
slicing / IPartOpt / cuda / Primal |
0.000002304 s |
||
slicing / DefOpt / cuda / Primal |
0.000002303 s |
||
slicing / IDefOpt / cuda / Primal |
0.000002303 s |
||
slicing / JaXPipe / cuda / Forward |
0.000010433 s |
||
slicing / Jax / cuda / Forward |
0.00001072 s |
||
slicing / HLOOpt / cuda / Forward |
0.000010464 s |
||
slicing / PartOpt / cuda / Forward |
0.000011392 s |
||
slicing / IPartOpt / cuda / Forward |
0.000010784 s |
||
slicing / DefOpt / cuda / Forward |
0.000010528 s |
||
slicing / IDefOpt / cuda / Forward |
0.000011072 s |
||
slicing / JaXPipe / cuda / PreRev |
0.000011104 s |
||
slicing / JaXPipe / cuda / PostRev |
0.000011104 s |
||
slicing / JaXPipe / cuda / BothRev |
0.000010944 s |
||
slicing / Jax / cuda / BothRev |
0.00001104 s |
||
slicing / HLOOpt / cuda / PreRev |
0.000011808 s |
||
slicing / HLOOpt / cuda / PostRev |
0.000010848 s |
||
slicing / HLOOpt / cuda / BothRev |
0.000011328 s |
||
slicing / PartOpt / cuda / PreRev |
0.000010656 s |
||
slicing / PartOpt / cuda / PostRev |
0.000010592 s |
||
slicing / PartOpt / cuda / BothRev |
0.000010432 s |
||
slicing / IPartOpt / cuda / PreRev |
0.000010912 s |
||
slicing / IPartOpt / cuda / PostRev |
0.000010752 s |
||
slicing / IPartOpt / cuda / BothRev |
0.000010592 s |
||
slicing / DefOpt / cuda / PreRev |
0.00001024 s |
||
slicing / DefOpt / cuda / PostRev |
0.000010752 s |
||
slicing / DefOpt / cuda / BothRev |
0.000010847 s |
||
slicing / IDefOpt / cuda / PreRev |
0.00001056 s |
||
slicing / IDefOpt / cuda / PostRev |
0.000010752 s |
||
slicing / IDefOpt / cuda / BothRev |
0.000010496 s |
||
slicing / JaXPipe / tpu / Primal |
0.000001024775 s |
0.00000102665 s |
1.00 |
slicing / Jax / tpu / Primal |
9.68625e-7 s |
9.691e-7 s |
1.00 |
slicing / HLOOpt / tpu / Primal |
0.00000102725 s |
0.000001022725 s |
1.00 |
slicing / PartOpt / tpu / Primal |
9.741e-7 s |
9.7145e-7 s |
1.00 |
slicing / IPartOpt / tpu / Primal |
0.000001022025 s |
0.000001027425 s |
0.99 |
slicing / DefOpt / tpu / Primal |
9.6835e-7 s |
9.7015e-7 s |
1.00 |
slicing / IDefOpt / tpu / Primal |
0.00000102545 s |
0.0000010241500000000002 s |
1.00 |
slicing / JaXPipe / tpu / Forward |
0.000001411 s |
0.000001420325 s |
0.99 |
slicing / Jax / tpu / Forward |
0.000001477525 s |
0.000001482275 s |
1.00 |
slicing / HLOOpt / tpu / Forward |
0.00000151975 s |
0.000001521325 s |
1.00 |
slicing / PartOpt / tpu / Forward |
0.00000150675 s |
0.000001498725 s |
1.01 |
slicing / IPartOpt / tpu / Forward |
0.000001522025 s |
0.0000015166750000000002 s |
1.00 |
slicing / DefOpt / tpu / Forward |
0.000001503025 s |
0.000001497025 s |
1.00 |
slicing / IDefOpt / tpu / Forward |
0.0000015334249999999998 s |
0.0000015183749999999997 s |
1.01 |
slicing / JaXPipe / tpu / PreRev |
0.00000256575 s |
0.0000025757750000000003 s |
1.00 |
slicing / JaXPipe / tpu / PostRev |
0.000002519725 s |
0.000002527475 s |
1.00 |
slicing / JaXPipe / tpu / BothRev |
0.00000259535 s |
0.000002581175 s |
1.01 |
slicing / Jax / tpu / BothRev |
0.0000025354500000000004 s |
0.00000254895 s |
0.99 |
slicing / HLOOpt / tpu / PreRev |
0.0000025794499999999995 s |
0.00000258125 s |
1.00 |
slicing / HLOOpt / tpu / PostRev |
0.0000025419 s |
0.000002547175 s |
1.00 |
slicing / HLOOpt / tpu / BothRev |
0.000002587475 s |
0.0000025804 s |
1.00 |
slicing / PartOpt / tpu / PreRev |
0.000002533275 s |
0.000002536925 s |
1.00 |
slicing / PartOpt / tpu / PostRev |
0.000002586525 s |
0.0000025853 s |
1.00 |
slicing / PartOpt / tpu / BothRev |
0.0000025449 s |
0.0000025357750000000003 s |
1.00 |
slicing / IPartOpt / tpu / PreRev |
0.000002576675 s |
0.000002592675 s |
0.99 |
slicing / IPartOpt / tpu / PostRev |
0.0000025358750000000005 s |
0.0000025356000000000003 s |
1.00 |
slicing / IPartOpt / tpu / BothRev |
0.0000025857 s |
0.0000025901 s |
1.00 |
slicing / DefOpt / tpu / PreRev |
0.0000025307 s |
0.0000025452500000000003 s |
0.99 |
slicing / DefOpt / tpu / PostRev |
0.0000025852000000000003 s |
0.000002585725 s |
1.00 |
slicing / DefOpt / tpu / BothRev |
0.000002543225 s |
0.000002534775 s |
1.00 |
slicing / IDefOpt / tpu / PreRev |
0.000002577675 s |
0.00000259005 s |
1.00 |
slicing / IDefOpt / tpu / PostRev |
0.0000025315250000000003 s |
0.000002541125 s |
1.00 |
slicing / IDefOpt / tpu / BothRev |
0.000002578375 s |
0.0000025866250000000004 s |
1.00 |
slicing / JaXPipe / cpu / Primal |
0.000012958 s |
0.000006967759964027209 s |
1.86 |
slicing / Jax / cpu / Primal |
0.000012691 s |
0.000007308279982680688 s |
1.74 |
slicing / HLOOpt / cpu / Primal |
0.000012748 s |
0.000007030099968687864 s |
1.81 |
slicing / PartOpt / cpu / Primal |
0.000012629 s |
0.000006606380047742277 s |
1.91 |
slicing / IPartOpt / cpu / Primal |
0.000012712 s |
0.0000070093600061227335 s |
1.81 |
slicing / DefOpt / cpu / Primal |
0.000012529 s |
0.000007347559958361672 s |
1.71 |
slicing / IDefOpt / cpu / Primal |
0.000012598 s |
0.000007295080013136612 s |
1.73 |
slicing / JaXPipe / cpu / Forward |
0.000017235 s |
0.00001058188000570226 s |
1.63 |
slicing / Jax / cpu / Forward |
0.000017001 s |
0.000010111599976880823 s |
1.68 |
slicing / HLOOpt / cpu / Forward |
0.000016847 s |
0.000010369259971412248 s |
1.62 |
slicing / PartOpt / cpu / Forward |
0.000016839 s |
0.000010642100014592873 s |
1.58 |
slicing / IPartOpt / cpu / Forward |
0.000017009 s |
0.000011122879968752386 s |
1.53 |
slicing / DefOpt / cpu / Forward |
0.00001675 s |
0.000010585239997453756 s |
1.58 |
slicing / IDefOpt / cpu / Forward |
0.000016903 s |
0.000010901799969360582 s |
1.55 |
slicing / JaXPipe / cpu / PreRev |
0.000017603 s |
0.000010843380023288772 s |
1.62 |
slicing / JaXPipe / cpu / PostRev |
0.000017506 s |
0.000010892160007642817 s |
1.61 |
slicing / JaXPipe / cpu / BothRev |
0.000017277 s |
0.00001148246002230735 s |
1.50 |
slicing / Jax / cpu / BothRev |
0.000017389999999999998 s |
0.00001101181999729306 s |
1.58 |
slicing / HLOOpt / cpu / PreRev |
0.00001825 s |
0.000011168119945068613 s |
1.63 |
slicing / HLOOpt / cpu / PostRev |
0.000018025 s |
0.00001565627999298158 s |
1.15 |
slicing / HLOOpt / cpu / BothRev |
0.000017539 s |
0.000011009500012733042 s |
1.59 |
slicing / PartOpt / cpu / PreRev |
0.000018402 s |
0.000011034819972337572 s |
1.67 |
slicing / PartOpt / cpu / PostRev |
0.000017787 s |
0.000011052660020141048 s |
1.61 |
slicing / PartOpt / cpu / BothRev |
0.000017964999999999998 s |
0.00001162962001217238 s |
1.54 |
slicing / IPartOpt / cpu / PreRev |
0.000017701000000000002 s |
0.000010544719989411531 s |
1.68 |
slicing / IPartOpt / cpu / PostRev |
0.000017593999999999998 s |
0.000011078819979957187 s |
1.59 |
slicing / IPartOpt / cpu / BothRev |
0.000017819 s |
0.000011282959994787236 s |
1.58 |
slicing / DefOpt / cpu / PreRev |
0.000018437 s |
0.000010709039997891525 s |
1.72 |
slicing / DefOpt / cpu / PostRev |
0.000017769 s |
0.000011488679992908146 s |
1.55 |
slicing / DefOpt / cpu / BothRev |
0.000018353 s |
0.00001066072003595764 s |
1.72 |
slicing / IDefOpt / cpu / PreRev |
0.000017675 s |
0.000010654759971657768 s |
1.66 |
slicing / IDefOpt / cpu / PostRev |
0.000017442 s |
0.000011159800033055945 s |
1.56 |
slicing / IDefOpt / cpu / BothRev |
0.000017743 s |
0.000010670940046111356 s |
1.66 |
sum / JaXPipe / cpu / Primal |
0.000009703159958007745 s |
0.000008534619983038283 s |
1.14 |
sum / Jax / cpu / Primal |
0.000009331719993497246 s |
0.00000848227999995288 s |
1.10 |
sum / HLOOpt / cpu / Primal |
0.000010051419976662146 s |
0.00000842622001982818 s |
1.19 |
sum / PartOpt / cpu / Primal |
0.000008696079967194236 s |
0.000008917679979276727 s |
0.98 |
sum / IPartOpt / cpu / Primal |
0.0000098646600417851 s |
0.000008620339958724798 s |
1.14 |
sum / DefOpt / cpu / Primal |
0.000008684100012033013 s |
0.00000882572004229587 s |
0.98 |
sum / IDefOpt / cpu / Primal |
0.000008947900059865788 s |
0.000008002759977898677 s |
1.12 |
sum / JaXPipe / cpu / Forward |
0.000013199400000303285 s |
0.000012622960002772744 s |
1.05 |
sum / Jax / cpu / Forward |
0.000013182119982957374 s |
0.000012528960032796022 s |
1.05 |
sum / HLOOpt / cpu / Forward |
0.000013430360031634336 s |
0.00001259522004147584 s |
1.07 |
sum / PartOpt / cpu / Forward |
0.00001275664002605481 s |
0.000012516680026237735 s |
1.02 |
sum / IPartOpt / cpu / Forward |
0.000013174080013413914 s |
0.000012727580005957862 s |
1.04 |
sum / DefOpt / cpu / Forward |
0.000012715920092887243 s |
0.0000123404800069693 s |
1.03 |
sum / IDefOpt / cpu / Forward |
0.000012533579938462936 s |
0.000012064680004186811 s |
1.04 |
sum / JaXPipe / cpu / PreRev |
0.000012346200001047692 s |
0.000012319859997660388 s |
1.00 |
sum / JaXPipe / cpu / PostRev |
0.000012574120046338066 s |
0.000012000920023638172 s |
1.05 |
sum / JaXPipe / cpu / BothRev |
0.000012460000052669784 s |
0.00001183335996756796 s |
1.05 |
sum / Jax / cpu / BothRev |
0.000012215100032335613 s |
0.000011508679990583917 s |
1.06 |
sum / HLOOpt / cpu / PreRev |
0.000012884199950349284 s |
0.000012048360040353146 s |
1.07 |
sum / HLOOpt / cpu / PostRev |
0.000014180940015648955 s |
0.000014090959984969233 s |
1.01 |
sum / HLOOpt / cpu / BothRev |
0.000012211080011184095 s |
0.0000120684399826132 s |
1.01 |
sum / PartOpt / cpu / PreRev |
0.000011928680069104302 s |
0.00001170439998531947 s |
1.02 |
sum / PartOpt / cpu / PostRev |
0.000012329819892329397 s |
0.000012685339997915436 s |
0.97 |
sum / PartOpt / cpu / BothRev |
0.000012584139913087713 s |
0.000011704099997587036 s |
1.08 |
sum / IPartOpt / cpu / PreRev |
0.000012323919982009102 s |
0.000011762620006265934 s |
1.05 |
sum / IPartOpt / cpu / PostRev |
0.000012879819914815016 s |
0.000011630520029939362 s |
1.11 |
sum / IPartOpt / cpu / BothRev |
0.000012425980021362193 s |
0.000011163259978275164 s |
1.11 |
sum / DefOpt / cpu / PreRev |
0.00001232571996297338 s |
0.000011822399965240038 s |
1.04 |
sum / DefOpt / cpu / PostRev |
0.000012890540037915344 s |
0.000011481779956739049 s |
1.12 |
sum / DefOpt / cpu / BothRev |
0.000012246999958733796 s |
0.00001181014000394498 s |
1.04 |
sum / IDefOpt / cpu / PreRev |
0.00001259490003576502 s |
0.00001237589999618649 s |
1.02 |
sum / IDefOpt / cpu / PostRev |
0.000012125120010750832 s |
0.00001125832002799143 s |
1.08 |
sum / IDefOpt / cpu / BothRev |
0.000012409059945639457 s |
0.000011904700031664106 s |
1.04 |
sum / JaXPipe / cuda / Primal |
0.000002463 s |
||
sum / Jax / cuda / Primal |
0.000002463 s |
||
sum / HLOOpt / cuda / Primal |
0.000002463 s |
||
sum / PartOpt / cuda / Primal |
0.000002463 s |
||
sum / IPartOpt / cuda / Primal |
0.000002463 s |
||
sum / DefOpt / cuda / Primal |
0.000002464 s |
||
sum / IDefOpt / cuda / Primal |
0.000002463 s |
||
sum / JaXPipe / cuda / Forward |
0.000011392 s |
||
sum / Jax / cuda / Forward |
0.000011104 s |
||
sum / HLOOpt / cuda / Forward |
0.000011136 s |
||
sum / PartOpt / cuda / Forward |
0.000010848 s |
||
sum / IPartOpt / cuda / Forward |
0.000011296 s |
||
sum / DefOpt / cuda / Forward |
0.000011136 s |
||
sum / IDefOpt / cuda / Forward |
0.000010912 s |
||
sum / JaXPipe / cuda / PreRev |
0.000010656 s |
||
sum / JaXPipe / cuda / PostRev |
0.000010464 s |
||
sum / JaXPipe / cuda / BothRev |
0.000010304 s |
||
sum / Jax / cuda / BothRev |
0.00001056 s |
||
sum / HLOOpt / cuda / PreRev |
0.00001024 s |
||
sum / HLOOpt / cuda / PostRev |
0.000010336 s |
||
sum / HLOOpt / cuda / BothRev |
0.000010144 s |
||
sum / PartOpt / cuda / PreRev |
0.000011008 s |
||
sum / PartOpt / cuda / PostRev |
0.000010369 s |
||
sum / PartOpt / cuda / BothRev |
0.000010752 s |
||
sum / IPartOpt / cuda / PreRev |
0.000010593 s |
||
sum / IPartOpt / cuda / PostRev |
0.000010272 s |
||
sum / IPartOpt / cuda / BothRev |
0.000010433 s |
||
sum / DefOpt / cuda / PreRev |
0.000010688 s |
||
sum / DefOpt / cuda / PostRev |
0.000011552 s |
||
sum / DefOpt / cuda / BothRev |
0.000010784 s |
||
sum / IDefOpt / cuda / PreRev |
0.000012096 s |
||
sum / IDefOpt / cuda / PostRev |
0.000010592 s |
||
sum / IDefOpt / cuda / BothRev |
0.000010495 s |
||
sum / JaXPipe / tpu / Primal |
5.106499999999999e-7 s |
5.103250000000001e-7 s |
1.00 |
sum / Jax / tpu / Primal |
5.47425e-7 s |
5.467e-7 s |
1.00 |
sum / HLOOpt / tpu / Primal |
5.104e-7 s |
5.1015e-7 s |
1.00 |
sum / PartOpt / tpu / Primal |
5.47525e-7 s |
5.47125e-7 s |
1.00 |
sum / IPartOpt / tpu / Primal |
5.104499999999999e-7 s |
5.10225e-7 s |
1.00 |
sum / DefOpt / tpu / Primal |
5.4745e-7 s |
5.4695e-7 s |
1.00 |
sum / IDefOpt / tpu / Primal |
5.108499999999999e-7 s |
5.106499999999999e-7 s |
1.00 |
sum / JaXPipe / tpu / Forward |
0.000001569275 s |
0.0000015479999999999998 s |
1.01 |
sum / Jax / tpu / Forward |
0.00000151085 s |
0.000001497925 s |
1.01 |
sum / HLOOpt / tpu / Forward |
0.000001532775 s |
0.0000015321 s |
1.00 |
sum / PartOpt / tpu / Forward |
0.0000014927000000000003 s |
0.0000014986250000000002 s |
1.00 |
sum / IPartOpt / tpu / Forward |
0.000001535425 s |
0.0000015334750000000002 s |
1.00 |
sum / DefOpt / tpu / Forward |
0.0000015002750000000002 s |
0.000001498375 s |
1.00 |
sum / IDefOpt / tpu / Forward |
0.0000015358 s |
0.0000015289249999999995 s |
1.00 |
sum / JaXPipe / tpu / PreRev |
0.000001045525 s |
0.000001050825 s |
0.99 |
sum / JaXPipe / tpu / PostRev |
0.00000108545 s |
0.00000109645 s |
0.99 |
sum / JaXPipe / tpu / BothRev |
0.000001051075 s |
0.000001054325 s |
1.00 |
sum / Jax / tpu / BothRev |
0.000001092725 s |
0.000001092325 s |
1.00 |
sum / HLOOpt / tpu / PreRev |
0.000001048825 s |
0.00000105305 s |
1.00 |
sum / HLOOpt / tpu / PostRev |
0.00000108435 s |
0.000001093525 s |
0.99 |
sum / HLOOpt / tpu / BothRev |
0.000001057975 s |
0.00000105495 s |
1.00 |
sum / PartOpt / tpu / PreRev |
0.0000010865 s |
0.0000010913 s |
1.00 |
sum / PartOpt / tpu / PostRev |
0.000001051125 s |
0.00000104745 s |
1.00 |
sum / PartOpt / tpu / BothRev |
0.0000010857500000000002 s |
0.0000010863 s |
1.00 |
sum / IPartOpt / tpu / PreRev |
0.0000010480000000000002 s |
0.0000010546 s |
0.99 |
sum / IPartOpt / tpu / PostRev |
0.0000010873499999999998 s |
0.00000109015 s |
1.00 |
sum / IPartOpt / tpu / BothRev |
0.0000010501249999999998 s |
0.0000010541250000000005 s |
1.00 |
sum / DefOpt / tpu / PreRev |
0.000001086075 s |
0.00000108945 s |
1.00 |
sum / DefOpt / tpu / PostRev |
0.0000010468 s |
0.0000010604500000000002 s |
0.99 |
sum / DefOpt / tpu / BothRev |
0.0000010877 s |
0.00000108885 s |
1.00 |
sum / IDefOpt / tpu / PreRev |
0.000001046125 s |
0.0000010493 s |
1.00 |
sum / IDefOpt / tpu / PostRev |
0.000001085225 s |
0.000001086325 s |
1.00 |
sum / IDefOpt / tpu / BothRev |
0.000001046225 s |
0.00000104625 s |
1.00 |
sum / JaXPipe / cpu / Primal |
0.000014969 s |
0.000008534619983038283 s |
1.75 |
sum / Jax / cpu / Primal |
0.000014789 s |
0.00000848227999995288 s |
1.74 |
sum / HLOOpt / cpu / Primal |
0.000014407 s |
0.00000842622001982818 s |
1.71 |
sum / PartOpt / cpu / Primal |
0.000014842 s |
0.000008917679979276727 s |
1.66 |
sum / IPartOpt / cpu / Primal |
0.00001458 s |
0.000008620339958724798 s |
1.69 |
sum / DefOpt / cpu / Primal |
0.000015223 s |
0.00000882572004229587 s |
1.72 |
sum / IDefOpt / cpu / Primal |
0.000015297 s |
0.000008002759977898677 s |
1.91 |
sum / JaXPipe / cpu / Forward |
0.000021089 s |
0.000012622960002772744 s |
1.67 |
sum / Jax / cpu / Forward |
0.000020419 s |
0.000012528960032796022 s |
1.63 |
sum / HLOOpt / cpu / Forward |
0.000020679 s |
0.00001259522004147584 s |
1.64 |
sum / PartOpt / cpu / Forward |
0.000020356 s |
0.000012516680026237735 s |
1.63 |
sum / IPartOpt / cpu / Forward |
0.000019847 s |
0.000012727580005957862 s |
1.56 |
sum / DefOpt / cpu / Forward |
0.000020596 s |
0.0000123404800069693 s |
1.67 |
sum / IDefOpt / cpu / Forward |
0.000020513 s |
0.000012064680004186811 s |
1.70 |
sum / JaXPipe / cpu / PreRev |
0.000018943 s |
0.000012319859997660388 s |
1.54 |
sum / JaXPipe / cpu / PostRev |
0.00001969 s |
0.000012000920023638172 s |
1.64 |
sum / JaXPipe / cpu / BothRev |
0.000019041 s |
0.00001183335996756796 s |
1.61 |
sum / Jax / cpu / BothRev |
0.00001953 s |
0.000011508679990583917 s |
1.70 |
sum / HLOOpt / cpu / PreRev |
0.000019179 s |
0.000012048360040353146 s |
1.59 |
sum / HLOOpt / cpu / PostRev |
0.000019672 s |
0.000014090959984969233 s |
1.40 |
sum / HLOOpt / cpu / BothRev |
0.000019776 s |
0.0000120684399826132 s |
1.64 |
sum / PartOpt / cpu / PreRev |
0.000020115 s |
0.00001170439998531947 s |
1.72 |
sum / PartOpt / cpu / PostRev |
0.000019526 s |
0.000012685339997915436 s |
1.54 |
sum / PartOpt / cpu / BothRev |
0.000019612 s |
0.000011704099997587036 s |
1.68 |
sum / IPartOpt / cpu / PreRev |
0.000019538 s |
0.000011762620006265934 s |
1.66 |
sum / IPartOpt / cpu / PostRev |
0.000019527 s |
0.000011630520029939362 s |
1.68 |
sum / IPartOpt / cpu / BothRev |
0.000019783 s |
0.000011163259978275164 s |
1.77 |
sum / DefOpt / cpu / PreRev |
0.000019404000000000003 s |
0.000011822399965240038 s |
1.64 |
sum / DefOpt / cpu / PostRev |
0.000019425 s |
0.000011481779956739049 s |
1.69 |
sum / DefOpt / cpu / BothRev |
0.000019716 s |
0.00001181014000394498 s |
1.67 |
sum / IDefOpt / cpu / PreRev |
0.000019846 s |
0.00001237589999618649 s |
1.60 |
sum / IDefOpt / cpu / PostRev |
0.000019331 s |
0.00001125832002799143 s |
1.72 |
sum / IDefOpt / cpu / BothRev |
0.000018684 s |
0.000011904700031664106 s |
1.57 |
value_and_grad / JaXPipe / cpu / Primal |
0.000016176939989236417 s |
0.00001525831999060756 s |
1.06 |
value_and_grad / Jax / cpu / Primal |
0.00001554813999973703 s |
0.000015128719960557646 s |
1.03 |
value_and_grad / HLOOpt / cpu / Primal |
0.000015296199999284 s |
0.000014485420024357154 s |
1.06 |
value_and_grad / PartOpt / cpu / Primal |
0.000014763099970878102 s |
0.000015235659993777518 s |
0.97 |
value_and_grad / IPartOpt / cpu / Primal |
0.00001577749993884936 s |
0.000014750719947187462 s |
1.07 |
value_and_grad / DefOpt / cpu / Primal |
0.000015779440000187607 s |
0.000014715099978275247 s |
1.07 |
value_and_grad / IDefOpt / cpu / Primal |
0.000014645979990746128 s |
0.000014892859990141006 s |
0.98 |
value_and_grad / JaXPipe / cuda / Primal |
0.000034623000000000004 s |
||
value_and_grad / Jax / cuda / Primal |
0.00005264 s |
||
value_and_grad / HLOOpt / cuda / Primal |
0.000033696 s |
||
value_and_grad / PartOpt / cuda / Primal |
0.00003424 s |
||
value_and_grad / IPartOpt / cuda / Primal |
0.00003472 s |
||
value_and_grad / DefOpt / cuda / Primal |
0.000038688 s |
||
value_and_grad / IDefOpt / cuda / Primal |
0.000038719000000000007 s |
||
value_and_grad / JaXPipe / tpu / Primal |
0 s |
0 s |
1 |
value_and_grad / Jax / tpu / Primal |
0 s |
0 s |
1 |
value_and_grad / HLOOpt / tpu / Primal |
0 s |
0 s |
1 |
value_and_grad / PartOpt / tpu / Primal |
0 s |
0 s |
1 |
value_and_grad / IPartOpt / tpu / Primal |
0 s |
0 s |
1 |
value_and_grad / DefOpt / tpu / Primal |
0 s |
0 s |
1 |
value_and_grad / IDefOpt / tpu / Primal |
0 s |
0 s |
1 |
value_and_grad / JaXPipe / cpu / Primal |
0.000023745 s |
0.00001525831999060756 s |
1.56 |
value_and_grad / Jax / cpu / Primal |
0.000023209 s |
0.000015128719960557646 s |
1.53 |
value_and_grad / HLOOpt / cpu / Primal |
0.000023165000000000003 s |
0.000014485420024357154 s |
1.60 |
value_and_grad / PartOpt / cpu / Primal |
0.000023578 s |
0.000015235659993777518 s |
1.55 |
value_and_grad / IPartOpt / cpu / Primal |
0.000023159 s |
0.000014750719947187462 s |
1.57 |
value_and_grad / DefOpt / cpu / Primal |
0.000023249 s |
0.000014715099978275247 s |
1.58 |
value_and_grad / IDefOpt / cpu / Primal |
0.000023265 s |
0.000014892859990141006 s |
1.56 |
jaxmd20 / JaXPipe / cuda / Primal |
0.001455192 s |
||
jaxmd20 / Jax / cuda / Primal |
0.001440409 s |
||
jaxmd20 / HLOOpt / cuda / Primal |
0.001354937 s |
||
jaxmd20 / PartOpt / cuda / Primal |
0.001331064 s |
||
jaxmd20 / IPartOpt / cuda / Primal |
0.0013646959999999 s |
||
jaxmd20 / DefOpt / cuda / Primal |
0.000944188 s |
||
jaxmd20 / IDefOpt / cuda / Primal |
0.000974971 s |
||
jaxmd20 / JaXPipe / cuda / Forward |
0.001631991 s |
||
jaxmd20 / Jax / cuda / Forward |
0.00187743 s |
||
jaxmd20 / HLOOpt / cuda / Forward |
0.001712792 s |
||
jaxmd20 / PartOpt / cuda / Forward |
0.001714423 s |
||
jaxmd20 / IPartOpt / cuda / Forward |
0.001740919 s |
||
jaxmd20 / DefOpt / cuda / Forward |
0.001707318 s |
||
jaxmd20 / IDefOpt / cuda / Forward |
0.001723735 s |
||
jaxmd20 / JaXPipe / cuda / PreRev |
0.002773201 s |
||
jaxmd20 / JaXPipe / cuda / PostRev |
0.005449763 s |
||
jaxmd20 / JaXPipe / cuda / BothRev |
0.002788913 s |
||
jaxmd20 / Jax / cuda / BothRev |
0.0054343709999999 s |
||
jaxmd20 / HLOOpt / cuda / PreRev |
0.0028404 s |
||
jaxmd20 / HLOOpt / cuda / PostRev |
0.005525922 s |
||
jaxmd20 / HLOOpt / cuda / BothRev |
0.00280357 s |
||
jaxmd20 / PartOpt / cuda / PreRev |
0.0028976479999999 s |
||
jaxmd20 / PartOpt / cuda / PostRev |
0.005594082 s |
||
jaxmd20 / PartOpt / cuda / BothRev |
0.0028265129999999 s |
||
jaxmd20 / IPartOpt / cuda / PreRev |
0.002937712 s |
||
jaxmd20 / IPartOpt / cuda / PostRev |
0.005611298 s |
||
jaxmd20 / IPartOpt / cuda / BothRev |
0.0028372 s |
||
jaxmd20 / DefOpt / cuda / PreRev |
0.002918064 s |
||
jaxmd20 / DefOpt / cuda / PostRev |
0.00281928 s |
||
jaxmd20 / DefOpt / cuda / BothRev |
0.002826097 s |
||
jaxmd20 / IDefOpt / cuda / PreRev |
0.002897969 s |
||
jaxmd20 / IDefOpt / cuda / PostRev |
0.00233426 s |
||
jaxmd20 / IDefOpt / cuda / BothRev |
0.002831697 s |
||
jaxmd20 / JaXPipe / tpu / Primal |
0.009277274375 s |
0.009285324375 s |
1.00 |
jaxmd20 / Jax / tpu / Primal |
0.0092756325 s |
0.0092787793749999 s |
1.00 |
jaxmd20 / HLOOpt / tpu / Primal |
0.009160073125 s |
0.009167460625 s |
1.00 |
jaxmd20 / PartOpt / tpu / Primal |
0.00919567125 s |
0.009200495625 s |
1.00 |
jaxmd20 / IPartOpt / tpu / Primal |
0.00919831625 s |
0.0091986468749999 s |
1.00 |
jaxmd20 / DefOpt / tpu / Primal |
0.008805305625 s |
0.0087980087499999 s |
1.00 |
jaxmd20 / IDefOpt / tpu / Primal |
0.00869804 s |
0.00869818375 s |
1.00 |
jaxmd20 / JaXPipe / tpu / Forward |
0.017413935625 s |
0.0174107325 s |
1.00 |
jaxmd20 / Jax / tpu / Forward |
0.0187338025 s |
0.0187580718749999 s |
1.00 |
jaxmd20 / HLOOpt / tpu / Forward |
0.017401165625 s |
0.017402763125 s |
1.00 |
jaxmd20 / PartOpt / tpu / Forward |
0.0174205006249999 s |
0.017414855625 s |
1.00 |
jaxmd20 / IPartOpt / tpu / Forward |
0.017415825 s |
0.017412161875 s |
1.00 |
jaxmd20 / DefOpt / tpu / Forward |
0.017415030625 s |
0.0174230925 s |
1.00 |
jaxmd20 / IDefOpt / tpu / Forward |
0.01741135375 s |
0.017409656875 s |
1.00 |
jaxmd20 / JaXPipe / tpu / PreRev |
0.025449028125 s |
0.025466438125 s |
1.00 |
jaxmd20 / JaXPipe / tpu / PostRev |
0.02186584125 s |
0.02187530625 s |
1.00 |
jaxmd20 / JaXPipe / tpu / BothRev |
0.02543556625 s |
0.02545988875 s |
1.00 |
jaxmd20 / Jax / tpu / BothRev |
0.0218562487499999 s |
0.021869294375 s |
1.00 |
jaxmd20 / HLOOpt / tpu / PreRev |
0.02555658 s |
0.025579618125 s |
1.00 |
jaxmd20 / HLOOpt / tpu / PostRev |
0.020704740625 s |
0.02071224125 s |
1.00 |
jaxmd20 / HLOOpt / tpu / BothRev |
0.02565634625 s |
0.025692965 s |
1.00 |
jaxmd20 / PartOpt / tpu / PreRev |
0.025476410625 s |
0.02546148625 s |
1.00 |
jaxmd20 / PartOpt / tpu / PostRev |
0.02151089875 s |
0.021266319375 s |
1.01 |
jaxmd20 / PartOpt / tpu / BothRev |
0.025579195 s |
0.02556031 s |
1.00 |
jaxmd20 / IPartOpt / tpu / PreRev |
0.025449066875 s |
0.0254667925 s |
1.00 |
jaxmd20 / IPartOpt / tpu / PostRev |
0.021525515625 s |
0.0215107793749999 s |
1.00 |
jaxmd20 / IPartOpt / tpu / BothRev |
0.025544875 s |
0.0255709374999999 s |
1.00 |
jaxmd20 / DefOpt / tpu / PreRev |
0.0254830025 s |
0.02546553125 s |
1.00 |
jaxmd20 / DefOpt / tpu / PostRev |
0.018809855625 s |
0.0188311475 s |
1.00 |
jaxmd20 / DefOpt / tpu / BothRev |
0.02557464875 s |
0.02555516125 s |
1.00 |
jaxmd20 / IDefOpt / tpu / PreRev |
0.0254569825 s |
0.0254661425 s |
1.00 |
jaxmd20 / IDefOpt / tpu / PostRev |
0.0183353737499999 s |
0.018310786875 s |
1.00 |
jaxmd20 / IDefOpt / tpu / BothRev |
0.02554438 s |
0.0255700231249999 s |
1.00 |
jaxmd40 / JaXPipe / cpu / Primal |
0.0754830029999999 s |
0.073993308 s |
1.02 |
jaxmd40 / Jax / cpu / Primal |
0.071149994 s |
0.073613883 s |
0.97 |
jaxmd40 / HLOOpt / cpu / Primal |
0.091414001 s |
0.107396891 s |
0.85 |
jaxmd40 / PartOpt / cpu / Primal |
0.0721791589999999 s |
0.081463886 s |
0.89 |
jaxmd40 / IPartOpt / cpu / Primal |
0.07307664 s |
0.080921025 s |
0.90 |
jaxmd40 / DefOpt / cpu / Primal |
0.084365976 s |
0.1020056 s |
0.83 |
jaxmd40 / IDefOpt / cpu / Primal |
0.089137949 s |
0.109501773 s |
0.81 |
jaxmd40 / JaXPipe / cpu / Forward |
0.162158269 s |
0.189545239 s |
0.86 |
jaxmd40 / Jax / cpu / Forward |
0.090833853 s |
0.10736556 s |
0.85 |
jaxmd40 / HLOOpt / cpu / Forward |
0.168094777 s |
0.189890195 s |
0.89 |
jaxmd40 / PartOpt / cpu / Forward |
0.161890177 s |
0.190169587 s |
0.85 |
jaxmd40 / IPartOpt / cpu / Forward |
0.176897202 s |
0.2004099889999999 s |
0.88 |
jaxmd40 / DefOpt / cpu / Forward |
0.1607591109999999 s |
0.192691593 s |
0.83 |
jaxmd40 / IDefOpt / cpu / Forward |
0.175798451 s |
0.1917536279999999 s |
0.92 |
jaxmd40 / JaXPipe / cpu / PreRev |
0.228621829 s |
0.2491823819999999 s |
0.92 |
jaxmd40 / JaXPipe / cpu / PostRev |
0.143784428 s |
0.159837425 s |
0.90 |
jaxmd40 / JaXPipe / cpu / BothRev |
0.227943803 s |
0.251328121 s |
0.91 |
jaxmd40 / Jax / cpu / BothRev |
0.142277806 s |
0.152441235 s |
0.93 |
jaxmd40 / HLOOpt / cpu / PreRev |
0.227200626 s |
0.247478724 s |
0.92 |
jaxmd40 / HLOOpt / cpu / PostRev |
0.1807468639999999 s |
0.204412115 s |
0.88 |
jaxmd40 / HLOOpt / cpu / BothRev |
0.2598833619999999 s |
0.296773168 s |
0.88 |
jaxmd40 / PartOpt / cpu / PreRev |
0.241460985 s |
0.264468797 s |
0.91 |
jaxmd40 / PartOpt / cpu / PostRev |
0.1400372299999999 s |
0.1569377059999999 s |
0.89 |
jaxmd40 / PartOpt / cpu / BothRev |
0.273114126 s |
0.285594582 s |
0.96 |
jaxmd40 / IPartOpt / cpu / PreRev |
0.220240829 s |
0.256192396 s |
0.86 |
jaxmd40 / IPartOpt / cpu / PostRev |
0.137621319 s |
0.149952415 s |
0.92 |
jaxmd40 / IPartOpt / cpu / BothRev |
0.2499928669999999 s |
0.276281015 s |
0.90 |
jaxmd40 / DefOpt / cpu / PreRev |
0.224390852 s |
0.256144481 s |
0.88 |
jaxmd40 / DefOpt / cpu / PostRev |
0.17791083 s |
0.19889593 s |
0.89 |
jaxmd40 / DefOpt / cpu / BothRev |
0.254190759 s |
0.285229248 s |
0.89 |
jaxmd40 / IDefOpt / cpu / PreRev |
0.232619737 s |
0.255377117 s |
0.91 |
jaxmd40 / IDefOpt / cpu / PostRev |
0.1792368899999999 s |
0.214632765 s |
0.84 |
jaxmd40 / IDefOpt / cpu / BothRev |
0.253944873 s |
0.2347946269999999 s |
1.08 |
neuralgcm_v1/deterministic_2_8_deg_inner_steps_24_outer_steps_4 / JaXPipe / cuda / Primal |
1.701141962 s |
||
neuralgcm_v1/deterministic_2_8_deg_inner_steps_24_outer_steps_4 / Jax / cuda / Primal |
1.704263293 s |
||
neuralgcm_v1/deterministic_2_8_deg_inner_steps_24_outer_steps_4 / HLOOpt / cuda / Primal |
1.716250644 s |
||
neuralgcm_v1/deterministic_2_8_deg_inner_steps_24_outer_steps_4 / PartOpt / cuda / Primal |
1.695894459 s |
||
neuralgcm_v1/deterministic_2_8_deg_inner_steps_24_outer_steps_4 / IPartOpt / cuda / Primal |
1.694047356 s |
||
neuralgcm_v1/deterministic_2_8_deg_inner_steps_24_outer_steps_4 / DefOpt / cuda / Primal |
1.665130931 s |
||
neuralgcm_v1/deterministic_2_8_deg_inner_steps_24_outer_steps_4 / IDefOpt / cuda / Primal |
1.911048516 s |
||
neuralgcm_v1/deterministic_2_8_deg_inner_steps_24_outer_steps_4 / JaXPipe / tpu / Primal |
3.038812840625 s |
3.038831480625 s |
1.00 |
neuralgcm_v1/deterministic_2_8_deg_inner_steps_24_outer_steps_4 / Jax / tpu / Primal |
3.0394444575 s |
3.0393183325 s |
1.00 |
neuralgcm_v1/deterministic_2_8_deg_inner_steps_24_outer_steps_4 / HLOOpt / tpu / Primal |
3.121668325625 s |
3.121700555625 s |
1.00 |
neuralgcm_v1/deterministic_2_8_deg_inner_steps_24_outer_steps_4 / PartOpt / tpu / Primal |
3.060118801875 s |
3.06013362875 s |
1.00 |
neuralgcm_v1/deterministic_2_8_deg_inner_steps_24_outer_steps_4 / IPartOpt / tpu / Primal |
3.060385166875 s |
3.0603782275 s |
1.00 |
neuralgcm_v1/deterministic_2_8_deg_inner_steps_24_outer_steps_4 / DefOpt / tpu / Primal |
2.1025125950000003 s |
2.1024763275 s |
1.00 |
neuralgcm_v1/deterministic_2_8_deg_inner_steps_24_outer_steps_4 / IDefOpt / tpu / Primal |
4.3564873175 s |
4.35644791125 s |
1.00 |
neuralgcm_v1/deterministic_2_8_deg_inner_steps_2_outer_steps_2 / JaXPipe / cpu / Primal |
6.332010231 s |
6.952634694 s |
0.91 |
neuralgcm_v1/deterministic_2_8_deg_inner_steps_2_outer_steps_2 / Jax / cpu / Primal |
6.308204574 s |
6.998000881 s |
0.90 |
neuralgcm_v1/deterministic_2_8_deg_inner_steps_2_outer_steps_2 / HLOOpt / cpu / Primal |
6.230678027 s |
6.8480217 s |
0.91 |
neuralgcm_v1/deterministic_2_8_deg_inner_steps_2_outer_steps_2 / PartOpt / cpu / Primal |
6.471540448 s |
7.121108887 s |
0.91 |
neuralgcm_v1/deterministic_2_8_deg_inner_steps_2_outer_steps_2 / IPartOpt / cpu / Primal |
6.308579509 s |
6.987212517 s |
0.90 |
neuralgcm_v1/deterministic_2_8_deg_inner_steps_2_outer_steps_2 / DefOpt / cpu / Primal |
2.532982659 s |
2.837167547 s |
0.89 |
neuralgcm_v1/deterministic_2_8_deg_inner_steps_2_outer_steps_2 / IDefOpt / cpu / Primal |
6.879658571 s |
7.750080211999999 s |
0.89 |
This comment was automatically generated by workflow using github-action-benchmark.
let summary = "Equivalent to " "`MPI_Comm_rank(MPI_COMM_WORLD, &rank)`";

let arguments = (
  ins AnyTensor : $inrank
yeah this shouldn't require an inrank, it can just return the rank, same with size
OK, I removed the inrank argument from comm rank and updated the lowering pass. The new lowering pass creates a constant tensor to hold the result and passes it into the wrapper function, so I can still use output operand aliases to get the result back.
Enzyme-JAX/src/enzyme_ad/jax/Passes/LowerEnzymeXLAMPI.cpp
Lines 129 to 134 in 9358d05
// Create a constant tensor to hold the result
auto tensorType = llvm::cast<RankedTensorType>(op->getResultTypes()[0]);
auto constantAttr = DenseIntElementsAttr::get(tensorType,
                                              ArrayRef<int32_t>{-1});
Value constantTensor = rewriter.create<stablehlo::ConstantOp>(
    op.getLoc(), tensorType, constantAttr);
Is this an OK approach? If so, I'll go ahead and change all the other ops the same way.
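For readers following along, here is a minimal standalone sketch of the pattern described above: a placeholder result is created first, handed to a wrapper, and read back after the wrapper overwrites it in place. It does not use MLIR or MPI. `fake_mpi_comm_rank`, `comm_rank_wrapper`, and `lowered_comm_rank` are hypothetical names invented for illustration, and the fixed rank of 3 is an arbitrary assumption standing in for a real `MPI_Comm_rank(MPI_COMM_WORLD, &rank)` call.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for MPI_Comm_rank(MPI_COMM_WORLD, &rank).
// It always reports rank 3 so the sketch runs without an MPI runtime.
static int fake_mpi_comm_rank(int32_t *rank) {
  *rank = 3;
  return 0;
}

// Hypothetical wrapper: it has no return value and instead writes the
// rank into the caller-provided buffer, which is what lets the caller
// treat that buffer as an aliased output.
void comm_rank_wrapper(int32_t *result) {
  assert(result != nullptr);
  int status = fake_mpi_comm_rank(result);
  assert(status == 0);
  (void)status; // avoid an unused-variable warning when NDEBUG is set
}

// Mirrors the lowering idea: start from a placeholder constant (-1, as
// in the snippet above), pass it to the wrapper, and return whatever
// the wrapper stored there.
int32_t lowered_comm_rank() {
  int32_t result = -1;
  comm_rank_wrapper(&result);
  return result;
}
```

The placeholder value is never observed by the caller in this sketch, since the wrapper always overwrites it before it is read.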
OK, I removed inrank, insize, and inrequest from comm_rank, comm_size, isend, and irecv.
recv, irecv, and allreduce still take an inbuf and return an outbuf. Is this design OK, or should I remove the inbufs from those too?
Adds EnzymeXLA ops, and passes that lower them to LLVM, for the following MPI functions:
MPI_Comm_rank
MPI_Comm_size
MPI_Barrier
MPI_Send
MPI_Recv
MPI_Isend
MPI_Irecv
MPI_Wait
MPI_Allreduce
Note: Currently, MPI_COMM_WORLD is the only communicator supported. However, I did come up with a solution for handling the Datatype (and Op) for Isend/Irecv/Send/Recv (Allreduce), which we had discussed previously. See comment below.