@romanlee
Collaborator

Adds EnzymeXLA ops, and passes that lower them to LLVM, for the following MPI functions:

  • MPI_Comm_rank

  • MPI_Comm_size

  • MPI_Barrier

  • MPI_Send

  • MPI_Recv

  • MPI_Isend

  • MPI_Irecv

  • MPI_Wait

  • MPI_Allreduce

Note: Currently, MPI_COMM_WORLD is the only supported communicator. However, I did come up with a solution for handling the Datatype (and Op) for Isend/Irecv/Send/Recv (Allreduce), which we had discussed previously. See the comment below.

Comment on lines +1187 to +1188
StrAttr:$datatype,
StrAttr:$op
Collaborator Author

Thoughts on this solution to handle Datatype (and Op) for Isend/Irecv/Send/Recv (Allreduce)? We still register the symbol and get its name on the Reactant side like before, then we just pass the symbol name into the Op via a StrAttr. This gives us all the support for the various Datatype x Op combos that we had before.

Member

commented on datatype above; for op we can create an enum and put add/max/min/etc. in it [and do the corresponding lowering]

Collaborator Author

Could you expand on what you mean here? What do you want the op arg to be instead?

Member

something like this

def EnzymeXLA_LapackUplo : I32EnumAttr<"LapackUplo",

TensorOf<[I32]> : $count,
TensorOf<[I32]> : $dest,
TensorOf<[I32]> : $tag,
StrAttr:$datatype
Member

cc @ftynse We can have this be a TypeAttr [and use the elementtype]

Collaborator Author

We could do it that way too, although that would mean duplicating a lot of logic that we currently rely on MPI.jl for. Would the elementtype approach be better?
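As a rough picture of the logic that an element-type-based approach would have to carry, here is a small Python sketch that derives an MPI datatype name from a textual tensor type. Everything in it is an assumption made for illustration: the helper names are hypothetical, only four scalar types are handled, and the `f32`/`f64` entries assume the usual C ABI where `float` and `double` are 32 and 64 bits wide.

```python
# Assumed mapping from element type spelling to the MPI datatype constant name.
_MPI_DATATYPE = {
    "f32": "MPI_FLOAT",
    "f64": "MPI_DOUBLE",
    "i32": "MPI_INT32_T",
    "i64": "MPI_INT64_T",
}


def element_type_of(tensor_type: str) -> str:
    """Extract 'f32' from a textual type such as 'tensor<4x8xf32>'.

    Only handles plain scalar element types; nested types such as
    complex<f32> are out of scope for this sketch.
    """
    inner = tensor_type[tensor_type.index("<") + 1 : tensor_type.rindex(">")]
    return inner.split("x")[-1]


def mpi_datatype_for(tensor_type: str) -> str:
    """Return the MPI datatype name for the tensor's element type."""
    elem = element_type_of(tensor_type)
    if elem not in _MPI_DATATYPE:
        raise NotImplementedError(f"no MPI datatype known for {elem!r}")
    return _MPI_DATATYPE[elem]


print(mpi_datatype_for("tensor<4x8xf32>"))
```

The table is the part that would duplicate what the frontend already knows, which is the trade-off raised above.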

Collaborator Author

Could you expand on what you mean here as well?

Comment on lines 1156 to 1166
ins AnyTensor : $inbuf,
TensorOf<[I32]> : $count,
TensorOf<[I32]> : $source,
TensorOf<[I32]> : $tag,
TensorOf<[I64]> : $inrequest,
StrAttr:$datatype
);

let results = (
outs AnyTensor : $outbuf,
TensorOf<[I64]> : $outrequest
Collaborator Author

I'm unsure about the inbuf/outbuf and inrequest/outrequest design here and in other Ops. It lets us reproduce exactly the same IR that we had before (i.e., via manual lowering from Reactant). But the more I think about it, the more it seems like a weird design for certain Ops/arguments. E.g., here it maybe makes sense for the buffers, but not the requests? And for Comm_rank, it seems like it would make more sense for it to be a pure function, where we don't take in a rank, only output one. However, doing that would require some changes to the way we call this on the Reactant side.

Member

this should only return a request, not take one as an input [since it's write-only]
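The signature change being suggested can be sketched in Python as a transformation on an op's operand list. This is a toy model and not MLIR: `OpSignature` and `drop_write_only_operands` are hypothetical helpers, and the op name `irecv` is an assumption based on the `$source` and `$inrequest` operands in the quoted definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpSignature:
    name: str
    operands: tuple
    results: tuple


# Mirrors the quoted definition: the request appears as both operand and result.
irecv_current = OpSignature(
    name="irecv",  # assumed op name
    operands=("inbuf", "count", "source", "tag", "inrequest"),
    results=("outbuf", "outrequest"),
)


def drop_write_only_operands(sig: OpSignature, write_only: set) -> OpSignature:
    """Return a copy of sig without the operands that the op only writes."""
    kept = tuple(o for o in sig.operands if o not in write_only)
    return OpSignature(sig.name, kept, sig.results)


# The request is write-only, so it stays a result but is no longer an operand.
irecv_proposed = drop_write_only_operands(irecv_current, {"inrequest"})
print(irecv_proposed)
```

The buffer stays on both sides because the op both consumes and produces it, while the request is produced fresh by each call.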

Contributor

@github-actions github-actions bot left a comment

EnzymeJAX Benchmarks

Benchmark suite Current: 2607329 Previous: d71d865 Ratio
actmtch / JaXPipe / cpu / Primal 0.000007801339961588383 s 0.000006897379980728147 s 1.13
actmtch / Jax / cpu / Primal 0.000006982140002946835 s 0.0000069136199908825804 s 1.01
actmtch / HLOOpt / cpu / Primal 0.000009364600009575952 s 0.00000841961998048646 s 1.11
actmtch / PartOpt / cpu / Primal 0.000008105779961624649 s 0.000006819640047979192 s 1.19
actmtch / IPartOpt / cpu / Primal 0.000008338160005223472 s 0.000007617820037921774 s 1.09
actmtch / DefOpt / cpu / Primal 0.00000885656003447366 s 0.000007364539997070097 s 1.20
actmtch / IDefOpt / cpu / Primal 0.000008580240046285325 s 0.000008136800051943282 s 1.05
actmtch / JaXPipe / cpu / Forward 0.00001248672004294349 s 0.0000118703200314485 s 1.05
actmtch / Jax / cpu / Forward 0.000011506779974297388 s 0.00001041201993757568 s 1.11
actmtch / HLOOpt / cpu / Forward 0.000013279659942782018 s 0.000012178740007584564 s 1.09
actmtch / PartOpt / cpu / Forward 0.000012227019979036414 s 0.000011013440016540698 s 1.11
actmtch / IPartOpt / cpu / Forward 0.0000126099599947338 s 0.000011710939988915924 s 1.08
actmtch / DefOpt / cpu / Forward 0.000012218400024721632 s 0.00001178708000225015 s 1.04
actmtch / IDefOpt / cpu / Forward 0.00001247872005478712 s 0.000011781220046032103 s 1.06
actmtch / JaXPipe / cpu / PreRev 0.000011843380016216542 s 0.00001191142000607215 s 0.99
actmtch / JaXPipe / cpu / PostRev 0.000012083959991286974 s 0.000011489859980429174 s 1.05
actmtch / JaXPipe / cpu / BothRev 0.000013053219918219838 s 0.000012797200070053804 s 1.02
actmtch / Jax / cpu / BothRev 0.000010819220024131938 s 0.000011320319972583092 s 0.96
actmtch / HLOOpt / cpu / PreRev 0.000012794480044249212 s 0.000012538219962152651 s 1.02
actmtch / HLOOpt / cpu / PostRev 0.000014550819960277297 s 0.000013989540020702408 s 1.04
actmtch / HLOOpt / cpu / BothRev 0.000012774460028595058 s 0.000012801459997717756 s 1.00
actmtch / PartOpt / cpu / PreRev 0.000011288400019111576 s 0.000011282919986115304 s 1.00
actmtch / PartOpt / cpu / PostRev 0.000011578120047488482 s 0.000010915479970208251 s 1.06
actmtch / PartOpt / cpu / BothRev 0.00001327068002865417 s 0.000012874100011686096 s 1.03
actmtch / IPartOpt / cpu / PreRev 0.000012146540084359004 s 0.000012087639988749289 s 1.00
actmtch / IPartOpt / cpu / PostRev 0.000011811599961220054 s 0.00001114202000280784 s 1.06
actmtch / IPartOpt / cpu / BothRev 0.000012769280056090791 s 0.00001213371994708723 s 1.05
actmtch / DefOpt / cpu / PreRev 0.000012098220013285754 s 0.000011616179999691667 s 1.04
actmtch / DefOpt / cpu / PostRev 0.000012647039984585715 s 0.000012520880018200842 s 1.01
actmtch / DefOpt / cpu / BothRev 0.000012569259997690096 s 0.000012740499987557997 s 0.99
actmtch / IDefOpt / cpu / PreRev 0.000011571579962037504 s 0.000011899540022568544 s 0.97
actmtch / IDefOpt / cpu / PostRev 0.0000132195000333013 s 0.000012510440046753502 s 1.06
actmtch / IDefOpt / cpu / BothRev 0.000012532540022220929 s 0.000012450659996829926 s 1.01
actmtch / JaXPipe / cuda / Primal 0.0000024 s
actmtch / Jax / cuda / Primal 0.0000024 s
actmtch / HLOOpt / cuda / Primal 0.0000024 s
actmtch / PartOpt / cuda / Primal 0.0000024 s
actmtch / IPartOpt / cuda / Primal 0.0000024 s
actmtch / DefOpt / cuda / Primal 0.0000024 s
actmtch / IDefOpt / cuda / Primal 0.0000024 s
actmtch / JaXPipe / cuda / Forward 0.000010688 s
actmtch / Jax / cuda / Forward 0.000010208 s
actmtch / HLOOpt / cuda / Forward 0.000010688 s
actmtch / PartOpt / cuda / Forward 0.000010816 s
actmtch / IPartOpt / cuda / Forward 0.000010944 s
actmtch / DefOpt / cuda / Forward 0.000010784 s
actmtch / IDefOpt / cuda / Forward 0.000010943 s
actmtch / JaXPipe / cuda / PreRev 0.000013408 s
actmtch / JaXPipe / cuda / PostRev 0.000011103 s
actmtch / JaXPipe / cuda / BothRev 0.000010784 s
actmtch / Jax / cuda / BothRev 0.000013312 s
actmtch / HLOOpt / cuda / PreRev 0.000010944 s
actmtch / HLOOpt / cuda / PostRev 0.000013568 s
actmtch / HLOOpt / cuda / BothRev 0.000011071 s
actmtch / PartOpt / cuda / PreRev 0.000010976 s
actmtch / PartOpt / cuda / PostRev 0.000010753 s
actmtch / PartOpt / cuda / BothRev 0.000010976 s
actmtch / IPartOpt / cuda / PreRev 0.000010528 s
actmtch / IPartOpt / cuda / PostRev 0.000010816 s
actmtch / IPartOpt / cuda / BothRev 0.000010816 s
actmtch / DefOpt / cuda / PreRev 0.000010945 s
actmtch / DefOpt / cuda / PostRev 0.000010944 s
actmtch / DefOpt / cuda / BothRev 0.000010527 s
actmtch / IDefOpt / cuda / PreRev 0.0000112 s
actmtch / IDefOpt / cuda / PostRev 0.00001104 s
actmtch / IDefOpt / cuda / BothRev 0.00001072 s
actmtch / JaXPipe / tpu / Primal 5.63425e-7 s 5.63175e-7 s 1.00
actmtch / Jax / tpu / Primal 5.967999999999999e-7 s 5.968500000000001e-7 s 1.00
actmtch / HLOOpt / tpu / Primal 0.000002093375 s 0.0000020960500000000005 s 1.00
actmtch / PartOpt / tpu / Primal 5.9695e-7 s 5.96575e-7 s 1.00
actmtch / IPartOpt / tpu / Primal 5.527e-7 s 5.53025e-7 s 1.00
actmtch / DefOpt / tpu / Primal 0.0000021629 s 0.0000021589500000000003 s 1.00
actmtch / IDefOpt / tpu / Primal 0.0000020946250000000003 s 0.000002115 s 0.99
actmtch / JaXPipe / tpu / Forward 0.00000382735 s 0.00000382905 s 1.00
actmtch / Jax / tpu / Forward 0.000001215425 s 0.000001215 s 1.00
actmtch / HLOOpt / tpu / Forward 0.00000392895 s 0.0000039365 s 1.00
actmtch / PartOpt / tpu / Forward 0.0000039112 s 0.000003928025 s 1.00
actmtch / IPartOpt / tpu / Forward 0.0000039404 s 0.0000039343 s 1.00
actmtch / DefOpt / tpu / Forward 0.000003917225 s 0.000003908975 s 1.00
actmtch / IDefOpt / tpu / Forward 0.000003931325 s 0.000003937574999999999 s 1.00
actmtch / JaXPipe / tpu / PreRev 0.0000034757 s 0.000003483325 s 1.00
actmtch / JaXPipe / tpu / PostRev 0.0000016487 s 0.000001633125 s 1.01
actmtch / JaXPipe / tpu / BothRev 0.000003484175 s 0.0000034899249999999995 s 1.00
actmtch / Jax / tpu / BothRev 0.00000164205 s 0.000001637175 s 1.00
actmtch / HLOOpt / tpu / PreRev 0.00000349265 s 0.0000034652250000000004 s 1.01
actmtch / HLOOpt / tpu / PostRev 0.000003425475 s 0.00000341055 s 1.00
actmtch / HLOOpt / tpu / BothRev 0.000003471825 s 0.000003485675 s 1.00
actmtch / PartOpt / tpu / PreRev 0.000003418675 s 0.00000340875 s 1.00
actmtch / PartOpt / tpu / PostRev 0.000001597675 s 0.00000160075 s 1.00
actmtch / PartOpt / tpu / BothRev 0.00000341515 s 0.0000034111000000000003 s 1.00
actmtch / IPartOpt / tpu / PreRev 0.000003481825 s 0.00000347 s 1.00
actmtch / IPartOpt / tpu / PostRev 0.000001635875 s 0.000001652325 s 0.99
actmtch / IPartOpt / tpu / BothRev 0.000003471125 s 0.000003472225 s 1.00
actmtch / DefOpt / tpu / PreRev 0.0000034099 s 0.000003425575 s 1.00
actmtch / DefOpt / tpu / PostRev 0.00000342105 s 0.0000034118 s 1.00
actmtch / DefOpt / tpu / BothRev 0.0000034019500000000005 s 0.000003401575 s 1.00
actmtch / IDefOpt / tpu / PreRev 0.00000347575 s 0.0000034789750000000004 s 1.00
actmtch / IDefOpt / tpu / PostRev 0.0000034126 s 0.000003421675 s 1.00
actmtch / IDefOpt / tpu / BothRev 0.00000346505 s 0.0000034823 s 1.00
actmtch / JaXPipe / cpu / Primal 0.00001341 s 0.000006897379980728147 s 1.94
actmtch / Jax / cpu / Primal 0.000013679 s 0.0000069136199908825804 s 1.98
actmtch / HLOOpt / cpu / Primal 0.000014212 s 0.00000841961998048646 s 1.69
actmtch / PartOpt / cpu / Primal 0.000013195 s 0.000006819640047979192 s 1.93
actmtch / IPartOpt / cpu / Primal 0.000013553 s 0.000007617820037921774 s 1.78
actmtch / DefOpt / cpu / Primal 0.000014326 s 0.000007364539997070097 s 1.95
actmtch / IDefOpt / cpu / Primal 0.000014025 s 0.000008136800051943282 s 1.72
actmtch / JaXPipe / cpu / Forward 0.000019546 s 0.0000118703200314485 s 1.65
actmtch / Jax / cpu / Forward 0.000018245 s 0.00001041201993757568 s 1.75
actmtch / HLOOpt / cpu / Forward 0.000019313 s 0.000012178740007584564 s 1.59
actmtch / PartOpt / cpu / Forward 0.000019215 s 0.000011013440016540698 s 1.74
actmtch / IPartOpt / cpu / Forward 0.000019384 s 0.000011710939988915924 s 1.66
actmtch / DefOpt / cpu / Forward 0.000019143 s 0.00001178708000225015 s 1.62
actmtch / IDefOpt / cpu / Forward 0.000019595 s 0.000011781220046032103 s 1.66
actmtch / JaXPipe / cpu / PreRev 0.000019679 s 0.00001191142000607215 s 1.65
actmtch / JaXPipe / cpu / PostRev 0.000017996000000000002 s 0.000011489859980429174 s 1.57
actmtch / JaXPipe / cpu / BothRev 0.000019959 s 0.000012797200070053804 s 1.56
actmtch / Jax / cpu / BothRev 0.000018301 s 0.000011320319972583092 s 1.62
actmtch / HLOOpt / cpu / PreRev 0.000019392 s 0.000012538219962152651 s 1.55
actmtch / HLOOpt / cpu / PostRev 0.000020011 s 0.000013989540020702408 s 1.43
actmtch / HLOOpt / cpu / BothRev 0.000019514 s 0.000012801459997717756 s 1.52
actmtch / PartOpt / cpu / PreRev 0.000019284 s 0.000011282919986115304 s 1.71
actmtch / PartOpt / cpu / PostRev 0.000018081 s 0.000010915479970208251 s 1.66
actmtch / PartOpt / cpu / BothRev 0.000019988 s 0.000012874100011686096 s 1.55
actmtch / IPartOpt / cpu / PreRev 0.000019595 s 0.000012087639988749289 s 1.62
actmtch / IPartOpt / cpu / PostRev 0.000017902000000000002 s 0.00001114202000280784 s 1.61
actmtch / IPartOpt / cpu / BothRev 0.000019791 s 0.00001213371994708723 s 1.63
actmtch / DefOpt / cpu / PreRev 0.000019403 s 0.000011616179999691667 s 1.67
actmtch / DefOpt / cpu / PostRev 0.000019267 s 0.000012520880018200842 s 1.54
actmtch / DefOpt / cpu / BothRev 0.000019727000000000003 s 0.000012740499987557997 s 1.55
actmtch / IDefOpt / cpu / PreRev 0.000019631 s 0.000011899540022568544 s 1.65
actmtch / IDefOpt / cpu / PostRev 0.000019881 s 0.000012510440046753502 s 1.59
actmtch / IDefOpt / cpu / BothRev 0.000019304 s 0.000012450659996829926 s 1.55
add_one / JaXPipe / cpu / Primal 0.000009446800013392932 s 0.00000706440000612929 s 1.34
add_one / Jax / cpu / Primal 0.000009122679985011928 s 0.000007601719935337314 s 1.20
add_one / HLOOpt / cpu / Primal 0.000007943339933262904 s 0.000007250780045069405 s 1.10
add_one / PartOpt / cpu / Primal 0.000008322859939653426 s 0.000007323659974645125 s 1.14
add_one / IPartOpt / cpu / Primal 0.000008701859951543155 s 0.000007363139993685763 s 1.18
add_one / DefOpt / cpu / Primal 0.000007943080054246821 s 0.000007430599980580155 s 1.07
add_one / IDefOpt / cpu / Primal 0.00000771577999330475 s 0.000007264319974638056 s 1.06
add_one / JaXPipe / cpu / Forward 0.000011471960042399588 s 0.000011343019978085068 s 1.01
add_one / Jax / cpu / Forward 0.000012413799995556472 s 0.00001087398001800466 s 1.14
add_one / HLOOpt / cpu / Forward 0.000011387340036890236 s 0.0000115157799791632 s 0.99
add_one / PartOpt / cpu / Forward 0.000011812899992946769 s 0.000010771899987958022 s 1.10
add_one / IPartOpt / cpu / Forward 0.000011171239984832937 s 0.000011192560004928963 s 1.00
add_one / DefOpt / cpu / Forward 0.000011936559985770144 s 0.000011124239945274894 s 1.07
add_one / IDefOpt / cpu / Forward 0.000011112559968751155 s 0.000011266279998380925 s 0.99
add_one / JaXPipe / cpu / PreRev 0.00001307148000705638 s 0.000013437640000120154 s 0.97
add_one / JaXPipe / cpu / PostRev 0.000013430580020212802 s 0.00001323504003266862 s 1.01
add_one / JaXPipe / cpu / BothRev 0.000013528300005418714 s 0.000013404940018517664 s 1.01
add_one / Jax / cpu / BothRev 0.000012836819932999787 s 0.000012968259970875809 s 0.99
add_one / HLOOpt / cpu / PreRev 0.000013550359999499052 s 0.000013323479979590047 s 1.02
add_one / HLOOpt / cpu / PostRev 0.00001762362000590656 s 0.000014851399982944714 s 1.19
add_one / HLOOpt / cpu / BothRev 0.000012584479954966809 s 0.000012748400004056747 s 0.99
add_one / PartOpt / cpu / PreRev 0.000013045459982095051 s 0.000012426560033418356 s 1.05
add_one / PartOpt / cpu / PostRev 0.000012626800089492462 s 0.00001276427994525875 s 0.99
add_one / PartOpt / cpu / BothRev 0.000012988099988433532 s 0.00001322353997238679 s 0.98
add_one / IPartOpt / cpu / PreRev 0.000013442439976643071 s 0.00001235531998645456 s 1.09
add_one / IPartOpt / cpu / PostRev 0.000012876960081484869 s 0.000012512420062193996 s 1.03
add_one / IPartOpt / cpu / BothRev 0.000013156900058675091 s 0.000013071919993308256 s 1.01
add_one / DefOpt / cpu / PreRev 0.00001256018002095516 s 0.00001339427999482723 s 0.94
add_one / DefOpt / cpu / PostRev 0.000013207299907662671 s 0.000013905179985158611 s 0.95
add_one / DefOpt / cpu / BothRev 0.000013874879969080213 s 0.000012657760025831522 s 1.10
add_one / IDefOpt / cpu / PreRev 0.000013266940022731431 s 0.000012838639995607084 s 1.03
add_one / IDefOpt / cpu / PostRev 0.00001307259999521193 s 0.000013451439972413937 s 0.97
add_one / IDefOpt / cpu / BothRev 0.000012516119950305438 s 0.00001288153997847985 s 0.97
add_one / JaXPipe / cuda / Primal 0.000002304 s
add_one / Jax / cuda / Primal 0.000002304 s
add_one / HLOOpt / cuda / Primal 0.000002335 s
add_one / PartOpt / cuda / Primal 0.000002335 s
add_one / IPartOpt / cuda / Primal 0.000002335 s
add_one / DefOpt / cuda / Primal 0.000002335 s
add_one / IDefOpt / cuda / Primal 0.000002335 s
add_one / JaXPipe / cuda / Forward 0.00001056 s
add_one / Jax / cuda / Forward 0.00001088 s
add_one / HLOOpt / cuda / Forward 0.000010752 s
add_one / PartOpt / cuda / Forward 0.000010752 s
add_one / IPartOpt / cuda / Forward 0.000010783 s
add_one / DefOpt / cuda / Forward 0.000010848 s
add_one / IDefOpt / cuda / Forward 0.00001088 s
add_one / JaXPipe / cuda / PreRev 0.000026944 s
add_one / JaXPipe / cuda / PostRev 0.000026208 s
add_one / JaXPipe / cuda / BothRev 0.000026431 s
add_one / Jax / cuda / BothRev 0.000026368 s
add_one / HLOOpt / cuda / PreRev 0.000027552 s
add_one / HLOOpt / cuda / PostRev 0.000026016 s
add_one / HLOOpt / cuda / BothRev 0.000026368 s
add_one / PartOpt / cuda / PreRev 0.00002624 s
add_one / PartOpt / cuda / PostRev 0.00002608 s
add_one / PartOpt / cuda / BothRev 0.000026048 s
add_one / IPartOpt / cuda / PreRev 0.000026304 s
add_one / IPartOpt / cuda / PostRev 0.000027232 s
add_one / IPartOpt / cuda / BothRev 0.00002704 s
add_one / DefOpt / cuda / PreRev 0.000026624 s
add_one / DefOpt / cuda / PostRev 0.000026752 s
add_one / DefOpt / cuda / BothRev 0.000027872 s
add_one / IDefOpt / cuda / PreRev 0.000026433000000000003 s
add_one / IDefOpt / cuda / PostRev 0.000026304 s
add_one / IDefOpt / cuda / BothRev 0.000027616 s
add_one / JaXPipe / tpu / Primal 0.0000014321999999999998 s 0.0000014288 s 1.00
add_one / Jax / tpu / Primal 0.00000141655 s 0.000001405625 s 1.01
add_one / HLOOpt / tpu / Primal 0.00000142245 s 0.0000014295749999999998 s 1.00
add_one / PartOpt / tpu / Primal 0.00000140345 s 0.000001402225 s 1.00
add_one / IPartOpt / tpu / Primal 0.0000014394499999999998 s 0.000001428775 s 1.01
add_one / DefOpt / tpu / Primal 0.0000014021 s 0.0000014075499999999998 s 1.00
add_one / IDefOpt / tpu / Primal 0.000001436775 s 0.000001429925 s 1.00
add_one / JaXPipe / tpu / Forward 0.000001788725 s 0.0000018106 s 0.99
add_one / Jax / tpu / Forward 0.000001840975 s 0.00000185235 s 0.99
add_one / HLOOpt / tpu / Forward 0.000001812725 s 0.00000179575 s 1.01
add_one / PartOpt / tpu / Forward 0.000001855 s 0.000001853475 s 1.00
add_one / IPartOpt / tpu / Forward 0.0000018029 s 0.000001793625 s 1.01
add_one / DefOpt / tpu / Forward 0.00000183755 s 0.000001843375 s 1.00
add_one / IDefOpt / tpu / Forward 0.0000017965749999999998 s 0.000001797975 s 1.00
add_one / JaXPipe / tpu / PreRev 0.0000022323 s 0.000002233925 s 1.00
add_one / JaXPipe / tpu / PostRev 0.000002181675 s 0.000002186125 s 1.00
add_one / JaXPipe / tpu / BothRev 0.00000223695 s 0.000002238875 s 1.00
add_one / Jax / tpu / BothRev 0.000002190475 s 0.0000021898500000000003 s 1.00
add_one / HLOOpt / tpu / PreRev 0.000002232975 s 0.00000224045 s 1.00
add_one / HLOOpt / tpu / PostRev 0.000002189075 s 0.0000021848500000000004 s 1.00
add_one / HLOOpt / tpu / BothRev 0.000002239175 s 0.0000022363750000000003 s 1.00
add_one / PartOpt / tpu / PreRev 0.000002181425 s 0.000002183875 s 1.00
add_one / PartOpt / tpu / PostRev 0.0000022419 s 0.00000223835 s 1.00
add_one / PartOpt / tpu / BothRev 0.0000021854000000000003 s 0.000002182175 s 1.00
add_one / IPartOpt / tpu / PreRev 0.00000224525 s 0.00000224795 s 1.00
add_one / IPartOpt / tpu / PostRev 0.0000021774 s 0.000002180375 s 1.00
add_one / IPartOpt / tpu / BothRev 0.0000022437250000000003 s 0.000002235325 s 1.00
add_one / DefOpt / tpu / PreRev 0.000002196525 s 0.0000021844250000000003 s 1.01
add_one / DefOpt / tpu / PostRev 0.0000022391 s 0.00000223515 s 1.00
add_one / DefOpt / tpu / BothRev 0.000002184025 s 0.000002192675 s 1.00
add_one / IDefOpt / tpu / PreRev 0.00000224075 s 0.0000022315 s 1.00
add_one / IDefOpt / tpu / PostRev 0.00000219735 s 0.0000021845000000000004 s 1.01
add_one / IDefOpt / tpu / BothRev 0.00000224315 s 0.00000223215 s 1.00
add_one / JaXPipe / cpu / Primal 0.00001333 s 0.00000706440000612929 s 1.89
add_one / Jax / cpu / Primal 0.000013246000000000002 s 0.000007601719935337314 s 1.74
add_one / HLOOpt / cpu / Primal 0.000013108 s 0.000007250780045069405 s 1.81
add_one / PartOpt / cpu / Primal 0.000013103 s 0.000007323659974645125 s 1.79
add_one / IPartOpt / cpu / Primal 0.000013169 s 0.000007363139993685763 s 1.79
add_one / DefOpt / cpu / Primal 0.000013203 s 0.000007430599980580155 s 1.78
add_one / IDefOpt / cpu / Primal 0.000013018 s 0.000007264319974638056 s 1.79
add_one / JaXPipe / cpu / Forward 0.000018053 s 0.000011343019978085068 s 1.59
add_one / Jax / cpu / Forward 0.000017839 s 0.00001087398001800466 s 1.64
add_one / HLOOpt / cpu / Forward 0.000018105 s 0.0000115157799791632 s 1.57
add_one / PartOpt / cpu / Forward 0.000017794 s 0.000010771899987958022 s 1.65
add_one / IPartOpt / cpu / Forward 0.000017837 s 0.000011192560004928963 s 1.59
add_one / DefOpt / cpu / Forward 0.000017947000000000003 s 0.000011124239945274894 s 1.61
add_one / IDefOpt / cpu / Forward 0.000018022 s 0.000011266279998380925 s 1.60
add_one / JaXPipe / cpu / PreRev 0.000020671 s 0.000013437640000120154 s 1.54
add_one / JaXPipe / cpu / PostRev 0.00001965 s 0.00001323504003266862 s 1.48
add_one / JaXPipe / cpu / BothRev 0.000019692 s 0.000013404940018517664 s 1.47
add_one / Jax / cpu / BothRev 0.000019872 s 0.000012968259970875809 s 1.53
add_one / HLOOpt / cpu / PreRev 0.000019866000000000003 s 0.000013323479979590047 s 1.49
add_one / HLOOpt / cpu / PostRev 0.000020458 s 0.000014851399982944714 s 1.38
add_one / HLOOpt / cpu / BothRev 0.000020226000000000003 s 0.000012748400004056747 s 1.59
add_one / PartOpt / cpu / PreRev 0.000020005 s 0.000012426560033418356 s 1.61
add_one / PartOpt / cpu / PostRev 0.000020179 s 0.00001276427994525875 s 1.58
add_one / PartOpt / cpu / BothRev 0.000019981 s 0.00001322353997238679 s 1.51
add_one / IPartOpt / cpu / PreRev 0.000020751 s 0.00001235531998645456 s 1.68
add_one / IPartOpt / cpu / PostRev 0.00001964 s 0.000012512420062193996 s 1.57
add_one / IPartOpt / cpu / BothRev 0.000020067000000000003 s 0.000013071919993308256 s 1.54
add_one / DefOpt / cpu / PreRev 0.000019539 s 0.00001339427999482723 s 1.46
add_one / DefOpt / cpu / PostRev 0.000020039 s 0.000013905179985158611 s 1.44
add_one / DefOpt / cpu / BothRev 0.00002019 s 0.000012657760025831522 s 1.60
add_one / IDefOpt / cpu / PreRev 0.00002022 s 0.000012838639995607084 s 1.57
add_one / IDefOpt / cpu / PostRev 0.000019706 s 0.000013451439972413937 s 1.46
add_one / IDefOpt / cpu / BothRev 0.000019612 s 0.00001288153997847985 s 1.52
add_two / JaXPipe / cpu / Primal 0.000008228919978137127 s 0.000007585460025438806 s 1.08
add_two / Jax / cpu / Primal 0.000007852960061427439 s 0.000007901919934738543 s 0.99
add_two / HLOOpt / cpu / Primal 0.00000757889998567407 s 0.000007734699966022163 s 0.98
add_two / PartOpt / cpu / Primal 0.000007924899855424882 s 0.0000077645399778703 s 1.02
add_two / IPartOpt / cpu / Primal 0.000007571559890493517 s 0.00000801171999228245 s 0.95
add_two / DefOpt / cpu / Primal 0.000008030860008148012 s 0.0000074785999822779556 s 1.07
add_two / IDefOpt / cpu / Primal 0.00000777446000938653 s 0.000007569339968540589 s 1.03
add_two / JaXPipe / cpu / Forward 0.000011113740001746918 s 0.000011130499970022357 s 1.00
add_two / Jax / cpu / Forward 0.000011730279911716934 s 0.000011560640014067755 s 1.01
add_two / HLOOpt / cpu / Forward 0.000011978540060226806 s 0.000011478699998406229 s 1.04
add_two / PartOpt / cpu / Forward 0.000011723240040737436 s 0.000011006600025211813 s 1.07
add_two / IPartOpt / cpu / Forward 0.000011729160014510852 s 0.00001155640001343272 s 1.01
add_two / DefOpt / cpu / Forward 0.000011317320004309294 s 0.000011182519965586837 s 1.01
add_two / IDefOpt / cpu / Forward 0.000012071959918102949 s 0.000011505340007715858 s 1.05
add_two / JaXPipe / cpu / PreRev 0.000015901680053502788 s 0.000015082180016179335 s 1.05
add_two / JaXPipe / cpu / PostRev 0.00001510661992142559 s 0.000015239040012602345 s 0.99
add_two / JaXPipe / cpu / BothRev 0.000016034640084399142 s 0.000015677139981562504 s 1.02
add_two / Jax / cpu / BothRev 0.000015171480063145282 s 0.000015563599990855437 s 0.97
add_two / HLOOpt / cpu / PreRev 0.000015356760050053707 s 0.000015294399981939933 s 1.00
add_two / HLOOpt / cpu / PostRev 0.000017783100029191702 s 0.00001706452000689751 s 1.04
add_two / HLOOpt / cpu / BothRev 0.0000157942800251476 s 0.000015444639993802413 s 1.02
add_two / PartOpt / cpu / PreRev 0.000015446899997186847 s 0.000015105920001587946 s 1.02
add_two / PartOpt / cpu / PostRev 0.00001575201999003184 s 0.000015631119968020357 s 1.01
add_two / PartOpt / cpu / BothRev 0.000015933240065351127 s 0.000015236720055327168 s 1.05
add_two / IPartOpt / cpu / PreRev 0.000015364900027634574 s 0.000015926079968267003 s 0.96
add_two / IPartOpt / cpu / PostRev 0.000016114860081870574 s 0.00001560540000355104 s 1.03
add_two / IPartOpt / cpu / BothRev 0.000015513699981966056 s 0.000014749879992450588 s 1.05
add_two / DefOpt / cpu / PreRev 0.000015790779962117085 s 0.00001555791996906919 s 1.01
add_two / DefOpt / cpu / PostRev 0.00001619841994397575 s 0.00001548297998851922 s 1.05
add_two / DefOpt / cpu / BothRev 0.000015951639998093015 s 0.000015098959956958423 s 1.06
add_two / IDefOpt / cpu / PreRev 0.000016616000029898713 s 0.000015209419980237724 s 1.09
add_two / IDefOpt / cpu / PostRev 0.000015404000114358494 s 0.000015477779952561833 s 1.00
add_two / IDefOpt / cpu / BothRev 0.000015376459941762732 s 0.00001620710004317516 s 0.95
add_two / JaXPipe / cuda / Primal 0.000002431 s
add_two / Jax / cuda / Primal 0.000002432 s
add_two / HLOOpt / cuda / Primal 0.000002431 s
add_two / PartOpt / cuda / Primal 0.000002431 s
add_two / IPartOpt / cuda / Primal 0.000002431 s
add_two / DefOpt / cuda / Primal 0.000002432 s
add_two / IDefOpt / cuda / Primal 0.000002431 s
add_two / JaXPipe / cuda / Forward 0.00001088 s
add_two / Jax / cuda / Forward 0.000010752 s
add_two / HLOOpt / cuda / Forward 0.00001088 s
add_two / PartOpt / cuda / Forward 0.00001072 s
add_two / IPartOpt / cuda / Forward 0.0000104 s
add_two / DefOpt / cuda / Forward 0.00001088 s
add_two / IDefOpt / cuda / Forward 0.00001088 s
add_two / JaXPipe / cuda / PreRev 0.000034208 s
add_two / JaXPipe / cuda / PostRev 0.000034016 s
add_two / JaXPipe / cuda / BothRev 0.000034623000000000004 s
add_two / Jax / cuda / BothRev 0.000033888 s
add_two / HLOOpt / cuda / PreRev 0.00003488 s
add_two / HLOOpt / cuda / PostRev 0.000034144000000000004 s
add_two / HLOOpt / cuda / BothRev 0.000034687 s
add_two / PartOpt / cuda / PreRev 0.00003504 s
add_two / PartOpt / cuda / PostRev 0.000033184 s
add_two / PartOpt / cuda / BothRev 0.000033569 s
add_two / IPartOpt / cuda / PreRev 0.000034144000000000004 s
add_two / IPartOpt / cuda / PostRev 0.000033439 s
add_two / IPartOpt / cuda / BothRev 0.000034784000000000004 s
add_two / DefOpt / cuda / PreRev 0.000035232 s
add_two / DefOpt / cuda / PostRev 0.000033665000000000004 s
add_two / DefOpt / cuda / BothRev 0.00003408 s
add_two / IDefOpt / cuda / PreRev 0.000034976 s
add_two / IDefOpt / cuda / PostRev 0.000034399 s
add_two / IDefOpt / cuda / BothRev 0.000034592 s
add_two / JaXPipe / tpu / Primal 0.0000014380999999999998 s 0.0000014355 s 1.00
add_two / Jax / tpu / Primal 0.00000143245 s 0.0000014217 s 1.01
add_two / HLOOpt / tpu / Primal 0.000001430075 s 0.000001438575 s 0.99
add_two / PartOpt / tpu / Primal 0.00000142945 s 0.0000014199 s 1.01
add_two / IPartOpt / tpu / Primal 0.000001433875 s 0.0000014303750000000005 s 1.00
add_two / DefOpt / tpu / Primal 0.0000014306499999999998 s 0.0000014258 s 1.00
add_two / IDefOpt / tpu / Primal 0.0000014299500000000002 s 0.00000144245 s 0.99
add_two / JaXPipe / tpu / Forward 0.000001819825 s 0.00000182285 s 1.00
add_two / Jax / tpu / Forward 0.000001832275 s 0.000001825575 s 1.00
add_two / HLOOpt / tpu / Forward 0.00000183045 s 0.00000182335 s 1.00
add_two / PartOpt / tpu / Forward 0.000001824275 s 0.0000018401 s 0.99
add_two / IPartOpt / tpu / Forward 0.000001830575 s 0.000001820525 s 1.01
add_two / DefOpt / tpu / Forward 0.000001826475 s 0.0000018333 s 1.00
add_two / IDefOpt / tpu / Forward 0.000001841075 s 0.00000182775 s 1.01
add_two / JaXPipe / tpu / PreRev 0.0000028336250000000003 s 0.0000028411500000000003 s 1.00
add_two / JaXPipe / tpu / PostRev 0.000002766825 s 0.0000027745500000000004 s 1.00
add_two / JaXPipe / tpu / BothRev 0.000002842175 s 0.0000028418 s 1.00
add_two / Jax / tpu / BothRev 0.000002752 s 0.00000276245 s 1.00
add_two / HLOOpt / tpu / PreRev 0.000002842225 s 0.000002845725 s 1.00
add_two / HLOOpt / tpu / PostRev 0.0000027545 s 0.0000027546250000000003 s 1.00
add_two / HLOOpt / tpu / BothRev 0.000002840625 s 0.00000285825 s 0.99
add_two / PartOpt / tpu / PreRev 0.000002768925 s 0.0000027701500000000005 s 1.00
add_two / PartOpt / tpu / PostRev 0.000002835 s 0.0000028395 s 1.00
add_two / PartOpt / tpu / BothRev 0.00000274145 s 0.00000275855 s 0.99
add_two / IPartOpt / tpu / PreRev 0.0000028288000000000003 s 0.00000283355 s 1.00
add_two / IPartOpt / tpu / PostRev 0.0000027467000000000003 s 0.000002756025 s 1.00
add_two / IPartOpt / tpu / BothRev 0.0000028326 s 0.0000028466000000000004 s 1.00
add_two / DefOpt / tpu / PreRev 0.0000027463500000000004 s 0.0000027504 s 1.00
add_two / DefOpt / tpu / PostRev 0.00000284445 s 0.0000028446500000000004 s 1.00
add_two / DefOpt / tpu / BothRev 0.0000027459499999999995 s 0.000002745525 s 1.00
add_two / IDefOpt / tpu / PreRev 0.00000284615 s 0.0000028409 s 1.00
add_two / IDefOpt / tpu / PostRev 0.000002758375 s 0.000002753225 s 1.00
add_two / IDefOpt / tpu / BothRev 0.00000283725 s 0.000002842725 s 1.00
add_two / JaXPipe / cpu / Primal 0.00001351 s 0.000007585460025438806 s 1.78
add_two / Jax / cpu / Primal 0.000013249 s 0.000007901919934738543 s 1.68
add_two / HLOOpt / cpu / Primal 0.000013496 s 0.000007734699966022163 s 1.74
add_two / PartOpt / cpu / Primal 0.00001362 s 0.0000077645399778703 s 1.75
add_two / IPartOpt / cpu / Primal 0.000013305 s 0.00000801171999228245 s 1.66
add_two / DefOpt / cpu / Primal 0.000013436 s 0.0000074785999822779556 s 1.80
add_two / IDefOpt / cpu / Primal 0.000013282 s 0.000007569339968540589 s 1.75
add_two / JaXPipe / cpu / Forward 0.000018373 s 0.000011130499970022357 s 1.65
add_two / Jax / cpu / Forward 0.000018196 s 0.000011560640014067755 s 1.57
add_two / HLOOpt / cpu / Forward 0.000018221 s 0.000011478699998406229 s 1.59
add_two / PartOpt / cpu / Forward 0.000018771 s 0.000011006600025211813 s 1.71
add_two / IPartOpt / cpu / Forward 0.000018178 s 0.00001155640001343272 s 1.57
add_two / DefOpt / cpu / Forward 0.000018114000000000003 s 0.000011182519965586837 s 1.62
add_two / IDefOpt / cpu / Forward 0.000018006 s 0.000011505340007715858 s 1.57
add_two / JaXPipe / cpu / PreRev 0.000023699 s 0.000015082180016179335 s 1.57
add_two / JaXPipe / cpu / PostRev 0.000023686 s 0.000015239040012602345 s 1.55
add_two / JaXPipe / cpu / BothRev 0.000023366000000000003 s 0.000015677139981562504 s 1.49
add_two / Jax / cpu / BothRev 0.000022854 s 0.000015563599990855437 s 1.47
add_two / HLOOpt / cpu / PreRev 0.000022941 s 0.000015294399981939933 s 1.50
add_two / HLOOpt / cpu / PostRev 0.00002342 s 0.00001706452000689751 s 1.37
add_two / HLOOpt / cpu / BothRev 0.000023987000000000003 s 0.000015444639993802413 s 1.55
add_two / PartOpt / cpu / PreRev 0.000023106 s 0.000015105920001587946 s 1.53
add_two / PartOpt / cpu / PostRev 0.000023603 s 0.000015631119968020357 s 1.51
add_two / PartOpt / cpu / BothRev 0.000024219 s 0.000015236720055327168 s 1.59
add_two / IPartOpt / cpu / PreRev 0.000024302 s 0.000015926079968267003 s 1.53
add_two / IPartOpt / cpu / PostRev 0.000024582 s 0.00001560540000355104 s 1.58
add_two / IPartOpt / cpu / BothRev 0.000023517 s 0.000014749879992450588 s 1.59
add_two / DefOpt / cpu / PreRev 0.000024343 s 0.00001555791996906919 s 1.56
add_two / DefOpt / cpu / PostRev 0.000023704 s 0.00001548297998851922 s 1.53
add_two / DefOpt / cpu / BothRev 0.000023488 s 0.000015098959956958423 s 1.56
add_two / IDefOpt / cpu / PreRev 0.000024819 s 0.000015209419980237724 s 1.63
add_two / IDefOpt / cpu / PostRev 0.0000255 s 0.000015477779952561833 s 1.65
add_two / IDefOpt / cpu / BothRev 0.000023716 s 0.00001620710004317516 s 1.46
cache / JaXPipe / cpu / Primal 0.000006778320021112449 s 0.000007056179956634878 s 0.96
cache / Jax / cpu / Primal 0.000008608680091128917 s 0.000007446979971064138 s 1.16
cache / HLOOpt / cpu / Primal 0.000008067280050454429 s 0.000006615619986405363 s 1.22
cache / PartOpt / cpu / Primal 0.000007780860069033223 s 0.000006719259981764481 s 1.16
cache / IPartOpt / cpu / Primal 0.0000078753999696346 s 0.000006971860020712484 s 1.13
cache / DefOpt / cpu / Primal 0.000007817879941285355 s 0.000007025120012258412 s 1.11
cache / IDefOpt / cpu / Primal 0.000007880059947638073 s 0.000007174279980972642 s 1.10
cache / JaXPipe / cpu / Forward 0.000014821899894741363 s 0.000014488840042758966 s 1.02
cache / Jax / cpu / Forward 0.000015384719990834128 s 0.000015192699947874644 s 1.01
cache / HLOOpt / cpu / Forward 0.000016243100035353563 s 0.000015517559959334903 s 1.05
cache / PartOpt / cpu / Forward 0.000014748579960723872 s 0.000014401319958778911 s 1.02
cache / IPartOpt / cpu / Forward 0.00001569882002513623 s 0.000015117139928406687 s 1.04
cache / DefOpt / cpu / Forward 0.000015263340046658413 s 0.000014252099936129523 s 1.07
cache / IDefOpt / cpu / Forward 0.0000152645599519019 s 0.000015947860019878136 s 0.96
cache / JaXPipe / cpu / PreRev 0.000017022980009642198 s 0.000017153660010080785 s 0.99
cache / JaXPipe / cpu / PostRev 0.000020436460054042983 s 0.00002086701995722251 s 0.98
cache / JaXPipe / cpu / BothRev 0.000017449760052841155 s 0.000016654460014251528 s 1.05
cache / Jax / cpu / BothRev 0.00002214977999756229 s 0.000020643239995479235 s 1.07
cache / HLOOpt / cpu / PreRev 0.000016472999977850122 s 0.000017784900019250928 s 0.93
cache / HLOOpt / cpu / PostRev 0.0000197194800057332 s 0.000021892259992455367 s 0.90
cache / HLOOpt / cpu / BothRev 0.000017747519996191842 s 0.000018382259986537976 s 0.97
cache / PartOpt / cpu / PreRev 0.000015947099946060917 s 0.000016693739980837562 s 0.96
cache / PartOpt / cpu / PostRev 0.000022589079999306702 s 0.000021782079984404844 s 1.04
cache / PartOpt / cpu / BothRev 0.000016057699904195035 s 0.000017551979999552715 s 0.91
cache / IPartOpt / cpu / PreRev 0.00001659378007389023 s 0.00001678380000157631 s 0.99
cache / IPartOpt / cpu / PostRev 0.000021509379948838613 s 0.000021706779980377176 s 0.99
cache / IPartOpt / cpu / BothRev 0.000016132279924931935 s 0.000017438879986002576 s 0.93
cache / DefOpt / cpu / PreRev 0.000016353860082745087 s 0.00001758438000251772 s 0.93
cache / DefOpt / cpu / PostRev 0.00001690259996394161 s 0.000016727100019124918 s 1.01
cache / DefOpt / cpu / BothRev 0.00001693030000751605 s 0.000016970419956123806 s 1.00
cache / IDefOpt / cpu / PreRev 0.00001601755995579879 s 0.000017113839985540834 s 0.94
cache / IDefOpt / cpu / PostRev 0.000015976459926605458 s 0.000016802999980427557 s 0.95
cache / IDefOpt / cpu / BothRev 0.000015905940072116208 s 0.000017829540001912392 s 0.89
cache / JaXPipe / cuda / Primal 0.000002336 s
cache / Jax / cuda / Primal 0.000002336 s
cache / HLOOpt / cuda / Primal 0.000002335 s
cache / PartOpt / cuda / Primal 0.000002335 s
cache / IPartOpt / cuda / Primal 0.000002335 s
cache / DefOpt / cuda / Primal 0.000002335 s
cache / IDefOpt / cuda / Primal 0.000002335 s
cache / JaXPipe / cuda / Forward 0.0000023670000000000004 s
cache / Jax / cuda / Forward 0.0000023670000000000004 s
cache / HLOOpt / cuda / Forward 0.0000023670000000000004 s
cache / PartOpt / cuda / Forward 0.000002336 s
cache / IPartOpt / cuda / Forward 0.000002336 s
cache / DefOpt / cuda / Forward 0.0000023670000000000004 s
cache / IDefOpt / cuda / Forward 0.0000023670000000000004 s
cache / JaXPipe / cuda / PreRev 0.000011616 s
cache / JaXPipe / cuda / PostRev 0.000011425 s
cache / JaXPipe / cuda / BothRev 0.000011424 s
cache / Jax / cuda / BothRev 0.000011423 s
cache / HLOOpt / cuda / PreRev 0.000013727 s
cache / HLOOpt / cuda / PostRev 0.000013696 s
cache / HLOOpt / cuda / BothRev 0.000013728 s
cache / PartOpt / cuda / PreRev 0.000011072 s
cache / PartOpt / cuda / PostRev 0.000011296 s
cache / PartOpt / cuda / BothRev 0.000010911 s
cache / IPartOpt / cuda / PreRev 0.000011168 s
cache / IPartOpt / cuda / PostRev 0.00001104 s
cache / IPartOpt / cuda / BothRev 0.000010656 s
cache / DefOpt / cuda / PreRev 0.000010944 s
cache / DefOpt / cuda / PostRev 0.000011263 s
cache / DefOpt / cuda / BothRev 0.000011104 s
cache / IDefOpt / cuda / PreRev 0.000010976 s
cache / IDefOpt / cuda / PostRev 0.0000112 s
cache / IDefOpt / cuda / BothRev 0.00001056 s
cache / JaXPipe / tpu / Primal 0.000002471575 s 0.000002457375 s 1.01
cache / Jax / tpu / Primal 0.000002457325 s 0.0000024826 s 0.99
cache / HLOOpt / tpu / Primal 0.000002477925 s 0.0000024602 s 1.01
cache / PartOpt / tpu / Primal 0.0000024645500000000004 s 0.00000245655 s 1.00
cache / IPartOpt / tpu / Primal 0.0000024774 s 0.000002473975 s 1.00
cache / DefOpt / tpu / Primal 0.000002461075 s 0.000002445925 s 1.01
cache / IDefOpt / tpu / Primal 0.00000247445 s 0.000002467875 s 1.00
cache / JaXPipe / tpu / Forward 0.0000035455750000000004 s 0.0000035509 s 1.00
cache / Jax / tpu / Forward 0.00000354205 s 0.00000355365 s 1.00
cache / HLOOpt / tpu / Forward 0.00000353565 s 0.000003554675 s 0.99
cache / PartOpt / tpu / Forward 0.0000035289749999999995 s 0.000003536275 s 1.00
cache / IPartOpt / tpu / Forward 0.000003556375 s 0.0000035529500000000004 s 1.00
cache / DefOpt / tpu / Forward 0.00000352405 s 0.00000352805 s 1.00
cache / IDefOpt / tpu / Forward 0.00000355235 s 0.00000355375 s 1.00
cache / JaXPipe / tpu / PreRev 0.00000495065 s 0.0000049691500000000005 s 1.00
cache / JaXPipe / tpu / PostRev 0.00000497545 s 0.000004967775 s 1.00
cache / JaXPipe / tpu / BothRev 0.000004979374999999999 s 0.000004972925 s 1.00
cache / Jax / tpu / BothRev 0.00000498505 s 0.000004984625 s 1.00
cache / HLOOpt / tpu / PreRev 0.000003948575 s 0.000003951 s 1.00
cache / HLOOpt / tpu / PostRev 0.00000414235 s 0.000004137575 s 1.00
cache / HLOOpt / tpu / BothRev 0.000003937375 s 0.000003938075 s 1.00
cache / PartOpt / tpu / PreRev 0.000004981525 s 0.000005003675 s 1.00
cache / PartOpt / tpu / PostRev 0.000004992675 s 0.00000496145 s 1.01
cache / PartOpt / tpu / BothRev 0.00000498985 s 0.000004965275 s 1.00
cache / IPartOpt / tpu / PreRev 0.000004974799999999999 s 0.000004991749999999999 s 1.00
cache / IPartOpt / tpu / PostRev 0.000004970899999999999 s 0.00000496835 s 1.00
cache / IPartOpt / tpu / BothRev 0.000004947625 s 0.0000049548 s 1.00
cache / DefOpt / tpu / PreRev 0.000004987875 s 0.0000049717 s 1.00
cache / DefOpt / tpu / PostRev 0.0000049723 s 0.000004986725 s 1.00
cache / DefOpt / tpu / BothRev 0.0000049649 s 0.00000496795 s 1.00
cache / IDefOpt / tpu / PreRev 0.00000497315 s 0.00000498765 s 1.00
cache / IDefOpt / tpu / PostRev 0.000004976274999999999 s 0.000004972625 s 1.00
cache / IDefOpt / tpu / BothRev 0.000004969375 s 0.00000497715 s 1.00
cache / JaXPipe / cpu / Primal 0.00001281 s 0.000007056179956634878 s 1.82
cache / Jax / cpu / Primal 0.000012678 s 0.000007446979971064138 s 1.70
cache / HLOOpt / cpu / Primal 0.000012639 s 0.000006615619986405363 s 1.91
cache / PartOpt / cpu / Primal 0.000012657 s 0.000006719259981764481 s 1.88
cache / IPartOpt / cpu / Primal 0.000012754 s 0.000006971860020712484 s 1.83
cache / DefOpt / cpu / Primal 0.000012962 s 0.000007025120012258412 s 1.85
cache / IDefOpt / cpu / Primal 0.000013084 s 0.000007174279980972642 s 1.82
cache / JaXPipe / cpu / Forward 0.000017829 s 0.000014488840042758966 s 1.23
cache / Jax / cpu / Forward 0.000018526 s 0.000015192699947874644 s 1.22
cache / HLOOpt / cpu / Forward 0.000017856 s 0.000015517559959334903 s 1.15
cache / PartOpt / cpu / Forward 0.000018085 s 0.000014401319958778911 s 1.26
cache / IPartOpt / cpu / Forward 0.000017978 s 0.000015117139928406687 s 1.19
cache / DefOpt / cpu / Forward 0.000018006 s 0.000014252099936129523 s 1.26
cache / IDefOpt / cpu / Forward 0.000017606 s 0.000015947860019878136 s 1.10
cache / JaXPipe / cpu / PreRev 0.000018113 s 0.000017153660010080785 s 1.06
cache / JaXPipe / cpu / PostRev 0.000020813 s 0.00002086701995722251 s 1.00
cache / JaXPipe / cpu / BothRev 0.000019192 s 0.000016654460014251528 s 1.15
cache / Jax / cpu / BothRev 0.000030859000000000004 s 0.000020643239995479235 s 1.49
cache / HLOOpt / cpu / PreRev 0.000027816 s 0.000017784900019250928 s 1.56
cache / HLOOpt / cpu / PostRev 0.000026897 s 0.000021892259992455367 s 1.23
cache / HLOOpt / cpu / BothRev 0.000026521 s 0.000018382259986537976 s 1.44
cache / PartOpt / cpu / PreRev 0.000032443 s 0.000016693739980837562 s 1.94
cache / PartOpt / cpu / PostRev 0.000032167 s 0.000021782079984404844 s 1.48
cache / PartOpt / cpu / BothRev 0.000018749 s 0.000017551979999552715 s 1.07
cache / IPartOpt / cpu / PreRev 0.00002821 s 0.00001678380000157631 s 1.68
cache / IPartOpt / cpu / PostRev 0.000024467 s 0.000021706779980377176 s 1.13
cache / IPartOpt / cpu / BothRev 0.000027721 s 0.000017438879986002576 s 1.59
cache / DefOpt / cpu / PreRev 0.000027143 s 0.00001758438000251772 s 1.54
cache / DefOpt / cpu / PostRev 0.000027852 s 0.000016727100019124918 s 1.67
cache / DefOpt / cpu / BothRev 0.000024003 s 0.000016970419956123806 s 1.41
cache / IDefOpt / cpu / PreRev 0.000033742 s 0.000017113839985540834 s 1.97
cache / IDefOpt / cpu / PostRev 0.00002855 s 0.000016802999980427557 s 1.70
cache / IDefOpt / cpu / BothRev 0.000030528 s 0.000017829540001912392 s 1.71
Concat / JaXPipe / cpu / Primal 0.00000853854004162713 s 0.000007423600009133225 s 1.15
Concat / Jax / cpu / Primal 0.000008292900038213702 s 0.0000074543200298649024 s 1.11
Concat / HLOOpt / cpu / Primal 0.000008743599992158124 s 0.000007162399979279143 s 1.22
Concat / PartOpt / cpu / Primal 0.000008038120049604914 s 0.000007042840034046094 s 1.14
Concat / IPartOpt / cpu / Primal 0.000008651279968034942 s 0.000007088260008458746 s 1.22
Concat / DefOpt / cpu / Primal 0.000007556819946330507 s 0.000006931860007171053 s 1.09
Concat / IDefOpt / cpu / Primal 0.00000806989995908225 s 0.000006992560029175365 s 1.15
Concat / JaXPipe / cpu / Forward 0.000011782979981944663 s 0.000010997740000675549 s 1.07
Concat / Jax / cpu / Forward 0.00001173520002339501 s 0.00001100831997973728 s 1.07
Concat / HLOOpt / cpu / Forward 0.000012193139973533108 s 0.00001094812000701495 s 1.11
Concat / PartOpt / cpu / Forward 0.000011208300056750886 s 0.000010536919990045136 s 1.06
Concat / IPartOpt / cpu / Forward 0.00001135069998781546 s 0.000011385640036678523 s 1.00
Concat / DefOpt / cpu / Forward 0.000011503539990371792 s 0.00001138815999183862 s 1.01
Concat / IDefOpt / cpu / Forward 0.000011685179979394888 s 0.000011273079962847987 s 1.04
Concat / JaXPipe / cpu / PreRev 0.0000139744999796676 s 0.000012589880006999013 s 1.11
Concat / JaXPipe / cpu / PostRev 0.000013585540018539178 s 0.00001297502001762041 s 1.05
Concat / JaXPipe / cpu / BothRev 0.000013054519949946551 s 0.000012389000021357788 s 1.05
Concat / Jax / cpu / BothRev 0.000013700719991902587 s 0.000012437199993655667 s 1.10
Concat / HLOOpt / cpu / PreRev 0.000013570220035035164 s 0.00001298888004384935 s 1.04
Concat / HLOOpt / cpu / PostRev 0.0000151305399958801 s 0.000014651680039605708 s 1.03
Concat / HLOOpt / cpu / BothRev 0.000013433439908112631 s 0.000013292440025907126 s 1.01
Concat / PartOpt / cpu / PreRev 0.000013654339963977691 s 0.000012485759962146405 s 1.09
Concat / PartOpt / cpu / PostRev 0.000012869299989688443 s 0.00001300403999266564 s 0.99
Concat / PartOpt / cpu / BothRev 0.00001365785992675228 s 0.000013027780014454036 s 1.05
Concat / IPartOpt / cpu / PreRev 0.000013559300004999386 s 0.000011711039987858384 s 1.16
Concat / IPartOpt / cpu / PostRev 0.000012565239921968896 s 0.000013076560016997973 s 0.96
Concat / IPartOpt / cpu / BothRev 0.000013256500042189143 s 0.000013247099941509076 s 1.00
Concat / DefOpt / cpu / PreRev 0.000013137820042175008 s 0.000012478100006774183 s 1.05
Concat / DefOpt / cpu / PostRev 0.000013935439965280238 s 0.000013085759965179024 s 1.06
Concat / DefOpt / cpu / BothRev 0.0000132373799533525 s 0.000013082300038149697 s 1.01
Concat / IDefOpt / cpu / PreRev 0.000013549500035878736 s 0.000012021520005873751 s 1.13
Concat / IDefOpt / cpu / PostRev 0.000012924419988848968 s 0.000012947859986525145 s 1.00
Concat / IDefOpt / cpu / BothRev 0.000013414599980023924 s 0.000013117840017002892 s 1.02
Concat / JaXPipe / cuda / Primal 0.000002464 s
Concat / Jax / cuda / Primal 0.000002464 s
Concat / HLOOpt / cuda / Primal 0.000002463 s
Concat / PartOpt / cuda / Primal 0.000002463 s
Concat / IPartOpt / cuda / Primal 0.000002463 s
Concat / DefOpt / cuda / Primal 0.000002464 s
Concat / IDefOpt / cuda / Primal 0.000002463 s
Concat / JaXPipe / cuda / Forward 0.000012032 s
Concat / Jax / cuda / Forward 0.000011039 s
Concat / HLOOpt / cuda / Forward 0.000011712 s
Concat / PartOpt / cuda / Forward 0.000011104 s
Concat / IPartOpt / cuda / Forward 0.000011392 s
Concat / DefOpt / cuda / Forward 0.000011136 s
Concat / IDefOpt / cuda / Forward 0.000010688 s
Concat / JaXPipe / cuda / PreRev 0.00001728 s
Concat / JaXPipe / cuda / PostRev 0.000017664 s
Concat / JaXPipe / cuda / BothRev 0.000017152 s
Concat / Jax / cuda / BothRev 0.000017024 s
Concat / HLOOpt / cuda / PreRev 0.000019392 s
Concat / HLOOpt / cuda / PostRev 0.000017344 s
Concat / HLOOpt / cuda / BothRev 0.000017344 s
Concat / PartOpt / cuda / PreRev 0.000017056 s
Concat / PartOpt / cuda / PostRev 0.000017919999999999998 s
Concat / PartOpt / cuda / BothRev 0.000017184 s
Concat / IPartOpt / cuda / PreRev 0.000017536 s
Concat / IPartOpt / cuda / PostRev 0.000017632 s
Concat / IPartOpt / cuda / BothRev 0.000017503999999999997 s
Concat / DefOpt / cuda / PreRev 0.000017472 s
Concat / DefOpt / cuda / PostRev 0.000017119 s
Concat / DefOpt / cuda / BothRev 0.000016993 s
Concat / IDefOpt / cuda / PreRev 0.000017503999999999997 s
Concat / IDefOpt / cuda / PostRev 0.000017344 s
Concat / IDefOpt / cuda / BothRev 0.000017824 s
Concat / JaXPipe / tpu / Primal 0.000001482075 s 0.0000014889749999999998 s 1.00
Concat / Jax / tpu / Primal 0.0000014892 s 0.000001478325 s 1.01
Concat / HLOOpt / tpu / Primal 0.000001480825 s 0.0000014868749999999998 s 1.00
Concat / PartOpt / tpu / Primal 0.000001482 s 0.0000014743999999999998 s 1.01
Concat / IPartOpt / tpu / Primal 0.00000148455 s 0.0000014854999999999998 s 1.00
Concat / DefOpt / tpu / Primal 0.000001482225 s 0.0000014769 s 1.00
Concat / IDefOpt / tpu / Primal 0.000001487575 s 0.0000014824 s 1.00
Concat / JaXPipe / tpu / Forward 0.000001541525 s 0.0000015397500000000002 s 1.00
Concat / Jax / tpu / Forward 0.0000015307 s 0.0000015126500000000002 s 1.01
Concat / HLOOpt / tpu / Forward 0.0000015294499999999998 s 0.00000154435 s 0.99
Concat / PartOpt / tpu / Forward 0.00000152005 s 0.000001529425 s 0.99
Concat / IPartOpt / tpu / Forward 0.0000015528249999999998 s 0.000001542375 s 1.01
Concat / DefOpt / tpu / Forward 0.000001526125 s 0.0000015228500000000002 s 1.00
Concat / IDefOpt / tpu / Forward 0.0000015443000000000002 s 0.0000015567 s 0.99
Concat / JaXPipe / tpu / PreRev 0.000001965475 s 0.000001959925 s 1.00
Concat / JaXPipe / tpu / PostRev 0.000002038725 s 0.0000020423 s 1.00
Concat / JaXPipe / tpu / BothRev 0.000001965125 s 0.0000019545 s 1.01
Concat / Jax / tpu / BothRev 0.0000020292 s 0.00000202485 s 1.00
Concat / HLOOpt / tpu / PreRev 0.00000197145 s 0.00000195365 s 1.01
Concat / HLOOpt / tpu / PostRev 0.000002020575 s 0.0000020227 s 1.00
Concat / HLOOpt / tpu / BothRev 0.000001960375 s 0.000001956225 s 1.00
Concat / PartOpt / tpu / PreRev 0.0000020289 s 0.000002033975 s 1.00
Concat / PartOpt / tpu / PostRev 0.0000019727500000000003 s 0.00000196165 s 1.01
Concat / PartOpt / tpu / BothRev 0.000002029 s 0.000002028325 s 1.00
Concat / IPartOpt / tpu / PreRev 0.000001954825 s 0.00000196235 s 1.00
Concat / IPartOpt / tpu / PostRev 0.000002022225 s 0.0000020308750000000003 s 1.00
Concat / IPartOpt / tpu / BothRev 0.00000195565 s 0.000001966275 s 0.99
Concat / DefOpt / tpu / PreRev 0.000002025225 s 0.0000020217 s 1.00
Concat / DefOpt / tpu / PostRev 0.00000196195 s 0.0000019582 s 1.00
Concat / DefOpt / tpu / BothRev 0.000002034975 s 0.000002024975 s 1.00
Concat / IDefOpt / tpu / PreRev 0.000001968425 s 0.0000019659 s 1.00
Concat / IDefOpt / tpu / PostRev 0.0000020218 s 0.000002021575 s 1.00
Concat / IDefOpt / tpu / BothRev 0.000001962975 s 0.000001958225 s 1.00
Concat / JaXPipe / cpu / Primal 0.000012891 s 0.000007423600009133225 s 1.74
Concat / Jax / cpu / Primal 0.000013159 s 0.0000074543200298649024 s 1.77
Concat / HLOOpt / cpu / Primal 0.000012781 s 0.000007162399979279143 s 1.78
Concat / PartOpt / cpu / Primal 0.000013001 s 0.000007042840034046094 s 1.85
Concat / IPartOpt / cpu / Primal 0.000012987 s 0.000007088260008458746 s 1.83
Concat / DefOpt / cpu / Primal 0.000013385 s 0.000006931860007171053 s 1.93
Concat / IDefOpt / cpu / Primal 0.000012923 s 0.000006992560029175365 s 1.85
Concat / JaXPipe / cpu / Forward 0.000018111 s 0.000010997740000675549 s 1.65
Concat / Jax / cpu / Forward 0.000018073 s 0.00001100831997973728 s 1.64
Concat / HLOOpt / cpu / Forward 0.000017422 s 0.00001094812000701495 s 1.59
Concat / PartOpt / cpu / Forward 0.000017912 s 0.000010536919990045136 s 1.70
Concat / IPartOpt / cpu / Forward 0.000017475 s 0.000011385640036678523 s 1.53
Concat / DefOpt / cpu / Forward 0.000018222 s 0.00001138815999183862 s 1.60
Concat / IDefOpt / cpu / Forward 0.000018261 s 0.000011273079962847987 s 1.62
Concat / JaXPipe / cpu / PreRev 0.000020716 s 0.000012589880006999013 s 1.65
Concat / JaXPipe / cpu / PostRev 0.000020142 s 0.00001297502001762041 s 1.55
Concat / JaXPipe / cpu / BothRev 0.000020054 s 0.000012389000021357788 s 1.62
Concat / Jax / cpu / BothRev 0.000020273 s 0.000012437199993655667 s 1.63
Concat / HLOOpt / cpu / PreRev 0.000020265 s 0.00001298888004384935 s 1.56
Concat / HLOOpt / cpu / PostRev 0.000020156 s 0.000014651680039605708 s 1.38
Concat / HLOOpt / cpu / BothRev 0.000019525 s 0.000013292440025907126 s 1.47
Concat / PartOpt / cpu / PreRev 0.000020162 s 0.000012485759962146405 s 1.61
Concat / PartOpt / cpu / PostRev 0.000020088 s 0.00001300403999266564 s 1.54
Concat / PartOpt / cpu / BothRev 0.000019946 s 0.000013027780014454036 s 1.53
Concat / IPartOpt / cpu / PreRev 0.000020458 s 0.000011711039987858384 s 1.75
Concat / IPartOpt / cpu / PostRev 0.000019953 s 0.000013076560016997973 s 1.53
Concat / IPartOpt / cpu / BothRev 0.000019306 s 0.000013247099941509076 s 1.46
Concat / DefOpt / cpu / PreRev 0.000020191 s 0.000012478100006774183 s 1.62
Concat / DefOpt / cpu / PostRev 0.000020127 s 0.000013085759965179024 s 1.54
Concat / DefOpt / cpu / BothRev 0.000019954 s 0.000013082300038149697 s 1.53
Concat / IDefOpt / cpu / PreRev 0.000020334 s 0.000012021520005873751 s 1.69
Concat / IDefOpt / cpu / PostRev 0.000019702 s 0.000012947859986525145 s 1.52
Concat / IDefOpt / cpu / BothRev 0.00001972 s 0.000013117840017002892 s 1.50
const_scatter / JaXPipe / cpu / Primal 0.000008534140015399316 s 0.000006994200020926655 s 1.22
const_scatter / Jax / cpu / Primal 0.000008727180047571891 s 0.000006940659932297421 s 1.26
const_scatter / HLOOpt / cpu / Primal 0.000009245859891962028 s 0.000007303180018425337 s 1.27
const_scatter / PartOpt / cpu / Primal 0.000007525980072387028 s 0.000006964520007386455 s 1.08
const_scatter / IPartOpt / cpu / Primal 0.000007913200061011594 s 0.000007651199975953204 s 1.03
const_scatter / DefOpt / cpu / Primal 0.000008409620058955624 s 0.000007847279985071509 s 1.07
const_scatter / IDefOpt / cpu / Primal 0.000008935640071285889 s 0.000007382480007436243 s 1.21
const_scatter / JaXPipe / cpu / Forward 0.000012267279889783824 s 0.000011513020008351304 s 1.07
const_scatter / Jax / cpu / Forward 0.000011286559874861269 s 0.000010891859983530592 s 1.04
const_scatter / HLOOpt / cpu / Forward 0.000012388600007398054 s 0.000011693440001181443 s 1.06
const_scatter / PartOpt / cpu / Forward 0.000011982700052612926 s 0.000011965339990638312 s 1.00
const_scatter / IPartOpt / cpu / Forward 0.000012739320081891492 s 0.000012006939978164154 s 1.06
const_scatter / DefOpt / cpu / Forward 0.00001239565997821046 s 0.000011777100016843178 s 1.05
const_scatter / IDefOpt / cpu / Forward 0.00001212765993841458 s 0.000011975979950875628 s 1.01
const_scatter / JaXPipe / cpu / PreRev 0.0002929876999587 s 0.0002884649599764 s 1.02
const_scatter / JaXPipe / cpu / PostRev 0.0002850323799975 s 0.0002808212600211 s 1.01
const_scatter / JaXPipe / cpu / BothRev 0.0002856764399803 s 0.0002820423199955 s 1.01
const_scatter / Jax / cpu / BothRev 0.0002854121200471 s 0.0002806260200213 s 1.02
const_scatter / HLOOpt / cpu / PreRev 0.00028734857995 s 0.0002817600599973 s 1.02
const_scatter / HLOOpt / cpu / PostRev 0.0002885129200694 s 0.0002844262800044 s 1.01
const_scatter / HLOOpt / cpu / BothRev 0.0003000291000353 s 0.0002816947199698 s 1.07
const_scatter / PartOpt / cpu / PreRev 0.0002900109400252 s 0.0002816291600174 s 1.03
const_scatter / PartOpt / cpu / PostRev 0.0002845576200525 s 0.0002827880800123 s 1.01
const_scatter / PartOpt / cpu / BothRev 0.0002947422400029 s 0.0002830636999988 s 1.04
const_scatter / IPartOpt / cpu / PreRev 0.0002870086599614 s 0.0002845580800021 s 1.01
const_scatter / IPartOpt / cpu / PostRev 0.0003003155798978 s 0.0002829547199962 s 1.06
const_scatter / IPartOpt / cpu / BothRev 0.0002866614999948 s 0.0002849848400182 s 1.01
const_scatter / DefOpt / cpu / PreRev 0.0002873075599563 s 0.0002842204000171 s 1.01
const_scatter / DefOpt / cpu / PostRev 0.0002864068999951 s 0.0002823248199911 s 1.01
const_scatter / DefOpt / cpu / BothRev 0.0002866443600032 s 0.0002831766800045 s 1.01
const_scatter / IDefOpt / cpu / PreRev 0.0002865423200091 s 0.0002833324000312 s 1.01
const_scatter / IDefOpt / cpu / PostRev 0.0002881191400228 s 0.000286672539969 s 1.01
const_scatter / IDefOpt / cpu / BothRev 0.00028581397999 s 0.0002846314399539 s 1.00
const_scatter / JaXPipe / cuda / Primal 0.000002463 s
const_scatter / Jax / cuda / Primal 0.000002463 s
const_scatter / HLOOpt / cuda / Primal 0.000002463 s
const_scatter / PartOpt / cuda / Primal 0.000002463 s
const_scatter / IPartOpt / cuda / Primal 0.000002463 s
const_scatter / DefOpt / cuda / Primal 0.000002463 s
const_scatter / IDefOpt / cuda / Primal 0.000002464 s
const_scatter / JaXPipe / cuda / Forward 0.000010944 s
const_scatter / Jax / cuda / Forward 0.000010816 s
const_scatter / HLOOpt / cuda / Forward 0.00001088 s
const_scatter / PartOpt / cuda / Forward 0.000011072 s
const_scatter / IPartOpt / cuda / Forward 0.000011104 s
const_scatter / DefOpt / cuda / Forward 0.0000112 s
const_scatter / IDefOpt / cuda / Forward 0.000013472 s
const_scatter / JaXPipe / cuda / PreRev 0.000017984 s
const_scatter / JaXPipe / cuda / PostRev 0.000017663 s
const_scatter / JaXPipe / cuda / BothRev 0.000017792 s
const_scatter / Jax / cuda / BothRev 0.000017888000000000002 s
const_scatter / HLOOpt / cuda / PreRev 0.000017663 s
const_scatter / HLOOpt / cuda / PostRev 0.0000176 s
const_scatter / HLOOpt / cuda / BothRev 0.000017503999999999997 s
const_scatter / PartOpt / cuda / PreRev 0.000018176 s
const_scatter / PartOpt / cuda / PostRev 0.000020191 s
const_scatter / PartOpt / cuda / BothRev 0.000017632 s
const_scatter / IPartOpt / cuda / PreRev 0.000017696 s
const_scatter / IPartOpt / cuda / PostRev 0.000017632 s
const_scatter / IPartOpt / cuda / BothRev 0.000017312 s
const_scatter / DefOpt / cuda / PreRev 0.000017984 s
const_scatter / DefOpt / cuda / PostRev 0.000017216 s
const_scatter / DefOpt / cuda / BothRev 0.000017088 s
const_scatter / IDefOpt / cuda / PreRev 0.000017824 s
const_scatter / IDefOpt / cuda / PostRev 0.000017664 s
const_scatter / IDefOpt / cuda / BothRev 0.000017536 s
const_scatter / JaXPipe / tpu / Primal 0.0000037999 s 0.00000379185 s 1.00
const_scatter / Jax / tpu / Primal 0.000003812825 s 0.00000380515 s 1.00
const_scatter / HLOOpt / tpu / Primal 0.0000037971 s 0.0000037833 s 1.00
const_scatter / PartOpt / tpu / Primal 0.000003828325 s 0.000003808975 s 1.01
const_scatter / IPartOpt / tpu / Primal 0.000003802575 s 0.00000381 s 1.00
const_scatter / DefOpt / tpu / Primal 0.000003825025 s 0.000003826425 s 1.00
const_scatter / IDefOpt / tpu / Primal 0.00000379495 s 0.00000378795 s 1.00
const_scatter / JaXPipe / tpu / Forward 0.000006450725000000001 s 0.000006485174999999999 s 0.99
const_scatter / Jax / tpu / Forward 0.000006507800000000001 s 0.000006492449999999999 s 1.00
const_scatter / HLOOpt / tpu / Forward 0.000006491 s 0.000006475 s 1.00
const_scatter / PartOpt / tpu / Forward 0.000006496775 s 0.000006491749999999999 s 1.00
const_scatter / IPartOpt / tpu / Forward 0.000006465375 s 0.000006475925 s 1.00
const_scatter / DefOpt / tpu / Forward 0.00000650065 s 0.000006485675 s 1.00
const_scatter / IDefOpt / tpu / Forward 0.00000648715 s 0.000006472849999999999 s 1.00
const_scatter / JaXPipe / tpu / PreRev 0.000006646025 s 0.000006628175 s 1.00
const_scatter / JaXPipe / tpu / PostRev 0.0000066257 s 0.0000066119 s 1.00
const_scatter / JaXPipe / tpu / BothRev 0.000006633050000000001 s 0.000006620675 s 1.00
const_scatter / Jax / tpu / BothRev 0.000006648625 s 0.000006623175 s 1.00
const_scatter / HLOOpt / tpu / PreRev 0.000006609075 s 0.000006610975 s 1.00
const_scatter / HLOOpt / tpu / PostRev 0.000006633525 s 0.00000663025 s 1.00
const_scatter / HLOOpt / tpu / BothRev 0.0000066106 s 0.000006617225 s 1.00
const_scatter / PartOpt / tpu / PreRev 0.00000663915 s 0.000006633499999999999 s 1.00
const_scatter / PartOpt / tpu / PostRev 0.000006620575 s 0.0000066066 s 1.00
const_scatter / PartOpt / tpu / BothRev 0.000006626075000000001 s 0.0000066218 s 1.00
const_scatter / IPartOpt / tpu / PreRev 0.0000066152 s 0.000006618175 s 1.00
const_scatter / IPartOpt / tpu / PostRev 0.00000663185 s 0.000006642824999999999 s 1.00
const_scatter / IPartOpt / tpu / BothRev 0.000006631550000000001 s 0.000006603299999999999 s 1.00
const_scatter / DefOpt / tpu / PreRev 0.000006621325 s 0.000006625575 s 1.00
const_scatter / DefOpt / tpu / PostRev 0.0000066306250000000006 s 0.000006600549999999999 s 1.00
const_scatter / DefOpt / tpu / BothRev 0.000006641025 s 0.000006618275 s 1.00
const_scatter / IDefOpt / tpu / PreRev 0.000006634975 s 0.00000661355 s 1.00
const_scatter / IDefOpt / tpu / PostRev 0.000006633125 s 0.000006639000000000001 s 1.00
const_scatter / IDefOpt / tpu / BothRev 0.000006616975 s 0.000006601775 s 1.00
const_scatter / JaXPipe / cpu / Primal 0.000013002 s 0.000006994200020926655 s 1.86
const_scatter / Jax / cpu / Primal 0.000012712 s 0.000006940659932297421 s 1.83
const_scatter / HLOOpt / cpu / Primal 0.000013778 s 0.000007303180018425337 s 1.89
const_scatter / PartOpt / cpu / Primal 0.000012548 s 0.000006964520007386455 s 1.80
const_scatter / IPartOpt / cpu / Primal 0.000012967 s 0.000007651199975953204 s 1.69
const_scatter / DefOpt / cpu / Primal 0.00001321 s 0.000007847279985071509 s 1.68
const_scatter / IDefOpt / cpu / Primal 0.000013255 s 0.000007382480007436243 s 1.80
const_scatter / JaXPipe / cpu / Forward 0.000018548 s 0.000011513020008351304 s 1.61
const_scatter / Jax / cpu / Forward 0.000016902000000000002 s 0.000010891859983530592 s 1.55
const_scatter / HLOOpt / cpu / Forward 0.0000183 s 0.000011693440001181443 s 1.56
const_scatter / PartOpt / cpu / Forward 0.000017987 s 0.000011965339990638312 s 1.50
const_scatter / IPartOpt / cpu / Forward 0.000018016 s 0.000012006939978164154 s 1.50
const_scatter / DefOpt / cpu / Forward 0.000017943 s 0.000011777100016843178 s 1.52
const_scatter / IDefOpt / cpu / Forward 0.000018372 s 0.000011975979950875628 s 1.53
const_scatter / JaXPipe / cpu / PreRev 0.000520496 s 0.0002884649599764 s 1.80
const_scatter / JaXPipe / cpu / PostRev 0.000504577 s 0.0002808212600211 s 1.80
const_scatter / JaXPipe / cpu / BothRev 0.000522641 s 0.0002820423199955 s 1.85
const_scatter / Jax / cpu / BothRev 0.000499801 s 0.0002806260200213 s 1.78
const_scatter / HLOOpt / cpu / PreRev 0.000505684 s 0.0002817600599973 s 1.79
const_scatter / HLOOpt / cpu / PostRev 0.0004927369999999 s 0.0002844262800044 s 1.73
const_scatter / HLOOpt / cpu / BothRev 0.000514085 s 0.0002816947199698 s 1.82
const_scatter / PartOpt / cpu / PreRev 0.0005327159999999 s 0.0002816291600174 s 1.89
const_scatter / PartOpt / cpu / PostRev 0.0005184569999999 s 0.0002827880800123 s 1.83
const_scatter / PartOpt / cpu / BothRev 0.000538382 s 0.0002830636999988 s 1.90
const_scatter / IPartOpt / cpu / PreRev 0.000520047 s 0.0002845580800021 s 1.83
const_scatter / IPartOpt / cpu / PostRev 0.000520562 s 0.0002829547199962 s 1.84
const_scatter / IPartOpt / cpu / BothRev 0.000524372 s 0.0002849848400182 s 1.84
const_scatter / DefOpt / cpu / PreRev 0.000524334 s 0.0002842204000171 s 1.84
const_scatter / DefOpt / cpu / PostRev 0.000521496 s 0.0002823248199911 s 1.85
const_scatter / DefOpt / cpu / BothRev 0.000517248 s 0.0002831766800045 s 1.83
const_scatter / IDefOpt / cpu / PreRev 0.000543275 s 0.0002833324000312 s 1.92
const_scatter / IDefOpt / cpu / PostRev 0.0005287699999999 s 0.000286672539969 s 1.84
const_scatter / IDefOpt / cpu / BothRev 0.0005252099999999 s 0.0002846314399539 s 1.85
GenDot / JaXPipe / cpu / Primal 0.000008958719990914688 s 0.000008582359987485688 s 1.04
GenDot / Jax / cpu / Primal 0.00000837783993119956 s 0.000008547680045012384 s 0.98
GenDot / HLOOpt / cpu / Primal 0.000009342199937236727 s 0.000009023879938467871 s 1.04
GenDot / PartOpt / cpu / Primal 0.000008772699984547217 s 0.000007385940016320091 s 1.19
GenDot / IPartOpt / cpu / Primal 0.00000909473998035537 s 0.000007326580034714425 s 1.24
GenDot / DefOpt / cpu / Primal 0.000009483880076004423 s 0.000008422580012847902 s 1.13
GenDot / IDefOpt / cpu / Primal 0.00000911121989702224 s 0.000008287680002467823 s 1.10
GenDot / JaXPipe / cpu / Forward 0.000012151399969297927 s 0.00001278026004001731 s 0.95
GenDot / Jax / cpu / Forward 0.000011724580035661347 s 0.000011590759977480048 s 1.01
GenDot / HLOOpt / cpu / Forward 0.000012378199917293388 s 0.000012356199958958311 s 1.00
GenDot / PartOpt / cpu / Forward 0.000012167280019639292 s 0.000012064479988111996 s 1.01
GenDot / IPartOpt / cpu / Forward 0.000012694980068772563 s 0.0000124142200093047 s 1.02
GenDot / DefOpt / cpu / Forward 0.000012157339951954784 s 0.000011938679990635138 s 1.02
GenDot / IDefOpt / cpu / Forward 0.000012216939958307193 s 0.000012493860003814916 s 0.98
GenDot / JaXPipe / cpu / PreRev 0.000012620739962585505 s 0.000012100460007786751 s 1.04
GenDot / JaXPipe / cpu / PostRev 0.000011619399992923718 s 0.000011199239979760025 s 1.04
GenDot / JaXPipe / cpu / BothRev 0.000013036560067121171 s 0.000013475320001816726 s 0.97
GenDot / Jax / cpu / BothRev 0.000012043740025546868 s 0.00001128350001636136 s 1.07
GenDot / HLOOpt / cpu / PreRev 0.000012961479951627552 s 0.00001207449995490606 s 1.07
GenDot / HLOOpt / cpu / PostRev 0.000014781340014451417 s 0.000013877620003768243 s 1.07
GenDot / HLOOpt / cpu / BothRev 0.000012422619984135962 s 0.000012398079988997778 s 1.00
GenDot / PartOpt / cpu / PreRev 0.000012376739960018313 s 0.000011878660006914288 s 1.04
GenDot / PartOpt / cpu / PostRev 0.00001177330002974486 s 0.000011188879989276755 s 1.05
GenDot / PartOpt / cpu / BothRev 0.000013029579986323367 s 0.00001272072001484048 s 1.02
GenDot / IPartOpt / cpu / PreRev 0.000012492900023062248 s 0.000012142459981987483 s 1.03
GenDot / IPartOpt / cpu / PostRev 0.000011422539992054223 s 0.00001083865999135014 s 1.05
GenDot / IPartOpt / cpu / BothRev 0.000012697400015895256 s 0.00001177546005237673 s 1.08
GenDot / DefOpt / cpu / PreRev 0.000012311920090724016 s 0.000012637859963433585 s 0.97
GenDot / DefOpt / cpu / PostRev 0.0000126659199486312 s 0.000012783999991370366 s 0.99
GenDot / DefOpt / cpu / BothRev 0.000012756619998981476 s 0.000013002139994569006 s 0.98
GenDot / IDefOpt / cpu / PreRev 0.000012268939972273074 s 0.00001264439996703004 s 0.97
GenDot / IDefOpt / cpu / PostRev 0.0000127633200645505 s 0.000011830120001832256 s 1.08
GenDot / IDefOpt / cpu / BothRev 0.000011982239921053403 s 0.00001169350002783176 s 1.02
GenDot / JaXPipe / cuda / Primal 0.000002527 s
GenDot / Jax / cuda / Primal 0.000002528 s
GenDot / HLOOpt / cuda / Primal 0.000002527 s
GenDot / PartOpt / cuda / Primal 0.00000256 s
GenDot / IPartOpt / cuda / Primal 0.000002559 s
GenDot / DefOpt / cuda / Primal 0.000002528 s
GenDot / IDefOpt / cuda / Primal 0.000002527 s
GenDot / JaXPipe / cuda / Forward 0.0000128 s
GenDot / Jax / cuda / Forward 0.000012128 s
GenDot / HLOOpt / cuda / Forward 0.000010944 s
GenDot / PartOpt / cuda / Forward 0.000010848 s
GenDot / IPartOpt / cuda / Forward 0.000011935 s
GenDot / DefOpt / cuda / Forward 0.000012352 s
GenDot / IDefOpt / cuda / Forward 0.00001056 s
GenDot / JaXPipe / cuda / PreRev 0.00001088 s
GenDot / JaXPipe / cuda / PostRev 0.000010753 s
GenDot / JaXPipe / cuda / BothRev 0.000011936 s
GenDot / Jax / cuda / BothRev 0.000010784 s
GenDot / HLOOpt / cuda / PreRev 0.000010816 s
GenDot / HLOOpt / cuda / PostRev 0.00001104 s
GenDot / HLOOpt / cuda / BothRev 0.000010848 s
GenDot / PartOpt / cuda / PreRev 0.000010656 s
GenDot / PartOpt / cuda / PostRev 0.00001072 s
GenDot / PartOpt / cuda / BothRev 0.000011008 s
GenDot / IPartOpt / cuda / PreRev 0.000010752 s
GenDot / IPartOpt / cuda / PostRev 0.000011104 s
GenDot / IPartOpt / cuda / BothRev 0.00001184 s
GenDot / DefOpt / cuda / PreRev 0.000011008 s
GenDot / DefOpt / cuda / PostRev 0.000012031 s
GenDot / DefOpt / cuda / BothRev 0.000010752 s
GenDot / IDefOpt / cuda / PreRev 0.000010976 s
GenDot / IDefOpt / cuda / PostRev 0.000011008 s
GenDot / IDefOpt / cuda / BothRev 0.000010784 s
GenDot / JaXPipe / tpu / Primal 9.302e-7 s 9.2965e-7 s 1.00
GenDot / Jax / tpu / Primal 9.258e-7 s 9.25425e-7 s 1.00
GenDot / HLOOpt / tpu / Primal 0.00000158495 s 0.000001571425 s 1.01
GenDot / PartOpt / tpu / Primal 9.255e-7 s 9.26075e-7 s 1.00
GenDot / IPartOpt / tpu / Primal 9.3035e-7 s 9.3045e-7 s 1.00
GenDot / DefOpt / tpu / Primal 0.000001496475 s 0.0000014878 s 1.01
GenDot / IDefOpt / tpu / Primal 0.000001575875 s 0.0000015664 s 1.01
GenDot / JaXPipe / tpu / Forward 0.0000031652750000000004 s 0.0000031493500000000006 s 1.01
GenDot / Jax / tpu / Forward 0.000002319425 s 0.00000232515 s 1.00
GenDot / HLOOpt / tpu / Forward 0.0000031127 s 0.00000311025 s 1.00
GenDot / PartOpt / tpu / Forward 0.0000032258750000000003 s 0.000003215475 s 1.00
GenDot / IPartOpt / tpu / Forward 0.00000311305 s 0.0000031061 s 1.00
GenDot / DefOpt / tpu / Forward 0.000003216475 s 0.000003208275 s 1.00
GenDot / IDefOpt / tpu / Forward 0.0000031163 s 0.00000311325 s 1.00
GenDot / JaXPipe / tpu / PreRev 0.000002963075 s 0.000002957625 s 1.00
GenDot / JaXPipe / tpu / PostRev 0.00000241275 s 0.000002414325 s 1.00
GenDot / JaXPipe / tpu / BothRev 0.0000029551250000000004 s 0.0000029555 s 1.00
GenDot / Jax / tpu / BothRev 0.0000024067 s 0.000002399625 s 1.00
GenDot / HLOOpt / tpu / PreRev 0.000002965875 s 0.0000029610750000000003 s 1.00
GenDot / HLOOpt / tpu / PostRev 0.000002945975 s 0.000002922925 s 1.01
GenDot / HLOOpt / tpu / BothRev 0.0000029622 s 0.000002958475 s 1.00
GenDot / PartOpt / tpu / PreRev 0.00000293435 s 0.00000294585 s 1.00
GenDot / PartOpt / tpu / PostRev 0.0000023909500000000004 s 0.00000239025 s 1.00
GenDot / PartOpt / tpu / BothRev 0.000002936775 s 0.000002932625 s 1.00
GenDot / IPartOpt / tpu / PreRev 0.000002952925 s 0.0000029512749999999995 s 1.00
GenDot / IPartOpt / tpu / PostRev 0.0000024127250000000003 s 0.00000239925 s 1.01
GenDot / IPartOpt / tpu / BothRev 0.000002964375 s 0.0000029439500000000004 s 1.01
GenDot / DefOpt / tpu / PreRev 0.00000293365 s 0.000002926975 s 1.00
GenDot / DefOpt / tpu / PostRev 0.000002962375 s 0.000002960325 s 1.00
GenDot / DefOpt / tpu / BothRev 0.00000294765 s 0.000002949375 s 1.00
GenDot / IDefOpt / tpu / PreRev 0.000002959975 s 0.00000296815 s 1.00
GenDot / IDefOpt / tpu / PostRev 0.0000029458250000000003 s 0.0000029295 s 1.01
GenDot / IDefOpt / tpu / BothRev 0.000002960425 s 0.0000029574000000000003 s 1.00
GenDot / JaXPipe / cpu / Primal 0.000015907000000000002 s 0.000008582359987485688 s 1.85
GenDot / Jax / cpu / Primal 0.000015581 s 0.000008547680045012384 s 1.82
GenDot / HLOOpt / cpu / Primal 0.000014864 s 0.000009023879938467871 s 1.65
GenDot / PartOpt / cpu / Primal 0.000015055 s 0.000007385940016320091 s 2.04
GenDot / IPartOpt / cpu / Primal 0.000015185 s 0.000007326580034714425 s 2.07
GenDot / DefOpt / cpu / Primal 0.000014223 s 0.000008422580012847902 s 1.69
GenDot / IDefOpt / cpu / Primal 0.000014034 s 0.000008287680002467823 s 1.69
GenDot / JaXPipe / cpu / Forward 0.00001935 s 0.00001278026004001731 s 1.51
GenDot / Jax / cpu / Forward 0.000020922 s 0.000011590759977480048 s 1.81
GenDot / HLOOpt / cpu / Forward 0.000019004 s 0.000012356199958958311 s 1.54
GenDot / PartOpt / cpu / Forward 0.000019408 s 0.000012064479988111996 s 1.61
GenDot / IPartOpt / cpu / Forward 0.000019388 s 0.0000124142200093047 s 1.56
GenDot / DefOpt / cpu / Forward 0.000020075 s 0.000011938679990635138 s 1.68
GenDot / IDefOpt / cpu / Forward 0.000019623 s 0.000012493860003814916 s 1.57
GenDot / JaXPipe / cpu / PreRev 0.000020583 s 0.000012100460007786751 s 1.70
GenDot / JaXPipe / cpu / PostRev 0.000021388 s 0.000011199239979760025 s 1.91
GenDot / JaXPipe / cpu / BothRev 0.00002031 s 0.000013475320001816726 s 1.51
GenDot / Jax / cpu / BothRev 0.000021784 s 0.00001128350001636136 s 1.93
GenDot / HLOOpt / cpu / PreRev 0.000019542 s 0.00001207449995490606 s 1.62
GenDot / HLOOpt / cpu / PostRev 0.000019826 s 0.000013877620003768243 s 1.43
GenDot / HLOOpt / cpu / BothRev 0.000020097 s 0.000012398079988997778 s 1.62
GenDot / PartOpt / cpu / PreRev 0.000020538 s 0.000011878660006914288 s 1.73
GenDot / PartOpt / cpu / PostRev 0.00002045 s 0.000011188879989276755 s 1.83
GenDot / PartOpt / cpu / BothRev 0.000020327 s 0.00001272072001484048 s 1.60
GenDot / IPartOpt / cpu / PreRev 0.000020169 s 0.000012142459981987483 s 1.66
GenDot / IPartOpt / cpu / PostRev 0.000021024 s 0.00001083865999135014 s 1.94
GenDot / IPartOpt / cpu / BothRev 0.000019929 s 0.00001177546005237673 s 1.69
GenDot / DefOpt / cpu / PreRev 0.000019617 s 0.000012637859963433585 s 1.55
GenDot / DefOpt / cpu / PostRev 0.000020167 s 0.000012783999991370366 s 1.58
GenDot / DefOpt / cpu / BothRev 0.000020332 s 0.000013002139994569006 s 1.56
GenDot / IDefOpt / cpu / PreRev 0.000019705 s 0.00001264439996703004 s 1.56
GenDot / IDefOpt / cpu / PostRev 0.000020243 s 0.000011830120001832256 s 1.71
GenDot / IDefOpt / cpu / BothRev 0.000019639 s 0.00001169350002783176 s 1.68
hlo_ffi / JaXPipe / cpu / Primal 0.000010924319958576234 s 0.000010208559961029096 s 1.07
hlo_ffi / Jax / cpu / Primal 0.00001144100000601611 s 0.000009593339964339977 s 1.19
hlo_ffi / HLOOpt / cpu / Primal 0.000010675880075723398 s 0.00001168801994936075 s 0.91
hlo_ffi / PartOpt / cpu / Primal 0.000010641139997460414 s 0.000009252899981220252 s 1.15
hlo_ffi / IPartOpt / cpu / Primal 0.000011026320007658795 s 0.000009737240015965653 s 1.13
hlo_ffi / DefOpt / cpu / Primal 0.000010748799995781155 s 0.000009807599963096435 s 1.10
hlo_ffi / IDefOpt / cpu / Primal 0.000010324019949621289 s 0.000009679939939815083 s 1.07
hlo_ffi / JaXPipe / cpu / Forward 0.000015504019993386466 s 0.000013769279967164038 s 1.13
hlo_ffi / Jax / cpu / Forward 0.000015015400076663357 s 0.000013657240015163551 s 1.10
hlo_ffi / HLOOpt / cpu / Forward 0.000015290279970940902 s 0.00001393840000673663 s 1.10
hlo_ffi / PartOpt / cpu / Forward 0.000014986740025051404 s 0.000013640720017065176 s 1.10
hlo_ffi / IPartOpt / cpu / Forward 0.00001530562003608793 s 0.00001365507996524684 s 1.12
hlo_ffi / DefOpt / cpu / Forward 0.000015445160115632462 s 0.000013563320017055958 s 1.14
hlo_ffi / IDefOpt / cpu / Forward 0.000014876340046612312 s 0.000013545380024879703 s 1.10
hlo_ffi / JaXPipe / cpu / PreRev 0.000015881700073805403 s 0.000014215600040188291 s 1.12
hlo_ffi / JaXPipe / cpu / PostRev 0.00001554423995912657 s 0.000014195959975040753 s 1.09
hlo_ffi / JaXPipe / cpu / BothRev 0.000014963959984015672 s 0.000013981700012664078 s 1.07
hlo_ffi / Jax / cpu / BothRev 0.00001568682004290167 s 0.000013890259988329487 s 1.13
hlo_ffi / HLOOpt / cpu / PreRev 0.000016117940085678128 s 0.000014438219986914191 s 1.12
hlo_ffi / HLOOpt / cpu / PostRev 0.00001703290003206348 s 0.000016117239983941544 s 1.06
hlo_ffi / HLOOpt / cpu / BothRev 0.000014804940055910264 s 0.00001423425995199068 s 1.04
hlo_ffi / PartOpt / cpu / PreRev 0.000015164779979386369 s 0.000013936400055172271 s 1.09
hlo_ffi / PartOpt / cpu / PostRev 0.000014396179940376897 s 0.000014319900019472698 s 1.01
hlo_ffi / PartOpt / cpu / BothRev 0.000015006060002633604 s 0.000013974980001876249 s 1.07
hlo_ffi / IPartOpt / cpu / PreRev 0.000015746199933346363 s 0.000014112100025158723 s 1.12
hlo_ffi / IPartOpt / cpu / PostRev 0.000014658080053777668 s 0.000014140760013106048 s 1.04
hlo_ffi / IPartOpt / cpu / BothRev 0.000014772880076634465 s 0.000014128419979897445 s 1.05
hlo_ffi / DefOpt / cpu / PreRev 0.000014747459972568324 s 0.000014014620001034928 s 1.05
hlo_ffi / DefOpt / cpu / PostRev 0.00001416151997545967 s 0.000014004400009071104 s 1.01
hlo_ffi / DefOpt / cpu / BothRev 0.00001534862007247284 s 0.000014273060005507431 s 1.08
hlo_ffi / IDefOpt / cpu / PreRev 0.00001533333990664687 s 0.000014238399980968095 s 1.08
hlo_ffi / IDefOpt / cpu / PostRev 0.000014773519978916738 s 0.000014124179979262409 s 1.05
hlo_ffi / IDefOpt / cpu / BothRev 0.000014627340024162547 s 0.000014526620025208104 s 1.01
hlo_ffi / JaXPipe / cuda / Primal 0.0000023670000000000004 s
hlo_ffi / Jax / cuda / Primal 0.0000023670000000000004 s
hlo_ffi / HLOOpt / cuda / Primal 0.0000023670000000000004 s
hlo_ffi / PartOpt / cuda / Primal 0.000002368 s
hlo_ffi / IPartOpt / cuda / Primal 0.0000023670000000000004 s
hlo_ffi / DefOpt / cuda / Primal 0.0000023670000000000004 s
hlo_ffi / IDefOpt / cuda / Primal 0.0000023670000000000004 s
hlo_ffi / JaXPipe / cuda / Forward 0.000002463 s
hlo_ffi / Jax / cuda / Forward 0.000002463 s
hlo_ffi / HLOOpt / cuda / Forward 0.000002463 s
hlo_ffi / PartOpt / cuda / Forward 0.000002463 s
hlo_ffi / IPartOpt / cuda / Forward 0.000002463 s
hlo_ffi / DefOpt / cuda / Forward 0.000002463 s
hlo_ffi / IDefOpt / cuda / Forward 0.000002463 s
hlo_ffi / JaXPipe / cuda / PreRev 0.000002463 s
hlo_ffi / JaXPipe / cuda / PostRev 0.000002431 s
hlo_ffi / JaXPipe / cuda / BothRev 0.000002463 s
hlo_ffi / Jax / cuda / BothRev 0.000002463 s
hlo_ffi / HLOOpt / cuda / PreRev 0.000002432 s
hlo_ffi / HLOOpt / cuda / PostRev 0.000002431 s
hlo_ffi / HLOOpt / cuda / BothRev 0.000002432 s
hlo_ffi / PartOpt / cuda / PreRev 0.000002463 s
hlo_ffi / PartOpt / cuda / PostRev 0.000002463 s
hlo_ffi / PartOpt / cuda / BothRev 0.000002463 s
hlo_ffi / IPartOpt / cuda / PreRev 0.000002432 s
hlo_ffi / IPartOpt / cuda / PostRev 0.000002431 s
hlo_ffi / IPartOpt / cuda / BothRev 0.000002432 s
hlo_ffi / DefOpt / cuda / PreRev 0.000002433 s
hlo_ffi / DefOpt / cuda / PostRev 0.000002463 s
hlo_ffi / DefOpt / cuda / BothRev 0.000002463 s
hlo_ffi / IDefOpt / cuda / PreRev 0.000002463 s
hlo_ffi / IDefOpt / cuda / PostRev 0.000002432 s
hlo_ffi / IDefOpt / cuda / BothRev 0.000002463 s
hlo_ffi / JaXPipe / tpu / Primal 9.342e-7 s 9.284e-7 s 1.01
hlo_ffi / Jax / tpu / Primal 9.50775e-7 s 9.51775e-7 s 1.00
hlo_ffi / HLOOpt / tpu / Primal 9.1165e-7 s 9.051e-7 s 1.01
hlo_ffi / PartOpt / tpu / Primal 9.59075e-7 s 9.53875e-7 s 1.01
hlo_ffi / IPartOpt / tpu / Primal 9.09725e-7 s 9.071e-7 s 1.00
hlo_ffi / DefOpt / tpu / Primal 9.50875e-7 s 9.5405e-7 s 1.00
hlo_ffi / IDefOpt / tpu / Primal 9.04775e-7 s 9.11875e-7 s 0.99
hlo_ffi / JaXPipe / tpu / Forward 9.49475e-7 s 9.48725e-7 s 1.00
hlo_ffi / Jax / tpu / Forward 9.819e-7 s 9.8115e-7 s 1.00
hlo_ffi / HLOOpt / tpu / Forward 9.73875e-7 s 9.74025e-7 s 1.00
hlo_ffi / PartOpt / tpu / Forward 9.34475e-7 s 9.341e-7 s 1.00
hlo_ffi / IPartOpt / tpu / Forward 9.74525e-7 s 9.73775e-7 s 1.00
hlo_ffi / DefOpt / tpu / Forward 9.348e-7 s 9.3325e-7 s 1.00
hlo_ffi / IDefOpt / tpu / Forward 9.74175e-7 s 9.7405e-7 s 1.00
hlo_ffi / JaXPipe / tpu / PreRev 9.379e-7 s 9.37975e-7 s 1.00
hlo_ffi / JaXPipe / tpu / PostRev 9.6555e-7 s 9.65375e-7 s 1.00
hlo_ffi / JaXPipe / tpu / BothRev 9.62075e-7 s 9.62075e-7 s 1.00
hlo_ffi / Jax / tpu / BothRev 9.6515e-7 s 9.65025e-7 s 1.00
hlo_ffi / HLOOpt / tpu / PreRev 9.63175e-7 s 9.62025e-7 s 1.00
hlo_ffi / HLOOpt / tpu / PostRev 9.6495e-7 s 9.64725e-7 s 1.00
hlo_ffi / HLOOpt / tpu / BothRev 9.627e-7 s 9.619e-7 s 1.00
hlo_ffi / PartOpt / tpu / PreRev 9.65e-7 s 9.6455e-7 s 1.00
hlo_ffi / PartOpt / tpu / PostRev 9.625e-7 s 9.621e-7 s 1.00
hlo_ffi / PartOpt / tpu / BothRev 9.6525e-7 s 9.646e-7 s 1.00
hlo_ffi / IPartOpt / tpu / PreRev 9.628499999999998e-7 s 9.615e-7 s 1.00
hlo_ffi / IPartOpt / tpu / PostRev 9.6515e-7 s 9.649e-7 s 1.00
hlo_ffi / IPartOpt / tpu / BothRev 9.62675e-7 s 9.61625e-7 s 1.00
hlo_ffi / DefOpt / tpu / PreRev 9.6535e-7 s 9.6435e-7 s 1.00
hlo_ffi / DefOpt / tpu / PostRev 9.62425e-7 s 9.61925e-7 s 1.00
hlo_ffi / DefOpt / tpu / BothRev 9.6475e-7 s 9.644e-7 s 1.00
hlo_ffi / IDefOpt / tpu / PreRev 9.6255e-7 s 9.619e-7 s 1.00
hlo_ffi / IDefOpt / tpu / PostRev 9.65025e-7 s 9.64225e-7 s 1.00
hlo_ffi / IDefOpt / tpu / BothRev 9.62975e-7 s 9.62225e-7 s 1.00
hlo_ffi / JaXPipe / cpu / Primal 0.000018172 s 0.000010208559961029096 s 1.78
hlo_ffi / Jax / cpu / Primal 0.00001771 s 0.000009593339964339977 s 1.85
hlo_ffi / HLOOpt / cpu / Primal 0.000017718999999999998 s 0.00001168801994936075 s 1.52
hlo_ffi / PartOpt / cpu / Primal 0.000018361 s 0.000009252899981220252 s 1.98
hlo_ffi / IPartOpt / cpu / Primal 0.000018365 s 0.000009737240015965653 s 1.89
hlo_ffi / DefOpt / cpu / Primal 0.000018169 s 0.000009807599963096435 s 1.85
hlo_ffi / IDefOpt / cpu / Primal 0.000018386 s 0.000009679939939815083 s 1.90
hlo_ffi / JaXPipe / cpu / Forward 0.000025315 s 0.000013769279967164038 s 1.84
hlo_ffi / Jax / cpu / Forward 0.000024976000000000003 s 0.000013657240015163551 s 1.83
hlo_ffi / HLOOpt / cpu / Forward 0.000025217 s 0.00001393840000673663 s 1.81
hlo_ffi / PartOpt / cpu / Forward 0.000025763 s 0.000013640720017065176 s 1.89
hlo_ffi / IPartOpt / cpu / Forward 0.000025313 s 0.00001365507996524684 s 1.85
hlo_ffi / DefOpt / cpu / Forward 0.00002555 s 0.000013563320017055958 s 1.88
hlo_ffi / IDefOpt / cpu / Forward 0.000024594 s 0.000013545380024879703 s 1.82
hlo_ffi / JaXPipe / cpu / PreRev 0.0000247 s 0.000014215600040188291 s 1.74
hlo_ffi / JaXPipe / cpu / PostRev 0.000023574 s 0.000014195959975040753 s 1.66
hlo_ffi / JaXPipe / cpu / BothRev 0.000023816 s 0.000013981700012664078 s 1.70
hlo_ffi / Jax / cpu / BothRev 0.000024213 s 0.000013890259988329487 s 1.74
hlo_ffi / HLOOpt / cpu / PreRev 0.000024655 s 0.000014438219986914191 s 1.71
hlo_ffi / HLOOpt / cpu / PostRev 0.000024373 s 0.000016117239983941544 s 1.51
hlo_ffi / HLOOpt / cpu / BothRev 0.000024773 s 0.00001423425995199068 s 1.74
hlo_ffi / PartOpt / cpu / PreRev 0.000024723 s 0.000013936400055172271 s 1.77
hlo_ffi / PartOpt / cpu / PostRev 0.000024967 s 0.000014319900019472698 s 1.74
hlo_ffi / PartOpt / cpu / BothRev 0.00002438 s 0.000013974980001876249 s 1.74
hlo_ffi / IPartOpt / cpu / PreRev 0.000024695000000000003 s 0.000014112100025158723 s 1.75
hlo_ffi / IPartOpt / cpu / PostRev 0.000024251 s 0.000014140760013106048 s 1.71
hlo_ffi / IPartOpt / cpu / BothRev 0.000025008 s 0.000014128419979897445 s 1.77
hlo_ffi / DefOpt / cpu / PreRev 0.000024347 s 0.000014014620001034928 s 1.74
hlo_ffi / DefOpt / cpu / PostRev 0.000025511 s 0.000014004400009071104 s 1.82
hlo_ffi / DefOpt / cpu / BothRev 0.000024834 s 0.000014273060005507431 s 1.74
hlo_ffi / IDefOpt / cpu / PreRev 0.000024958 s 0.000014238399980968095 s 1.75
hlo_ffi / IDefOpt / cpu / PostRev 0.00002497 s 0.000014124179979262409 s 1.77
hlo_ffi / IDefOpt / cpu / BothRev 0.000025864 s 0.000014526620025208104 s 1.78
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / cpu / Primal 0.0009349355999802 s 0.0008971354000095 s 1.04
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / Jax / cpu / Primal 0.0009274594001908 s 0.0008899171998564 s 1.04
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / cpu / Primal 0.0010079368003061 s 0.0009708640001008 s 1.04
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / cpu / Primal 0.0009351313998195 s 0.0009020605999467 s 1.04
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / cpu / Primal 0.0009221920001436 s 0.0008823370000754 s 1.05
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / cpu / Primal 0.00108485260007 s 0.0009529861999908 s 1.14
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / cpu / Primal 0.0010123527998075 s 0.0009414723999725 s 1.08
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / cpu / Forward 0.0023791528001311 s 0.0021872678000363 s 1.09
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / Jax / cpu / Forward 0.0025144418001218 s 0.0022974508001425 s 1.09
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / cpu / Forward 0.0023345342000538 s 0.0021531899999899 s 1.08
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / cpu / Forward 0.0024414441999397 s 0.0022659975999886 s 1.08
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / cpu / Forward 0.0022626698000749 s 0.0022205513998414 s 1.02
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / cpu / Forward 0.002243314000043 s 0.0021808726000017 s 1.03
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / cpu / Forward 0.0025117023998973 s 0.0022005146000083 s 1.14
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / cpu / PreRev 0.0060882047999257 s 0.0053413053999065 s 1.14
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / cpu / PostRev 0.0058263736000299 s 0.0055230883999684 s 1.05
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / cpu / BothRev 0.0055735834001097 s 0.0064637317998858 s 0.86
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / Jax / cpu / BothRev 0.0061264169999049 s 0.003398336400005 s 1.80
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / cpu / PreRev 0.0065767152000262 s 0.0054967704000773 s 1.20
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / cpu / PostRev 0.0038909652001166 s 0.0053526630000305 s 0.73
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / cpu / BothRev 0.0062376900001254 s 0.0051467443998262 s 1.21
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / cpu / PreRev 0.0040834241997799 s 0.0055261032000998 s 0.74
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / cpu / PostRev 0.0063477446001343 s 0.0050601578001078 s 1.25
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / cpu / BothRev 0.0039601960001164 s 0.0053215985999486 s 0.74
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / cpu / PreRev 0.0064511211998251 s 0.0049230481999984 s 1.31
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / cpu / PostRev 0.0040804615999149 s 0.0054744204000598 s 0.75
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / cpu / BothRev 0.0063872664002701 s 0.0050823348001358 s 1.26
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / cpu / PreRev 0.0042009329998109 s 0.004144794000058 s 1.01
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / cpu / PostRev 0.0061575753999932 s 0.0054135484001562 s 1.14
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / cpu / BothRev 0.0040519577998566 s 0.0056405348000225 s 0.72
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / cpu / PreRev 0.0058569333999912 s 0.0054345575999832 s 1.08
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / cpu / PostRev 0.0058035177999045 s 0.0056602189999466 s 1.03
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / cpu / BothRev 0.0072535874000095 s 0.004959446400062 s 1.46
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / cuda / Primal 0.000295583 s
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / Jax / cuda / Primal 0.000296254 s
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / cuda / Primal 0.000302366 s
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / cuda / Primal 0.000296286 s
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / cuda / Primal 0.000295583 s
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / cuda / Primal 0.000303326 s
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / cuda / Primal 0.000302462 s
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / cuda / Forward 0.0005823009999999 s
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / Jax / cuda / Forward 0.000567517 s
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / cuda / Forward 0.000582397 s
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / cuda / Forward 0.000582877 s
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / cuda / Forward 0.0005835489999999 s
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / cuda / Forward 0.000582909 s
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / cuda / Forward 0.000583358 s
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / cuda / PreRev 0.001056795 s
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / cuda / PostRev 0.001012763 s
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / cuda / BothRev 0.001052122 s
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / Jax / cuda / BothRev 0.001005499 s
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / cuda / PreRev 0.00103753 s
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / cuda / PostRev 0.001059515 s
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / cuda / BothRev 0.001037339 s
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / cuda / PreRev 0.001052057 s
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / cuda / PostRev 0.00100089 s
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / cuda / BothRev 0.001052858 s
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / cuda / PreRev 0.0010513859999999 s
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / cuda / PostRev 0.001000602 s
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / cuda / BothRev 0.00105321 s
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / cuda / PreRev 0.001051547 s
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / cuda / PostRev 0.000985946 s
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / cuda / BothRev 0.00105321 s
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / cuda / PreRev 0.00105417 s
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / cuda / PostRev 0.00105545 s
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / cuda / BothRev 0.001054043 s
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / tpu / Primal 0.0001243749999999 s 0.000130709 s 0.95
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / Jax / tpu / Primal 0.00012636525 s 0.0001240825 s 1.02
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / tpu / Primal 0.0001526847499999 s 0.000160036 s 0.95
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / tpu / Primal 0.00013420475 s 0.0001310375 s 1.02
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / tpu / Primal 0.0001313835 s 0.00013850275 s 0.95
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / tpu / Primal 0.0001476569999999 s 0.0001452479999999 s 1.02
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / tpu / Primal 0.0001507925 s 0.000158184 s 0.95
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / tpu / Forward 0.00021233175 s 0.0002136285 s 0.99
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / Jax / tpu / Forward 0.0002606585 s 0.00026264175 s 0.99
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / tpu / Forward 0.0002125989999999 s 0.0002197907499999 s 0.97
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / tpu / Forward 0.000218329 s 0.00021473625 s 1.02
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / tpu / Forward 0.00021235775 s 0.0002155807499999 s 0.99
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / tpu / Forward 0.000218641 s 0.0002177224999999 s 1.00
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / tpu / Forward 0.00021244225 s 0.0002154529999999 s 0.99
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / tpu / PreRev 0.00035597575 s 0.00035606725 s 1.00
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / tpu / PostRev 0.0002567705 s 0.00025604325 s 1.00
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / tpu / BothRev 0.00035548375 s 0.00035553875 s 1.00
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / Jax / tpu / BothRev 0.00025771575 s 0.00025739475 s 1.00
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / tpu / PreRev 0.0003557859999999 s 0.0003556455 s 1.00
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / tpu / PostRev 0.0002914995 s 0.0002916115 s 1.00
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / tpu / BothRev 0.0003558902499999 s 0.00035595175 s 1.00
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / tpu / PreRev 0.0003577695 s 0.00035638875 s 1.00
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / tpu / PostRev 0.00027321475 s 0.0002722449999999 s 1.00
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / tpu / BothRev 0.000358045 s 0.00035633325 s 1.00
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / tpu / PreRev 0.0003558115 s 0.0003561175 s 1.00
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / tpu / PostRev 0.0002736744999999 s 0.0002719079999999 s 1.01
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / tpu / BothRev 0.00035566225 s 0.0003558575 s 1.00
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / tpu / PreRev 0.0003600397499999 s 0.0003581802499999 s 1.01
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / tpu / PostRev 0.000284063 s 0.000283912 s 1.00
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / tpu / BothRev 0.00035972725 s 0.0003585855 s 1.00
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / tpu / PreRev 0.000358039 s 0.00035803675 s 1.00
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / tpu / PostRev 0.00030212025 s 0.00030181725 s 1.00
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / tpu / BothRev 0.00035780175 s 0.00035818825 s 1.00
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / cpu / Primal 0.002271689 s 0.0008971354000095 s 2.53
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / Jax / cpu / Primal 0.0025739139999999 s 0.0008899171998564 s 2.89
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / cpu / Primal 0.002548352 s 0.0009708640001008 s 2.62
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / cpu / Primal 0.002345908 s 0.0009020605999467 s 2.60
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / cpu / Primal 0.0024799419999999 s 0.0008823370000754 s 2.81
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / cpu / Primal 0.00268944 s 0.0009529861999908 s 2.82
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / cpu / Primal 0.00210795 s 0.0009414723999725 s 2.24
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / cpu / Forward 0.005897662 s 0.0021872678000363 s 2.70
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / Jax / cpu / Forward 0.00620977 s 0.0022974508001425 s 2.70
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / cpu / Forward 0.006260465 s 0.0021531899999899 s 2.91
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / cpu / Forward 0.005660108 s 0.0022659975999886 s 2.50
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / cpu / Forward 0.006034746 s 0.0022205513998414 s 2.72
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / cpu / Forward 0.005741135 s 0.0021808726000017 s 2.63
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / cpu / Forward 0.006221135 s 0.0022005146000083 s 2.83
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / cpu / PreRev 0.009134504 s 0.0053413053999065 s 1.71
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / cpu / PostRev 0.010198764 s 0.0055230883999684 s 1.85
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / JaXPipe / cpu / BothRev 0.009020979 s 0.0064637317998858 s 1.40
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / Jax / cpu / BothRev 0.010675008 s 0.003398336400005 s 3.14
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / cpu / PreRev 0.009596225 s 0.0054967704000773 s 1.75
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / cpu / PostRev 0.008347126 s 0.0053526630000305 s 1.56
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / HLOOpt / cpu / BothRev 0.009270003 s 0.0051467443998262 s 1.80
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / cpu / PreRev 0.008113154 s 0.0055261032000998 s 1.47
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / cpu / PostRev 0.009179662 s 0.0050601578001078 s 1.81
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / PartOpt / cpu / BothRev 0.010006134 s 0.0053215985999486 s 1.88
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / cpu / PreRev 0.009365746 s 0.0049230481999984 s 1.90
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / cpu / PostRev 0.009646417 s 0.0054744204000598 s 1.76
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IPartOpt / cpu / BothRev 0.008469473 s 0.0050823348001358 s 1.67
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / cpu / PreRev 0.010134985 s 0.004144794000058 s 2.45
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / cpu / PostRev 0.008348219 s 0.0054135484001562 s 1.54
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / DefOpt / cpu / BothRev 0.0098972 s 0.0056405348000225 s 1.75
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / cpu / PreRev 0.00951859 s 0.0054345575999832 s 1.75
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / cpu / PostRev 0.009772599 s 0.0056602189999466 s 1.73
llama_dim_288_hidden_dim_768_n_layers_6_n_heads_6_n_kv_heads_6_vocab_size_32000_seq_len_256 / IDefOpt / cpu / BothRev 0.010182318 s 0.004959446400062 s 2.05
scatter_sum / JaXPipe / cpu / Primal 0.000010183399990637551 s 0.000009197599974868353 s 1.11
scatter_sum / Jax / cpu / Primal 0.000009889619886962465 s 0.00000863658000525902 s 1.15
scatter_sum / HLOOpt / cpu / Primal 0.000009029760003613771 s 0.000008636440006739577 s 1.05
scatter_sum / PartOpt / cpu / Primal 0.00000970557988694054 s 0.000008683640025992645 s 1.12
scatter_sum / IPartOpt / cpu / Primal 0.000009167279949906515 s 0.000008986519978861907 s 1.02
scatter_sum / DefOpt / cpu / Primal 0.000008687619938427816 s 0.00000832479997370683 s 1.04
scatter_sum / IDefOpt / cpu / Primal 0.000009356920108984924 s 0.000008516199995938223 s 1.10
scatter_sum / JaXPipe / cpu / Forward 0.000013841940053680443 s 0.000013116119998812791 s 1.06
scatter_sum / Jax / cpu / Forward 0.000013263760029076366 s 0.00001274550000744057 s 1.04
scatter_sum / HLOOpt / cpu / Forward 0.000014280699942901264 s 0.000012919219998366315 s 1.11
scatter_sum / PartOpt / cpu / Forward 0.00001395991997924284 s 0.000012863720012319393 s 1.09
scatter_sum / IPartOpt / cpu / Forward 0.000014041120011825117 s 0.000012910679997730768 s 1.09
scatter_sum / DefOpt / cpu / Forward 0.000014347019932756666 s 0.000012142800014771638 s 1.18
scatter_sum / IDefOpt / cpu / Forward 0.000013847979935235345 s 0.000012415180026437157 s 1.12
scatter_sum / JaXPipe / cpu / PreRev 0.00001375821990222903 s 0.000012810599982913118 s 1.07
scatter_sum / JaXPipe / cpu / PostRev 0.00001326634001088678 s 0.000013005880018681636 s 1.02
scatter_sum / JaXPipe / cpu / BothRev 0.000013837100141245172 s 0.000013457019986162776 s 1.03
scatter_sum / Jax / cpu / BothRev 0.00001333199992586742 s 0.000012451120019250084 s 1.07
scatter_sum / HLOOpt / cpu / PreRev 0.00001337258005150943 s 0.0000133911399916542 s 1.00
scatter_sum / HLOOpt / cpu / PostRev 0.00001552734005599632 s 0.000015021919998616796 s 1.03
scatter_sum / HLOOpt / cpu / BothRev 0.00001398595994032803 s 0.000012903119986731329 s 1.08
scatter_sum / PartOpt / cpu / PreRev 0.000014097860039328224 s 0.000013085659957141617 s 1.08
scatter_sum / PartOpt / cpu / PostRev 0.00001340291999440524 s 0.00001335955999820726 s 1.00
scatter_sum / PartOpt / cpu / BothRev 0.0000141941800393397 s 0.000013487699989127578 s 1.05
scatter_sum / IPartOpt / cpu / PreRev 0.000012951199969393202 s 0.00001300502000049164 s 1.00
scatter_sum / IPartOpt / cpu / PostRev 0.000013985779951326549 s 0.000013540099971578456 s 1.03
scatter_sum / IPartOpt / cpu / BothRev 0.000013978120095998747 s 0.000013024959953327198 s 1.07
scatter_sum / DefOpt / cpu / PreRev 0.000013238460051070431 s 0.000012345180020929547 s 1.07
scatter_sum / DefOpt / cpu / PostRev 0.00001396254001519992 s 0.000013244880028651096 s 1.05
scatter_sum / DefOpt / cpu / BothRev 0.00001370358011627104 s 0.00001294892000260006 s 1.06
scatter_sum / IDefOpt / cpu / PreRev 0.000013782300029561155 s 0.000012458519977371909 s 1.11
scatter_sum / IDefOpt / cpu / PostRev 0.00001319270000749384 s 0.000013432480009214488 s 0.98
scatter_sum / IDefOpt / cpu / BothRev 0.00001325865992839681 s 0.000013218240028436412 s 1.00
scatter_sum / JaXPipe / cuda / Primal 0.000011072 s
scatter_sum / Jax / cuda / Primal 0.000011552 s
scatter_sum / HLOOpt / cuda / Primal 0.000011968 s
scatter_sum / PartOpt / cuda / Primal 0.000010592 s
scatter_sum / IPartOpt / cuda / Primal 0.000010849 s
scatter_sum / DefOpt / cuda / Primal 0.000011455999999999998 s
scatter_sum / IDefOpt / cuda / Primal 0.000011072 s
scatter_sum / JaXPipe / cuda / Forward 0.00001808 s
scatter_sum / Jax / cuda / Forward 0.000017856 s
scatter_sum / HLOOpt / cuda / Forward 0.000017632 s
scatter_sum / PartOpt / cuda / Forward 0.000017696 s
scatter_sum / IPartOpt / cuda / Forward 0.000018368 s
scatter_sum / DefOpt / cuda / Forward 0.000017664 s
scatter_sum / IDefOpt / cuda / Forward 0.00001808 s
scatter_sum / JaXPipe / cuda / PreRev 0.000019744000000000003 s
scatter_sum / JaXPipe / cuda / PostRev 0.000017823 s
scatter_sum / JaXPipe / cuda / BothRev 0.000017825 s
scatter_sum / Jax / cuda / BothRev 0.000016992 s
scatter_sum / HLOOpt / cuda / PreRev 0.000017664 s
scatter_sum / HLOOpt / cuda / PostRev 0.000017503999999999997 s
scatter_sum / HLOOpt / cuda / BothRev 0.00001824 s
scatter_sum / PartOpt / cuda / PreRev 0.000018304 s
scatter_sum / PartOpt / cuda / PostRev 0.000017856 s
scatter_sum / PartOpt / cuda / BothRev 0.000017406999999999998 s
scatter_sum / IPartOpt / cuda / PreRev 0.000018176 s
scatter_sum / IPartOpt / cuda / PostRev 0.000017760000000000003 s
scatter_sum / IPartOpt / cuda / BothRev 0.000017888000000000002 s
scatter_sum / DefOpt / cuda / PreRev 0.000019552 s
scatter_sum / DefOpt / cuda / PostRev 0.000017632 s
scatter_sum / DefOpt / cuda / BothRev 0.000018047 s
scatter_sum / IDefOpt / cuda / PreRev 0.000019648 s
scatter_sum / IDefOpt / cuda / PostRev 0.000017984 s
scatter_sum / IDefOpt / cuda / BothRev 0.000017952 s
scatter_sum / JaXPipe / tpu / Primal 0.0000013508999999999998 s 0.000001350125 s 1.00
scatter_sum / Jax / tpu / Primal 0.0000014046500000000002 s 0.0000014033000000000002 s 1.00
scatter_sum / HLOOpt / tpu / Primal 0.0000013516 s 0.0000013502249999999998 s 1.00
scatter_sum / PartOpt / tpu / Primal 0.0000014050250000000002 s 0.0000014037250000000005 s 1.00
scatter_sum / IPartOpt / tpu / Primal 0.00000135135 s 0.0000013500249999999998 s 1.00
scatter_sum / DefOpt / tpu / Primal 0.0000014046500000000002 s 0.0000014043 s 1.00
scatter_sum / IDefOpt / tpu / Primal 0.00000135045 s 0.0000013501750000000002 s 1.00
scatter_sum / JaXPipe / tpu / Forward 0.000002710075 s 0.0000027027 s 1.00
scatter_sum / Jax / tpu / Forward 0.000002737625 s 0.000002726925 s 1.00
scatter_sum / HLOOpt / tpu / Forward 0.000002701125 s 0.0000027038000000000003 s 1.00
scatter_sum / PartOpt / tpu / Forward 0.00000269255 s 0.00000269375 s 1.00
scatter_sum / IPartOpt / tpu / Forward 0.000002701375 s 0.00000270735 s 1.00
scatter_sum / DefOpt / tpu / Forward 0.0000026921 s 0.000002696475 s 1.00
scatter_sum / IDefOpt / tpu / Forward 0.0000027053500000000003 s 0.0000027051750000000003 s 1.00
scatter_sum / JaXPipe / tpu / PreRev 0.00000269 s 0.0000026953 s 1.00
scatter_sum / JaXPipe / tpu / PostRev 0.000002685 s 0.00000268745 s 1.00
scatter_sum / JaXPipe / tpu / BothRev 0.00000271375 s 0.0000027069 s 1.00
scatter_sum / Jax / tpu / BothRev 0.000002739075 s 0.0000027392 s 1.00
scatter_sum / HLOOpt / tpu / PreRev 0.000002709875 s 0.000002704325 s 1.00
scatter_sum / HLOOpt / tpu / PostRev 0.000002749075 s 0.000002741175 s 1.00
scatter_sum / HLOOpt / tpu / BothRev 0.0000027067249999999995 s 0.0000027063 s 1.00
scatter_sum / PartOpt / tpu / PreRev 0.00000275155 s 0.000002745575 s 1.00
scatter_sum / PartOpt / tpu / PostRev 0.000002707175 s 0.000002701075 s 1.00
scatter_sum / PartOpt / tpu / BothRev 0.0000027399 s 0.00000274695 s 1.00
scatter_sum / IPartOpt / tpu / PreRev 0.000002713825 s 0.00000270495 s 1.00
scatter_sum / IPartOpt / tpu / PostRev 0.0000027442250000000004 s 0.000002741025 s 1.00
scatter_sum / IPartOpt / tpu / BothRev 0.000002705475 s 0.000002712075 s 1.00
scatter_sum / DefOpt / tpu / PreRev 0.000002743525 s 0.000002741375 s 1.00
scatter_sum / DefOpt / tpu / PostRev 0.0000027045750000000003 s 0.000002704725 s 1.00
scatter_sum / DefOpt / tpu / BothRev 0.000002741375 s 0.000002746225 s 1.00
scatter_sum / IDefOpt / tpu / PreRev 0.000002705375 s 0.000002709225 s 1.00
scatter_sum / IDefOpt / tpu / PostRev 0.000002738425 s 0.0000027389999999999995 s 1.00
scatter_sum / IDefOpt / tpu / BothRev 0.00000270845 s 0.000002711825 s 1.00
scatter_sum / JaXPipe / cpu / Primal 0.000016272999999999998 s 0.000009197599974868353 s 1.77
scatter_sum / Jax / cpu / Primal 0.000015834 s 0.00000863658000525902 s 1.83
scatter_sum / HLOOpt / cpu / Primal 0.000016011 s 0.000008636440006739577 s 1.85
scatter_sum / PartOpt / cpu / Primal 0.000015360000000000002 s 0.000008683640025992645 s 1.77
scatter_sum / IPartOpt / cpu / Primal 0.000015768000000000002 s 0.000008986519978861907 s 1.75
scatter_sum / DefOpt / cpu / Primal 0.000015929999999999998 s 0.00000832479997370683 s 1.91
scatter_sum / IDefOpt / cpu / Primal 0.000015848999999999997 s 0.000008516199995938223 s 1.86
scatter_sum / JaXPipe / cpu / Forward 0.000024807 s 0.000013116119998812791 s 1.89
scatter_sum / Jax / cpu / Forward 0.000024616 s 0.00001274550000744057 s 1.93
scatter_sum / HLOOpt / cpu / Forward 0.000024219 s 0.000012919219998366315 s 1.87
scatter_sum / PartOpt / cpu / Forward 0.000024722 s 0.000012863720012319393 s 1.92
scatter_sum / IPartOpt / cpu / Forward 0.000026084 s 0.000012910679997730768 s 2.02
scatter_sum / DefOpt / cpu / Forward 0.000023484 s 0.000012142800014771638 s 1.93
scatter_sum / IDefOpt / cpu / Forward 0.000023217 s 0.000012415180026437157 s 1.87
scatter_sum / JaXPipe / cpu / PreRev 0.000024878 s 0.000012810599982913118 s 1.94
scatter_sum / JaXPipe / cpu / PostRev 0.000022709 s 0.000013005880018681636 s 1.75
scatter_sum / JaXPipe / cpu / BothRev 0.00002331 s 0.000013457019986162776 s 1.73
scatter_sum / Jax / cpu / BothRev 0.000023879 s 0.000012451120019250084 s 1.92
scatter_sum / HLOOpt / cpu / PreRev 0.000024609 s 0.0000133911399916542 s 1.84
scatter_sum / HLOOpt / cpu / PostRev 0.000023197 s 0.000015021919998616796 s 1.54
scatter_sum / HLOOpt / cpu / BothRev 0.000023532 s 0.000012903119986731329 s 1.82
scatter_sum / PartOpt / cpu / PreRev 0.000024254 s 0.000013085659957141617 s 1.85
scatter_sum / PartOpt / cpu / PostRev 0.000024451 s 0.00001335955999820726 s 1.83
scatter_sum / PartOpt / cpu / BothRev 0.000024654 s 0.000013487699989127578 s 1.83
scatter_sum / IPartOpt / cpu / PreRev 0.000024829 s 0.00001300502000049164 s 1.91
scatter_sum / IPartOpt / cpu / PostRev 0.000023344 s 0.000013540099971578456 s 1.72
scatter_sum / IPartOpt / cpu / BothRev 0.000023167 s 0.000013024959953327198 s 1.78
scatter_sum / DefOpt / cpu / PreRev 0.00002406 s 0.000012345180020929547 s 1.95
scatter_sum / DefOpt / cpu / PostRev 0.000024225 s 0.000013244880028651096 s 1.83
scatter_sum / DefOpt / cpu / BothRev 0.000024111 s 0.00001294892000260006 s 1.86
scatter_sum / IDefOpt / cpu / PreRev 0.00002274 s 0.000012458519977371909 s 1.83
scatter_sum / IDefOpt / cpu / PostRev 0.000023018 s 0.000013432480009214488 s 1.71
scatter_sum / IDefOpt / cpu / BothRev 0.000023566 s 0.000013218240028436412 s 1.78
slicing / JaXPipe / cpu / Primal 0.000008707579945621547 s 0.000006967759964027209 s 1.25
slicing / Jax / cpu / Primal 0.000007284799994522473 s 0.000007308279982680688 s 1.00
slicing / HLOOpt / cpu / Primal 0.000007880920111347223 s 0.000007030099968687864 s 1.12
slicing / PartOpt / cpu / Primal 0.000007727299962425605 s 0.000006606380047742277 s 1.17
slicing / IPartOpt / cpu / Primal 0.000008197519928216935 s 0.0000070093600061227335 s 1.17
slicing / DefOpt / cpu / Primal 0.000007657079986529425 s 0.000007347559958361672 s 1.04
slicing / IDefOpt / cpu / Primal 0.000007732699959888123 s 0.000007295080013136612 s 1.06
slicing / JaXPipe / cpu / Forward 0.000011202780096937204 s 0.00001058188000570226 s 1.06
slicing / Jax / cpu / Forward 0.000010673479991964995 s 0.000010111599976880823 s 1.06
slicing / HLOOpt / cpu / Forward 0.000011764920036512197 s 0.000010369259971412248 s 1.13
slicing / PartOpt / cpu / Forward 0.000010967979978886433 s 0.000010642100014592873 s 1.03
slicing / IPartOpt / cpu / Forward 0.000010802180077007506 s 0.000011122879968752386 s 0.97
slicing / DefOpt / cpu / Forward 0.000010347040060878498 s 0.000010585239997453756 s 0.98
slicing / IDefOpt / cpu / Forward 0.000011701300009008264 s 0.000010901799969360582 s 1.07
slicing / JaXPipe / cpu / PreRev 0.000011341839981469092 s 0.000010843380023288772 s 1.05
slicing / JaXPipe / cpu / PostRev 0.000012217339954077031 s 0.000010892160007642817 s 1.12
slicing / JaXPipe / cpu / BothRev 0.000011399119939596858 s 0.00001148246002230735 s 0.99
slicing / Jax / cpu / BothRev 0.000011811219937953864 s 0.00001101181999729306 s 1.07
slicing / HLOOpt / cpu / PreRev 0.000011596399999689311 s 0.000011168119945068613 s 1.04
slicing / HLOOpt / cpu / PostRev 0.00001327369996943162 s 0.00001565627999298158 s 0.85
slicing / HLOOpt / cpu / BothRev 0.000011081120046583235 s 0.000011009500012733042 s 1.01
slicing / PartOpt / cpu / PreRev 0.000011381300064385868 s 0.000011034819972337572 s 1.03
slicing / PartOpt / cpu / PostRev 0.0000119806199472805 s 0.000011052660020141048 s 1.08
slicing / PartOpt / cpu / BothRev 0.000011821760072052712 s 0.00001162962001217238 s 1.02
slicing / IPartOpt / cpu / PreRev 0.00001134347994593554 s 0.000010544719989411531 s 1.08
slicing / IPartOpt / cpu / PostRev 0.000011537000009411712 s 0.000011078819979957187 s 1.04
slicing / IPartOpt / cpu / BothRev 0.00001193932001115172 s 0.000011282959994787236 s 1.06
slicing / DefOpt / cpu / PreRev 0.000011067499945056624 s 0.000010709039997891525 s 1.03
slicing / DefOpt / cpu / PostRev 0.000011694160002662102 s 0.000011488679992908146 s 1.02
slicing / DefOpt / cpu / BothRev 0.000011757460069929949 s 0.00001066072003595764 s 1.10
slicing / IDefOpt / cpu / PreRev 0.000011742820006475086 s 0.000010654759971657768 s 1.10
slicing / IDefOpt / cpu / PostRev 0.00001131686003645882 s 0.000011159800033055945 s 1.01
slicing / IDefOpt / cpu / BothRev 0.000011514720008563017 s 0.000010670940046111356 s 1.08
slicing / JaXPipe / cuda / Primal 0.000002304 s
slicing / Jax / cuda / Primal 0.000002303 s
slicing / HLOOpt / cuda / Primal 0.000002303 s
slicing / PartOpt / cuda / Primal 0.000002303 s
slicing / IPartOpt / cuda / Primal 0.000002304 s
slicing / DefOpt / cuda / Primal 0.000002303 s
slicing / IDefOpt / cuda / Primal 0.000002303 s
slicing / JaXPipe / cuda / Forward 0.000010433 s
slicing / Jax / cuda / Forward 0.00001072 s
slicing / HLOOpt / cuda / Forward 0.000010464 s
slicing / PartOpt / cuda / Forward 0.000011392 s
slicing / IPartOpt / cuda / Forward 0.000010784 s
slicing / DefOpt / cuda / Forward 0.000010528 s
slicing / IDefOpt / cuda / Forward 0.000011072 s
slicing / JaXPipe / cuda / PreRev 0.000011104 s
slicing / JaXPipe / cuda / PostRev 0.000011104 s
slicing / JaXPipe / cuda / BothRev 0.000010944 s
slicing / Jax / cuda / BothRev 0.00001104 s
slicing / HLOOpt / cuda / PreRev 0.000011808 s
slicing / HLOOpt / cuda / PostRev 0.000010848 s
slicing / HLOOpt / cuda / BothRev 0.000011328 s
slicing / PartOpt / cuda / PreRev 0.000010656 s
slicing / PartOpt / cuda / PostRev 0.000010592 s
slicing / PartOpt / cuda / BothRev 0.000010432 s
slicing / IPartOpt / cuda / PreRev 0.000010912 s
slicing / IPartOpt / cuda / PostRev 0.000010752 s
slicing / IPartOpt / cuda / BothRev 0.000010592 s
slicing / DefOpt / cuda / PreRev 0.00001024 s
slicing / DefOpt / cuda / PostRev 0.000010752 s
slicing / DefOpt / cuda / BothRev 0.000010847 s
slicing / IDefOpt / cuda / PreRev 0.00001056 s
slicing / IDefOpt / cuda / PostRev 0.000010752 s
slicing / IDefOpt / cuda / BothRev 0.000010496 s
slicing / JaXPipe / tpu / Primal 0.000001024775 s 0.00000102665 s 1.00
slicing / Jax / tpu / Primal 9.68625e-7 s 9.691e-7 s 1.00
slicing / HLOOpt / tpu / Primal 0.00000102725 s 0.000001022725 s 1.00
slicing / PartOpt / tpu / Primal 9.741e-7 s 9.7145e-7 s 1.00
slicing / IPartOpt / tpu / Primal 0.000001022025 s 0.000001027425 s 0.99
slicing / DefOpt / tpu / Primal 9.6835e-7 s 9.7015e-7 s 1.00
slicing / IDefOpt / tpu / Primal 0.00000102545 s 0.0000010241500000000002 s 1.00
slicing / JaXPipe / tpu / Forward 0.000001411 s 0.000001420325 s 0.99
slicing / Jax / tpu / Forward 0.000001477525 s 0.000001482275 s 1.00
slicing / HLOOpt / tpu / Forward 0.00000151975 s 0.000001521325 s 1.00
slicing / PartOpt / tpu / Forward 0.00000150675 s 0.000001498725 s 1.01
slicing / IPartOpt / tpu / Forward 0.000001522025 s 0.0000015166750000000002 s 1.00
slicing / DefOpt / tpu / Forward 0.000001503025 s 0.000001497025 s 1.00
slicing / IDefOpt / tpu / Forward 0.0000015334249999999998 s 0.0000015183749999999997 s 1.01
slicing / JaXPipe / tpu / PreRev 0.00000256575 s 0.0000025757750000000003 s 1.00
slicing / JaXPipe / tpu / PostRev 0.000002519725 s 0.000002527475 s 1.00
slicing / JaXPipe / tpu / BothRev 0.00000259535 s 0.000002581175 s 1.01
slicing / Jax / tpu / BothRev 0.0000025354500000000004 s 0.00000254895 s 0.99
slicing / HLOOpt / tpu / PreRev 0.0000025794499999999995 s 0.00000258125 s 1.00
slicing / HLOOpt / tpu / PostRev 0.0000025419 s 0.000002547175 s 1.00
slicing / HLOOpt / tpu / BothRev 0.000002587475 s 0.0000025804 s 1.00
slicing / PartOpt / tpu / PreRev 0.000002533275 s 0.000002536925 s 1.00
slicing / PartOpt / tpu / PostRev 0.000002586525 s 0.0000025853 s 1.00
slicing / PartOpt / tpu / BothRev 0.0000025449 s 0.0000025357750000000003 s 1.00
slicing / IPartOpt / tpu / PreRev 0.000002576675 s 0.000002592675 s 0.99
slicing / IPartOpt / tpu / PostRev 0.0000025358750000000005 s 0.0000025356000000000003 s 1.00
slicing / IPartOpt / tpu / BothRev 0.0000025857 s 0.0000025901 s 1.00
slicing / DefOpt / tpu / PreRev 0.0000025307 s 0.0000025452500000000003 s 0.99
slicing / DefOpt / tpu / PostRev 0.0000025852000000000003 s 0.000002585725 s 1.00
slicing / DefOpt / tpu / BothRev 0.000002543225 s 0.000002534775 s 1.00
slicing / IDefOpt / tpu / PreRev 0.000002577675 s 0.00000259005 s 1.00
slicing / IDefOpt / tpu / PostRev 0.0000025315250000000003 s 0.000002541125 s 1.00
slicing / IDefOpt / tpu / BothRev 0.000002578375 s 0.0000025866250000000004 s 1.00
slicing / JaXPipe / cpu / Primal 0.000012958 s 0.000006967759964027209 s 1.86
slicing / Jax / cpu / Primal 0.000012691 s 0.000007308279982680688 s 1.74
slicing / HLOOpt / cpu / Primal 0.000012748 s 0.000007030099968687864 s 1.81
slicing / PartOpt / cpu / Primal 0.000012629 s 0.000006606380047742277 s 1.91
slicing / IPartOpt / cpu / Primal 0.000012712 s 0.0000070093600061227335 s 1.81
slicing / DefOpt / cpu / Primal 0.000012529 s 0.000007347559958361672 s 1.71
slicing / IDefOpt / cpu / Primal 0.000012598 s 0.000007295080013136612 s 1.73
slicing / JaXPipe / cpu / Forward 0.000017235 s 0.00001058188000570226 s 1.63
slicing / Jax / cpu / Forward 0.000017001 s 0.000010111599976880823 s 1.68
slicing / HLOOpt / cpu / Forward 0.000016847 s 0.000010369259971412248 s 1.62
slicing / PartOpt / cpu / Forward 0.000016839 s 0.000010642100014592873 s 1.58
slicing / IPartOpt / cpu / Forward 0.000017009 s 0.000011122879968752386 s 1.53
slicing / DefOpt / cpu / Forward 0.00001675 s 0.000010585239997453756 s 1.58
slicing / IDefOpt / cpu / Forward 0.000016903 s 0.000010901799969360582 s 1.55
slicing / JaXPipe / cpu / PreRev 0.000017603 s 0.000010843380023288772 s 1.62
slicing / JaXPipe / cpu / PostRev 0.000017506 s 0.000010892160007642817 s 1.61
slicing / JaXPipe / cpu / BothRev 0.000017277 s 0.00001148246002230735 s 1.50
slicing / Jax / cpu / BothRev 0.000017389999999999998 s 0.00001101181999729306 s 1.58
slicing / HLOOpt / cpu / PreRev 0.00001825 s 0.000011168119945068613 s 1.63
slicing / HLOOpt / cpu / PostRev 0.000018025 s 0.00001565627999298158 s 1.15
slicing / HLOOpt / cpu / BothRev 0.000017539 s 0.000011009500012733042 s 1.59
slicing / PartOpt / cpu / PreRev 0.000018402 s 0.000011034819972337572 s 1.67
slicing / PartOpt / cpu / PostRev 0.000017787 s 0.000011052660020141048 s 1.61
slicing / PartOpt / cpu / BothRev 0.000017964999999999998 s 0.00001162962001217238 s 1.54
slicing / IPartOpt / cpu / PreRev 0.000017701000000000002 s 0.000010544719989411531 s 1.68
slicing / IPartOpt / cpu / PostRev 0.000017593999999999998 s 0.000011078819979957187 s 1.59
slicing / IPartOpt / cpu / BothRev 0.000017819 s 0.000011282959994787236 s 1.58
slicing / DefOpt / cpu / PreRev 0.000018437 s 0.000010709039997891525 s 1.72
slicing / DefOpt / cpu / PostRev 0.000017769 s 0.000011488679992908146 s 1.55
slicing / DefOpt / cpu / BothRev 0.000018353 s 0.00001066072003595764 s 1.72
slicing / IDefOpt / cpu / PreRev 0.000017675 s 0.000010654759971657768 s 1.66
slicing / IDefOpt / cpu / PostRev 0.000017442 s 0.000011159800033055945 s 1.56
slicing / IDefOpt / cpu / BothRev 0.000017743 s 0.000010670940046111356 s 1.66
sum / JaXPipe / cpu / Primal 0.000009703159958007745 s 0.000008534619983038283 s 1.14
sum / Jax / cpu / Primal 0.000009331719993497246 s 0.00000848227999995288 s 1.10
sum / HLOOpt / cpu / Primal 0.000010051419976662146 s 0.00000842622001982818 s 1.19
sum / PartOpt / cpu / Primal 0.000008696079967194236 s 0.000008917679979276727 s 0.98
sum / IPartOpt / cpu / Primal 0.0000098646600417851 s 0.000008620339958724798 s 1.14
sum / DefOpt / cpu / Primal 0.000008684100012033013 s 0.00000882572004229587 s 0.98
sum / IDefOpt / cpu / Primal 0.000008947900059865788 s 0.000008002759977898677 s 1.12
sum / JaXPipe / cpu / Forward 0.000013199400000303285 s 0.000012622960002772744 s 1.05
sum / Jax / cpu / Forward 0.000013182119982957374 s 0.000012528960032796022 s 1.05
sum / HLOOpt / cpu / Forward 0.000013430360031634336 s 0.00001259522004147584 s 1.07
sum / PartOpt / cpu / Forward 0.00001275664002605481 s 0.000012516680026237735 s 1.02
sum / IPartOpt / cpu / Forward 0.000013174080013413914 s 0.000012727580005957862 s 1.04
sum / DefOpt / cpu / Forward 0.000012715920092887243 s 0.0000123404800069693 s 1.03
sum / IDefOpt / cpu / Forward 0.000012533579938462936 s 0.000012064680004186811 s 1.04
sum / JaXPipe / cpu / PreRev 0.000012346200001047692 s 0.000012319859997660388 s 1.00
sum / JaXPipe / cpu / PostRev 0.000012574120046338066 s 0.000012000920023638172 s 1.05
sum / JaXPipe / cpu / BothRev 0.000012460000052669784 s 0.00001183335996756796 s 1.05
sum / Jax / cpu / BothRev 0.000012215100032335613 s 0.000011508679990583917 s 1.06
sum / HLOOpt / cpu / PreRev 0.000012884199950349284 s 0.000012048360040353146 s 1.07
sum / HLOOpt / cpu / PostRev 0.000014180940015648955 s 0.000014090959984969233 s 1.01
sum / HLOOpt / cpu / BothRev 0.000012211080011184095 s 0.0000120684399826132 s 1.01
sum / PartOpt / cpu / PreRev 0.000011928680069104302 s 0.00001170439998531947 s 1.02
sum / PartOpt / cpu / PostRev 0.000012329819892329397 s 0.000012685339997915436 s 0.97
sum / PartOpt / cpu / BothRev 0.000012584139913087713 s 0.000011704099997587036 s 1.08
sum / IPartOpt / cpu / PreRev 0.000012323919982009102 s 0.000011762620006265934 s 1.05
sum / IPartOpt / cpu / PostRev 0.000012879819914815016 s 0.000011630520029939362 s 1.11
sum / IPartOpt / cpu / BothRev 0.000012425980021362193 s 0.000011163259978275164 s 1.11
sum / DefOpt / cpu / PreRev 0.00001232571996297338 s 0.000011822399965240038 s 1.04
sum / DefOpt / cpu / PostRev 0.000012890540037915344 s 0.000011481779956739049 s 1.12
sum / DefOpt / cpu / BothRev 0.000012246999958733796 s 0.00001181014000394498 s 1.04
sum / IDefOpt / cpu / PreRev 0.00001259490003576502 s 0.00001237589999618649 s 1.02
sum / IDefOpt / cpu / PostRev 0.000012125120010750832 s 0.00001125832002799143 s 1.08
sum / IDefOpt / cpu / BothRev 0.000012409059945639457 s 0.000011904700031664106 s 1.04
sum / JaXPipe / cuda / Primal 0.000002463 s
sum / Jax / cuda / Primal 0.000002463 s
sum / HLOOpt / cuda / Primal 0.000002463 s
sum / PartOpt / cuda / Primal 0.000002463 s
sum / IPartOpt / cuda / Primal 0.000002463 s
sum / DefOpt / cuda / Primal 0.000002464 s
sum / IDefOpt / cuda / Primal 0.000002463 s
sum / JaXPipe / cuda / Forward 0.000011392 s
sum / Jax / cuda / Forward 0.000011104 s
sum / HLOOpt / cuda / Forward 0.000011136 s
sum / PartOpt / cuda / Forward 0.000010848 s
sum / IPartOpt / cuda / Forward 0.000011296 s
sum / DefOpt / cuda / Forward 0.000011136 s
sum / IDefOpt / cuda / Forward 0.000010912 s
sum / JaXPipe / cuda / PreRev 0.000010656 s
sum / JaXPipe / cuda / PostRev 0.000010464 s
sum / JaXPipe / cuda / BothRev 0.000010304 s
sum / Jax / cuda / BothRev 0.00001056 s
sum / HLOOpt / cuda / PreRev 0.00001024 s
sum / HLOOpt / cuda / PostRev 0.000010336 s
sum / HLOOpt / cuda / BothRev 0.000010144 s
sum / PartOpt / cuda / PreRev 0.000011008 s
sum / PartOpt / cuda / PostRev 0.000010369 s
sum / PartOpt / cuda / BothRev 0.000010752 s
sum / IPartOpt / cuda / PreRev 0.000010593 s
sum / IPartOpt / cuda / PostRev 0.000010272 s
sum / IPartOpt / cuda / BothRev 0.000010433 s
sum / DefOpt / cuda / PreRev 0.000010688 s
sum / DefOpt / cuda / PostRev 0.000011552 s
sum / DefOpt / cuda / BothRev 0.000010784 s
sum / IDefOpt / cuda / PreRev 0.000012096 s
sum / IDefOpt / cuda / PostRev 0.000010592 s
sum / IDefOpt / cuda / BothRev 0.000010495 s
sum / JaXPipe / tpu / Primal 5.106499999999999e-7 s 5.103250000000001e-7 s 1.00
sum / Jax / tpu / Primal 5.47425e-7 s 5.467e-7 s 1.00
sum / HLOOpt / tpu / Primal 5.104e-7 s 5.1015e-7 s 1.00
sum / PartOpt / tpu / Primal 5.47525e-7 s 5.47125e-7 s 1.00
sum / IPartOpt / tpu / Primal 5.104499999999999e-7 s 5.10225e-7 s 1.00
sum / DefOpt / tpu / Primal 5.4745e-7 s 5.4695e-7 s 1.00
sum / IDefOpt / tpu / Primal 5.108499999999999e-7 s 5.106499999999999e-7 s 1.00
sum / JaXPipe / tpu / Forward 0.000001569275 s 0.0000015479999999999998 s 1.01
sum / Jax / tpu / Forward 0.00000151085 s 0.000001497925 s 1.01
sum / HLOOpt / tpu / Forward 0.000001532775 s 0.0000015321 s 1.00
sum / PartOpt / tpu / Forward 0.0000014927000000000003 s 0.0000014986250000000002 s 1.00
sum / IPartOpt / tpu / Forward 0.000001535425 s 0.0000015334750000000002 s 1.00
sum / DefOpt / tpu / Forward 0.0000015002750000000002 s 0.000001498375 s 1.00
sum / IDefOpt / tpu / Forward 0.0000015358 s 0.0000015289249999999995 s 1.00
sum / JaXPipe / tpu / PreRev 0.000001045525 s 0.000001050825 s 0.99
sum / JaXPipe / tpu / PostRev 0.00000108545 s 0.00000109645 s 0.99
sum / JaXPipe / tpu / BothRev 0.000001051075 s 0.000001054325 s 1.00
sum / Jax / tpu / BothRev 0.000001092725 s 0.000001092325 s 1.00
sum / HLOOpt / tpu / PreRev 0.000001048825 s 0.00000105305 s 1.00
sum / HLOOpt / tpu / PostRev 0.00000108435 s 0.000001093525 s 0.99
sum / HLOOpt / tpu / BothRev 0.000001057975 s 0.00000105495 s 1.00
sum / PartOpt / tpu / PreRev 0.0000010865 s 0.0000010913 s 1.00
sum / PartOpt / tpu / PostRev 0.000001051125 s 0.00000104745 s 1.00
sum / PartOpt / tpu / BothRev 0.0000010857500000000002 s 0.0000010863 s 1.00
sum / IPartOpt / tpu / PreRev 0.0000010480000000000002 s 0.0000010546 s 0.99
sum / IPartOpt / tpu / PostRev 0.0000010873499999999998 s 0.00000109015 s 1.00
sum / IPartOpt / tpu / BothRev 0.0000010501249999999998 s 0.0000010541250000000005 s 1.00
sum / DefOpt / tpu / PreRev 0.000001086075 s 0.00000108945 s 1.00
sum / DefOpt / tpu / PostRev 0.0000010468 s 0.0000010604500000000002 s 0.99
sum / DefOpt / tpu / BothRev 0.0000010877 s 0.00000108885 s 1.00
sum / IDefOpt / tpu / PreRev 0.000001046125 s 0.0000010493 s 1.00
sum / IDefOpt / tpu / PostRev 0.000001085225 s 0.000001086325 s 1.00
sum / IDefOpt / tpu / BothRev 0.000001046225 s 0.00000104625 s 1.00
sum / JaXPipe / cpu / Primal 0.000014969 s 0.000008534619983038283 s 1.75
sum / Jax / cpu / Primal 0.000014789 s 0.00000848227999995288 s 1.74
sum / HLOOpt / cpu / Primal 0.000014407 s 0.00000842622001982818 s 1.71
sum / PartOpt / cpu / Primal 0.000014842 s 0.000008917679979276727 s 1.66
sum / IPartOpt / cpu / Primal 0.00001458 s 0.000008620339958724798 s 1.69
sum / DefOpt / cpu / Primal 0.000015223 s 0.00000882572004229587 s 1.72
sum / IDefOpt / cpu / Primal 0.000015297 s 0.000008002759977898677 s 1.91
sum / JaXPipe / cpu / Forward 0.000021089 s 0.000012622960002772744 s 1.67
sum / Jax / cpu / Forward 0.000020419 s 0.000012528960032796022 s 1.63
sum / HLOOpt / cpu / Forward 0.000020679 s 0.00001259522004147584 s 1.64
sum / PartOpt / cpu / Forward 0.000020356 s 0.000012516680026237735 s 1.63
sum / IPartOpt / cpu / Forward 0.000019847 s 0.000012727580005957862 s 1.56
sum / DefOpt / cpu / Forward 0.000020596 s 0.0000123404800069693 s 1.67
sum / IDefOpt / cpu / Forward 0.000020513 s 0.000012064680004186811 s 1.70
sum / JaXPipe / cpu / PreRev 0.000018943 s 0.000012319859997660388 s 1.54
sum / JaXPipe / cpu / PostRev 0.00001969 s 0.000012000920023638172 s 1.64
sum / JaXPipe / cpu / BothRev 0.000019041 s 0.00001183335996756796 s 1.61
sum / Jax / cpu / BothRev 0.00001953 s 0.000011508679990583917 s 1.70
sum / HLOOpt / cpu / PreRev 0.000019179 s 0.000012048360040353146 s 1.59
sum / HLOOpt / cpu / PostRev 0.000019672 s 0.000014090959984969233 s 1.40
sum / HLOOpt / cpu / BothRev 0.000019776 s 0.0000120684399826132 s 1.64
sum / PartOpt / cpu / PreRev 0.000020115 s 0.00001170439998531947 s 1.72
sum / PartOpt / cpu / PostRev 0.000019526 s 0.000012685339997915436 s 1.54
sum / PartOpt / cpu / BothRev 0.000019612 s 0.000011704099997587036 s 1.68
sum / IPartOpt / cpu / PreRev 0.000019538 s 0.000011762620006265934 s 1.66
sum / IPartOpt / cpu / PostRev 0.000019527 s 0.000011630520029939362 s 1.68
sum / IPartOpt / cpu / BothRev 0.000019783 s 0.000011163259978275164 s 1.77
sum / DefOpt / cpu / PreRev 0.000019404000000000003 s 0.000011822399965240038 s 1.64
sum / DefOpt / cpu / PostRev 0.000019425 s 0.000011481779956739049 s 1.69
sum / DefOpt / cpu / BothRev 0.000019716 s 0.00001181014000394498 s 1.67
sum / IDefOpt / cpu / PreRev 0.000019846 s 0.00001237589999618649 s 1.60
sum / IDefOpt / cpu / PostRev 0.000019331 s 0.00001125832002799143 s 1.72
sum / IDefOpt / cpu / BothRev 0.000018684 s 0.000011904700031664106 s 1.57
value_and_grad / JaXPipe / cpu / Primal 0.000016176939989236417 s 0.00001525831999060756 s 1.06
value_and_grad / Jax / cpu / Primal 0.00001554813999973703 s 0.000015128719960557646 s 1.03
value_and_grad / HLOOpt / cpu / Primal 0.000015296199999284 s 0.000014485420024357154 s 1.06
value_and_grad / PartOpt / cpu / Primal 0.000014763099970878102 s 0.000015235659993777518 s 0.97
value_and_grad / IPartOpt / cpu / Primal 0.00001577749993884936 s 0.000014750719947187462 s 1.07
value_and_grad / DefOpt / cpu / Primal 0.000015779440000187607 s 0.000014715099978275247 s 1.07
value_and_grad / IDefOpt / cpu / Primal 0.000014645979990746128 s 0.000014892859990141006 s 0.98
value_and_grad / JaXPipe / cuda / Primal 0.000034623000000000004 s
value_and_grad / Jax / cuda / Primal 0.00005264 s
value_and_grad / HLOOpt / cuda / Primal 0.000033696 s
value_and_grad / PartOpt / cuda / Primal 0.00003424 s
value_and_grad / IPartOpt / cuda / Primal 0.00003472 s
value_and_grad / DefOpt / cuda / Primal 0.000038688 s
value_and_grad / IDefOpt / cuda / Primal 0.000038719000000000007 s
value_and_grad / JaXPipe / tpu / Primal 0 s 0 s 1
value_and_grad / Jax / tpu / Primal 0 s 0 s 1
value_and_grad / HLOOpt / tpu / Primal 0 s 0 s 1
value_and_grad / PartOpt / tpu / Primal 0 s 0 s 1
value_and_grad / IPartOpt / tpu / Primal 0 s 0 s 1
value_and_grad / DefOpt / tpu / Primal 0 s 0 s 1
value_and_grad / IDefOpt / tpu / Primal 0 s 0 s 1
value_and_grad / JaXPipe / cpu / Primal 0.000023745 s 0.00001525831999060756 s 1.56
value_and_grad / Jax / cpu / Primal 0.000023209 s 0.000015128719960557646 s 1.53
value_and_grad / HLOOpt / cpu / Primal 0.000023165000000000003 s 0.000014485420024357154 s 1.60
value_and_grad / PartOpt / cpu / Primal 0.000023578 s 0.000015235659993777518 s 1.55
value_and_grad / IPartOpt / cpu / Primal 0.000023159 s 0.000014750719947187462 s 1.57
value_and_grad / DefOpt / cpu / Primal 0.000023249 s 0.000014715099978275247 s 1.58
value_and_grad / IDefOpt / cpu / Primal 0.000023265 s 0.000014892859990141006 s 1.56
jaxmd20 / JaXPipe / cuda / Primal 0.001455192 s
jaxmd20 / Jax / cuda / Primal 0.001440409 s
jaxmd20 / HLOOpt / cuda / Primal 0.001354937 s
jaxmd20 / PartOpt / cuda / Primal 0.001331064 s
jaxmd20 / IPartOpt / cuda / Primal 0.0013646959999999 s
jaxmd20 / DefOpt / cuda / Primal 0.000944188 s
jaxmd20 / IDefOpt / cuda / Primal 0.000974971 s
jaxmd20 / JaXPipe / cuda / Forward 0.001631991 s
jaxmd20 / Jax / cuda / Forward 0.00187743 s
jaxmd20 / HLOOpt / cuda / Forward 0.001712792 s
jaxmd20 / PartOpt / cuda / Forward 0.001714423 s
jaxmd20 / IPartOpt / cuda / Forward 0.001740919 s
jaxmd20 / DefOpt / cuda / Forward 0.001707318 s
jaxmd20 / IDefOpt / cuda / Forward 0.001723735 s
jaxmd20 / JaXPipe / cuda / PreRev 0.002773201 s
jaxmd20 / JaXPipe / cuda / PostRev 0.005449763 s
jaxmd20 / JaXPipe / cuda / BothRev 0.002788913 s
jaxmd20 / Jax / cuda / BothRev 0.0054343709999999 s
jaxmd20 / HLOOpt / cuda / PreRev 0.0028404 s
jaxmd20 / HLOOpt / cuda / PostRev 0.005525922 s
jaxmd20 / HLOOpt / cuda / BothRev 0.00280357 s
jaxmd20 / PartOpt / cuda / PreRev 0.0028976479999999 s
jaxmd20 / PartOpt / cuda / PostRev 0.005594082 s
jaxmd20 / PartOpt / cuda / BothRev 0.0028265129999999 s
jaxmd20 / IPartOpt / cuda / PreRev 0.002937712 s
jaxmd20 / IPartOpt / cuda / PostRev 0.005611298 s
jaxmd20 / IPartOpt / cuda / BothRev 0.0028372 s
jaxmd20 / DefOpt / cuda / PreRev 0.002918064 s
jaxmd20 / DefOpt / cuda / PostRev 0.00281928 s
jaxmd20 / DefOpt / cuda / BothRev 0.002826097 s
jaxmd20 / IDefOpt / cuda / PreRev 0.002897969 s
jaxmd20 / IDefOpt / cuda / PostRev 0.00233426 s
jaxmd20 / IDefOpt / cuda / BothRev 0.002831697 s
jaxmd20 / JaXPipe / tpu / Primal 0.009277274375 s 0.009285324375 s 1.00
jaxmd20 / Jax / tpu / Primal 0.0092756325 s 0.0092787793749999 s 1.00
jaxmd20 / HLOOpt / tpu / Primal 0.009160073125 s 0.009167460625 s 1.00
jaxmd20 / PartOpt / tpu / Primal 0.00919567125 s 0.009200495625 s 1.00
jaxmd20 / IPartOpt / tpu / Primal 0.00919831625 s 0.0091986468749999 s 1.00
jaxmd20 / DefOpt / tpu / Primal 0.008805305625 s 0.0087980087499999 s 1.00
jaxmd20 / IDefOpt / tpu / Primal 0.00869804 s 0.00869818375 s 1.00
jaxmd20 / JaXPipe / tpu / Forward 0.017413935625 s 0.0174107325 s 1.00
jaxmd20 / Jax / tpu / Forward 0.0187338025 s 0.0187580718749999 s 1.00
jaxmd20 / HLOOpt / tpu / Forward 0.017401165625 s 0.017402763125 s 1.00
jaxmd20 / PartOpt / tpu / Forward 0.0174205006249999 s 0.017414855625 s 1.00
jaxmd20 / IPartOpt / tpu / Forward 0.017415825 s 0.017412161875 s 1.00
jaxmd20 / DefOpt / tpu / Forward 0.017415030625 s 0.0174230925 s 1.00
jaxmd20 / IDefOpt / tpu / Forward 0.01741135375 s 0.017409656875 s 1.00
jaxmd20 / JaXPipe / tpu / PreRev 0.025449028125 s 0.025466438125 s 1.00
jaxmd20 / JaXPipe / tpu / PostRev 0.02186584125 s 0.02187530625 s 1.00
jaxmd20 / JaXPipe / tpu / BothRev 0.02543556625 s 0.02545988875 s 1.00
jaxmd20 / Jax / tpu / BothRev 0.0218562487499999 s 0.021869294375 s 1.00
jaxmd20 / HLOOpt / tpu / PreRev 0.02555658 s 0.025579618125 s 1.00
jaxmd20 / HLOOpt / tpu / PostRev 0.020704740625 s 0.02071224125 s 1.00
jaxmd20 / HLOOpt / tpu / BothRev 0.02565634625 s 0.025692965 s 1.00
jaxmd20 / PartOpt / tpu / PreRev 0.025476410625 s 0.02546148625 s 1.00
jaxmd20 / PartOpt / tpu / PostRev 0.02151089875 s 0.021266319375 s 1.01
jaxmd20 / PartOpt / tpu / BothRev 0.025579195 s 0.02556031 s 1.00
jaxmd20 / IPartOpt / tpu / PreRev 0.025449066875 s 0.0254667925 s 1.00
jaxmd20 / IPartOpt / tpu / PostRev 0.021525515625 s 0.0215107793749999 s 1.00
jaxmd20 / IPartOpt / tpu / BothRev 0.025544875 s 0.0255709374999999 s 1.00
jaxmd20 / DefOpt / tpu / PreRev 0.0254830025 s 0.02546553125 s 1.00
jaxmd20 / DefOpt / tpu / PostRev 0.018809855625 s 0.0188311475 s 1.00
jaxmd20 / DefOpt / tpu / BothRev 0.02557464875 s 0.02555516125 s 1.00
jaxmd20 / IDefOpt / tpu / PreRev 0.0254569825 s 0.0254661425 s 1.00
jaxmd20 / IDefOpt / tpu / PostRev 0.0183353737499999 s 0.018310786875 s 1.00
jaxmd20 / IDefOpt / tpu / BothRev 0.02554438 s 0.0255700231249999 s 1.00
jaxmd40 / JaXPipe / cpu / Primal 0.0754830029999999 s 0.073993308 s 1.02
jaxmd40 / Jax / cpu / Primal 0.071149994 s 0.073613883 s 0.97
jaxmd40 / HLOOpt / cpu / Primal 0.091414001 s 0.107396891 s 0.85
jaxmd40 / PartOpt / cpu / Primal 0.0721791589999999 s 0.081463886 s 0.89
jaxmd40 / IPartOpt / cpu / Primal 0.07307664 s 0.080921025 s 0.90
jaxmd40 / DefOpt / cpu / Primal 0.084365976 s 0.1020056 s 0.83
jaxmd40 / IDefOpt / cpu / Primal 0.089137949 s 0.109501773 s 0.81
jaxmd40 / JaXPipe / cpu / Forward 0.162158269 s 0.189545239 s 0.86
jaxmd40 / Jax / cpu / Forward 0.090833853 s 0.10736556 s 0.85
jaxmd40 / HLOOpt / cpu / Forward 0.168094777 s 0.189890195 s 0.89
jaxmd40 / PartOpt / cpu / Forward 0.161890177 s 0.190169587 s 0.85
jaxmd40 / IPartOpt / cpu / Forward 0.176897202 s 0.2004099889999999 s 0.88
jaxmd40 / DefOpt / cpu / Forward 0.1607591109999999 s 0.192691593 s 0.83
jaxmd40 / IDefOpt / cpu / Forward 0.175798451 s 0.1917536279999999 s 0.92
jaxmd40 / JaXPipe / cpu / PreRev 0.228621829 s 0.2491823819999999 s 0.92
jaxmd40 / JaXPipe / cpu / PostRev 0.143784428 s 0.159837425 s 0.90
jaxmd40 / JaXPipe / cpu / BothRev 0.227943803 s 0.251328121 s 0.91
jaxmd40 / Jax / cpu / BothRev 0.142277806 s 0.152441235 s 0.93
jaxmd40 / HLOOpt / cpu / PreRev 0.227200626 s 0.247478724 s 0.92
jaxmd40 / HLOOpt / cpu / PostRev 0.1807468639999999 s 0.204412115 s 0.88
jaxmd40 / HLOOpt / cpu / BothRev 0.2598833619999999 s 0.296773168 s 0.88
jaxmd40 / PartOpt / cpu / PreRev 0.241460985 s 0.264468797 s 0.91
jaxmd40 / PartOpt / cpu / PostRev 0.1400372299999999 s 0.1569377059999999 s 0.89
jaxmd40 / PartOpt / cpu / BothRev 0.273114126 s 0.285594582 s 0.96
jaxmd40 / IPartOpt / cpu / PreRev 0.220240829 s 0.256192396 s 0.86
jaxmd40 / IPartOpt / cpu / PostRev 0.137621319 s 0.149952415 s 0.92
jaxmd40 / IPartOpt / cpu / BothRev 0.2499928669999999 s 0.276281015 s 0.90
jaxmd40 / DefOpt / cpu / PreRev 0.224390852 s 0.256144481 s 0.88
jaxmd40 / DefOpt / cpu / PostRev 0.17791083 s 0.19889593 s 0.89
jaxmd40 / DefOpt / cpu / BothRev 0.254190759 s 0.285229248 s 0.89
jaxmd40 / IDefOpt / cpu / PreRev 0.232619737 s 0.255377117 s 0.91
jaxmd40 / IDefOpt / cpu / PostRev 0.1792368899999999 s 0.214632765 s 0.84
jaxmd40 / IDefOpt / cpu / BothRev 0.253944873 s 0.2347946269999999 s 1.08
neuralgcm_v1/deterministic_2_8_deg_inner_steps_24_outer_steps_4 / JaXPipe / cuda / Primal 1.701141962 s
neuralgcm_v1/deterministic_2_8_deg_inner_steps_24_outer_steps_4 / Jax / cuda / Primal 1.704263293 s
neuralgcm_v1/deterministic_2_8_deg_inner_steps_24_outer_steps_4 / HLOOpt / cuda / Primal 1.716250644 s
neuralgcm_v1/deterministic_2_8_deg_inner_steps_24_outer_steps_4 / PartOpt / cuda / Primal 1.695894459 s
neuralgcm_v1/deterministic_2_8_deg_inner_steps_24_outer_steps_4 / IPartOpt / cuda / Primal 1.694047356 s
neuralgcm_v1/deterministic_2_8_deg_inner_steps_24_outer_steps_4 / DefOpt / cuda / Primal 1.665130931 s
neuralgcm_v1/deterministic_2_8_deg_inner_steps_24_outer_steps_4 / IDefOpt / cuda / Primal 1.911048516 s
neuralgcm_v1/deterministic_2_8_deg_inner_steps_24_outer_steps_4 / JaXPipe / tpu / Primal 3.038812840625 s 3.038831480625 s 1.00
neuralgcm_v1/deterministic_2_8_deg_inner_steps_24_outer_steps_4 / Jax / tpu / Primal 3.0394444575 s 3.0393183325 s 1.00
neuralgcm_v1/deterministic_2_8_deg_inner_steps_24_outer_steps_4 / HLOOpt / tpu / Primal 3.121668325625 s 3.121700555625 s 1.00
neuralgcm_v1/deterministic_2_8_deg_inner_steps_24_outer_steps_4 / PartOpt / tpu / Primal 3.060118801875 s 3.06013362875 s 1.00
neuralgcm_v1/deterministic_2_8_deg_inner_steps_24_outer_steps_4 / IPartOpt / tpu / Primal 3.060385166875 s 3.0603782275 s 1.00
neuralgcm_v1/deterministic_2_8_deg_inner_steps_24_outer_steps_4 / DefOpt / tpu / Primal 2.1025125950000003 s 2.1024763275 s 1.00
neuralgcm_v1/deterministic_2_8_deg_inner_steps_24_outer_steps_4 / IDefOpt / tpu / Primal 4.3564873175 s 4.35644791125 s 1.00
neuralgcm_v1/deterministic_2_8_deg_inner_steps_2_outer_steps_2 / JaXPipe / cpu / Primal 6.332010231 s 6.952634694 s 0.91
neuralgcm_v1/deterministic_2_8_deg_inner_steps_2_outer_steps_2 / Jax / cpu / Primal 6.308204574 s 6.998000881 s 0.90
neuralgcm_v1/deterministic_2_8_deg_inner_steps_2_outer_steps_2 / HLOOpt / cpu / Primal 6.230678027 s 6.8480217 s 0.91
neuralgcm_v1/deterministic_2_8_deg_inner_steps_2_outer_steps_2 / PartOpt / cpu / Primal 6.471540448 s 7.121108887 s 0.91
neuralgcm_v1/deterministic_2_8_deg_inner_steps_2_outer_steps_2 / IPartOpt / cpu / Primal 6.308579509 s 6.987212517 s 0.90
neuralgcm_v1/deterministic_2_8_deg_inner_steps_2_outer_steps_2 / DefOpt / cpu / Primal 2.532982659 s 2.837167547 s 0.89
neuralgcm_v1/deterministic_2_8_deg_inner_steps_2_outer_steps_2 / IDefOpt / cpu / Primal 6.879658571 s 7.750080211999999 s 0.89

This comment was automatically generated by workflow using github-action-benchmark.

let summary = "Equivalent to " "`MPI_Comm_rank(MPI_COMM_WORLD, &rank)`";

let arguments = (
ins AnyTensor : $inrank
Member

yeah this shouldn't require an inrank, it can just return the rank, same with size

Collaborator Author

Ok, I removed the inrank arg from comm rank and updated the lowering pass. The new lowering pass creates a constant tensor to hold the result and passes it into the wrapper function, so I can still use output operand aliases to get the result back.

// Create a constant tensor to hold the result
auto tensorType = llvm::cast<RankedTensorType>(op->getResultTypes()[0]);
auto constantAttr =
    DenseIntElementsAttr::get(tensorType, ArrayRef<int32_t>{-1});
Value constantTensor = rewriter.create<stablehlo::ConstantOp>(
    op.getLoc(), tensorType, constantAttr);

Is this an OK approach? If so, I'll go ahead and change all the other ops similarly.
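
For illustration only, here is a standalone C++ sketch of the pattern described above: a placeholder buffer is created, handed to a wrapper that writes the result in place, and the same buffer is then read back as the op's result. It uses neither MLIR nor MPI. `fake_comm_rank`, `comm_rank_wrapper`, and `lowered_comm_rank` are hypothetical stand-ins, not the code in this PR, and the rank value 3 is arbitrary.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for MPI_Comm_rank(MPI_COMM_WORLD, &rank), so the
// sketch runs without an MPI installation. It always reports rank 3.
static int fake_comm_rank(int32_t *rank) {
  *rank = 3;
  return 0;
}

// Wrapper in the style of the lowered call target: it receives a pointer to
// the result buffer and overwrites it in place.
extern "C" void comm_rank_wrapper(int32_t *rankBuf) {
  fake_comm_rank(rankBuf);
}

// Models the lowered op: materialize a placeholder (-1), pass it to the
// wrapper, and treat that same buffer as the op's result. This mirrors the
// idea of aliasing the output to the operand.
int32_t lowered_comm_rank() {
  std::vector<int32_t> placeholder{-1};
  comm_rank_wrapper(placeholder.data());
  assert(placeholder[0] != -1 && "wrapper must overwrite the placeholder");
  return placeholder[0];
}
```

The -1 placeholder is never observed by the caller. It exists only so that the wrapper has a buffer to write into.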

Collaborator Author

Ok, I removed inrank, insize, inrequest from comm_rank, comm_size, isend, irecv.

Now, I still have recv, irecv and allreduce taking in an inbuf and outputting an outbuf. Is this design ok, or should I similarly remove the inbufs from those too?
