Conversation
| } | ||
| if (!validForCommit) return; | ||
| size_t index = 0; | ||
| uint8_t ins_cnt = 0; |
There was a problem hiding this comment.
@jj16791 Can the number of micro-ops generated per macro-op exceed 256?
There was a problem hiding this comment.
I think this is the number of micro-ops for a single macro-op rather than total micro-ops in the ROB.
In such a case, the max number of micro-ops generated per macro op is implementation defined. The only candidate I can think of for currently supportable instructions is the AArch64 gather/scatter loads IF the vector length is 2048-bits and you are doing a byte load (i.e. loading 256 byte elements).
However, checking the spec the gather / scatter instructions which load/store single bytes only permit the source/destination register to be of the .S or .D variant - meaning that there would be a max of 64 or 32 elements per vector respectively to scatter or gather.
In summary - no, the number of micro-ops per macro-op is very very unlikely to exceed 256.
There was a problem hiding this comment.
Sorry, I should've specified i did mean number of micro-ops per macro-op.
In summary - no, the number of micro-ops per macro-op is very very unlikely to exceed 256.
When you say very very unlikely, that means there it could exceed 256 right? If that's the case the data type above needs to be incremented otherwise the logic is broken
There was a problem hiding this comment.
Currently it could never exceed 256, but in the future there is a extremely small chance it could. For now I would keep it the same
There was a problem hiding this comment.
After a full review, there would be no harm in making this a uint16 to "future proof" this
|
Please add a description, assignee, labels, and ensure the status on the associated github project is correct |
I've added labels and asignee. I don't think the description is necessary as the title communicates everything that is done. I've just changed one function. |
FinnWilkinson
left a comment
There was a problem hiding this comment.
All looks fine, just a few comments made.
Could you add to the description of this PR the performance improvement seen on your local system and on the Mac Studio? (in Release mode). Looking at the Jenkin's performance pipeline results there doesn't seem to be much improvement (with the exception of triad_gcc_a64fx). There are also 2 regressions...
| .vscode | ||
| .idea | ||
| .DS_Store | ||
| .cache |
There was a problem hiding this comment.
This is duplicated from PR #353, please remove it from this PR or close the other PR
| } | ||
| if (!validForCommit) return; | ||
| size_t index = 0; | ||
| uint8_t ins_cnt = 0; |
There was a problem hiding this comment.
To avoid confusion akin to the other comment left on this variable, perhaps rename this to uopCount or microOpCount?
| if (mop_id == insnId) { | ||
| if (!uop->isWaitingCommit()) return; | ||
| ins_cnt++; | ||
| } else if (ins_cnt && mop_id != insnId) { |
There was a problem hiding this comment.
mop_id && insnID in this else if seems redundant given that this else if can only be reached if this is true
| while (index < bsize) { | ||
| auto& uop = buffer_[index]; | ||
| uint64_t mop_id = uop->getInstructionId(); | ||
|
|
||
| if (mop_id == insnId) { | ||
| if (!uop->isWaitingCommit()) return; | ||
| ins_cnt++; | ||
| } else if (ins_cnt && mop_id != insnId) { | ||
| break; | ||
| } | ||
| index++; | ||
| } | ||
|
|
||
| index = index - ins_cnt; | ||
| for (int x = 0; x < ins_cnt; x++) { | ||
| buffer_[index]->setCommitReady(); | ||
| index++; | ||
| } | ||
|
|
There was a problem hiding this comment.
Generally looks good. I think some comments akin to the previous implementation, plus one explaining that the setCommitReady logic is only reached iff all micro-ops of a macro-op are waitingCommit, would be beneficial.
| } | ||
| if (!validForCommit) return; | ||
| size_t index = 0; | ||
| uint8_t ins_cnt = 0; |
There was a problem hiding this comment.
After a full review, there would be no harm in making this a uint16 to "future proof" this
| } | ||
|
|
||
| index = index - ins_cnt; | ||
| for (int x = 0; x < ins_cnt; x++) { |
There was a problem hiding this comment.
Could x be given the same type as ins_cnt
Hmm that seems strange, my access to mac studio is a bit messed up given my laptop broke. Let me see what I can do to address this. It might be compiler dependant though but I will investigate. |
dANW34V3R
left a comment
There was a problem hiding this comment.
This seems like a very "minimal gains" optimisation e.g. no fundamental algorithmic change, rather less repeated array loads/getter calls in the high level language (which may not happen once compiled). I would need to see actual performance gains before approval. Some comments explaining what is going on also needed (took me a bit of time to work it all through and understand)
|
#rerun tests |
|
Closing as stale / not planned |
No description provided.