Conversation
ABenC377
left a comment
There was a problem hiding this comment.
Code all looks good. Though, I'm not sure about the infinite loop checker. If I've missed a discussion about this then please ignore me. But if it is being added as a work around for problems encountered because of erroneous configs or broken logic, shouldn't we be fixing the causes not the symptoms? Erroneous configs are kind of a user error, but we could update the documentation to help them avoid this, and if there is broken logic in SimEng we should be fixing it not plastering over the resulting problem. I suppose I can see the value of this type of check in debug mode to flag a problem to the user if they are running into issues in release, but it seems like unnecessary overhead for Release.
FinnWilkinson
left a comment
There was a problem hiding this comment.
Most bits look good. Comments mainly about adhearing to the project's style and some confusion on SVE helpers
FinnWilkinson
left a comment
There was a problem hiding this comment.
All looking pretty good - just a few changes needed and some pedantic comments on comments 😅
| std::cerr << "[SimEng:ReorderBuffer] Infinite loop detected in rob " | ||
| "commit at instruction address " | ||
| << std::hex << uop->getInstructionAddress() << std::dec | ||
| << " (" << uop->getMicroOpIndex() << ")." << std::endl; |
There was a problem hiding this comment.
Whats the rational for printing the Micro-op index?
There was a problem hiding this comment.
This may give additional context to the user what exactly is stuck at the head of the ROB if the instruction is uopd. I have updated the comment generally, though we should have a discussion offline on what exactly we want to print out in one of these cases.
There was a problem hiding this comment.
Following offline discussion, we agree to keep this message in release mode, but add more detail s.t. the user is aware of why this is being triggered, what's triggering it, and what to do to resolve the issue. The Micro-op index in particular just adds additional verbosity so remains in the message.
| )"); | ||
| EXPECT_EQ(getGeneralRegister<int64_t>(21), -1); | ||
| EXPECT_EQ(getGeneralRegister<int64_t>(22), -1); | ||
| EXPECT_EQ(getGeneralRegister<int64_t>(23), 1); |
There was a problem hiding this comment.
What has caused this to change?
| stateChange = {ChangeType::REPLACEMENT, {R0}, {retval}}; | ||
| stateChange.memoryAddresses.push_back({mask, 1}); | ||
| uint64_t retval = static_cast<uint64_t>(bitmask); | ||
| stateChange = {ChangeType::REPLACEMENT, {R0}, {sizeof(retval)}}; |
There was a problem hiding this comment.
The man page for sched_getaffinity states that the function returns 0 on success and -1 on failure. This seems to be returning the size of a uint64_t which will always be 8. I think this is incorrect.
What I think you have done is update the value being set in memory correctly on 434 (updating the size). But also updated the value returned to the program to be the size also on 433. Depending on the behaviour we want, 433 should be updated potentially in the way it was done previously i.e. set to 0 if pid == 0 and -1 otherwise.
What was the reason for the update?
There was a problem hiding this comment.
This is worth @jj16791 investigating as it was his find/fix so he will know more than I do on the issue.
The reason given at the time was:
The assert you were triggering was KMP_ASSERT(__kmp_avail_proc == __kmp_topology->get_num_hw_threads());. Newer LLVM OMP runtimes require the affinity mask to be at least 8 bytes in length otherwise it will read the number of available cores out as 0 due to some casting. The affinity mask we were returning was 1 byte in length hence the assert triggered as __kmp_avail_proc was 0. Figured it out from a combination of isolating the instructions run leading up to this assert and then from GodBolt/SimEng figuring out why our mask was being converted to 0 procs available
I've been testing using a STREAM binary (with OpenMP support) compiled with ACFL23. With the current fix, this works. Removing the sizeof on 433 means that this fails. I do agree though that the current implementation doesn't line up with what I'd expect should work.
There was a problem hiding this comment.
Not sure what manpage you found but most should say something along the lines of "but see "C library/kernel differences" below, which notes that the underlying sched_getaffinity() differs in its return value". The difference is that is returns the number of bytes used to represent the mask. With a 64bit mask that's 8 bytes
There was a problem hiding this comment.
Always check for stuff like this re the non-wrapped version of the syscall does something different
…instructions/helpers from neoverse-v2 branch.
…xed a few metadata issues
…Updated comment for infinite loop detector
62561bc to
c9f708b
Compare
| "variable `robHeadRepeatLimit_`. Please raise " | ||
| "an issue on GitHub if the problem persists." | ||
| << std::endl; | ||
| exit(1); |
There was a problem hiding this comment.
Anything to be said here for resetting the counter and letting the simulation continue? Just thinking about a possible scenario where this triggers after days of simulation but it wasn't actually a failing state.
| const U* n = sourceValues[0].getAsVector<U>(); | ||
| T out = 0; | ||
| for (int i = 0; i < I; i++) { | ||
| out += n[i]; |
There was a problem hiding this comment.
Should n[i] be explicitly cast here? Just aware we've had some issues with relying on implicit casting in prior years
| const uint16_t partition_num = VL_bits / (sizeof(T) * 8); | ||
| T out[256 / sizeof(T)] = {0}; | ||
|
|
||
| U bit_0_mask = static_cast<U>(1) << (sizeof(T) * 8 - 1); |
There was a problem hiding this comment.
We should be getting bit 0 for the sign not bit N-1
There was a problem hiding this comment.
Check if there are any other instances of this
| // If no active lane has been found, select highest element instead | ||
| if (i == 0) lastElem = partition_num - 1; | ||
| } | ||
| return {n[lastElem], 256}; |
There was a problem hiding this comment.
Need to make sure we zero-extend here
| return {n[lastElem], 256}; | ||
| } | ||
|
|
||
| /** Helper function for SVE instructions with the format `clastb zd, pg, zd, |
There was a problem hiding this comment.
I think the mnemonic is wrong here (reflects CLASTB (vectors)
| } else { | ||
| out = n[lastElem]; | ||
| } | ||
| return {out, 256}; |
There was a problem hiding this comment.
Need to make sure we zero-extend here
| * T represents the type of sourceValues (e.g. for zn.d, T = uint64_t). | ||
| * Returns correctly formatted RegisterValue. */ | ||
| template <typename T> | ||
| RegisterValue sveSplice(srcValContainer& sourceValues, const uint16_t VL_bits) { |
There was a problem hiding this comment.
I assume this is the destructive variant of splice? If so, we should denote this somewhere
| sourceValues_[1].get<uint64_t>()); | ||
| break; | ||
| } | ||
| case Opcode::AArch64_SPLICE_ZPZ_D: { // splice zdn.d, pv, zdn.t, zm.d |
There was a problem hiding this comment.
Change .t in mnemonic to actual data type in use
| results_[0] = sveSplice<double>(sourceValues_, VL_bits); | ||
| break; | ||
| } | ||
| case Opcode::AArch64_SPLICE_ZPZ_S: { // splice zdn.s, pv, zdn.t, zm.s |
There was a problem hiding this comment.
Change .t in mnemonic to actual data type in use
Merging work done towards enabling support for a few codes for ACFL 23, namely STREAM, Minibude, Cloverleaf, Tealeaf, and Minisweep.
This PR is mostly made up of added instruction support. 58 instructions have been added, with 24 unique instructions with the remainder being variants. Most instructions are SVE, with some NEON added.
An additional feature of "infinite loop checking" has been added. This adds a counter in the ROB which throws an error if the same address has been at the head of the ROB for a very long time. This catches a few errors previously found where an erroneous config or broken logic can cause SimEng to get caught in a loop and sometimes eventually hit OOM.
This also fixes an OpenMP bug that has previously popped up for ACFL 23 support, work that Jack had done in a separate branch.
Tests are still being added, and the new group tests need to be added for all instructions. The PR will leave draft stage once all tests have been added.
Here are a list of instructions added: