Skip to content

Multiple HWPE support + ECC#53

Merged
FrancescoConti merged 33 commits intomasterfrom
smazzola/chimera
Apr 28, 2025
Merged

Multiple HWPE support + ECC#53
FrancescoConti merged 33 commits intomasterfrom
smazzola/chimera

Conversation

@FrancescoConti
Copy link
Member

@FrancescoConti FrancescoConti commented Mar 7, 2025

This PR adds:

Luigi Ghionda and others added 30 commits August 14, 2024 15:07
* Introduce additional encoders/decoders along the interconnect to properly handle ECC bits across the different modules
* Add router and arbiter modules able to properly handle ECC bits
…dify the original modules accordingly

Initially, dedicated modules were introduced for simplicity. However, to avoid adding unnecessary files, they have now been removed and merged with the original modules
* Errors on data are evaluated on read valid
* Errors on metadata are evaluated on successful handshake
Some HWPE might depend on the r_valid for the read response whereas
don't care for the write and they can support multiple outstanding
instructions. So a r_valid generated for a write coming after multiple
cycles could behave like a r_valid for a read response(read request
follows a write request). Even though the TCDM responds the data after
a single cycle latency, any FIFO in the request path can cause the r_valid
to arrive a few cycles later even though the handshake is done(writing data
to the FIFO).  For selected HWPEs we can filter the r_valid for the write
requests using FILTER_WRITE_R_VALID parameter exposed to the top.
fix arbiter tree to handle any number of wide ports
2D interface arrays are not synthesizable at the moment. Thus, 2D interfaces
are flattened to 1D arrays. Additionally the alignements are fixed
Copy link
Member Author

@FrancescoConti FrancescoConti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cannot be an "official" review because I opened the PR, but I do have a few comments, mostly cosmetical. Despite the name of the PR, which mostly refers to @arpansur's work, the bulk of the comments refer to the ECC work from @LuigiGhionda.
@sermazz in an ideal world it would be a bit better to split this in two PRs (one for ECC, one for multi-HWPE) but since it's already here, and tested, I suggest we complete it a bit and then merge as-is, without splitting.

@LuigiGhionda on the latter, I think that there is a need for a bit of cleanup in documentation, very minor things such as parameter names using the Lowrisc style instead of the style used in the rest of hci, and my always-hated parametric types (I know, I'm boring!). Moreover, it would be good since you have a register generator to include the hjson source in a subfolder (e.g., ecc/gen?) and, importantly, also include the script used for generation. All of this is non-urgent and non-blocking and if for any reason we decide we need this PR merged sooner, we can go on - perhaps opening an issue to not forget to fix these things.

hwpe-stream: { git: "https://github.com/pulp-platform/hwpe-stream.git", version: 1.8.0 }
cluster_interconnect: { git: "https://github.com/pulp-platform/cluster_interconnect.git", rev: 1284def6c0b7f7e9355eb093d00883ad9dead1b7 }
L2_tcdm_hybrid_interco: { git: "https://github.com/pulp-platform/L2_tcdm_hybrid_interco.git", version: 1.0.0 }
redundancy_cells: { git: "https://github.com/pulp-platform/redundancy_cells.git", rev: c37bdb47339bf70e8323de8df14ea8bbeafb6583 } # branch: astral_rebase
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make this a version @LuigiGhionda ?

*/

/**
* ADD DESCRIPTION
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Necessary to add (minimal) description before merging!

*/

/**
* ADD DESCRIPTION
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add description!

* | *EXPFIFO* | 0 | Depth of HCI router FIFO. |
* +---------------------+-----------------------------+----------------------------------------------------------------------------------+
* | *SEL_LIC* | 0 | Kind of LIC to instantiate (0=regular L1, 1=L2). |
* +---------------------+-----------------------------+----------------------------------------------------------------------------------+
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing some parameters here!

* this License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
* CONDITIONS OF ANY KIND, either express or implied. See the License for the
* specific language governing permissions and limitations under the License.
*/
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add doc or refer to core_sink

import hci_ecc_manager_reg_pkg::*;
#(
parameter int unsigned N_CHUNK = 1,
parameter int unsigned ParData = 1,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In hci we typically use a snake-case all-caps style for parameters (from before Lowrisc). Can you align ParData to PAR_DATA, ParMeta to PAR_META?

// Licensed under the Apache License, Version 2.0, see LICENSE for details.
// SPDX-License-Identifier: Apache-2.0
//
// Register Package auto-generated by `reggen` containing data structure
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you include the actual script used to generate from hjson? I am a reggen newbie ;)

#(
parameter int unsigned FIFO_DEPTH = 0,
parameter int unsigned NB_OUT_CHAN = 8,
parameter bit UseECC = 0,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In hci we typically use a snake-case all-caps style for parameters (from before Lowrisc). Can you align UseECC to USE_ECC?

parameter int unsigned NB_OUT_CHAN = 2,
parameter int unsigned FILTER_WRITE_R_VALID = 0
parameter int unsigned FILTER_WRITE_R_VALID = 0,
parameter bit UseECC = 0
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In hci we typically use a snake-case all-caps style for parameters (from before Lowrisc). Can you align UseECC to USE_ECC?


);

localparam int unsigned RespDataWidth = (UseECC) ? (32+7) : 32;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In hci we typically use a snake-case all-caps style for parameters (from before Lowrisc). Can you align RespDataWidth to RESP_DATA_WIDTH?

// Register instances
// R[data_correctable_errors]: V(False)

prim_subreg #(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fear these autogenerated registers are not soft-clearable. Can we fix that by using some reggen option?

Copy link
Contributor

@sermazz sermazz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FrancescoConti I don't mind not splitting, we can complete this PR and merge everything together. I took the freedom to extend the description of the PR for documentation purposes :)

I have just one more comment: when using the top hci_interconnect we can have multiple HWPEs on the shallow branch (line). However, the top for ECC, hci_ecc_interconnect is still hardcoded to one HWPE only (line).
I stumbled upon this when integrating Chimera's PULP Cluster on Astral's, where you can configure whether you want ECC or not. This difference in the interface creates issues when instantiating the HCI, so currently in the PULP Cluster I have an assertion that errors out if ECC is enabled when HCI has >1 HWPE ports (see here).
I started implementing support for this as I thought it was a matter of only hci_ecc_enc and _dec but then I got blocked when I realized also hci_ecc_manager requires modifications, obviously.
Is there a use-case for multiple HWPEs using ECC? Do we want to add support for that? Is it complicated? @LuigiGhionda

@FrancescoConti
Copy link
Member Author

I would say it is for sure interesting (for all the post-Astral line of research, which is very active and alive) but not necessarily something to fix in this PR which is already big. my2c, @LuigiGhionda what's your view on that (also @yvantor might have opinions)

@yvantor
Copy link

yvantor commented Mar 10, 2025

I would say it is for sure interesting (for all the post-Astral line of research, which is very active and alive) but not necessarily something to fix in this PR which is already big. my2c, @LuigiGhionda what's your view on that (also @yvantor might have opinions)

I think allowing for multiple HWPEs that support ECC is in general very interesting since now we have two accelerators that have fault tolerance features (NEureka and RedMulE). Might be useful to open an issue to keep it in the back of our minds and tackle it in a different dedicated PR if the HCI requires significant changes.

@sermazz
Copy link
Contributor

sermazz commented Mar 10, 2025

I would say it is for sure interesting (for all the post-Astral line of research, which is very active and alive) but not necessarily something to fix in this PR which is already big. my2c, @LuigiGhionda what's your view on that (also @yvantor might have opinions)

I think allowing for multiple HWPEs that support ECC is in general very interesting since now we have two accelerators that have fault tolerance features (NEureka and RedMulE). Might be useful to open an issue to keep it in the back of our minds and tackle it in a different dedicated PR if the HCI requires significant changes.

Ok then, sounds good. Do you think we can put an assert somewhere in the HCI so the top-level instantiating it doesn't need to be aware of this?
I think it makes sense to align the interface of hci and hci_ecc (so they both have an array of HWPE interfaces) and error out in the hci_ecc when N_HWPE != 1.

@sermazz sermazz changed the title Multiple HWPE support (new) Multiple HWPE support + ECC Mar 11, 2025
ECC HCI top-level now has the same interface of the normal HCI (i.e., with
multiple HWPE ports) to avoid problems with higher-level parametric
instantiations of the HCI. The ECC HCI top-level currently does not support
multiple HWPEs and errors out when N_HWPE > 1
@sermazz
Copy link
Contributor

sermazz commented Mar 11, 2025

Ok then, sounds good. Do you think we can put an assert somewhere in the HCI so the top-level instantiating it doesn't need to be aware of this? I think it makes sense to align the interface of hci and hci_ecc (so they both have an array of HWPE interfaces) and error out in the hci_ecc when N_HWPE != 1.

Done, and opened issue #54

@sermazz sermazz dismissed their stale review March 11, 2025 11:39

Moved to issue #54

@FrancescoConti
Copy link
Member Author

Ping @LuigiGhionda @arpansur !

Copy link
Contributor

@arpansur arpansur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

multi-hwpe integration looks fine. Some enhancements are needed is tracked using
#56 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants

Comments