From f7dc8e3516ab679d0fdeefec50124f39c6745ed6 Mon Sep 17 00:00:00 2001
From: Sam Parker <sam.parker@arm.com>
Date: Thu, 16 Dec 2021 10:25:13 +0000
Subject: [PATCH 1/8] RFC: Cranelift sizeless vector types

Copyright (c) 2021, Arm Limited.
---
 cranelift-sizeless-vector.md | 46 ++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)
 create mode 100644 cranelift-sizeless-vector.md

diff --git a/cranelift-sizeless-vector.md b/cranelift-sizeless-vector.md
new file mode 100644
index 0000000..821b8f0
--- /dev/null
+++ b/cranelift-sizeless-vector.md
@@ -0,0 +1,46 @@
+# Summary
+
+This RFC proposes a way to handle flexible vector types as specified at: https://github.com/WebAssembly/flexible-vectors
+
+[summary]: #summary
+
+The proposal is to introduce new sizeless vector types into Cranelift, that:
+- Express a vector, with a lane type and size, but a target-defined number of lanes.
+- Are denoted by prefixing the lane type with 'sv' (sizeless vector): svi16, svi32, svf32, etc...
+- Have a minimum width of 128-bits, meaning we can simply map to existing simd-128 implementations.
+
+# Motivation
+[motivation]: #motivation
+
+Flexible vectors are likely coming to WebAssembly so we should support the current spec. This is current path forward to support vectors that are wider than 128-bits.
+Rust is also currently starting to use LLVM's ScalableVectorType and, as a target backend, Cranelift could support those directly with a sizeless vector type.
+
+# Proposal
+[proposal]: #proposal
+
+We can add sizeless vector types by modifying existing structs to hold an extra bit of information to represent the sizeless nature:
+
+- The new types don't report themselves as vectors, so ty.is\_vector() = false, but are explicitly reported via ty.is\_sizeless\_vector().
+- The TypeSet and ValueTypeSet structs gain a bool to represent whether the type is sizeless.
+- TypeSetBuilder also gains a bool to control the building of those types.
+- At the encoding level, a bit is used to represent whether the type is sizeless, this bit has been taken from the max number of vector lanes supported, so they'd be reduced to 128 from 256.
+
+# Rationale and alternatives
+[rationale-and-alternatives]: #rationale-and-alternatives
+
+The main design decision is to have a vector type that isn't comparable to the existing vector types, which means that no existing paths can accidently try to treat them as such. Although setting the minimum size of the opaque type to 128 bits still allows us to infer the minimum number of lanes for a given lane type. The flexible vector specification also provides specific operations for lane accesses and shuffles so these aren't operations that we have to handle with our existing operations.
+
+It's possible that the hardware vector length will be fixed, so one alternative would be to generate IR with fixed widths using information from the backend. The one advantage is that we'd not have to add sizeless types in the IR at all. But there are two main disadvantages with this approach:
+- In an ahead-of-time setting, Cranelift would not be able to take advantage of larger vectors where the width is implementation defined. It would be possible to target architecture extensions such as Intel's AVX2 and AVX-512, which have static sizes, but not for architectures like Arm's SVE.
+- It is also currently undecided whether the flexible vector specification will include operations to set the vector length during program execution, so we shouldn't design out this possibility.
+
+This doesn't mean that a backend can't select a fixed width during code generation, if desired. The current simd-128 implementations would be able to map the sizeless types directly to their current operations and we could also add a legalization layer for backends which only want to support simd-128, or another fixed size.
+
+# Open questions
+[open-questions]: #open-questions
+
+- Does anyone care that Cranelift could only support a maximum of 128 vector lanes? The only problem I could imagine is if someone has a 128-lane vector of 1-bit bools...
+- How will the register allocator (regalloc2..?) handle a new vector type and/or potential register aliasing?
+- Are there parts of Cranelift, which aren't backend specific, that would need to handle these types? (is there generic stack handling or anything else data size specific...?)
+- What behaviour would the interpreter have? I would expect it to default to the existing simd-128 semantics.
+- Testing is also an issue, is it reasonable to assume that function under (run)test neither take or return sizeless vectors? If so, how should the result values be defined and checked against?

From aa56ec43a6b17c7fdfca5cadc7667b789c54f9f5 Mon Sep 17 00:00:00 2001
From: Sam Parker <sam.parker@arm.com>
Date: Tue, 1 Feb 2022 14:14:58 +0000
Subject: [PATCH 2/8] Fleshed out the frame layout

---
 cranelift-sizeless-vector.md | 80 ++++++++++++++++++++++++++++++++----
 1 file changed, 73 insertions(+), 7 deletions(-)

diff --git a/cranelift-sizeless-vector.md b/cranelift-sizeless-vector.md
index 821b8f0..d6ce800 100644
--- a/cranelift-sizeless-vector.md
+++ b/cranelift-sizeless-vector.md
@@ -18,12 +18,80 @@ Rust is also currently starting to use LLVM's ScalableVectorType and, as a targe
 # Proposal
 [proposal]: #proposal
 
-We can add sizeless vector types by modifying existing structs to hold an extra bit of information to represent the sizeless nature:
+The following proposal includes changes to the type system, adding specific entities for sizeless stack slots as well as specific instructions that take those entities as operands.
 
-- The new types don't report themselves as vectors, so ty.is\_vector() = false, but are explicitly reported via ty.is\_sizeless\_vector().
+We can add sizeless vector types by modifying existing structs to hold an extra bit of information to represent the sizeless nature.
+
+## Type System
+- The new types do not report themselves as vectors, so ty.is\_vector() = false, but are explicitly reported via ty.is\_sizeless\_vector().
+- is\_vector is also renamed to is\_sized\_vector to avoid ambiguity.
 - The TypeSet and ValueTypeSet structs gain a bool to represent whether the type is sizeless.
 - TypeSetBuilder also gains a bool to control the building of those types.
-- At the encoding level, a bit is used to represent whether the type is sizeless, this bit has been taken from the max number of vector lanes supported, so they'd be reduced to 128 from 256.
+- At the encoding level, a bit is used to represent whether the type is sizeless and this bit has been taken from special types range.
+- These changes allow the usual polymorphic vector operations to be automatically built for the new set of sizeless types.
+
+## IR Entities and Function Changes
+A new entity is added to the IR, the SizelessStackSlot, and the Function will hold these separately from the existing StackSlot entities, also providing two APIs to create them:
+- create\_sized\_stack\_slot
+- create\_sizeless\_stack\_slot
+
+Keeping two vectors enables us to continue to use each entity's index as it's slot identifier, and allows us to place the entities in different positions in the frame. It also enables the backends to easily track which slots are sizeless, and which are not.
+
+## Instructions
+Three new instructions are added at the IR level to use the new SizelessStackSlot entity:
+- SizelessStackAddr
+- SizelessStackLoad
+- SizelessStackStore
+
+The primary difference between these operations and their existing counterparts is that they only take a SizelessStackSlot operand, without a byte offset.
+
+## ABI Layer Changes
+
+- The method stack_stackslot_addr is renamed to sized\_stackslot\_addr.
+- The method sizeless\_stackslots\_addr is introduced, which takes a vector\_scale parameter.
+- get\_number\_of\_spillslots\_for\_value is also modified to take a vector\_scale parameter.
+
+A key challenge to supporting these new types is that the register allocator expects to be given an constant value for the size of a spill slot, for a given register class. So, the current expectation is that the backends will continue to provide a fixed number, potentially larger that they currently do. This value is provided to the ABI layer via a new method on the TargetIsa trait, vector\_scale, which returns the largest number of bytes for a target vector register. This can then also be used to scale the index when calculating the address of a SizelessStackSlot, if a backend chooses a fixed sized during code generation.
+
+With the notion of a sizeless stack slot possible, but not a sizeless spill slot, the proposed frame layout would look like the following:
+
+```
+//! ```plain
+//!   (high address)
+//!
+//!                              +---------------------------+
+//!                              |          ...              |
+//!                              | stack args                |
+//!                              | (accessed via FP)         |
+//!                              +---------------------------+
+//! SP at function entry ----->  | return address            |
+//!                              +---------------------------+
+//!                              |          ...              |
+//!                              | clobbered callee-saves    |
+//! unwind-frame base     ---->  | (pushed by prologue)      |
+//!                              +---------------------------+
+//! FP after prologue -------->  | FP (pushed by prologue)   |
+//!                              +---------------------------+
+//!                              | sizeless stack slots      |
+//!                              | (accessed via FP)         |
+//!                              |          ...              |
+//!                              +---------------------------+
+//!                              | spill slots               |
+//!                              | (accessed via nominal SP) |
+//!                              |          ...              |
+//!                              | stack slots               |
+//!                              | (accessed via nominal SP) |
+//! nominal SP --------------->  | (alloc'd by prologue)     |
+//! (SP at end of prologue)      +---------------------------+
+//!                              | [alignment as needed]     |
+//!                              |          ...              |
+//!                              | args for call             |
+//! SP before making a call -->  | (pushed at callsite)      |
+//!                              +---------------------------+
+//!
+//!   (low address)
+//! ```
+```
 
 # Rationale and alternatives
 [rationale-and-alternatives]: #rationale-and-alternatives
@@ -39,8 +107,6 @@ This doesn't mean that a backend can't select a fixed width during code generati
 # Open questions
 [open-questions]: #open-questions
 
-- Does anyone care that Cranelift could only support a maximum of 128 vector lanes? The only problem I could imagine is if someone has a 128-lane vector of 1-bit bools...
-- How will the register allocator (regalloc2..?) handle a new vector type and/or potential register aliasing?
-- Are there parts of Cranelift, which aren't backend specific, that would need to handle these types? (is there generic stack handling or anything else data size specific...?)
+- How will regalloc2 handle a new vector type and/or potential register aliasing? And will sizeless spill slots be possible?
 - What behaviour would the interpreter have? I would expect it to default to the existing simd-128 semantics.
-- Testing is also an issue, is it reasonable to assume that function under (run)test neither take or return sizeless vectors? If so, how should the result values be defined and checked against?
+- Testing is also an issue, is it reasonable to assume that function under (run)test neither take or return sizeless vectors? If so, how should the result values be defined and checked against? I have currently implemented an instruction, extract\_vector, which takes a sizeless vector and an immediate which provides an index to a 128-bit sub-vector. Together with passing scalars as function parameters and splatting them into sizeless vectors, it allows simple testing of lane-wise operations.
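
Note on the frame layout in this patch: the address calculation for a SizelessStackSlot, where the slot index is scaled by `vector_scale`, could be sketched roughly as below. This is only an illustration. The function names are hypothetical, and it assumes that slots are packed downwards from FP in index order and that each slot occupies exactly `vector_scale` bytes, neither of which is stated in the RFC text.

```rust
// Hypothetical sketch, not Cranelift code. Assumes sizeless slots sit
// directly below FP, are packed in index order, and each occupies
// `vector_scale` bytes (the largest vector register size for the target).

/// Distance in bytes from FP down to the lowest byte of slot `index`.
pub fn sizeless_slot_offset_below_fp(index: u32, vector_scale: u32) -> u32 {
    // Slot 0 covers [FP - vector_scale, FP), slot 1 the range below it, etc.
    (index + 1) * vector_scale
}

/// Address of the lowest byte of slot `index`, given a frame pointer value.
pub fn sizeless_slot_addr(fp: u64, index: u32, vector_scale: u32) -> u64 {
    fp - u64::from(sizeless_slot_offset_below_fp(index, vector_scale))
}
```

With a 32-byte `vector_scale`, slot 1 would start 64 bytes below FP under these assumptions. Because the offset depends only on the index and one per-target constant, no byte offset operand is needed on the instructions.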

From 0748ffa658b47f658dc52550e184233c2ec2f702 Mon Sep 17 00:00:00 2001
From: Sam Parker <sam.parker@arm.com>
Date: Tue, 5 Apr 2022 11:48:59 +0100
Subject: [PATCH 3/8] Moved to dynamic types

---
 cranelift-sizeless-vector.md | 71 +++++++++++++++++++-----------------
 1 file changed, 38 insertions(+), 33 deletions(-)

diff --git a/cranelift-sizeless-vector.md b/cranelift-sizeless-vector.md
index d6ce800..13f7570 100644
--- a/cranelift-sizeless-vector.md
+++ b/cranelift-sizeless-vector.md
@@ -4,56 +4,62 @@ This RFC proposes a way to handle flexible vector types as specified at: https:/
 
 [summary]: #summary
 
-The proposal is to introduce new sizeless vector types into Cranelift, that:
-- Express a vector, with a lane type and size, but a target-defined number of lanes.
-- Are denoted by prefixing the lane type with 'sv' (sizeless vector): svi16, svi32, svf32, etc...
-- Have a minimum width of 128-bits, meaning we can simply map to existing simd-128 implementations.
+The proposal is to introduce new dynamically-sized vector types into Cranelift, that:
+- Enable dynamic vector type creation using existing fixed vector types and a dynamic scaling factor.
+- The dynamic types, 'dt', express a vector with a lane type and shape, but a target-defined scaling factor.
+- Space as been allocated in ir::Type for concrete definitions of these new types.
+- The dynamic scaling factor is a global value which is defined by the target.
+- We currently only support scaling factors which are compile-time constants.
 
 # Motivation
 [motivation]: #motivation
 
 Flexible vectors are likely coming to WebAssembly so we should support the current spec. This is current path forward to support vectors that are wider than 128-bits.
-Rust is also currently starting to use LLVM's ScalableVectorType and, as a target backend, Cranelift could support those directly with a sizeless vector type.
+Rust is also currently starting to use LLVM's ScalableVectorType and, as a target backend, Cranelift could support those directly with a dynamic vector type.
 
 # Proposal
 [proposal]: #proposal
 
-The following proposal includes changes to the type system, adding specific entities for sizeless stack slots as well as specific instructions that take those entities as operands.
+The following proposal includes changes to the type system, adding specific entities for dynamic stack slots as well as specific instructions that take those entities as operands.
 
-We can add sizeless vector types by modifying existing structs to hold an extra bit of information to represent the sizeless nature.
+We can add dynamic vector types by modifying existing structs to hold an extra bit of information to represent the dynamic nature.
 
 ## Type System
-- The new types do not report themselves as vectors, so ty.is\_vector() = false, but are explicitly reported via ty.is\_sizeless\_vector().
+- The new types do not report themselves as vectors, so ty.is\_vector() = false, but are explicitly reported via ty.is\_dynamic\_vector().
 - is\_vector is also renamed to is\_sized\_vector to avoid ambiguity.
-- The TypeSet and ValueTypeSet structs gain a bool to represent whether the type is sizeless.
+- The TypeSet and ValueTypeSet structs gain a bool to represent whether the type is dynamic.
 - TypeSetBuilder also gains a bool to control the building of those types.
-- At the encoding level, a bit is used to represent whether the type is sizeless and this bit has been taken from special types range.
-- These changes allow the usual polymorphic vector operations to be automatically built for the new set of sizeless types.
+- At the encoding level, space has been taken from the special types range to allow for the new types. Special types now occupy 0x01-0x2f and everything else has moved to fill the space, with dynamic types occupying the end of the range 0x80-0xff.
+- These changes allow the usual polymorphic vector operations to be automatically built for the new set of dynamic types.
 
 ## IR Entities and Function Changes
-A new entity is added to the IR, the SizelessStackSlot, and the Function will hold these separately from the existing StackSlot entities, also providing two APIs to create them:
+
+A new global value, `dyn_scale`, is introduced, which is parameterized by a base vector type. This global value can then be used to create dynamic types, such as `dt0 = i32x4*gv0`.
+
+DynamicTypes are created and held like other IR entities, with the function holding a PrimaryMap\<DynamicType, DynamicTypeData\>. The DynamicTypeData holds the base vector type along with the GlobalValue which is the scaling factor.
+
+A new entity is added to the IR, the DynamicStackSlot, and the Function will hold these separately from the existing StackSlot entities, also providing two APIs to create them:
 - create\_sized\_stack\_slot
-- create\_sizeless\_stack\_slot
+- create\_dynamic\_stack\_slot
 
-Keeping two vectors enables us to continue to use each entity's index as it's slot identifier, and allows us to place the entities in different positions in the frame. It also enables the backends to easily track which slots are sizeless, and which are not.
+Keeping two vectors enables us to continue to use each entity's index as its slot identifier, and allows us to place the entities in different positions in the frame. It also enables the backends to easily track which slots are dynamic and which are not. DynamicStackSlots differ from existing StackSlots in that they are defined with a DynamicType instead of a size, e.g. `dss0 = explicit_dynamic_slot dt0`.
 
 ## Instructions
-Three new instructions are added at the IR level to use the new SizelessStackSlot entity:
-- SizelessStackAddr
-- SizelessStackLoad
-- SizelessStackStore
+Three new instructions are added at the IR level to use the new DynamicStackSlot entity:
+- DynamicStackAddr
+- DynamicStackLoad
+- DynamicStackStore
 
-The primary difference between these operations and their existing counterparts is that they only take a SizelessStackSlot operand, without a byte offset.
+The primary difference between these operations and their existing counterparts is that they only take a DynamicStackSlot operand, without a byte offset.
+
+DynamicVectorScale is the other instruction introduced, and this enables the materialization of a `dyn_scale` value when used by `globalvalue`.
 
 ## ABI Layer Changes
 
 - The method stack_stackslot_addr is renamed to sized\_stackslot\_addr.
-- The method sizeless\_stackslots\_addr is introduced, which takes a vector\_scale parameter.
-- get\_number\_of\_spillslots\_for\_value is also modified to take a vector\_scale parameter.
-
-A key challenge to supporting these new types is that the register allocator expects to be given an constant value for the size of a spill slot, for a given register class. So, the current expectation is that the backends will continue to provide a fixed number, potentially larger that they currently do. This value is provided to the ABI layer via a new method on the TargetIsa trait, vector\_scale, which returns the largest number of bytes for a target vector register. This can then also be used to scale the index when calculating the address of a SizelessStackSlot, if a backend chooses a fixed sized during code generation.
+- The method dynamic\_stackslots\_addr is introduced.
 
-With the notion of a sizeless stack slot possible, but not a sizeless spill slot, the proposed frame layout would look like the following:
+A key challenge to supporting these new types is that the register allocator expects to be given a constant value for the size of a spill slot, for a given register class. So, the current expectation is that the backends will continue to provide a fixed number, potentially larger than they currently do. A new method on the TargetIsa trait, `vector_scale`, returns the largest number of bytes for a given dynamic IR type. This is used by the ABI layer to cache the sizes of all the used dynamic types, the largest of which is used for the spillslot size. The size returned by the Isa is also used to calculate the dynamic stackslot offsets, just as is done for the existing stack slots. This means that the frame layout changes are minimal, just with the dynamic slots appended after the fixed size slots.
 
 ```
 //! ```plain
@@ -72,14 +78,11 @@ With the notion of a sizeless stack slot possible, but not a sizeless spill slot
 //!                              +---------------------------+
 //! FP after prologue -------->  | FP (pushed by prologue)   |
 //!                              +---------------------------+
-//!                              | sizeless stack slots      |
-//!                              | (accessed via FP)         |
-//!                              |          ...              |
-//!                              +---------------------------+
 //!                              | spill slots               |
 //!                              | (accessed via nominal SP) |
 //!                              |          ...              |
 //!                              | stack slots               |
+//!                              | dynamic stack slots       |
 //!                              | (accessed via nominal SP) |
 //! nominal SP --------------->  | (alloc'd by prologue)     |
 //! (SP at end of prologue)      +---------------------------+
@@ -96,17 +99,19 @@ With the notion of a sizeless stack slot possible, but not a sizeless spill slot
 # Rationale and alternatives
 [rationale-and-alternatives]: #rationale-and-alternatives
 
-The main design decision is to have a vector type that isn't comparable to the existing vector types, which means that no existing paths can accidently try to treat them as such. Although setting the minimum size of the opaque type to 128 bits still allows us to infer the minimum number of lanes for a given lane type. The flexible vector specification also provides specific operations for lane accesses and shuffles so these aren't operations that we have to handle with our existing operations.
+The main change here is the introduction of dynamically created types, using an existing vector type as a base and a scaling factor represented by a global value. Using a global value fits with clif IR in that we have a value which is not expected (allowed?) to change during the execution of the function. The alternative is to add types which have an implicit scaling factor which could make verification more complicated, or impossible.
+
+The new vector types also aren't comparable to the existing vector types, which means that no existing paths can accidently try to treat them as such.
 
-It's possible that the hardware vector length will be fixed, so one alternative would be to generate IR with fixed widths using information from the backend. The one advantage is that we'd not have to add sizeless types in the IR at all. But there are two main disadvantages with this approach:
+It's possible that the hardware vector length will be fixed, so one alternative would be to generate IR with fixed widths using information from the backend. The one advantage is that we'd not have to add dynamic types in the IR at all. But there are two main disadvantages with this approach:
 - In an ahead-of-time setting, Cranelift would not be able to take advantage of larger vectors where the width is implementation defined. It would be possible to target architecture extensions such as Intel's AVX2 and AVX-512, which have static sizes, but not for architectures like Arm's SVE.
 - It is also currently undecided whether the flexible vector specification will include operations to set the vector length during program execution, so we shouldn't design out this possibility.
 
-This doesn't mean that a backend can't select a fixed width during code generation, if desired. The current simd-128 implementations would be able to map the sizeless types directly to their current operations and we could also add a legalization layer for backends which only want to support simd-128, or another fixed size.
+This doesn't mean that a backend can't select a fixed width during code generation, if desired. The current simd-128 implementations would be able to map the dynamic types directly to their current operations and we could also add a legalization layer for backends which only want to support simd-128, or another fixed size.
 
 # Open questions
 [open-questions]: #open-questions
 
-- How will regalloc2 handle a new vector type and/or potential register aliasing? And will sizeless spill slots be possible?
+- How will regalloc2 handle a new vector type and/or potential register aliasing? And will dynamic spill slots be possible?
 - What behaviour would the interpreter have? I would expect it to default to the existing simd-128 semantics.
-- Testing is also an issue, is it reasonable to assume that function under (run)test neither take or return sizeless vectors? If so, how should the result values be defined and checked against? I have currently implemented an instruction, extract\_vector, which takes a sizeless vector and an immediate which provides an index to a 128-bit sub-vector. Together with passing scalars as function parameters and splatting them into sizeless vectors, it allows simple testing of lane-wise operations.
+- Testing is also an issue, is it reasonable to assume that function under (run)test neither take or return dynamic vectors? If so, how should the result values be defined and checked against? I have currently implemented an instruction, extract\_vector, which takes a dynamic vector and an immediate which provides an index to a 128-bit sub-vector. Together with passing scalars as function parameters and splatting them into dynamic vectors, it allows simple testing of lane-wise operations.
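
Note on the sizing rules in this patch: the relationship between a dynamic type, its scaling factor, the spill slot size and the dynamic stack slot offsets could be modelled as in the sketch below. All names are hypothetical stand-ins and not the real Cranelift API. The scaling factor is modelled as an already resolved compile-time constant, in place of a GlobalValue, and alignment is ignored.

```rust
// Hypothetical, simplified model of the dynamic types described in this
// patch. None of these names are Cranelift's real API.

/// A fixed-width base vector type such as i32x4.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct BaseVector {
    pub lane_bits: u32,
    pub lanes: u32,
}

impl BaseVector {
    /// Size of the fixed base vector in bytes.
    pub fn bytes(self) -> u32 {
        self.lane_bits * self.lanes / 8
    }
}

/// Stand-in for DynamicTypeData: a base vector type plus a scaling factor.
/// In the RFC the factor is a GlobalValue; here it is a plain constant.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct DynamicTypeData {
    pub base: BaseVector,
    pub scale: u32,
}

impl DynamicTypeData {
    /// Size in bytes, playing the role of the value the Isa reports.
    pub fn bytes(self) -> u32 {
        self.base.bytes() * self.scale
    }
}

/// Spill slot size: the largest of the dynamic types used by the function,
/// or None when the function uses no dynamic types.
pub fn spill_slot_bytes(used: &[DynamicTypeData]) -> Option<u32> {
    used.iter().map(|dt| dt.bytes()).max()
}

/// Offsets from the start of the stack slot area, with the dynamic slots
/// appended after the fixed-size slots. Returns (fixed, dynamic) offsets.
pub fn slot_offsets(
    fixed_sizes: &[u32],
    dynamic: &[DynamicTypeData],
) -> (Vec<u32>, Vec<u32>) {
    let mut next = 0u32;
    let mut fixed_offsets = Vec::with_capacity(fixed_sizes.len());
    for &size in fixed_sizes {
        fixed_offsets.push(next);
        next += size;
    }
    let mut dynamic_offsets = Vec::with_capacity(dynamic.len());
    for dt in dynamic {
        dynamic_offsets.push(next);
        next += dt.bytes();
    }
    (fixed_offsets, dynamic_offsets)
}
```

For example, `i32x4` with a scale of 2 is 32 bytes and `i64x2` with a scale of 4 is 64 bytes, so a function using both would get a 64-byte spill slot size in this model. Once the scale is a known constant, the dynamic slots are laid out with the same running-offset scheme as the fixed ones, which is why the frame layout change is small.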

From 9c7dcfc747f7a1a30c550e2afb04709e90953b4b Mon Sep 17 00:00:00 2001
From: Sam Parker <sam.parker@arm.com>
Date: Tue, 12 Apr 2022 16:16:23 +0100
Subject: [PATCH 4/8] NumSet for dynamic_simd_lanes in Type generation

---
 cranelift-sizeless-vector.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/cranelift-sizeless-vector.md b/cranelift-sizeless-vector.md
index 13f7570..b7fe994 100644
--- a/cranelift-sizeless-vector.md
+++ b/cranelift-sizeless-vector.md
@@ -27,8 +27,7 @@ We can add dynamic vector types by modifying existing structs to hold an extra b
 ## Type System
 - The new types do not report themselves as vectors, so ty.is\_vector() = false, but are explicitly reported via ty.is\_dynamic\_vector().
 - is\_vector is also renamed to is\_sized\_vector to avoid ambiguity.
-- The TypeSet and ValueTypeSet structs gain a bool to represent whether the type is dynamic.
-- TypeSetBuilder also gains a bool to control the building of those types.
+- The TypeSet, ValueTypeSet and TypeSetBuilder structs gains an extra NumSet to specify a minimum number of dynamic lanes.
 - At the encoding level, space has been taken from the special types range to allow for the new types. Special types now occupy 0x01-0x2f and everything else has moved to fill the space, with dynamic types occupying the end of the range 0x80-0xff.
 - These changes allow the usual polymorphic vector operations to be automatically built for the new set of dynamic types.
 

From b7cb4c679ad37959138f24782b13b9c2eb1e0e54 Mon Sep 17 00:00:00 2001
From: Sam Parker <sam.parker@arm.com>
Date: Thu, 21 Apr 2022 09:30:23 +0100
Subject: [PATCH 5/8] fix typo

---
 cranelift-sizeless-vector.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cranelift-sizeless-vector.md b/cranelift-sizeless-vector.md
index b7fe994..753b00b 100644
--- a/cranelift-sizeless-vector.md
+++ b/cranelift-sizeless-vector.md
@@ -100,7 +100,7 @@ A key challenge to supporting these new types is that the register allocator exp
 
 The main change here is the introduction of dynamically created types, using an existing vector type as a base and a scaling factor represented by a global value. Using a global value fits with clif IR in that we have a value which is not expected (allowed?) to change during the execution of the function. The alternative is to add types which have an implicit scaling factor which could make verification more complicated, or impossible.
 
-The new vector types also aren't comparable to the existing vector types, which means that no existing paths can accidently try to treat them as such.
+The new vector types also aren't comparable to the existing vector types, which means that no existing paths can accidentally try to treat them as such.
 
 It's possible that the hardware vector length will be fixed, so one alternative would be to generate IR with fixed widths using information from the backend. The one advantage is that we'd not have to add dynamic types in the IR at all. But there are two main disadvantages with this approach:
 - In an ahead-of-time setting, Cranelift would not be able to take advantage of larger vectors where the width is implementation defined. It would be possible to target architecture extensions such as Intel's AVX2 and AVX-512, which have static sizes, but not for architectures like Arm's SVE.

From 40335b8d92909478cac5be9f31d7ad2673cf9048 Mon Sep 17 00:00:00 2001
From: Sam Parker <sam.parker@arm.com>
Date: Thu, 21 Apr 2022 11:49:51 +0100
Subject: [PATCH 6/8] global value not allowed to change

---
 cranelift-sizeless-vector.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cranelift-sizeless-vector.md b/cranelift-sizeless-vector.md
index 753b00b..59c1c67 100644
--- a/cranelift-sizeless-vector.md
+++ b/cranelift-sizeless-vector.md
@@ -98,7 +98,7 @@ A key challenge to supporting these new types is that the register allocator exp
 # Rationale and alternatives
 [rationale-and-alternatives]: #rationale-and-alternatives
 
-The main change here is the introduction of dynamically created types, using an existing vector type as a base and a scaling factor represented by a global value. Using a global value fits with clif IR in that we have a value which is not expected (allowed?) to change during the execution of the function. The alternative is to add types which have an implicit scaling factor which could make verification more complicated, or impossible.
+The main change here is the introduction of dynamically created types, using an existing vector type as a base and a scaling factor represented by a global value. Using a global value fits with clif IR in that we have a value which is not allowed to change during the execution of the function. The alternative is to add types which have an implicit scaling factor which could make verification more complicated, or impossible.
 
 The new vector types also aren't comparable to the existing vector types, which means that no existing paths can accidentally try to treat them as such.
 

From 98980008f605f2cbc1b0608ca2139e354ae84719 Mon Sep 17 00:00:00 2001
From: Sam Parker <sam.parker@arm.com>
Date: Wed, 27 Apr 2022 14:31:48 +0100
Subject: [PATCH 7/8] move doc to accepted

---
 .../cranelift-dynamic-vector.md                               | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
 rename cranelift-sizeless-vector.md => accepted/cranelift-dynamic-vector.md (97%)

diff --git a/cranelift-sizeless-vector.md b/accepted/cranelift-dynamic-vector.md
similarity index 97%
rename from cranelift-sizeless-vector.md
rename to accepted/cranelift-dynamic-vector.md
index 59c1c67..62a15de 100644
--- a/cranelift-sizeless-vector.md
+++ b/accepted/cranelift-dynamic-vector.md
@@ -7,7 +7,7 @@ This RFC proposes a way to handle flexible vector types as specified at: https:/
 The proposal is to introduce new dynamically-sized vector types into Cranelift, that:
 - Enable dynamic vector type creation using existing fixed vector types and a dynamic scaling factor.
 - The dynamic types, 'dt', express a vector with a lane type and shape, but a target-defined scaling factor.
-- Space as been allocated in ir::Type for concrete definitions of these new types.
+- Space has been allocated in ir::Type for concrete definitions of these new types.
 - The dynamic scaling factor is a global value which is defined by the target.
 - We currently only support scaling factors which are compile-time constants.
 
@@ -27,7 +27,7 @@ We can add dynamic vector types by modifying existing structs to hold an extra b
 ## Type System
 - The new types do not report themselves as vectors, so ty.is\_vector() = false, but are explicitly reported via ty.is\_dynamic\_vector().
 - is\_vector is also renamed to is\_sized\_vector to avoid ambiguity.
-- The TypeSet, ValueTypeSet and TypeSetBuilder structs gains an extra NumSet to specify a minimum number of dynamic lanes.
+- The TypeSet, ValueTypeSet and TypeSetBuilder structs gain an extra NumSet to specify a minimum number of dynamic lanes.
 - At the encoding level, space has been taken from the special types range to allow for the new types. Special types now occupy 0x01-0x2f and everything else has moved to fill the space, with dynamic types occupying the end of the range 0x80-0xff.
 - These changes allow the usual polymorphic vector operations to be automatically built for the new set of dynamic types.
 

From 8225327584f2a1028f105366f19f7d0ef51a61bd Mon Sep 17 00:00:00 2001
From: Sam Parker <sam.parker@arm.com>
Date: Thu, 28 Apr 2022 08:51:34 +0100
Subject: [PATCH 8/8] removed two 'open' questions

---
 accepted/cranelift-dynamic-vector.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/accepted/cranelift-dynamic-vector.md b/accepted/cranelift-dynamic-vector.md
index 62a15de..401cf3e 100644
--- a/accepted/cranelift-dynamic-vector.md
+++ b/accepted/cranelift-dynamic-vector.md
@@ -51,7 +51,9 @@ Three new instructions are added at the IR level to use the new DynamicStackSlot
 
 The primary difference between these operations and their existing counterparts is that they only take a DynamicStackSlot operand, without a byte offset.
 
-DynamicVectorScale is the other instruction introduced, and this enables the materialization of a `dyn_scale` value when used by `globalvalue`.
+DynamicVectorScale is also introduced; it materializes a `dyn_scale` value when that global value is used by `globalvalue`.
+
+ExtractVector is also introduced, currently just for testing, which takes a dynamic vector value and an immediate value as a sub-vector index. This allows us to return a fixed-width value from a test function.
 
 ## ABI Layer Changes
 
@@ -111,6 +113,5 @@ This doesn't mean that a backend can't select a fixed width during code generati
 # Open questions
 [open-questions]: #open-questions
 
-- How will regalloc2 handle a new vector type and/or potential register aliasing? And will dynamic spill slots be possible?
 - What behaviour would the interpreter have? I would expect it to default to the existing simd-128 semantics.
 - Testing is also an issue, is it reasonable to assume that function under (run)test neither take or return dynamic vectors? If so, how should the result values be defined and checked against? I have currently implemented an instruction, extract\_vector, which takes a dynamic vector and an immediate which provides an index to a 128-bit sub-vector. Together with passing scalars as function parameters and splatting them into dynamic vectors, it allows simple testing of lane-wise operations.
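
Note on the testing approach: the pattern of splatting scalar parameters into dynamic vectors, applying a lane-wise operation and returning one 128-bit sub-vector via extract\_vector could be modelled as in the sketch below. This is a plain Rust model and not CLIF. The function names are hypothetical, a dynamic vector of i32 lanes is represented as a slice whose length is 4 times an assumed scaling factor, and wrapping addition stands in for an arbitrary lane-wise operation.

```rust
// Hypothetical model of the test pattern; none of this is Cranelift API.
// A dynamic i32x4 vector is modelled as a Vec<i32> of length 4 * scale.

/// Number of i32 lanes in one 128-bit sub-vector.
const LANES_PER_128: usize = 4;

/// Splat a scalar into every lane of a dynamic vector.
pub fn splat(value: i32, scale: usize) -> Vec<i32> {
    vec![value; LANES_PER_128 * scale]
}

/// Lane-wise wrapping addition of two dynamic vectors of equal length.
pub fn iadd(a: &[i32], b: &[i32]) -> Vec<i32> {
    assert_eq!(a.len(), b.len(), "operands must have the same lane count");
    a.iter().zip(b).map(|(x, y)| x.wrapping_add(*y)).collect()
}

/// Return the 128-bit sub-vector selected by the immediate `index`.
/// Panics if `index` is outside the dynamic vector.
pub fn extract_vector(v: &[i32], index: usize) -> [i32; LANES_PER_128] {
    let start = index * LANES_PER_128;
    [v[start], v[start + 1], v[start + 2], v[start + 3]]
}
```

A test function in this style takes only scalars and returns a fixed-width value, so the expected result does not depend on the target's scaling factor as long as the operation is lane-wise and the inputs are splats.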