[Proposal] Add Matrix Multiplication (MatMul) as a Tensor-Level Abstraction in XLS IR #3983

@Waleed99i

Description

Motivation

XLS currently operates primarily at a scalar level, where computations are expressed as fine-grained operations in the IR. While this design is powerful for general computation, it is not well suited to emerging machine learning workloads, which rely heavily on tensor-level operations such as matrix multiplication.

Introducing a matrix multiplication (MatMul) abstraction at the IR level would allow XLS to better support hardware-efficient generation of ML accelerators.


Problem

Currently, matrix multiplication must be manually expressed as a combination of scalar operations, which:

  • Increases IR complexity
  • Limits optimization opportunities
  • Prevents the compiler from making tensor-level scheduling and architectural decisions
  • Leads to suboptimal hardware in terms of area, timing, and power (PPA)

Proposed Solution

I propose introducing a MatMul (or equivalent) tensor-level operation in XLS IR, along with multiple lowering strategies.

1. IR-Level Abstraction

  • Define a MatMul operation in IR
  • Represent matrix multiplication as a first-class operation instead of scalar expansion
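To pin down what the op would compute, here is a reference-semantics sketch in plain Python (the actual XLS IR signature, bit widths, and naming would be settled during design review; `matmul` here is purely illustrative):

```python
def matmul(a, b):
    """Reference semantics: C = A x B for an m x k and a k x n matrix,
    given as nested lists. Naive O(m*k*n) triple loop."""
    m, k = len(a), len(a[0])
    k2, n = len(b), len(b[0])
    assert k == k2, "inner dimensions must match"
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]

# Example: a 2x3 times 3x2 multiply.
a = [[1, 2, 3],
     [4, 5, 6]]
b = [[7, 8],
     [9, 10],
     [11, 12]]
print(matmul(a, b))  # [[58, 64], [139, 154]]
```

Keeping these semantics as a single first-class node is what lets later passes pick a lowering instead of pattern-matching scalar expansions back together.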

2. Lowering Strategies

Implement multiple lowering paths:

(a) Combinational Expansion

  • Fully unrolled scalar multiplications
  • High area, low latency
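For a fixed small shape, combinational expansion simply flattens every output element into independent scalar multiplies and adds, all evaluated in one combinational pass. A hypothetical 2x2 sketch (names like `c00` are illustrative only):

```python
def matmul_2x2_unrolled(a00, a01, a10, a11, b00, b01, b10, b11):
    """Fully unrolled 2x2 matrix multiply: 8 multiplies and 4 adds,
    with no loops or shared hardware -- maximal area, minimal latency."""
    c00 = a00 * b00 + a01 * b10
    c01 = a00 * b01 + a01 * b11
    c10 = a10 * b00 + a11 * b10
    c11 = a10 * b01 + a11 * b11
    return c00, c01, c10, c11

print(matmul_2x2_unrolled(1, 2, 3, 4, 5, 6, 7, 8))  # (19, 22, 43, 50)
```

This is essentially what manual scalar expansion produces today; exposing it as one of several lowerings makes the area/latency trade-off an explicit compiler decision.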

(b) Pipelined MAC Tree

  • Reduces critical path
  • Improves timing performance
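The timing win comes from reducing each k-term dot product with a balanced adder tree rather than a linear accumulate chain: depth drops from k-1 adds to ceil(log2(k)), and each tree level can become a pipeline stage. A small sketch of the reduction shape (assumed structure, not XLS code):

```python
def tree_reduce(terms):
    """Sum `terms` with a balanced binary tree; returns (sum, depth).
    Depth is the number of adder levels on the critical path."""
    depth = 0
    while len(terms) > 1:
        # Pair neighbors; an odd leftover passes through unchanged.
        terms = [terms[i] + terms[i + 1] if i + 1 < len(terms) else terms[i]
                 for i in range(0, len(terms), 2)]
        depth += 1
    return terms[0], depth

# Partial products of a 5-term dot product.
products = [1 * 2, 3 * 4, 5 * 6, 7 * 8, 9 * 10]
print(tree_reduce(products))  # (190, 3): 3 adder levels vs. 4 for a chain
```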

(c) Systolic Array Mapping (if possible)

  • Hardware-efficient for ML workloads
  • Enables scalable parallelism
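As a behavioral model of one common variant, here is a cycle-by-cycle sketch of an output-stationary systolic array: PE (i, j) accumulates c[i][j], row i of A enters from the left delayed by i cycles, and column j of B enters from the top delayed by j cycles. This is a simulation of the dataflow only, under assumed skewing, not a hardware mapping:

```python
def systolic_matmul(a, b):
    """Simulate an output-stationary m x n systolic array multiplying
    an m x k matrix by a k x n matrix. At cycle t, PE (i, j) sees
    operand index p = t - i - j and performs one MAC when p is valid."""
    m, k = len(a), len(a[0])
    n = len(b[0])
    c = [[0] * n for _ in range(m)]
    for t in range(m + n + k - 2):      # total latency in cycles
        for i in range(m):
            for j in range(n):
                p = t - i - j           # skewed input wavefront
                if 0 <= p < k:
                    c[i][j] += a[i][p] * b[p][j]
    return c

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(systolic_matmul(a, b))  # [[19, 22], [43, 50]] after m+n+k-2 = 4 cycles
```

The m + n + k - 2 cycle count illustrates why this lowering scales: latency grows linearly with the matrix dimensions while the MAC work is fully parallel across the array.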

3. Integration with Scheduling

  • Allow the XLS scheduler to exploit MatMul-level knowledge
  • Improve pipeline balancing and resource sharing

4. OpenROAD Integration

After RTL generation, use the OpenROAD flow to evaluate:

  • Area
  • Timing
  • Power
  • Routability

This enables a feedback loop:

Compiler → RTL → Physical Design → PPA Metrics → Compiler Optimization


Expected Outcomes

  • Introduction of tensor-level abstraction in XLS
  • Multiple hardware-efficient lowering strategies
  • Improved PPA compared to scalar-expanded implementations
  • A reproducible flow combining compiler + physical design tools

Why this matters

This work bridges the gap between:

  • Compiler design
  • Hardware generation
  • Physical design evaluation

It aligns with modern ML accelerator design trends and enables XLS to generate optimized hardware for real-world workloads.


Status

I am currently exploring the XLS codebase and have successfully set up the environment and started contributing via documentation improvements.

I am interested in implementing this proposal incrementally through further contributions.
