yesoer commented Jul 15, 2025

This pull request targets Basic parfor-loop support in DaphneDSL #515.
The new parfor-loop can be used instead of a regular for-loop to parallelize iterations.
There are some limitations, though, which are listed below.

Note: By default, parallelization is disabled. Use the build flag --enable-parallel-parfor to enable it. This is to keep automatic testing predictable.

Performance Test

We used the following DaphneDSL script to test performance:

vals = rand(1, 250000, 250.0, 500.0, 1.0, -1);

parfor(i in 0:249999) {
    vals[0, i] = vals[0, i] * 2;
}

Without parallelization it ran in 24.192882 seconds.
With parallelization it ran in 8.019115 seconds.
To make sure we only time the actual execution, we implemented the timing in the kernel.
Whether timing is included is controlled by a build flag, just like the parallelization itself, namely --enable-time-parfor.
For an even more reproducible setup, you may want to set the OMP_NUM_THREADS environment variable, e.g. export OMP_NUM_THREADS=2. In our setup this resulted in 12.233227 seconds, which clearly shows good utilization of the two threads compared to the single-threaded version that took ~24 seconds.

Dependencies

For parallelization we use OpenMP in the kernel. OpenMP is bundled with GCC (13), and the corresponding CMake module essentially just sets the flag -fopenmp.

Limitations

We support loops that iterate from some value to some other value. Optionally, one may provide a step; by default, the step is assumed to be 1. Any more advanced for-loop headers are not supported in this PR.
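For illustration, a minimal sketch of a header with an explicit step, assuming the same from:to:step range syntax as regular DaphneDSL for-loops and reusing the vals matrix from the performance test above:

// Process every second column; the step of 2 is the third part of the range.
parfor(i in 0:249998:2) {
    vals[0, i] = vals[0, i] * 2;
}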

We do support if-statements (i.e., scf.if) within the parfor body, but not nested for-loops.
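For example, a conditional update inside the body might look like the following sketch (as.scalar is assumed here to extract the cell value for the comparison):

parfor(i in 0:249999) {
    // Only cells below 300 are scaled; the branch lowers to scf.if.
    if (as.scalar(vals[0, i]) < 300.0) {
        vals[0, i] = vals[0, i] * 2;
    }
}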

This pull request does not provide a dependency analysis to check whether the iterations are independent of each other and can actually be parallelized. The following options were considered:

  • DataFlow does not provide the required analysis tools for this task.

  • LLVM and Affine dialects cannot be used, as they lack any understanding of the semantics of Daphne kernel calls.

  • To implement the necessary analysis by hand, we first need alias analysis — identifying which SSA values or operations refer to the same memory. Once that’s in place, we can enrich the IR with read/write flags using DataFlow. This, however, requires defining aliasing behavior for all Daphne operations.

As of now, we do not support function calls nested in parfor loop bodies. Using parfor inside of functions is not possible at the moment either. That is because of the missing type inference for ParForOp.

We support multiple return values, but not of different types.
The supported types are DenseMatrix<double>, DenseMatrix<float>, and DenseMatrix<int64_t>.
Related to this is issue #397.
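As a hedged sketch of the multiple-result case (assuming that every matrix written in the body and read afterwards becomes a result of the parfor), both outputs below are DenseMatrix<double>:

A = fill(0.0, 1, 1000);
B = fill(1.0, 1, 1000);

parfor(i in 0:999) {
    // Both A and B are written in the body, so both become results
    // of the parfor; they share the value type double.
    A[0, i] = A[0, i] + 1.0;
    B[0, i] = B[0, i] * 2.0;
}

print(sum(A) + sum(B));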

Ossinking and others added 30 commits May 20, 2025 11:03
These args are the non-local variables to the loop body (which also makes them dependency candidates).
This includes setting the block arguments to carry the induction and non-local variables.
Since it was named before, the getter function was in use, which caused an error.
The way they are initialized in LowerToLLVMPass is incomplete, see TODOs there.
As the TODOs mentioned, this part was somewhat unclear to me before. Pretty sure this is the correct idea.
So we can later use it for parallelization in the kernel.
… still a problem with storing pointer types to the parameter array of the generated body function)
yesoer and others added 30 commits July 14, 2025 01:03
The former version would first replace the operand of the return with the output pointer and then replace the return by storing this new operand to the output pointer. By just not doing the first part, we get a sensible translation where the original return operand is now written into the output pointer instead.
When the input/output matrix object is created, it has only one use as the argument to parfor and is therefore decremented right after. The return value of parfor points to the same object, though, so we need to add an increment to make sure the input/output matrix is available after the execution.
Commit 6465a94 removed the corresponding attribute from the parfor op.
Removing unused variables and unnecessary comments, and using some more idiomatic C++ constructs for loops.
Since we just deleted the reference to the first block, it could not be erased on replace. We also cannot erase it here yet, though, because its arguments are not rewired yet. By skipping the first block here, the replace can take care of everything later.