Basic parfor-loop support in DaphneDSL #515 #976
Open
yesoer wants to merge 85 commits into daphne-eu:main from Ossinking:main
Conversation
These args are the non-local variables to the loop body (which also makes them dependency candidates).
This includes setting the blocks arguments to carry induction and non-local variables.
Since it was named before, the getter function was in use, which caused an error.
The way they are initialized in LowerToLLVMPass is incomplete; see the TODOs there.
As the TODOs mentioned, this part was somewhat unclear to me before. Pretty sure this is the correct idea.
So we can later use it for parallelization in the kernel.
… still a problem with storing of pointer types to the parameter array of the generated body function)
The former version would first replace the operand of return with the output pointer and then replace the return by storing this new operand to the output pointer. By just not doing the first part we get a sensible translation where the original return operand is now written into the output pointer instead.
When the input/output matrix object is created, it has only one use as the arg to parfor and is therefore decremented right after. The return value of parfor points to the same object, though, so we need to add an increment to make sure the input/output matrix is available after the execution.
…lex types as block arguments
Commit 6465a94 removed the corresponding attribute from the parfor op.
Removing unused variables and unnecessary comments, and using more idiomatic C++ constructs for loops.
Since we just deleted the reference to the first block, it could not be erased on replace. We also cannot erase it here yet, though, because its arguments are not rewired yet. By skipping the first block here, the replace can take care of everything later.
This pull request targets Basic parfor-loop support in DaphneDSL #515.
It can be used instead of regular for-loops to parallelize iterations.
There are some limitations though which are listed below.
Performance Test
We used the following daphne script for testing performance:
Without parallelization it ran in 24.192882 seconds.
With parallelization it ran in 8.019115 seconds.
To make sure we only time the actual execution, we implemented it in the kernel.
Whether timing is included can be controlled by a build flag, --enable-time-parfor, just like the parallelization itself.
For an even more reproducible setup you might want to set the OMP_NUM_THREADS environment variable, e.g.
export OMP_NUM_THREADS=2. In our setup this resulted in 12.233227 seconds, which clearly shows good utilization of the two threads compared to the single-threaded version that took ~24 seconds.
Dependencies
For parallelization we use OpenMP in the kernel. OpenMP is bundled with GCC (13), and the corresponding CMake module essentially just sets the flag -fopenmp.
Limitations
We support loops that go from some value to some other value. Optionally, one may provide a step; by default the step is assumed to be 1. Any more advanced for-loop headers are not supported in this PR.
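The supported headers boil down to an inclusive from/to range with a positive step. A hypothetical helper (the name is our own, not from the PR) that a lowering could use to compute the trip count before splitting the iteration space:

```cpp
#include <cstdint>
#include <cassert>

// Trip count of a loop "from .. to" (inclusive) with a positive step,
// defaulting to 1 as described above. Anything fancier than this shape
// of loop header is out of scope for the PR.
int64_t tripCount(int64_t from, int64_t to, int64_t step = 1) {
    assert(step > 0);        // only positive steps in this sketch
    if (to < from)
        return 0;            // empty range
    return (to - from) / step + 1;
}
```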
We do support if statements (i.e. scf.if) within the parfor body, but not nested for loops.
This pull request does not provide a dependency analysis to check whether the iterations are independent of each other and can actually be parallelized. The following options were considered:
DataFlow does not provide the required analysis tools for this task.
LLVM and Affine dialects cannot be used, as they lack any understanding of the semantics of Daphne kernel calls.
To implement the necessary analysis by hand, we first need alias analysis — identifying which SSA values or operations refer to the same memory. Once that’s in place, we can enrich the IR with read/write flags using DataFlow. This, however, requires defining aliasing behavior for all Daphne operations.
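To illustrate what the read/write flags would enable, here is a toy sketch (our own illustration over abstract memory locations, not the PR's implementation) of the independence test such an analysis feeds into: two iterations may run in parallel only if neither writes a location the other reads or writes.

```cpp
#include <algorithm>
#include <set>
#include <string>

// Toy independence check over abstract memory locations (strings here;
// a real analysis would use alias classes of SSA values). Iterations
// conflict if a write of one overlaps anything the other touches;
// read/read overlap is harmless.
bool mayRunInParallel(const std::set<std::string>& r1,
                      const std::set<std::string>& w1,
                      const std::set<std::string>& r2,
                      const std::set<std::string>& w2) {
    auto disjoint = [](const std::set<std::string>& a,
                       const std::set<std::string>& b) {
        return std::none_of(a.begin(), a.end(),
                            [&](const std::string& x) { return b.count(x) > 0; });
    };
    return disjoint(w1, r2) && disjoint(w1, w2) && disjoint(w2, r1);
}
```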
As of now we do not support function calls nested in parfor loop bodies. Using parfor inside of functions is also not possible at the moment; that is due to the missing type inference for ParForOp.
We support multiple return values, but they must all have the same type. The supported types are DenseMatrix<double>, DenseMatrix<float>, and DenseMatrix<int64_t>.
Related to this is issue #397.