Skip to content

Conversation

@Solaryee
Copy link
Contributor

@Solaryee Solaryee commented Dec 21, 2023

This PR is originally submited as part of #7940.
It aims to make load/store logic more general so that it can be optimized to vector load/store pattern. It changes IR from

%linear_index_plus_base = add nuw nsw i32 %linear_index_base, %loop.indvar
%linear_index1 = add nuw nsw i32 %linear_index_plus_base, 1
%linear_index2 = add nuw nsw i32 %linear_index_plus_base, 2
%linear_index3 = add nuw nsw i32 %linear_index_plus_base, 3

%21 = getelementptr inbounds float, ptr %0, i32 %linear_index_plus_base
%22 = load float, ptr %21, align 4, !invariant.load !4
%26 = getelementptr inbounds float, ptr %0, i32 %linear_index1
%27 = load float, ptr %26, align 4, !invariant.load !4
%31 = getelementptr inbounds float, ptr %0, i32 %linear_index2
%32 = load float, ptr %31, align 4, !invariant.load !4
%36 = getelementptr inbounds float, ptr %0, i32 %linear_index3
%37 = load float, ptr %36, align 4, !invariant.load !4

to

%linear_index_plus_base = add nuw nsw i32 %linear_index_base, %loop.indvar

%21 = getelementptr float, ptr %0, i32 %linear_index_plus_base
%22 = getelementptr inbounds float, ptr %21, i32 0
%23 = load float, ptr %22, align 4, !invariant.load !4
%29 = getelementptr float, ptr %0, i32 %linear_index_plus_base
%30 = getelementptr inbounds float, ptr %29, i32 1
%31 = load float, ptr %30, align 4, !invariant.load !4
%37 = getelementptr float, ptr %0, i32 %linear_index_plus_base
%38 = getelementptr inbounds float, ptr %37, i32 2
%39 = load float, ptr %38, align 4, !invariant.load !4
%45 = getelementptr float, ptr %0, i32 %linear_index_plus_base
%46 = getelementptr inbounds float, ptr %45, i32 3
%47 = load float, ptr %46, align 4, !invariant.load !4

The former one does not always work for different backends since it needs additional pass to handle GEP pattern.

There are only ~20 lines of core code changes, the others are all UTs changes.

@github-actions github-actions bot added the kokoro:force-run Forces CI to rerun label Dec 21, 2023
@kokoro-team kokoro-team removed the kokoro:force-run Forces CI to rerun label Dec 21, 2023
@github-actions github-actions bot added the kokoro:force-run Forces CI to rerun label Dec 21, 2023
@kokoro-team kokoro-team removed the kokoro:force-run Forces CI to rerun label Dec 21, 2023
@kamaljeeti kamaljeeti requested a review from tdanyluk January 2, 2024 06:20
llvm::Value* row_index = nullptr;
if (!launch_config_.row_vectorized) {
array_indices.emplace_back(linear_index_base, shape_, b_);
llvm::Value* linear_index =
Copy link
Contributor

@tdanyluk tdanyluk Jan 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR.
Could you please add a comment why is the Add operation needed if the added value is 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated with my comment.

    // The add operation is needed even if the offset is 0, since when the
    // kernel is unrolled, the following GEP instruction shares the same pointer
    // and sequential indices with others, allowing the default SLP pass to
    // optimize them into vectorized load/store operations.

@github-actions github-actions bot added the kokoro:force-run Forces CI to rerun label Jan 5, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Forces CI to rerun label Jan 5, 2024
Copy link
Contributor

@tdanyluk tdanyluk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@copybara-service copybara-service bot closed this in 47b68d4 Jan 5, 2024
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jan 5, 2024
Imported from GitHub PR openxla/xla#7980

This PR is originally submited as part of openxla/xla#7940.
It aims to make load/store logic more general so that it can be optimized to vector load/store pattern. It changes IR from
```llvm
%linear_index_plus_base = add nuw nsw i32 %linear_index_base, %loop.indvar
%linear_index1 = add nuw nsw i32 %linear_index_plus_base, 1
%linear_index2 = add nuw nsw i32 %linear_index_plus_base, 2
%linear_index3 = add nuw nsw i32 %linear_index_plus_base, 3

%21 = getelementptr inbounds float, ptr %0, i32 %linear_index_plus_base
%22 = load float, ptr %21, align 4, !invariant.load !4
%26 = getelementptr inbounds float, ptr %0, i32 %linear_index1
%27 = load float, ptr %26, align 4, !invariant.load !4
%31 = getelementptr inbounds float, ptr %0, i32 %linear_index2
%32 = load float, ptr %31, align 4, !invariant.load !4
%36 = getelementptr inbounds float, ptr %0, i32 %linear_index3
%37 = load float, ptr %36, align 4, !invariant.load !4
```
to
```llvm
%linear_index_plus_base = add nuw nsw i32 %linear_index_base, %loop.indvar

%21 = getelementptr float, ptr %0, i32 %linear_index_plus_base
%22 = getelementptr inbounds float, ptr %21, i32 0
%23 = load float, ptr %22, align 4, !invariant.load !4
%29 = getelementptr float, ptr %0, i32 %linear_index_plus_base
%30 = getelementptr inbounds float, ptr %29, i32 1
%31 = load float, ptr %30, align 4, !invariant.load !4
%37 = getelementptr float, ptr %0, i32 %linear_index_plus_base
%38 = getelementptr inbounds float, ptr %37, i32 2
%39 = load float, ptr %38, align 4, !invariant.load !4
%45 = getelementptr float, ptr %0, i32 %linear_index_plus_base
%46 = getelementptr inbounds float, ptr %45, i32 3
%47 = load float, ptr %46, align 4, !invariant.load !4
```
The former one does not always work for different backends since it needs additional pass to handle GEP pattern.

There are only ~20 lines of core code changes, the others are all UTs changes.
Copybara import of the project:

--
df1286a48461dd6337c841d4a353b694ce60bf86 by Sheng, Yang <yang.sheng@intel.com>:

Make vector load/store logic more general

--
66f0782ad62e90e01fd8de7b41d20d607f974844 by Sheng, Yang <yang.sheng@intel.com>:

fix IR in UTs

--
b944ee015c177d4180f9cba6b564cf72e1c80bbc by Sheng, Yang <yang.sheng@intel.com>:

Add comments

Merging this change closes #7980

PiperOrigin-RevId: 595953368
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants