Fix out of bounds reads in strided ARM loads by dsharletg · Pull Request #5784 · halide/Halide

dsharletg · 2021-03-03T19:01:20Z

This PR fixes long-standing out of bounds reads for strided loads. This combines some of the logic in CodeGen_LLVM (for stride 2 loads) with the logic in CodeGen_ARM (for stride up to 4), and implements it in CodeGen_LLVM.
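To make the hazard concrete, here is a minimal sketch in plain C++. It is not Halide's code, and the function names are hypothetical. It assumes a stride-`s` load of `lanes` elements is lowered as dense loads covering `lanes * s` contiguous elements, and compares the highest index the load actually needs with the highest index that lowering touches.

```cpp
#include <cassert>
#include <cstddef>

// Highest element index (relative to the load's base) that a strided load
// of `lanes` elements with the given stride actually needs.
inline std::size_t last_needed_index(std::size_t lanes, std::size_t stride) {
    assert(lanes >= 1 && stride >= 1);
    return (lanes - 1) * stride;
}

// Highest index touched if the same load is lowered as dense loads that
// cover the contiguous range [0, lanes * stride). This is an assumption
// about the lowering, made for illustration only.
inline std::size_t last_dense_index(std::size_t lanes, std::size_t stride) {
    assert(lanes >= 1 && stride >= 1);
    return lanes * stride - 1;
}

// Number of trailing elements that are read but never used.
inline std::size_t overread(std::size_t lanes, std::size_t stride) {
    return last_dense_index(lanes, stride) - last_needed_index(lanes, stride);
}
```

Under these assumptions the over-read is always `stride - 1` elements, so a load that ends exactly at the end of a buffer reads past it unless the code generator can prove the extra elements are in bounds.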

This will be a performance regression for code that uses strided loads from external buffers without sufficient alignment information to determine that the loads are safe. However, measurements on a variety of code show that the performance impact is mostly small, and in some cases performance even improves.

steven-johnson · 2021-03-05T18:14:00Z

Looks clean. Ready to land?

dsharletg · 2021-03-08T15:31:40Z

We discussed the possibility of emitting a warning when we do not generate vldN when we would have before this change. However, I now think we should not do this, because I found many cases where this warning would be emitted, yet performance is better or unchanged. Overall, the performance impact of this is actually a lot more mixed than I expected. There are a few extreme cases where performance is up to 2x worse, but most of the time, the impact is negligible and often an improvement.

abadams · 2021-03-08T19:07:26Z

It's surprising that there are cases where performance is better. Have you looked at the asm to see why?

dsharletg · 2021-03-08T19:49:54Z

I think those cases are mostly vld2, and the base class generates reasonable code for those (2 loads + shuffle).
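The "2 loads + shuffle" pattern can be sketched in scalar C++ as follows. This is an illustrative emulation, not the code CodeGen_LLVM emits; `kLanes`, `Vec`, `dense_load`, and `strided2_load` are hypothetical names, and the bounds check exists only to make the memory footprint explicit.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kLanes = 4;
using Vec = std::array<int, kLanes>;

// Dense load of kLanes contiguous elements starting at index `start`.
// The assert makes the required footprint explicit.
inline Vec dense_load(const std::vector<int>& buf, std::size_t start) {
    assert(start + kLanes <= buf.size());
    Vec v{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        v[i] = buf[start + i];
    }
    return v;
}

// Stride-2 load of buf[base], buf[base + 2], ... emulated as two dense
// loads followed by a shuffle that keeps the even lanes of their
// concatenation.
inline Vec strided2_load(const std::vector<int>& buf, std::size_t base) {
    const Vec lo = dense_load(buf, base);
    const Vec hi = dense_load(buf, base + kLanes);
    Vec out{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        const std::size_t src = 2 * i;  // lane index in the concatenation
        out[i] = src < kLanes ? lo[src] : hi[src - kLanes];
    }
    return out;
}
```

Note that this version needs `base + 2 * kLanes` elements to be readable even though the last one is never used, which is the over-read the PR is concerned with.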

dsharletg · 2021-03-09T16:29:56Z

I modified this to implement up to stride 4 in CodeGen_LLVM, using a combination of the old CodeGen_LLVM logic, and the new CodeGen_ARM logic.

abadams · 2021-03-09T18:40:29Z

I'm seeing some good speed-ups on the packed cases in the resize app on x86 due to using dense loads and shuffles instead of gathers.

src/CodeGen_LLVM.cpp

steven-johnson · 2021-03-09T21:01:18Z

correctness_memoize is failing on arm32.

dsharletg · 2021-03-09T21:43:18Z

I can't reproduce that and correctness_memoize looks completely unaffected by this change.

dsharletg · 2021-03-09T21:57:59Z

I restarted that build; let's just see what happens.

dsharletg · 2021-03-09T22:02:46Z

Oh, the same failure occurred on another unrelated PR: https://buildbot.halide-lang.org/master/#/builders/32/builds/16, as well as quite a few other issues.

I think that is surely a flake.

steven-johnson · 2021-03-09T22:12:43Z

> Oh, the same failure occurred on another unrelated PR: https://buildbot.halide-lang.org/master/#/builders/32/builds/16, as well as quite a few other issues.
>
> I think that is surely a flake.

I wonder if #5780 could be causing the memoize issue?

abadams · 2022-06-22T17:05:23Z

Dang, this was actually a major regression for strided loads from input buffers on x86. Now it's a vector gather instead of a load and shuffle. Not sure how we missed it.

abadams · 2022-06-22T17:06:09Z

Basic 2x downsampling pipelines are a mess now.

abadams · 2022-06-22T19:04:02Z

Plus as far as I can tell, this loads from before the start of external buffers due to the offset being applied unconditionally.
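The concern about an unconditionally applied offset can be illustrated with a small check. This is a hypothetical runtime sketch only: the actual code generator reasons about alignment at compile time, and `shifted_load_in_bounds` is not a Halide function.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: a dense load of `width` elements that would
// naturally start at `base` is shifted back by `offset` elements.
// Returns true only if the shifted window [base - offset, base - offset + width)
// lies entirely inside a buffer covering indices [0, extent).
inline bool shifted_load_in_bounds(std::ptrdiff_t base, std::ptrdiff_t offset,
                                   std::ptrdiff_t width, std::ptrdiff_t extent) {
    assert(offset >= 0 && width > 0 && extent >= 0);
    const std::ptrdiff_t start = base - offset;
    return start >= 0 && start + width <= extent;
}
```

For example, `shifted_load_in_bounds(0, 1, 4, 8)` is false: shifting a load that starts at the first element back by one reads index -1, which lies before the start of the buffer.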

dsharletg added 9 commits March 3, 2021 09:49

Safer version of vldN code generation.

a28a503

Only be more conservative with alignment for external buffers.

94ca00a

Add tolerance to allocation size tests.

e4574a7

Remove old comments.

50a91e1

Merge branch 'master' of github.com:halide/Halide into dsharletg/fix-…

7b4fba1

…arm-seg2

Improve ARM alignment and vldN code generation.

8c42f51

Remove merge straggler

927583b

Fix alignment condition (again).

3099b2d

Fix alignment.

40d18c8

Avoid divide by zero.

7a911fb

dsharletg mentioned this pull request Mar 8, 2021

Refactor host alignment to enable specializing #5793

Closed

dsharletg mentioned this pull request Mar 8, 2021

Simplifier rules for nested broadcasts #5794

Merged

dsharletg added 3 commits March 8, 2021 18:06

Move CodeGen_ARM's logic for strided loads to CodeGen_LLVM.

1786c5f

Fix comment.

945ae21

clang-format.

ecb8daa

abadams reviewed Mar 9, 2021

src/CodeGen_LLVM.cpp (outdated, resolved)

Remove sketchy alignment check.

ace9fb3

abadams approved these changes Mar 9, 2021

dsharletg merged commit e75d9fb into master Mar 9, 2021

dsharletg deleted the dsharletg/fix-arm-seg2 branch March 9, 2021 22:16

dsharletg added a commit that referenced this pull request Mar 10, 2021

Fix bug found by asan from #5784

37f3963

dsharletg added a commit that referenced this pull request Mar 10, 2021

Fix bug found by asan from #5784

b1cf25b

dsharletg mentioned this pull request Mar 10, 2021

Fix bug found by asan from #5784 #5798

Merged

dsharletg added a commit that referenced this pull request Mar 10, 2021

Fix bug found by asan from #5784 (#5798)

34c402f

alexreinking added this to the v12.0.0 milestone May 19, 2021


4 participants
