Skip to content

Refactor the autoschedulers to their own directory.#5228

Merged
steven-johnson merged 38 commits intomasterfrom
refactor/autoschedulers
Sep 15, 2020
Merged

Refactor the autoschedulers to their own directory.#5228
steven-johnson merged 38 commits intomasterfrom
refactor/autoschedulers

Conversation

@alexreinking
Copy link
Member

@alexreinking alexreinking commented Aug 31, 2020

This PR adjusts all three existing autoschedulers including Mullapudi2016 to be standalone. It reorganizes them into a src/autoschedulers folder and also installs them into the release packages.

The autoschedulers are also built with hidden symbols by default, so the HALIDE_EXPORT macro is now exposed and GCC compatible.

The add_halide_library function now adds the "auto_schedule=true" parameter to the start of the list when the AUTOSCHEDULER argument is passed. This improves ergonomics somewhat now that there is no default autoscheduler baked into the library.

All the autoscheduler tests now take which autoscheduler to use as their first argument. This will enable us (later) to test every autoscheduler against the existing tests.

Fixes #4341
Fixes #4349
Fixes #4053

@alexreinking alexreinking added the code_cleanup No functional changes. Reformatting, reorganizing, or refactoring existing code. label Aug 31, 2020
@alexreinking alexreinking marked this pull request as draft August 31, 2020 11:37
@alexreinking alexreinking added the skip_buildbots Do not run buildbots on this PR. Must add before opening PR as we scan labels immediately. label Aug 31, 2020
@abadams
Copy link
Member

abadams commented Aug 31, 2020

Can you elaborate on the weak linkage thing? I didn't think that was the case. My understanding was that the plugins rely on having their static constructors called at load time, and need to use symbols from the parent library, and that both of these things were normal on Windows.

Copy link
Contributor

@steven-johnson steven-johnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General direction definitely LGTM; obviously we'll need to get this working on non-Windows too. (I guess we're gonna need to add Makefile support too, alas.)

compiling Li2018 against shared Halide results in a ~130KB binary and compiling it against static Halide results in a 20MB binary

I wonder if there might be hidden issues with compiling against a non-shared libHalide; Halide uses globals in various ways and it doesn't seem impossible that having non-shared sets of these could cause amusing issues (e.g. when creating unique names).

src/Util.cpp Outdated
<< err_msg << "\n";
}

FARPROC as_name = GetProcAddress(library, "AutoschedulerName");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know I should be surprised that the legacy of NEAR/FAR ptrs on x86 lingers on to this day, but somehow I am

@@ -1,23 +1,14 @@
#include "halide_mullapudi2016_export.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where/how do these _export.h files get created? What do they contain?


namespace Halide {
namespace Internal {
#include <Halide.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure that it should always be "Halide.h" and never <Halide.h>

@alexreinking
Copy link
Member Author

alexreinking commented Aug 31, 2020

Can you elaborate on the weak linkage thing? I didn't think that was the case. My understanding was that the plugins rely on having their static constructors called at load time, and need to use symbols from the parent library, and that both of these things were normal on Windows.

If a library is shared, all platforms behave similarly at runtime. At build time. the plugin and the executable both link directly to the same shared library. The library gets loaded when the application loads. Then when the plugin gets loaded, its dependency on the library gets resolved to the previously loaded copy. This is only no-compromises, cross-platform way for loadable plugins to work.

Referencing symbols from a parent executable is totally different. Because the Windows loader has absolutely no form of dynamic lookup, a plugin referencing symbols in an executable must still link to a .lib import library at build time. CMake makes this convenient using its ENABLE_EXPORTS feature, which generates the import library for you. Then you still have to link your plugin to the executable. It papers over differences on other platforms, too. On macOS, it's equivalent to linking directly to the executable using -bundle_loader. On Linux, it does nothing special at all because Linux's loader is like a honey badger...

But now we can see why trying to reference symbols from a parent static library is unportable. On Windows, the static library just gets completely absorbed into the parent application. Every application has its symbols at different positions, so there's no universal .lib to which we could link a universally compatible plugin (remember, no dynamic lookup). Linux and macOS each have their own special dynamic lookup implementations, so you build the plugin with platform-dependent flags (resp. -rdynamic and -undefined dynamic_lookup) and then build the executable such that it exports its own symbols.

I wonder if there might be hidden issues with compiling against a non-shared libHalide; Halide uses globals in various ways and it doesn't seem impossible that having non-shared sets of these could cause amusing issues (e.g. when creating unique names).

There probably are. The other consequence of no dynamic symbol lookup is that we can't rely on static initialization to call Pipeline::add_autoscheduler because the static pipeline globals on Windows are indeed duplicated between the executable and the plugin. The executable therefore has to look for known entry points, here these are AutoschedulerName and AutoschedulerRun. Even on macOS/Linux, relying on static initialization is awkward because we require applications to link with -whole-archive.

@abadams
Copy link
Member

abadams commented Aug 31, 2020

Can you elaborate on the weak linkage thing? I didn't think that was the case. My understanding was that the plugins rely on having their static constructors called at load time, and need to use symbols from the parent library, and that both of these things were normal on Windows.

If a library is shared, all platforms behave similarly at runtime. At build time. the plugin and the executable both link directly to the same shared library. The library gets loaded when the application loads. Then when the plugin gets loaded, its dependency on the library gets resolved to the previously loaded copy. This is only no-compromises, cross-platform way for loadable plugins to work.

I thought this is what we were already doing.

Referencing symbols from a parent executable is totally different. Because the Windows loader has absolutely no form of dynamic lookup, a plugin referencing symbols in an executable must still link to a .lib import library at build time. CMake makes this convenient using its ENABLE_EXPORTS feature, which generates the import library for you. Then you still have to link your plugin to the executable. It papers over differences on other platforms, too. On macOS, it's equivalent to linking directly to the executable using -bundle_loader. On Linux, it does nothing special at all because Linux's loader is like a honey badger...

But now we can see why trying to reference symbols from a parent static library is unportable. On Windows, the static library just gets completely absorbed into the parent application. Every application has its symbols at different positions, so there's no universal .lib to which we could link a universally compatible plugin (remember, no dynamic lookup). Linux and macOS each have their own special dynamic lookup implementations, so you build the plugin with platform-dependent flags (resp. -rdynamic and -undefined dynamic_lookup) and then build the executable such that it exports its own symbols.

If this works, it's a side-effect of the current design, not intentional. You should not be statically linking libHalide into a generator binary - that makes every generator binary huge.

I wonder if there might be hidden issues with compiling against a non-shared libHalide; Halide uses globals in various ways and it doesn't seem impossible that having non-shared sets of these could cause amusing issues (e.g. when creating unique names).

There probably are. The other consequence of no dynamic symbol lookup is that we can't rely on static initialization to call Pipeline::add_autoscheduler because the static pipeline globals on Windows are indeed duplicated between the executable and the plugin. The executable therefore has to look for known entry points, here these are AutoschedulerName and AutoschedulerRun. Even on macOS/Linux, relying on static initialization is awkward because we require applications to link with -whole-archive.

I don't understand how the static initialization could be duplicated. The only translation unit it exists in is one that went into the plugin. That code is hidden inside a .cpp file inside the autoscheduler.

@abadams
Copy link
Member

abadams commented Aug 31, 2020

Oh you're saying the static in Pipeline::get_autoscheduler_map will be duplicated, not the static initializer itself.

So there are multiple ways in which things can't work if we statically link libHalide, but the current design seems like it works fine with a shared libHalide. So why redesign?

EDIT: Also that's not the only static hidden inside libHalide that the plugin needs, so just changing things to avoid that one usage isn't going to make a static libHalide work with a plugin.

@alexreinking
Copy link
Member Author

alexreinking commented Aug 31, 2020

I thought this is what we were already doing.

It is, for shared libs.

If this works, it's a side-effect of the current design, not intentional. You should not be statically linking libHalide into a generator binary - that makes every generator binary huge.

Then why are we using -rdynamic and -undefined dynamic_lookup? That's the whole point of those flags. If you have app -(shared)-> libHalide <-(shared)- plugin, there's no need for the plugin to look at symbols in app, is there?

Oh you're saying the static in Pipeline::get_autoscheduler_map will be duplicated, not the static initializer itself.

Correct.

So why redesign?

To allow plugins to work with statically linked generator binaries. Or statically linked JIT applications. There's no reason it shouldn't work or we shouldn't allow it.

@abadams
Copy link
Member

abadams commented Aug 31, 2020

I don't believe it's possible to get statically linked generator binaries to work with plugins. They can use arbitrary statics inside libHalide. (E.g. the ones in IROperator.cpp)

@abadams
Copy link
Member

abadams commented Aug 31, 2020

As far as I can tell, the makefile doesn't use rdynamic for libHalide. It uses it for the tests to make define_extern work when JITting (because shared libHalide needs to resolve a symbol in the test binary).

@abadams
Copy link
Member

abadams commented Aug 31, 2020

Maybe the static variant we should support is a static libHalide with static autoschedulers. If you want things to be static, you can link the autoschedulers into the generator binary and still use -s, if not -p.

@alexreinking
Copy link
Member Author

alexreinking commented Aug 31, 2020

Then there are three paths forward:

  1. Clarify that the autoschedulers cannot be used or built against static Halide and enforce this in CMake.
  2. Fix our use of global state. As far as I can tell from a quick git grep of src/, it's just the unique name counters that conflict. Everything else is either constant (or effectively so).
  3. What you said.

@alexreinking
Copy link
Member Author

Maybe the static variant we should support is a static libHalide with static autoschedulers. If you want things to be static, you can link the autoschedulers into the generator binary and still use -s, if not -p.

This is what I meant by

We could also look into baking plugins in to our library when building statically, like Qt does. But this would be a significantly more invasive change.

above.

@abadams
Copy link
Member

abadams commented Aug 31, 2020

Ah, I missed that part. I think loading two copies of libHalide is probably a bad idea, no matter what we do with our globals. We should just go with option 1.

@abadams
Copy link
Member

abadams commented Aug 31, 2020

People are welcome to also statically link the plugin into the generator binary, but we don't have to support that in the cmake build. They can write their own build config if they want to do that.

@alexreinking
Copy link
Member Author

alexreinking commented Sep 2, 2020

This is what I meant by

We could also look into baking plugins in to our library when building statically, like Qt does. But this would be a significantly more invasive change.

above.

Actually, I misunderstood you. See (2) below.

Maybe the static variant we should support is a static libHalide with static autoschedulers. If you want things to be static, you can link the autoschedulers into the generator binary and still use -s, if not -p.

I played around with doing this, and there are two approaches.

  1. Do what I think you're suggesting and compile the autoscheduler totally separately, then link it in with the platform-specific whole-archive flag.
  2. Do what I was originally suggesting (and now realize is different), which is to roll the autoschedulers directly into libHalide.a. This would require staging the construction of that static library into first building the core of Halide, using those objects to compile the autoscheduler, then merging everything together into the final static lib. Don't create intermediate libraries; just use collections of object files to avoid symbols getting dropped. This is possible with CMake, but not easy.

Assuming we want to support a static scenario, I think (2) is probably worth the effort since you don't want to be compiling with whole-archive if you can avoid it, anyway and it would certainly be more convenient, linking-wise.

@abadams
Copy link
Member

abadams commented Sep 2, 2020

I was indeed suggesting 1, because of the tricky issues you raise with 2.

edit to add: but I also think we should just not support static linkage for now and just let people figure it out themselves if they have some weird need for it.

@alexreinking
Copy link
Member Author

edit to add: but I also think we should just not support static linkage for now and just let people figure it out themselves if they have some weird need for it.

That's fine by me and I'm currently re-working this PR to use the static initializers and not build the autoschedulers when building Halide as a static library.

@alexreinking alexreinking force-pushed the refactor/autoschedulers branch from 2f672c8 to dac3d93 Compare September 6, 2020 19:39
@alexreinking alexreinking force-pushed the refactor/autoschedulers branch 3 times, most recently from a58deae to 7fe187c Compare September 9, 2020 22:33
@alexreinking alexreinking removed the skip_buildbots Do not run buildbots on this PR. Must add before opening PR as we scan labels immediately. label Sep 9, 2020
@alexreinking
Copy link
Member Author

Rebased on master... removing skip_buildbots to verify that CMake is still working everywhere. Still need support updating the Makefiles.

@alexreinking alexreinking marked this pull request as ready for review September 13, 2020 03:04
@steven-johnson
Copy link
Contributor

space

I deleted some dead stuff to reclaim space. (The last buildbot update renamed some things but didn't delete old stuff.)

@alexreinking
Copy link
Member Author

Single failure is due to Hexagon issues upstream. Woot! Please review & we can merge and cut releases

@steven-johnson
Copy link
Contributor

We can merge and cut releases

We can merge, but any 'releases' we cut will need to be labeled as 'we know this is broken, do not use long term' until the upstream issue is fixed.

@steven-johnson steven-johnson merged commit aef3f1b into master Sep 15, 2020
@alexreinking
Copy link
Member Author

We can merge and cut releases

We can merge, but any 'releases' we cut will need to be labeled as 'we know this is broken, do not use long term' until the upstream issue is fixed.

Releasing versions 9 and 10 shouldn't be affected

@steven-johnson
Copy link
Contributor

Ah, gotcha. (FYI, the buildbots aren't building LLVM9 at all now, and haven't for a few months. We could add that back, but might be simpler to just build those releases locally somewhere.)

@alexreinking
Copy link
Member Author

Ah, gotcha. (FYI, the buildbots aren't building LLVM9 at all now, and haven't for a few months. We could add that back, but might be simpler to just build those releases locally somewhere.)

I have everything at my desk except for ARM hardware.

@abadams
Copy link
Member

abadams commented Sep 15, 2020

I don't think there's a pressing need to release a version 9 if it's a pain.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

code_cleanup No functional changes. Reformatting, reorganizing, or refactoring existing code.

Projects

None yet

3 participants