Skip to content

Conversation

@Hzfengsy
Copy link
Member

std::regex in TVM codebase may cause a symbol conflict with PyTorch, we temporarily disable it before we find a better solution, meanwhile the current usage of std::regex is not necessary.

cc @tqchen

`std::regex` in TVM codebase may cause a symbol conflict with PyTorch,
we temporarily disable it before we find a better solution, meanwhile
the current usage of `std::regex` is not necessary.
@github-actions github-actions bot requested a review from tqchen December 15, 2023 09:12
@tqchen
Copy link
Member

tqchen commented Dec 15, 2023

let us update the testcase for now(global var patterns) given the regex feature is not freq used yet here

@junrushao junrushao merged commit f794db4 into apache:unity Dec 16, 2023
@Hzfengsy Hzfengsy deleted the disable_regex branch December 19, 2023 02:47
@Lunderberg
Copy link
Contributor

Is there a reproducible test case that can trigger the symbol conflict? I don't see any in this PR, and haven't run into this issue in the past.

I have multiple upcoming PRs that use std::regex, and while I would like to avoid causing breakage from doing so, there aren't enough details here to try to reproduce the issue.

@tqchen
Copy link
Member

tqchen commented Jan 16, 2024

@Lunderberg this only happens when tvm runs together with PyTorch and in the case of the main examples.

Unfortunately it is hard enough to trigger a UT. A rule of thumb is to avoid regex and use exact or prefix match for now.

Likely the reason is both tvm and PT depends on some form of regex but there is an C++ ABI issue

@Lunderberg
Copy link
Contributor

Lunderberg commented Jan 16, 2024

Thank you. I haven't been able to reproduce the issue locally, and I want to avoid leaving this open to potential regressions in the future.

For the environments where this issue occurs, what is the value of torch.compiled_with_cxx11_abi()? I suspect that we'll need the -D_GLIBCXX_USE_CXX11_ABI=0 whenever torch.compiled_with_cxx11_abi() returns, but I can't find why that wouldn't have caused an issue before now (e.g. with std::string).

@tqchen
Copy link
Member

tqchen commented Jan 16, 2024

The previous repro happens when we use ci docker and run a particular test(pytorch frontend qnn ) AFAIK. We will run another unity main test before transition, will keep folks posted.

@Lunderberg
Copy link
Contributor

Thank you. I wonder if it's related to #9362. I'm testing with set(HIDE_PRIVATE_SYMBOLS OFF) to see if that allows it to be reproduced.

@tqchen
Copy link
Member

tqchen commented Jan 16, 2024

This happens when HIDE_PRIVATE_SYMBOLS OFF

@Lunderberg
Copy link
Contributor

Lunderberg commented Jan 16, 2024

Strange that the CI didn't run into it as reported in #13156. Edit: Looks like that specific unit test isn't enabled for the unity branch.

@Lunderberg
Copy link
Contributor

Ooh boy, tracking down the CXXABI rabbit hole

  • gcc only provides Dual ABI support for pre-C++11 classes (e.g. std::string), and not for C++11 onward (e.g. std::regex). The Dual ABI is only intended for compatibility with C++ code that predates C++11. (link)
  • Even though it should only be used for backwards compatibility, Pytorch defaults to being compiled with USE_CXX11_ABI=0. This is to make sure that the wheels can be compiled under the manylinux docker image, required by PyPI. (link)
  • The manylinux docker image is based on CentOS 7 (link)
  • The compiler recommended by PEP-513 for use in the manylinux container does not support the USE_CXX11_ABI=1 flag. This is a CentOS 7 limitation, and is closed as cantfix (link).

@Lunderberg
Copy link
Contributor

It looks like we can't match TVM's ABI flag to whichever one pytorch uses, because -D_USE_CXX11_ABI=0 causes breakage with default LLVM installations.

@Lunderberg
Copy link
Contributor

Hmm. Pytorch's pip index (link) has USE_CXX11_ABI=1 builds, but only for CPU-only versions of pytorch. The CUDA-enabled pytorch wheels still have USE_CXX11_ABI=0.

@Lunderberg
Copy link
Contributor

Short of requiring users to compile pytorch from source, which I think is too large of a setup burden to require, I don't think there's a good solution until pytorch starts providing wheels with USE_CXX11_ABI=1.

Since the tests/python/frontend/pytorch/qnn_test.py::test_serialized_modules unit test can consistently reproduce this error, we should include it in the unity test suite. Existing use of std::regex can be replaced with a packed func that calls into Python re module

@tqchen
Copy link
Member

tqchen commented Jan 16, 2024

do we have a strong demand for regex? I feel most of them can be replaced by a callback check, or some more restricted version of prefix matching

@Lunderberg
Copy link
Contributor

I think regexes are a reasonable way to extend user-facing utilities that select a name. Prefix matching assumes that the user has control over the naming scheme in order to provide a common prefix, and callbacks may be more general than desired. I think regex matching is a good middle ground between those two.

@Lunderberg
Copy link
Contributor

I've submitted #16412, a linting rule to raise an error on use #include <regex>. I expect this check to fail due to the current use of std::regex. After the std::regex use is replaced, the lint can then be enabled to avoid additional regression.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants