Skip to content

Conversation

@d-smirnov
Copy link
Contributor

Calling Map::Set invalidates existing iterators to protect from
using already deleted data due to re-hashing

Related to the issue fixed here

@junrushao
Copy link
Member

junrushao commented Jan 5, 2022

Hey @d-smirnov for this PR!

Iterator invalidation is a problem that exists for multiple languages, and a general rule of thumb is to avoid modifying a container when iterating it - even though STL impl guarantees that for associative containers, inserting a new element doesn't invalidate iterators that point to existing elements.

I personally got bitten by this issue once in the past year, and thus am supportive of more careful checks. However, runtime check is neither sufficient (consider the multi-threaded case) nor efficient (incur runtime overhead). Therefore, I was wondering if you might want to update this PR with the following changes:

  • Add similar checks for Array
  • Macro-guard the checks so that it only exists in debug build
  • Use atomic counter and atomic comparison

@d-smirnov
Copy link
Contributor Author

d-smirnov commented Jan 5, 2022

@junrushao1994

Macro-guard the checks so that it only exists in debug build

What macro should be used to check whether build is Debug or not? It seems setting CMAKE_BUILD_TYPE=Debug adds -g flag to compiler options but not passes DEBUG (or whatever it is called) macro to compiler. Could you advise on this, please.

Will proceed with NDEBUG

@d-smirnov
Copy link
Contributor Author

@junrushao1994
NDEBUG Cannot be used due to:

  1. It switched on/off via usage of set(USE_RELAY_DEBUG ON) setting but not by CMAKE_BUILD_TYPE=Debug
  2. Currently it is always set to NDEBUG=1 as per comment in cmake/modules/LLVM.cmake:
 18 # LLVM rules
 19 # Due to LLVM debug symbols you can sometimes face linking issues on
 20 # certain compiler, platform combinations if you don't set NDEBUG.
 21 #
 22 # See https://github.com/imageworks/OpenShadingLanguage/issues/1069
 23 # for more discussion.
 24 add_definitions(-DDMLC_USE_FOPEN64=0 -DNDEBUG=1)

Could you advise on this, please?

@junrushao
Copy link
Member

Thanks @d-smirnov for trying this out! TBH I don't have much experience on which macro to use. @tqchen do you have any ideas? Thanks in advance!

@tqchen
Copy link
Member

tqchen commented Jan 7, 2022

TVM_LOG_DEBUG might be a good candidate given it is used to control DCHECK etc. I agree that iterator concurrent update is a problem of stl containers as well so we should not include the code in a prod mode.

@d-smirnov
Copy link
Contributor Author

@tqchen Since we virtually have no NDEBUG macro available (due to LLVM issue), could a separate debug macro (i.e. TVM_DEBUG or similar) be considered instead to be present in pre-processor if Debug build configured via cmake? This is mostly because TVM_LOG_DEBUG needs specific steps for configuration which not as simple and straightforward as switching between Debug/Release builds.

@junrushao1994 I might be wrong but it looks like Array does not need this type of checks. Could you advise on this, please

@junrushao
Copy link
Member

@d-smirnov Thanks for investigating in the Array implementation! The particular case I encountered in the past year is that when iterating an Array, changing its contents might lead to segfault, so it might make sense to protect Array as well :-)

@kparzysz-quic
Copy link
Contributor

The counter doesn't need to be atomic. The structure is not thread-safe anyways, so making the counter atomic doesn't do much.

@junrushao
Copy link
Member

@kparzysz-quic ok that's a fair point!

@areusch
Copy link
Contributor

areusch commented Apr 8, 2022

@d-smirnov want to update this PR? i believe we can remove the atomic counter and then it seems like it could be ready to merge after that.

Calling Map::Set invalidates exising iterators to protect from
using already deleted data due to re-hashing

Change-Id: Ib6b580758e74c8b77ed560932d87b643bd6c9402
Now uses TVM_LOG_DEBUG
Map state_marker made atomic

Change-Id: I090c4b33e6edaa977cccba11f8d1c6ff3fbca430
Change-Id: I7bd930cb52d58ca10fd49a5fe8f5d48b3e955d0a
@d-smirnov
Copy link
Contributor Author

@areusch I removed atomics. My concerns are mostly about "dormant state" of this changes due to usage of TVM_LOG_DEBUG requires extra efforts.

@areusch
Copy link
Contributor

areusch commented Apr 11, 2022

@kparzysz-quic and I were discussing updating the LLVM used in CI to have -DLLVM_ENABLE_ASSERTIONS either in the full CI or in a specific build. maybe wherever that's done is a good spot to include -DUSE_RELAY_DEBUG

@areusch
Copy link
Contributor

areusch commented Apr 11, 2022

let's discuss whether we think we can enable USE_RELAY_DEBUG in CI, since otherwise this PR is essentially a no-op. i think this would require us to either create ci_cpu_asserts and run the unittests there, or to decide that ci_cpu or ci_i386 should be dedicated to running with asserts/USE_RELAY_DEBUG on. i sort of favor creating ci_cpu_asserts...thoughts @tqchen @driazati @leandron ?

@driazati
Copy link
Member

I'd also be in favor of having a more stringent build like ci_cpu_asserts where USE_RELAY_DEBUG is on, and maybe we also build and run unit tests with Clang's AddressSanitizer (but thats a bit of a separate discussion)

@areusch
Copy link
Contributor

areusch commented Apr 19, 2022

ok so let's merge this given the group discussion about moving towards a CI image that has more asserts and USE_RELAY_DEBUG enabled.

@areusch areusch merged commit 5987982 into apache:main Apr 19, 2022
@rebel-shshin
Copy link
Contributor

rebel-shshin commented Apr 22, 2022

@areusch @driazati Hi guys, thanks for your contribution to TVM. I think I found a bug related with this PR. After merging this branch to main, the cpp tests are going to be failed with segfault. Here are the logs.

image
image

All the failed cases are raised when using map. When I revert this patch, CPP tests were passed.
Do you have any idea?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants