-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-5365: [C++][CI] Enable ASAN/UBSAN in CI #4347
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
It looks like there is a UBSan bug in flatbuffers. I've submitted a patch upstream: google/flatbuffers#5355 |
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this. Some comments below.
cpp/src/arrow/array/builder_binary.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to add this to all Append methods which take variable-length data? Perhaps instead we want to ensure we never pass a null pointer here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably. The never passing a nullptr is kind of the approach I've taken in other places (but they feel a little hacky). Please let me know if those look OK, and I can take a similar approach with these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you run the builder benchmarks before and after the change? If there's no significant difference, then looks ok to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't had a chance yet. Should get to this in the next few days. I will ping you when I do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried running benchmarks, and I don't trust them on my box. The biggest thing that seemed to change was stdev (without corresponding changes to the actual benchmark). Are the benchmarks setup yet on Ursa's hardware, can we use those? Otherwise I will go through and find all instances of where we pass through nullptr as a precaution (or we could turn off the check).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@emkornfield stddev is irrelevant unless it becomes too large. If the min or average numbers didn't change significantly, then I'd say things are good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ursabot benchmark
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fsaintjacques I'm not seeing an update from Ursa bot should I try to squash/force push?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only works with global comments it seems.
|
I took a look at the thread pool issue. It seems this is a case similar to google/sanitizers#911. The only way I've found to make it disappear is to disable this particular check: diff --git a/cpp/cmake_modules/san-config.cmake b/cpp/cmake_modules/san-config.cmake
index 1ef2299dd..bb7bb63c7 100644
--- a/cpp/cmake_modules/san-config.cmake
+++ b/cpp/cmake_modules/san-config.cmake
@@ -41,7 +41,7 @@ if(${ARROW_USE_UBSAN})
endif()
set(
CMAKE_CXX_FLAGS
- "${CMAKE_CXX_FLAGS} -fsanitize=undefined -fno-sanitize=alignment,vptr -fno-sanitize-recover=all"
+ "${CMAKE_CXX_FLAGS} -fsanitize=undefined -fno-sanitize=alignment,vptr,function -fno-sanitize-recover=all"
)
endif()
|
cpp/src/arrow/buffer-builder.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should make this consistent with naming in flatbuffers (either field or constant case, not one in each location)
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose this is still WIP but here are some comments.
cpp/src/arrow/array/builder_binary.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you run the builder benchmarks before and after the change? If there's no significant difference, then looks ok to me.
|
Still todo:
|
|
cc'ing @wesm for the Parquet / Thrift changes. |
|
@ursabot benchmark |
|
I've successfully started builds for this PR |
|
AMD64 Ubuntu 18.04 C++ Benchmark (#12878) builder failed with an exception. Revision: dba2b4ec9ce83f799d5566cf7b4fcb0fc8f22110 Archery: Traceback (most recent call last):
File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/twisted/internet/defer.py", line 1475, in gotResult
_inlineCallbacks(r, g, status)
File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
result = result.throwExceptionIntoGenerator(g)
File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
return g.throw(self.type, self.value, self.tb)
--- <exception caught here> ---
File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/buildbot/process/buildstep.py", line 566, in startStep
self.results = yield self.run()
File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
result = g.send(result)
File "/home/ursabot/ursabot/ursabot/steps.py", line 42, in run
await log.addContent(content)
File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/buildbot/process/log.py", line 130, in addContent
return self.lbf.append(text)
File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/buildbot/util/lineboundaries.py", line 62, in append
text = self.newline_re.sub('\n', text)
builtins.TypeError: expected string or bytes-like object |
|
@fsaintjacques You can ignore the trace above |
|
@emkornfield have you rebased with master? |
|
@fsaintjacques just did, sorry didn't make the connection that I had to |
|
@emkornfield I'll need you to rebase soon-ish (once more), applying fix. |
|
While I get this merged, I think you did introduce a regression :) Notably the last 2, -10% and -14%, but integer builders are up by 14 and 20%. I'm not sure if we can trust this. (This is on my local desktop). |
|
@emkornfield if you want to run locally, |
|
Once #4434 is merged, ursabot will be able to report again. |
|
@fsaintjacques thanks, looks like I will need to take a less global approach here :( |
|
@emkornfield the results vary a lot when I run it locally, so you might not have introduced a regression. |
|
@fsaintjacques thanks for the update, I'm glad I'm not the only one seeing randomness. @fsaintjacques @pitrou what are you thoughts on merging this (any other ideas on validating performance regressions or not (i.e. what would you like to see to get this PR merged)? FWIW, I'm not going to do the docker-compose as part of this PR, as its running into build issues with the compiler switch to clang, and I would prefer not to sink time into figuring it out right now (essentially protobuf building looks for a clang specific sanitizer header that it can't find, if anyone has seen that before). CC @wesm |
Codecov Report
@@ Coverage Diff @@
## master #4347 +/- ##
==========================================
- Coverage 88.41% 88.41% -0.01%
==========================================
Files 793 794 +1
Lines 101335 101362 +27
Branches 1253 1253
==========================================
+ Hits 89598 89617 +19
- Misses 11490 11498 +8
Partials 247 247
Continue to review full report at Codecov.
|
|
@emkornfield No problem with me. As a side note, it would be nice if the docker-compose setup could be documented some day. |
|
I ran the benchmarks twice locally without rebuilding on this branch and there is a lot of variability. @fsaintjacques these benchmarks seem like they (?) aren't being run for enough iterations to yield a precise result. The changes here seem to have resulted in this change to not run the benchmarks for many iterations: |
wesm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
This enables ASAN/UBSAN in the clang builds. Pushing this for PR early to see if there are any comments on approach or additional checks to remove (the first two fixes where related to passing null values to memcpy).
Note the UBSAN behavior should already be configured to exclude unaligned pointer errors (I'll file a separate JIRA for fixing those).
Remaining for this PR:
CC @wesm @pitrou in case you have any early feedback