ARROW-5365: [C++][CI] Enable ASAN/UBSAN in CI #4347

emkornfield · 2019-05-20T05:18:01Z

This enables ASAN/UBSAN in the clang builds. Pushing this PR early to see if there are any comments on the approach or additional checks to remove (the first two fixes were related to passing null values to memcpy).

Note the UBSAN behavior should already be configured to exclude unaligned pointer errors (I'll file a separate JIRA for fixing those).
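For context on the unaligned pointer errors mentioned above: UBSan's alignment check fires when a pointer is dereferenced as a type whose alignment it does not satisfy. A common way to avoid that is to copy the bytes with memcpy instead of casting the pointer. The following is a minimal sketch of that idea; the helper name `LoadUnaligned` is hypothetical and is not claimed to be what this PR or the follow-up JIRA uses.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper (illustrative name, not taken from this PR):
// read a trivially copyable T from an address that may not be aligned for T.
template <typename T>
T LoadUnaligned(const uint8_t* src) {
  static_assert(std::is_trivially_copyable<T>::value,
                "T must be trivially copyable");
  T value;
  // memcpy has no alignment requirement on its arguments, unlike
  // *reinterpret_cast<const T*>(src), which UBSan would flag.
  std::memcpy(&value, src, sizeof(T));
  return value;
}
```

Compilers typically lower this memcpy to a single load on platforms that permit unaligned access, so the sketch is not expected to cost more than the cast it replaces.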

Remaining for this PR:

Try to fix as many UBSAN/ASAN errors as possible (going to time box this to a week), and then blacklist the rest with a follow-up JIRA

CC @wesm @pitrou in case you have any early feedback

emkornfield · 2019-05-20T06:58:22Z

It looks like there is a UBSan bug in flatbuffers. I've submitted a patch upstream: google/flatbuffers#5355

pitrou

Thanks for doing this. Some comments below.

cpp/cmake_modules/DefineOptions.cmake

pitrou · 2019-05-20T13:44:37Z

cpp/src/arrow/array/builder_binary.h

Do we want to add this to all Append methods which take variable-length data? Perhaps instead we want to ensure we never pass a null pointer here.

Probably. The never passing a nullptr is kind of the approach I've taken in other places (but they feel a little hacky). Please let me know if those look OK, and I can take a similar approach with these.
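To illustrate the trade-off being discussed: memcpy's pointer arguments are required to be valid even when the length is zero, so an empty append that forwards a null data pointer is what UBSan reports. One option is a guard at the call site, sketched below. The name `CopyBytes` is hypothetical and only stands in for the kind of check under discussion; it is not the code in builder_binary.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copy `length` bytes from src to dst.
// For empty input the memcpy call is skipped entirely, so a null `src`
// paired with length 0 never reaches memcpy.
inline void CopyBytes(uint8_t* dst, const uint8_t* src, std::size_t length) {
  if (length == 0) return;
  std::memcpy(dst, src, length);
}
```

The extra branch is the reason the builder benchmarks come up in the replies: the guard sits on the hot path of every append, whereas guaranteeing that callers never pass a null pointer keeps the copy unconditional.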

Did you run the builder benchmarks before and after the change? If there's no significant difference, then looks ok to me.

I haven't had a chance yet. Should get to this in the next few days. I will ping you when I do.

I tried running the benchmarks, and I don't trust them on my box. The biggest thing that seemed to change was the stdev (without corresponding changes to the actual benchmark results). Are the benchmarks set up yet on Ursa's hardware? Can we use those? Otherwise I will go through and find all instances where we pass through nullptr as a precaution (or we could turn off the check).

#4285 once this is merged, we can call @ursabot to the rescue.

@emkornfield stddev is irrelevant unless it becomes too large. If the min or average numbers didn't change significantly, then I'd say things are good.
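The comparison suggested here (look at the minimum or the mean, and treat the standard deviation only as a sanity check) can be sketched as follows. `Summary` and `Summarize` are hypothetical names introduced for illustration; the input is assumed to be a non-empty list of timings for one benchmark, all in the same unit.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Hypothetical summary of repeated timings for a single benchmark.
struct Summary {
  double min;
  double mean;
  double stddev;  // population standard deviation
};

// Assumes `samples` is non-empty.
inline Summary Summarize(const std::vector<double>& samples) {
  const double n = static_cast<double>(samples.size());
  const double min = *std::min_element(samples.begin(), samples.end());
  const double mean =
      std::accumulate(samples.begin(), samples.end(), 0.0) / n;
  double sum_sq = 0.0;
  for (double s : samples) sum_sq += (s - mean) * (s - mean);
  return {min, mean, std::sqrt(sum_sq / n)};
}
```

Under this reading, two builds would be compared on `min` or `mean`, and `stddev` would only matter if it is large relative to the difference between them.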

@ursabot benchmark

@fsaintjacques I'm not seeing an update from Ursa bot. Should I try to squash/force push?

Only works with global comments it seems.

pitrou · 2019-05-20T14:28:30Z

I took a look at the thread pool issue. It seems this is a case similar to google/sanitizers#911. The only way I've found to make it disappear is to disable this particular check:

diff --git a/cpp/cmake_modules/san-config.cmake b/cpp/cmake_modules/san-config.cmake
index 1ef2299dd..bb7bb63c7 100644
--- a/cpp/cmake_modules/san-config.cmake
+++ b/cpp/cmake_modules/san-config.cmake
@@ -41,7 +41,7 @@ if(${ARROW_USE_UBSAN})
   endif()
   set(
     CMAKE_CXX_FLAGS
-    "${CMAKE_CXX_FLAGS} -fsanitize=undefined -fno-sanitize=alignment,vptr -fno-sanitize-recover=all"
+    "${CMAKE_CXX_FLAGS} -fsanitize=undefined -fno-sanitize=alignment,vptr,function -fno-sanitize-recover=all"
     )
 endif()

emkornfield · 2019-05-26T06:12:54Z

cpp/src/arrow/buffer-builder.h

I should make this consistent with naming in flatbuffers (either field or constant case, not one in each location)

pitrou

I suppose this is still WIP but here are some comments.

cpp/src/arrow/buffer-builder.h

cpp/src/arrow/ipc/json-test.cc

pitrou · 2019-05-27T15:36:30Z

cpp/src/arrow/array/builder_binary.h

Did you run the builder benchmarks before and after the change? If there's no significant difference, then looks ok to me.

cpp/src/arrow/json/test-common.h

cpp/src/parquet/schema.cc

emkornfield · 2019-05-28T05:41:31Z

Still todo:

Run benchmarks for buffer building.
Verify docker-compose changes work.

cpp/src/arrow/util/ubsan.h

cpp/src/parquet/schema.cc

pitrou · 2019-05-28T09:05:07Z

cc'ing @wesm for the Parquet / Thrift changes.

fsaintjacques · 2019-05-31T11:53:41Z

@ursabot benchmark

ursabot · 2019-05-31T11:53:42Z

I've successfully started builds for this PR

ursabot · 2019-05-31T12:01:44Z

AMD64 Ubuntu 18.04 C++ Benchmark (#12878) builder failed with an exception.

Revision: dba2b4ec9ce83f799d5566cf7b4fcb0fc8f22110

Archery: 'archery benchmark ...' step's traceback:

Traceback (most recent call last):
  File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/twisted/internet/defer.py", line 1475, in gotResult
    _inlineCallbacks(r, g, status)
  File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
--- <exception caught here> ---
  File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/buildbot/process/buildstep.py", line 566, in startStep
    self.results = yield self.run()
  File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/home/ursabot/ursabot/ursabot/steps.py", line 42, in run
    await log.addContent(content)
  File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/buildbot/process/log.py", line 130, in addContent
    return self.lbf.append(text)
  File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/buildbot/util/lineboundaries.py", line 62, in append
    text = self.newline_re.sub('\n', text)
builtins.TypeError: expected string or bytes-like object

kszucs · 2019-05-31T12:17:58Z

@fsaintjacques You can ignore the trace above <exception caught here>. The remaining stacktrace indicates that the jsonlines file is either empty or malformed (doesn't contain any newlines).

fsaintjacques · 2019-05-31T12:19:18Z

@emkornfield have you rebased with master?

emkornfield · 2019-05-31T12:22:46Z

@fsaintjacques just did, sorry didn't make the connection that I had to

fsaintjacques · 2019-05-31T12:25:30Z

@emkornfield I'll need you to rebase soon-ish (once more), applying fix.

fsaintjacques · 2019-05-31T12:34:16Z

While I get this merged, I think you did introduce a regression :)

{"benchmark": "BuildAdaptiveIntNoNulls", "change": 0.1617474450770153, "regression": false, "baseline": 5129060131.229492, "contender": 5958672503.102243, "unit": "bytes_per_second", "less_is_better": false, "suite": "arrow-builder-benchmark"}
{"benchmark": "BuildStringDictionaryArray", "change": -0.006711359103828405, "regression": false, "baseline": 612759619.4875532, "contender": 608647169.636847, "unit": "bytes_per_second", "less_is_better": false, "suite": "arrow-builder-benchmark"}   
{"benchmark": "BuildInt64DictionaryArraySequential", "change": 0.057169940438590444, "regression": false, "baseline": 1330896310.7767181, "contender": 1406983573.5937629, "unit": "bytes_per_second", "less_is_better": false, "suite": "arrow-builder-benchmark"}                                                           
{"benchmark": "BuildBooleanArrayNoNulls", "change": 0.08337061696806138, "regression": false, "baseline": 3543833863.3790855, "contender": 3839285479.0013084, "unit": "bytes_per_second", "less_is_better": false, "suite": "arrow-builder-benchmark"}                                                                       
{"benchmark": "ArrayDataConstructDestruct", "change": 0.06058180397997656, "regression": true, "baseline": 38068.15628400451, "contender": 40374.39386588118, "unit": "ns", "less_is_better": true, "suite": "arrow-builder-benchmark"}             
{"benchmark": "BuildAdaptiveIntNoNullsScalarAppend", "change": -0.03943874316966369, "regression": false, "baseline": 3200117738.787092, "contender": 3073909117.174383, "unit": "bytes_per_second", "less_is_better": false, "suite": "arrow-builder-benchmark"}
{"benchmark": "BuildIntArrayNoNulls", "change": 0.21810919190859968, "regression": false, "baseline": 2528914452.5297303, "contender": 3080493940.1769686, "unit": "bytes_per_second", "less_is_better": false, "suite": "arrow-builder-benchmark"}
{"benchmark": "BuildInt64DictionaryArrayRandom", "change": 0.06291035570510808, "regression": false, "baseline": 1283452148.015481, "contender": 1364194579.17762, "unit": "bytes_per_second", "less_is_better": false, "suite": "arrow-builder-benchmark"}
{"benchmark": "BuildInt64DictionaryArraySimilar", "change": 0.012352011237130539, "regression": false, "baseline": 738692005.7938592, "contender": 747816337.7502035, "unit": "bytes_per_second", "less_is_better": false, "suite": "arrow-builder-benchmark"}
{"benchmark": "BuildFixedSizeBinaryArray", "change": 0.04800722030995441, "regression": false, "baseline": 1401057068.1653943, "contender": 1468317923.5036292, "unit": "bytes_per_second", "less_is_better": false, "suite": "arrow-builder-benchmark"}
{"benchmark": "BuildChunkedBinaryArray", "change": -0.10294213300067846, "regression": true, "baseline": 1140851343.80984, "contender": 1023409673.0413647, "unit": "bytes_per_second", "less_is_better": false, "suite": "arrow-builder-benchmark"}
{"benchmark": "BuildBinaryArray", "change": -0.14401897751689627, "regression": true, "baseline": 532744280.186322, "contender": 456018993.675913, "unit": "bytes_per_second", "less_is_better": false, "suite": "arrow-builder-benchmark"}

Notably the last two, at -10% and -14%, but the integer builders are up by roughly 16% and 22%. I'm not sure if we can trust this. (This is on my local desktop.)
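The `change` and `regression` fields in the rows above appear to follow from `baseline`, `contender`, and `less_is_better`. The sketch below reproduces them under two assumptions that are not stated in the thread: that `change` is the relative difference (contender - baseline) / baseline, and that a result is flagged once it is worse by more than 5%. Both assumptions are consistent with every row shown, but the names `Comparison` and `Compare` and the threshold value are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical result type mirroring the "change" and "regression" fields.
struct Comparison {
  double change;
  bool regression;
};

// threshold = 0.05 is an assumption that matches the rows in this thread.
inline Comparison Compare(double baseline, double contender,
                          bool less_is_better, double threshold = 0.05) {
  const double change = (contender - baseline) / baseline;
  // Normalise so that a positive value always means "got better".
  const double adjusted = less_is_better ? -change : change;
  return {change, adjusted < -threshold};
}
```

For example, `Compare(532744280.186322, 456018993.675913, false)` gives a change of about -0.144 and flags a regression, matching the BuildBinaryArray row, while the ArrayDataConstructDestruct row is flagged because its unit is time and a higher value is worse.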

fsaintjacques · 2019-05-31T12:41:44Z

@emkornfield if you want to run locally, archery benchmark diff --suite-filter=arrow-builder-benchmark --benchmark-filter=""

fsaintjacques · 2019-05-31T12:44:55Z

Once #4434 is merged, ursabot will be able to report again.

emkornfield · 2019-05-31T12:46:14Z

@fsaintjacques thanks, looks like I will need to take a less global approach here :(

fsaintjacques · 2019-05-31T13:04:08Z

@emkornfield the results vary a lot when I run it locally, so you might not have introduced a regression.

emkornfield · 2019-06-01T07:36:34Z

@fsaintjacques thanks for the update, I'm glad I'm not the only one seeing randomness. @fsaintjacques @pitrou what are your thoughts on merging this? Any other ideas on validating whether there is a performance regression (i.e. what would you like to see to get this PR merged)?

FWIW, I'm not going to do the docker-compose changes as part of this PR, as it's running into build issues with the compiler switch to clang, and I would prefer not to sink time into figuring it out right now (essentially the protobuf build looks for a clang-specific sanitizer header that it can't find, in case anyone has seen that before).

CC @wesm

codecov-io · 2019-06-01T08:18:58Z

Codecov Report

Merging #4347 into master will decrease coverage by <.01%.
The diff coverage is 87.3%.

@@            Coverage Diff             @@
##           master    #4347      +/-   ##
==========================================
- Coverage   88.41%   88.41%   -0.01%     
==========================================
  Files         793      794       +1     
  Lines      101335   101362      +27     
  Branches     1253     1253              
==========================================
+ Hits        89598    89617      +19     
- Misses      11490    11498       +8     
  Partials      247      247

Impacted Files	Coverage Δ
cpp/src/parquet/types.h	`100% <ø> (ø)`	⬆️
cpp/src/parquet/metadata.cc	`90.4% <0%> (-0.37%)`	⬇️
cpp/src/arrow/buffer-builder.h	`100% <100%> (ø)`	⬆️
cpp/src/arrow/io/memory.cc	`90.42% <100%> (+0.05%)`	⬆️
cpp/src/arrow/ipc/metadata-internal.cc	`90.1% <100%> (+0.04%)`	⬆️
cpp/src/arrow/json/test-common.h	`100% <100%> (ø)`	⬆️
cpp/src/arrow/ipc/json-test.cc	`98.49% <100%> (ø)`	⬆️
cpp/src/parquet/schema.cc	`90.61% <100%> (+0.21%)`	⬆️
cpp/src/arrow/util/ubsan.h	`100% <100%> (ø)`
cpp/src/arrow/util/rle-encoding.h	`99.56% <100%> (ø)`	⬆️
... and 6 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update aa18d25...bba5824.

pitrou · 2019-06-01T11:40:18Z

@emkornfield No problem with me.

As a side note, it would be nice if the docker-compose setup could be documented some day.

wesm · 2019-06-03T16:58:54Z

I ran the benchmarks twice locally without rebuilding on this branch and there is a lot of variability.

$ ./release/arrow-builder-benchmark 
2019-06-03 11:56:03
Running ./release/arrow-builder-benchmark
Run on (6 X 4400 MHz CPU s)
CPU Caches:
  L1 Data 32K (x6)
  L1 Instruction 32K (x6)
  L2 Unified 256K (x6)
  L3 Unified 12288K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
---------------------------------------------------------------------------
Benchmark                                    Time           CPU Iterations
---------------------------------------------------------------------------
BuildBooleanArrayNoNulls             103547484 ns  103543125 ns          6   2.41445GB/s
BuildIntArrayNoNulls                 397660135 ns  397656222 ns          2   643.772MB/s
BuildAdaptiveIntNoNulls               93693446 ns   93689377 ns          7   2.66839GB/s
BuildAdaptiveIntNoNullsScalarAppend   85571042 ns   85569211 ns          8   2.92161GB/s
BuildBinaryArray                    1021960510 ns 1021942636 ns          1   250.503MB/s
BuildChunkedBinaryArray              626873787 ns  626860046 ns          1   408.385MB/s
BuildFixedSizeBinaryArray            601008442 ns  600996572 ns          1   425.959MB/s
BuildInt64DictionaryArrayRandom      396085207 ns  396079988 ns          2   646.334MB/s
BuildInt64DictionaryArraySequential  388689218 ns  388682825 ns          2   658.635MB/s
BuildInt64DictionaryArraySimilar     558866272 ns  558851210 ns          1   458.083MB/s
BuildStringDictionaryArray           849311744 ns  849289510 ns          1   402.142MB/s
ArrayDataConstructDestruct               47162 ns      47161 ns      14785
(arrow-3.7) 11:56 ~/code/arrow/cpp/build  (use_asan)$ ./release/arrow-builder-benchmark 
2019-06-03 11:56:39
Running ./release/arrow-builder-benchmark
Run on (6 X 4400 MHz CPU s)
CPU Caches:
  L1 Data 32K (x6)
  L1 Instruction 32K (x6)
  L2 Unified 256K (x6)
  L3 Unified 12288K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
---------------------------------------------------------------------------
Benchmark                                    Time           CPU Iterations
---------------------------------------------------------------------------
BuildBooleanArrayNoNulls             105403765 ns  105402687 ns          6   2.37186GB/s
BuildIntArrayNoNulls                 404681031 ns  404672432 ns          2    632.61MB/s
BuildAdaptiveIntNoNulls               96711289 ns   96706007 ns          7   2.58515GB/s
BuildAdaptiveIntNoNullsScalarAppend   89377305 ns   89374632 ns          8   2.79721GB/s
BuildBinaryArray                    1063720888 ns 1063711698 ns          1   240.667MB/s
BuildChunkedBinaryArray              662845635 ns  662830863 ns          1   386.222MB/s
BuildFixedSizeBinaryArray            625139081 ns  625118533 ns          1   409.522MB/s
BuildInt64DictionaryArrayRandom      417338914 ns  417334943 ns          2   613.416MB/s
BuildInt64DictionaryArraySequential  414967413 ns  414956560 ns          2   616.932MB/s
BuildInt64DictionaryArraySimilar     586722976 ns  586708695 ns          1   436.332MB/s
BuildStringDictionaryArray           889535831 ns  889523296 ns          1   383.953MB/s
ArrayDataConstructDestruct               49797 ns      49797 ns      14176

@fsaintjacques these benchmarks don't seem to be run for enough iterations to yield a precise result. The change linked below seems to have caused the benchmarks to no longer run for many iterations:

20961b7#diff-4a540ff83d04bf9d4c88c81b2ada6c7fR352

wesm

+1

pitrou reviewed May 20, 2019

emkornfield force-pushed the use_asan branch from 3097d11 to 1470904 on May 26, 2019 05:13

emkornfield commented May 26, 2019

pitrou reviewed May 27, 2019

emkornfield force-pushed the use_asan branch from 91b83b0 to 7207222 on May 28, 2019 05:29

emkornfield commented May 28, 2019

cpp/src/arrow/util/ubsan.h (outdated, resolved)

pitrou reviewed May 28, 2019

cpp/src/parquet/schema.cc (outdated, resolved)

emkornfield force-pushed the use_asan branch from 1f56f0c to dba2b4e on May 29, 2019 05:48

emkornfield changed the title from "ARROW-5365: [WIP][C++][CI] Enable ASAN/UBSAN in CI" to "ARROW-5365: [C++][CI] Enable ASAN/UBSAN in CI" on May 29, 2019

emkornfield force-pushed the use_asan branch from dba2b4e to f39558c on May 31, 2019 12:22

emkornfield added 3 commits May 31, 2019 20:21

05467be ubsan stuff
dc300d4 add back asan
19ee908 handle signed and unsigned enum loading in parquet to support windows

emkornfield force-pushed the use_asan branch from f39558c to 19ee908 on June 1, 2019 06:56

bba5824 fix io/memory.cc asan

wesm approved these changes Jun 3, 2019

wesm closed this in cdedd85 Jun 3, 2019

asfimport mentioned this pull request Jun 3, 2019

[C++][CI] Add UBSan and ASAN into CI #21824

Closed
