Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions .travis.yml
Original file line number Diff line number Diff line change
Expand Up @@ -73,6 +73,8 @@ matrix:
- ARROW_TRAVIS_VERBOSE=1
- ARROW_TRAVIS_USE_SYSTEM_JAVA=1
- ARROW_BUILD_WARNING_LEVEL=CHECKIN
- ARROW_USE_ASAN=1
- ARROW_USE_UBSAN=1
- CC="clang-7"
- CXX="clang++-7"
before_script:
Expand Down
9 changes: 9 additions & 0 deletions ci/travis_before_script_cpp.sh
Original file line number Diff line number Diff line change
Expand Up @@ -121,6 +121,15 @@ if [ "$ARROW_TRAVIS_VALGRIND" == "1" ]; then
CMAKE_COMMON_FLAGS="$CMAKE_COMMON_FLAGS -DARROW_TEST_MEMCHECK=ON"
fi

if [ "$ARROW_USE_ASAN" == "1" ]; then
CMAKE_COMMON_FLAGS="$CMAKE_COMMON_FLAGS -DARROW_USE_ASAN=ON"
fi

if [ "$ARROW_USE_UBSAN" == "1" ]; then
CMAKE_COMMON_FLAGS="$CMAKE_COMMON_FLAGS -DARROW_USE_UBSAN=ON"
fi


if [ "$ARROW_TRAVIS_COVERAGE" == "1" ]; then
CMAKE_COMMON_FLAGS="$CMAKE_COMMON_FLAGS -DARROW_GENERATE_COVERAGE=ON"
fi
Expand Down
2 changes: 2 additions & 0 deletions cpp/cmake_modules/DefineOptions.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -129,6 +129,8 @@ if("${CMAKE_SOURCE_DIR}" STREQUAL "${CMAKE_CURRENT_SOURCE_DIR}")

define_option(ARROW_USE_TSAN "Enable Thread Sanitizer checks" OFF)

define_option(ARROW_USE_UBSAN "Enable Undefined Behavior sanitizer checks" OFF)

#----------------------------------------------------------------------
set_option_category("Project component")

Expand Down
3 changes: 2 additions & 1 deletion cpp/cmake_modules/san-config.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,7 @@ endif()
# - disable 'vptr' because it currently crashes somewhere in boost::intrusive::list code
# - disable 'alignment' because unaligned access is really OK on Nehalem and we do it
# all over the place.
# - disable 'function' because it appears to give a false positive https://github.com/google/sanitizers/issues/911
if(${ARROW_USE_UBSAN})
if(NOT
(("${COMPILER_FAMILY}" STREQUAL "clang")
Expand All @@ -45,7 +46,7 @@ if(${ARROW_USE_UBSAN})
endif()
set(
CMAKE_CXX_FLAGS
"${CMAKE_CXX_FLAGS} -fsanitize=undefined -fno-sanitize=alignment,vptr -fno-sanitize-recover=all"
"${CMAKE_CXX_FLAGS} -fsanitize=undefined -fno-sanitize=alignment,vptr,function -fno-sanitize-recover=all"
)
endif()

Expand Down
9 changes: 7 additions & 2 deletions cpp/src/arrow/array/builder_binary.h
Original file line number Diff line number Diff line change
Expand Up @@ -48,7 +48,10 @@ class ARROW_EXPORT BinaryBuilder : public ArrayBuilder {
Status Append(const uint8_t* value, int32_t length) {
ARROW_RETURN_NOT_OK(Reserve(1));
ARROW_RETURN_NOT_OK(AppendNextOffset());
ARROW_RETURN_NOT_OK(value_data_builder_.Append(value, length));
// Safety check for UBSAN.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to add this to all Append methods which take variable-length data? Perhaps instead we want to ensure we never pass a null pointer here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably. The never passing a nullptr is kind of the approach I've taken in other places (but they feel a little hacky). Please let me know if those look OK, and I can take a similar approach with these.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you run the builder benchmarks before and after the change? If there's no significant difference, then looks ok to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't had a chance yet. Should get to this in the next few days. I will ping you when I do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried running benchmarks, and I don't trust them on my box. The biggest thing that seemed to change was stdev (without corresponding changes to the actual benchmark). Are the benchmarks setup yet on Ursa's hardware, can we use those? Otherwise I will go through and find all instances of where we pass through nullptr as a precaution (or we could turn off the check).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#4285 once this is merged, we can call @ursabot to the rescue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@emkornfield stddev is irrelevant unless it becomes too large. If the min or average numbers didn't change significantly, then I'd say things are good.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ursabot benchmark

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fsaintjacques I'm not seeing an update from Ursa bot should I try to squash/force push?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only works with global comments it seems.

if (ARROW_PREDICT_TRUE(length > 0)) {
ARROW_RETURN_NOT_OK(value_data_builder_.Append(value, length));
}

UnsafeAppendToBitmap(true);
return Status::OK();
Expand Down Expand Up @@ -246,7 +249,9 @@ class ARROW_EXPORT FixedSizeBinaryBuilder : public ArrayBuilder {

void UnsafeAppend(const uint8_t* value) {
UnsafeAppendToBitmap(true);
byte_builder_.UnsafeAppend(value, byte_width_);
if (ARROW_PREDICT_TRUE(byte_width_ > 0)) {
byte_builder_.UnsafeAppend(value, byte_width_);
}
}

void UnsafeAppend(util::string_view value) {
Expand Down
8 changes: 7 additions & 1 deletion cpp/src/arrow/buffer-builder.h
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@
#include "arrow/status.h"
#include "arrow/util/bit-util.h"
#include "arrow/util/macros.h"
#include "arrow/util/ubsan.h"
#include "arrow/util/visibility.h"

namespace arrow {
Expand All @@ -42,7 +43,12 @@ namespace arrow {
class ARROW_EXPORT BufferBuilder {
public:
explicit BufferBuilder(MemoryPool* pool ARROW_MEMORY_POOL_DEFAULT)
: pool_(pool), data_(NULLPTR), capacity_(0), size_(0) {}
: pool_(pool),
data_(/*ensure never null to make ubsan happy and avoid check penalties below*/
&util::internal::non_null_filler),

capacity_(0),
size_(0) {}

/// \brief Resize the buffer to the nearest multiple of 64 bytes
///
Expand Down
8 changes: 5 additions & 3 deletions cpp/src/arrow/io/memory.cc
Original file line number Diff line number Diff line change
Expand Up @@ -100,9 +100,11 @@ Status BufferOutputStream::Write(const void* data, int64_t nbytes) {
return Status::IOError("OutputStream is closed");
}
DCHECK(buffer_);
RETURN_NOT_OK(Reserve(nbytes));
memcpy(mutable_data_ + position_, data, nbytes);
position_ += nbytes;
if (ARROW_PREDICT_TRUE(nbytes > 0)) {
RETURN_NOT_OK(Reserve(nbytes));
memcpy(mutable_data_ + position_, data, nbytes);
position_ += nbytes;
}
return Status::OK();
}

Expand Down
8 changes: 6 additions & 2 deletions cpp/src/arrow/ipc/json-test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -59,7 +59,9 @@ void TestSchemaRoundTrip(const Schema& schema) {
std::string json_schema = sb.GetString();

rj::Document d;
d.Parse(json_schema);
// Pass explicit size to avoid ASAN issues with
// SIMD loads in RapidJson.
d.Parse(json_schema.data(), json_schema.size());

DictionaryMemo in_memo;
std::shared_ptr<Schema> out;
Expand All @@ -83,7 +85,9 @@ void TestArrayRoundTrip(const Array& array) {
std::string array_as_json = sb.GetString();

rj::Document d;
d.Parse(array_as_json);
// Pass explicit size to avoid ASAN issues with
// SIMD loads in RapidJson.
d.Parse(array_as_json.data(), array_as_json.size());

if (d.HasParseError()) {
FAIL() << "JSON parsing failed";
Expand Down
22 changes: 14 additions & 8 deletions cpp/src/arrow/ipc/metadata-internal.cc
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,7 @@
#include "arrow/util/checked_cast.h"
#include "arrow/util/key_value_metadata.h"
#include "arrow/util/logging.h"
#include "arrow/util/ubsan.h"
#include "arrow/visitor_inline.h"

namespace arrow {
Expand Down Expand Up @@ -630,7 +631,8 @@ class FieldToFlatbufferVisitor {
type_ids.push_back(code);
}

auto fb_type_ids = fbb_.CreateVector(type_ids);
auto fb_type_ids =
fbb_.CreateVector(util::MakeNonNull(type_ids.data()), type_ids.size());

type_offset_ = flatbuf::CreateUnion(fbb_, mode, fb_type_ids).Union();
return Status::OK();
Expand All @@ -653,7 +655,8 @@ class FieldToFlatbufferVisitor {
Status GetResult(const std::shared_ptr<Field>& field, FieldOffset* offset) {
auto fb_name = fbb_.CreateString(field->name());
RETURN_NOT_OK(VisitType(*field->type()));
auto fb_children = fbb_.CreateVector(children_);
auto fb_children =
fbb_.CreateVector(util::MakeNonNull(children_.data()), children_.size());

DictionaryOffset dictionary = 0;
if (field->type()->id() == Type::DICTIONARY) {
Expand Down Expand Up @@ -799,7 +802,7 @@ static Status WriteFieldNodes(FBB& fbb, const std::vector<FieldMetadata>& nodes,
}
fb_nodes.emplace_back(node.length, node.null_count);
}
*out = fbb.CreateVectorOfStructs(fb_nodes);
*out = fbb.CreateVectorOfStructs(util::MakeNonNull(fb_nodes.data()), fb_nodes.size());
return Status::OK();
}

Expand All @@ -812,7 +815,9 @@ static Status WriteBuffers(FBB& fbb, const std::vector<BufferMetadata>& buffers,
const BufferMetadata& buffer = buffers[i];
fb_buffers.emplace_back(buffer.offset, buffer.length);
}
*out = fbb.CreateVectorOfStructs(fb_buffers);
*out =
fbb.CreateVectorOfStructs(util::MakeNonNull(fb_buffers.data()), fb_buffers.size());

return Status::OK();
}

Expand Down Expand Up @@ -871,9 +876,11 @@ Status WriteTensorMessage(const Tensor& tensor, int64_t buffer_start_offset,
dims.push_back(flatbuf::CreateTensorDim(fbb, tensor.shape()[i], name));
}

auto fb_shape = fbb.CreateVector(dims);
auto fb_strides = fbb.CreateVector(tensor.strides());
auto fb_shape = fbb.CreateVector(util::MakeNonNull(dims.data()), dims.size());

flatbuffers::Offset<flatbuffers::Vector<int64_t>> fb_strides;
fb_strides = fbb.CreateVector(util::MakeNonNull(tensor.strides().data()),
tensor.strides().size());
int64_t body_length = tensor.size() * elem_size;
flatbuf::Buffer buffer(buffer_start_offset, body_length);

Expand Down Expand Up @@ -1004,7 +1011,7 @@ FileBlocksToFlatbuffer(FBB& fbb, const std::vector<FileBlock>& blocks) {
fb_blocks.emplace_back(block.offset, block.metadata_length, block.body_length);
}

return fbb.CreateVectorOfStructs(fb_blocks);
return fbb.CreateVectorOfStructs(util::MakeNonNull(fb_blocks.data()), fb_blocks.size());
}

Status WriteFileFooter(const Schema& schema, const std::vector<FileBlock>& dictionaries,
Expand Down Expand Up @@ -1035,7 +1042,6 @@ Status WriteFileFooter(const Schema& schema, const std::vector<FileBlock>& dicti

auto footer = flatbuf::CreateFooter(fbb, kCurrentMetadataVersion, fb_schema,
fb_dictionaries, fb_record_batches);

fbb.Finish(footer);

int32_t size = fbb.GetSize();
Expand Down
4 changes: 3 additions & 1 deletion cpp/src/arrow/json/test-common.h
Original file line number Diff line number Diff line change
Expand Up @@ -170,7 +170,9 @@ inline static Status ParseFromString(ParseOptions options, string_view src_str,

std::string PrettyPrint(string_view one_line) {
rj::Document document;
document.Parse(one_line.data());

// Must pass size to avoid ASAN issues.
document.Parse(one_line.data(), one_line.size());
rj::StringBuffer sb;
rj::PrettyWriter<rj::StringBuffer> writer(sb);
document.Accept(writer);
Expand Down
2 changes: 1 addition & 1 deletion cpp/src/arrow/util/rle-encoding.h
Original file line number Diff line number Diff line change
Expand Up @@ -354,7 +354,7 @@ inline int RleDecoder::GetBatchWithDictSpaced(const T* dictionary, T* values,
int values_read = 0;
int remaining_nulls = null_count;

internal::BitmapReader bit_reader(valid_bits, valid_bits_offset, batch_size);
arrow::internal::BitmapReader bit_reader(valid_bits, valid_bits_offset, batch_size);

while (values_read < batch_size) {
bool is_valid = bit_reader.IsSet();
Expand Down
28 changes: 28 additions & 0 deletions cpp/src/arrow/util/ubsan.cc
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#include "arrow/util/ubsan.h"

namespace arrow {
namespace util {

namespace internal {

uint8_t non_null_filler = 0xFF;

} // namespace internal
} // namespace util
} // namespace arrow
53 changes: 53 additions & 0 deletions cpp/src/arrow/util/ubsan.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

// Contains utilities for making UBSan happy.

#pragma once

#include <memory>

#include "arrow/util/macros.h"

namespace arrow {
namespace util {

namespace internal {

static uint8_t non_null_filler;

} // namespace internal

/// \brief Returns maybe_null if not null or a non-null pointer to an arbitrary memory
/// that shouldn't be dereferenced.
///
/// Memset/Memcpy are undefinfed when a nullptr is passed as an argument use this utility
/// method to wrap locations where this could happen.
///
/// Note: Flatbuffers has UBSan warnings if a zero length vector is passed.
/// https://github.com/google/flatbuffers/pull/5355 is trying to resolve them.
template <typename T>
inline T* MakeNonNull(T* maybe_null) {
if (ARROW_PREDICT_TRUE(maybe_null != NULLPTR)) {
return maybe_null;
}

return reinterpret_cast<T*>(&internal::non_null_filler);
}

} // namespace util
} // namespace arrow
2 changes: 2 additions & 0 deletions cpp/src/parquet/metadata.cc
Original file line number Diff line number Diff line change
Expand Up @@ -104,6 +104,8 @@ std::shared_ptr<Statistics> MakeColumnStats(const format::ColumnMetaData& meta_d
return MakeTypedColumnStats<ByteArrayType>(meta_data, descr);
case Type::FIXED_LEN_BYTE_ARRAY:
return MakeTypedColumnStats<FLBAType>(meta_data, descr);
case Type::UNDEFINED:
break;
}
throw ParquetException("Can't decode page statistics for selected column type");
}
Expand Down
Loading