Skip to content

Conversation

@cambyzju
Copy link
Contributor

@cambyzju cambyzju commented Feb 2, 2024

Proposed changes

While read orc data through multi-catalog, if column type in meta is string, but the real data type is INT or Timestamp or others, it may make be crash:

(gdb) bt
#0  0x00007f73c30c2387 in raise () from /lib64/libc.so.6
#1  0x00007f73c30c3a78 in abort () from /lib64/libc.so.6
#2  0x000056552fad78b9 in ?? ()
#3  0x000056552faccecd in google::LogMessage::Fail() ()
#4  0x000056552facf409 in google::LogMessage::SendToLog() ()
#5  0x000056552facca36 in google::LogMessage::Flush() ()
#6  0x000056552facfa79 in google::LogMessageFatal::~LogMessageFatal() ()
#7  0x000056552b76b115 in doris::vectorized::ColumnString::check_chars_length (element_number=<optimized out>, total_length=<optimized out>, 
    this=<optimized out>) at /data/doris-1.x/be/src/vec/columns/column_string.h:66
#8  doris::vectorized::ColumnString::insert_many_strings (this=<optimized out>, strings=<optimized out>, num=4064)
    at /data/doris-1.x/be/src/vec/columns/column_string.h:270
#9  0x000056552f66548c in doris::vectorized::OrcReader::_decode_string_column (this=this@entry=0x7f726d3b6a00, col_name=..., data_column=..., 
    type_kind=@0x7f67b47bc0e0: orc::TIMESTAMP, cvb=cvb@entry=0x7f6d98aadd60, num_values=4064)
    at /data/doris-1.x/be/src/vec/exec/format/orc/vorc_reader.cpp:649
#10 0x000056552f670300 in doris::vectorized::OrcReader::_orc_column_to_doris_column (this=this@entry=0x7f726d3b6a00, col_name=..., doris_column=..., 
    data_type=..., orc_column_type=0x7f6d96907440, cvb=0x7f6d98aadd60, num_values=4064) at /data/doris-1.x/be/src/vec/exec/format/orc/vorc_reader.cpp:738
#11 0x000056552f672d50 in doris::vectorized::OrcReader::get_next_block (this=0x7f726d3b6a00, block=0x7f6d7fe59b20, read_rows=0x7f67b47bc318, 
    eof=<optimized out>) at /data/doris-1.x/be/src/vec/exec/format/orc/vorc_reader.cpp:788
#12 0x000056552f633ad8 in doris::vectorized::VFileScanner::_get_block_impl (this=0x7f6f051b3a00, state=<optimized out>, block=0x7f6d7fe59b20, 
    eof=0x7f67b47bc539) at /data/doris-1.x/be/src/vec/exec/scan/vfile_scanner.cpp:155
#13 0x000056552f5ff329 in doris::vectorized::VScanner::get_block (this=this@entry=0x7f6f051b3a00, state=state@entry=0x7f6d70a61900, 
    block=block@entry=0x7f6d7fe59b20, eof=eof@entry=0x7f67b47bc539) at /data/doris-1.x/be/src/vec/exec/scan/vscanner.cpp:54
#14 0x000056552f5fc682 in doris::vectorized::ScannerScheduler::_scanner_scan (this=<optimized out>, scheduler=<optimized out>, ctx=0x7f6f007bdc00, 
    scanner=0x7f6f051b3a00) at /data/doris-1.x/be/src/vec/exec/scan/scanner_scheduler.cpp:247
#15 0x000056552a903b15 in std::function<void ()>::operator()() const (this=<optimized out>) at /var/local/ldb-toolchain/include/c++/11/bits/std_function.h:560
#16 doris::FunctionRunnable::run (this=<optimized out>) at /data/doris-1.x/be/src/util/threadpool.cpp:46
#17 doris::ThreadPool::dispatch_thread (this=0x7f739ad6b180) at /data/doris-1.x/be/src/util/threadpool.cpp:535
#18 0x000056552a8f8f6f in std::function<void ()>::operator()() const (this=0x7f7267d7da38) at /var/local/ldb-toolchain/include/c++/11/bits/std_function.h:560
#19 doris::Thread::supervise_thread (arg=0x7f7267d7da20) at /data/doris-1.x/be/src/util/thread.cpp:454
#20 0x00007f73c2e77ea5 in start_thread () from /lib64/libpthread.so.0
#21 0x00007f73c318a9fd in ioperm () from /lib64/libc.so.6
#22 0x0000000000000000 in ?? ()

Here we expect orc::StringVectorBatch*, but the real type is orc::TimestampVectorBatch*:
image

In the code, we already try to check the cast result:

    auto* data = down_cast<orc::StringVectorBatch*>(cvb);
    if (data == nullptr) {
        return Status::InternalError("Wrong data type for colum '{}'", col_name);
    }

Unfortunately down_cast equals to static_cast, because assert do not work in release build type.

template <typename To, typename From> // use like this: down_cast<T*>(foo);
inline To down_cast(From* f) {        // so we only accept pointers
    // Ensures that To is a sub-type of From *.  This test is here only
    // for compile-time type checking, and has no overhead in an
    // optimized build at run-time, as it will be optimized away
    // completely.

    // TODO(user): This should use COMPILE_ASSERT.
    if (false) {
        ::implicit_cast<From*, To>(NULL);
    }

    // uses RTTI in dbg and fastbuild. asserts are disabled in opt builds.
    assert(f == NULL || dynamic_cast<To>(f) != NULL);
    return static_cast<To>(f);
}

While we build release version, we define NDEBUG to turn off assert

# For CMAKE_BUILD_TYPE=Release
#   -O3: Enable all compiler optimizations
#   -DNDEBUG: Turn off dchecks/asserts/debug only code.
set(CXX_FLAGS_RELEASE "${CXX_GCC_FLAGS} -O3 -DNDEBUG")

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@doris-robot
Copy link

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Copy link
Contributor

@lide-reed lide-reed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lide-reed lide-reed merged commit d9539d4 into apache:branch-1.2-lts Feb 2, 2024
@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Feb 2, 2024
@github-actions
Copy link
Contributor

github-actions bot commented Feb 2, 2024

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 2, 2024

PR approved by anyone and no changes requested.

@cambyzju
Copy link
Contributor Author

cambyzju commented Feb 4, 2024

#23068 for master

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. area/vectorization reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants