
[Bug]Using Broker load with ORC file may cause BE crash #4349

@WingsGo

Describe the bug
I hit the same situation as in #3840 and found the stack trace below. The BE crashes because we do not catch the exception orc::TimezoneError. Core dumps were not enabled, so I had to read the source code to find the cause. By the way, I think enabling core dumps is important; it would save a lot of time.

terminate called after throwing an instance of 'orc::TimezoneError'
rc::TimezoneErro
  what():  Can't open /usr/share/zoneinfo/GMT+08:00
*** Aborted at 1597225209 (unix time) try "date -d @1597225209" if you are using GNU date ***
PC: @     0x7f9cd9e1a1f7 __GI_raise
*** SIGABRT (@0x1ac58) received by PID 109656 (TID 0x7f9c0202a700) from PID 109656; stack trace: ***
    @     0x7f9cd9e1a270 (unknown)
    @     0x7f9cd9e1a1f7 __GI_raise
    @     0x7f9cd9e1b8e8 __GI_abort
    @          0x2f21645 __gnu_cxx::__verbose_terminate_handler()
    @          0x2e8d706 __cxxabiv1::__terminate()
    @          0x2e8d751 std::terminate()
    @          0x2ed1c6e execute_native_thread_routine
    @     0x7f9cd9bd0e25 start_thread
    @     0x7f9cd9edd34d __clone

Reading ORC's source code, I found that when writing timestamps, the ORC library now records the writer's time zone in the stripe footer. We read data from ORC through RowReaderImpl::next (see ORC's Reader.hh), which we call at

https://github.com/apache/incubator-doris/blob/d6028863f3e9d8f401f1dea34a119e48fd21c7fe/be/src/exec/orc_scanner.cpp#L163

However, RowReaderImpl::next calls startNextStripe(), which checks whether the stripe footer of the ORC file contains a writerTimezone. The relevant code is in Reader.cc, line 829: const Timezone& writerTimezone = currentStripeFooter.has_writertimezone() ? getTimezoneByName(currentStripeFooter.writertimezone()) : localTimezone;. So if has_writertimezone() returns true, getTimezoneByName is called internally.

getTimezoneByName calls getTimezoneByFilename, which opens a file under /usr/share/zoneinfo to load the specified time zone. If the file is not found, FileInputStream's constructor (OrcFile.cc, line 51) throws an orc::ParseError. getTimezoneByFilename catches the orc::ParseError and throws a different error instead; the relevant code is in Timezone.cc, line 689:

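try {
    // (body elided in this report) opens and parses the zoneinfo file;
    // this is where the orc::ParseError is raised
} catch (ParseError& err) {
    // the ParseError is rethrown as a TimezoneError with the same message
    throw TimezoneError(err.what());
}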
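
As a quick way to check whether a given BE host would hit this path, the sketch below tests whether a zoneinfo entry can be opened at all. It is only an illustration: can_open_zoneinfo is a hypothetical helper, not part of ORC or Doris, and the default directory is simply the path that appears in the error message above.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper (not an ORC or Doris API): returns true if the file
// <zoneinfo_dir>/<zone_name> can be opened for reading.
// The default directory is taken from the error message in this report.
bool can_open_zoneinfo(const std::string& zone_name,
                       const std::string& zoneinfo_dir = "/usr/share/zoneinfo") {
    std::ifstream file(zoneinfo_dir + "/" + zone_name, std::ios::binary);
    return file.is_open();
}
```

For example, can_open_zoneinfo("GMT+08:00") returning false on a BE host would suggest that an ORC file whose stripe footer records that writer time zone cannot be read there.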

Now the reason for the BE crash is clear: if the BE machine does not have the relevant zoneinfo file, ORC throws an orc::TimezoneError, and because we do not catch it, the BE crashes. The call stack (most recent call first) is:

throw an orc::TimezoneError
throw an orc::ParseError
FileInputStream::FileInputStream
orc::readLocalFile()
readFile()
Timezone::getTimezoneByFilename() 
Timezone::getTimezoneByName() 
RowReaderImpl::startNextStripe()
RowReaderImpl::next()
RowReader::next() 

Expected behavior
The BE should not crash.

Solutions
When calling reader->next(), we should catch the orc::TimezoneError exception and return an InternalError to the user so that the BE does not crash.
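
A minimal sketch of that idea follows. To stay self-contained it does not use the real ORC or Doris headers: TimezoneError, Status, FakeRowReader and read_next_batch are all simplified, hypothetical stand-ins, and the real orc_scanner.cpp code will look different. It only shows the shape of the fix, which is to wrap the next() call and turn the exception into an error status.

```cpp
#include <stdexcept>
#include <string>

// Stand-in for orc::TimezoneError so the sketch compiles without ORC.
// Assumption: the exception exposes its message through what().
struct TimezoneError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Simplified, hypothetical status type (not the real Doris Status class).
struct Status {
    bool ok;
    std::string msg;
    static Status OK() { return {true, ""}; }
    static Status InternalError(const std::string& m) { return {false, m}; }
};

// Hypothetical reader that mimics a RowReader whose next() throws when the
// zoneinfo file for the writer time zone is missing.
struct FakeRowReader {
    bool zoneinfo_missing;
    bool next() {
        if (zoneinfo_missing) {
            throw TimezoneError("Can't open /usr/share/zoneinfo/GMT+08:00");
        }
        return false;  // no more rows in this sketch
    }
};

// Wraps next() so that an exception becomes an error status instead of
// escaping the scanner thread and reaching std::terminate().
Status read_next_batch(FakeRowReader& reader, bool* eof) {
    try {
        *eof = !reader.next();
        return Status::OK();
    } catch (const TimezoneError& e) {
        return Status::InternalError(std::string("ORC timezone error: ") + e.what());
    } catch (const std::exception& e) {
        // Catch-all for other library exceptions, for the same reason.
        return Status::InternalError(std::string("ORC read error: ") + e.what());
    }
}
```

With this shape, a load against a file whose writer time zone is unknown to the host would fail with a readable message rather than aborting the whole process.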
