-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[fix](new_json_reader)fix new_json_reader core #41290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website. |
eldenmoon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
|
run buildall |
|
TeamCity be ut coverage result: |
| } | ||
|
|
||
| if (!objectValue->IsObject()) { | ||
| return Status::DataQualityError("JSON data is not an object."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add first n bytes of data in error message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this issue only occur in the TVF?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes , only in select tvf . streamload will return false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by streamload will return false? I'm concerned that this change might affect other load.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function just for scanner to read json file when in tvf , other load will not call this function
just with VFileScanner::_get_block_wrapped --> NewJsonReader::get_next_block --> NewJsonReader::_read_json_column --> NewJsonReader::_vhandle_simple_json --> NewJsonReader::_set_column_value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and I think I could print origin type instead of n bytes...
|
run buildall |
|
clang-tidy review says "All clean, LGTM! 👍" |
|
TeamCity be ut coverage result: |
|
run buildall |
|
clang-tidy review says "All clean, LGTM! 👍" |
|
TeamCity be ut coverage result: |
|
run cloud_p0 |
eldenmoon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
PR approved by at least one committer and no changes requested. |
|
clang-tidy review says "All clean, LGTM! 👍" |
|
run buildall |
|
clang-tidy review says "All clean, LGTM! 👍" |
|
TeamCity be ut coverage result: |
xiaokang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
PR approved by at least one committer and no changes requested. |
eldenmoon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
if we put not json object or json array in json file with select tvf , here will meet core like : ``` SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Reload4jLoggerFactory] [WARNING!] /sys/kernel/mm/transparent_hugepage/enabled: [always] madvise never, Doris not recommend turning on THP, which may cause the BE process to use more memory and cannot be freed in time. Turn off THP: `echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled` start BE in local mode doris_be: /mnt/disk1/wangqiannan/amory/doris/thirdparty/installed/include/rapidjson/document.h:1178:SizeType rapidjson::GenericValue<rapidjson::UTF8<>>::MemberCount() const [Encoding = rapidjson::UTF8<>, Allocator = rapidjson::MemoryPoolAllocator<>]: 假设 ‘IsObject()’ 失 败。 *** Query id: 0-0 *** *** is nereids: 0 *** *** tablet id: 0 *** *** Aborted at 1727253403 (unix time) try "date -d @1727253403" if you are using GNU date *** *** Current BE git commitID: daf0fa7 *** *** SIGABRT unknown detail explain (@0x461000ac22f) received by PID 705071 (TID 711574 OR 0x7f35fb40a700) from PID 705071; stack trace: *** 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /mnt/disk1/wangqiannan/amory/doris/be/src/common/signal_handler.h:421 1# 0x00007F3A3902EB50 in /lib64/libc.so.6 2# gsignal in /lib64/libc.so.6 3# __GI_abort in /lib64/libc.so.6 4# _nl_load_domain.cold.0 in /lib64/libc.so.6 5# 0x00007F3A39027426 in /lib64/libc.so.6 6# rapidjson::GenericValue<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator> >::MemberCount() const at /mnt/disk1/wangqiannan/amory/doris/thirdparty/installed/include/rapidjson/document.h:1178 7# doris::vectorized::NewJsonReader::get_parsed_schema(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*, std::vector<doris::TypeDescriptor, std::allocator<doris::TypeDescriptor> >*) at /mnt/disk1/wangqiannan/amory/doris/be/src/vec/exec/format/json/new_json_reader.cpp:347 8# doris::PInternalService::fetch_table_schema(google::protobuf::RpcController*, doris::PFetchTableSchemaRequest const*, doris::PFetchTableSchemaResult*, google::protobuf::Closure*)::$_0::operator()() const at /mnt/disk1/wangqiannan/amory/doris/be/src/service/internal_service.cpp:832 9# void std::__invoke_impl<void, doris::PInternalService::fetch_table_schema(google::protobuf::RpcController*, doris::PFetchTableSchemaRequest const*, doris::PFetchTableSchemaResult*, google::protobuf::Closure*)::$_0&>(std::__invoke_other, doris::PInternalService::fetch_table_schema(google::protobuf::RpcController*, doris::PFetchTableSchemaRequest const*, doris::PFetchTableSchemaResult*, google::protobuf::Closure*)::$_0&) at /mnt/disk1/wangqiannan/tool/ldb_toolchain_16/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61 10# std::enable_if<is_invocable_r_v<void, doris::PInternalService::fetch_table_schema(google::protobuf::RpcController*, doris::PFetchTableSchemaRequest const*, doris::PFetchTableSchemaResult*, google::protobuf::Closure*)::$_0&>, void>::type std::__invoke_r<void, doris::PInternalService::fetch_table_schema(google::protobuf::RpcController*, doris::PFetchTableSchemaRequest const*, doris::PFetchTableSchemaResult*, google::protobuf::Closure*)::$_0&>(doris::PInternalService::fetch_table_schema(google::protobuf::RpcController*, doris::PFetchTableSchemaRequest const*, doris::PFetchTableSchemaResult*, google::protobuf::Closure*)::$_0&) at /mnt/disk1/wangqiannan/tool/ldb_toolchain_16/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:117 11# std::_Function_handler<void (), doris::PInternalService::fetch_table_schema(google::protobuf::RpcController*, doris::PFetchTableSchemaRequest const*, doris::PFetchTableSchemaResult*, google::protobuf::Closure*)::$_0>::_M_invoke(std::_Any_data const&) at /mnt/disk1/wangqiannan/tool/ldb_toolchain_16/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291 12# std::function<void ()>::operator()() const at /mnt/disk1/wangqiannan/tool/ldb_toolchain_16/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:560 13# doris::WorkThreadPool<false>::work_thread(int) at /mnt/disk1/wangqiannan/amory/doris/be/src/util/work_thread_pool.hpp:158 14# void std::__invoke_impl<void, void (doris::WorkThreadPool<false>::* const&)(int), doris::WorkThreadPool<false>*&, int&>(std::__invoke_memfun_deref, void (doris::WorkThreadPool<false>::* const&)(int), doris::WorkThreadPool<false>*&, int&) at /mnt/disk1/wangqiannan/tool/ldb_toolchain_16/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:74 15# std::__invoke_result<void (doris::WorkThreadPool<false>::* const&)(int), doris::WorkThreadPool<false>*&, int&>::type std::__invoke<void (doris::WorkThreadPool<false>::* const&)(int), doris::WorkThreadPool<false>*&, int&>(void (doris::WorkThreadPool<false>::* const&)(int), doris::WorkThreadPool<false>*&, int&) at /mnt/disk1/wangqiannan/tool/ldb_toolchain_16/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:96 16# decltype (std::__invoke((*this)._M_pmf, std::forward<doris::WorkThreadPool<false>*&>({parm#1}), std::forward<int&>({parm#1}))) std::_Mem_fn_base<void (doris::WorkThreadPool<false>::*)(int), true>::operator()<doris::WorkThreadPool<false>*&, int&>(doris::WorkThreadPool<false>*&, int&) const at /mnt/disk1/wangqiannan/tool/ldb_toolchain_16/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/functional:131 17# void std::__invoke_impl<void, std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)>&, doris::WorkThreadPool<false>*&, int&>(std::__invoke_other, std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)>&, doris::WorkThreadPool<false>*&, int&) at /mnt/disk1/wangqiannan/tool/ldb_toolchain_16/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61 18# std::enable_if<is_invocable_r_v<void, std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)>&, doris::WorkThreadPool<false>*&, int&>, void>::type std::__invoke_r<void, std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)>&, doris::WorkThreadPool<false>*&, int&>(std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)>&, doris::WorkThreadPool<false>*&, int&) at /mnt/disk1/wangqiannan/tool/ldb_toolchain_16/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:117 19# void std::_Bind_result<void, std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)> (doris::WorkThreadPool<false>*, int)>::__call<void, , 0ul, 1ul>(std::tuple<>&&, std::_Index_tuple<0ul, 1ul>) at /mnt/disk1/wangqiannan/tool/ldb_toolchain_16/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/functional:570 ```
if we put not json object or json array in json file with select tvf , here will meet core like : ``` SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Reload4jLoggerFactory] [WARNING!] /sys/kernel/mm/transparent_hugepage/enabled: [always] madvise never, Doris not recommend turning on THP, which may cause the BE process to use more memory and cannot be freed in time. Turn off THP: `echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled` start BE in local mode doris_be: /mnt/disk1/wangqiannan/amory/doris/thirdparty/installed/include/rapidjson/document.h:1178:SizeType rapidjson::GenericValue<rapidjson::UTF8<>>::MemberCount() const [Encoding = rapidjson::UTF8<>, Allocator = rapidjson::MemoryPoolAllocator<>]: 假设 ‘IsObject()’ 失 败。 *** Query id: 0-0 *** *** is nereids: 0 *** *** tablet id: 0 *** *** Aborted at 1727253403 (unix time) try "date -d @1727253403" if you are using GNU date *** *** Current BE git commitID: daf0fa7 *** *** SIGABRT unknown detail explain (@0x461000ac22f) received by PID 705071 (TID 711574 OR 0x7f35fb40a700) from PID 705071; stack trace: *** 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /mnt/disk1/wangqiannan/amory/doris/be/src/common/signal_handler.h:421 1# 0x00007F3A3902EB50 in /lib64/libc.so.6 2# gsignal in /lib64/libc.so.6 3# __GI_abort in /lib64/libc.so.6 4# _nl_load_domain.cold.0 in /lib64/libc.so.6 5# 0x00007F3A39027426 in /lib64/libc.so.6 6# rapidjson::GenericValue<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator> >::MemberCount() const at /mnt/disk1/wangqiannan/amory/doris/thirdparty/installed/include/rapidjson/document.h:1178 7# doris::vectorized::NewJsonReader::get_parsed_schema(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*, std::vector<doris::TypeDescriptor, std::allocator<doris::TypeDescriptor> >*) at /mnt/disk1/wangqiannan/amory/doris/be/src/vec/exec/format/json/new_json_reader.cpp:347 8# doris::PInternalService::fetch_table_schema(google::protobuf::RpcController*, doris::PFetchTableSchemaRequest const*, doris::PFetchTableSchemaResult*, google::protobuf::Closure*)::$_0::operator()() const at /mnt/disk1/wangqiannan/amory/doris/be/src/service/internal_service.cpp:832 9# void std::__invoke_impl<void, doris::PInternalService::fetch_table_schema(google::protobuf::RpcController*, doris::PFetchTableSchemaRequest const*, doris::PFetchTableSchemaResult*, google::protobuf::Closure*)::$_0&>(std::__invoke_other, doris::PInternalService::fetch_table_schema(google::protobuf::RpcController*, doris::PFetchTableSchemaRequest const*, doris::PFetchTableSchemaResult*, google::protobuf::Closure*)::$_0&) at /mnt/disk1/wangqiannan/tool/ldb_toolchain_16/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61 10# std::enable_if<is_invocable_r_v<void, doris::PInternalService::fetch_table_schema(google::protobuf::RpcController*, doris::PFetchTableSchemaRequest const*, doris::PFetchTableSchemaResult*, google::protobuf::Closure*)::$_0&>, void>::type std::__invoke_r<void, doris::PInternalService::fetch_table_schema(google::protobuf::RpcController*, doris::PFetchTableSchemaRequest const*, doris::PFetchTableSchemaResult*, google::protobuf::Closure*)::$_0&>(doris::PInternalService::fetch_table_schema(google::protobuf::RpcController*, doris::PFetchTableSchemaRequest const*, doris::PFetchTableSchemaResult*, google::protobuf::Closure*)::$_0&) at /mnt/disk1/wangqiannan/tool/ldb_toolchain_16/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:117 11# std::_Function_handler<void (), doris::PInternalService::fetch_table_schema(google::protobuf::RpcController*, doris::PFetchTableSchemaRequest const*, doris::PFetchTableSchemaResult*, google::protobuf::Closure*)::$_0>::_M_invoke(std::_Any_data const&) at /mnt/disk1/wangqiannan/tool/ldb_toolchain_16/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291 12# std::function<void ()>::operator()() const at /mnt/disk1/wangqiannan/tool/ldb_toolchain_16/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:560 13# doris::WorkThreadPool<false>::work_thread(int) at /mnt/disk1/wangqiannan/amory/doris/be/src/util/work_thread_pool.hpp:158 14# void std::__invoke_impl<void, void (doris::WorkThreadPool<false>::* const&)(int), doris::WorkThreadPool<false>*&, int&>(std::__invoke_memfun_deref, void (doris::WorkThreadPool<false>::* const&)(int), doris::WorkThreadPool<false>*&, int&) at /mnt/disk1/wangqiannan/tool/ldb_toolchain_16/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:74 15# std::__invoke_result<void (doris::WorkThreadPool<false>::* const&)(int), doris::WorkThreadPool<false>*&, int&>::type std::__invoke<void (doris::WorkThreadPool<false>::* const&)(int), doris::WorkThreadPool<false>*&, int&>(void (doris::WorkThreadPool<false>::* const&)(int), doris::WorkThreadPool<false>*&, int&) at /mnt/disk1/wangqiannan/tool/ldb_toolchain_16/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:96 16# decltype (std::__invoke((*this)._M_pmf, std::forward<doris::WorkThreadPool<false>*&>({parm#1}), std::forward<int&>({parm#1}))) std::_Mem_fn_base<void (doris::WorkThreadPool<false>::*)(int), true>::operator()<doris::WorkThreadPool<false>*&, int&>(doris::WorkThreadPool<false>*&, int&) const at /mnt/disk1/wangqiannan/tool/ldb_toolchain_16/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/functional:131 17# void std::__invoke_impl<void, std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)>&, doris::WorkThreadPool<false>*&, int&>(std::__invoke_other, std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)>&, doris::WorkThreadPool<false>*&, int&) at /mnt/disk1/wangqiannan/tool/ldb_toolchain_16/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61 18# std::enable_if<is_invocable_r_v<void, std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)>&, doris::WorkThreadPool<false>*&, int&>, void>::type std::__invoke_r<void, std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)>&, doris::WorkThreadPool<false>*&, int&>(std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)>&, doris::WorkThreadPool<false>*&, int&) at /mnt/disk1/wangqiannan/tool/ldb_toolchain_16/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:117 19# void std::_Bind_result<void, std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)> (doris::WorkThreadPool<false>*, int)>::__call<void, , 0ul, 1ul>(std::tuple<>&&, std::_Index_tuple<0ul, 1ul>) at /mnt/disk1/wangqiannan/tool/ldb_toolchain_16/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/functional:570 ```
Cherry-picked from #41290 Co-authored-by: amory <wangqiannan@selectdb.com>
Cherry-picked from #41290 Co-authored-by: amory <wangqiannan@selectdb.com>
Proposed changes
if we put not json object or json array in json file with select tvf , here will meet core like :
Issue Number: close #xxx