-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[Feature][Variant] support support schema for inner sub types in variant type #40573
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website. |
|
run buildall |
TPC-H: Total hot run time: 38251 ms |
TPC-DS: Total hot run time: 192755 ms |
ClickBench: Total hot run time: 31.83 s |
|
run buildall |
|
TeamCity be ut coverage result: |
TPC-H: Total hot run time: 38203 ms |
TPC-DS: Total hot run time: 191297 ms |
ClickBench: Total hot run time: 31.46 s |
9e8806d to
f04ceb0
Compare
|
run buildall |
|
TeamCity be ut coverage result: |
f04ceb0 to
0e0eb12
Compare
|
run buildall |
|
TeamCity be ut coverage result: |
TPC-H: Total hot run time: 38429 ms |
TPC-DS: Total hot run time: 197213 ms |
ClickBench: Total hot run time: 31.33 s |
|
run buildall |
TPC-H: Total hot run time: 38214 ms |
TPC-DS: Total hot run time: 192847 ms |
|
TeamCity be ut coverage result: |
ClickBench: Total hot run time: 31.69 s |
83e2850 to
bc9c36f
Compare
|
run buildall |
| DCHECK_EQ(node.node_type, TExprNodeType::JSON_LITERAL); | ||
| DCHECK(node.__isset.json_literal); | ||
| JsonBinaryValue value(node.json_literal.value); | ||
| return String(value.value(), value.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe next PR
|
run buildall |
82ae21b to
98224ec
Compare
|
run buildall |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
| // Currently the jsonb type should be the top level type, so we should not wrap it in array, | ||
| // see create_array_of_type. | ||
| // TODO we need to support array<jsonb> correctly | ||
| if (UNLIKELY(field.get_type_id() == TypeIndex::JSONB && info->num_dimensions > 0)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: boolean expression can be simplified by DeMorgan's theorem [readability-simplify-boolean-expr]
if (UNLIKELY(field.get_type_id() == TypeIndex::JSONB && info->num_dimensions > 0)) {
^Additional context
be/src/common/compiler_util.h:35: expanded from macro 'UNLIKELY'
#define UNLIKELY(expr) __builtin_expect(!!(expr), 0)
^| } | ||
|
|
||
| DataTypePtr DataTypeFactory::create_data_type(const TypeIndex& type_index, bool is_nullable) { | ||
| DataTypePtr DataTypeFactory::create_data_type(const TypeIndex& type_index, bool is_nullable, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: function 'create_data_type' exceeds recommended size/complexity thresholds [readability-function-size]
DataTypePtr DataTypeFactory::create_data_type(const TypeIndex& type_index, bool is_nullable,
^Additional context
be/src/vec/data_types/data_type_factory.cpp:250: 114 lines including whitespace and comments (threshold 80)
DataTypePtr DataTypeFactory::create_data_type(const TypeIndex& type_index, bool is_nullable,
^|
|
||
| #pragma once | ||
|
|
||
| #include <glog/logging.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: 'glog/logging.h' file not found [clang-diagnostic-error]
#include <glog/logging.h>
^|
TeamCity be ut coverage result: |
98224ec to
11d404d
Compare
|
run buildall |
|
TeamCity be ut coverage result: |
| validateNestedType(scalarType, fieldType); | ||
| if (!fieldNames.add(field.getName())) { | ||
| throw new AnalysisException("Duplicate field name " + field.getName() | ||
| + " in struct " + scalarType.toSql()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in variant
| } | ||
| column_with_cast_type.set_type(filed_type); | ||
| column_with_cast_type.set_precision_frac(cast_type.precision, cast_type.scale); | ||
| column_with_cast_type.set_is_decimal(cast_type.precision > 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not rigorous to use cast_type.precision > 0 to judge is_decimal. You can use cast_type.is_decimal_v2_type() and cast_type.is_decimal_v3_type() or add a new is_decimal_type for cast_type.
| auto full_path = full_path_builder.append(parent_column->name_lower_case(), false) | ||
| .append(entry->path.get_parts(), false) | ||
| .build(); | ||
| if (typed_columns.contains(entry->path.get_path())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of typed_columns?
| Field get_type_field(const IColumn& column, int row) const override { | ||
| Field field; | ||
| column.get(row, field); | ||
| field.set_type_info(get_type_id(), 0, static_cast<int>(get_scale())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is precision == 0 right?
| assert_cast<const ColumnUInt64&, TypeCheckOnRelease::DISABLE>(column); | ||
| Field field; | ||
| column_data.get(row, field); | ||
| field.set_type_info(get_type_id(), 0, static_cast<int>(get_scale())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is precision == 0 right?
11d404d to
a267cd5
Compare
|
run buildall |
TPC-H: Total hot run time: 40205 ms |
|
TeamCity be ut coverage result: |
TPC-DS: Total hot run time: 197059 ms |
ClickBench: Total hot run time: 32.62 s |
|
run buildall |
9564428 to
bec313e
Compare
|
run buildall |
Background
Currently we support auto detect schema info from semi-structure json, but when we encounter something like '2021-01-01' we just store as string type and query like string type, if we use it as date type, it maybe slow.So in this PR, we support to specify subschemas for variant type.
Usage
Design
like struct, we store we predefined fields as static type and no need to do schema merge