-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[Feature](Dynamic schema table) step1 support schema change expression #17494
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
c62c45f to
a955e85
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
|
run buildall |
a955e85 to
1dcc39b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
|
run buildall |
gensrc/thrift/Exprs.thrift
Outdated
| 29: optional bool is_nullable | ||
|
|
||
| 30: optional TJsonLiteral json_literal | ||
| 31: optional TSchemaChangeInfo schema_change_info |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It't somewhat strange that Expr has schema_change_info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed to TSchemaChangeExpr
fe/fe-core/src/main/java/org/apache/doris/analysis/SchemaChangeExpr.java
Outdated
Show resolved
Hide resolved
| msg.node_type = TExprNodeType.SCHEMA_CHANGE_EXPR; | ||
| // set type info | ||
| TTypeDesc desc = new TTypeDesc(); | ||
| desc.setTypes(new ArrayList<TTypeNode>()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the function of setTypes to an emply list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
variant.toThrift(desc) will add TTypeNode to the empty list
fe/fe-core/src/main/java/org/apache/doris/load/loadv2/LoadingTaskPlanner.java
Show resolved
Hide resolved
| } | ||
| size_t operator()(const Int64& x) { | ||
| // Only support Int32 and Int64 | ||
| // Only Int64 at present |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why delete support for Int32
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since int32 is not enough to hold values when user first load int value then load bigint value, the bigint values will be overflow
| column_object.finalize(); | ||
| // flatten object columns for the purpose of extracting static columns and | ||
| // fill default values missing in static columns | ||
| schema_util::unfold_object(columns.size() - 1, columns, _slot_desc_index, slot_descs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is no flatten_object before, why add unfold_object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flatten renamed to unfold
1. introduce a new type `VARIANT` to encapsulate dynamic generated columns for hidding the detail of types and names of newly generated columns 2. introduce a new expression `SchemaChangeExpr` for doing schema change for extensibility
ac1fcb0 to
fd66af8
Compare
|
run buildall |
|
run p0 |
xiaokang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
PR approved by anyone and no changes requested. |
qidaye
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* [WIP](dynamic-table) support dynamic schema table (apache#16335) Issue Number: close apache#16351 Dynamic schema table is a special type of table, it's schema change with loading procedure.Now we implemented this feature mainly for semi-structure data such as JSON, since JSON is schema self-described we could extract schema info from the original documents and inference the final type infomation.This speical table could reduce manual schema change operation and easily import semi-structure data and extends it's schema automatically. * [improve](dynamic-table) change `addColumns` RPC interface fields from `required` to `optional` and and config doc (apache#16632) * [doc](dynamic-table) Add docs for dynamic-table (apache#16669) * [improve](dynamic table) refine SegmentWriter columns writer generate (apache#16816) * [improve](dynamic table) refine SegmentWriter columns writer generate ``` Dynamic Block consists of two parts, dynamic part of columns and static part of columns static dynamic | ----- | ------- | the static ones are original _tablet_schame columns the dynamic ones are auto generated and extended from file scan. ``` **We should only consisder to use Block info to generte columns when it's a dynamic table load procudure.** And seperate the static ones and dynamic ones * test * [typo](docs)fix dynamic Table version label (apache#16895) * [Feature](Dynamic schema table) step1 support schema change expression (apache#17494) 1. introduce a new type `VARIANT` to encapsulate dynamic generated columns for hidding the detail of types and names of newly generated columns 2. introduce a new expression `SchemaChangeExpr` for doing schema change for extensibility * Fix compile * [Bug](dynamic-table) Fix column alignment logic and support filtering null values when slot is not null Before this PR when encountering null values with some columns which is specified as `NOT NULL`, null values will not be filtered,thi behavior does not match with the original load behavior. Second column alignment logic has bug : ``` template <typename ColumnInserterFn> void align_variant_by_name_and_type(ColumnObject& dst, const ColumnObject& src, size_t row_cnt, ColumnInserterFn inserter) { CHECK(dst.is_finalized() && src.is_finalized()); // Use rows() here instead of size(), since size() will check_consistency // but we could not check_consistency since num_rows will be upgraded even // if src and dst is empty, we just increase the num_rows of dst and fill // num_rows of default values when meet new data size_t num_rows = dst.rows(); ``` --------- Co-authored-by: jiafeng.zhang <zhangjf1@gmail.com>
apache#17494) 1. introduce a new type `VARIANT` to encapsulate dynamic generated columns for hidding the detail of types and names of newly generated columns 2. introduce a new expression `SchemaChangeExpr` for doing schema change for extensibility
VARIANTto encapsulate dynamic generated columns for hidding the detail of types and names of newly generated columnsSchemaChangeExprfor doing schema change for extensibilityProposed changes
Issue Number: close #17493
Problem summary
Describe your changes.
Checklist(Required)
Further comments
If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...