Data export function, add export to specify certain columns #5689
EmmyMiao87 merged 10 commits into apache:master from hf200012:dev
… table Data export function, add certain columns that can be exported to the table
Modify the data export usage document
| package org.apache.doris.analysis; | ||
| import com.google.common.base.Splitter; |
Please pay attention to the import order.
| public static final String KEY_IN_PARAM_BACKEND_ID = "backend_id"; | ||
| //export | ||
| public static final String EXPORT_KEY_IN_PARAM_COLUMNS = "columns"; |
Use KEY_IN_PARAM_COLUMNS instead.
| properties, ExportStmt.DEFAULT_COLUMN_SEPARATOR)); | ||
| this.lineDelimiter = Separator.convertSeparator(PropertyAnalyzer.analyzeLineDelimiter( | ||
| properties, ExportStmt.DEFAULT_LINE_DELIMITER)); | ||
| if(properties.containsKey(LoadStmt.EXPORT_KEY_IN_PARAM_COLUMNS)){ |
| slot.setIsMaterialized(true); | ||
| slot.setColumn(col); | ||
| slot.setIsNullable(col.isAllowNull()); | ||
| if(!this.exportColumns.isEmpty() && this.exportColumns.contains(col.getName().toLowerCase())) { |
| exportTupleDesc = desc.createTupleDescriptor(); | ||
| exportTupleDesc.setTable(exportTable); | ||
| exportTupleDesc.setRef(tableRef); | ||
| this.exportColumns = stmt.getColumns(); |
Please set it in `public void setJob(ExportStmt stmt) throws UserException`.
| private void genExecFragment() throws UserException { | ||
| registerToDesc(); | ||
| private void genExecFragment(ExportStmt stmt) throws UserException { |
| private void genExecFragment(ExportStmt stmt) throws UserException { | |
| private void genExecFragment() throws UserException { |
| } | ||
| private void registerToDesc() { | ||
| private void registerToDesc(ExportStmt stmt) { |
| private void registerToDesc(ExportStmt stmt) { | |
| private void registerToDesc() { |
| this.tableId = exportTable.getId(); | ||
| this.tableName = stmt.getTblName(); | ||
| genExecFragment(); | ||
| genExecFragment(stmt); |
| genExecFragment(stmt); | |
| genExecFragment(); |
| private OriginStatement origStmt; | ||
| protected Map<String, String> sessionVariables = Maps.newHashMap(); | ||
| private List<String> exportColumns ; |
If you store columns in the export job as an attribute separate from properties, you need to consider persistence.
Either reload the columns attribute during replay, or persist the columns object directly.
My suggestion is not to modify the persistence logic: re-parse columns from the properties after they have been read back.
Code format modify
| for (Column col : exportTable.getBaseSchema()) { | ||
| if(!this.exportColumns.isEmpty() && this.exportColumns.contains(col.getName().toLowerCase())) { | ||
| String colName = col.getName().toLowerCase(); | ||
| if (!this.exportColumns.isEmpty() && this.exportColumns.contains(colName)) { |
| if (!this.exportColumns.isEmpty() && this.exportColumns.contains(colName)) { | |
| if (this.exportColumns !=null && this.exportColumns.contains(colName)) { |
The modified code may have a null-pointer problem.
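The null concern can be illustrated with a minimal, self-contained sketch. It is not the Doris code: `ExportColumnFilter` and `selectColumns` are hypothetical names, and plain strings stand in for `Column` objects. It assumes, as the PR description states, that an absent column list means every column is exported, and it treats an empty list the same way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the column-selection loop discussed above.
class ExportColumnFilter {
    /**
     * Returns the schema columns to export.
     * exportColumns is expected to hold lower-cased names; null or empty
     * is treated here as "export everything" (an assumption of this sketch).
     */
    static List<String> selectColumns(List<String> schemaColumns, List<String> exportColumns) {
        List<String> selected = new ArrayList<>();
        for (String col : schemaColumns) {
            String colName = col.toLowerCase(Locale.ROOT);
            // Check for null before calling any method on the list.
            boolean exportAll = exportColumns == null || exportColumns.isEmpty();
            if (exportAll || exportColumns.contains(colName)) {
                selected.add(col);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("City_Name", "date", "pv");
        System.out.println(selectColumns(schema, Arrays.asList("city_name", "date")));
        System.out.println(selectColumns(schema, null));
    }
}
```

Checking for null first means the loop cannot throw when the `columns` property was never set.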
columns persistence
| ``` | ||
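| * `column_separator`: The column separator. Defaults to `\t`. Invisible characters are supported, for example '\x07'. | ||
| * columns: The columns to export, separated by ASCII (half-width) commas. If this parameter is not set, all columns of the table are exported by default. |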
| Text.writeString(out, exportPath); | ||
| Text.writeString(out, columnSeparator); | ||
| Text.writeString(out, lineDelimiter); | ||
| Text.writeString(out, columns); |
In fact, there is no need to modify the logic here. You only need to initialize the columns after reading the properties.
Also, even if columns are to be persisted, they cannot actually be placed in this position.
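The reviewer's suggestion, to persist only the properties and rebuild the column list after reading them, might look roughly like the sketch below. It is a simplified illustration under stated assumptions: `ExportJobSketch` and `roundTrip` are hypothetical names, `java.io` data streams stand in for Doris's `Text.writeString` helpers, and the property key `"columns"` is taken from the PR description.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical, simplified export job: only `properties` is serialized.
class ExportJobSketch {
    final Map<String, String> properties = new LinkedHashMap<>();
    // Derived state: never written, always rebuilt from properties.
    List<String> exportColumns;

    void write(DataOutputStream out) throws IOException {
        out.writeInt(properties.size());
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            out.writeUTF(entry.getKey());
            out.writeUTF(entry.getValue());
        }
    }

    void readFields(DataInputStream in) throws IOException {
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            String key = in.readUTF();
            String value = in.readUTF();
            properties.put(key, value);
        }
        // Re-parse the column list once the properties are back in memory.
        String columns = properties.get("columns");
        if (columns != null && !columns.trim().isEmpty()) {
            exportColumns = new ArrayList<>();
            for (String part : columns.split(",")) {
                String name = part.trim().toLowerCase(Locale.ROOT);
                if (!name.isEmpty()) {
                    exportColumns.add(name);
                }
            }
        }
    }

    // Serializes a job to bytes and reads it back, as a replay would.
    static ExportJobSketch roundTrip(ExportJobSketch job) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            job.write(new DataOutputStream(bytes));
            ExportJobSketch restored = new ExportJobSketch();
            restored.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return restored;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the serialized layout is unchanged, images written before the change can still be read. That appears to be the motivation for not adding a new field to the write path.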
modify columns Persistence
| import org.apache.doris.common.Pair; | ||
| import org.apache.doris.common.Status; | ||
| import org.apache.doris.common.UserException; | ||
| import org.apache.doris.common.*; |
code style
remove import *
| this.properties.put(propertyKey, propertyValue); | ||
| } | ||
| } | ||
| this.columns = this.properties.get(LoadStmt.KEY_IN_PARAM_COLUMNS); |
| this.columns = this.properties.get(LoadStmt.KEY_IN_PARAM_COLUMNS); | |
| this.columns = this.properties.get(LoadStmt.KEY_IN_PARAM_COLUMNS); | |
| if (!Strings.isNullOrEmpty(this.columns)) { | |
| Splitter split = Splitter.on(',').trimResults().omitEmptyStrings(); | |
| this.exportColumns = split.splitToList(stmt.getColumns().toLowerCase()); | |
| } |
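For readers without Guava at hand, the parsing step in this suggestion can be approximated with the standard library alone. The sketch below is an illustration, not the merged code: `ExportColumnsParser` and `parse` are hypothetical names. It mirrors the suggestion's behaviour of trimming each name, dropping empty entries, and lower-casing the result.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper mirroring Splitter.on(',').trimResults().omitEmptyStrings().
class ExportColumnsParser {
    /** Returns null when the property is absent or blank, otherwise the lower-cased names. */
    static List<String> parse(String columnsProperty) {
        if (columnsProperty == null || columnsProperty.trim().isEmpty()) {
            return null;
        }
        List<String> result = new ArrayList<>();
        for (String part : columnsProperty.split(",")) {
            String name = part.trim().toLowerCase(Locale.ROOT);
            if (!name.isEmpty()) {
                result.add(name);
            }
        }
        return Collections.unmodifiableList(result);
    }

    public static void main(String[] args) {
        System.out.println(parse("City_Name, date,,"));
        System.out.println(parse(null));
    }
}
```

Lower-casing at parse time is what lets the later `contains(colName)` check behave case-insensitively.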
modify readFields exportColumns
Adds a `columns` property to the data export function so that only the specified columns of a table are exported.
The export statement takes it as a property: ("columns" = "k1, k2, k3");
## Proposed changes
Issue Number: close #xxx
EXPORT TABLE db.tbl
TO "hdfs://namenode:8020/tmp/doris_20213"
PROPERTIES
(
"columns"="city_name,date",
"column_separator"=",",
"exec_mem_limit"="2147483648",
"timeout" = "3600"
)
WITH BROKER "broker_name_2"
(
"username" = "",
"password" = ""
);
The data export function adds a `columns` parameter that specifies which columns of the table to export. Multiple column names are separated by commas, and column names are not case sensitive.
If this parameter is not set, all columns of the table are exported by default.