Data export function, add export to specify certain columns #5689

Merged
EmmyMiao87 merged 10 commits into apache:master from hf200012:dev
Apr 27, 2021

@hf200012

EXPORT TABLE db.tbl
TO "hdfs://namenode:8020/tmp/doris_20213"
PROPERTIES
(
"columns"="city_name,date",
"column_separator"=",",
"exec_mem_limit"="2147483648",
"timeout" = "3600"
)
WITH BROKER "broker_name_2"
(
"username" = "",
"password" = ""
);

The data export function adds a "columns" parameter that specifies which columns of the table to export. Multiple column names are separated by commas, and the names are not case sensitive.
If this parameter is omitted, all columns of the table are exported by default.
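
The behavior described above can be sketched as follows. This is a minimal illustration, not the code merged in this PR: the class `ExportColumnsSketch` and its methods are hypothetical names, the schema is a plain list of strings, and silently ignoring requested names that do not exist in the table is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Doris: resolves which table columns an export includes.
class ExportColumnsSketch {

    // Splits the "columns" property on commas, trims whitespace,
    // lowercases each name, and drops empty items.
    static Set<String> parseColumns(String columnsProperty) {
        Set<String> result = new LinkedHashSet<>();
        if (columnsProperty == null) {
            return result;
        }
        for (String name : columnsProperty.split(",")) {
            String trimmed = name.trim().toLowerCase(Locale.ROOT);
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    // Returns the schema columns to export, in schema order.
    // An absent or empty property selects every column.
    // Assumption: requested names missing from the schema are ignored.
    static List<String> selectColumns(List<String> schema, String columnsProperty) {
        Set<String> requested = parseColumns(columnsProperty);
        if (requested.isEmpty()) {
            return new ArrayList<>(schema);
        }
        List<String> selected = new ArrayList<>();
        for (String column : schema) {
            if (requested.contains(column.toLowerCase(Locale.ROOT))) {
                selected.add(column);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("city_name", "date", "pv");
        System.out.println(selectColumns(schema, "City_Name, DATE")); // [city_name, date]
        System.out.println(selectColumns(schema, null));              // [city_name, date, pv]
    }
}
```

Matching is done on lowercased names, so "City_Name" and "city_name" select the same column, and the output keeps the table's column order rather than the order in the property.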

… table

Data export function, add certain columns that can be exported to the table
Modify the data export usage document

package org.apache.doris.analysis;

import com.google.common.base.Splitter;

Please pay attention to the import order.

public static final String KEY_IN_PARAM_BACKEND_ID = "backend_id";

//export
public static final String EXPORT_KEY_IN_PARAM_COLUMNS = "columns";

use KEY_IN_PARAM_COLUMNS instead

properties, ExportStmt.DEFAULT_COLUMN_SEPARATOR));
this.lineDelimiter = Separator.convertSeparator(PropertyAnalyzer.analyzeLineDelimiter(
properties, ExportStmt.DEFAULT_LINE_DELIMITER));
if(properties.containsKey(LoadStmt.EXPORT_KEY_IN_PARAM_COLUMNS)){

Code format: add a space after `if` and before `{`.

slot.setIsMaterialized(true);
slot.setColumn(col);
slot.setIsNullable(col.isAllowNull());
if(!this.exportColumns.isEmpty() && this.exportColumns.contains(col.getName().toLowerCase())) {

Code format: add a space after `if`.

exportTupleDesc = desc.createTupleDescriptor();
exportTupleDesc.setTable(exportTable);
exportTupleDesc.setRef(tableRef);
this.exportColumns = stmt.getColumns();

Please set it in `public void setJob(ExportStmt stmt) throws UserException`.


- private void genExecFragment() throws UserException {
- registerToDesc();
+ private void genExecFragment(ExportStmt stmt) throws UserException {

Suggested change
- private void genExecFragment(ExportStmt stmt) throws UserException {
+ private void genExecFragment() throws UserException {

}

- private void registerToDesc() {
+ private void registerToDesc(ExportStmt stmt) {

Suggested change
- private void registerToDesc(ExportStmt stmt) {
+ private void registerToDesc() {

this.tableId = exportTable.getId();
this.tableName = stmt.getTblName();
- genExecFragment();
+ genExecFragment(stmt);

Suggested change
- genExecFragment(stmt);
+ genExecFragment();

private OriginStatement origStmt;
protected Map<String, String> sessionVariables = Maps.newHashMap();

private List<String> exportColumns ;

If you store columns in the export job as a separate attribute, apart from properties, you need to consider persistence.
Either reload the columns attribute during replay,
or persist the columns object directly.
My suggestion is not to modify the persistence logic, and instead to re-parse columns from properties after the job is read back.
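
The reviewer's preferred option can be sketched as below. This is only an illustration under stated assumptions: `ExportJobSketch`, `KEY_COLUMNS`, `rebuildExportColumns`, and the stream layout are hypothetical and do not reflect the real Doris `ExportJob` or its `Text`-based serialization. The point is that only the properties map is written, and the derived column list is rebuilt whenever the job is read back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for an export job; only the properties map is persisted.
class ExportJobSketch {
    static final String KEY_COLUMNS = "columns";

    final Map<String, String> properties = new LinkedHashMap<>();
    // Derived state: never written to the stream, always rebuilt from properties.
    List<String> exportColumns = new ArrayList<>();

    void setProperties(Map<String, String> props) {
        properties.clear();
        properties.putAll(props);
        rebuildExportColumns();
    }

    // Re-parses the "columns" property into a lowercased list.
    private void rebuildExportColumns() {
        exportColumns = new ArrayList<>();
        String columns = properties.get(KEY_COLUMNS);
        if (columns == null) {
            return;
        }
        for (String name : columns.split(",")) {
            String trimmed = name.trim().toLowerCase(Locale.ROOT);
            if (!trimmed.isEmpty()) {
                exportColumns.add(trimmed);
            }
        }
    }

    // The on-disk format is unchanged by the new feature: properties only.
    void write(DataOutputStream out) throws IOException {
        out.writeInt(properties.size());
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            out.writeUTF(entry.getKey());
            out.writeUTF(entry.getValue());
        }
    }

    static ExportJobSketch read(DataInputStream in) throws IOException {
        ExportJobSketch job = new ExportJobSketch();
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            String key = in.readUTF();
            String value = in.readUTF();
            job.properties.put(key, value);
        }
        job.rebuildExportColumns(); // initialize columns after reading the properties
        return job;
    }

    // Serializes and deserializes a job in memory, as a replay would.
    static ExportJobSketch roundTrip(ExportJobSketch job) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                job.write(out);
            }
            try (DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return read(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the stream layout does not change, images or logs written before the feature existed would still load in this sketch; a job without a "columns" property simply ends up with an empty list.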

Code format modify
for (Column col : exportTable.getBaseSchema()) {
- if(!this.exportColumns.isEmpty() && this.exportColumns.contains(col.getName().toLowerCase())) {
+ String colName = col.getName().toLowerCase();
+ if (!this.exportColumns.isEmpty() && this.exportColumns.contains(colName)) {

Suggested change
- if (!this.exportColumns.isEmpty() && this.exportColumns.contains(colName)) {
+ if (this.exportColumns != null && this.exportColumns.contains(colName)) {
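
The difference between the two conditions can be shown in isolation. The sketch below is hypothetical (the class and method names are not from the PR) and assumes the motivation is that `exportColumns` may be null when no "columns" property was given. Note also that `List.contains` already returns false on an empty list, so the `isEmpty()` guard adds nothing.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of the two conditions from the review thread.
class ExportColumnCheckSketch {

    // Mirrors the original condition: throws NullPointerException
    // when exportColumns is null.
    static boolean matchesWithIsEmpty(List<String> exportColumns, String colName) {
        return !exportColumns.isEmpty() && exportColumns.contains(colName);
    }

    // Mirrors the suggested condition: a null list simply matches nothing.
    static boolean matchesNullSafe(List<String> exportColumns, String colName) {
        return exportColumns != null && exportColumns.contains(colName);
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("city_name", "date");
        System.out.println(matchesNullSafe(columns, "date"));                          // true
        System.out.println(matchesNullSafe(Collections.<String>emptyList(), "date")); // false
        System.out.println(matchesNullSafe(null, "date"));                             // false
    }
}
```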

There may be a null problem in the modification
columns persistence

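* `column_separator`: Column separator. Defaults to `\t`. Invisible characters such as '\x07' are supported.
* columns: The columns to export, separated by ASCII (half-width) commas. If this parameter is not set, all columns of the table are exported by default.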

Could you add this note to the English documentation as well?

Text.writeString(out, exportPath);
Text.writeString(out, columnSeparator);
Text.writeString(out, lineDelimiter);
Text.writeString(out, columns);

In fact, there is no need to modify the logic here. You only need to initialize the columns after reading the properties.
Also, even if columns are to be persisted, they cannot actually be placed in this position.

modify columns Persistence
- import org.apache.doris.common.Pair;
- import org.apache.doris.common.Status;
- import org.apache.doris.common.UserException;
+ import org.apache.doris.common.*;

Remove the `.*` wildcard import.

EmmyMiao87 self-assigned this on Apr 26, 2021
this.properties.put(propertyKey, propertyValue);
}
}
this.columns = this.properties.get(LoadStmt.KEY_IN_PARAM_COLUMNS);

Suggested change
- this.columns = this.properties.get(LoadStmt.KEY_IN_PARAM_COLUMNS);
+ this.columns = this.properties.get(LoadStmt.KEY_IN_PARAM_COLUMNS);
+ if (!Strings.isNullOrEmpty(this.columns)) {
+     Splitter split = Splitter.on(',').trimResults().omitEmptyStrings();
+     this.exportColumns = split.splitToList(stmt.getColumns().toLowerCase());
+ }

modify readFields exportColumns

EmmyMiao87 left a comment

LGTM

EmmyMiao87 added the area/backup, kind/improvement, api-review, and approved labels on Apr 26, 2021
EmmyMiao87 merged commit c7af83b into apache:master on Apr 27, 2021
EmmyMiao87 mentioned this pull request on May 7, 2021
10 tasks
EmmyMiao87 pushed a commit to EmmyMiao87/incubator-doris that referenced this pull request May 14, 2021
Data export function, add certain columns that can be exported to the table.
Export stmt properties ("columns" = "k1, k2, k3");
morningman mentioned this pull request on Oct 10, 2021
yiguolei pushed a commit to yiguolei/incubator-doris that referenced this pull request Dec 30, 2025

2 participants