[Feature] Select outfile support parquet format #5938

Merged
morningman merged 3 commits into apache:master from xinghuayu007:outfile_support_parquet
Jun 10, 2021

@xinghuayu007 xinghuayu007 commented May 28, 2021

Proposed changes

SELECT ... INTO OUTFILE currently only supports exporting data in CSV format. This patch extends the feature to support the Parquet format.

Usage:
Local file:
SELECT citycode FROM table1 INTO OUTFILE "file:///root/doris/" FORMAT AS PARQUET PROPERTIES ("schema"="required,int32,siteid;", "parquet.compression"="snappy");

Broker file:
SELECT siteid FROM table1 INTO OUTFILE "hdfs://host/test_sql_prc_2019_02_19/" FORMAT AS PARQUET PROPERTIES ( "broker.name" = "hdfs_broker", "broker.hadoop.security.authentication" = "kerberos", "broker.kerberos_principal" = "test", "broker.kerberos_keytab_content" = "base64" , "schema"="required,int32,siteid;");

The "schema" property is required; it defines the schema of the Parquet file. Properties with the "parquet." prefix are Parquet file properties, such as compression, version, and enable_dictionary.
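
To make the "schema" format concrete, here is a minimal sketch of how such a string could be split into columns. It assumes, judging only from the two usage examples above, that entries are separated by ";" and that each entry has the form repetition,type,name. The class ParquetSchemaSketch and its parse method are hypothetical names for illustration and are not the code in this patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not the parser from this patch.
final class ParquetSchemaSketch {

    // Splits "required,int32,siteid;" into one String[3] per column:
    // [0] = repetition, [1] = type, [2] = column name (assumed layout).
    static List<String[]> parse(String schema) {
        List<String[]> columns = new ArrayList<>();
        for (String entry : schema.split(";")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray separators and surrounding whitespace
            }
            String[] parts = trimmed.split(",");
            if (parts.length != 3) {
                throw new IllegalArgumentException(
                        "schema entry must be 'repetition,type,name': " + trimmed);
            }
            for (int i = 0; i < parts.length; i++) {
                parts[i] = parts[i].trim();
            }
            columns.add(parts);
        }
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("schema must define at least one column");
        }
        return columns;
    }
}
```

With the schema from the usage examples, parse("required,int32,siteid;") yields a single column whose type is int32 and whose name is siteid.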

Types of changes

What types of changes does your code introduce to Doris?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update (if none of the other choices apply)
  • Code refactor (Modify the code structure, format the code, etc...)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

  • I have created an issue on (Fix #ISSUE) and described the bug/feature there in detail
  • Compiling and unit tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • If these changes need document changes, I have updated the document
  • Any dependent changes have been merged


@morningman morningman added the kind/feature Categorizes issue or PR as related to a new feature. label May 29, 2021
@xinghuayu007 xinghuayu007 force-pushed the outfile_support_parquet branch 4 times, most recently from fb6a36e to 2d923b5 Compare June 5, 2021 08:37
fileFormatType = TFileFormatType.FORMAT_PARQUET;
break;
default:
throw new AnalysisException("format:"+this.format+" not be supported.");

Suggested change
throw new AnalysisException("format:"+this.format+" not be supported.");
throw new AnalysisException("format:" + this.format + " is not supported.");

brokerDesc = new BrokerDesc(brokerName, brokerProps);
}

private void getParquetProperties(Set<String> processedPropKeys) throws AnalysisException {

It would be better to add a comment with an example explaining what kind of property format this method parses.
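
As an illustration of the kind of example such a comment could carry, here is a minimal sketch of collecting the "parquet."-prefixed properties described in the PR text. The body of the real getParquetProperties is not shown in this conversation, so the class ParquetPropsSketch, the extract method, and the choice to strip the prefix are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; the actual getParquetProperties may behave differently.
final class ParquetPropsSketch {
    private static final String PREFIX = "parquet.";

    // Example input:  {"parquet.compression" = "snappy", "schema" = "required,int32,siteid;"}
    // Example output: {"compression" = "snappy"}, with "parquet.compression"
    // recorded in processedPropKeys so the caller knows it was handled.
    static Map<String, String> extract(Map<String, String> properties,
                                       Set<String> processedPropKeys) {
        Map<String, String> fileProps = new HashMap<>();
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            String key = entry.getKey();
            if (key.startsWith(PREFIX)) {
                fileProps.put(key.substring(PREFIX.length()), entry.getValue());
                processedPropKeys.add(key);
            }
        }
        return fileProps;
    }
}
```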

<< _str_schema[index][2] << " is " << _str_schema[index][1];
return Status::InvalidArgument(ss.str());
}
LOG(WARNING) << "wangxixu-write-one-data:";

Suggested change
LOG(WARNING) << "wangxixu-write-one-data:";

}
break;
}
case TYPE_LARGEINT: {

I think this should be checked on the FE side, which should reject any request that exports a LARGEINT column.
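
A minimal sketch of what such an FE-side check could look like. It is self-contained, so it takes column type names as plain strings and throws IllegalArgumentException where Doris code would presumably throw AnalysisException; the class OutfileTypeCheckSketch and its method are hypothetical.

```java
import java.util.List;

// Hypothetical sketch of an FE-side validation step; not code from this patch.
final class OutfileTypeCheckSketch {

    // Rejects the export up front if any result column is a LARGEINT,
    // so the BE writer never has to handle that type.
    static void checkColumnTypes(List<String> columnTypes) {
        for (String type : columnTypes) {
            if ("LARGEINT".equalsIgnoreCase(type)) {
                throw new IllegalArgumentException(
                        "Parquet export does not support LARGEINT columns");
            }
        }
    }
}
```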

return Status::OK();
}

int64_t ParquetWriterWrapper::getWritedLen() {

Suggested change
int64_t ParquetWriterWrapper::getWritedLen() {
int64_t ParquetWriterWrapper::get_written_len() {

_outstream->Close();
} catch (const std::exception& e) {
_rg_writer = nullptr;
LOG(WARNING) <<"Parquet writer close error: " << e.what();

Suggested change
LOG(WARNING) <<"Parquet writer close error: " << e.what();
LOG(WARNING) << "Parquet writer close error: " << e.what();

@xinghuayu007 xinghuayu007 force-pushed the outfile_support_parquet branch from 2d923b5 to f70b2f1 Compare June 8, 2021 08:29
Status FileResultWriter::_close_file_writer(bool done, bool only_close) {
if (_parquet_writer != nullptr) {
_parquet_writer->close();
_current_written_bytes = _parquet_writer->get_writed_len();

Suggested change
_current_written_bytes = _parquet_writer->get_writed_len();
_current_written_bytes = _parquet_writer->get_written_len();

@xinghuayu007 xinghuayu007 force-pushed the outfile_support_parquet branch from 5a84c39 to e78db50 Compare June 9, 2021 04:33

@morningman morningman left a comment

LGTM

@morningman morningman added the approved Indicates a PR has been approved by one committer. label Jun 9, 2021
@morningman morningman merged commit e245aee into apache:master Jun 10, 2021
@morningman morningman mentioned this pull request Oct 10, 2021