Skip to content

Conversation

@rahul8383
Copy link
Contributor

…Records to Rows in HCatalogIO

char and varchar types in HcatRecord requires the length of the character sequence to be provided as a parameter.
serdeConstants only specifies the base name of the type. So, serdeConstants cannot be used to convert "char(10)" and "varchar(100)" to Beam Fieldtype.STRING.

This PR converts the FieldSchema to HCatFieldSchema and the Type enum defined in HCatFieldSchema is used to convert the FieldSchema to Beam Schema.

Decimal type can optionally be provided with precision and scale as parameters while defining the type. Using HCatFieldSchema will take care of converting "decimal(30,16)" type to FieldType.DECIMAL type.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- Build Status --- --- Build Status
Java Build Status Build Status Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status Build Status
Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
--- Build Status
Build Status
Build Status
Build Status
Build Status
--- --- Build Status
XLang --- --- --- Build Status --- --- Build Status

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website
Non-portable Build Status Build Status
Build Status
Build Status Build Status
Portable --- Build Status --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@rahul8383
Copy link
Contributor Author

R: @akedin @iemejia

@iemejia iemejia requested a review from akedin April 29, 2020 21:55
@aaltay
Copy link
Member

aaltay commented May 1, 2020

retest this please

@TheNeuralBit
Copy link
Member

@akedin isn't very involved with Beam anymore. I think I can help review this instead

@TheNeuralBit TheNeuralBit self-requested a review May 4, 2020 18:41
Copy link
Member

@TheNeuralBit TheNeuralBit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rahul8383!

I think ideally these parameterized types should map to standard logical types. What do you think about adding some as part of this PR? If you'd rather not, we could probably merge it as-is, but file a jira to follow-up with logical type mappings.

Schema.builder()
.addNullableField("parameterizedChar", Schema.FieldType.STRING)
.addNullableField("parameterizedVarchar", Schema.FieldType.STRING)
.addNullableField("parameterizedDecimal", Schema.FieldType.DECIMAL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these should map to logical types instead of to primitives so we don't lose the information from the parameter. Unfortunately we don't (yet) have good logical types in schemas.logicaltypes to map them to, but maybe we will after your other PR, #11581 (or you could just add the relevant ones here).

char(10) looks like it could map to a FixedLengthString logical type, varchar(100) probably deserves its own type, maybe just called Varchar? and I've been meaning to add a logical type for DECIMAL parameterized by precision and scale as part of BEAM-7554 (and deprecate the primitive one).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @TheNeuralBit for the review.

I am adding logical types in schemas.logicaltypes called VariableLengthBytes, FixedLengthString, VariableLengthString, LogicalDecimal as part of #11581 .

I will take up the task of mapping these to logical types once my other PR gets merged. I also hope that #11272 get merged by then, so that I can use SqlTypes.DATE logical type. Can you please create a JIRA ticket and assign it to me.

@rahul8383
Copy link
Contributor Author

@TheNeuralBit if it is okay with you that I handle converting the parameterized types to necessary logical types in a separate PR, and if there are no comments, can we merge this PR?

@TheNeuralBit
Copy link
Member

Yep! sorry about that meant to do this yesterday

@TheNeuralBit TheNeuralBit merged commit dcd9f75 into apache:master May 6, 2020
@TheNeuralBit
Copy link
Member

Filed BEAM-9909 to track the logical type change

@rahul8383 rahul8383 deleted the parameterized_HCatTypes_to_BeamTypes branch May 6, 2020 20:55
yirutang pushed a commit to yirutang/beam that referenced this pull request Jul 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants