a2l007
left a comment
Sorry to be the grammar police here, but it would be better to correct Seperate to Separate.
Thank you for pointing that out.
if (!this.columns.isEmpty()) {
  for (String column : this.columns) {
    Preconditions.checkArgument(
        !column.contains("tab".equals(seperator) ? "\t" : ","),
What do you think of using something like the AbstractFlatTextFormatParser.FlatTextFormat enum for this instead of hard-coding strings?
It's a good idea; I'm working on that.
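As a rough illustration of the suggestion above, the delimiter check could read its value from an enum instead of comparing hard-coded strings. This is only a sketch: `DelimitedFormat` and `ColumnValidator` are hypothetical names (not Druid's actual `AbstractFlatTextFormatParser.FlatTextFormat`), and a plain `IllegalArgumentException` stands in for Guava's `Preconditions.checkArgument` to keep the example self-contained.

```java
import java.util.List;

// Hypothetical stand-in for an enum such as AbstractFlatTextFormatParser.FlatTextFormat;
// the names and fields here are assumptions for illustration only.
enum DelimitedFormat
{
  CSV(","),
  TSV("\t");

  private final String delimiter;

  DelimitedFormat(String delimiter)
  {
    this.delimiter = delimiter;
  }

  String getDelimiter()
  {
    return delimiter;
  }
}

// Hypothetical helper that performs the column-name check from the snippet above.
final class ColumnValidator
{
  private ColumnValidator()
  {
  }

  // Rejects any column name that contains the delimiter of the given format.
  static void checkColumns(List<String> columns, DelimitedFormat format)
  {
    for (String column : columns) {
      if (column.contains(format.getDelimiter())) {
        throw new IllegalArgumentException(
            "Column[" + column + "] contains the " + format + " delimiter"
        );
      }
    }
  }
}
```

With this shape, supporting another delimiter would mean adding an enum constant rather than another string comparison.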
public int hashCode()
{
  return Objects.hash(listDelimiter, columns, findColumnsFromHeader, skipHeaderRows);
}
Looks like format is missing in equalsAndHashCode
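A minimal sketch of what including `format` in both methods might look like. `FormatSpec` is a hypothetical class, not the PR's code: its field names mirror the snippet above, and `format` is modeled as a plain `String` so the example compiles on its own.

```java
import java.util.List;
import java.util.Objects;

// Illustrative class only; `format` is a String here, which is an assumption.
final class FormatSpec
{
  private final String format;
  private final String listDelimiter;
  private final List<String> columns;
  private final boolean findColumnsFromHeader;
  private final int skipHeaderRows;

  FormatSpec(
      String format,
      String listDelimiter,
      List<String> columns,
      boolean findColumnsFromHeader,
      int skipHeaderRows
  )
  {
    this.format = format;
    this.listDelimiter = listDelimiter;
    this.columns = columns;
    this.findColumnsFromHeader = findColumnsFromHeader;
    this.skipHeaderRows = skipHeaderRows;
  }

  @Override
  public boolean equals(Object o)
  {
    if (this == o) {
      return true;
    }
    if (o == null || getClass() != o.getClass()) {
      return false;
    }
    FormatSpec that = (FormatSpec) o;
    // `format` takes part in the comparison alongside the other fields.
    return findColumnsFromHeader == that.findColumnsFromHeader
           && skipHeaderRows == that.skipHeaderRows
           && Objects.equals(format, that.format)
           && Objects.equals(listDelimiter, that.listDelimiter)
           && Objects.equals(columns, that.columns);
  }

  @Override
  public int hashCode()
  {
    // Hash exactly the fields that equals() compares.
    return Objects.hash(format, listDelimiter, columns, findColumnsFromHeader, skipHeaderRows);
  }
}
```

Without `format` in `equals`, a CSV spec and a TSV spec with otherwise identical settings would compare as equal.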
this.multiValueFunction = ParserUtils.getMultiValueFunction(finalListDelimeter, Splitter.on(finalListDelimeter));
this.columns = findColumnsFromHeader ? null : columns; // columns will be overriden by header row
this.format = format;
this.parser = createOpenCsvParser(format.getDefaultDelimiter().charAt(0));
I think it'd be cleaner for the format to have a method that returns a char for this
suneet-s
left a comment
Thanks for this PR! I read through it and have some overall thoughts. I'm just asking a bunch of questions to get a better understanding.
Overall comments:
- Please add javadocs to the new files you've created. I think that will help clear up a lot of my confusion.
- Could you explain why you chose to have the CSV* and TSV* classes extend the SeparateValue* classes? I like your idea of sharing the logic between them in a common class, but I think composition might be an easier approach to follow (i.e., the CSV* and TSV* classes have a delegate SeparateValue* object that is instantiated differently based on which class is calling it).
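To make the composition idea concrete, here is a hedged sketch in which each format-specific class holds a shared delegate configured with its own delimiter. All class names (`DelimitedLineSplitter`, `CsvLineParser`, `TsvLineParser`) are hypothetical, and the splitting is deliberately naive: it ignores quoting and escaping, which a real CSV parser must handle.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical shared helper that holds the common logic.
final class DelimitedLineSplitter
{
  private final String delimiter;

  DelimitedLineSplitter(String delimiter)
  {
    this.delimiter = delimiter;
  }

  List<String> split(String line)
  {
    // Pattern.quote treats the delimiter literally; limit -1 keeps trailing empty fields.
    return Arrays.asList(line.split(Pattern.quote(delimiter), -1));
  }
}

// Each format-specific class owns a delegate instead of extending a base class.
final class CsvLineParser
{
  private final DelimitedLineSplitter delegate = new DelimitedLineSplitter(",");

  List<String> parse(String line)
  {
    return delegate.split(line);
  }
}

final class TsvLineParser
{
  private final DelimitedLineSplitter delegate = new DelimitedLineSplitter("\t");

  List<String> parse(String line)
  {
    return delegate.split(line);
  }
}
```

The trade-off is a little forwarding boilerplate in exchange for keeping the shared class out of the public type hierarchy.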
CSV(","),
TSV("\t");

private final String defaultDelimiter;
Why defaultDelimiter? Should this just be delimiter?
public String getLiteral()
{
  return ",".equals(defaultDelimiter) ? "comma" : "tab";
nit: IMO it'd be easier to read if the enum had to define both the delimiter and the literal
CSV(",", "comma")
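Spelled out in full, the nit above might look like the following sketch. The enum name `FlatTextFormatSketch` and the field name `delimiter` are assumptions for illustration; only the `CSV(",", "comma")` shape comes from the comment.

```java
// Hypothetical enum: each constant declares both its delimiter and its literal name,
// so getLiteral() no longer has to infer one from the other.
enum FlatTextFormatSketch
{
  CSV(",", "comma"),
  TSV("\t", "tab");

  private final String delimiter;
  private final String literal;

  FlatTextFormatSketch(String delimiter, String literal)
  {
    this.delimiter = delimiter;
    this.literal = literal;
  }

  public String getDelimiter()
  {
    return delimiter;
  }

  public String getLiteral()
  {
    return literal;
  }
}
```

This also removes the implicit assumption that anything other than a comma must be a tab.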
private final FlatTextFormat format;

@JsonCreator
public SeparateValueInputFormat(
Is this class ever instantiated via json? I think it's just a base class if I'm reading the PR correctly - so I think we can remove all the Json annotations. And I think the constructor should only be package private
  );
}

public InputEntityReader createReader(
sorry, I find this a little confusing. The function to create a reader above creates a SeparateValueReader, but if you pass in a format it constructs either a CSVReader or a TSVReader. I think I'm finding it hard to wrap my head around the differences
Sorry, I found that confusing too. I will refactor that code.
}

@VisibleForTesting
public TsvInputFormat(
Do we really need this public constructor to be visible for testing? If it was package private, that might have been ok, but it's hard to enforce public constructors are not used elsewhere in the main source code.
Maybe it's better if the tests are explicit about passing hasHeaderRow as null when instantiating this object
 * It implements the common logic between {@link CsvInputFormat} and {@link TsvInputFormat}
 * Should never be instantiated
 */
public class SeparateValueInputFormat implements InputFormat
This can be an abstract class and you can leave the "Should never be instantiated" comment out
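A small sketch of that suggestion, using hypothetical class names rather than the PR's real ones: marking the base class `abstract` lets the compiler enforce what the "Should never be instantiated" comment only requested, and a package-private constructor limits subclassing to the same package.

```java
// Hypothetical base class; the names are assumptions for illustration.
abstract class DelimitedInputFormatBase
{
  private final String delimiter;

  // Package-private constructor: only classes in the same package can subclass this.
  DelimitedInputFormatBase(String delimiter)
  {
    this.delimiter = delimiter;
  }

  String getDelimiter()
  {
    return delimiter;
  }
}

final class CsvFormatSketch extends DelimitedInputFormatBase
{
  CsvFormatSketch()
  {
    super(",");
  }
}

final class TsvFormatSketch extends DelimitedInputFormatBase
{
  TsvFormatSketch()
  {
    super("\t");
  }
}
```

Writing `new DelimitedInputFormatBase(",")` would now be a compile-time error, so the javadoc warning becomes unnecessary.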
Description
Add support for TSV format files.
Add tsvInputFormat, tsvReader, and unit tests for them. tsvInputFormat and csvInputFormat are pretty similar, so I'm thinking of having a parent class called seperateValueInputFormat implements InputFormat, with tsvInputFormat extends seperateValueInputFormat and csvInputFormat extends seperateValueInputFormat.
Key changed/added classes in this PR
- tsvInputFormat
- tsvReader