[PARQUET-1968] FilterApi support In predicate #923
gszadovszky merged 16 commits into apache:master
@gszadovszky @shangxinli @rdblue Could you please take a look at this PR when you have time? Thanks a lot!
Also cc @chenjunjiedada
gszadovszky left a comment
I have some comments in the code. All of them are meant more to open discussions, so I will neither approve nor disapprove for now.
Otherwise the code seems good to me. Thanks a lot for working on it!
...rc/main/java/org/apache/parquet/filter2/recordlevel/IncrementallyUpdatedFilterPredicate.java
Outdated
...-column/src/main/java/org/apache/parquet/internal/column/columnindex/ColumnIndexBuilder.java
    }
    }

    // base class for In and NotIn
Please write a better comment here, since this is a public method.
    this.values = Objects.requireNonNull(values, "values cannot be null");
    checkArgument(!values.isEmpty(), "values in SetColumnFilterPredicate shouldn't be empty!");

    String name = getClass().getSimpleName().toLowerCase(Locale.ENGLISH);
I see you cache the result in 'toString', but is it generally reused multiple times? If not, eagerly converting to a string is wasted work.
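One way to address the concern above is to defer building the string until it is first requested. The following is only a sketch of that alternative, not the code from the PR: the class name LazyDescription and its Supplier-based constructor are hypothetical, and the sketch accepts that the supplier may run more than once if several threads race on the first call.

```java
import java.util.function.Supplier;

// Hypothetical illustration only; not a class from parquet-mr.
final class LazyDescription {
  private final Supplier<String> builder;
  // Stays null until toString() is called for the first time.
  private String cached;

  LazyDescription(Supplier<String> builder) {
    this.builder = builder;
  }

  @Override
  public String toString() {
    String s = cached;
    if (s == null) {
      // Build on demand; a racing thread may rebuild, but the result is the same.
      s = builder.get();
      cached = s;
    }
    return s;
  }
}
```

With this shape, a predicate that is never printed or logged never pays for the string construction, while repeated calls reuse the cached value.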
      iter++;
    }
    String valueStr = values.size() <= 100 ? str.substring(0, str.length() - 2) : str + "...";
    this.toString = name + "(" + column.getColumnPath().toDotString() + ", " + valueStr + ")";
Would it be possible to merge lines 272 and 273 into the code above that builds the string? String operations like this sometimes consume a lot of memory.
Is it enough to just replace str + "..." with str.append("...").toString()?
str.substring(0, str.length() - 2) is still a StringBuilder operation. Seems fine?
Maybe we can replace line 273 with a StringBuilder operation too?
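The suggestion in this thread, doing all of the concatenation in one StringBuilder, could look roughly like the sketch below. It is not the code that was merged: the helper name SetPredicateDescription, the describe method, and the MAX_PRINTED_VALUES constant are hypothetical, and the column path is passed in as a plain string to keep the example self-contained.

```java
import java.util.Set;

// Hypothetical helper, not the class from the PR: it only illustrates
// building the whole description inside a single StringBuilder.
final class SetPredicateDescription {
  // Assumed cap, mirroring the "values.size() <= 100" check in the snippet above.
  static final int MAX_PRINTED_VALUES = 100;

  static String describe(String name, String columnPath, Set<?> values) {
    StringBuilder str = new StringBuilder();
    str.append(name).append('(').append(columnPath).append(", ");
    int printed = 0;
    for (Object value : values) {
      if (printed == MAX_PRINTED_VALUES) {
        break; // stop once the cap is reached
      }
      if (printed > 0) {
        str.append(", "); // separator goes before every value except the first
      }
      str.append(value);
      printed++;
    }
    if (values.size() > MAX_PRINTED_VALUES) {
      str.append(", ..."); // mark that some values were left out
    }
    return str.append(')').toString();
  }
}
```

Writing the separator before each value (rather than after) avoids the trailing ", " that the original code has to cut off with substring, so no intermediate String is created.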
    @Override
    public boolean equals(Object o) {
      if (this == o) return true;
      if (o == null || getClass() != o.getClass()) return false;
I guess you can just 'return this.getClass() == o.getClass()'
Yes, but just trying to follow the style at https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/filter2/predicate/Operators.java#L150
      if (this == o) return true;
      if (o == null || getClass() != o.getClass()) return false;
      SetColumnFilterPredicate<?> that = (SetColumnFilterPredicate<?>) o;
      return column.equals(that.column) && values.equals(that.values) && Objects.equals(toString, that.toString);
Is the toString comparison still needed here? It seems toString already contains the values and the class name, so you can just compare the class here.
Removed the toString comparison.
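For illustration, an equals/hashCode pair that compares only the runtime class, the column, and the values (with no toString comparison) might look like the sketch below. The class names SetPredicateSketch, InSketch, and NotInSketch are hypothetical stand-ins, and the column is reduced to a plain string; this is not the code from Operators.java.

```java
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for a shared base class of "in" and "notIn" predicates.
abstract class SetPredicateSketch {
  private final String columnPath;
  private final Set<?> values;

  SetPredicateSketch(String columnPath, Set<?> values) {
    this.columnPath = Objects.requireNonNull(columnPath, "columnPath cannot be null");
    this.values = Objects.requireNonNull(values, "values cannot be null");
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    // The class check distinguishes InSketch from NotInSketch.
    if (o == null || getClass() != o.getClass()) return false;
    SetPredicateSketch that = (SetPredicateSketch) o;
    return columnPath.equals(that.columnPath) && values.equals(that.values);
  }

  @Override
  public int hashCode() {
    // Uses the same three ingredients as equals: class, column, values.
    return Objects.hash(getClass().getName(), columnPath, values);
  }
}

final class InSketch extends SetPredicateSketch {
  InSketch(String columnPath, Set<?> values) {
    super(columnPath, values);
  }
}

final class NotInSketch extends SetPredicateSketch {
  NotInSketch(String columnPath, Set<?> values) {
    super(columnPath, values);
  }
}
```

Because a cached string would be derived from exactly these fields, comparing it as well adds cost without changing the result.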
…ions by throwing Exception
@gszadovszky @shangxinli @dbtsai Thank you all very much for reviewing! I have changed the code to generate the visit methods for in/notIn and also added a default implementation that throws an Exception. I will address the rest of the comments tomorrow or the day after tomorrow.
gszadovszky left a comment
I have some more concrete comments in the code. Some more work is needed, but I think it is going in the right direction. Thanks a lot for your efforts.
parquet-column/src/main/java/org/apache/parquet/filter2/predicate/Operators.java
Outdated
parquet-hadoop/src/main/java/org/apache/parquet/filter2/bloomfilterlevel/BloomFilterImpl.java
Outdated
...-column/src/main/java/org/apache/parquet/internal/column/columnindex/ColumnIndexBuilder.java
...-column/src/main/java/org/apache/parquet/internal/column/columnindex/ColumnIndexBuilder.java
Outdated
...umn/src/test/java/org/apache/parquet/internal/filter2/columnindex/TestColumnIndexFilter.java
Outdated
...r/src/main/java/org/apache/parquet/filter2/IncrementallyUpdatedFilterPredicateGenerator.java
Outdated
...r/src/main/java/org/apache/parquet/filter2/IncrementallyUpdatedFilterPredicateGenerator.java
gszadovszky left a comment
Added a couple more comments/requests.
Sorry if I am a bit strict here, but filtering is not an easy topic and mistakes can cause serious issues (loss of data).
...-column/src/main/java/org/apache/parquet/internal/column/columnindex/ColumnIndexBuilder.java
Outdated
parquet-hadoop/src/main/java/org/apache/parquet/filter2/statisticslevel/StatisticsFilter.java
Outdated
parquet-hadoop/src/main/java/org/apache/parquet/filter2/statisticslevel/StatisticsFilter.java
Outdated
...-column/src/main/java/org/apache/parquet/internal/column/columnindex/ColumnIndexBuilder.java
Outdated
...-column/src/main/java/org/apache/parquet/internal/column/columnindex/ColumnIndexBuilder.java
Outdated
parquet-hadoop/src/main/java/org/apache/parquet/filter2/statisticslevel/StatisticsFilter.java
Outdated
...-column/src/main/java/org/apache/parquet/internal/column/columnindex/ColumnIndexBuilder.java
Outdated
...r/src/main/java/org/apache/parquet/filter2/IncrementallyUpdatedFilterPredicateGenerator.java
parquet-hadoop/src/test/java/org/apache/parquet/filter2/recordlevel/TestRecordLevelFilters.java
parquet-hadoop/src/test/java/org/apache/parquet/filter2/recordlevel/TestRecordLevelFilters.java
gszadovszky left a comment
I have a couple of comments for the new MinMax class but otherwise everything seems great!
parquet-column/src/main/java/org/apache/parquet/column/MinMax.java
Outdated
parquet-column/src/main/java/org/apache/parquet/column/MinMax.java
Outdated
parquet-column/src/main/java/org/apache/parquet/column/MinMax.java
Outdated
parquet-column/src/main/java/org/apache/parquet/column/MinMax.java
Outdated
    T element = iterator.next();
    if (max == null) {
      max = element;
    } else if (max != null && element != null) {
You are already in the else path, so you do not need to check for max != null.
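To make the point concrete, here is a minimal sketch of a min/max scan in which each branch tests a condition only once. It assumes that null elements are skipped and that ordering comes from a caller-supplied Comparator; the class name MinMaxSketch and its accessors are hypothetical and are not the MinMax class from the PR.

```java
import java.util.Comparator;

// Hypothetical illustration; the real MinMax class in the PR may differ.
final class MinMaxSketch<T> {
  private T min;
  private T max;

  MinMaxSketch(Comparator<? super T> comparator, Iterable<T> values) {
    for (T element : values) {
      if (element == null) {
        continue; // assumption: nulls do not take part in min/max
      }
      // "min == null" means no non-null element has been seen yet.
      if (min == null || comparator.compare(element, min) < 0) {
        min = element;
      }
      if (max == null || comparator.compare(element, max) > 0) {
        max = element;
      }
    }
  }

  T getMin() {
    return min;
  }

  T getMax() {
    return max;
  }
}
```

Both getters return null when the input contains no non-null element, so callers need to handle that case.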
gszadovszky left a comment
I have some minor issues only. Thanks a lot for your efforts to implement this! I think it is a great improvement for the query engines.
parquet-column/src/main/java/org/apache/parquet/column/MinMax.java
Outdated
parquet-column/src/main/java/org/apache/parquet/column/MinMax.java
Outdated
parquet-column/src/main/java/org/apache/parquet/column/MinMax.java
Outdated
@gszadovszky @shangxinli @viirya @dbtsai Thank you so much for all your help!!
Thank you for your contribution, @huaxingao! Great work!