Add projectStrict for Dates and Timestamps #283

Merged
rdblue merged 21 commits into apache:master from moulimukherjee:implement-strict-projection
Jul 26, 2019
Conversation

@moulimukherjee
Contributor

@moulimukherjee moulimukherjee commented Jul 13, 2019

Implementing projectStrict for Dates and Timestamps (#35)

Includes fix of ResidualEvaluator from moulimukherjee#1

Expected behaviour:
ts > 2019-07-07T23:59:59.9999 will be projected strict using ts_day >= 2019-07-08
ts > 2019-07-07T15:52:50.0000 will also be projected strict using ts_day >= 2019-07-08
ts > 2019-07-07T00:00:00.0000 will be projected strict using ts_day >= 2019-07-07

@moulimukherjee
Contributor Author

Expected behaviour:
ts > 2019-07-07T23:59:59.9999 can be projected strict using ts_day >= 2019-07-08
ts > 2019-07-07T15:52:50.0000 can also be projected strict using ts_day >= 2019-07-08
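
The two cases above can be checked with a small standalone sketch. It uses only `java.time` and is not the Iceberg implementation: the class and the helpers `strictLowerBoundDay` and `dayFullyMatches` are hypothetical names introduced for illustration. The idea is that a day partition strictly satisfies `ts > bound` only if every timestamp in that day is after the bound, so the day containing the bound itself can never qualify.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

// Illustrative sketch only; names here are hypothetical, not Iceberg APIs.
class StrictDayProjectionSketch {

  // For the row predicate `ts > bound`, returns the earliest day D such that
  // every timestamp in a partition with ts_day >= D satisfies the predicate.
  static LocalDate strictLowerBoundDay(LocalDateTime bound) {
    // The day containing `bound` also contains `bound` itself, which fails
    // `ts > bound`, so only later days match in full.
    return bound.toLocalDate().plusDays(1);
  }

  // Returns true if every timestamp on `day` satisfies `ts > bound`.
  // It is enough to test the earliest instant of the day.
  static boolean dayFullyMatches(LocalDate day, LocalDateTime bound) {
    return day.atStartOfDay().isAfter(bound);
  }

  public static void main(String[] args) {
    LocalDateTime lateBound = LocalDateTime.parse("2019-07-07T23:59:59.9999");
    LocalDateTime midBound = LocalDateTime.parse("2019-07-07T15:52:50.0000");

    System.out.println(strictLowerBoundDay(lateBound)); // prints 2019-07-08
    System.out.println(strictLowerBoundDay(midBound));  // prints 2019-07-08

    // The day of the bound is only partially covered, so it is not a strict match.
    System.out.println(dayFullyMatches(LocalDate.of(2019, 7, 7), midBound)); // prints false
    System.out.println(dayFullyMatches(LocalDate.of(2019, 7, 8), midBound)); // prints true
  }
}
```

Both bounds fall on 2019-07-07, so both yield `ts_day >= 2019-07-08` regardless of the time of day.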

@moulimukherjee moulimukherjee changed the title from "WIP: Add projectStrict for Dates and Timestamps" to "Add projectStrict for Dates and Timestamps" Jul 15, 2019
@moulimukherjee
Contributor Author

@rdblue Is it apparent to you what might be the issue here? In TestFilteredScan.testHourPartitionedTimestampFilters, it's not returning the expected number of records based on the filter.

@moulimukherjee moulimukherjee force-pushed the implement-strict-projection branch from 4da4c29 to 2e6dec8 Compare July 17, 2019 20:48
@moulimukherjee moulimukherjee force-pushed the implement-strict-projection branch from 2e6dec8 to f976325 Compare July 18, 2019 01:23
Comment thread api/src/main/java/org/apache/iceberg/expressions/ResidualEvaluator.java Outdated
@moulimukherjee
Contributor Author

@rdblue ptal?


List<UnboundPredicate<?>> strictProjections = Lists.transform(parts,
part -> ((Transform<T, ?>) part.transform()).projectStrict(part.name(), pred));
for (PartitionField part : parts) {
Contributor

I like how you restructured this to simply return alwaysTrue or alwaysFalse if any projection can determine the result. That's a lot simpler than before.
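
A rough sketch of that short-circuit, assuming a single day partition value and modelling each projection as a plain `LongPredicate` over epoch days. This is not the actual `ResidualEvaluator` code: the `Residual` enum and the `residual` method are hypothetical, and a `null` projection stands for a transform that cannot project the predicate.

```java
import java.time.LocalDate;
import java.util.function.LongPredicate;

// Illustrative sketch only; not the real ResidualEvaluator.
class ResidualSketch {

  enum Residual { ALWAYS_TRUE, ALWAYS_FALSE, ORIGINAL_PREDICATE }

  // strict: if it accepts the partition, every row in the partition matches.
  // inclusive: if it rejects the partition, no row in the partition can match.
  // Either projection may be null when the predicate cannot be projected.
  static Residual residual(long partitionDay, LongPredicate strict, LongPredicate inclusive) {
    if (strict != null && strict.test(partitionDay)) {
      return Residual.ALWAYS_TRUE;
    }
    if (inclusive != null && !inclusive.test(partitionDay)) {
      return Residual.ALWAYS_FALSE;
    }
    // Neither projection decides the result, so rows must still be filtered.
    return Residual.ORIGINAL_PREDICATE;
  }

  public static void main(String[] args) {
    // Projections for the row predicate ts > 2019-07-07T15:52:50.
    long boundDay = LocalDate.of(2019, 7, 7).toEpochDay();
    LongPredicate strict = day -> day >= boundDay + 1;
    LongPredicate inclusive = day -> day >= boundDay;

    System.out.println(residual(boundDay + 1, strict, inclusive)); // prints ALWAYS_TRUE
    System.out.println(residual(boundDay - 1, strict, inclusive)); // prints ALWAYS_FALSE
    System.out.println(residual(boundDay, strict, inclusive));     // prints ORIGINAL_PREDICATE
  }
}
```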

Comment thread api/src/main/java/org/apache/iceberg/expressions/ResidualEvaluator.java Outdated
Comment thread api/src/main/java/org/apache/iceberg/transforms/ProjectionUtil.java Outdated
Comment thread api/src/main/java/org/apache/iceberg/transforms/ProjectionUtil.java
Simplifying the LT and GT predicates and updating tests to reflect that
@moulimukherjee
Contributor Author

@rdblue ptal again?

Comment thread api/src/main/java/org/apache/iceberg/transforms/ProjectionUtil.java
Comment thread api/src/test/java/org/apache/iceberg/transforms/TestDatesProjection.java Outdated
Comment thread api/src/test/java/org/apache/iceberg/transforms/TestDatesProjection.java Outdated
result,
((Transform<T, ?>) part.transform()).project(part.name(), pred));
UnboundPredicate<?> inclusiveProjection = ((Transform<T, ?>) part.transform()).project(part.name(), pred);
if (inclusiveProjection != null) {
Contributor Author

added null check

@rdblue rdblue merged commit daf0620 into apache:master Jul 26, 2019
@rdblue
Contributor

rdblue commented Jul 26, 2019

Merged. Thanks for working on this, @moulimukherjee!

@moulimukherjee moulimukherjee deleted the implement-strict-projection branch July 26, 2019 21:14
danielcweeks pushed a commit that referenced this pull request Aug 1, 2019
* Add argument validation to HadoopTables#create (#298)

* Install source JAR when running install target (#310)

* Add projectStrict for Dates and Timestamps (#283)

* Correctly publish artifacts on JitPack (#321)

The Gradle install target produces invalid POM files that are missing
the dependencyManagement section and versions for some dependencies.
Instead, we directly tell JitPack to run the correct Gradle target.

* Add build info to README.md (#304)

* Convert Iceberg time type to Hive string type (#325)

* Add overwrite option to write builders (#318)

* Fix out of order Pig partition fields (#326)

* Add mapping to Iceberg for external name-based schemas (#338)

* Site: Fix broken link to Iceberg API (#333)

* Add forTable method for Avro WriteBuilder (#322)

* Remove multiple literal strings check rule for scala (#335)

* Fix invalid javadoc url in README.md (#336)

* Use UnicodeUtil.truncateString for Truncate transform. (#340)

This truncates by unicode codepoint instead of Java chars.

* Refactor metrics tests for reuse (#331)

* Spark: Add support for write-audit-publish workflows (#342)

* Avoid write failures if metrics mode is invalid (#301)

* Fix truncateStringMax in UnicodeUtil (#334)

Fixes #328, fixes #329.

Index to codePointAt should be the offset calculated by code points

* [Vectorization] Added batch sizing, switched to BufferAllocator, other minor style fixes.
rdblue pushed a commit to rdblue/iceberg that referenced this pull request Aug 7, 2019
rdblue pushed a commit to rdblue/iceberg that referenced this pull request Aug 22, 2019

3 participants