-
Notifications
You must be signed in to change notification settings - Fork 268
feat: Support basic Delta scans #3035
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| reqwest = { version = "0.12", default-features = false, features = ["rustls-tls-native-roots", "http2"] } | ||
| object_store_opendal = {version = "0.55.0", optional = true} | ||
| hdfs-sys = {version = "0.3", optional = true, features = ["hdfs_3_3"]} | ||
| hdfs-sys = {version = "0.3", optional = true, features = ["hdfs_3_3", "vendored"]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With opendal enabled by default in #2929, I was getting failures finding libhdfs.so running unit tests on Linux. The hdrs dependency below for macos was purely activating the vendored feature of hdfs-sys, so I just enabled that globally here so you don't need libhdfs available for developing on Linux either
| <dependency> | ||
| <groupId>com.google.guava</groupId> | ||
| <artifactId>failureaccess</artifactId> | ||
| <version>1.0.3</version> | ||
| <scope>test</scope> | ||
| </dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this I'm getting an error creating the Delta tables:
Cause: java.lang.ClassNotFoundException: com.google.common.util.concurrent.internal.InternalFutureFailureAccess
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
at java.base/java.lang.ClassLoader.defineClass1(Native Method)
at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
...
Not sure why, could use some help figuring out what is missing with Guava from where
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #3035 +/- ##
============================================
+ Coverage 56.12% 59.60% +3.47%
- Complexity 976 1379 +403
============================================
Files 119 168 +49
Lines 11743 15534 +3791
Branches 2251 2579 +328
============================================
+ Hits 6591 9259 +2668
- Misses 4012 4975 +963
- Partials 1140 1300 +160 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Thanks, @Kimahriman. Please also add content to the documentation (either the user guide or the contributor guide) explaining this new feature. |
|
Added a snippet to the user guide. There's not much for the contributor guide since this doesn't start the process of trying to use the delta rust libraries. Still trying to figure out if/how to integrate those |
Which issue does this PR close?
Related to #174, not full support so probably should keep that open (or open new tickets specifically for column mapping and deletion vectors)
Rationale for this change
Delta scans are simply built-in Parquet scans when column mapping and deletion vectors are not used, so the existing Comet parquet scans work fine for them. This adds support for these cases in the scan rule and adds some unit tests that could be used for future integration of delta-rs or delta-kernel-rs for full scan support.
What changes are included in this PR?
Updates the
CometScanRuleto use Comet parquet scans for Delta V1 scans of tables that don't use column mapping or deletion vectors. Adds a newDeltaReflectionclass for accessing the Delta properties to determine the enabled features.How are these changes tested?
Adds unit tests for different Delta scans and adds Delta as a test dependency.