This repository was archived by the owner on Jun 14, 2024. It is now read-only.

Conversation

@andrei-ionescu (Contributor) commented Jan 9, 2021

What is the context for this pull request?

What changes were proposed in this pull request?

This PR adds support for Iceberg.

The following changes are in this PR, each as a separate commit:

Does this PR introduce any user-facing change?

No. The main changes to user-facing APIs are in the #321 PR. Detailed information can be found in the #318 proposal.

How was this patch tested?

  1. Integration test added for the new functionality
  2. Tested locally and on Databricks Runtime:
  • Local build:
sbt publishLocal
  • Run Spark shell with Hyperspace and Iceberg libraries loaded
$ spark-shell \
--driver-memory 4g \
--packages "com.microsoft.hyperspace:hyperspace-core_2.11:0.4.0-SNAPSHOT,org.apache.iceberg:iceberg-spark-runtime:0.10.0" \
--driver-java-options "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5006 -XX:+UseG1GC -Dlog4j.debug=true"
  • Paste the following code
import org.apache.spark.sql._
import com.microsoft.hyperspace._
import com.microsoft.hyperspace.index._
import scala.collection.JavaConverters._
import org.apache.iceberg.PartitionSpec
import org.apache.iceberg.TableProperties
import org.apache.iceberg.spark._
import org.apache.iceberg.hadoop._

val hs = new Hyperspace(spark)

// create Iceberg table
val props = Map(TableProperties.WRITE_NEW_DATA_LOCATION -> "table3").asJava
val sourceDf = Seq((1, "name1"), (2, "name2")).toDF("id", "name")
val schema = SparkSchemaUtil.convert(sourceDf.schema)
val part = PartitionSpec.builderFor(schema).build()
val icebergTable = new HadoopTables().create(schema, part, props, "table3")
sourceDf.write.mode("overwrite").format("iceberg").save("./table3")

// read created table
val iceDf = spark.read.format("iceberg").load("./table3")

// create indexes
hs.createIndex(iceDf, IndexConfig("index_ice0", indexedColumns = Seq("id"), includedColumns = Seq("name")))
hs.createIndex(iceDf, IndexConfig("index_ice1", indexedColumns = Seq("name")))

// verify plans
val query = iceDf.filter(iceDf("id") === 1).select("name")
hs.explain(query, verbose = true)

@andrei-ionescu force-pushed the iceberg branch 2 times, most recently from 2b55d68 to 86c510a on January 11, 2021 at 15:34
@sezruby (Collaborator) left a comment:

Could you add a test file for Iceberg? For example, https://github.com/microsoft/hyperspace/blob/master/src/test/scala/com/microsoft/hyperspace/index/DeltaLakeIntegrationTest.scala, plus an incremental refresh test:
https://github.com/microsoft/hyperspace/pull/301/files#diff-f32a70d0b9c560ff5d6a55595db0f12be911fef2ccd303ec24fe0799c7b31b0eR102

BTW, to reduce the PR size, could you split this PR into:

  1. DataSourceV2 support (LogicalRelation -> LogicalPlan, ExtractIndexSupportedLogicalPlan (see the comment below)) + unit tests for DataSourceV2 if possible; a sketch of this generalization follows below
  2. based on 1), IcebergFileBasedSourceProvider + Iceberg tests

This will help us understand your change better :)
You can keep this PR for reference and open 2 new PRs.
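For context, a hypothetical sketch of the kind of generalization item 1 describes: widening a match that previously accepted only LogicalRelation so that it also accepts DataSourceV2Relation, which is how Iceberg tables appear in a plan. This is not the actual Hyperspace code; it assumes Spark 2.4 APIs and borrows the extractor name from the comment above.

import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation}
import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation

// Hypothetical extractor: matches both V1 file-based relations and V2 relations.
object ExtractIndexSupportedLogicalPlan {
  def unapply(plan: LogicalPlan): Option[LogicalPlan] = plan match {
    case l @ LogicalRelation(_: HadoopFsRelation, _, _, _) => Some(l)
    case v2: DataSourceV2Relation => Some(v2) // Iceberg surfaces as a V2 relation
    case _ => None
  }
}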

@andrei-ionescu force-pushed the iceberg branch 2 times, most recently from ec30343 to 84f7597 on January 11, 2021 at 21:27
@andrei-ionescu (Contributor, Author) commented:

@sezruby I'll keep this PR for the Iceberg-related commits:

  • Add Iceberg support
  • Add support for incremental refresh
  • Add Iceberg integration test

I'll create another PR for the DataSourceV2 changes.

@andrei-ionescu (Contributor, Author) commented:

I'll rebase this PR as soon as #321 gets merged.

// The index should be applied for the updated version.
assert(isIndexUsed(query().queryExecution.optimizedPlan, "iceIndex", true))

// Append data.
@sezruby (Collaborator) commented:

Other than "append", you could remove 1-2 files from the source data.
To delete the source files easily, I used partitioned data in other test cases.
And in any case, we also need to test partitioned source data.

@andrei-ionescu (Contributor, Author) replied:

With Iceberg (version 0.10), I can only remove entire files. If I try to remove just some rows from a file, the delete action fails.
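For reference, a minimal sketch of the whole-file delete that does work, assuming the HadoopTables setup from the repro above (the variable names are illustrative):

import org.apache.hadoop.conf.Configuration
import org.apache.iceberg.hadoop.HadoopTables

val table = new HadoopTables(new Configuration()).load("table3")
// Pick an arbitrary data file from the current snapshot.
val dataFile = table.newScan().planFiles().iterator().next().file()
// Deleting an entire data file succeeds; a delete matching only some
// rows of a file fails under Iceberg 0.10.
table.newDelete().deleteFile(dataFile.path()).commit()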

@sezruby (Collaborator) replied:

Yep, Hyperspace indexes also only support entire-file deletes, not row-level deletes.

Could you add a test for a partitioned Iceberg table & Hybrid Scan append? It's similar to this test, but uses a partitioned df; a sketch of the table setup follows below.
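For reference, a minimal sketch of creating a partitioned Iceberg table for such a test, following the HadoopTables repro above (the identity partition on "name" is an illustrative choice):

import org.apache.iceberg.PartitionSpec
import org.apache.iceberg.hadoop.HadoopTables

// Identity-partition the table on "name" instead of using an unpartitioned spec.
val partitionedSpec = PartitionSpec.builderFor(schema).identity("name").build()
val partitionedTable = new HadoopTables().create(schema, partitionedSpec, props, "table4")
sourceDf.write.mode("overwrite").format("iceberg").save("./table4")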

@andrei-ionescu (Contributor, Author) commented Jan 14, 2021

@sezruby Here are the requested plans with Hybrid Scan enabled:

Right after creation

Project [query#215]
+- Filter (clicks#217 <= 2000)
   +- Relation[Query#215,clicks#217] parquet

After deleting a file

Project [query#237]
+- Filter (clicks#239 <= 2000)
   +- Project [Query#237, clicks#239]
      +- Filter NOT (_data_file_id#246L = 0)
         +- Relation[Query#237,clicks#239,_data_file_id#246L] parquet

After adding some more data

Union
:- Project [query#266]
:  +- Filter (clicks#268 <= 2000)
:     +- Project [Query#266, clicks#268]
:        +- Filter NOT (_data_file_id#275L = 0)
:           +- Relation[Query#266,clicks#268,_data_file_id#275L] parquet
+- Project [query#266]
   +- Filter (clicks#268 <= 2000)
      +- Relation[Query#266,clicks#268] parquet

I think they look as expected.

Just FYI...

IcebergSource read plan w/o index:

Project [query#28]
+- Filter (clicks#30 <= 2000)
   +- RelationV2 iceberg[Date#26, RGUID#27, Query#28, imprs#29, clicks#30] (Options: [path=/private/var/folders/dm/9mytk9kx49s4sf1b3f0cvcs80000gn/T/spark-25d9e8bb-cc56-4cac-b1f5-e2a2...)

@sezruby (Collaborator) commented Jan 15, 2021

@andrei-ionescu
Thanks! It looks good 👍
Could you also share the plans of the join query? (refer to join() in the quick refresh test)
And please try collect() for each filter/join query (with the transformed plan) and compare the results against the same queries run without indexes; a sketch follows below.
(JFYI: after spark.disableHyperspace, or after disabling hybrid scan, you need to redefine the query (e.g. val filter = ...) to generate a new plan.)
We'll add a Hybrid Scan test later to check the plan transformation & result comparison :)
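A minimal sketch of that verification, reusing iceDf from the repro above (enableHyperspace/disableHyperspace come from the com.microsoft.hyperspace._ implicits):

import com.microsoft.hyperspace._

spark.enableHyperspace
val withIndex = iceDf.filter(iceDf("id") === 1).select("name")
  .collect().map(_.getString(0)).sorted

spark.disableHyperspace
// Redefine the query after toggling Hyperspace so a new plan is generated.
val withoutIndex = iceDf.filter(iceDf("id") === 1).select("name")
  .collect().map(_.getString(0)).sorted

assert(withIndex.sameElements(withoutIndex))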

@andrei-ionescu (Contributor, Author) commented:

@sezruby This is the join plan:

Project [c2#206, c4#218]
+- Join Inner, (c2#206 = c2#216)
   :- Union
   :  :- Project [c2#206]
   :  :  +- Filter isnotnull(c2#206)
   :  :     +- Project [c2#206, c4#208]
   :  :        +- Filter NOT (_data_file_id#423L = 0)
   :  :           +- Relation[c2#206,c4#208,_data_file_id#423L] parquet
   :  +- Project [c2#206]
   :     +- Filter isnotnull(c2#206)
   :        +- Relation[c2#206,c4#208] parquet
   +- Union
      :- Project [c2#216, c4#218]
      :  +- Filter isnotnull(c2#216)
      :     +- Project [c2#216, c4#218]
      :        +- Filter NOT (_data_file_id#424L = 0)
      :           +- Relation[c2#216,c4#218,_data_file_id#424L] parquet
      +- Project [c2#216, c4#218]
         +- Filter isnotnull(c2#216)
            +- Relation[c2#216,c4#218] parquet

@sezruby (Collaborator) commented Jan 15, 2021

@andrei-ionescu could you share the sparkPlan?
Make sure to set 'spark.sql.autoBroadcastJoinThreshold' to -1 to see that the shuffle is removed.
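A minimal sketch of that check (df1/df2 and the join condition are illustrative; any of the join queries above would do):

// Disable broadcast joins so the join compiles to SortMergeJoin,
// making it visible whether the shuffle (Exchange) is removed.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
val joined = df1.join(df2, df1("c2") === df2("c2")).select(df1("c2"), df2("c4"))
println(joined.queryExecution.sparkPlan)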

@andrei-ionescu (Contributor, Author) commented Jan 15, 2021

@sezruby The Spark plan:

Project [c2#206, c4#218]
+- SortMergeJoin [c2#206], [c2#216], Inner
   :- Union
   :  :- Project [c2#206]
   :  :  +- Filter (NOT (_data_file_id#687L = 0) && isnotnull(c2#206))
   :  :     +- FileScan parquet [c2#206,_data_file_id#687L] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/Users/aionescu/github/hyperspace/src/test/resources/icebergIntegrationTes..., PartitionFilters: [], PushedFilters: [Not(EqualTo(_data_file_id,0)), IsNotNull(c2)], ReadSchema: struct<c2:string,_data_file_id:bigint>
   :  +- Project [c2#206]
   :     +- Filter isnotnull(c2#206)
   :        +- FileScan parquet [c2#206] Batched: true, Format: Parquet, Location: InMemoryFileIndex[/private/var/folders/dm/9mytk9kx49s4sf1b3f0cvcs80000gn/T/spark-6ef5741e-dd25-49..., PartitionFilters: [], PushedFilters: [IsNotNull(c2)], ReadSchema: struct<c2:string>
   +- Union
      :- Project [c2#216, c4#218]
      :  +- Filter (NOT (_data_file_id#688L = 0) && isnotnull(c2#216))
      :     +- FileScan parquet [c2#216,c4#218,_data_file_id#688L] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/Users/aionescu/github/hyperspace/src/test/resources/icebergIntegrationTes..., PartitionFilters: [], PushedFilters: [Not(EqualTo(_data_file_id,0)), IsNotNull(c2)], ReadSchema: struct<c2:string,c4:int,_data_file_id:bigint>
      +- Project [c2#216, c4#218]
         +- Filter isnotnull(c2#216)
            +- FileScan parquet [c2#216,c4#218] Batched: true, Format: Parquet, Location: InMemoryFileIndex[/private/var/folders/dm/9mytk9kx49s4sf1b3f0cvcs80000gn/T/spark-6ef5741e-dd25-49..., PartitionFilters: [], PushedFilters: [IsNotNull(c2)], ReadSchema: struct<c2:string,c4:int>

@sezruby (Collaborator) commented Jan 15, 2021

@andrei-ionescu It seems the bucketSpec is not applied properly:

  • Union should be BucketUnion
  • the relation should have bucketing information, e.g. SelectedBucketsCount: 200 out of 200
  • a Sort should exist between SortMergeJoin and BucketUnion

Could you investigate the cause? We could check the join plan with an index, but without hybrid scan, first.
Thanks!

@andrei-ionescu (Contributor, Author) commented:

@sezruby I found yet another place where I had missed adding the DataSourceV2Relation pattern matching.

Here are the optimizedPlan and the sparkPlan:

Project [c2#206, c4#218]
+- Join Inner, (c2#206 = c2#216)
   :- BucketUnion 200 buckets, bucket columns: [c2]
   :  :- Project [c2#206]
   :  :  +- Filter isnotnull(c2#206)
   :  :     +- Project [c2#206, c4#208]
   :  :        +- Filter NOT (_data_file_id#423L = 0)
   :  :           +- Relation[c2#206,c4#208,_data_file_id#423L] parquet
   :  +- RepartitionByExpression [c2#206], 200
   :     +- Project [c2#206]
   :        +- Filter isnotnull(c2#206)
   :           +- Relation[c2#206,c4#208] parquet
   +- BucketUnion 200 buckets, bucket columns: [c2]
      :- Project [c2#216, c4#218]
      :  +- Filter isnotnull(c2#216)
      :     +- Project [c2#216, c4#218]
      :        +- Filter NOT (_data_file_id#424L = 0)
      :           +- Relation[c2#216,c4#218,_data_file_id#424L] parquet
      +- RepartitionByExpression [c2#216], 200
         +- Project [c2#216, c4#218]
            +- Filter isnotnull(c2#216)
               +- Relation[c2#216,c4#218] parquet
Project [c2#206, c4#218]
+- SortMergeJoin [c2#206], [c2#216], Inner
   :- BucketUnion 200 buckets, bucket columns: [c2]
   :  :- Project [c2#206]
   :  :  +- Filter (NOT (_data_file_id#553L = 0) && isnotnull(c2#206))
   :  :     +- FileScan parquet [c2#206,_data_file_id#553L] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/Users/aionescu/github/hyperspace/src/test/resources/icebergIntegrationTes..., PartitionFilters: [], PushedFilters: [Not(EqualTo(_data_file_id,0)), IsNotNull(c2)], ReadSchema: struct<c2:string,_data_file_id:bigint>, SelectedBucketsCount: 200 out of 200
   :  +- Exchange hashpartitioning(c2#206, 200)
   :     +- Project [c2#206]
   :        +- Filter isnotnull(c2#206)
   :           +- FileScan parquet [c2#206] Batched: true, Format: Parquet, Location: InMemoryFileIndex[/private/var/folders/dm/9mytk9kx49s4sf1b3f0cvcs80000gn/T/spark-1c3acd51-9f16-42..., PartitionFilters: [], PushedFilters: [IsNotNull(c2)], ReadSchema: struct<c2:string>
   +- BucketUnion 200 buckets, bucket columns: [c2]
      :- Project [c2#216, c4#218]
      :  +- Filter (NOT (_data_file_id#554L = 0) && isnotnull(c2#216))
      :     +- FileScan parquet [c2#216,c4#218,_data_file_id#554L] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/Users/aionescu/github/hyperspace/src/test/resources/icebergIntegrationTes..., PartitionFilters: [], PushedFilters: [Not(EqualTo(_data_file_id,0)), IsNotNull(c2)], ReadSchema: struct<c2:string,c4:int,_data_file_id:bigint>, SelectedBucketsCount: 200 out of 200
      +- Exchange hashpartitioning(c2#216, 200)
         +- Project [c2#216, c4#218]
            +- Filter isnotnull(c2#216)
               +- FileScan parquet [c2#216,c4#218] Batched: true, Format: Parquet, Location: InMemoryFileIndex[/private/var/folders/dm/9mytk9kx49s4sf1b3f0cvcs80000gn/T/spark-1c3acd51-9f16-42..., PartitionFilters: [], PushedFilters: [IsNotNull(c2)], ReadSchema: struct<c2:string,c4:int>

@sezruby (Collaborator) commented Jan 16, 2021

@andrei-ionescu Thanks! There's still a missing 'Sort' node.

Could you check this?
Thanks a lot!

@andrei-ionescu (Contributor, Author) commented:

@sezruby I compared with the Delta output and I don't see any difference. Here are the optimizedPlan and sparkPlan output of the Delta test:

Project [c2#3663, c4#3675]
+- Join Inner, (c2#3663 = c2#3673)
   :- BucketUnion 200 buckets, bucket columns: [c2]
   :  :- Project [c2#3663]
   :  :  +- Filter isnotnull(c2#3663)
   :  :     +- Relation[c2#3663,c4#3665] parquet
   :  +- RepartitionByExpression [c2#3663], 200
   :     +- Project [c2#3663]
   :        +- Filter isnotnull(c2#3663)
   :           +- Relation[c2#3663,c4#3665] parquet
   +- BucketUnion 200 buckets, bucket columns: [c2]
      :- Project [c2#3673, c4#3675]
      :  +- Filter isnotnull(c2#3673)
      :     +- Relation[c2#3673,c4#3675] parquet
      +- RepartitionByExpression [c2#3673], 200
         +- Project [c2#3673, c4#3675]
            +- Filter isnotnull(c2#3673)
               +- Relation[c2#3673,c4#3675] parquet
Project [c2#3663, c4#3675]
+- SortMergeJoin [c2#3663], [c2#3673], Inner
   :- BucketUnion 200 buckets, bucket columns: [c2]
   :  :- Project [c2#3663]
   :  :  +- Filter isnotnull(c2#3663)
   :  :     +- FileScan parquet [c2#3663] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/Users/aionescu/github/hyperspace/src/test/resources/deltaLakeIntegrationT..., PartitionFilters: [], PushedFilters: [IsNotNull(c2)], ReadSchema: struct<c2:string>, SelectedBucketsCount: 200 out of 200
   :  +- Exchange hashpartitioning(c2#3663, 200)
   :     +- Project [c2#3663]
   :        +- Filter isnotnull(c2#3663)
   :           +- FileScan parquet [c2#3663] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/private/var/folders/dm/9mytk9kx49s4sf1b3f0cvcs80000gn/T/spark-c30f7f12-4a..., PartitionFilters: [], PushedFilters: [IsNotNull(c2)], ReadSchema: struct<c2:string>
   +- BucketUnion 200 buckets, bucket columns: [c2]
      :- Project [c2#3673, c4#3675]
      :  +- Filter isnotnull(c2#3673)
      :     +- FileScan parquet [c2#3673,c4#3675] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/Users/aionescu/github/hyperspace/src/test/resources/deltaLakeIntegrationT..., PartitionFilters: [], PushedFilters: [IsNotNull(c2)], ReadSchema: struct<c2:string,c4:int>, SelectedBucketsCount: 200 out of 200
      +- Exchange hashpartitioning(c2#3673, 200)
         +- Project [c2#3673, c4#3675]
            +- Filter isnotnull(c2#3673)
               +- FileScan parquet [c2#3673,c4#3675] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/private/var/folders/dm/9mytk9kx49s4sf1b3f0cvcs80000gn/T/spark-c30f7f12-4a..., PartitionFilters: [], PushedFilters: [IsNotNull(c2)], ReadSchema: struct<c2:string,c4:int>

If something is missing here, then it is missing from Delta too.

BTW, there is a SortMergeJoin node in the sparkPlan.

@sezruby (Collaborator) commented Jan 18, 2021

From Delta Lake hybrid scan test:

Project [clicks#1983, query#1981, Date#1989]
+- SortMergeJoin [clicks#1983], [clicks#1993], Inner
   :- Sort [clicks#1983 ASC NULLS FIRST], false, 0
   :  +- Filter isnotnull(clicks#1983)
   :     +- BucketUnion 200 buckets, bucket columns: [clicks]
   :        :- Project [clicks#1983, query#1981]
   :        :  +- Filter ((isnotnull(clicks#1983) && (clicks#1983 >= 2000)) && (clicks#1983 <= 4000))
   :        :     +- FileScan parquet [clicks#1983,Query#1981] Batched: false, Format: Parquet, Location: InMemoryFileIndex[file:/path/to/src/test/resources/hybridScanTest/index..., PartitionFilters: [], PushedFilters: [IsNotNull(clicks), GreaterThanOrEqual(clicks,2000), LessThanOrEqual(clicks,4000)], ReadSchema: struct<clicks:int,Query:string>, SelectedBucketsCount: 200 out of 200
   :        +- Exchange hashpartitioning(clicks#1983, 200)
   :           +- Project [clicks#1983, query#1981]
   :              +- Filter ((isnotnull(clicks#1983) && (clicks#1983 >= 2000)) && (clicks#1983 <= 4000))
   :                 +- FileScan parquet [Query#1981,clicks#1983,RGUID#1980] Batched: false, Format: Parquet, Location: InMemoryFileIndex[file:/path/to/AppData/Local/Temp/spark-60a4469f-edc0-4286-9944-2769bbd..., PartitionCount: 1, PartitionFilters: [], PushedFilters: [IsNotNull(clicks), GreaterThanOrEqual(clicks,2000), LessThanOrEqual(clicks,4000)], ReadSchema: struct<Query:string,clicks:int>
   +- Sort [clicks#1993 ASC NULLS FIRST], false, 0
      +- Filter isnotnull(clicks#1993)
         +- BucketUnion 200 buckets, bucket columns: [clicks]
            :- Project [clicks#1993, Date#1989]
            :  +- Filter ((isnotnull(clicks#1993) && (clicks#1993 <= 4000)) && (clicks#1993 >= 2000))
            :     +- FileScan parquet [clicks#1993,Date#1989] Batched: false, Format: Parquet, Location: InMemoryFileIndex[file:/C:/path/to/src/test/resources/hybridScanTest/index..., PartitionFilters: [], PushedFilters: [IsNotNull(clicks), LessThanOrEqual(clicks,4000), GreaterThanOrEqual(clicks,2000)], ReadSchema: struct<clicks:int,Date:string>, SelectedBucketsCount: 200 out of 200
            +- Exchange hashpartitioning(clicks#1993, 200)
               +- Project [clicks#1993, Date#1989]
                  +- Filter ((isnotnull(clicks#1993) && (clicks#1993 <= 4000)) && (clicks#1993 >= 2000))
                     +- FileScan parquet [Date#1989,clicks#1993,RGUID#1990] Batched: false, Format: Parquet, Location: InMemoryFileIndex[file:/path/to/AppData/Local/Temp/spark-60a4469f-edc0-4286-9944-2769bbd..., PartitionCount: 1, PartitionFilters: [], PushedFilters: [IsNotNull(clicks), LessThanOrEqual(clicks,4000), GreaterThanOrEqual(clicks,2000)], ReadSchema: struct<Date:string,clicks:int>

I think there might be no overlapping buckets between the appended data and the original source data in your test case.
Could you check it again by appending the same data? Thanks!
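A minimal sketch of the suggested re-check, assuming the source df from the Iceberg repro above; appending identical rows guarantees overlap between the join keys of the appended data and of the original indexed data:

// Append the exact same rows that were originally written.
sourceDf.write.mode("append").format("iceberg").save("./table3")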

@andrei-ionescu (Contributor, Author) commented Jan 18, 2021

@sezruby,

The results (optimizedPlan & sparkPlan) I pasted above are both from the "Verify JoinIndexRule utilizes indexes correctly after quick refresh when some file gets deleted and some appended to source data." tests, in the Delta (DeltaLakeIntegrationTest.scala#L146) and Iceberg (IcebergIntegrationTest.scala#L162) suites respectively. They have the same output, so in this respect the test behaves the same in both cases.

Regarding your question, we do add duplicate data here: https://github.com/microsoft/hyperspace/pull/320/files#diff-ce1f32f296e1683385beb0fe1954b154710c0ba0120f028167afbe5953347dd3R186-R192.

I'm not sure where the output you pasted comes from. Can you point me to the test that produces it? I want to run the same test on Iceberg, then debug and compare the differences.

Thank you.

@sezruby (Collaborator) commented Jan 18, 2021

@andrei-ionescu

This is the test:

// code in HybridScanSuite.scala
test(
  "Append-only: join rule, appended data should be shuffled with indexed columns " +
    "and merged by BucketUnion")

Then there might be a problem in quick refresh?
Could you check hybrid scan first? In the join + quick refresh Iceberg test, you could test it by enabling hybrid scan instead of refreshing:

// Instead of refreshing the index:
//   hyperspace.refreshIndex(indexConfig.indexName, REFRESH_MODE_QUICK)
// enable hybrid scan:
withSQLConf(TestConfig.HybridScanEnabled: _*) {
  // ... run the join query and verify the plan ...
}
@andrei-ionescu (Contributor, Author) commented Jan 18, 2021

@sezruby The suggested test is in a PR that changes the Hybrid Scan logic. I can take that test, add it to my PR, and print the output for both Delta and Iceberg, but I will not pull that PR's Hybrid Scan changes into this one.

@andrei-ionescu (Contributor, Author) commented Jan 19, 2021

@sezruby I merged your hybridtest_refactoring branch, which contains the #274 changes, into my local development branch, ran all the tests (sbt +test), and they passed. This means that the DataSourceV2 PR #321 does not change any existing Hyperspace functionality.

I tried to replicate HybridScanForDeltaLakeTest into an Iceberg test (HybridScanForIcebergTest), but there is a lot of work to be done, as Iceberg has a different way of getting the appended and deleted files. I need to understand the test better and use it as a reference for Iceberg, and it will require more changes in some test-related areas.

Taking into account the following:

  1. No changes to the current implementation
  2. No changes to the Hybrid Scan test refactor branch
  3. Hybrid Scan for Iceberg is tightly linked to the Iceberg implementation

I would suggest merging the #321 and #274 PRs, after which I can fruitfully work on bringing this Iceberg implementation on par with the Delta one.

Or, merge the #274 PR first; I'll rebase my #321 and keep the tests to validate that nothing is broken by my DataSourceV2 support addition.

@sezruby what do you think?

@sezruby (Collaborator) commented Jan 19, 2021

@andrei-ionescu I'm okay with either way. BTW we need @imback82's review to merge the changes :)
Please understand any delay in our review... 🙏

Thanks for the great work!

@imback82 (Contributor) commented:

Sorry for the delay. I will get to #274 soon.

@andrei-ionescu force-pushed the iceberg branch 3 times, most recently from 4fbd068 to e68fc57 on January 22, 2021 at 14:44
@sezruby (Collaborator) commented Jan 24, 2021

@andrei-ionescu
It seems the "missing Sort node" is because it's the sparkPlan, not the executedPlan.
Sorry, I wasn't aware of the difference 😅 Could you check the executedPlan again? Thanks!
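For context, a minimal illustration of the difference for any query df: executedPlan is the sparkPlan after the physical preparation rules (e.g. EnsureRequirements) have run, and it is those rules that insert the Sort/Exchange nodes under SortMergeJoin.

// sparkPlan: the physical plan before preparation rules run.
println(df.queryExecution.sparkPlan)
// executedPlan: after EnsureRequirements etc.; the Sort nodes appear here.
println(df.queryExecution.executedPlan)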

@andrei-ionescu (Contributor, Author) commented:

Closing this PR in favor of the new work from @imback82 in PR #355. I created a new PR covering only the Iceberg table format: #358.

@andrei-ionescu andrei-ionescu deleted the iceberg branch February 22, 2021 20:42


Successfully merging this pull request may close these issues:

[PROPOSAL]: Support Iceberg table format
[FEATURE REQUEST]: Add support for Iceberg table format
