AWS Glue Catalog for Iceberg ingest extension #17392

Merged
a2l007 merged 31 commits into apache:master from Shekharrajak:feature-gluecatalog on Nov 11, 2024
@Shekharrajak
Contributor

@Shekharrajak Shekharrajak commented Oct 22, 2024

Fixes #17352.

Description

Release note


Key changed/added classes in this PR
  • GlueIcebergCatalog

Note: Integration testing needs a separate discussion and separate changes.

private Catalog setupGlueCatalog() {
  catalog = new GlueCatalog();
  catalogProperties.put(CatalogProperties.WAREHOUSE_LOCATION, warehousePath);
  catalog.initialize(CATALOG_NAME, catalogProperties);
Contributor Author

The catalog properties must include these key-value pairs:

    "type": "glue",
    "catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
    "io-impl": "org.apache.iceberg.aws.s3.S3FileIO",

Contributor Author

The warehouse path must be of the form s3://bucket/path.
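
The two requirements above (the three key-value pairs and the S3 warehouse path) could be checked before the catalog is initialized. The following is a minimal sketch under stated assumptions: the class `GlueCatalogPropertiesCheck` and its methods are hypothetical and are not part of Druid or Iceberg, and the key `warehouse` is assumed to be the string behind `CatalogProperties.WAREHOUSE_LOCATION`. It reflects the requirements as stated at this point in the review, which may differ from the merged code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Druid or Iceberg.
class GlueCatalogPropertiesCheck
{
  // Assumed to match the string value of CatalogProperties.WAREHOUSE_LOCATION.
  static final String WAREHOUSE_KEY = "warehouse";

  // The key-value pairs listed in the review comment.
  static Map<String, String> requiredProperties()
  {
    Map<String, String> required = new LinkedHashMap<>();
    required.put("type", "glue");
    required.put("catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog");
    required.put("io-impl", "org.apache.iceberg.aws.s3.S3FileIO");
    return required;
  }

  // Returns one human-readable message per violated requirement (empty if none).
  static List<String> validate(Map<String, String> props)
  {
    List<String> problems = new ArrayList<>();
    for (Map.Entry<String, String> e : requiredProperties().entrySet()) {
      if (!e.getValue().equals(props.get(e.getKey()))) {
        problems.add(e.getKey() + " must be " + e.getValue());
      }
    }
    String warehouse = props.get(WAREHOUSE_KEY);
    if (warehouse == null
        || !warehouse.startsWith("s3://")
        || warehouse.length() == "s3://".length()) {
      problems.add(WAREHOUSE_KEY + " must look like s3://bucket/path");
    }
    return problems;
  }

  public static void main(String[] args)
  {
    Map<String, String> props = new LinkedHashMap<>(requiredProperties());
    props.put(WAREHOUSE_KEY, "s3://bucket/path");
    System.out.println("Problems: " + validate(props));
  }
}
```

A check like this only catches misconfiguration early; it does not confirm that the bucket exists or that Glue is reachable.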

Contributor Author

AWS-related environment variables must be available wherever the Druid cluster is running.
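
The comment does not say which variables are meant. As a rough sketch, assuming the conventional AWS SDK names `AWS_REGION`, `AWS_ACCESS_KEY_ID`, and `AWS_SECRET_ACCESS_KEY` (an assumption, not something this PR specifies), a process could report which of them are unset. The class `AwsEnvCheck` is hypothetical. Credentials can also come from other sources such as instance profiles, so a missing variable is not necessarily an error.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Druid.
class AwsEnvCheck
{
  // Assumed variable names; the review comment does not list them.
  static final List<String> ASSUMED_VARIABLES =
      Arrays.asList("AWS_REGION", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY");

  // Returns the assumed variables that are absent or empty in the given environment.
  static List<String> missing(Map<String, String> env)
  {
    List<String> result = new ArrayList<>();
    for (String name : ASSUMED_VARIABLES) {
      String value = env.get(name);
      if (value == null || value.isEmpty()) {
        result.add(name);
      }
    }
    return result;
  }

  public static void main(String[] args)
  {
    // Checks the environment of the current JVM process.
    System.out.println("Missing: " + missing(System.getenv()));
  }
}
```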

Contributor

> AWS-related environment variables must be available wherever the Druid cluster is running.

Could we add more information related to this in the docs specific to the glue catalog?

Contributor Author

Yes, I will do that. I recently found that the Iceberg API itself offers a simpler way to choose the catalog. I am spending some time checking whether that would make this much more modular and let it work on the fly with every catalog Iceberg supports.

@Shekharrajak
Contributor Author

While testing, I hit this error:

Invalid value for the field [inputSource]. Reason: [Please make sure to load all the necessary extensions and jars with type 'iceberg' on 'druid/broker' service. Could not resolve type id 'iceberg' as a subtype of `org.apache.druid.data.input.InputSource` known type ids = [combining, hdfs, http, inline, local, nil, sql] at [Source: (String)"{"type":"iceberg","tableName": "

Please let me know if anyone has faced a similar error. It appears to come from Druid not being able to find IcebergInputSource from the Iceberg extension as a subtype of InputSource.

@a2l007
Contributor

a2l007 commented Oct 23, 2024

@shekhar-rajak Thank you for working on this!
Please add the extension to the broker load list, which should fix the error described.

@Shekharrajak
Contributor Author

> Please add the extension to the broker load list, which should fix the error described.

Thanks! I found that common.runtime.properties already contained a druid.extensions.loadList entry, and it was overriding the line I had added:

druid.extensions.loadList=["druid-iceberg-extensions"]

After adding the extension to the existing list, I am able to run it.
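
The override described above is consistent with how `java.util.Properties` treats a key that appears twice in one file: the later definition replaces the earlier one. The sketch below demonstrates that behavior in isolation. Whether Druid resolves `common.runtime.properties` in exactly this way is an assumption here, the class `LoadListOverrideSketch` is hypothetical, and the extension names other than `druid-iceberg-extensions` are illustrative placeholders.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demonstration class; not part of Druid.
class LoadListOverrideSketch
{
  // Parses properties-file text and returns the effective loadList value.
  static String effectiveLoadList(String fileContents)
  {
    Properties props = new Properties();
    try {
      props.load(new StringReader(fileContents));
    }
    catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props.getProperty("druid.extensions.loadList");
  }

  public static void main(String[] args)
  {
    // Two definitions of the same key: the second one silently wins.
    String twoDefinitions =
        "druid.extensions.loadList=[\"druid-iceberg-extensions\"]\n"
        + "druid.extensions.loadList=[\"druid-hdfs-storage\"]\n";
    System.out.println(effectiveLoadList(twoDefinitions));

    // One merged definition keeps both extensions.
    String merged =
        "druid.extensions.loadList=[\"druid-hdfs-storage\", \"druid-iceberg-extensions\"]\n";
    System.out.println(effectiveLoadList(merged));
  }
}
```

Keeping a single `druid.extensions.loadList` entry per file avoids the ambiguity altogether.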

@Shekharrajak
Contributor Author

I realized that the lib folder was not getting the jars from druid-iceberg-extension/lib, which are needed at runtime. Once I copied those jars, GlueCatalog was detected and I was able to load the Iceberg table.

@Shekharrajak
Contributor Author

Shekharrajak commented Oct 25, 2024

We need integration tests for the Glue catalog. That needs a separate discussion and a separate test pipeline.

<version>${iceberg.core.version}</version>
</dependency>
<!-- GlueCatalog class-->
<dependency>
Comment thread extensions-contrib/druid-iceberg-extensions/pom.xml
Comment thread extensions-contrib/druid-iceberg-extensions/pom.xml Outdated
@a2l007
Contributor

a2l007 commented Oct 29, 2024

@shekhar-rajak Catalog changes look good to me.
Do you mind adding some docs in https://github.com/apache/druid/blob/master/docs/ingestion/input-sources.md as well? Also please review and fix the CI failures.

@Shekharrajak
Contributor Author

Updated the docs and the PR as per the review comments.

Comment thread pom.xml Outdated
Contributor

@a2l007 a2l007 left a comment

LGTM, Thanks for the contribution @shekhar-rajak !

@a2l007 a2l007 merged commit ae049a4 into apache:master Nov 11, 2024
jtuglu1 pushed a commit to jtuglu1/druid that referenced this pull request Nov 20, 2024
* iceberg glue catalog dependencies added

* GlueIcebergCatalog added in druid module

* default version of iceberg glue catalog implementation - basics

* basic tests added

* removed dependecy iceberg-aws-bundle

* glue catalog support - docs update for iceberg

* Update IcebergDruidModule.java

* Update IcebergDruidModule.java

* updates in dependencies and warehousePath must be under catalogProp

* removed some dependencies - which not required

* only glue sdk added

* update license

* avro exclusion removed

* doc update

* doc update

* set the type to glue

* minor change

* minor change

* fixing codestyle

* checkstyle fixes

* checkstyle fixes

* checkstyle fixes

* dependency check fixes

* update pom for ignore warning for glue catalog

* compile scope needed - iceberg-aws and awssdk

* updates pom with comment

* minor change

* mvn dependency check in iceberg extension

* revert pom.xml changes

* aws sdk sts and s3 for gluecatalog initialize

* dependency check - ignore aws sdk s3 and sts

---------

Co-authored-by: SHEKHAR PRASAD RAJAK <shekhar_rajak@apple.com>
@adarshsanjeev adarshsanjeev added this to the 32.0.0 milestone Jan 16, 2025
@Shekharrajak Shekharrajak deleted the feature-gluecatalog branch March 10, 2025 08:21