Description
Apache Iceberg version
1.7.2
Query engine
Spark
Please describe the bug 🐞
When reading a large Iceberg table from S3 using S3FileIO with S3 Access Grants enabled, Spark jobs intermittently fail with a NullPointerException inside the AWS SDK v2 AttributeMap$Builder.resolveValue, called from S3AccessGrantsIdentityProvider.resolveIdentity.
This only appears under high concurrency / large datasets (e.g., spark.read.table(...).count() over many files). Smaller tables or lower parallelism may run successfully, but increasing parallelism makes the failure reproducible.
The error message from the AWS SDK is:
Encountered a null value when resolving configuration attributes. This is commonly caused by concurrent modifications to non-thread-safe types. Ensure you're synchronizing access to all non-thread-safe types.
From the Iceberg side we are using S3FileIO with S3 Access Grants configured according to the docs, and the S3 client is built via S3Client.builder() with S3FileIOProperties.applyS3AccessGrantsConfigurations(...) (or equivalent).
java.lang.NullPointerException: Cannot invoke "software.amazon.awssdk.utils.AttributeMap$Value.get(software.amazon.awssdk.utils.AttributeMap$LazyValueSource)" because "value" is null
at software.amazon.awssdk.utils.AttributeMap$Builder.resolveValue(AttributeMap.java:396)
at software.amazon.awssdk.utils.AttributeMap$Builder.buildResolvedMap(AttributeMap.java:371)
at software.amazon.awssdk.utils.AttributeMap$Builder.build(AttributeMap.java:358)
...
at software.amazon.awssdk.s3accessgrants.plugin.S3AccessGrantsIdentityProvider.resolveIdentity(S3AccessGrantsIdentityProvider.java:...)
...
at software.amazon.awssdk.services.s3.S3Client.getObject(S3Client.java:...)
...
at org.apache.iceberg.io.ResolvingFileIO.newInputFile(ResolvingFileIO.java:...)
at org.apache.iceberg.io.FileIO.newInputFile(FileIO.java:...)
...
at org.apache.iceberg.spark.source.BaseDataReader.next(BaseDataReader.java:...)
at org.apache.iceberg.spark.source.SparkBatchScan$$anon$1.next(SparkBatchScan.scala:...)
...
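For triage, this failure can be told apart from unrelated NPEs by matching the top frame of the trace above. The following is a minimal sketch under stated assumptions: `AttributeMapNpeMatcher` is a hypothetical helper name (not part of Iceberg, Spark, or the AWS SDK), and the only facts it relies on are the class and method names that appear in the trace.

```java
/**
 * Hypothetical triage helper: reports whether a throwable (or one of its causes)
 * is an NPE raised from AttributeMap$Builder.resolveValue, as in the trace above.
 */
final class AttributeMapNpeMatcher {
  // Frame names copied from the stack trace in this report.
  private static final String FRAME_CLASS = "software.amazon.awssdk.utils.AttributeMap$Builder";
  private static final String FRAME_METHOD = "resolveValue";
  private static final int MAX_CAUSE_DEPTH = 32; // guards against cyclic cause chains

  private AttributeMapNpeMatcher() {}

  static boolean matches(Throwable error) {
    int depth = 0;
    for (Throwable t = error; t != null && depth < MAX_CAUSE_DEPTH; t = t.getCause(), depth++) {
      if (!(t instanceof NullPointerException)) {
        continue;
      }
      for (StackTraceElement frame : t.getStackTrace()) {
        if (FRAME_CLASS.equals(frame.getClassName())
            && FRAME_METHOD.equals(frame.getMethodName())) {
          return true;
        }
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Simulate the reported failure by attaching the relevant frame to an NPE.
    NullPointerException npe = new NullPointerException("value is null");
    npe.setStackTrace(new StackTraceElement[] {
      new StackTraceElement(FRAME_CLASS, FRAME_METHOD, "AttributeMap.java", 396)
    });
    Throwable wrapped = new RuntimeException("task failed", npe);

    System.out.println("wrapped SDK NPE matches: " + matches(wrapped));
    System.out.println("unrelated NPE matches: " + matches(new NullPointerException()));
  }
}
```

A matcher like this could be used to count occurrences per job run when comparing version combinations, instead of grepping executor logs by hand.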
We have already tried the following combinations, and the NPE persists in all of them:
Iceberg versions
Tried 1.7.2 and 1.10.0 → NPE persists in both.
AWS SDK v2 versions
Tried 2.24.6, 2.30.31, and 2.32.1 → NPE persists across all.
S3 Access Grants plugin versions
Tried 2.0.2 and 2.3.0 → NPE persists across both.
Spark / JDK combinations
Spark 3.5.6 with JDK17 and Spark 4.0.1 (JDK21 inside image) → same NPE in both.
Parallelism tuning
Reduced spark.sql.shuffle.partitions and spark.default.parallelism → changes the failure frequency but does not reliably eliminate the NPE on large tables.
Could you please help me understand the following:
1. Known issue?
Are you aware of any known concurrency problems between Iceberg’s S3FileIO S3 Access Grants integration and AWS SDK v2 / aws-s3-accessgrants-java-plugin that could cause AttributeMap$Builder.resolveValue to throw an NPE under high Spark parallelism?
2. Recommended version matrix?
Is there a recommended or validated combination of:
Iceberg version
AWS SDK v2 version
aws-s3-accessgrants-java-plugin version
for running S3 Access Grants with S3FileIO in a high‑concurrency Spark environment?
3. Client factory / configuration guidance?
From Iceberg’s side, is there any specific guidance on how the S3 client factory should be implemented (or additional S3FileIO / S3AG configuration) to avoid shared, non‑thread‑safe state that might trigger this NPE?
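To make the question concrete, one pattern that avoids exposing a mutable builder or a half-built client to multiple threads is to build the client once and publish only the finished instance. The sketch below is a minimal illustration under assumptions: `SharedClientHolder` is a hypothetical name (not an Iceberg or AWS SDK class), and a plain `Object` stands in for the S3 client. It may well not address this NPE, since the trace shows the failure during per-request identity resolution rather than during client construction.

```java
import java.util.Objects;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/**
 * Hypothetical helper: invokes the factory at most once and hands the same
 * fully built instance to every caller (double-checked locking).
 */
final class SharedClientHolder<T> implements Supplier<T> {
  private final Supplier<T> factory;
  private volatile T client; // volatile so the published instance is fully visible

  SharedClientHolder(Supplier<T> factory) {
    this.factory = Objects.requireNonNull(factory, "factory");
  }

  @Override
  public T get() {
    T local = client;
    if (local == null) {
      synchronized (this) {
        local = client;
        if (local == null) {
          // Only one thread ever touches the (possibly non-thread-safe) builder path.
          local = Objects.requireNonNull(factory.get(), "factory returned null");
          client = local;
        }
      }
    }
    return local;
  }

  public static void main(String[] args) throws Exception {
    AtomicInteger builds = new AtomicInteger();
    // Stand-in for a real client build; the real builder call is not shown here.
    SharedClientHolder<Object> holder = new SharedClientHolder<>(() -> {
      builds.incrementAndGet();
      return new Object();
    });

    int threads = 32;
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    CountDownLatch start = new CountDownLatch(1);
    for (int i = 0; i < threads; i++) {
      pool.submit(() -> {
        start.await(); // release all tasks at once to maximize contention
        return holder.get();
      });
    }
    start.countDown();
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);
    System.out.println("factory invocations: " + builds.get());
  }
}
```

If the race is inside the plugin's request path, a holder like this would not help, and the question becomes whether the plugin or the SDK shares non-thread-safe state across requests on one client.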
Willingness to contribute
- I can contribute a fix for this bug independently
- I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- I cannot contribute a fix for this bug at this time