CI: Add runtime-deps.txt files for all runtimes and bundles #16081
kevinjqliu wants to merge 1 commit into apache:main
Should we be publishing the iceberg-open-api module? It is for testing, so it would make sense to me if we didn't publish it. I also didn't know about that runtime module, and we will need to evaluate its LICENSE and NOTICE files before including it in any more releases.
runtime-deps.txt
    org.apache.httpcomponents:httpcore:4.4.16
    org.apache.logging.log4j:log4j-api:2.20.0
    org.apache.logging.log4j:log4j-core:2.20.0
    org.apache.logging.log4j:log4j-slf4j-impl:2.20.0
Log4J is not included in the LICENSE file.
    software.amazon.awssdk:utils:2.42.33
    software.amazon.eventstream:eventstream:1.0.1
    software.amazon.s3.accessgrants:aws-s3-accessgrants-java-plugin:2.4.1
    software.amazon.s3.analyticsaccelerator:analyticsaccelerator-s3:1.3.1
I think that these are fine. I didn't check every one against the latest update (fb2c8ac3faf) because they are now grouped into high-level SDK modules, but they are all ALv2 and should be okay.
    @@ -0,0 +1,70 @@
    com.github.ben-manes.caffeine:caffeine:2.9.3
Note to other reviewers: The aws-bundle/LICENSE file and azure-bundle/LICENSE file both include "JCTools (via Netty)". This is correct: Netty shades org.jctools under io/netty/util/shaded, which is then shaded in org/apache/iceberg/aws/shaded.
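One way to double-check a claim like this is to look inside the built jar: a jar is a zip archive, so relocated classes show up as entries under the relocation prefix. A minimal Python sketch, assuming you already have the bundle jar's bytes; the helper names and the fake jar entries are made up for illustration and are not part of the Iceberg build:

```python
import io
import zipfile


def find_relocated_entries(jar_bytes, shaded_prefix, needle):
    """Return sorted jar entries under shaded_prefix whose path contains needle."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return sorted(
            name
            for name in jar.namelist()
            if name.startswith(shaded_prefix) and needle in name
        )


def make_fake_jar(entry_names):
    """Build an in-memory zip standing in for a bundle jar (illustration only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        for name in entry_names:
            jar.writestr(name, b"")
    return buf.getvalue()


# Hypothetical entries mirroring the double relocation described above.
fake_jar = make_fake_jar([
    "org/apache/iceberg/aws/shaded/io/netty/util/shaded/org/jctools/Example.class",
    "org/apache/iceberg/aws/shaded/io/netty/buffer/Example.class",
    "org/apache/iceberg/aws/s3/Example.class",
])
jctools = find_relocated_entries(fake_jar, "org/apache/iceberg/aws/shaded/", "jctools")
print(jctools)
```

Against a real build you would pass the bytes of the bundle jar instead of the fake one.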
    org.apache.avro:avro:1.12.1
    org.apache.datasketches:datasketches-java:6.2.0
    org.apache.datasketches:datasketches-memory:3.0.2
    org.apache.flink:flink-metrics-dropwizard:2.1.0
This is not listed in the LICENSE file.
I also wonder if this should be excluded and not added to LICENSE because it seems like something that should be included in the Flink runtime. I suspect that we need to add it as a compileOnly dependency or suppress it in the runtime config.
    @@ -0,0 +1,33 @@
    com.fasterxml.jackson.core:jackson-annotations:2.21
I'm not reviewing older versions of Flink yet. I think we should determine what needs to change for the current version and then verify the same changes on the older ones.
    com.github.ben-manes.caffeine:caffeine:2.9.3
    com.github.luben:zstd-jni:1.5.7-3
    com.google.errorprone:error_prone_annotations:2.10.0
    dev.failsafe:failsafe:3.3.2
This is leaked by iceberg-aws and should not be bundled.
This is used directly by S3InputStream, which means it needs to be included when iceberg-aws is included, because it is not provided by the AWS dependencies. I don't think this is a good reason to keep using it, though; we should replace it with Tasks unless it is doing something special.
Since this is in the license docs, I think this isn't a blocker for 1.11.0 or 1.10.2, but we should remove it to keep dependencies to a minimum.
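To answer "which module leaks this?" for a specific artifact, Gradle's dependencyInsight task is the real tool. The idea underneath is a shortest-path search over the dependency graph. A minimal Python sketch with a made-up graph (the module names and edges are hypothetical, not taken from the Iceberg build):

```python
from collections import deque


def dependency_path(graph, root, target):
    """Breadth-first search: return one shortest path root -> ... -> target, or None."""
    parents = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent links back to the root, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for dep in graph.get(node, ()):
            if dep not in parents:
                parents[dep] = node
                queue.append(dep)
    return None


# Hypothetical graph: a runtime module that pulls in iceberg-aws, which leaks failsafe.
graph = {
    "example-runtime": ["iceberg-core", "iceberg-aws"],
    "iceberg-core": ["org.apache.avro:avro"],
    "iceberg-aws": ["dev.failsafe:failsafe"],
}
print(dependency_path(graph, "example-runtime", "dev.failsafe:failsafe"))
```

BFS returns a shortest chain, which is usually the most useful one to show in a review; an artifact can have several paths, and any of them is enough to keep it on the runtime classpath.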
    @@ -0,0 +1,33 @@
    com.fasterxml.jackson.core:jackson-annotations:2.21
The LICENSE file states that this contains many libraries that are missing:
- Apache Commons [not in the Jar]
- OpenTelemetry [not in the Jar]
- Netty [not in the Jar]
- Apache Arrow [not in the Jar]
- javax.annotation-api [not in the Jar]
- Apache Thrift [shaded by Parquet]
- Fastutil [shaded by Parquet]
- Apache Hive [artifact from ORC?]
- Alibaba Cloud Credentials [fixed by Aliyun: Remove leaked transitive dependencies #15858]
- Alibaba Cloud Tea [fixed by Aliyun: Remove leaked transitive dependencies #15858]
- Google GAX [fixed by Change bigquery to compileonly #15655]
- Google Auth Library [fixed by Change bigquery to compileonly #15655]
- Google Cloud BigQuery client [fixed by Change bigquery to compileonly #15655]
- Google Gson [fixed by Aliyun: Remove leaked transitive dependencies #15858]
- Google Protobuf [fixed by Change bigquery to compileonly #15655]
- Google API Common [fixed by Change bigquery to compileonly #15655]
- Google Http Client [fixed by Change bigquery to compileonly #15655]
- Google Auto Value [fixed by Change bigquery to compileonly #15655]
- Google flatbuffers [fixed by Change bigquery to compileonly #15655]
- Kotlin standard library [fixed by Aliyun: Remove leaked transitive dependencies #15858]
- gRPC [fixed by Aliyun: Remove leaked transitive dependencies #15858]
- Google APIs [fixed by Change bigquery to compileonly #15655]
- Google Cloud APIs [fixed by Change bigquery to compileonly #15655]
- JSpecify [fixed by Change bigquery to compileonly #15655]
- Animal Sniffer Annotations [fixed by Change bigquery to compileonly #15655]
- Android Annotations [fixed by Change bigquery to compileonly #15655]
- Conscrypt [fixed by Change bigquery to compileonly #15655]
- Perfmark [fixed by Change bigquery to compileonly #15655]
- org.json [fixed by Change bigquery to compileonly #15655]
- OpenCensus [fixed by Change bigquery to compileonly #15655]
- JaCoCo [fixed by Aliyun: Remove leaked transitive dependencies #15858]
- JAXB [fixed by Aliyun: Remove leaked transitive dependencies #15858]
- Okio [fixed by Aliyun: Remove leaked transitive dependencies #15858]
- OkHttp [fixed by Aliyun: Remove leaked transitive dependencies #15858]
- ThreeTen BP [fixed by Change bigquery to compileonly #15655]
The reason for most of the extras is that there were no LICENSE updates after a few recent PRs:
- Aliyun: Remove leaked transitive dependencies #15858 removed leaked transitive dependencies from Aliyun
- Change bigquery to compileonly #15655 removed leaked transitive dependencies from GCP
- Core: Replace Failsafe with Tasks #15613 removed failsafe, but iceberg-aws still leaks it
Also, LICENSE contains Google Guava, which is present because this jar shades iceberg-bundled-guava. Because Guava is shaded in that module, it isn't listed here (FYI).
Action items:
- Find out why some libraries were there but are no longer:
  - Arrow, Netty, Apache Commons, OpenTelemetry, javax.annotations
- Fix the Hive entry in LICENSE. Before chore: several fixes on the LICENSE/NOTICE #15449, it was clear that this was shaded by ORC. Now the only Hive reference I see is META-INF files, so I think this entry is probably incorrect.
- Remove all of the fixed dependency leaks from LICENSE and NOTICE
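The cleanup above boils down to a two-way comparison: every bundled dependency should be covered by a LICENSE entry, and every LICENSE entry should correspond to something actually in the jar (or be known to arrive via shading, like Guava from iceberg-bundled-guava). A minimal Python sketch, assuming a hand-maintained mapping from LICENSE headings to Maven group IDs; that mapping and the helper names are hypothetical, not something the build provides:

```python
def parse_runtime_deps(text):
    """Parse 'group:artifact:version' lines into tuples, skipping blank lines."""
    coords = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            group, artifact, version = line.split(":", 2)
            coords.append((group, artifact, version))
    return coords


def covers(group, prefixes):
    """True if group equals a listed group ID or sits underneath it."""
    return any(group == p or group.startswith(p + ".") for p in prefixes)


def audit_license(deps, license_entries, shaded_in=()):
    """Cross-check parsed deps against LICENSE entries.

    license_entries maps a LICENSE heading to the Maven group IDs it covers.
    Returns (deps with no LICENSE entry, LICENSE entries with no matching dep).
    """
    unlicensed = []
    used = set()
    for group, artifact, _version in deps:
        matches = [name for name, groups in license_entries.items() if covers(group, groups)]
        if matches:
            used.update(matches)
        else:
            unlicensed.append(f"{group}:{artifact}")
    # Entries known to come from shaded modules are expected to have no match.
    unused = sorted(set(license_entries) - used - set(shaded_in))
    return sorted(unlicensed), unused


runtime_deps = """
org.apache.httpcomponents:httpcore:4.4.16
org.apache.logging.log4j:log4j-api:2.20.0
"""
license_entries = {  # hypothetical excerpt of a LICENSE mapping
    "Apache HttpComponents": ["org.apache.httpcomponents"],
    "Apache Thrift": ["org.apache.thrift"],
    "Google Guava": ["com.google.guava"],
}
unlicensed, unused = audit_license(
    parse_runtime_deps(runtime_deps), license_entries, shaded_in=["Google Guava"]
)
print("missing from LICENSE:", unlicensed)
print("in LICENSE but not in the jar:", unused)
```

The first list corresponds to comments like "Log4J is not included in the LICENSE file"; the second corresponds to the stale entries listed above.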
Apache Commons is from this commit: 760a20b
#2102 copied array methods into ArrayUtil. This isn't a big problem, but having array copy methods doesn't seem worth the hassle of tracking them in LICENSE. The implementations don't match project style and don't provide value. Removing them would be a good first issue.
    com.google.cloud:google-cloud-kms:2.91.0
    com.google.cloud:google-cloud-monitoring:3.89.0
    com.google.cloud:google-cloud-storage:2.64.1
    com.google.code.findbugs:jsr305:3.0.2
Findbugs is excluded throughout the codebase because it was originally LGPL and cannot be bundled. The license issues weren't clarified, and a clean implementation was created: https://github.com/stephenc/findbugs-annotations
Although the Maven metadata reports ALv2, we need to exclude it. If we need the annotations (which are not required to function), then we should use the stephenc version.
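A deny list makes this kind of exclusion checkable against the runtime-deps.txt format used in this PR (one group:artifact:version per line). A minimal Python sketch; the DENIED mapping and the function name are hypothetical:

```python
# Hypothetical deny list: group:artifact -> reason it must not be bundled.
DENIED = {
    "com.google.code.findbugs:jsr305": "originally LGPL; exclude it and use the stephenc annotations if needed",
}


def denied_dependencies(runtime_deps_text, denied=DENIED):
    """Return (line, reason) for each runtime-deps.txt line whose group:artifact is denied."""
    hits = []
    for line in runtime_deps_text.splitlines():
        line = line.strip()
        if not line:
            continue
        group, artifact, _version = line.split(":", 2)
        reason = denied.get(f"{group}:{artifact}")
        if reason is not None:
            hits.append((line, reason))
    return hits


example = """
com.google.cloud:google-cloud-kms:2.91.0
com.google.code.findbugs:jsr305:3.0.2
org.jspecify:jspecify:1.0.0
"""
for entry, reason in denied_dependencies(example):
    print(f"{entry}: {reason}")
```

Matching on group:artifact rather than the full line means a version bump does not silently let a denied library back in.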
    org.jspecify:jspecify:1.0.0
    org.slf4j:slf4j-api:2.0.17
    org.threeten:threeten-extra:1.8.0
    org.threeten:threetenbp:1.7.0
The rest of these appear to be real dependencies from GCP and correctly included in the LICENSE file.
@rdblue This is the PR to fix open-api LICENSE and NOTICE issues. Please take a look and we can discuss whether the module needs to be published there.
Context: #16080
Add all runtime-deps.txt files so that CI will start enforcing dependency changes.
I checked out main, ran git pull, and ran …
Committed the resulting files in this PR.
Note that spark/v4.1/spark-runtime/runtime-deps.txt was added in #16080.
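Once a runtime-deps.txt file is committed, enforcement only needs to regenerate the list and compare it with the committed copy. A minimal Python sketch of that comparison, assuming the check fails on any difference; this is an illustration, not the project's actual CI code, and the coordinates are made up:

```python
def diff_runtime_deps(committed, generated):
    """Return (added, removed) lines between two runtime-deps.txt contents."""
    old = {line.strip() for line in committed.splitlines() if line.strip()}
    new = {line.strip() for line in generated.splitlines() if line.strip()}
    return sorted(new - old), sorted(old - new)


def check_runtime_deps(committed, generated):
    """Return human-readable problems; an empty list means the files match."""
    added, removed = diff_runtime_deps(committed, generated)
    problems = [f"new runtime dependency: {dep}" for dep in added]
    problems += [f"runtime dependency no longer present: {dep}" for dep in removed]
    return problems


committed = "example:lib-a:1.0\nexample:lib-b:2.0\n"
generated = "example:lib-a:1.1\nexample:lib-b:2.0\nexample:lib-c:3.0\n"
for problem in check_runtime_deps(committed, generated):
    print(problem)
```

A version bump shows up as one removal plus one addition, which is what a reviewer wants flagged so that LICENSE and NOTICE can be rechecked.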