Build: Add runtime dependency guard for bundled artifacts #15855
rdblue merged 7 commits into apache:main from
9a54beb to c786701
Adds a build-time check that prevents accidental transitive dependency leaks into shipped shadow JARs and distribution archives. A checked-in runtime-deps.txt baseline lists every dependency resolved into each bundled artifact. checkRuntimeDeps compares resolved deps against the baseline and fails the build with a clear diff on mismatch, wired into the check lifecycle so it runs in CI automatically. This guards all 11 bundled modules: Spark runtime (3.4, 3.5, 4.0, 4.1), Flink runtime (1.20, 2.0, 2.1), cloud bundles (AWS, Azure, GCP), and Kafka Connect runtime.
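The PR implements this check as a Gradle task. The Java sketch below only illustrates the comparison step, assuming the baseline is a plain text file with one group:artifact:version coordinate per line and that the resolved dependencies are already available as a set of strings. RuntimeDepsCheck, parseBaseline, diff, and check are hypothetical names, not the PR's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the baseline comparison; not the PR's Gradle implementation.
final class RuntimeDepsCheck {

  // Parses baseline text, assumed to hold one group:artifact:version per line; blank lines are ignored.
  static Set<String> parseBaseline(String text) {
    Set<String> coords = new TreeSet<>();
    for (String line : text.split("\n")) {
      String trimmed = line.trim();
      if (!trimmed.isEmpty()) {
        coords.add(trimmed);
      }
    }
    return coords;
  }

  // Lists "+ coord" for resolved deps missing from the baseline, then "- coord" for
  // baseline entries that are no longer resolved; each group is sorted.
  static List<String> diff(Set<String> baseline, Set<String> resolved) {
    List<String> lines = new ArrayList<>();
    for (String coord : new TreeSet<>(resolved)) {
      if (!baseline.contains(coord)) {
        lines.add("+ " + coord);
      }
    }
    for (String coord : new TreeSet<>(baseline)) {
      if (!resolved.contains(coord)) {
        lines.add("- " + coord);
      }
    }
    return lines;
  }

  // Throws with the full diff when the resolved set does not match the baseline.
  static void check(Set<String> baseline, Set<String> resolved) {
    List<String> lines = diff(baseline, resolved);
    if (!lines.isEmpty()) {
      throw new IllegalStateException(
          "Runtime dependencies differ from runtime-deps.txt:\n" + String.join("\n", lines));
    }
  }

  public static void main(String[] args) {
    Set<String> baseline = parseBaseline("org.example:a:1.0\norg.example:b:1.0\n");
    Set<String> resolved = Set.of("org.example:a:1.0", "org.example:leaked:2.0");
    System.out.println(diff(baseline, resolved)); // prints [+ org.example:leaked:2.0, - org.example:b:1.0]
  }
}
```

Sorting both sides keeps the reported diff stable across runs, which makes CI failures easy to compare.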
c786701 to ba21888
Internal Iceberg module dependencies are irrelevant to LICENSE/NOTICE compliance and shadow JAR size concerns; only third-party dependency changes matter. Filtering them out avoids false positives when modules are added or reorganized.
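A minimal sketch of such a filter, assuming internal modules can be recognized by the org.apache.iceberg group ID and that coordinates are group:artifact:version strings. ThirdPartyFilter and thirdPartyOnly are hypothetical names; the PR's Gradle code may identify internal modules differently (for example, as project dependencies).

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical helper: drops internal Iceberg modules so only third-party coordinates are compared.
final class ThirdPartyFilter {

  // Assumption: internal modules resolve with this group ID.
  static final String INTERNAL_GROUP = "org.apache.iceberg";

  // Keeps coordinates whose group (the text before the first ':') is not the internal group.
  static Set<String> thirdPartyOnly(Set<String> coords) {
    return coords.stream()
        .filter(coord -> !coord.split(":", 2)[0].equals(INTERNAL_GROUP))
        .collect(Collectors.toCollection(TreeSet::new));
  }

  public static void main(String[] args) {
    Set<String> resolved = Set.of(
        "org.apache.iceberg:iceberg-core:1.0",
        "org.example:a:1.0");
    System.out.println(thirdPartyOnly(resolved)); // prints [org.example:a:1.0]
  }
}
```

An exact group match, rather than a prefix check, keeps the filter from silently hiding a third-party group that merely shares the prefix.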
… guard
Report version bumps separately from truly new or removed dependencies, so reviewers can quickly distinguish routine upgrades from dependency-surface changes that require LICENSE/NOTICE updates.
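One way to separate version bumps from additions and removals, sketched under the assumption that every coordinate is a well-formed group:artifact:version string (no classifiers) and that each module appears at most once per set: key each entry by group:artifact and compare versions. DepChangeReport and compare are hypothetical names, not the PR's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical report that separates version bumps from added or removed dependencies.
final class DepChangeReport {
  final List<String> versionChanges = new ArrayList<>();
  final List<String> added = new ArrayList<>();
  final List<String> removed = new ArrayList<>();

  // Maps "group:artifact" to version; assumes every entry is group:artifact:version.
  private static Map<String, String> byModule(Set<String> coords) {
    Map<String, String> modules = new TreeMap<>();
    for (String coord : coords) {
      int split = coord.lastIndexOf(':');
      modules.put(coord.substring(0, split), coord.substring(split + 1));
    }
    return modules;
  }

  static DepChangeReport compare(Set<String> baseline, Set<String> resolved) {
    Map<String, String> before = byModule(baseline);
    Map<String, String> after = byModule(resolved);
    DepChangeReport report = new DepChangeReport();
    for (Map.Entry<String, String> entry : after.entrySet()) {
      String oldVersion = before.get(entry.getKey());
      if (oldVersion == null) {
        // Module absent from the baseline: a new dependency that may need LICENSE/NOTICE updates.
        report.added.add(entry.getKey() + ":" + entry.getValue());
      } else if (!oldVersion.equals(entry.getValue())) {
        // Same module at a different version: a routine upgrade or downgrade.
        report.versionChanges.add(entry.getKey() + " " + oldVersion + " -> " + entry.getValue());
      }
    }
    for (Map.Entry<String, String> entry : before.entrySet()) {
      if (!after.containsKey(entry.getKey())) {
        report.removed.add(entry.getKey() + ":" + entry.getValue());
      }
    }
    return report;
  }

  public static void main(String[] args) {
    DepChangeReport report = compare(
        Set.of("org.example:a:1.0", "org.example:b:1.0"),
        Set.of("org.example:a:1.1", "org.example:c:2.0"));
    System.out.println("Version changes: " + report.versionChanges); // [org.example:a 1.0 -> 1.1]
    System.out.println("Added: " + report.added); // [org.example:c:2.0]
    System.out.println("Removed: " + report.removed); // [org.example:b:1.0]
  }
}
```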
Currently this errors out and requires a manual fix for version changes. I know this is going to be a bit of an issue for Renovate or other automated version bumpers.
These are plain data files containing only dependency coordinates, so a license header is not practical.
  enabled = false
}

apply from: "${rootDir}/gradle/runtime-deps.gradle"
nit: Is it possible to have a baseline Gradle script that all of the others inherit from? When we create a new project, we're bound to accidentally forget something.
This was actually in @rdblue's implementation; he enabled it generically. That may be safer, but it covers a lot more. I didn't want it to run in every configuration. Do you know how we could cover just the jobs that produce a fat artifact?
I think we should keep this manual for now. We shouldn't be adding modules all that often, and when we do, the build file is usually copy-pasted from an existing one, so I think we are mostly safe.
For new things like Docker images or Kafka Connect bundles, we'll have to check them manually as well, but those should get a lot of scrutiny anyway.
Add a checkAllRuntimeDeps aggregation task in build.gradle that collects checkRuntimeDeps from all subprojects, and a dedicated check-runtime-deps CI job in java-ci.yml that runs it on every PR. Incorporates ideas from rdblue's apache#15857 (top-level aggregation task and dedicated CI job) into the runtime dependency guard approach.
…move baselines
Move runtime-deps.gradle from gradle/ to the project root alongside other custom scripts (tasks.gradle, deploy.gradle, baseline.gradle). Change the missing-baseline check from a hard failure to a warning so that the guard can be wired up before baselines are generated. Remove all runtime-deps.txt baseline files; these will be regenerated in a follow-up once the infrastructure is finalized.
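The warn-instead-of-fail behavior for a missing baseline could look like the hypothetical helper below: it returns an empty Optional, after printing a warning, when the baseline file does not exist, so the caller can skip the comparison instead of failing the build. BaselineLoader and load are illustrative names, and the one-coordinate-per-line format is an assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical loader: a missing baseline yields a warning and an empty result, not a failure.
final class BaselineLoader {

  static Optional<Set<String>> load(Path baselineFile) {
    if (!Files.exists(baselineFile)) {
      // No baseline generated yet: warn and let the caller skip the check.
      System.err.println("WARNING: no runtime dependency baseline at " + baselineFile + "; skipping check");
      return Optional.empty();
    }
    try {
      Set<String> coords = new TreeSet<>();
      // Assumed format: one group:artifact:version per line; blank lines are ignored.
      for (String line : Files.readAllLines(baselineFile)) {
        String trimmed = line.trim();
        if (!trimmed.isEmpty()) {
          coords.add(trimmed);
        }
      }
      return Optional.of(coords);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Optional<Set<String>> baseline = load(Path.of(args.length > 0 ? args[0] : "runtime-deps.txt"));
    System.out.println(baseline.map(s -> s.size() + " baseline entries").orElse("check skipped"));
  }
}
```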
Thanks!! I really like that the dependencies and transitive dependencies are laid out; now it's super easy to diff. I don't think we necessarily need to run this diff on every PR. It would be helpful to diff between releases, for example 1.10.0 against 1.10.1, or 1.10.1 against 1.11.0.
Add -q flag to the Gradle invocation so that only build failures (the dependency mismatch diff) are printed. Compilation warnings from upstream project dependencies are not relevant to this check.
Doing it on a per-PR basis lets us catch issues before they're merged. Doing it once per release could cause a lot of churn right before the release. I absolutely love the addition of this in any form!
@kevinjqliu I don't think it costs us much: it should be cached for anything that hasn't changed, so 99% of the time it will do nothing. For GitHub Actions, we probably need something like a remote build cache.
@kevinjqliu Claude says we are already using a build cache for our Actions, so... it should be pretty much free.
I approved this. Let's merge when CI passes!
kevinjqliu left a comment
LGTM.
I tried this locally with the changes from #15655 and was able to see the diff when the BigQuery deps are pulled in 😄
Also, there are 15 build.gradle files in this repo right now; 12 are touched by this PR and 3 are not:
- mr/build.gradle
- flink/build.gradle
- spark/build.gradle

The top-level flink and spark build files just point to their respective versioned folders, but maybe we want to add the guard to mr/build.gradle.
    distribution: zulu
    java-version: 17
- uses: gradle/actions/setup-gradle@0723195856401067f7a2779048b490ace7a47d7c # v5 # zizmor: ignore[cache-poisoning] -- cache writes are restricted to the default branch by setup-gradle
- run: ./gradlew checkAllRuntimeDeps -q
This is a no-op right now, right?
Here's the generated output: https://github.com/kevinjqliu/iceberg-deps/tree/main
I also generated it for previous releases as Git branches of that repo.
Motivated by #15655, to protect against LICENSE and NOTICE going out of sync with the bundled dependencies.