Build: Add runtime dependency guard for bundled artifacts #15855

Merged
rdblue merged 7 commits into apache:main from RussellSpitzer:runtime-deps-guard
Apr 1, 2026

@RussellSpitzer
Member

RussellSpitzer commented Apr 1, 2026

Motivated by #15655, to protect us against LICENSE and NOTICE going out of sync.

Adds a build-time check that prevents accidental transitive dependency
leaks into shipped shadow JARs and distribution archives. A checked-in
runtime-deps.txt baseline lists every dependency resolved into each
bundled artifact.

checkRuntimeDeps compares resolved deps against the
baseline and fails the build with a clear diff on mismatch, wired into
the check lifecycle so it runs in CI automatically.

This guards all 11 bundled modules: Spark runtime (3.4, 3.5, 4.0, 4.1),
Flink runtime (1.20, 2.0, 2.1), cloud bundles (AWS, Azure, GCP), and
Kafka Connect runtime.
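
As a rough illustration of the comparison described above, a `checkRuntimeDeps` task could look something like the sketch below. This is not the merged implementation: the use of `runtimeClasspath`, the one-coordinate-per-line baseline format, and the error wording are all assumptions, and it presumes the `java` plugin is applied so that `check` and `runtimeClasspath` exist.

```groovy
// Hypothetical sketch of a baseline check; not the code merged in this PR.
// Assumes the java plugin is applied and that runtime-deps.txt holds one
// "group:name:version" coordinate per line (both are assumptions).
tasks.register('checkRuntimeDeps') {
  def baselineFile = file('runtime-deps.txt')

  doLast {
    // Coordinates of everything resolved onto the runtime classpath.
    def resolved = configurations.runtimeClasspath
        .resolvedConfiguration
        .resolvedArtifacts
        .collect { it.moduleVersion.id }
        .collect { "${it.group}:${it.name}:${it.version}".toString() }
        .toSet()

    // Checked-in baseline, ignoring blank lines.
    def baseline = baselineFile.readLines()
        .collect { it.trim() }
        .findAll { it }
        .toSet()

    def added = (resolved - baseline).sort()
    def removed = (baseline - resolved).sort()
    if (added || removed) {
      def diff = added.collect { "+ ${it}" } + removed.collect { "- ${it}" }
      throw new GradleException(
          "Runtime dependencies differ from ${baselineFile.name}:\n" + diff.join('\n'))
    }
  }
}

// Wire the guard into the standard verification lifecycle.
tasks.named('check') {
  dependsOn 'checkRuntimeDeps'
}
```

Because the sketch reads `configurations` at execution time, it would likely need reworking to be compatible with Gradle's configuration cache.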

Internal Iceberg module dependencies don't affect LICENSE/NOTICE
compliance or shadow JAR size concerns — only third-party dependency
changes matter. Filtering them out avoids false positives when modules
are added or reorganized.
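
One way such filtering might be done is by group ID, as in the plain Groovy sketch below. Matching on the `org.apache.iceberg` group is an assumption about how the filter works (the PR may instead distinguish project dependencies from external ones), and the coordinates and versions are placeholders.

```groovy
// Hypothetical filter: treat anything in the org.apache.iceberg group as an
// internal module and drop it before comparing against the baseline.
def isThirdParty = { String coordinate ->
  !coordinate.startsWith('org.apache.iceberg:')
}

// Placeholder coordinates for illustration only.
def resolved = [
  'org.apache.iceberg:iceberg-core:1.11.0',
  'com.example:some-library:2.3.4',
]

def thirdPartyOnly = resolved.findAll(isThirdParty)
assert thirdPartyOnly == ['com.example:some-library:2.3.4']
```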
… guard

Report version bumps separately from truly new or removed dependencies
so reviewers can quickly distinguish routine upgrades from dependency
surface changes that require LICENSE/NOTICE updates.
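
The split between version bumps and real surface changes could be computed roughly as follows. This is a plain Groovy sketch under the assumption that each entry is a `group:name:version` string; the `byModule` and `classify` helpers are hypothetical names, and the sketch ignores the case where one module resolves at several versions.

```groovy
// Map "group:name" -> "version" for a collection of coordinates.
def byModule = { Collection<String> coords ->
  coords.collectEntries { c ->
    def idx = c.lastIndexOf(':')
    [(c.substring(0, idx)): c.substring(idx + 1)]
  }
}

// Classify differences between a baseline and the resolved set.
def classify = { Collection<String> baseline, Collection<String> resolved ->
  def before = byModule(baseline)
  def after = byModule(resolved)
  [
    // Modules that appear only on one side change the dependency surface.
    added  : (after.keySet() - before.keySet()).sort(),
    removed: (before.keySet() - after.keySet()).sort(),
    // Modules on both sides with different versions are routine bumps.
    bumped : before.keySet().intersect(after.keySet())
        .findAll { before[it] != after[it] }
        .sort()
        .collect { "${it}: ${before[it]} -> ${after[it]}".toString() },
  ]
}

def report = classify(
    ['com.example:a:1.0', 'com.example:b:2.0'],
    ['com.example:a:1.1', 'com.example:c:3.0'])
assert report.bumped == ['com.example:a: 1.0 -> 1.1']
assert report.added == ['com.example:c']
assert report.removed == ['com.example:b']
```

With a report shaped like this, only a non-empty `added` or `removed` list would call for a LICENSE/NOTICE review.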
@RussellSpitzer
Member Author

I currently have this erroring out and requiring a manual fix for version changes. I know this is going to be a bit of an issue for Renovate or other automatic version bumpers.

These are plain data files containing only dependency coordinates, so
a license header is not practical.
github-actions bot added the INFRA label Apr 1, 2026
Comment thread flink/v1.20/build.gradle Outdated
enabled = false
}

apply from: "${rootDir}/gradle/runtime-deps.gradle"
Contributor

nit:

Is it possible for us to have a baseline Gradle file that all of the others can inherit from? When we create a new project, we're bound to accidentally forget to grab something.

Member Author

This was actually in @rdblue's implementation; he enabled this generically. That may be safer but will cover a lot more things. I didn't want to run this in every configuration. Do you know how we could cover just the jobs that produce a fat artifact?

Member Author

I think we should probably keep this manual for now. We shouldn't be adding modules all that often, and when we do, the build file is usually copy-pasted from an existing one, so I think we are mostly safe.

For new things like Docker images or Kafka Connect bundles, we'll have to check those manually as well, but they should get a lot of scrutiny anyway.

Add a checkAllRuntimeDeps aggregation task in build.gradle that
collects checkRuntimeDeps from all subprojects, and a dedicated
check-runtime-deps CI job in java-ci.yml that runs it on every PR.

Incorporates ideas from rdblue's apache#15857 (top-level aggregation task
and dedicated CI job) into the runtime dependency guard approach.
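
An aggregation task of this kind might be written along the lines below in the root `build.gradle`. This is a sketch of the general pattern, not the code from this PR; the `group` and `description` values are assumptions.

```groovy
// Root build.gradle. Hypothetical sketch of an aggregation task.
tasks.register('checkAllRuntimeDeps') {
  group = 'verification'
  description = 'Runs checkRuntimeDeps in every subproject that defines it.'
  // tasks.matching returns a live collection, so subprojects that add
  // checkRuntimeDeps later during configuration are still picked up,
  // and subprojects without the task contribute nothing.
  dependsOn subprojects.collect { sub ->
    sub.tasks.matching { it.name == 'checkRuntimeDeps' }
  }
}
```

The CI job would then only need to invoke `./gradlew checkAllRuntimeDeps`.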
…move baselines

Move runtime-deps.gradle from gradle/ to the project root alongside
other custom scripts (tasks.gradle, deploy.gradle, baseline.gradle).

Change the missing baseline check from a hard failure to a warning so
that the guard can be wired up before baselines are generated.

Remove all runtime-deps.txt baseline files — these will be regenerated
in a follow-up once the infrastructure is finalized.
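
The softer behavior for a missing baseline could look roughly like this. It is a sketch only: the warning text is an assumption, and the comparison itself is elided.

```groovy
// Hypothetical sketch: skip with a warning when no baseline is checked in.
tasks.register('checkRuntimeDeps') {
  def baselineFile = file('runtime-deps.txt')
  def projectPath = project.path

  doLast {
    if (!baselineFile.exists()) {
      // Warn instead of failing so the guard can be wired up first.
      logger.warn("No ${baselineFile.name} baseline found for ${projectPath}; " +
          'skipping runtime dependency check')
      return
    }
    // ... compare resolved dependencies against the baseline here ...
  }
}
```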
@kevinjqliu
Contributor

Thanks!! I really like that the dependencies and transitive dependencies are laid out; now it's super easy to diff.

I don't think we necessarily need to do this diff for every PR. It would be helpful to diff between releases, for example 1.10.0 against 1.10.1 or 1.10.1 against 1.11.0.
I think a diff would have helped us catch the original iceberg-bigquery issue.

Add -q flag to the Gradle invocation so that only build failures
(the dependency mismatch diff) are printed. Compilation warnings
from upstream project dependencies are not relevant to this check.
@rambleraptor
Contributor

rambleraptor commented Apr 1, 2026

Doing it on a per-PR basis would let us catch issues before they're merged in. Doing it once per release would potentially cause a lot of churn right before the release.

I absolutely love the addition of this in any manner!

@RussellSpitzer
Member Author

@kevinjqliu I don't think it costs us that much to do. It should cache for anything that isn't changed, so 99% of the time it will do nothing. For GitHub Actions, we probably need to do something like a remote build cache.

@RussellSpitzer
Member Author

@kevinjqliu Claude says we already are using build cache for our actions so ... it should be pretty free

@rdblue
Contributor

rdblue commented Apr 1, 2026

I approved this. Let's merge when CI passes!

Contributor

kevinjqliu left a comment

LGTM.

I tried this locally using changes from #15655, was able to see the diff when bigquery deps are pulled in 😄

Also, there are 15 build.gradle files in this repo right now,
12 are touched by this PR, 3 are not:

  • mr/build.gradle
  • flink/build.gradle
  • spark/build.gradle

The top-level flink and spark build.gradle files just point to their respective versioned folders, but maybe we want to add it to mr/build.gradle.

distribution: zulu
java-version: 17
- uses: gradle/actions/setup-gradle@0723195856401067f7a2779048b490ace7a47d7c # v5 # zizmor: ignore[cache-poisoning] -- cache writes are restricted to the default branch by setup-gradle
- run: ./gradlew checkAllRuntimeDeps -q
Contributor

This is a no-op right now, right?

rdblue merged commit 245637a into apache:main Apr 1, 2026
38 checks passed
@kevinjqliu
Contributor

Here are the generated runtime-deps.txt files for all modules, using a recent apache/iceberg commit (88d5538):

https://github.com/kevinjqliu/iceberg-deps/tree/main

I also generated them for previous releases, as git branches of that repo.

kevinjqliu pushed a commit to kevinjqliu/iceberg that referenced this pull request Apr 16, 2026