feat(kafka): SASL/GSSAPI plugin + CLI + Docker E2E (0.15.0)#96
Draft
Adds a `gssapi` cargo feature on both `kafka-backup-core` and `kafka-backup-cli` (passthrough) with `default = []`, plus an optional `libgssapi = "0.9"` workspace dependency. No logic changes; subsequent commits build the `GssapiPlugin` impl behind this gate. Default builds are unchanged and do not pull `libgssapi`.

The gssapi feature requires system krb5 development headers at build time:
- Debian/Ubuntu: `apt-get install libkrb5-dev`
- RHEL/Fedora: `dnf install krb5-devel`
- macOS: `brew install krb5` (then `export PKG_CONFIG_PATH="$(brew --prefix krb5)/lib/pkgconfig:$PKG_CONFIG_PATH"`)

Verified:
- `cargo check -p kafka-backup-core` (default, no libgssapi)
- `cargo check -p kafka-backup-core --features gssapi` (pulls libgssapi)
- `cargo check -p kafka-backup-cli` (default)
- `cargo check -p kafka-backup-cli --features gssapi` (passthrough works)

Part of the GSSAPI plugin rework superseding PR #95 (authored by @kthimjo) on the `SaslMechanismPlugin` extension point.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds the always-present `SaslMechanism::Gssapi` enum variant and three optional `SecurityConfig` fields backing it:
- `sasl_kerberos_service_name`: Kafka service principal (defaults to `kafka` at the CLI layer)
- `sasl_keytab_path`: keytab file path; OS credential cache is used if unset
- `sasl_krb5_config_path`: path to `krb5.conf`; system default if unset

All three are `#[serde(default)]` so existing configs keep parsing. YAML round-trip tested: `sasl_mechanism: GSSAPI` (SCREAMING-KEBAB-CASE) decodes to `SaslMechanism::Gssapi` and all three path fields populate.

The variant is always compiled (so the YAML surface is consistent across binaries), but a working GSSAPI client requires the `gssapi` cargo feature at the CLI level. Core's `authenticate()` surfaces a clear error if `SaslMechanism::Gssapi` is set without a plugin; the CLI installs a `GssapiPlugin` via `populate_sasl_plugin` in a later commit.

Part of the GSSAPI plugin rework superseding PR #95 by @kthimjo.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
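The fallback rules for the three fields can be illustrated with a small sketch. `KerberosSettings`, `CredentialSource`, and the method names are hypothetical stand-ins, not the PR's types; only the field names and the `kafka` default come from the commit message, and the serde wiring is left out to keep the example dependency-free.

```rust
use std::path::PathBuf;

/// Hypothetical stand-in for the three optional `SecurityConfig` fields.
#[derive(Debug, Default, Clone, PartialEq)]
struct KerberosSettings {
    sasl_kerberos_service_name: Option<String>,
    sasl_keytab_path: Option<PathBuf>,
    sasl_krb5_config_path: Option<PathBuf>,
}

/// Where the client credential comes from (illustrative enum, not from the PR).
#[derive(Debug, PartialEq)]
enum CredentialSource {
    Keytab(PathBuf),
    OsCredentialCache,
}

impl KerberosSettings {
    /// Service principal name, falling back to `kafka` when unset.
    fn service_name(&self) -> &str {
        self.sasl_kerberos_service_name.as_deref().unwrap_or("kafka")
    }

    /// Keytab if one is configured, otherwise the OS credential cache.
    fn credential_source(&self) -> CredentialSource {
        match &self.sasl_keytab_path {
            Some(path) => CredentialSource::Keytab(path.clone()),
            None => CredentialSource::OsCredentialCache,
        }
    }

    /// `krb5.conf` override; `None` means the system default applies.
    fn krb5_config(&self) -> Option<&PathBuf> {
        self.sasl_krb5_config_path.as_ref()
    }
}
```

Because every field is an `Option` with a `Default`, a config that omits all three still yields a usable value, which mirrors why `#[serde(default)]` keeps existing configs parsing.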
Adds `kafka_backup_core::kafka::GssapiPlugin` behind the `gssapi`
cargo feature. The plugin is a state machine around
`libgssapi::context::ClientCtx` that implements RFC 4752 §3.1:
- Phase 1: gss_init_sec_context rounds (Context → ContextInProgress)
- Phase 1→2 transition: empty turnaround token (AwaitingLayerProposal)
- Phase 2: unwrap broker proposal, check 0x01 (no-security-layer) bit,
  wrap reply `0x01 0x00 0x00 0x00 | authz_id` (AwaitingFinalAck)
- Done: broker ack closes the handshake
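The phase ordering above can be sketched as a pure transition function. This is an illustration only: the state names follow the commit message, but the `Event` enum and `advance` function are hypothetical, and no GSS calls are made.

```rust
/// Handshake states named after the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Initial,
    ContextInProgress,
    AwaitingLayerProposal,
    AwaitingFinalAck,
    Done,
}

/// Hypothetical events driving the handshake forward.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Event {
    Start,              // first client token produced
    ContextToken,       // another gss_init_sec_context round is needed
    ContextEstablished, // context complete; client sends the empty turnaround token
    LayerProposal,      // broker proposal unwrapped and the wrapped reply sent
    FinalAck,           // broker ack received
}

/// Return the next state, or an error for an out-of-order event.
fn advance(state: State, event: Event) -> Result<State, String> {
    use Event::*;
    use State::*;
    match (state, event) {
        (Initial, Start) => Ok(ContextInProgress),
        (ContextInProgress, ContextToken) => Ok(ContextInProgress),
        (ContextInProgress, ContextEstablished) => Ok(AwaitingLayerProposal),
        (AwaitingLayerProposal, LayerProposal) => Ok(AwaitingFinalAck),
        (AwaitingFinalAck, FinalAck) => Ok(Done),
        (s, e) => Err(format!("unexpected {e:?} in state {s:?}")),
    }
}
```

Rejecting every pair not listed is what makes a continue-before-initial call fail loudly instead of silently proceeding.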
Notable design decisions:
- Interior mutability via `Arc<tokio::sync::Mutex<State>>` to bridge
the trait's `&self` methods with `ClientCtx::step`'s `&mut`.
- Process-wide `KRB5_ENV_LOCK: tokio::sync::Mutex<()>` serialises
`KRB5_CLIENT_KTNAME` / `KRB5_CONFIG` env-var mutation around
`Cred::acquire`. libgssapi 0.9.1 does not expose a keytab-path
argument, so env vars are the only route; without this lock,
concurrent `KafkaClient`s would race. PR #95's unsynchronised
`set_var` is the underlying issue this fixes.
- `reauth_payload` resets state to Initial and re-acquires a fresh
credential + ClientCtx — Kerberos tickets expire and a stale
context cannot be reused.
- Keytab existence is checked upfront in `new()` so misconfig fails
fast at construction rather than mid-handshake.
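A minimal sketch of the interior-mutability bridge from the first bullet, assuming `std::sync::Mutex` in place of the `tokio::sync::Mutex` the plugin uses so the example needs no dependencies. `HandshakeCtx`, `MechanismSketch`, and `PluginSketch` are hypothetical names; the real trait is `SaslMechanismPlugin`, whose signatures are not reproduced here.

```rust
use std::sync::{Arc, Mutex};

/// Stand-in for a context whose `step` needs `&mut self`.
struct HandshakeCtx {
    rounds: u32,
}

impl HandshakeCtx {
    fn step(&mut self, challenge: &[u8]) -> Vec<u8> {
        self.rounds += 1;
        // Echo the round number followed by the challenge, purely for illustration.
        let mut out = vec![self.rounds as u8];
        out.extend_from_slice(challenge);
        out
    }
}

/// Hypothetical trait with `&self` methods only.
trait MechanismSketch {
    fn continue_payload(&self, challenge: &[u8]) -> Vec<u8>;
}

/// The mutex lets a `&self` method reach the `&mut` context.
#[derive(Clone)]
struct PluginSketch {
    state: Arc<Mutex<HandshakeCtx>>,
}

impl PluginSketch {
    fn new() -> Self {
        Self { state: Arc::new(Mutex::new(HandshakeCtx { rounds: 0 })) }
    }

    fn rounds(&self) -> u32 {
        self.state.lock().expect("handshake state lock poisoned").rounds
    }
}

impl MechanismSketch for PluginSketch {
    fn continue_payload(&self, challenge: &[u8]) -> Vec<u8> {
        let mut ctx = self.state.lock().expect("handshake state lock poisoned");
        ctx.step(challenge)
    }
}
```

Note that cloning `PluginSketch` shares one `HandshakeCtx` between the clones, so two holders of the same `Arc` advance the same handshake.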
Day-1 spike result: libgssapi 0.9.1 exposes `Cred::acquire` and
`Cred::acquire_with_password` only; no keytab-aware constructor. The
env-var mutex is the correct mitigation for OSS until upstream gains
a keytab argument.
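The env-var serialisation can be sketched roughly as follows. This is not the plugin's code: it assumes `std::sync::Mutex` instead of `tokio::sync::Mutex`, the helper `with_krb5_env` is hypothetical, and restoring the previous values afterwards is a choice made for this sketch rather than something the commit message states. The closure stands in for the `Cred::acquire` call.

```rust
use std::env;
use std::ffi::OsString;
use std::sync::Mutex;

/// Process-wide lock; every env-var mutation for Kerberos goes through it.
static KRB5_ENV_LOCK: Mutex<()> = Mutex::new(());

/// Hypothetical helper: set `vars`, run `f` (e.g. credential acquisition),
/// then restore the previous values, all while holding the lock.
#[allow(unused_unsafe)] // `set_var`/`remove_var` are only `unsafe` on newer editions
fn with_krb5_env<T>(vars: &[(&str, &str)], f: impl FnOnce() -> T) -> T {
    let _guard = KRB5_ENV_LOCK.lock().unwrap_or_else(|e| e.into_inner());

    // Remember what was there so the process environment is left as found.
    let saved: Vec<(String, Option<OsString>)> = vars
        .iter()
        .map(|(k, _)| (k.to_string(), env::var_os(k)))
        .collect();

    for (k, v) in vars {
        unsafe { env::set_var(k, v) };
    }

    let out = f();

    for (k, old) in saved {
        unsafe {
            match old {
                Some(v) => env::set_var(&k, v),
                None => env::remove_var(&k),
            }
        }
    }
    out
}
```

A caller would wrap the acquisition step, for example `with_krb5_env(&[("KRB5_CLIENT_KTNAME", path)], || acquire())`, where `acquire` is whatever performs the GSS credential lookup. The lock only protects callers that go through the helper; code elsewhere in the process that reads or writes these variables directly is not covered.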
Tests (7 unit tests, feature-gated): Phase 2 proposal parser
(rejects <4 bytes, rejects missing 0x01 bit, accepts 0x01/0x07),
Phase 2 reply wire format, keytab-missing construction error,
continue_payload-before-initial poison, mechanism name. The full
gss_init_sec_context / wrap / unwrap round-trip is exercised by the
Docker E2E added in a later commit.
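The Phase 2 checks those unit tests describe could look roughly like this. The function name `parse_phase2_proposal` appears in this PR, but its signature here, the `ProposalError` type, and `build_phase2_reply` are assumptions made for the sketch; the input is taken to be the already-unwrapped broker message.

```rust
/// Bit in the first octet meaning "no security layer" (RFC 4752 §3.1).
const LAYER_NONE: u8 = 0x01;

/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq, Eq)]
enum ProposalError {
    TooShort(usize),
    NoSecurityLayerNotOffered(u8),
}

/// Validate the unwrapped broker proposal and return the offered layer mask.
fn parse_phase2_proposal(unwrapped: &[u8]) -> Result<u8, ProposalError> {
    // One octet of layer bits plus a three-octet maximum buffer size.
    if unwrapped.len() < 4 {
        return Err(ProposalError::TooShort(unwrapped.len()));
    }
    let mask = unwrapped[0];
    if (mask & LAYER_NONE) == 0 {
        return Err(ProposalError::NoSecurityLayerNotOffered(mask));
    }
    Ok(mask)
}

/// Build the plaintext reply `0x01 0x00 0x00 0x00 | authz_id` (before wrapping).
fn build_phase2_reply(authz_id: &str) -> Vec<u8> {
    let mut reply = vec![LAYER_NONE, 0x00, 0x00, 0x00];
    reply.extend_from_slice(authz_id.as_bytes());
    reply
}
```

Returning the mask rather than `()` lets the caller log whether the broker offered `0x01` or `0x07`.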
Part of the GSSAPI plugin rework superseding PR #95 by @kthimjo.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds two new CLI helpers:
- commands/sasl_plugin.rs: populate_sasl_plugin[_opt] installs a
GssapiPlugin into SecurityConfig::sasl_mechanism_plugin when
sasl_mechanism: GSSAPI is set. Gated by #[cfg(feature = "gssapi")];
the default build surfaces an actionable rebuild error naming the
feature and the system krb5 dev headers for each major platform.
- commands/security_args.rs: #[derive(Args)] SecurityCliArgs with
--security-protocol, --sasl-mechanism, --sasl-keytab,
--sasl-krb5-config, --sasl-kerberos-service-name (plus env-var
fallbacks). into_security_config(bootstrap_servers) assembles a
SecurityConfig and runs populate_sasl_plugin.
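The feature gate on the first helper can be pictured with a stripped-down sketch. `MechanismChoice`, `install_gssapi_sketch`, and `populate_plugin_sketch` are hypothetical names, the plugin is represented by a string, and the error text is illustrative rather than the CLI's actual message.

```rust
/// Illustrative subset of mechanisms.
#[derive(Debug, Clone, Copy, PartialEq)]
enum MechanismChoice {
    Plain,
    Gssapi,
}

/// Compiled only when the `gssapi` feature is on: the real code would build the plugin here.
#[cfg(feature = "gssapi")]
fn install_gssapi_sketch() -> Result<&'static str, String> {
    Ok("GssapiPlugin installed")
}

/// Default build: return an actionable error instead of failing mid-handshake.
#[cfg(not(feature = "gssapi"))]
fn install_gssapi_sketch() -> Result<&'static str, String> {
    Err("sasl_mechanism: GSSAPI requires a build with `--features gssapi` \
         (needs system krb5 development headers)"
        .to_string())
}

/// Install a plugin only when the configured mechanism asks for one.
fn populate_plugin_sketch(mech: MechanismChoice) -> Result<Option<&'static str>, String> {
    match mech {
        MechanismChoice::Gssapi => install_gssapi_sketch().map(Some),
        MechanismChoice::Plain => Ok(None),
    }
}
```

Keeping both variants of the function behind `cfg` means callers compile unchanged in either build, and only the outcome differs.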
Config-file entry points (backup, restore, three-phase, snapshot-groups)
call populate_sasl_plugin_opt immediately after serde_yaml::from_str so
YAML sasl_mechanism: GSSAPI is wired automatically.
Offset-reset-family commands (offset-reset execute, offset-reset-bulk,
offset-rollback rollback/verify/snapshot) now flatten SecurityCliArgs
in place of the lone --security-protocol flag and consume
into_security_config. Deletes the three triplicated parse_security_config
helpers.
Adds "env" feature to the workspace clap dep so #[arg(env = "…")]
compiles.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds tests/sasl-gssapi-test-infra/ — a three-container compose stack
that boots a self-contained MIT Kerberos realm (TEST.LOCAL), exports
service + client keytabs, and advertises a cp-kafka 7.7.0 broker on
SASL_PLAINTEXT://kafka.test.local:9098 with GSSAPI enabled. The KDC
runs in a self-hosted image (Dockerfile.kdc, ubuntu:22.04 + krb5-kdc)
rather than pulling an abandoned upstream. Keytab init is idempotent
and the compose healthcheck gates broker startup on the kafka keytab
existing on disk.
Rationale for compose-level decisions:
- hostname: kafka.test.local enforces that clients connect to the
FQDN that matches the service principal — the documented remedy
for KRB5KDC_ERR_S_PRINCIPAL_UNKNOWN.
- Separate INTERNAL listener (PLAINTEXT://:9092) so inter-broker
traffic doesn't need its own GSSAPI identity.
- KAFKA_CONNECTIONS_MAX_REAUTH_MS=60000 keeps the reauth E2E test
under 90s while staying comfortably above the 30s floor clamp in
sasl/reauth.rs.
- Only aes256/aes128-cts enctypes enabled; DES is disabled in MIT
1.19 and would only cause salt-mismatch noise.
E2E tests (sasl_gssapi_tests.rs, #[cfg(feature = "gssapi")], #[ignore]):
- sasl_gssapi_keytab_e2e: full handshake + post-auth metadata RPC.
- sasl_gssapi_missing_keytab_surfaces_clear_error: construction
validates the keytab path before any GSS call.
- sasl_gssapi_reauth_fires_within_broker_window: holds a connection
open for 90s with 15s metadata probes; every probe must succeed,
proving KIP-368 reauth runs inside the broker's 60s window.
Registered in integration_suite/mod.rs.
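The timing argument behind the reauth test can be checked with a little arithmetic. The constants come from this commit message; the helper functions are hypothetical, the reading of "floor clamp" as a simple lower bound is an assumption about `sasl/reauth.rs`, and the probes are assumed to start one interval after the connection opens.

```rust
const BROKER_REAUTH_MS: u64 = 60_000; // KAFKA_CONNECTIONS_MAX_REAUTH_MS
const REAUTH_FLOOR_MS: u64 = 30_000; // floor clamp mentioned for sasl/reauth.rs
const HOLD_MS: u64 = 90_000; // how long the test holds the connection
const PROBE_INTERVAL_MS: u64 = 15_000; // metadata probe spacing

/// Assumed meaning of the floor clamp: never go below the floor.
fn clamp_reauth_ms(ms: u64) -> u64 {
    ms.max(REAUTH_FLOOR_MS)
}

/// Probe timestamps, assuming the first probe fires one interval in.
fn probe_times_ms(hold_ms: u64, interval_ms: u64) -> Vec<u64> {
    (1..=hold_ms / interval_ms).map(|i| i * interval_ms).collect()
}

/// Probes that land at or after the session lifetime, i.e. those that can
/// only succeed if reauthentication has already happened.
fn probes_needing_reauth(probes: &[u64], session_ms: u64) -> Vec<u64> {
    probes.iter().copied().filter(|t| *t >= session_ms).collect()
}
```

Under these assumptions the 60s window is unaffected by the clamp, and the probes at 60s, 75s, and 90s are the ones that demonstrate reauth took place.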
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…(0.15.0)
- Bump workspace version 0.14.0 -> 0.15.0 (per CI version-gate rule).
- CHANGELOG: full 0.15.0 entry covering SaslMechanism::Gssapi, GssapiPlugin, KRB5_ENV_LOCK env-var serialisation, CLI flag surface, Docker fixture, and the E2E test trio. Calls out build requirements + V1 single-broker limit.
- README: new "Optional: SASL/GSSAPI (Kerberos) support" subsection under Building from Source with the krb5 install commands and a pointer at tests/sasl-gssapi-test-infra/.
- PRD: adds section 10.a describing the in-tree feature-gated plugin, crate surface, handshake mapping to the trait, env-var serialisation rationale, and V1 operational caveats.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Ports two small improvements from PR #95:
- Build an OidSet containing GSS_MECH_KRB5 and pass it to Cred::acquire's desired-mechs parameter instead of None. Locks the mechanism to Kerberos 5 rather than relying on the libgssapi default; matches the convention in librdkafka + the Java Kafka client.
- Pass Some(&GSS_MECH_KRB5) to ClientCtx::new for the same reason.

Plus one observability improvement adapted from PR #95: parse_phase2_proposal now returns the observed layer mask and the caller emits it at DEBUG alongside the authz_id when Phase 2 wrap succeeds, so a field report can distinguish "broker offered 0x01" from "broker offered 0x07".

The thread-safe KRB5_ENV_LOCK + plugin-trait refactor + upfront keytab validation + Docker E2E fixture + unit tests + CLI flag surface remain as they were; those stay ours.

Co-authored-by: Krist Thimjo <krist.thimjo@intesasanpaolo.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Four fixes uncovered while running the Docker GSSAPI E2E tests locally for the first time on this machine:

1. `GssapiPlugin` now sets `KRB5CCNAME=MEMORY:<ptr>` whenever a keytab is configured, so the plugin never reads or writes the OS default credential cache. Without this, stale tickets from a prior `kinit` (common on macOS `API:<uuid>` caches that persist across logins) are preferred over a fresh TGT from the keytab. The broker then rejects the AP-REQ with "invalid credentials" because the ticket was issued by an older KDC instance whose service key has since rotated. The env-var write is already under `KRB5_ENV_LOCK`.
2. Fixture KDC publishes on host port 88 (was 48088). The shared `krb5.conf` references `kdc.test.local:88`; the previous mapping broke host-side clients that followed the file as written. Port 88 is the Kerberos default and is unbound on macOS / most Linux dev boxes by default.
3. Fixture broker config swaps JVM-level `-Djava.security.auth.login.config` for listener-scoped `KAFKA_LISTENER_NAME_SASL_GSSAPI_SASL_JAAS_CONFIG`. The JVM-level form makes cp-kafka's preflight ZK client try SASL against an unauthenticated ZooKeeper, which hangs `cub zk-ready`. Matches the OAUTH fixture's earlier fix (c4d7e59).
4. `init-kdc.sh` removes stale host-mounted keytabs before `ktadd`. `docker compose down -v` wipes the in-container KDC principal DB but not the `./keytabs` bind mount; without this cleanup a second `up` leaves the old keytab in place and nothing works.

Also drops the unused `kafka_server_jaas.conf` (superseded by inline JAAS in the compose file) and extends README troubleshooting with the four failure-mode descriptions encountered during this session.

Verified by:
- 3/3 `sasl_gssapi_*` E2E tests pass, including the 90s reauth probe.
- OAUTH E2E stays green (1/1).
- CLI smoke test (`offset-rollback snapshot` with GSSAPI) completes the full handshake against the live broker, trace shows `GSSAPI Phase 2 complete server_layers=0x01`.
- Full CI gate green: fmt, clippy (default + all-features), 206 unit tests (--all-features).
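The per-plugin ccache naming from fix 1 can be sketched as below. How the plugin actually derives `<ptr>` is not shown in this PR text, so using the address of an `Arc` allocation is an assumption, and `memory_ccache_name` and `PluginStateSketch` are hypothetical names.

```rust
use std::sync::Arc;

/// Placeholder for whatever per-plugin state the address is taken from.
struct PluginStateSketch {
    _id: u64,
}

/// Derive a `KRB5CCNAME` value unique to one live plugin instance by
/// formatting the address of its shared allocation.
fn memory_ccache_name<T>(instance: &Arc<T>) -> String {
    format!("MEMORY:{:p}", Arc::as_ptr(instance))
}
```

Two plugin instances that are alive at the same time get different names, while clones of one `Arc` map to the same cache, so a reauth on the same plugin would reuse its own in-memory ccache and never touch the OS default one.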
The new `gssapi` cargo feature (0.15.0) pulls `libgssapi-sys`, whose build.rs generates bindings against `gssapi.h` at compile time. GitHub's default Ubuntu runners ship only the runtime `libgssapi_krb5.so` and omit the development headers, so every job that runs `cargo <...> --all-features` now fails with `'gssapi.h' file not found`.

Fix: install `libkrb5-dev` before cargo runs in the affected jobs:
- test.yml: check, unit-tests, integration-tests, chaos-tests
- semver-check.yml: Detect Breaking Changes
- release.yml: Pre-release Tests

Release binaries continue to exclude `gssapi` (cargo-dist uses default features), so `build-local-artifacts`, `publish-crates-io`, and the Docker publish path don't need krb5 headers.
Summary
Adds SASL/GSSAPI (Kerberos) authentication via the `SaslMechanismPlugin` trait from #94. Default builds stay Kerberos-free; opt in with `cargo build --features gssapi -p kafka-backup-cli`.
Credits @kthimjo (#95): state machine, credential hints, and `GSS_MECH_KRB5` wiring are adapted from their PR. The plugin-trait refactor, process-wide `KRB5_ENV_LOCK`, `KRB5CCNAME` isolation, upfront keytab validation, unit tests, and the Docker KDC fixture are ours.
What's in the box
- `SaslMechanism::Gssapi` enum variant + `SecurityConfig` fields (`sasl_kerberos_service_name`, `sasl_keytab_path`, `sasl_krb5_config_path`).
- `GssapiPlugin`: `gss_init_sec_context`, `layer=0x01` (no sec layer, no size), `wrap`/`unwrap`.
- `KRB5_ENV_LOCK` mutex serialises `KRB5_CLIENT_KTNAME` / `KRB5_CONFIG` / `KRB5CCNAME` mutation during `Cred::acquire`, which eliminates the multi-client env-var race inherent to libgssapi 0.9.
- `KRB5CCNAME=MEMORY:<ptr>` per-plugin ccache isolation when a keytab is configured; prevents stale tickets in the OS default ccache (common on macOS `API:<uuid>` caches) from being preferred over a fresh TGT from the keytab.
- `--sasl-mechanism`, `--sasl-keytab`, `--sasl-krb5-config`, `--sasl-kerberos-service-name` on `offset-reset`, `offset-reset-bulk`, and `offset-rollback`. YAML configs auto-wire the GSSAPI plugin when `sasl_mechanism: GSSAPI` is set. Helpful runtime error if the CLI was built without `--features gssapi`.
- `SecurityCliArgs` (`commands/security_args.rs`) consolidates three prior copies.
- `tests/sasl-gssapi-test-infra/` (MIT KDC + cp-kafka 7.7.0 configured for `SASL_PLAINTEXT://kafka.test.local:9098` with `GSSAPI` enabled, realm `TEST.LOCAL`, keytab auto-gen healthcheck).
- `#[ignore]` E2E tests (keytab happy-path, missing-keytab clear error, KIP-368 reauth fires within broker's 60s window).
Local E2E evidence (macOS aarch64, MIT krb5 1.22.2)
- `cargo fmt --all -- --check`
- `cargo clippy --all-targets -- -D warnings`
- `cargo clippy --all-targets --all-features -- -D warnings`
- `cargo test --workspace --lib --bins --all-features`
- … (`sasl_oauth_*` ignored)
- `sasl_gssapi_keytab_e2e`
- `sasl_gssapi_missing_keytab_surfaces_clear_error`
- `sasl_gssapi_reauth_fires_within_broker_window`
- `offset-rollback snapshot` against live fixture: `GSSAPI Phase 2 complete server_layers=0x01`
CI skips `sasl_*` integration tests by default because the OAUTH / GSSAPI compose fixtures aren't brought up in the workflow (same pattern as the pre-existing `--skip tls`). See `.github/workflows/test.yml` and the fixture READMEs for manual runs.
Build requirements
The `gssapi` feature links against MIT krb5 at build time:
- macOS: `brew install krb5` + export `PKG_CONFIG_PATH="$(brew --prefix krb5)/lib/pkgconfig:…"`. Apple's bundled Heimdal does not expose the symbols `libgssapi 0.9` requires.
- Debian/Ubuntu: `apt-get install libkrb5-dev`.
- RHEL/Fedora: `dnf install krb5-devel`.
Limitations (V1)
Single-broker Kerberos only. Routing a dedicated plugin instance per broker for multi-broker clusters is a known follow-up; the `PartitionLeaderRouter` path currently clones the configured plugin `Arc` for every broker connection. Release binaries and the default Docker image do not include GSSAPI; build your own image with `--build-arg FEATURES=gssapi` once the downstream image ships that arg.
Test plan
- `cargo fmt --check` / `clippy -D warnings` (default + all-features)
- `cargo test --workspace --lib --bins --all-features` green
🤖 Generated with Claude Code