Added SASL/GSSAPI (Kerberos) authentication support #95

Open
kthimjo wants to merge 1 commit into osodevops:main from kthimjo:feature/gssapi
Conversation

kthimjo commented Apr 21, 2026

Summary

This PR adds GSSAPI/Kerberos authentication to kafka-backup-core, enabling
backup and restore operations against Kafka clusters secured with Kerberos, a
common requirement in enterprise environments. The implementation follows the
same always-compiled, configuration-driven pattern as the existing SCRAM and
PLAIN authentication methods: if sasl_mechanism is not set to GSSAPI, the
new code paths are never reached and behaviour is completely unchanged.


Motivation

Many organisations use Kerberos to secure their Kafka infrastructure.
kafka-backup previously supported PLAIN and SCRAM-SHA-256/512 but had no
path for Kerberos users, making it unusable in those environments without
workarounds. This change closes that gap.


Changes

New file: crates/kafka-backup-core/src/kafka/gssapi.rs

Implements the GssapiClient struct, which drives the two-phase Kafka
SASL/GSSAPI wire protocol (RFC 4752 §3.1):

  • Phase 1 — GSS context establishment via one or more SaslAuthenticate
    round-trips until gss_init_sec_context signals completion.
  • Phase 2 — Security-layer negotiation; always selects layer 0x01
    (no per-message wrapping), which is the only layer Kafka requires.
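
As an illustration of Phase 2, here is a minimal sketch of the RFC 4752 layer negotiation on already-unwrapped bytes. The function names `parse_server_proposal` and `build_client_reply` are hypothetical and are not taken from gssapi.rs; the GSS wrap/unwrap calls around them are deliberately left out.

```rust
/// Bit for "no security layer" in the RFC 4752 layer bitmask.
const LAYER_NONE: u8 = 0x01;

/// Parses the server's Phase 2 proposal after it has been GSS-unwrapped.
/// Byte 0 is the bitmask of offered layers; bytes 1..4 are the server's
/// maximum buffer size in network byte order.
fn parse_server_proposal(token: &[u8]) -> Result<(u8, u32), String> {
    if token.len() < 4 {
        return Err(format!("proposal too short: {} bytes", token.len()));
    }
    let layers = token[0];
    if layers & LAYER_NONE == 0 {
        return Err(format!(
            "server does not offer layer 0x01 (offered {layers:#04x})"
        ));
    }
    let max_buf = u32::from_be_bytes([0, token[1], token[2], token[3]]);
    Ok((layers, max_buf))
}

/// Builds the client's Phase 2 reply before GSS wrapping: the selected
/// layer (0x01), a zero maximum buffer size (no security layer in use),
/// then the optional authorization identity.
fn build_client_reply(authz_id: &str) -> Vec<u8> {
    let mut reply = vec![LAYER_NONE, 0, 0, 0];
    reply.extend_from_slice(authz_id.as_bytes());
    reply
}
```

A proposal of `[0x07, 0x00, 0x10, 0x00]` would be accepted because the 0x01 bit is set, while `[0x06, 0, 0, 0]` would be rejected.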

Uses libgssapi 0.9.1, a safe Rust binding over the OS GSSAPI library
(libgssapi_krb5 on Linux, Heimdal on macOS/BSD). No pure-Rust Kerberos
implementation is used; all KDC communication, ticket management, and keytab
handling are delegated to the OS library. This is the standard approach and
is how the Java, Python, and C Kafka clients work.

Modified: crates/kafka-backup-core/src/kafka/client.rs

  • Added hostname: String field to BrokerConnection (extracted from the
    broker address at connect time) so the GSSAPI client can build the correct
    GSS target principal without re-parsing the address string.
  • Added Gssapi arms to authenticate() and authenticate_raw(), passing the
    four GSSAPI parameters (service name, broker host, keytab path, krb5 config
    path) to the new GssapiClient.
  • Added sasl_gssapi_auth() (uses send_request, for initial connections) and
    sasl_gssapi_auth_raw() (uses send_raw_request, for reconnects), mirroring
    the existing SCRAM method split that prevents async recursion.
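
To make the hostname change concrete, the sketch below shows one way a broker address could be turned into a GSS host-based service name of the form `service@host`. Both helper names are hypothetical, and the parsing rules (bracketed IPv6, last-colon split) are assumptions rather than the behaviour of client.rs.

```rust
/// Extracts the host part of a `host:port` broker address.
/// Bracketed IPv6 literals such as "[::1]:9092" are supported; a bare,
/// unbracketed IPv6 address is not handled by this sketch.
fn broker_hostname(addr: &str) -> &str {
    if let Some(rest) = addr.strip_prefix('[') {
        if let Some((host, _)) = rest.split_once(']') {
            return host;
        }
    }
    match addr.rsplit_once(':') {
        Some((host, _port)) => host,
        None => addr, // no port present, treat the whole string as the host
    }
}

/// Builds the host-based service name that a GSSAPI library resolves to
/// the broker's Kerberos principal, e.g. "kafka@broker1.corp.com".
fn gss_target_name(service: &str, addr: &str) -> String {
    format!("{}@{}", service, broker_hostname(addr))
}
```

Storing the extracted host once at connect time, as the PR does, avoids repeating this parsing on every authentication attempt.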

Modified: crates/kafka-backup-core/src/config.rs

Added Gssapi variant to SaslMechanism and three new optional fields to
SecurityConfig:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| sasl_kerberos_service_name | Option<String> | None ("kafka") | Must match broker's sasl.kerberos.service.name |
| sasl_keytab_path | Option<PathBuf> | None | Path to keytab; sets KRB5_CLIENT_KTNAME |
| sasl_krb5_config_path | Option<PathBuf> | None | Path to krb5.conf; sets KRB5_CONFIG |

All three fields are optional. When None, the OS Kerberos library uses its
own resolution order ($KRB5_CLIENT_KTNAME / $KRB5_CONFIG env vars, then
/etc/krb5.conf, then the system credential cache). Existing configurations
that do not set these fields are completely unaffected.
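
The fallback behaviour can be sketched as follows. `GssapiSettings` is a hypothetical stand-in for the three new SecurityConfig fields, and both methods are illustrative assumptions rather than code from config.rs.

```rust
use std::path::PathBuf;

/// Hypothetical stand-in for the three GSSAPI fields on SecurityConfig.
#[derive(Debug, Default, Clone)]
struct GssapiSettings {
    sasl_kerberos_service_name: Option<String>,
    sasl_keytab_path: Option<PathBuf>,
    sasl_krb5_config_path: Option<PathBuf>,
}

impl GssapiSettings {
    /// Service name to use, falling back to "kafka" when unset.
    fn service_name(&self) -> &str {
        self.sasl_kerberos_service_name.as_deref().unwrap_or("kafka")
    }

    /// Environment variables to export before credential acquisition.
    /// Unset fields produce no entry, so the OS Kerberos library keeps
    /// its own resolution order for them.
    fn env_overrides(&self) -> Vec<(&'static str, PathBuf)> {
        let mut vars = Vec::new();
        if let Some(path) = &self.sasl_keytab_path {
            vars.push(("KRB5_CLIENT_KTNAME", path.clone()));
        }
        if let Some(path) = &self.sasl_krb5_config_path {
            vars.push(("KRB5_CONFIG", path.clone()));
        }
        vars
    }
}
```

With all fields left as None, `env_overrides()` is empty and nothing in the process environment changes.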

Modified: crates/kafka-backup-core/src/kafka/mod.rs

Added mod gssapi;.

Modified: crates/kafka-backup-cli/src/commands/offset_reset.rs, offset_reset_bulk.rs, offset_rollback.rs

These files construct SecurityConfig struct literals by field name. The three
new fields were added explicitly as None to fix compilation. Full GSSAPI
support for the offset-reset commands (reading keytab/krb5 config from the
YAML or CLI flags) is left for a possible future PR (see Known limitations).

Modified: crates/kafka-backup-core/Cargo.toml

Added libgssapi = "0.9" as a regular (non-optional) dependency, consistent
with how ring is included regardless of which auth mechanism is configured.


Configuration

Below is a complete example for a SASL_SSL + GSSAPI setup. All three
GSSAPI-specific fields are optional.

source:                               # or `target:` for restore
  bootstrap_servers:
    - broker1.corp.com:6668
    - broker2.corp.com:6668
  security:
    security_protocol: SASL_SSL
    sasl_mechanism: GSSAPI

    # Must match sasl.kerberos.service.name on the broker (default: "kafka")
    sasl_kerberos_service_name: kafka

    # Path to a keytab file for unattended / service-account use.
    # Remove or comment out to use the OS credential cache (kinit).
    sasl_keytab_path: /etc/security/keytabs/kafka-backup.keytab

    # Path to a custom krb5.conf. Remove or comment out to use /etc/krb5.conf.
    sasl_krb5_config_path: /etc/kafka/krb5.conf

    # TLS CA certificate (required when security_protocol: SASL_SSL)
    ssl_ca_location: /var/ssl/private/ca.crt

Minimum viable config (relying on kinit/OS cache and system krb5.conf):

  security:
    security_protocol: SASL_SSL
    sasl_mechanism: GSSAPI
    ssl_ca_location: /var/ssl/private/ca.crt

Build requirements

The system GSSAPI development library must be present on the build host.
The runtime host needs only the runtime library (usually installed by default).

| Distribution | Build | Runtime |
| --- | --- | --- |
| RHEL / CentOS / Rocky | sudo dnf install krb5-devel | krb5-libs (default) |
| Debian / Ubuntu | sudo apt install libkrb5-dev | libkrb5-3 (default) |
| macOS (Homebrew) | brew install krb5 + set PKG_CONFIG_PATH | bundled |

For CI (GitHub Actions Linux runner), add:

- name: Install Kerberos development libraries
  run: sudo apt-get install -y libkrb5-dev

Testing

Tested on Red Hat Enterprise Linux 9.7 (bare metal VM) against a
multi-broker Kafka cluster with SASL_SSL + GSSAPI:

| Scenario | Result |
| --- | --- |
| kinit credential cache, no keytab config | |
| Keytab file via sasl_keytab_path, custom krb5.conf | |
| Keytab file via sasl_keytab_path, OS default krb5.conf | |
| Valid keytab, user not authorised on Kafka | ✅ 0 topics backed up |
| Invalid / unreadable keytab, no ticket in cache | ✅ clear error |
| Valid keytab, invalid krb5.conf path | ✅ clear error |
| Backup operation end-to-end | |
| Restore operation end-to-end | |
| cargo test | passed |

Testing in Docker / Kubernetes environments would be appreciated. A
docker-compose setup with a Kerberized Kafka and a KDC container would be a
valuable addition to the kafka-backup-demos repository.


Example log output

Successful authentication with keytab:

2026-04-20T21:36:23.802996Z DEBUG kafka_backup_core::kafka::gssapi: GSSAPI: KRB5_CONFIG=/etc/kafka/krb5.conf
2026-04-20T21:36:23.803055Z DEBUG kafka_backup_core::kafka::gssapi: GSSAPI: KRB5_CLIENT_KTNAME=/etc/security/keytabs/kafka-backup.keytab
2026-04-20T21:36:23.813211Z DEBUG kafka_backup_core::kafka::gssapi: GSSAPI: Phase 2 complete (server_layers=0x01, authz_id="")
2026-04-20T21:36:23.813724Z DEBUG kafka_backup_core::kafka::client: SASL GSSAPI authentication successful (broker: broker1.corp.com)

Invalid keytab with no ticket in the OS cache:

2026-04-20T21:31:38.105568Z WARN kafka_backup_core::health: Component kafka became Unhealthy:
  Some("Authentication error: GSSAPI: failed to acquire Kerberos credentials:
  No credentials were supplied, or the credentials were unavailable or inaccessible
  (No Kerberos credentials available (default cache: KCM:)).
  Check that either a valid ticket-granting ticket exists (run `kinit`) or that
  `sasl_keytab_path` in the security configuration points to a readable keytab file.")

Known limitations

  • Offset-reset commands (offset-reset, offset-reset-bulk,
    offset-rollback) currently default all GSSAPI fields to None in their
    parse_security_config helper. These commands will work with kinit
    credential cache but do not yet read sasl_keytab_path or
    sasl_krb5_config_path from YAML or CLI flags. This could be the focus of
    a future PR.

  • This implementation has only been tested on Linux (RHEL 9.7).


Checklist

  • Code follows the existing style and conventions of the project
  • New dependency (libgssapi 0.9) is justified and documented
  • No breaking changes to existing authentication methods or configurations
  • Behaviour is unchanged when sasl_mechanism is not GSSAPI
  • Error messages are descriptive and guide the user toward a resolution
  • Unit tests for gssapi.rs (a full Kerberos handshake test requires a live KDC; contributions welcome)
  • Integration test in Docker / Kubernetes with a Kerberized Kafka
  • Offset-reset commands: full GSSAPI support (potential future PR)

@sionsmith
Contributor

Thanks for tackling this — Kerberos is a known gap and the Phase 1 / Phase 2 state machine in gssapi.rs is solid work.

Heads-up that 0.14.0 (PR #94, currently in review) has since landed the SaslMechanismPlugin extension trait — a pluggable SASL dispatch path that was designed specifically to absorb mechanisms like GSSAPI, OAUTHBEARER, and MSK IAM without expanding the core enum or baking vendor-coupled deps into the default build. A few consequences for this PR as currently shaped:

  1. Merge conflict is structural, not textual. authenticate_raw() no longer exists on the post-0.14 branch — reconnect and initial-connect share a single dispatch function. The enum-plus-match-arm approach here doesn't have a place to hook in.
  2. libgssapi as an always-compiled dep forces libkrb5-dev / krb5-devel / brew install krb5 onto every CI runner, every release build, and every dev machine — for a feature the majority of users will never touch. A gssapi Cargo feature gate (default off) preserves the lean build.
  3. KIP-368 re-auth is live in 0.14.0. A plugin-shaped GSSAPI impl gets ticket refresh for free (broker's advertised session_lifetime_ms drives a scheduler that calls reauth_payload at ~80 % of the window). Your state machine already does a fresh gss_init_sec_context — the reauth surface fits naturally.
  4. unsafe { std::env::set_var(KRB5_CONFIG, …) } races other connects in a multi-client process. Process-wide serialisation during credential acquisition (or upstreaming a keytab-aware Cred::acquire) closes that gap.
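
For point 3, the scheduling arithmetic might look like the sketch below. The function name `reauth_delay` is hypothetical; only the roughly 80 % figure comes from the comment above, and treating a non-positive lifetime as "no re-authentication" is an assumption based on KIP-368, where a session lifetime of 0 means no limit.

```rust
use std::time::Duration;

/// Returns how long to wait before re-authenticating, at roughly 80 % of
/// the broker-advertised session lifetime. A non-positive lifetime is
/// treated as "no re-authentication needed".
fn reauth_delay(session_lifetime_ms: i64) -> Option<Duration> {
    if session_lifetime_ms <= 0 {
        return None;
    }
    let lifetime = session_lifetime_ms as u64;
    // Divide before multiplying so very large lifetimes cannot overflow;
    // this rounds down by at most 4 ms.
    Some(Duration::from_millis(lifetime / 5 * 4))
}
```

A one-hour session (3 600 000 ms) yields a delay of 48 minutes.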

Proposal. Rework as a GssapiPlugin: SaslMechanismPlugin under crates/kafka-backup-core/src/kafka/sasl/gssapi.rs, gated by a gssapi Cargo feature. ~90 % of the state machine you've written transfers verbatim — the Phase 1 round-driver, Phase 2 layer negotiation, and wrap/unwrap are exactly what the trait wants.

I've prototyped this on feat/gssapi-plugin (stacked on feat/sasl-mechanism-plugin, six commits: feature scaffolding → config surface → plugin → CLI wiring → Docker KDC E2E fixture → docs). Happy to pair with you on porting your logic across, or you're welcome to pick it up and drive to completion yourself — the skeleton's there. See docs/PRD-sasl-mechanism-plugin.md and examples/custom_sasl_plugin.rs in PR #94 for the trait contract.

Credit for the Phase 1/2 state machine stays yours either way.

@kthimjo
Author

kthimjo commented Apr 21, 2026

Hi Sion, thank you for the quick feedback; I appreciate it. My mistake for not being aware that draft PR #94 was created before this.
Your proposal makes perfect sense. I will look into your prototype and try to fit the GssapiClient into it in the next few days.

sionsmith added a commit that referenced this pull request Apr 21, 2026
Adds a `gssapi` cargo feature on both `kafka-backup-core` and
`kafka-backup-cli` (passthrough) with `default = []`, plus an optional
`libgssapi = "0.9"` workspace dependency. No logic changes — subsequent
commits build the `GssapiPlugin` impl behind this gate.

Default builds are unchanged and do not pull `libgssapi`. The gssapi
feature requires system krb5 development headers at build time:
- Debian/Ubuntu: `apt-get install libkrb5-dev`
- RHEL/Fedora: `dnf install krb5-devel`
- macOS: `brew install krb5` (then
  `export PKG_CONFIG_PATH="$(brew --prefix krb5)/lib/pkgconfig:$PKG_CONFIG_PATH"`)

Verified:
- `cargo check -p kafka-backup-core` (default, no libgssapi)
- `cargo check -p kafka-backup-core --features gssapi` (pulls libgssapi)
- `cargo check -p kafka-backup-cli` (default)
- `cargo check -p kafka-backup-cli --features gssapi` (passthrough works)

Part of the GSSAPI plugin rework superseding PR #95 (authored by
@kthimjo) on the `SaslMechanismPlugin` extension point.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
sionsmith added a commit that referenced this pull request Apr 21, 2026
Adds the always-present `SaslMechanism::Gssapi` enum variant and three
optional `SecurityConfig` fields backing it:

- `sasl_kerberos_service_name` — Kafka service principal (defaults to
  `kafka` at the CLI layer)
- `sasl_keytab_path` — keytab file path; OS credential cache is used
  if unset
- `sasl_krb5_config_path` — path to `krb5.conf`; system default if unset

All three are `#[serde(default)]` so existing configs keep parsing.
YAML round-trip tested: `sasl_mechanism: GSSAPI` (SCREAMING-KEBAB-CASE)
decodes to `SaslMechanism::Gssapi` and all three path fields populate.

The variant is always compiled (so the YAML surface is consistent
across binaries), but a working GSSAPI client requires the `gssapi`
cargo feature at the CLI level. Core's `authenticate()` surfaces a
clear error if `SaslMechanism::Gssapi` is set without a plugin — the
CLI installs a `GssapiPlugin` via `populate_sasl_plugin` in a later
commit.
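
A rough sketch of the "clear error without a plugin" behaviour follows. The trait `MechanismPlugin`, the type `NamedPlugin`, and the function `select_auth_path` are hypothetical stand-ins; the real `SaslMechanismPlugin` contract lives in PR #94 and is not reproduced here.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum SaslMechanism {
    Plain,
    ScramSha256,
    ScramSha512,
    Gssapi,
}

/// Hypothetical, reduced stand-in for the plugin trait.
trait MechanismPlugin {
    fn mechanism_name(&self) -> &str;
}

/// Trivial plugin used only to exercise the dispatch below.
struct NamedPlugin(&'static str);

impl MechanismPlugin for NamedPlugin {
    fn mechanism_name(&self) -> &str {
        self.0
    }
}

/// Decides which authentication path to take. GSSAPI is always a valid
/// configuration value, but it needs an installed plugin to do any work.
fn select_auth_path(
    mechanism: SaslMechanism,
    plugin: Option<&dyn MechanismPlugin>,
) -> Result<String, String> {
    match (mechanism, plugin) {
        (SaslMechanism::Gssapi, Some(p)) => Ok(format!("plugin:{}", p.mechanism_name())),
        (SaslMechanism::Gssapi, None) => Err(
            "sasl_mechanism is GSSAPI but no SASL plugin is installed; \
             build with the `gssapi` cargo feature"
                .to_string(),
        ),
        (other, _) => Ok(format!("builtin:{other:?}")),
    }
}
```

Keeping the enum variant unconditional means a config file with `sasl_mechanism: GSSAPI` parses in every build and fails at authentication time with an actionable message instead of failing at YAML decode time.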

Part of the GSSAPI plugin rework superseding PR #95 by @kthimjo.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
sionsmith added a commit that referenced this pull request Apr 21, 2026
Adds `kafka_backup_core::kafka::GssapiPlugin` behind the `gssapi`
cargo feature. The plugin is a state machine around
`libgssapi::context::ClientCtx` that implements RFC 4752 §3.1:

  Phase 1 — gss_init_sec_context rounds (Context → ContextInProgress)
  Phase 1→2 transition — empty turnaround token (AwaitingLayerProposal)
  Phase 2 — unwrap broker proposal, check 0x01 (no-security-layer) bit,
            wrap reply `0x01 0x00 0x00 0x00 | authz_id` (AwaitingFinalAck)
  Done — broker ack closes the handshake
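
The phase progression above can be modelled as a small transition table. This is a sketch only: the state names are borrowed from this commit message, while the `Event` enum and `next_state` function are hypothetical and the real plugin drives `ClientCtx` rather than abstract events.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Initial,
    ContextInProgress,
    AwaitingLayerProposal,
    AwaitingFinalAck,
    Done,
}

/// Hypothetical events that move the handshake forward.
#[derive(Debug, Clone, Copy)]
enum Event {
    /// A gss_init_sec_context token was sent and more rounds are needed.
    TokenSent,
    /// The GSS context is complete; the empty turnaround token was sent.
    ContextEstablished,
    /// The broker's layer proposal was accepted and the wrapped reply sent.
    ReplySent,
    /// The broker acknowledged the reply.
    BrokerAck,
}

/// Returns the next state, or an error for an out-of-order event.
fn next_state(state: State, event: Event) -> Result<State, String> {
    match (state, event) {
        (State::Initial, Event::TokenSent) => Ok(State::ContextInProgress),
        (State::ContextInProgress, Event::TokenSent) => Ok(State::ContextInProgress),
        (State::ContextInProgress, Event::ContextEstablished) => {
            Ok(State::AwaitingLayerProposal)
        }
        (State::AwaitingLayerProposal, Event::ReplySent) => Ok(State::AwaitingFinalAck),
        (State::AwaitingFinalAck, Event::BrokerAck) => Ok(State::Done),
        (s, e) => Err(format!("unexpected {e:?} in state {s:?}")),
    }
}
```

Rejecting out-of-order events explicitly is what lets a test such as "continue_payload before initial" fail with a descriptive error rather than corrupting the handshake.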

Notable design decisions:

- Interior mutability via `Arc<tokio::sync::Mutex<State>>` to bridge
  the trait's `&self` methods with `ClientCtx::step`'s `&mut`.
- Process-wide `KRB5_ENV_LOCK: tokio::sync::Mutex<()>` serialises
  `KRB5_CLIENT_KTNAME` / `KRB5_CONFIG` env-var mutation around
  `Cred::acquire`. libgssapi 0.9.1 does not expose a keytab-path
  argument, so env vars are the only route; without this lock,
  concurrent `KafkaClient`s would race. PR #95's unsynchronised
  `set_var` is the underlying issue this fixes.
- `reauth_payload` resets state to Initial and re-acquires a fresh
  credential + ClientCtx — Kerberos tickets expire and a stale
  context cannot be reused.
- Keytab existence is checked upfront in `new()` so misconfig fails
  fast at construction rather than mid-handshake.
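
The env-var serialisation can be sketched with the standard library alone. This version uses `std::sync::Mutex` instead of the `tokio::sync::Mutex` named above so that it stays self-contained, and the helper `with_krb5_env` is a hypothetical name; the closure stands in for the `Cred::acquire` call.

```rust
use std::env;
use std::path::Path;
use std::sync::Mutex;

/// Process-wide lock guarding mutation of the Kerberos env vars.
static KRB5_ENV_LOCK: Mutex<()> = Mutex::new(());

/// Sets the Kerberos env vars and runs `acquire` while holding the lock,
/// so two clients with different keytabs cannot interleave their writes.
/// Limitation of this sketch: variables are not restored afterwards, so a
/// later caller passing `None` inherits whatever was set before.
#[allow(unused_unsafe)]
fn with_krb5_env<T>(
    keytab: Option<&Path>,
    krb5_conf: Option<&Path>,
    acquire: impl FnOnce() -> T,
) -> T {
    let _guard = KRB5_ENV_LOCK
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner());
    // SAFETY (assumption): every writer of these variables goes through
    // this function, and no other thread reads the environment meanwhile.
    unsafe {
        if let Some(path) = keytab {
            env::set_var("KRB5_CLIENT_KTNAME", path);
        }
        if let Some(path) = krb5_conf {
            env::set_var("KRB5_CONFIG", path);
        }
    }
    acquire()
}
```

The lock only protects callers that go through it; any other code in the process that reads or writes the environment concurrently is outside its reach, which is why a keytab-aware constructor upstream would be the cleaner long-term fix.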

Day-1 spike result: libgssapi 0.9.1 exposes `Cred::acquire` and
`Cred::acquire_with_password` only; no keytab-aware constructor. The
env-var mutex is the correct mitigation for OSS until upstream gains
a keytab argument.

Tests (7 unit tests, feature-gated): Phase 2 proposal parser
(rejects <4 bytes, rejects missing 0x01 bit, accepts 0x01/0x07),
Phase 2 reply wire format, keytab-missing construction error,
continue_payload-before-initial poison, mechanism name. The full
gss_init_sec_context / wrap / unwrap round-trip is exercised by the
Docker E2E added in a later commit.

Part of the GSSAPI plugin rework superseding PR #95 by @kthimjo.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
sionsmith added a commit that referenced this pull request Apr 21, 2026
Ports two small improvements from PR #95:

- Build an OidSet containing GSS_MECH_KRB5 and pass it to Cred::acquire's
  desired-mechs parameter instead of None. Locks the mechanism to Kerberos 5
  rather than relying on the libgssapi default; matches the convention in
  librdkafka + the Java Kafka client.
- Pass Some(&GSS_MECH_KRB5) to ClientCtx::new for the same reason.

Plus one observability improvement adapted from PR #95: parse_phase2_proposal
now returns the observed layer mask and the caller emits it at DEBUG alongside
the authz_id when Phase 2 wrap succeeds, so a field report can distinguish
"broker offered 0x01" from "broker offered 0x07".

The thread-safe KRB5_ENV_LOCK + plugin-trait refactor + upfront keytab
validation + Docker E2E fixture + unit tests + CLI flag surface remain as they
were — those stay ours.

Co-authored-by: Krist Thimjo <krist.thimjo@intesasanpaolo.com>

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
sionsmith added a commit that referenced this pull request Apr 21, 2026
Adds a `gssapi` cargo feature on both `kafka-backup-core` and
`kafka-backup-cli` (passthrough) with `default = []`, plus an optional
`libgssapi = "0.9"` workspace dependency. No logic changes — subsequent
commits build the `GssapiPlugin` impl behind this gate.

Default builds are unchanged and do not pull `libgssapi`. The gssapi
feature requires system krb5 development headers at build time:
- Debian/Ubuntu: `apt-get install libkrb5-dev`
- RHEL/Fedora: `dnf install krb5-devel`
- macOS: `brew install krb5` (then
  `export PKG_CONFIG_PATH="$(brew --prefix krb5)/lib/pkgconfig:$PKG_CONFIG_PATH"`)

Verified:
- `cargo check -p kafka-backup-core` (default, no libgssapi)
- `cargo check -p kafka-backup-core --features gssapi` (pulls libgssapi)
- `cargo check -p kafka-backup-cli` (default)
- `cargo check -p kafka-backup-cli --features gssapi` (passthrough works)

Part of the GSSAPI plugin rework superseding PR #95 (authored by
@kthimjo) on the `SaslMechanismPlugin` extension point.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
sionsmith added a commit that referenced this pull request Apr 21, 2026
Adds the always-present `SaslMechanism::Gssapi` enum variant and three
optional `SecurityConfig` fields backing it:

- `sasl_kerberos_service_name` — Kafka service principal (defaults to
  `kafka` at the CLI layer)
- `sasl_keytab_path` — keytab file path; OS credential cache is used
  if unset
- `sasl_krb5_config_path` — path to `krb5.conf`; system default if unset

All three are `#[serde(default)]` so existing configs keep parsing.
YAML round-trip tested: `sasl_mechanism: GSSAPI` (SCREAMING-KEBAB-CASE)
decodes to `SaslMechanism::Gssapi` and all three path fields populate.

The variant is always compiled (so the YAML surface is consistent
across binaries), but a working GSSAPI client requires the `gssapi`
cargo feature at the CLI level. Core's `authenticate()` surfaces a
clear error if `SaslMechanism::Gssapi` is set without a plugin — the
CLI installs a `GssapiPlugin` via `populate_sasl_plugin` in a later
commit.

Part of the GSSAPI plugin rework superseding PR #95 by @kthimjo.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
sionsmith added a commit that referenced this pull request Apr 21, 2026
Adds `kafka_backup_core::kafka::GssapiPlugin` behind the `gssapi`
cargo feature. The plugin is a state machine around
`libgssapi::context::ClientCtx` that implements RFC 4752 §3.1:

  Phase 1 — gss_init_sec_context rounds (Context → ContextInProgress)
  Phase 1→2 transition — empty turnaround token (AwaitingLayerProposal)
  Phase 2 — unwrap broker proposal, check 0x01 (no-security-layer) bit,
            wrap reply `0x01 0x00 0x00 0x00 | authz_id` (AwaitingFinalAck)
  Done — broker ack closes the handshake

Notable design decisions:

- Interior mutability via `Arc<tokio::sync::Mutex<State>>` to bridge
  the trait's `&self` methods with `ClientCtx::step`'s `&mut`.
- Process-wide `KRB5_ENV_LOCK: tokio::sync::Mutex<()>` serialises
  `KRB5_CLIENT_KTNAME` / `KRB5_CONFIG` env-var mutation around
  `Cred::acquire`. libgssapi 0.9.1 does not expose a keytab-path
  argument, so env vars are the only route; without this lock,
  concurrent `KafkaClient`s would race. PR #95's unsynchronised
  `set_var` is the underlying issue this fixes.
- `reauth_payload` resets state to Initial and re-acquires a fresh
  credential + ClientCtx — Kerberos tickets expire and a stale
  context cannot be reused.
- Keytab existence is checked upfront in `new()` so misconfig fails
  fast at construction rather than mid-handshake.

Day-1 spike result: libgssapi 0.9.1 exposes `Cred::acquire` and
`Cred::acquire_with_password` only; no keytab-aware constructor. The
env-var mutex is the correct mitigation for OSS until upstream gains
a keytab argument.

Tests (7 unit tests, feature-gated): Phase 2 proposal parser
(rejects <4 bytes, rejects missing 0x01 bit, accepts 0x01/0x07),
Phase 2 reply wire format, keytab-missing construction error,
continue_payload-before-initial poison, mechanism name. The full
gss_init_sec_context / wrap / unwrap round-trip is exercised by the
Docker E2E added in a later commit.

Part of the GSSAPI plugin rework superseding PR #95 by @kthimjo.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
sionsmith added a commit that referenced this pull request Apr 21, 2026
Ports two small improvements from PR #95:

- Build an OidSet containing GSS_MECH_KRB5 and pass it to Cred::acquire's
  desired-mechs parameter instead of None. Locks the mechanism to Kerberos 5
  rather than relying on the libgssapi default; matches the convention in
  librdkafka + the Java Kafka client.
- Pass Some(&GSS_MECH_KRB5) to ClientCtx::new for the same reason.

Plus one observability improvement adapted from PR #95: parse_phase2_proposal
now returns the observed layer mask and the caller emits it at DEBUG alongside
the authz_id when Phase 2 wrap succeeds, so a field report can distinguish
"broker offered 0x01" from "broker offered 0x07".

The thread-safe KRB5_ENV_LOCK + plugin-trait refactor + upfront keytab
validation + Docker E2E fixture + unit tests + CLI flag surface remain as they
were — those stay ours.

Co-authored-by: Krist Thimjo <krist.thimjo@intesasanpaolo.com>

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
sionsmith added a commit that referenced this pull request Apr 21, 2026
Adds a `gssapi` cargo feature on both `kafka-backup-core` and
`kafka-backup-cli` (passthrough) with `default = []`, plus an optional
`libgssapi = "0.9"` workspace dependency. No logic changes — subsequent
commits build the `GssapiPlugin` impl behind this gate.

Default builds are unchanged and do not pull `libgssapi`. The gssapi
feature requires system krb5 development headers at build time:
- Debian/Ubuntu: `apt-get install libkrb5-dev`
- RHEL/Fedora: `dnf install krb5-devel`
- macOS: `brew install krb5` (then
  `export PKG_CONFIG_PATH="$(brew --prefix krb5)/lib/pkgconfig:$PKG_CONFIG_PATH"`)

Verified:
- `cargo check -p kafka-backup-core` (default, no libgssapi)
- `cargo check -p kafka-backup-core --features gssapi` (pulls libgssapi)
- `cargo check -p kafka-backup-cli` (default)
- `cargo check -p kafka-backup-cli --features gssapi` (passthrough works)

Part of the GSSAPI plugin rework superseding PR #95 (authored by
@kthimjo) on the `SaslMechanismPlugin` extension point.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
sionsmith added a commit that referenced this pull request Apr 21, 2026
Adds the always-present `SaslMechanism::Gssapi` enum variant and three
optional `SecurityConfig` fields backing it:

- `sasl_kerberos_service_name` — Kafka service principal (defaults to
  `kafka` at the CLI layer)
- `sasl_keytab_path` — keytab file path; OS credential cache is used
  if unset
- `sasl_krb5_config_path` — path to `krb5.conf`; system default if unset

All three are `#[serde(default)]` so existing configs keep parsing.
YAML round-trip tested: `sasl_mechanism: GSSAPI` (SCREAMING-KEBAB-CASE)
decodes to `SaslMechanism::Gssapi` and all three path fields populate.

The variant is always compiled (so the YAML surface is consistent
across binaries), but a working GSSAPI client requires the `gssapi`
cargo feature at the CLI level. Core's `authenticate()` surfaces a
clear error if `SaslMechanism::Gssapi` is set without a plugin — the
CLI installs a `GssapiPlugin` via `populate_sasl_plugin` in a later
commit.

Part of the GSSAPI plugin rework superseding PR #95 by @kthimjo.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
sionsmith added a commit that referenced this pull request Apr 21, 2026
Adds `kafka_backup_core::kafka::GssapiPlugin` behind the `gssapi`
cargo feature. The plugin is a state machine around
`libgssapi::context::ClientCtx` that implements RFC 4752 §3.1:

  Phase 1 — gss_init_sec_context rounds (Context → ContextInProgress)
  Phase 1→2 transition — empty turnaround token (AwaitingLayerProposal)
  Phase 2 — unwrap broker proposal, check 0x01 (no-security-layer) bit,
            wrap reply `0x01 0x00 0x00 0x00 | authz_id` (AwaitingFinalAck)
  Done — broker ack closes the handshake

Notable design decisions:

- Interior mutability via `Arc<tokio::sync::Mutex<State>>` to bridge
  the trait's `&self` methods with `ClientCtx::step`'s `&mut`.
- Process-wide `KRB5_ENV_LOCK: tokio::sync::Mutex<()>` serialises
  `KRB5_CLIENT_KTNAME` / `KRB5_CONFIG` env-var mutation around
  `Cred::acquire`. libgssapi 0.9.1 does not expose a keytab-path
  argument, so env vars are the only route; without this lock,
  concurrent `KafkaClient`s would race. PR #95's unsynchronised
  `set_var` is the underlying issue this fixes.
- `reauth_payload` resets state to Initial and re-acquires a fresh
  credential + ClientCtx — Kerberos tickets expire and a stale
  context cannot be reused.
- Keytab existence is checked upfront in `new()` so misconfig fails
  fast at construction rather than mid-handshake.

Day-1 spike result: libgssapi 0.9.1 exposes `Cred::acquire` and
`Cred::acquire_with_password` only; no keytab-aware constructor. The
env-var mutex is the correct mitigation for OSS until upstream gains
a keytab argument.

Tests (7 unit tests, feature-gated): Phase 2 proposal parser
(rejects <4 bytes, rejects missing 0x01 bit, accepts 0x01/0x07),
Phase 2 reply wire format, keytab-missing construction error,
continue_payload-before-initial poison, mechanism name. The full
gss_init_sec_context / wrap / unwrap round-trip is exercised by the
Docker E2E added in a later commit.

Part of the GSSAPI plugin rework superseding PR #95 by @kthimjo.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
sionsmith added a commit that referenced this pull request Apr 21, 2026
Ports two small improvements from PR #95:

- Build an OidSet containing GSS_MECH_KRB5 and pass it to Cred::acquire's
  desired-mechs parameter instead of None. Locks the mechanism to Kerberos 5
  rather than relying on the libgssapi default; matches the convention in
  librdkafka + the Java Kafka client.
- Pass Some(&GSS_MECH_KRB5) to ClientCtx::new for the same reason.

Plus one observability improvement adapted from PR #95: parse_phase2_proposal
now returns the observed layer mask and the caller emits it at DEBUG alongside
the authz_id when Phase 2 wrap succeeds, so a field report can distinguish
"broker offered 0x01" from "broker offered 0x07".

The thread-safe KRB5_ENV_LOCK + plugin-trait refactor + upfront keytab
validation + Docker E2E fixture + unit tests + CLI flag surface remain as they
were — those stay ours.

Co-authored-by: Krist Thimjo <krist.thimjo@intesasanpaolo.com>

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
sionsmith added a commit that referenced this pull request Apr 21, 2026
Adds a `gssapi` cargo feature on both `kafka-backup-core` and
`kafka-backup-cli` (passthrough) with `default = []`, plus an optional
`libgssapi = "0.9"` workspace dependency. No logic changes — subsequent
commits build the `GssapiPlugin` impl behind this gate.

Default builds are unchanged and do not pull `libgssapi`. The gssapi
feature requires system krb5 development headers at build time:
- Debian/Ubuntu: `apt-get install libkrb5-dev`
- RHEL/Fedora: `dnf install krb5-devel`
- macOS: `brew install krb5` (then
  `export PKG_CONFIG_PATH="$(brew --prefix krb5)/lib/pkgconfig:$PKG_CONFIG_PATH"`)

Verified:
- `cargo check -p kafka-backup-core` (default, no libgssapi)
- `cargo check -p kafka-backup-core --features gssapi` (pulls libgssapi)
- `cargo check -p kafka-backup-cli` (default)
- `cargo check -p kafka-backup-cli --features gssapi` (passthrough works)

Part of the GSSAPI plugin rework superseding PR #95 (authored by
@kthimjo) on the `SaslMechanismPlugin` extension point.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
sionsmith added a commit that referenced this pull request Apr 21, 2026
Adds the always-present `SaslMechanism::Gssapi` enum variant and three
optional `SecurityConfig` fields backing it:

- `sasl_kerberos_service_name` — Kafka service principal (defaults to
  `kafka` at the CLI layer)
- `sasl_keytab_path` — keytab file path; OS credential cache is used
  if unset
- `sasl_krb5_config_path` — path to `krb5.conf`; system default if unset

All three are `#[serde(default)]` so existing configs keep parsing.
YAML round-trip tested: `sasl_mechanism: GSSAPI` (SCREAMING-KEBAB-CASE)
decodes to `SaslMechanism::Gssapi` and all three path fields populate.

The variant is always compiled (so the YAML surface is consistent
across binaries), but a working GSSAPI client requires the `gssapi`
cargo feature at the CLI level. Core's `authenticate()` surfaces a
clear error if `SaslMechanism::Gssapi` is set without a plugin — the
CLI installs a `GssapiPlugin` via `populate_sasl_plugin` in a later
commit.

Part of the GSSAPI plugin rework superseding PR #95 by @kthimjo.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
sionsmith added a commit that referenced this pull request Apr 21, 2026
Adds `kafka_backup_core::kafka::GssapiPlugin` behind the `gssapi`
cargo feature. The plugin is a state machine around
`libgssapi::context::ClientCtx` that implements RFC 4752 §3.1:

  Phase 1 — gss_init_sec_context rounds (Context → ContextInProgress)
  Phase 1→2 transition — empty turnaround token (AwaitingLayerProposal)
  Phase 2 — unwrap broker proposal, check 0x01 (no-security-layer) bit,
            wrap reply `0x01 0x00 0x00 0x00 | authz_id` (AwaitingFinalAck)
  Done — broker ack closes the handshake

Notable design decisions:

- Interior mutability via `Arc<tokio::sync::Mutex<State>>` to bridge
  the trait's `&self` methods with `ClientCtx::step`'s `&mut`.
- Process-wide `KRB5_ENV_LOCK: tokio::sync::Mutex<()>` serialises
  `KRB5_CLIENT_KTNAME` / `KRB5_CONFIG` env-var mutation around
  `Cred::acquire`. libgssapi 0.9.1 does not expose a keytab-path
  argument, so env vars are the only route; without this lock,
  concurrent `KafkaClient`s would race. PR #95's unsynchronised
  `set_var` is the underlying issue this fixes.
- `reauth_payload` resets state to Initial and re-acquires a fresh
  credential + ClientCtx — Kerberos tickets expire and a stale
  context cannot be reused.
- Keytab existence is checked upfront in `new()` so misconfig fails
  fast at construction rather than mid-handshake.

Day-1 spike result: libgssapi 0.9.1 exposes `Cred::acquire` and
`Cred::acquire_with_password` only; no keytab-aware constructor. The
env-var mutex is the correct mitigation for OSS until upstream gains
a keytab argument.

Tests (7 unit tests, feature-gated): Phase 2 proposal parser
(rejects <4 bytes, rejects missing 0x01 bit, accepts 0x01/0x07),
Phase 2 reply wire format, keytab-missing construction error,
continue_payload-before-initial poison, mechanism name. The full
gss_init_sec_context / wrap / unwrap round-trip is exercised by the
Docker E2E added in a later commit.

Part of the GSSAPI plugin rework superseding PR #95 by @kthimjo.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
sionsmith added a commit that referenced this pull request Apr 21, 2026
Ports two small improvements from PR #95:

- Build an OidSet containing GSS_MECH_KRB5 and pass it to Cred::acquire's
  desired-mechs parameter instead of None. Locks the mechanism to Kerberos 5
  rather than relying on the libgssapi default; matches the convention in
  librdkafka + the Java Kafka client.
- Pass Some(&GSS_MECH_KRB5) to ClientCtx::new for the same reason.

Plus one observability improvement adapted from PR #95: parse_phase2_proposal
now returns the observed layer mask and the caller emits it at DEBUG alongside
the authz_id when Phase 2 wrap succeeds, so a field report can distinguish
"broker offered 0x01" from "broker offered 0x07".

The thread-safe KRB5_ENV_LOCK + plugin-trait refactor + upfront keytab
validation + Docker E2E fixture + unit tests + CLI flag surface remain as they
were — those stay ours.

Co-authored-by: Krist Thimjo <krist.thimjo@intesasanpaolo.com>

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>