netfilter: add raw table and no-op CT target for Istio DNS capture #12688

Open
a7i wants to merge 1 commit into google:master from a7i:fix/raw-table-ct-target

Conversation

a7i (Contributor) commented Mar 8, 2026

Summary

Add the iptables raw table and a no-op CT (conntrack zone) target to gVisor's netfilter implementation. This enables Istio's istio-init container to apply iptables rules when DNS capture is enabled (ISTIO_META_DNS_CAPTURE=true).

Problem

When Istio DNS capture is enabled, istio-iptables generates iptables-restore input containing both * nat and * raw table sections. The raw table rules use -j CT --zone N targets for conntrack zone isolation between Envoy's DNS queries and application DNS queries. gVisor previously implemented only the nat, mangle, and filter tables, so iptables-restore failed with:

iptables-restore: unable to initialize table 'raw'

This blocks Istio service mesh adoption on gVisor when DNS capture is required.

Approach

Raw table: Added as a new TableID (RawID) with PREROUTING and OUTPUT hooks, matching the Linux kernel's raw table. Wired into CheckPrerouting() and CheckOutput() as the first table checked (before mangle), matching Linux's netfilter hook priority ordering:

  • Linux hook order: raw → conntrack → mangle → nat → filter
  • gVisor hook order (now): raw → mangle → nat (filter is separate)

CT target: Implemented as a no-op that accepts packets without modifying conntrack behavior. The target parses the xt_ct_target_info (revision 0) struct from userspace and stores the zone value, but does not apply zone-based conntrack isolation. This is intentional:

  • gVisor's conntrack implementation does not support zones
  • The CT target's purpose in Istio is to prevent conntrack table collisions between Envoy (UID 1337) and application DNS traffic
  • DNS redirection still works correctly via the nat table's REDIRECT rules to port 15053
  • The lack of zone tracking may cause rare conntrack 5-tuple collisions under heavy concurrent DNS load, but this is acceptable for gVisor's sandboxed environment
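
The no-op behaviour described above can be sketched as follows. The types here are simplified stand-ins that only borrow the names `CTTarget` and `RuleAccept` from this PR; the `Packet` type and its `ConntrackZone` field are hypothetical, and the real `Action()` signature in `pkg/tcpip/stack` takes different arguments.

```go
package main

import "fmt"

// RuleVerdict is a simplified stand-in for the verdict a target returns.
type RuleVerdict int

const (
	RuleAccept RuleVerdict = iota
	RuleDrop
)

// Packet is a hypothetical packet model. ConntrackZone represents the state
// that a zone-aware conntrack would attach to the packet.
type Packet struct {
	Payload       string
	ConntrackZone uint16
}

// CTTarget remembers the zone requested by the rule but never applies it.
type CTTarget struct {
	Zone uint16
}

// Action accepts the packet and deliberately leaves pkt.ConntrackZone alone.
func (t *CTTarget) Action(pkt *Packet) RuleVerdict {
	return RuleAccept
}

func main() {
	pkt := &Packet{Payload: "dns-query"}
	target := &CTTarget{Zone: 1}
	fmt.Println(target.Action(pkt) == RuleAccept, pkt.ConntrackZone)
}
```

The stored `Zone` is kept only so the rule can be reported back to userspace unchanged; it has no effect on how the packet is tracked.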

How Linux and other runtimes handle this:

  • Linux kernel: Full raw table with CT --zone support via nf_conntrack_zones
  • runc / kata: Delegate to the host Linux kernel, so they get full support for free
  • gVisor: Must implement in userspace netstack — this PR adds the table/target scaffolding with a no-op CT action

Changes

  • pkg/tcpip/stack/iptables.go: Add RawID to TableID enum, EmptyRawTable(), default table entries for IPv4/IPv6, wire into CheckPrerouting() and CheckOutput()
  • pkg/tcpip/stack/iptables_targets.go: Add CTTarget struct with no-op Action() returning RuleAccept
  • pkg/abi/linux/netfilter.go: Add XTCTTargetInfoV0 ABI struct (72 bytes) matching Linux's xt_ct_target_info
  • pkg/sentry/socket/netfilter/netfilter.go: Register raw table in nameToID, SetEntries, and DefaultLinuxTables
  • pkg/sentry/socket/netfilter/ct_target.go: New file — ctTarget wrapper and ctTargetMaker with marshal/unmarshal
  • pkg/sentry/socket/netfilter/targets.go: Register ctTargetMaker for IPv4 and IPv6
  • pkg/sentry/socket/netfilter/BUILD: Add ct_target.go to srcs
  • test/syscalls/linux/iptables.cc: Add RawTableInitialState test (gVisor-only) and CTTargetGetRevision test

Testing

  • RawTableInitialState: Verifies IPT_SO_GET_INFO for the "raw" table returns correct valid_hooks (PREROUTING + OUTPUT), num_entries (3), and entry sizes
  • CTTargetGetRevision: Verifies IPT_SO_GET_REVISION_TARGET for "CT" target revision 0 succeeds
  • Manual end-to-end test: Built runsc with this change (plus #12686, which increases maxOptLen from 8KB to 32KB), deployed to an aarch64 node, and verified Istio istio-init with ISTIO_META_DNS_CAPTURE=true completes successfully: the full iptables-restore input including both * nat and * raw sections is applied without error
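
The values checked by RawTableInitialState can be derived by hand. The sketch below uses the Linux `NF_INET_*` hook numbers and assumes the usual initial-table shape of one policy entry per hook plus a trailing error entry. The Go names (`validHooks`, `expectedEntries`, and the `hook*` constants) are hypothetical.

```go
package main

import "fmt"

// Hook numbers as defined by Linux's NF_INET_* constants.
const (
	hookPrerouting  = 0
	hookLocalIn     = 1
	hookForward     = 2
	hookLocalOut    = 3
	hookPostrouting = 4
)

// validHooks builds the valid_hooks bitmask: one bit per hook the table uses.
func validHooks(hooks ...uint) uint32 {
	var mask uint32
	for _, h := range hooks {
		mask |= 1 << h
	}
	return mask
}

// expectedEntries assumes one policy entry per hook plus one trailing
// error entry that terminates the table.
func expectedEntries(numHooks int) int {
	return numHooks + 1
}

func main() {
	// The raw table uses PREROUTING and OUTPUT only.
	fmt.Printf("valid_hooks=%#x num_entries=%d\n",
		validHooks(hookPrerouting, hookLocalOut), expectedEntries(2))
}
```

With PREROUTING as bit 0 and OUTPUT as bit 3 the mask is 0x9, and two hooks give the three entries the test expects.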

Related

  • Fixes #12685
  • Depends on #12686 (maxOptLen increase) for large Istio rulesets

Add the iptables `raw` table and a no-op `CT` (conntrack) target to
gVisor's netfilter implementation. This enables Istio's istio-init
container to successfully apply iptables rules when DNS capture is
enabled (ISTIO_META_DNS_CAPTURE=true).

When DNS capture is enabled, Istio generates iptables-restore input
containing both `* nat` and `* raw` table sections. The raw table
rules use `-j CT --zone N` for conntrack zone isolation. Previously,
gVisor only implemented nat, mangle, and filter tables, causing
iptables-restore to fail with "unable to initialize table 'raw'".

Changes:
- Add RawID to the TableID enum with default PREROUTING and OUTPUT
  hooks, matching Linux's raw table
- Wire raw table into CheckPrerouting() and CheckOutput() as the
  first table checked (before mangle), matching Linux's hook order
- Add EmptyRawTable() for SetEntries() to use when receiving rules
- Add CTTarget as a no-op target that returns RuleAccept (gVisor's
  conntrack does not implement zones, but accepting the rules allows
  iptables-restore to succeed)
- Add XTCTTargetInfoV0 ABI struct matching Linux's xt_ct_target_info
- Register CT target maker for both IPv4 and IPv6
- Add C++ tests for raw table initial state and CT target revision

The CT target is intentionally a no-op: it stores the zone value but
does not modify conntrack behavior. DNS redirection still works via
the nat table's REDIRECT rules. The lack of zone tracking may cause
rare conntrack collisions but is acceptable for gVisor's use case.

Fixes: google#12685

Signed-off-by: amiralavi7@gmail.com
parth-opensrc self-assigned this Mar 11, 2026
parth-opensrc added the "ready to pull" and "area: networking" labels Mar 11, 2026
copybara-service bot pushed a commit that referenced this pull request Mar 12, 2026
FUTURE_COPYBARA_INTEGRATE_REVIEW=#12688 from a7i:fix/raw-table-ct-target 06aa774
PiperOrigin-RevId: 882273534
Labels

area: networking, ready to pull

Development

Successfully merging this pull request may close these issues.

Istio istio-init fails on gVisor: maxOptLen 8KB limit + missing raw table block iptables-restore

2 participants