netfilter: add raw table and no-op CT target for Istio DNS capture #12722
Open
copybara-service[bot] wants to merge 1 commit into master
## Summary

Add the iptables `raw` table and a no-op `CT` (conntrack zone) target to gVisor's netfilter implementation. This enables Istio's `istio-init` container to apply iptables rules when DNS capture is enabled (`ISTIO_META_DNS_CAPTURE=true`).

## Problem

When Istio DNS capture is enabled, `istio-iptables` generates `iptables-restore` input containing both `*nat` and `*raw` table sections. The `raw` table rules use `-j CT --zone N` targets for conntrack zone isolation between Envoy's DNS queries and application DNS queries. gVisor previously implemented only the `nat`, `mangle`, and `filter` tables, causing `iptables-restore` to fail with:

```
iptables-restore: unable to initialize table 'raw'
```

This blocks Istio service mesh adoption on gVisor when DNS capture is required.

## Approach

**Raw table**: Added as a new `TableID` (`RawID`) with `PREROUTING` and `OUTPUT` hooks, matching the Linux kernel's raw table. Wired into `CheckPrerouting()` and `CheckOutput()` as the **first** table checked (before mangle), matching Linux's netfilter hook priority ordering:

- Linux hook order: raw → conntrack → mangle → nat → filter
- gVisor hook order (now): raw → mangle → nat (filter is separate)

**CT target**: Implemented as a **no-op** that accepts packets without modifying conntrack behavior. The target parses the `xt_ct_target_info` (revision 0) struct from userspace and stores the zone value, but does not apply zone-based conntrack isolation. This is intentional:

- gVisor's conntrack implementation does not support zones
- The CT target's purpose in Istio is to prevent conntrack table collisions between Envoy (UID 1337) and application DNS traffic
- DNS redirection still works correctly via the `nat` table's `REDIRECT` rules to port 15053
- The lack of zone tracking may cause rare conntrack 5-tuple collisions under heavy concurrent DNS load, but this is acceptable for gVisor's sandboxed environment

**How Linux and other runtimes handle this**:

- **Linux kernel**: Full `raw` table with `CT --zone` support via `nf_conntrack_zones`
- **runc / kata**: Delegate to the host Linux kernel, so they get full support for free
- **gVisor**: Must implement this in the userspace netstack; this PR adds the table/target scaffolding with a no-op CT action

## Changes

- `pkg/tcpip/stack/iptables.go`: Add `RawID` to `TableID` enum, `EmptyRawTable()`, default table entries for IPv4/IPv6, wire into `CheckPrerouting()` and `CheckOutput()`
- `pkg/tcpip/stack/iptables_targets.go`: Add `CTTarget` struct with no-op `Action()` returning `RuleAccept`
- `pkg/abi/linux/netfilter.go`: Add `XTCTTargetInfoV0` ABI struct (72 bytes) matching Linux's `xt_ct_target_info`
- `pkg/sentry/socket/netfilter/netfilter.go`: Register `raw` table in `nameToID`, `SetEntries`, and `DefaultLinuxTables`
- `pkg/sentry/socket/netfilter/ct_target.go`: New file containing the `ctTarget` wrapper and `ctTargetMaker` with marshal/unmarshal
- `pkg/sentry/socket/netfilter/targets.go`: Register `ctTargetMaker` for IPv4 and IPv6
- `pkg/sentry/socket/netfilter/BUILD`: Add `ct_target.go` to srcs
- `test/syscalls/linux/iptables.cc`: Add `RawTableInitialState` test (gVisor-only) and `CTTargetGetRevision` test

## Testing

- `RawTableInitialState`: Verifies that `IPT_SO_GET_INFO` for the "raw" table returns the correct `valid_hooks` (PREROUTING + OUTPUT), `num_entries` (3), and entry sizes
- `CTTargetGetRevision`: Verifies that `IPT_SO_GET_REVISION_TARGET` for "CT" target revision 0 succeeds
- **Manual end-to-end test**: Built `runsc` with this change (plus #12686), deployed to an aarch64 node, and verified that Istio `istio-init` with `ISTIO_META_DNS_CAPTURE=true` completes successfully: the full `iptables-restore` input, including both the `*nat` and `*raw` sections, is applied without error

## Related

- Fixes #12685
- Depends on #12686 (maxOptLen increase) for large Istio rulesets

FUTURE_COPYBARA_INTEGRATE_REVIEW=#12688 from a7i:fix/raw-table-ct-target 06aa774
PiperOrigin-RevId: 882273534
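The hook ordering and the no-op CT behavior described under Approach can be illustrated with a small, self-contained Go sketch. This is not the code in this PR: `Packet`, `Rule`, `Table`, `RedirectTarget`, `defaultTables`, and `checkOutput` are hypothetical stand-ins invented for illustration, and the real `CTTarget` in `pkg/tcpip/stack` has a different `Action()` signature. The sketch only shows that a `raw` table evaluated first with a CT target that accepts without side effects leaves the later `nat` redirect to port 15053 intact.

```go
package main

import "fmt"

// Verdict is a simplified stand-in for a netfilter rule verdict.
type Verdict int

const (
	Accept Verdict = iota
	Drop
)

// Packet is a hypothetical minimal packet; only the destination port is modeled.
type Packet struct {
	DstPort uint16
}

// Target decides what happens to a packet that matched a rule.
type Target interface {
	Action(p *Packet) Verdict
}

// CTTarget models the no-op idea: the zone is stored but never applied.
type CTTarget struct {
	Zone uint16
}

// Action accepts the packet without touching it or any conntrack state.
func (CTTarget) Action(*Packet) Verdict { return Accept }

// RedirectTarget rewrites the destination port, loosely like nat REDIRECT.
type RedirectTarget struct {
	Port uint16
}

// Action rewrites the destination port and accepts the packet.
func (t RedirectTarget) Action(p *Packet) Verdict {
	p.DstPort = t.Port
	return Accept
}

// Rule matches on destination port only (an assumption made for brevity).
type Rule struct {
	MatchPort uint16
	Target    Target
}

// Table is an ordered list of rules with a name.
type Table struct {
	Name  string
	Rules []Rule
}

// defaultTables returns tables in the order raw, mangle, nat.
func defaultTables() []Table {
	return []Table{
		{Name: "raw", Rules: []Rule{{MatchPort: 53, Target: CTTarget{Zone: 1}}}},
		{Name: "mangle"},
		{Name: "nat", Rules: []Rule{{MatchPort: 53, Target: RedirectTarget{Port: 15053}}}},
	}
}

// checkOutput walks the tables in order and stops at the first Drop.
// It returns the final verdict and the names of the tables visited.
func checkOutput(tables []Table, p *Packet) (Verdict, []string) {
	var visited []string
	for _, t := range tables {
		visited = append(visited, t.Name)
		for _, r := range t.Rules {
			if r.MatchPort != p.DstPort {
				continue
			}
			if r.Target.Action(p) == Drop {
				return Drop, visited
			}
			break // in this simplified model the first matching rule ends the table
		}
	}
	return Accept, visited
}

func main() {
	p := &Packet{DstPort: 53}
	v, visited := checkOutput(defaultTables(), p)
	fmt.Println(v == Accept, visited, p.DstPort) // prints "true [raw mangle nat] 15053"
}
```

Because the CT rule in `raw` returns `Accept` and leaves the packet untouched, traversal continues into `mangle` and `nat`, where the DNS packet is still redirected. This models why the no-op is sufficient for `iptables-restore` to succeed and for redirection to work, even though no zone isolation takes place.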