Skip to content

chore: revert double-quoted zone names (#6133)#8184

Closed
6ixfalls wants to merge 3 commits intoenvoyproxy:mainfrom
6ixfalls:chore/fix_node_label
Closed

chore: revert double-quoted zone names (#6133)#8184
6ixfalls wants to merge 3 commits intoenvoyproxy:mainfrom
6ixfalls:chore/fix_node_label

Conversation

@6ixfalls
Copy link
Copy Markdown
Contributor

@6ixfalls 6ixfalls commented Feb 4, 2026

What type of PR is this?

What this PR does / why we need it: Reverts PR #6133 which double-quotes the zone value. This is incorrect, as the zone value should already be a string. The resulting environment variable has extra double quotes.

Which issue(s) this PR fixes:

Fixes #7492

Release Notes: Yes

@6ixfalls 6ixfalls requested a review from a team as a code owner February 4, 2026 22:37
@netlify
Copy link
Copy Markdown

netlify Bot commented Feb 4, 2026

Deploy Preview for cerulean-figolla-1f9435 canceled.

Name Link
🔨 Latest commit 99b066d
🔍 Latest deploy log https://app.netlify.com/projects/cerulean-figolla-1f9435/deploys/698671862b01d3000836c3ff

@6ixfalls 6ixfalls force-pushed the chore/fix_node_label branch from 559cfd3 to c093e2d Compare February 4, 2026 22:51
Signed-off-by: Bryan Lee <me@bryanl.ee>
@6ixfalls 6ixfalls force-pushed the chore/fix_node_label branch from c093e2d to 4c05eaa Compare February 4, 2026 22:59
Comment thread tools/hack/create-cluster.sh Outdated
- role: control-plane
labels:
"topology.kubernetes.io/zone": 0
"topology.kubernetes.io/zone": "0"
Copy link
Copy Markdown
Contributor

@jukie jukie Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like purely a revert, did you add any extra handling for numerical zones? Before #6133 this wouldn't work in Azure. Have you tested your changes with unquoted numerical zones as input?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a difficult problem - it's not even possible to set numerical zones directly with the Kubernetes API as it doesn't match the expected type of map[string]string. In theory, however, the problem should never happen, as the templated YAML files already quote the zone value. I'll run some tests now, but even with an unquoted numerical zone as an input my guess is it will still work properly.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you add changes or was my understanding correct that this simply reverts #6133? If so, I'm not sure this is correct because azure users were broken prior to that due to parsing the numerical zone.

The issue is on the EG parsing side (either topology injector or bootstrap template) rather than Kubernetes but unless something else has changed we shouldn't revert it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Azure is doing something very wrong in that case (but it may make sense to implement a fix ourselves). However, even with the new create-cluster.sh that sets a number as the zone label on the node, the resulting node definition properly quotes the zone value, so I'm unable to reproduce the problem. A bug report to Azure makes more sense.

Copy link
Copy Markdown
Contributor Author

@6ixfalls 6ixfalls Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even when a number is somehow set as the zone label, EG should handle it completely fine, since all the strings are quoted in the template anyway.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a different theory for the cause of #6117?
I agree the double-quotes are a bad approach and I'm open to this change but we need to pinpoint exactly what was broken for azure users prior to #6133 and why your changes won't have the same issues. Are you able to test this in Azure to verify?

Copy link
Copy Markdown
Contributor

@jukie jukie Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've found the root cause — it's deeper than the annotation quoting. The mergeBootstrap() and jsonPatchBootstrap() functions round-trip the bootstrap yaml through protobuf/JSON, and yaml.JSONToYAML strips the yaml-level quotes on the way back out.

This means even with quotes in the template, the final rendered bootstrap that Envoy receives ends up with an unquoted $(ENVOY_SERVICE_ZONE).

This would specifically affect users who define a custom spec.bootstrap and have numeric zone values (e.g. Azure zones). You can verify by looking at the existing testdata where the zone value comes through unquoted after the round-trip.

A complete fix would combine your template quoting with post-processing in mergeBootstrap/jsonPatchBootstrap to re-quote $(...) env var patterns after the round-trip strips them. I started on an example approach here to confirm — https://github.com/envoyproxy/gateway/compare/main...jukie:gateway:quoted-zones-bug?expand=1 — but would rather let you carry this PR forward so feel free to pull from it or take a different direction. Happy to help if you have questions!

Copy link
Copy Markdown
Contributor Author

@6ixfalls 6ixfalls Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jukie, does this need to be solved generically with regex, or is a simple string replace ($(ENVOY_SERVICE_ZONE) -> "$(ENVOY_SERVICE_ZONE)") fine? I don't see any other place where this numerical issue would occur other than $(ENVOY_POD_NAME), which is unlikely to be a number. Alternatively, a forked YAML library could be used to force quotes (but that breaks the YAML spec).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah a string replace would be reasonable too, and yeah we only need handling for ENVOY_SERVICE_ZONE

@codecov
Copy link
Copy Markdown

codecov Bot commented Feb 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.78%. Comparing base (9bfb386) to head (68ed69c).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8184      +/-   ##
==========================================
- Coverage   73.83%   73.78%   -0.05%     
==========================================
  Files         241      241              
  Lines       36579    36579              
==========================================
- Hits        27007    26990      -17     
- Misses       7669     7681      +12     
- Partials     1903     1908       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@jukie
Copy link
Copy Markdown
Contributor

jukie commented Feb 6, 2026

Could you please explain why this wasn't necessary given it seemed to be the resolution for #6117? If we don't need it, that's great, but I want to ensure that numerical zone values are handled correctly.

Signed-off-by: Bryan Lee <me@bryanl.ee>
@6ixfalls 6ixfalls force-pushed the chore/fix_node_label branch from 300e03f to 99b066d Compare February 6, 2026 22:56
@jukie
Copy link
Copy Markdown
Contributor

jukie commented Feb 7, 2026

@6ixfalls can you run make gen-check locally so that CI passes and the e2e's can run?

@6ixfalls
Copy link
Copy Markdown
Contributor Author

Closing this PR as the current code works as intended. I wanted to use the zone value in a directResponse template (through environment variables, which is why the double-quoting behavior confused me initially), but with #7830 and kubernetes/enhancements#5146 this change is unnecessary.

@6ixfalls 6ixfalls closed this Feb 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Revert quoted value zone annotation in topology injector

2 participants