fix: generate json_sanitizer fuzz corpus binary seed at build time (resolves Scorecard Binary-Artifacts alert #17)#44805

Closed
Copilot wants to merge 2 commits into main from copilot/remove-binary-executable-artifact

Copilot AI commented May 1, 2026

test/common/json/json_sanitizer_corpus/binary_file (5698 bytes, added in #20637) was flagged by the OSSF Scorecard Binary-Artifacts check as a checked-in binary artifact, although it is not an executable. It is a libFuzzer corpus seed with no documented provenance: it was committed without comment in the same PR that introduced the non-UTF-8 code path it exercises, has never been modified, and is not referenced by any CVE or OSS-Fuzz report. The exact bytes are not load-bearing; the only requirement is that the seed reaches the octal-escape fallback path in sanitize().

Changes

  • test/common/json/gen_binary_corpus.sh (new): a deterministic bash generator that produces a 5698-byte blob covering every single-byte value; truncated, invalid, and surrogate UTF-8 sequences; an overlong NUL encoding; JSON-significant ASCII interleaved with high-bit bytes; and a pseudo-random tail (b[i] = (b[i-1]*31+7) & 0xff) that fills the blob to the original size. It uses set -euo pipefail and LC_ALL=C, writes to $1 with a single output redirection, and creates no temp files.

  • test/common/json/json_sanitizer_corpus/binary_file (deleted): replaced by the genrule output.

  • test/common/json/BUILD (modified):

    • genrule json_sanitizer_corpus_binary_file runs gen_binary_corpus.sh to emit json_sanitizer_corpus/binary_file at build time.
    • filegroup json_sanitizer_corpus_files unions the glob(["json_sanitizer_corpus/**"]) text seeds with the genrule output.
    • json_sanitizer_fuzz_test updated to corpus = ":json_sanitizer_corpus_files". The envoy_cc_fuzz_test macro already passes label-form corpus arguments through directly, so no macro changes are needed.
```starlark
genrule(
    name = "json_sanitizer_corpus_binary_file",
    srcs = ["gen_binary_corpus.sh"],
    outs = ["json_sanitizer_corpus/binary_file"],
    cmd = "bash $(location gen_binary_corpus.sh) $@",
)

filegroup(
    name = "json_sanitizer_corpus_files",
    testonly = 1,
    srcs = glob(["json_sanitizer_corpus/**"]) + [":json_sanitizer_corpus_binary_file"],
)

envoy_cc_fuzz_test(
    name = "json_sanitizer_fuzz_test",
    corpus = ":json_sanitizer_corpus_files",
    # ...
)
```
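A minimal sketch of what gen_binary_corpus.sh could look like, following the byte layout described in the Changes list and in the original prompt. It is not the script from this PR: the function names (emit, emit_ascii, gen_binary_corpus), the exact pairing of JSON-significant bytes with high-bit bytes, and the "Hello," spacing are assumptions. Because bash variables cannot hold NUL bytes, the sketch accumulates printf octal escapes (\NNN, always 4 characters per byte) and writes them with one printf call, which also turns the final truncation into a simple substring.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of test/common/json/gen_binary_corpus.sh; the PR's
# actual script may differ. Writes a deterministic 5698-byte seed aimed at
# the non-UTF-8 / octal-escape path of Envoy::Json::sanitize().
set -euo pipefail
export LC_ALL=C

TARGET_SIZE=5698

# Queue one byte (integer 0..255, decimal or 0x hex) as an octal escape such
# as '\302'. Raw bytes are never stored because bash variables cannot hold
# NUL. Updates the caller's buf/count/last via bash dynamic scoping.
emit() {
  local esc
  printf -v esc '\\%03o' "$(( $1 ))"
  buf+=$esc
  last=$(( $1 ))
  count=$(( count + 1 ))
}

# Queue each character of a plain ASCII string.
emit_ascii() {
  local s=$1 i code
  for (( i = 0; i < ${#s}; i++ )); do
    printf -v code '%d' "'${s:i:1}"
    emit "$code"
  done
}

gen_binary_corpus() {
  local out=$1 buf="" count=0 last=0 v h b
  local -a sig

  # Every single-byte value 0x00..0xff.
  for (( v = 0; v < 256; v++ )); do emit "$v"; done

  # Truncated starters, invalid continuation, surrogate, overlong NUL.
  for b in 0xc2 0xe0 0xa4 0xf0 0x9d 0x84 0xc2 0xff 0xed 0xa0 0x80 0xc0 0x80; do
    emit "$b"
  done

  # JSON-significant ASCII ('"', '\', 0x00..0x1f, 0x7f), each followed by one
  # high-bit byte; cycling through the list covers all of 0x80..0xff once.
  sig=(34 92)
  for (( v = 0; v < 32; v++ )); do sig+=("$v"); done
  sig+=(127)
  for (( h = 128; h < 256; h++ )); do
    emit "${sig[$(( (h - 128) % ${#sig[@]} ))]}"
    emit "$h"
  done

  # High-bit bytes embedded in ASCII context.
  emit_ascii 'Hello,'
  for b in 0xff 0xfe 0xfd; do emit "$b"; done
  emit_ascii ', world!'

  # Pseudo-random tail seeded by the last prefix byte:
  # b[i] = (b[i-1] * 31 + 7) & 0xff.
  while (( count < TARGET_SIZE )); do
    emit "$(( (last * 31 + 7) & 0xff ))"
  done

  # Truncate to exactly TARGET_SIZE bytes (every escape is 4 characters).
  buf=${buf:0:$(( TARGET_SIZE * 4 ))}

  # buf holds only '\NNN' escapes (no '%'), so it is safe as a format string.
  # shellcheck disable=SC2059
  printf "$buf" > "$out"
}

# Genrule entry point (`bash gen_binary_corpus.sh $@`). With no argument,
# e.g. when sourced from a test, only the functions are defined.
if (( $# >= 1 )); then
  gen_binary_corpus "$1"
fi
```

Run locally as `bash gen_binary_corpus.sh /tmp/binary_file`; `od -An -tx1 -N16 /tmp/binary_file` should then show the byte values 00 through 0f, since the blob opens with all 256 single-byte values in order.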

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • ftpmirror.gnu.org
    • Triggering command: /build/bazel_root/install/fb2a7f6d344d2f4e335882534df59296/embedded_tools/jdk/bin/java bazel(envoy) --add-opens=java.base/java.lang=ALL-UNNAMED -Xverify:none -Djava.util.logging.config.file=/build/bazel_root/base/javalog.properties -Dcom.google.devtools.build.lib.util.LogHandlerQuerier.class=com.google.devtools.build.lib.util.SimpleLogHandler$HandlerQuerier -XX:-MaxFDLimit -Djava.library.path=/build/bazel_root/install/fb2a7f6d344d2f4e335882534df59296/embedded_tools/jdk/lib:/build/bazel_root/install/fb2a7f6d344d2f4e335882534df59296/embedded_tools/jdk/lib/server:/build/bazel_root/install/fb2a7f6d344d2f4e335882534df59296/ -Dfile.encoding=ISO-8859-1 -Duser.country= -Duser.language= -Duser.variant= -Xmx3g -DBAZEL_TRACK_SOURCE_DIRECTORIES=1 -Djavax.net.ssl.trustStore=/tmp/custom-cacerts -Djavax.net.ssl.trustStorePassword=changeit -jar /build/bazel_root/install/fb2a7f6d344d2f4e335882534df59296/A-server.jar --max_idle_secs=10800 --noshutdown_on_low_sys_mem --connect_timeout_secs=30 (dns block)
  • mirror.bazel.build
    • Triggering command: the same Bazel server JVM invocation as above (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Original prompt

Background

Code Scanning alert #17 (OSSF Scorecard Binary-Artifacts, severity High) flags test/common/json/json_sanitizer_corpus/binary_file as a checked-in binary executable artifact. It is not actually an executable — it is a 5,698-byte opaque seed input for the libFuzzer-driven json_sanitizer_fuzz_test (declared via corpus = "json_sanitizer_corpus" in test/common/json/BUILD).

Provenance investigation (already done)

  • Added in PR #20637 (commit 1c66d57, 2022-04-08) as a single commit; never modified since.
  • Not present in the predecessor PR #20428 (json: add hand-rolled json sanitizer).
  • The non-UTF-8 code path it seeds was introduced in the same PR, so it cannot be a captured pre-existing crasher.
  • The PR description, commit message, and ~30 review comments on #20637 (admin: Json nlohmann sanitizer) contain no mention of binary_file. No CVE, OSS-Fuzz report, issue, or commit anywhere in the repo references it. It is almost certainly libFuzzer-mutated bytes from a local dev run, dropped in alongside the obvious text seeds (lower_case, punctuation, etc.).
  • The exact bytes are not load-bearing. The fuzz harness (json_sanitizer_fuzz_test.cc) uses this seed only to exercise the non-UTF-8 / octal-escape fallback path in Envoy::Json::sanitize(); libFuzzer mutates from there.

Goal

Stop checking in binary_file and instead generate it deterministically at build time via a Bazel genrule. This resolves the Scorecard finding cleanly (it's no longer a checked-in binary artifact) without gaming the heuristic and without losing any test coverage.

Constraints (please respect)

  • Use bash for the generator. Do not introduce a Python toolchain dependency for this. Bash is already universally available in Envoy's build environment. Only fall back to Python if there's a concrete reason bash cannot do the job (which I don't expect — generating ~5KB of deterministic bytes via printf and a loop is trivial).
  • No C++ generator either — keep the change as small and self-contained as possible.
  • The generator must be deterministic (same bytes every build) and hermetic (no network, no env-dependent behavior).
  • The generated file must be wired into the existing envoy_cc_fuzz_test(name = "json_sanitizer_fuzz_test", corpus = "json_sanitizer_corpus", ...) so libFuzzer picks it up exactly like the existing text seeds in that directory.
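One way to check the determinism requirement locally is to run the generator twice and compare the outputs byte for byte. The helper below is a hypothetical sketch, not part of the PR; check_generator is an illustrative name, and it assumes the generator takes its output path as its last argument, as gen_binary_corpus.sh does with $1. Temp files are fine here because this is a check, not the generator itself.

```bash
set -euo pipefail

# Hypothetical helper: run a generator command twice and verify that both
# runs produce byte-identical output of the expected size.
# Usage: check_generator EXPECTED_SIZE CMD [ARGS...]  (output path appended)
check_generator() {
  local expected_size=$1; shift
  local dir a b size
  dir=$(mktemp -d)
  a="$dir/run1"
  b="$dir/run2"
  "$@" "$a"
  "$@" "$b"
  size=$(wc -c < "$a" | tr -d ' ')
  if [[ "$size" != "$expected_size" ]]; then
    echo "size mismatch: got $size, want $expected_size" >&2
    rm -rf "$dir"
    return 1
  fi
  if ! cmp -s "$a" "$b"; then
    echo "output differs between runs" >&2
    rm -rf "$dir"
    return 1
  fi
  rm -rf "$dir"
}
```

For example, `check_generator 5698 bash test/common/json/gen_binary_corpus.sh` returns 0 only if two runs agree and the output is exactly 5698 bytes.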

Required changes

  1. Add test/common/json/gen_binary_corpus.sh — a bash script that writes a deterministic binary blob to $1 (output path). The blob should be designed to seed the non-UTF-8 / octal-escape path of Envoy::Json::sanitize(). Concrete content the script should emit (in order, concatenated):

    • All 256 byte values \x00..\xff (covers every single-byte input).
    • Truncated multi-byte UTF-8 starters: \xc2, \xe0\xa4, \xf0\x9d\x84.
    • Invalid continuation byte: \xc2\xff.
    • Surrogate-range 3-byte encoding (always invalid UTF-8): \xed\xa0\x80.
    • Overlong encoding of NUL: \xc0\x80.
    • JSON-significant ASCII (", \, control chars \x00..\x1f, \x7f) interleaved with high-bit bytes \x80..\xff.
    • High-bit bytes embedded in ASCII context: literal Hello, + \xff\xfe\xfd + , world!.
    • A deterministic pseudo-random tail computed from the bytes already written (e.g. b[i] = (b[i-1] * 31 + 7) & 0xff seeded from the prefix) until the file reaches a target size of 5698 bytes (matches the original size; not strictly necessary but keeps fuzzer behavior similar).
    • Truncate to exactly 5698 bytes at the end.
    • Use set -euo pipefail and LC_ALL=C. Use printf and a single output redirection — do not call dd, head, etc. unnecessarily. Avoid temp files; write straight to $1.
    • Make the script executable (chmod +x).
  2. Delete test/common/json/json_sanitizer_corpus/binary_file.

  3. Modify test/common/json/BUILD to:

    • Add a genrule that runs gen_binary_corpus.sh and produces json_sanitizer_corpus/binary_file as its output. Example shape:
      genrule(
          name = "json_sanitizer_corpus_binary_file",
          outs = ["json_sanitizer_corpus/binary_file"],
          srcs = ["gen_binary_corpus.sh"],
          cmd = "$(location gen_binary_corpus.sh) $@",
      )
    • Wire the generated file into the json_sanitizer_fuzz_test corpus. This is the part that needs investigation. The envoy_cc_fuzz_test macro is defined in bazel/envoy_build_system.bzl; please read it to determine how corpus = "json_sanitizer_corpus" is expanded (likely via a filegroup with a glob). A glob over the source tree will not pick up genrule outputs. Choose the least invasive of these options:
      • Preferred: if `env...

This pull request was created from Copilot chat.

@repokitteh-read-only

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #44805 was opened by Copilot.


Copilot AI changed the title from "[WIP] Remove checked-in binary executable artifact from repository" to "fix: generate json_sanitizer fuzz corpus binary seed at build time (resolves Scorecard Binary-Artifacts alert #17)" May 1, 2026
Copilot finished work on behalf of phlax May 1, 2026 10:35
Copilot AI requested a review from phlax May 1, 2026 10:35
@phlax phlax closed this May 1, 2026
@phlax phlax deleted the copilot/remove-binary-executable-artifact branch May 1, 2026 10:58