wrangler: truncate Pages commit message at UTF-8 boundary #12378
vicb merged 7 commits into cloudflare:main
🦋 Changeset detected. Latest commit: 465de13. The changes in this PR will be included in the next version bump.
const MAX_COMMIT_MESSAGE_BYTES = 384;

export function truncateUtf8Bytes(str: string, maxBytes: number): string {
Maybe this should be moved to the workers-utils package? What do others think?
At the least, we could just push this util function toward the bottom of the file.
Good point, I think it should be in workers-utils.
	if (byteCount + charBytes > maxBytes) {
		break;
	}
	result += char;
perf: concatenating strings in a loop is O(n^2), and 384^2 ≈ 147k. It would be better to call chars.push(char) and then return chars.join("").
Returning str.slice(...) instead of pushing is also an option.
Agreed. I’ll push an update using the bytes-first approach.
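For illustration only, the str.slice(...) option mentioned in this thread could look roughly like the sketch below. It is not the code that was merged: the name truncateBySlice is hypothetical, and the per-character byte sizes are derived from the code point rather than from an encoder.

```typescript
// Hypothetical helper sketching the str.slice(...) idea; not the merged code.
function truncateBySlice(str: string, maxBytes: number): string {
	let byteCount = 0; // UTF-8 bytes consumed so far
	let end = 0; // cut position, in UTF-16 code units

	// for...of iterates by code point, so surrogate pairs are never split.
	for (const char of str) {
		const codePoint = char.codePointAt(0)!;
		// UTF-8 length of a single code point.
		const charBytes =
			codePoint < 0x80 ? 1 : codePoint < 0x800 ? 2 : codePoint < 0x10000 ? 3 : 4;

		if (byteCount + charBytes > maxBytes) {
			break;
		}
		byteCount += charBytes;
		end += char.length; // 1 or 2 UTF-16 code units
	}

	// A single slice replaces the repeated string concatenation.
	return str.slice(0, end);
}
```

Because the loop only tracks two counters, no intermediate strings or arrays are built, and the input is returned unchanged when it already fits.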
FYI, Gemini helped me come up with:
export function truncateUtf8Bytes(str: string, maxBytes: number): string {
	const buf = Buffer.from(str);
	if (buf.length <= maxBytes) return str;

	let i = maxBytes;

	// Scan backwards from the limit (max 3 bytes for UTF-8)
	// We stop if we find a "Start Byte" or ASCII (value < 128)
	while (i > 0) {
		i--;
		const byte = buf[i];

		// If it's a Continuation Byte (0x80 - 0xBF), keep stepping back.
		if ((byte & 0xC0) === 0x80) continue;

		// We found the Start Byte (or ASCII).
		// Now we perform the single check: "Does this char fit?"
		// Calculate expected size:
		// ASCII (0xxxxxxx) -> 1
		// 110xxxxx -> 2
		// 1110xxxx -> 3
		// 11110xxx -> 4
		let charLength = 1;
		if (byte >= 0xC0) charLength = 2;
		if (byte >= 0xE0) charLength = 3;
		if (byte >= 0xF0) charLength = 4;

		// If the sequence + start index fits within maxBytes, we are good!
		// Otherwise, we cut BEFORE this character (at i).
		if (i + charLength > maxBytes) {
			return buf.subarray(0, i).toString('utf8');
		}

		// If it fits, the cut was clean.
		break;
	}

	return buf.subarray(0, maxBytes).toString('utf8');
}
Got it, I’ll wait for other feedback and fix everything together.
vicb left a comment:
LGTM, thanks for your work on this 🎉
Fixes #11749
Safely truncate Cloudflare Pages commit messages at valid UTF-8 boundaries before sending them to the Pages deployments API.
Cloudflare Pages enforces a fixed byte limit (384 bytes) on git commit metadata. When a multi-line commit message containing multi-byte UTF-8 characters (e.g. Cyrillic, Japanese, emoji) is truncated mid-character on the server side, the resulting string becomes invalid UTF-8 and deployments fail with error 8000111.
This change ensures Wrangler never sends an invalid UTF-8 commit message by truncating at the nearest valid UTF-8 boundary on the client side.
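The failure mode described above can be reproduced in a few lines. This is a minimal sketch that assumes the server-side truncation is a plain byte slice, which this PR does not show. The helper name isValidUtf8 and the sample message are made up for the example; TextEncoder and TextDecoder are the standard encoding APIs.

```typescript
// Hypothetical helper: true when the bytes decode as strict UTF-8.
function isValidUtf8(bytes: Uint8Array): boolean {
	try {
		// With fatal: true, malformed input throws instead of yielding U+FFFD.
		new TextDecoder("utf-8", { fatal: true }).decode(bytes);
		return true;
	} catch {
		return false;
	}
}

// 5 ASCII bytes followed by 200 two-byte Cyrillic letters: 405 bytes in total.
const message = "fix: " + "й".repeat(200);
const bytes = new TextEncoder().encode(message);

// Cutting at exactly 384 bytes keeps the lead byte of a letter but drops
// its continuation byte, because 384 - 5 is odd.
const naiveCut = bytes.slice(0, 384);

// Cutting one byte earlier lands on a character boundary.
const boundaryCut = bytes.slice(0, 383);

console.log(isValidUtf8(naiveCut)); // false
console.log(isValidUtf8(boundaryCut)); // true
```

The boundary-aware cut comes out one byte shorter than the limit, the small cost of never splitting a character.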
What was changed
truncateUtf8Bytes() in packages/wrangler/src/api/pages/deploy.ts
packages/wrangler/src/__tests__/pages/utf8-truncation.test.ts
Tests
Public documentation
Docs: Not required — this is an internal deployment-safety fix affecting only the Pages deploy API payload. No user-facing CLI or API changes.