[codex] Chunk translation queue jobs#615
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds incremental, batch-by-batch HTML translation using a deterministic Changes
Sequence DiagramsequenceDiagram
participant Client
participant Queue as Queue Service
participant Worker as Translation Worker
participant R2 as R2 (Partial / Final)
participant Origin as Origin HTML
Client->>Queue: Enqueue translation job
Queue-->>Worker: Deliver job (may include reason: 'continue', sourceHash)
Worker->>Origin: Fetch source HTML
Origin-->>Worker: Return HTML or redirect/non-HTML
Worker->>R2: Lookup PartialTranslationState by sourceHash
alt Partial exists
R2-->>Worker: PartialTranslationState
else No partial
Worker->>Worker: Compute sourceHash, split into batches
end
Worker->>Worker: Translate up to one missing batch
Worker->>R2: Persist updated PartialTranslationState
alt More batches remain
Worker->>Queue: Enqueue continuation job (reason: 'continue', sourceHash)
else All batches complete
Worker->>Worker: Render final translated HTML
Worker->>R2: Write StoredTranslatedResponse (final) and delete PartialTranslationState
Worker->>Cache: Populate Cache API with final result
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
apps/translation-worker/src/index.ts (2)
1177-1186: 💤 Low valueConsider using nullish coalescing assignment.
Static analysis suggests using
??=operator for simpler code.♻️ Optional refactor using ??=
let state = await readPartialTranslationState(requestUrl, locale, sourceHash) - if (!state) { - state = { - cacheVersion: TRANSLATION_CACHE_VERSION, - sourceHash, - locale, - translatedBatches: [], - updatedAt: Date.now(), - } - } + state ??= { + cacheVersion: TRANSLATION_CACHE_VERSION, + sourceHash, + locale, + translatedBatches: [], + updatedAt: Date.now(), + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/translation-worker/src/index.ts` around lines 1177 - 1186, The code initializes state from readPartialTranslationState(...) and then assigns a default object if falsy; replace that pattern with a nullish coalescing assignment to simplify it: use the existing variable name state with the ??= operator so that state ??= { cacheVersion: TRANSLATION_CACHE_VERSION, sourceHash, locale, translatedBatches: [], updatedAt: Date.now() } after the call to readPartialTranslationState(requestUrl, locale, sourceHash) — keep references to readPartialTranslationState, state, and TRANSLATION_CACHE_VERSION to locate and update the assignment.
669-677: 💤 Low valueConsider simplifying nested template literal.
Static analysis flagged the nested template literal at line 670. While functional, this can be simplified for readability.
♻️ Optional refactor to simplify
function errorMessage(error: unknown): string { - if (error instanceof Error) return `${error.name}: ${error.message}${error.stack ? `\n${error.stack}` : ''}` + if (error instanceof Error) { + const base = `${error.name}: ${error.message}` + return error.stack ? `${base}\n${error.stack}` : base + } if (typeof error === 'string') return error🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/translation-worker/src/index.ts` around lines 669 - 677, The nested template literal inside errorMessage makes the string construction harder to read; simplify by building the message in steps instead of embedding a template inside a template. In the errorMessage function, create a base string like `${error.name}: ${error.message}` and, if error.stack exists, append '\n' + error.stack (or use string concatenation) before returning; update the branch for error instanceof Error accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@apps/translation-worker/src/index.ts`:
- Around line 1177-1186: The code initializes state from
readPartialTranslationState(...) and then assigns a default object if falsy;
replace that pattern with a nullish coalescing assignment to simplify it: use
the existing variable name state with the ??= operator so that state ??= {
cacheVersion: TRANSLATION_CACHE_VERSION, sourceHash, locale, translatedBatches:
[], updatedAt: Date.now() } after the call to
readPartialTranslationState(requestUrl, locale, sourceHash) — keep references to
readPartialTranslationState, state, and TRANSLATION_CACHE_VERSION to locate and
update the assignment.
- Around line 669-677: The nested template literal inside errorMessage makes the
string construction harder to read; simplify by building the message in steps
instead of embedding a template inside a template. In the errorMessage function,
create a base string like `${error.name}: ${error.message}` and, if error.stack
exists, append '\n' + error.stack (or use string concatenation) before
returning; update the branch for error instanceof Error accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 86d2cdb4-8aa9-4a62-8bc0-f28dac913e50
📒 Files selected for processing (2)
apps/translation-worker/src/index.tsapps/translation-worker/wrangler.jsonc
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0a5bfea484
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
apps/translation-worker/src/index.ts (1)
605-613:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift
MAX_BATCH_CHARSis still bypassed by a single oversized segment.Because Line 607 only rolls the batch when
currentBatch.length > 0, the first segment in a batch can be arbitrarily large. A long text node will still go through as one huge model call, which keeps the original timeout/wall-time failure mode alive for pages with large paragraphs or blobs.Please split oversize segments before batching, or at minimum detect and fail them explicitly with a dedicated path instead of silently violating the cap.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/translation-worker/src/index.ts` around lines 605 - 613, The batching loop allows a single segment to exceed MAX_BATCH_CHARS because it only rolls the batch when currentBatch.length > 0; update the logic in the loop over segments to detect oversized segment.text and either (A) split segment.text into chunks no larger than MAX_BATCH_CHARS and feed those chunks into the existing batching logic (ensuring currentBatch/currentLength are updated per chunk), or (B) explicitly surface a clear failure path when a single segment exceeds MAX_BATCH_CHARS; modify the code around the for (const segment of segments) loop and the usage of MAX_BATCH_CHARS, currentBatch, and currentLength so no individual pushed item can bypass the character cap.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/translation-worker/src/index.ts`:
- Around line 1132-1133: The localized redirect created by
localizeRedirect(originResponse, requestUrl, locale) is never persisted to
cacheKey before the job is marked complete, causing perpetual MISSes; update the
isRedirect handling in the function that uses isRedirect/localizeRedirect to
store the localized redirect into the cache (using the same cacheKey used for
this job) and only then return the response object (or true where applicable).
Apply the same persistence step for the other early-return path referenced
around the 1162–1163 spot so that any 30x canonicalizations write the localized
redirect to cacheKey before completing the job.
---
Outside diff comments:
In `@apps/translation-worker/src/index.ts`:
- Around line 605-613: The batching loop allows a single segment to exceed
MAX_BATCH_CHARS because it only rolls the batch when currentBatch.length > 0;
update the logic in the loop over segments to detect oversized segment.text and
either (A) split segment.text into chunks no larger than MAX_BATCH_CHARS and
feed those chunks into the existing batching logic (ensuring
currentBatch/currentLength are updated per chunk), or (B) explicitly surface a
clear failure path when a single segment exceeds MAX_BATCH_CHARS; modify the
code around the for (const segment of segments) loop and the usage of
MAX_BATCH_CHARS, currentBatch, and currentLength so no individual pushed item
can bypass the character cap.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 25b2d30a-7de1-4266-ad27-70536c4154fa
📒 Files selected for processing (1)
apps/translation-worker/src/index.ts
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fe8b46ede5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/translation-worker/src/index.ts`:
- Around line 1284-1290: The branch handling source.type === 'response'
currently only persists redirects and always clears the pending marker and
returns true; update it to distinguish stable terminal responses (e.g.,
localized 404s) from transient failures and persist stable non-redirect
responses exactly like redirects: detect non-redirect but stable responses, call
toCachedResponse(source.response.clone()) and await cacheTranslatedResponse(env,
requestUrl, locale, cacheKey, cachedResponse) before calling
deletePartialTranslationStateSafely(env, requestUrl, locale) and returning true;
keep transient failures from being marked complete (do not clear the pending
marker or return true for them). Ensure you use the existing helpers isRedirect,
toCachedResponse, cacheTranslatedResponse, and
deletePartialTranslationStateSafely and preserve requestUrl, locale, cacheKey,
and env in the implementation.
- Around line 1321-1337: When scheduling a continuation job, refresh the pending
TTL so the partial-translation pending marker doesn't expire while continuation
batches are running; modify the continuation branch around
writePartialTranslationState and env.TRANSLATION_QUEUE.send (the block that sets
cacheVersion/locale/sourceHash/translatedBatches/updatedAt and then calls
env.TRANSLATION_QUEUE.send) to also set/extend a pending expiry (e.g.,
pendingUntil or pendingExpires = Date.now() + PENDING_PENDING_TTL_MS) and
persist that field in writePartialTranslationState before enqueuing the
continuation; ensure the same constant used for pending TTL (create or reuse
PENDING_PENDING_TTL_MS or a descriptive name) is applied so fresh requests see
the marker and do not enqueue duplicate miss/stale jobs.
- Around line 1467-1475: The current code can throw when calling
caches.default.put(...) (or when enqueueTranslationSafely(...) runs) and that
propagates, turning a valid R2 hit into a fallback; wrap the cache repopulation
and the enqueue call in their own try/catch blocks so any errors are
swallowed/logged but do not prevent returning the storedResponse. Specifically,
in the block after readStoredTranslatedResponse and readTranslatedAt, call await
caches.default.put(cacheKey, storedResponse.clone()) inside a try { ... } catch
(err) { /* log */ } and likewise call enqueueTranslationSafely(env, requestUrl,
locale, 'stale') inside a separate try/catch so failures there don’t throw;
finally always return withResponseHeaders(storedResponse, isStale ? 'STALE' :
'HIT', isHead).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b48b2e01-8511-42ca-9cd4-d89f8bbaff45
📒 Files selected for processing (3)
.github/workflows/deploy-translation.ymlapps/translation-worker/src/index.tsapps/translation-worker/wrangler.jsonc
🚧 Files skipped from review as they are similar to previous changes (1)
- apps/translation-worker/wrangler.jsonc
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4e8852dcda
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/translation-worker/src/index.ts`:
- Around line 350-355: canonicalTranslationStoreUrl currently drops the entire
query string which causes different query variants to alias to the same R2 store
key (while fetchEnglishOrigin preserves the search), so change
canonicalTranslationStoreUrl to use a deterministic normalized-query policy
instead of blanking search: parse requestUrl.searchParams, filter out
volatile/tracking params (e.g., utm_*, session ids) or allowlist only the query
keys that affect translation, sort the remaining keys and values
deterministically, rebuild storeUrl.search from that canonicalized string, and
return storeUrl.toString(); reference canonicalTranslationStoreUrl and ensure
its normalization policy is consistent with how fetchEnglishOrigin treats
queries.
- Around line 1316-1320: The current branch calls cacheTranslatedResponse() for
any stable terminal Response which will call writeStoredTranslatedResponse()
that uses response.text(), corrupting non-text/binary bodies; update the logic
in the branch that calls toCachedResponse/cacheTranslatedResponse to first
inspect source.response.headers.get('content-type') and only use the existing
text-path for textual content types (e.g., text/*, application/json,
application/xml, or those with charset=utf-8); for non-text/binary types, change
toCachedResponse and writeStoredTranslatedResponse to read the body as raw bytes
(response.arrayBuffer()), encode it (e.g., base64) and persist a flag/metadata
indicating binary encoding so reads decode properly, or simply skip caching
binary responses—ensure the change references isStableTerminalResponse,
toCachedResponse, cacheTranslatedResponse and writeStoredTranslatedResponse so
the text vs binary handling is implemented consistently.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f584e077-ebbb-4dd4-b905-d7302abbf7df
📒 Files selected for processing (1)
apps/translation-worker/src/index.ts
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
|



Summary
Root cause
The live /id/ page was redirecting to English because the Indonesian translation cache was empty. The queue consumer was failing after ~108s wall time while translating the full page in one invocation, so it never wrote the cached localized HTML. The logged CPU time was only 69ms, so increasing CPU alone would not fix it.
Validation
Summary by CodeRabbit
New Features
Refactor
Chores