[codex] Refresh translations when source changes#637

Merged: riderx merged 3 commits into main from codex/translation-source-check on May 6, 2026

@riderx riderx commented May 6, 2026

Summary

  • store the English source hash with cached/R2 translated responses
  • schedule a background source-hash check for cached translated pages at most once every 5 minutes
  • requeue translation refreshes when the English source hash changes, without blocking the localized response

Impact

Localized pages keep returning fast from cache, while source updates are detected much sooner than the previous 24-hour freshness window.
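
The throttle and requeue decisions described in the summary can be sketched as two small pure functions. This is only an illustration: the function names are hypothetical, only the 5-minute interval comes from the summary above, and the handling of entries stored without a hash is an assumption.

```typescript
// Hypothetical constant; the 5-minute value is taken from the PR summary.
const SOURCE_CHECK_INTERVAL_MS = 5 * 60 * 1000;

type SourceCheckPlan = "skip" | "check";

// Decide whether a background source-hash check is due for a cached page.
function planSourceCheck(lastCheckedAtMs: number | undefined, nowMs: number): SourceCheckPlan {
  if (lastCheckedAtMs === undefined) return "check";
  return nowMs - lastCheckedAtMs >= SOURCE_CHECK_INTERVAL_MS ? "check" : "skip";
}

// After a check, requeue only when the freshly computed hash differs from the stored one.
// Assumption: an entry stored without a hash is left alone here rather than requeued.
function shouldRequeue(storedHash: string | undefined, currentHash: string): boolean {
  return storedHash !== undefined && storedHash !== currentHash;
}
```

Neither function touches the response path, which matches the stated goal of not blocking the localized response while the check runs.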

Validation

  • bun run ci:verify:translation

Summary by CodeRabbit

  • Improvements
    • Translations now refresh when source content changes and include source-hash tracking for smarter caching.
    • Background translation processing and safer cache refreshes for more reliable, incremental updates with persisted state.
  • New Features
    • Payloads and enqueueing updated to support source-aware, partial-state continuation and background re-translation.
  • Tests
    • Added verification that source-specific refresh requests are deduplicated in the priority queue.
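
As a rough sketch of what deduplicating source-specific refresh requests could look like, the class below rejects a second enqueue for the same URL, locale, and source hash. The class, field names, and key format are hypothetical, and priority ordering is left out entirely.

```typescript
// Hypothetical job shape; the real queue record is not shown in this PR page.
interface RefreshJob {
  url: string;
  locale: string;
  sourceHash?: string;
}

class DedupingQueue {
  private readonly seen = new Set<string>();
  private readonly jobs: RefreshJob[] = [];

  // Build a dedupe key; a missing hash is normalized to the empty string.
  private keyFor(job: RefreshJob): string {
    return JSON.stringify([job.url, job.locale, job.sourceHash ?? ""]);
  }

  // Returns true when the job was added, false when it was a duplicate.
  enqueue(job: RefreshJob): boolean {
    const key = this.keyFor(job);
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    this.jobs.push(job);
    return true;
  }

  get length(): number {
    return this.jobs.length;
  }
}
```

A refresh for the same page with a different source hash gets a different key, so it is accepted as a new job.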

coderabbitai Bot commented May 6, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e9855a30-6289-4842-8708-f1228f276751

📥 Commits

Reviewing files that changed from the base of the PR and between 614a808 and 9de2bbc.

📒 Files selected for processing (2)
  • apps/translation-worker/scripts/verify-parser.ts
  • apps/translation-worker/src/index.ts
📝 Walkthrough

Walkthrough

Adds source-hash-aware translation caching, incremental-state persistence, and background task orchestration to the translation worker; API surface expanded to accept an optional WorkerExecutionContext for ctx-aware background scheduling.

Changes

Source-Hash-Aware Translation Caching and Background Orchestration

| Layer / File(s) | Summary |
|---|---|
| Data Shape (apps/translation-worker/src/index.ts) | StoredTranslatedResponse and TranslationCoordinatorRecord gain an optional sourceHash. New WorkerExecutionContext type with waitUntil introduced. New constants: TRANSLATION_SOURCE_CHECK_SECONDS, TRANSLATION_SOURCE_HASH_HEADER. |
| Helpers / Utilities (apps/translation-worker/src/index.ts) | readTranslationSourceHash() parses and validates the header. translationSourceHash() computes SHA-256 over cache version, locale, and source HTML. sourceCheckKeyFor() builds source-check cache keys. isTranslatedResponseFreshForJob() now considers sourceHash equality. |
| Core Cache & Storage (apps/translation-worker/src/index.ts) | putTranslationPendingMarker() and toCachedResponse() accept and propagate an optional sourceHash. Reads of stored translations propagate sourceHash into response headers. Incremental refresh writes responses with the associated sourceHash. |
| Incremental / Partial Translation (apps/translation-worker/src/index.ts) | Partial-state load and validation updated to use sourceHash; the refresh flow computes sourceHash and returns a cached translated response when no segments remain. Mismatches log source-change events. |
| Queueing & Orchestration (apps/translation-worker/src/index.ts) | New enqueueTranslation() and enqueueTranslationSafely() thread sourceHash into markers and queue records. New background-check scheduling helpers: scheduleTranslationBackgroundTask(), checkTranslatedSourceFreshness() and variants. |
| Serving / API Wiring (apps/translation-worker/src/index.ts) | serveTranslated() signature extended with ctx?: WorkerExecutionContext; background re-checks are scheduled via ctx. Default export fetch() signature changed to async fetch(request, env, ctx?) and now forwards ctx into the serve flow. |
| Tests / Scripts (apps/translation-worker/scripts/verify-parser.ts) | Adds a test flow exercising source-specific refresh: enqueues a priority refresh with a constant 64-hex sourceHash and expects success, then enqueues a duplicate refresh and expects rejection (no queue length change). |
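
To make the helper row concrete, here is a minimal sketch of a SHA-256 source hash over cache version, locale, and source HTML, plus validation of a 64-hex header value. The function names, the NUL field separator, and the lowercase normalization are assumptions for illustration, and the sketch uses Node's `node:crypto` rather than whatever hashing API the worker itself calls.

```typescript
import { createHash } from "node:crypto";

// A SHA-256 digest rendered as hex is always 64 lowercase hex characters.
const SOURCE_HASH_PATTERN = /^[0-9a-f]{64}$/;

// Hash the three inputs named in the walkthrough. The NUL separator is an
// assumption; it keeps ("ab", "c") and ("a", "bc") from hashing identically.
function computeSourceHash(cacheVersion: string, locale: string, sourceHtml: string): string {
  return createHash("sha256")
    .update([cacheVersion, locale, sourceHtml].join("\0"))
    .digest("hex");
}

// Accept a header value only if it looks like a hex SHA-256 digest.
// Lowercasing before the check is an assumption of this sketch.
function parseSourceHashHeader(value: string | null): string | undefined {
  const normalized = value?.trim().toLowerCase();
  return normalized && SOURCE_HASH_PATTERN.test(normalized) ? normalized : undefined;
}
```

Including the cache version and locale in the digest means a version bump or a different locale yields a different hash even when the HTML is unchanged.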

Sequence Diagram

sequenceDiagram
    participant Client
    participant Worker as Translation Worker
    participant Cache as KV/R2 Cache
    participant Queue as Translation Queue
    participant BG as Background Context

    Client->>Worker: fetch(request, env, ctx?)
    Worker->>Worker: readTranslationSourceHash(request)
    Worker->>Cache: lookup cached translation
    alt cached & sourceHash matches & fresh
        Cache-->>Worker: cached translation (with sourceHash)
        Worker-->>Client: return cached translation
    else cached but stale or sourceHash mismatch
        Cache-->>Worker: stale translation
        Worker->>BG: ctx.waitUntil(enqueueTranslation(..., sourceHash))
        Worker-->>Client: serve stale translation (with sourceHash)
        BG->>Queue: enqueue translation task (sourceHash)
    else no cached translation
        Worker->>Worker: compute translationSourceHash(locale, html)
        Worker->>BG: ctx.waitUntil(enqueueTranslation(..., sourceHash))
        Worker-->>Client: serve pending marker
        BG->>Queue: enqueue translation task (sourceHash)
    end
    Queue->>Worker: process translation job
    Worker->>Cache: putTranslationPendingMarker(..., sourceHash)
    Worker->>Cache: store translated response with sourceHash
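
The `ctx.waitUntil(...)` steps in the diagram can be sketched as a small scheduling helper. This is an illustration only: the helper name is hypothetical, the interface mirrors just the `waitUntil` member mentioned in the walkthrough, and skipping the task when `ctx` is absent is an assumption about how an optional context might be handled.

```typescript
// Minimal shape of the execution context; only waitUntil is assumed.
interface WorkerExecutionContext {
  waitUntil(promise: Promise<unknown>): void;
}

// Hand a task to the runtime so it can finish after the response is returned.
// Returns false when there is no context to schedule on (assumed behavior).
function scheduleBackgroundTask(
  ctx: WorkerExecutionContext | undefined,
  task: () => Promise<void>,
): boolean {
  if (!ctx) return false;
  ctx.waitUntil(
    Promise.resolve()
      .then(task)
      // Swallow and log failures so a background error never rejects unhandled.
      .catch((error: unknown) => {
        console.error("background task failed", error);
      }),
  );
  return true;
}
```

The caller can return the cached translation immediately after calling the helper, since nothing here is awaited on the response path.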

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • Cap-go/website#624: Overlaps on queueing and incremental translation state persistence changes in apps/translation-worker/src/index.ts.
  • Cap-go/website#613: Related work moving refresh into queue-driven background processing and changing worker execution context handling.
  • Cap-go/website#615: Closely related; adds sourceHash propagation and R2-backed partial-state handling in the translation worker.

Poem

🐰 Hopped a hash into the queue so wide,

Source checks whisper when HTML's tried,
Background waits while translations grow,
Cached and safe, the translations flow,
A rabbit cheers: "Fresh pages, hop-tide!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title '[codex] Refresh translations when source changes' directly reflects the main objective of the PR, which is to implement source-hash aware translation refresh mechanism. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 860e0c89ca

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread apps/translation-worker/src/index.ts
@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
apps/translation-worker/src/index.ts (1)

2049-2079: 💤 Low value

Minor race condition in source check throttling.

Between checking alreadyChecked (line 2052) and putting the marker (lines 2055-2063), concurrent requests for the same URL could all pass the check and trigger parallel source fetches. This is benign (just redundant origin requests) since the check has a 5-minute throttle and operations are idempotent.

Also, the Accept-Language: locale header on line 2069 is overwritten by fetchEnglishOrigin internally. Consider removing it for clarity or adding a comment.
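
One way to picture the marker-first ordering is a check-and-set that claims the source check before any origin fetch starts. The sketch below uses an in-memory `Map` as a stand-in for the cache, with hypothetical names throughout. Because there is no `await` between the read and the write, the claim is atomic within a single isolate; with a real asynchronous cache the window would only shrink, not disappear.

```typescript
// Hypothetical constant; the 5-minute throttle is taken from the review comment.
const CHECK_MARKER_TTL_MS = 5 * 60 * 1000;

// In-memory stand-in for the cache that holds source-check markers.
// Maps a check key to the time (ms) at which its marker expires.
const checkMarkers = new Map<string, number>();

// Returns true only for the caller that claims the check. The marker is
// written before the caller goes on to fetch the source, so later callers
// within the TTL see it and skip the redundant origin request.
function claimSourceCheck(checkKey: string, nowMs: number): boolean {
  const expiresAt = checkMarkers.get(checkKey);
  if (expiresAt !== undefined && expiresAt > nowMs) return false;
  checkMarkers.set(checkKey, nowMs + CHECK_MARKER_TTL_MS);
  return true;
}
```

A caller would run `claimSourceCheck` first and only load the source HTML when it returns true.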

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/translation-worker/src/index.ts` around lines 2049 - 2079, In
checkTranslatedSourceFreshness, prevent the race between the alreadyChecked
check and the caches.default.put by inserting the cache marker immediately after
the alreadyChecked guard: call caches.default.put(checkKey, new
Response(knownSourceHash ?? '', {...})) before calling loadSourceHtml so
concurrent requests see the marker and skip redundant origin fetches; keep the
same Cache-Control/X-Capgo headers and allow the marker to be updated later if
needed. Also remove (or add a clarifying comment about) the Accept-Language
header on the render Request since fetchEnglishOrigin overwrites it—adjust the
renderRequest creation in checkTranslatedSourceFreshness accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e26630ab-5a74-4992-8edc-9308f69d3c74

📥 Commits

Reviewing files that changed from the base of the PR and between 22da870 and 860e0c8.

📒 Files selected for processing (1)
  • apps/translation-worker/src/index.ts

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 614a8083e7

Comment thread apps/translation-worker/src/index.ts Outdated
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/translation-worker/src/index.ts`:
- Around line 470-478: The continuation path that sets a pending marker fails to
include the sourceHash, which causes dedupe keys to be lost when
TRANSLATION_COORDINATOR is unset; update the continuation flow to pass the
sourceHash into putTranslationPendingMarker (ensure the variable used to
detect/enqueue translations — the same sourceHash computed earlier in the
continue/refresh branch — is forwarded) so the pending cache entry includes
TRANSLATION_SOURCE_HASH_HEADER; locate the continue/refresh branch that
currently calls putTranslationPendingMarker without sourceHash and add the
sourceHash argument when invoking it.
- Around line 2438-2439: The current matcher allows a hash-less job to match any
record (returning true when job.sourceHash is falsy), which corrupts dedupe
state when the same matcher is reused for both /enqueue and /complete; change
the final check to require a strict sourceHash equality so a hash-less job only
matches hash-less records and a hash-specific job only matches the same hash.
Concretely, in the function that returns using record and job (the block with
"if (!record || record.cacheVersion !== job.cacheVersion || record.locale !==
job.locale || record.url !== job.url) return false"), replace the loose check
"return !job.sourceHash || record.sourceHash === job.sourceHash" with logic that
returns true only when both sourceHash are equal (e.g., treat undefined/empty
the same by comparing record.sourceHash === job.sourceHash or explicitly check
both falsy or exact equality) so both /enqueue and /complete use a symmetric,
strict match.
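
The symmetric match suggested in the second comment could look roughly like this. The type and function names are hypothetical; only the compared fields (cacheVersion, locale, url, sourceHash) come from the comment, and treating `undefined` and the empty string as the same "no hash" value follows the option the comment mentions.

```typescript
// Hypothetical shape shared by coordinator records and jobs in this sketch.
interface TranslationKey {
  cacheVersion: string;
  locale: string;
  url: string;
  sourceHash?: string;
}

// Strict, symmetric match: a hash-less job matches only hash-less records,
// and a hash-specific job matches only records with the same hash.
function matchesJob(record: TranslationKey | undefined, job: TranslationKey): boolean {
  if (!record) return false;
  if (
    record.cacheVersion !== job.cacheVersion ||
    record.locale !== job.locale ||
    record.url !== job.url
  ) {
    return false;
  }
  // Normalize undefined and "" to the same "no hash" value before comparing.
  return (record.sourceHash ?? "") === (job.sourceHash ?? "");
}
```

With this form, a hash-less completion can no longer clear a record that was enqueued for a specific source hash.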
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a3b065d4-2cbe-450e-a729-b93c95ceeec6

📥 Commits

Reviewing files that changed from the base of the PR and between 860e0c8 and 614a808.

📒 Files selected for processing (2)
  • apps/translation-worker/scripts/verify-parser.ts
  • apps/translation-worker/src/index.ts

Comment thread apps/translation-worker/src/index.ts
Comment thread apps/translation-worker/src/index.ts Outdated
sonarqubecloud Bot commented May 6, 2026

@riderx riderx merged commit 84c585f into main May 6, 2026
10 checks passed