docs(server): add 'Plugin RPC resilience' page#109

Closed
ComBba wants to merge 2 commits into
reddit:mainfrom
ComBba:docs/plugin-rpc-resilience

@ComBba ComBba commented May 13, 2026

Summary

Adds a new docs page — docs/capabilities/server/plugin-rpc-resilience.md — documenting the defensive pattern for the empty-gRPC-envelope failure mode that surfaces as `Error: undefined undefined: undefined` with `code/details/metadata` all undefined. When that fires, every plugin call (settings.get, redis.get/set/zRange, reddit.getModerators, scheduler.runJob) throws identically, and an unwrapped handler 500s into a silent blank screen for the moderator clicking the menu.

Three open Devvit GitHub issues share this signature:

  • reddit/devvit#258 — `reddit.submitCustomPost` / `submitPost` on playtest
  • reddit/devvit#259 — `scheduler.runJob` returns UUID but `listJobs` is empty
  • reddit/devvit#261 — `settings.get` / `redis.get,zRange` / `reddit.getModerators` all throw from the same handler

The pattern in this page kept vibe-mod (a mod-tool app for Reddit's Mod Tools and Migrated Apps Hackathon) usable on r/SocialSeeding while the plugin RPC layer was unavailable (reddit/devvit#261). Without it the menu was silent: no toast, no form, nothing at all on click.

What's in the page

A per-layer playbook with code snippets for each surface:

| Layer | On throw | Why |
| --- | --- | --- |
| `settings.get(optional)` | Warning, continue | Optional override; mustn't block the required global key. |
| `settings.get(required)` | Throw `no_key_plugin_rpc` | Distinguishes platform-side from "not configured". |
| `redis.get` | Warn + default | Treat as cold cache; let the handler compute fresh state. |
| `redis.set` | Warn + continue | The user's action already happened; don't 500 because persistence failed. |
| `redis.zRange` (audit) | Warn + empty list | Audit dashboard renders an explanatory banner instead of crashing. |
| `reddit.getModerators` | Trust gateway-side `forUserType:"moderator"` filter | Defense-in-depth; the gateway is the security boundary. |
| `scheduler.runJob` | Warn + continue | Best-effort enqueue; the rule is still compiled, just no preview. |
| Inbound scheduler tick | Warn + return 200 | Don't flood the gateway with retried 500s. |
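
The "warn + default" and "warn + continue" rows could be sketched roughly as below. This is a minimal illustration, not the code from the docs page: `withFallback`, `KeyValueStore`, `brokenStore`, `readCached`, and `persist` are hypothetical names, and the store interface is a stand-in rather than the Devvit Redis client type.

```typescript
// Hypothetical helper names; nothing here is part of the Devvit API.
type Warn = (message: string) => void;

/**
 * Runs a plugin call. If it throws, logs a warning tagged with the call
 * label and resolves to the supplied fallback instead of rethrowing.
 */
async function withFallback<T>(
  label: string,
  call: () => Promise<T>,
  fallback: T,
  warn: Warn = (m) => console.warn(m),
): Promise<T> {
  try {
    return await call();
  } catch (err) {
    const detail = err instanceof Error ? err.message : String(err);
    warn(`[plugin-rpc] ${label} failed, using fallback: ${detail}`);
    return fallback;
  }
}

// Assumed stand-in for a key-value client (not the real Redis plugin type).
interface KeyValueStore {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string): Promise<void>;
}

// Simulates a store whose RPC layer is down: every call throws.
const brokenStore: KeyValueStore = {
  get: async () => { throw new Error("undefined undefined: undefined"); },
  set: async () => { throw new Error("undefined undefined: undefined"); },
};

// redis.get row: a failed read is treated as a cold cache (undefined).
async function readCached(store: KeyValueStore, key: string): Promise<string | undefined> {
  return withFallback(`redis.get(${key})`, () => store.get(key), undefined);
}

// redis.set row: the user's action already happened, so a persistence
// failure is reported as `false` rather than thrown.
async function persist(store: KeyValueStore, key: string, value: string): Promise<boolean> {
  return withFallback(
    `redis.set(${key})`,
    async () => { await store.set(key, value); return true; },
    false,
  );
}
```

With a helper of this shape, a handler can call `readCached` and `persist` without its own `try`/`catch`, and each warning carries the label of the call that failed.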

Plus:

  • A `describeErr(err)` helper for observable error shapes (without it, `console.warn(err)` collapses the opaque envelope to the literal string `"undefined undefined: undefined"` and you can't tell which call failed).
  • A BYOK pattern that distinguishes "platform RPC down" from "not configured" so the user-facing toast can branch.
  • Two opt-in local-only env-var escape hatches (`MYAPP_AI_PROVIDER=mock`, `MYAPP_LOCAL_FALLBACK=1`) for keeping the compose flow runnable in playtest while the platform recovers. Both default off so production behaviour is unchanged.
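
A `describeErr(err)` helper along these lines might look like the sketch below. The field list (`code`, `details`, `metadata`) comes from the failure signature described above; the output format and the `safeJson` helper are assumptions made for illustration, not the helper from the docs page.

```typescript
// Hypothetical sketch. The inspected fields follow the failure signature
// described in the PR text, not a confirmed Devvit or gRPC error type.
function safeJson(value: unknown): string {
  if (value === undefined) return "undefined";
  try {
    // JSON.stringify can yield undefined at runtime (e.g. for functions).
    const s: string | undefined = JSON.stringify(value);
    return s === undefined ? String(value) : s;
  } catch {
    // Circular structures and BigInt values make JSON.stringify throw.
    return String(value);
  }
}

/**
 * Renders an unknown thrown value as a single line that lists each field
 * explicitly, so an empty envelope shows up as `code=undefined ...`
 * instead of collapsing to one opaque string.
 */
function describeErr(err: unknown): string {
  if (err === null || typeof err !== "object") {
    return `non-object error (${typeof err}): ${String(err)}`;
  }
  const e = err as Record<string, unknown>;
  const fields = ["name", "message", "code", "details", "metadata"].map(
    (key) => `${key}=${safeJson(e[key])}`,
  );
  const ownKeys = Object.getOwnPropertyNames(err).join(",") || "(none)";
  return `${fields.join(" ")} ownKeys=[${ownKeys}]`;
}
```

A call site would then log something like `console.warn(`redis.get failed: ${describeErr(err)}`)`, prefixing the name of the call so the log line identifies which layer threw.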

Conventions followed

  • ✅ Mirrored to both `docs/` and `versioned_docs/version-0.12/` per CONTRIBUTING.
  • ✅ Markdown (no Docusaurus admonitions or React components — plain `.md` to match the cache-helper / scheduler / settings-and-secrets pages around it).
  • ✅ Code examples lifted from real app code, edited for clarity; no copy-pasted Reddit credentials or secrets.
  • ✅ Cross-links to the three open issues that share the gRPC-envelope signature.

Test plan

  • Both files render in `yarn start` locally without sidebar / link errors.
  • Links to the three Devvit issues resolve.
  • Code blocks compile in isolation (extracted into `vibe-mod`'s test suite — see Two-Weeks-Team/vibe-mod#30 for the live-deployed implementation).

CLA

I'll submit the Contribution License Agreement before merge per the README requirement.

A defensive pattern reference for the empty-gRPC-envelope failure mode
('Error: undefined undefined: undefined' with code/details/metadata all
undefined) that's been reported in #258, #259, and #261. When that pattern
fires, every plugin call (settings.get, redis.get/set/zRange,
reddit.getModerators, scheduler.runJob) throws identically, and an
unwrapped handler 500s into a silent blank screen.

The page is a per-layer playbook:
  - settings.get(optional)  → warn-only, fall through to global
  - settings.get(required)  → distinct sentinel so toasts can branch
  - redis.get               → default to "cold cache"
  - redis.set               → log + continue (user action already happened)
  - reddit.getModerators    → trust gateway-side forUserType:"moderator"
  - scheduler.runJob        → best-effort enqueue
  - inbound scheduler tick  → return 200, don't 500-loop the gateway

Plus a `describeErr(err)` helper for observable error shapes (without
`describeErr`, `console.warn(err)` collapses the opaque envelope to the
literal string "undefined undefined: undefined"), an OpenAI-key BYOK
pattern showing how to distinguish "platform RPC down" from "not
configured", and two opt-in local-only env-var escape hatches
(`MYAPP_AI_PROVIDER=mock`, `MYAPP_LOCAL_FALLBACK=1`) for keeping the
compose flow runnable in playtest while the platform recovers.

Mirrored to both `docs/` and `versioned_docs/version-0.12/` per CONTRIBUTING.

References at the bottom of the page point at the three open Devvit
issues that share the gRPC-envelope signature, so readers can confirm
this is a known pattern rather than a per-app bug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ComBba ComBba requested a review from a team May 13, 2026 11:46

ComBba commented May 13, 2026

2026-05-14 update on the cited platform issue

A quick update for reviewers in case it affects how you think about this PR's framing:

The plugin-RPC failure mode this page documents was originally tied to reddit/devvit#261. I just posted a follow-up there: over the past 24+ hours, the `undefined undefined: undefined` envelope has stopped reproducing in our app's logs. `settings.get`, `redis`, and `reddit.getModerators` calls are all returning successfully now. We don't know whether the underlying gRPC issue was silently fixed, or whether it's intermittent and we're in a "good" window.

I think the patterns in this page are still worth merging on their own merit:

  • They're general defense-in-depth against any transient plugin-RPC failure (not just #261).
  • They produce honest user-facing toasts instead of silent 500s, which is a UX win even on the happy path.
  • They distinguish "RPC unreachable" from "secret not configured," which is a real ambiguity worth documenting regardless of #261's current state.
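
The "RPC unreachable" versus "secret not configured" distinction could be sketched as follows. `no_key_plugin_rpc` is the sentinel named in the PR's table; `no_key_not_configured`, `SettingsReader`, `requireKey`, `toastFor`, and the toast wording are hypothetical and exist only for this illustration.

```typescript
// Assumed stand-in for a settings client (not the Devvit settings type).
interface SettingsReader {
  get(name: string): Promise<string | undefined>;
}

// `no_key_plugin_rpc` appears in the PR's table. `no_key_not_configured`
// is a hypothetical counterpart invented for this sketch.
const NO_KEY_PLUGIN_RPC = "no_key_plugin_rpc";
const NO_KEY_NOT_CONFIGURED = "no_key_not_configured";

/** Reads a required key, throwing a distinct sentinel per failure cause. */
async function requireKey(settings: SettingsReader, name: string): Promise<string> {
  let value: string | undefined;
  try {
    value = await settings.get(name);
  } catch {
    // The call itself failed: platform-side, not a configuration problem.
    throw new Error(NO_KEY_PLUGIN_RPC);
  }
  if (value === undefined || value.trim() === "") {
    // The call succeeded but nothing usable is stored.
    throw new Error(NO_KEY_NOT_CONFIGURED);
  }
  return value;
}

/** Maps a thrown sentinel to user-facing toast text (wording is illustrative). */
function toastFor(err: unknown): string {
  const reason = err instanceof Error ? err.message : "";
  if (reason === NO_KEY_PLUGIN_RPC) {
    return "The settings service is temporarily unreachable. Please try again shortly.";
  }
  if (reason === NO_KEY_NOT_CONFIGURED) {
    return "No API key is configured. Add one in the app settings.";
  }
  return "Something went wrong.";
}
```

The sentinels are plain `Error` messages rather than a custom error subclass, which keeps the branch in `toastFor` independent of `instanceof` behavior across compile targets.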

Happy to soften the references to #261 in the page body if reviewers prefer the framing to be platform-issue-agnostic ("a transient empty-gRPC-envelope failure mode" rather than "the failure on Issue #261"). Let me know.

…uld-not-reproduce)

The page originally cited reddit/devvit#261 as a confirmation that the
plugin-RPC failure was wide-scope. That issue is now closed (2026-05-14)
because the symptom has not been reproducing for 24+ hours. Update the
'Related issues' section to:

- Note #261's closure with the actual reason
- Reframe the page's value as defense against transient failures generally
  (sidecar restart, secret rotation, partial deploy), not contingent on any
  one platform bug
- Keep #258 / #259 references with explicit 'open at time of writing' status

The patterns themselves are unchanged because they apply universally.

ComBba commented May 13, 2026

Pushed ecc77dd to update the Related Issues section: #261 is now annotated as closed-as-could-not-reproduce, and the page's framing makes clear the patterns apply to any transient plugin-RPC failure rather than to any single platform bug. PR body also updated for the same reframe. Patterns themselves are unchanged.

ComBba commented May 14, 2026

Closing this PR.

The motivating platform issue (reddit/devvit#261) is now resolved as could-not-reproduce: plugin RPC has been returning successfully in our app for several days, and our original reproduction was conflated with what turned out to be a separate bug on our side (SELECTION-array postmortem). With the motivating issue closed and the CLA still pending on our side, leaving this PR open in a long-lived "wait state" felt like noise on the review queue.

The defensive patterns themselves are still useful as transient-failure defense, so I've published them as a standalone GitHub Gist that anyone can read and adapt without depending on this PR landing:

https://gist.github.com/ComBba/88395968eb285a796111f1d33635b3f9

The same patterns are in production on vibe-mod (PR #30). Happy to revisit a docs PR in the future if a platform issue makes the patterns more directly relevant. Thanks for the consideration!

@ComBba ComBba closed this May 14, 2026