
claw-wall can trigger Discord global rate limits via per-bot channel polling and missing 429 backoff #147

@mostlydev


Summary

claw up currently injects claw-wall with one Discord token per bot per channel, and claw-wall continues polling after Discord returns 429 Too Many Requests instead of backing off. On a multi-bot pod, that combination can trigger Discord global rate limiting and make channel-context feel frozen.

Impact

On Tiverton, #trading-floor was being polled through 7 bot tokens and #infra through 2 bot tokens every 15 seconds. Discord eventually responded with:

You are being blocked from accessing our API temporarily due to exceeding global rate limits.

Once that happened:

  • channel-context stopped updating reliably
  • mentions/feed behavior felt frozen from the operator side
  • the wall kept retrying and prolonging the block instead of honoring backoff

Root Cause

There are two separate problems that compound:

  1. injectConversationWall() builds CLAW_WALL_TOKENS from every resolved Discord handle that has channel IDs.

    • File: cmd/claw/compose_up.go
    • Result: same channel is polled N times with N different bot tokens.
  2. claw-wall treats Discord 429 as a normal error and polls again on the next tick.

    • File: cmd/claw-wall/discord.go
    • Result: once blocked, the wall keeps hammering the API instead of backing off.

Expected Behavior

  • claw-wall should use a single reader token per channel, not one token per bot.
  • If a pod has a master service, prefer its token for shared channel reads.
  • claw-wall should honor Retry-After / X-RateLimit-Reset-After on 429s.
  • A token-wide/global 429 should pause polling for all channels using that token until the backoff window expires.
  • The default wall poll interval should be less aggressive than 15s for multi-bot pods.

Suggested Fix

  • In cmd/claw/compose_up.go:

    • collapse CLAW_WALL_TOKENS to one (channel, token) pair per channel
    • prefer the master claw's token when available
    • set a saner default poll interval for the wall (for example 30s)
  • In cmd/claw-wall/discord.go:

    • parse Discord 429 responses into a structured rate-limit error
    • respect Retry-After and X-RateLimit-Reset-After
    • back off per (channel, token) pair for local 429s and per token for global/token-wide 429s

Repro

  1. Create a pod with multiple Discord bots subscribed to the same channel.
  2. Run claw up -d.
  3. Inspect CLAW_WALL_TOKENS on the injected claw-wall container.
  4. Observe multiple (channel, token) entries for the same channel, one per subscribed bot.
  5. Leave the pod running against Discord for an extended period.
  6. Observe repeated 429 Too Many Requests responses in claw-wall logs.

Notes

I patched this locally in the Tiverton operator environment and verified that the fix shape works:

  • CLAW_WALL_TOKENS dropped from 9 entries to 2
  • both shared channels now use one stable reader token
  • claw-wall now logs a single backoff event per channel and stops retrying during the cooldown window

One operator caveat: the local live pod can be fixed with a patched dev claw, but the stock release claw binary will recreate the old fanout behavior until this is merged upstream.

Metadata

Assignees

No one assigned

Labels

No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
