[codex] Fallback lite GLM to standard Fireworks #543

Merged
jahooma merged 1 commit into main from jahooma/lite-fw-fallback
Apr 25, 2026

@brandonkachen
Collaborator

This updates lite-mode GLM routing to fall back to the standard Fireworks API when the dedicated deployment is outside hours, cooling down, returns 5xxs, or throws before responding.

Previously, deployment-backed GLM requests surfaced provider availability errors instead of retrying serverless, which broke codebuff lite mode when the dedicated deployment was unavailable.

The user impact is that lite-mode GLM stays available without changing freebuff waiting-room behavior, which still preserves the existing hard-stop semantics for free-mode traffic.
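
A minimal sketch of the routing decision described above. The names here (`CostMode`, `Unavailability`, `decideRoute`) are hypothetical and not taken from `fireworks.ts`, and `'free'` is only a stand-in for any non-lite mode; the real code may structure these checks differently.

```typescript
// Hypothetical names throughout: a sketch of the decision, not the code in fireworks.ts.
type CostMode = 'lite' | 'free' // 'free' stands in for any non-lite mode (assumption)

type Unavailability =
  | 'outside_hours' // dedicated deployment is outside its serving hours
  | 'cooling_down' // deployment is in a cooldown period
  | 'server_error' // deployment answered with a 5xx
  | 'threw' // request failed before any response arrived

type Route = 'deployment' | 'standard_api' | 'surface_error'

function decideRoute(costMode: CostMode, problem: Unavailability | null): Route {
  // Healthy deployment: every mode keeps using it.
  if (problem === null) return 'deployment'
  // Only lite mode retries on the standard (serverless) Fireworks API.
  if (costMode === 'lite') return 'standard_api'
  // Non-lite traffic keeps the existing hard stop and surfaces the failure.
  return 'surface_error'
}
```

Keeping the decision in one pure function makes the four unavailability cases easy to enumerate in tests, at the cost of separating the decision from the request code that acts on it.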

Validation: `ANTHROPIC_API_KEY=dummy FIREWORKS_API_KEY=dummy GRAVITY_API_KEY=dummy STRIPE_SUBSCRIPTION_100_PRICE_ID=dummy STRIPE_SUBSCRIPTION_200_PRICE_ID=dummy STRIPE_SUBSCRIPTION_500_PRICE_ID=dummy bun test web/src/llm-api/__tests__/fireworks-deployment.test.ts web/src/app/api/v1/freebuff/session/__tests__/session.test.ts`

@jahooma jahooma marked this pull request as ready for review April 25, 2026 00:32
@jahooma jahooma merged commit fc9a76d into main Apr 25, 2026
19 checks passed
@jahooma jahooma deleted the jahooma/lite-fw-fallback branch April 25, 2026 00:34
@greptile-apps
Contributor

greptile-apps Bot commented Apr 25, 2026

Greptile Summary

This PR adds a lite-mode fallback so that GLM requests with cost_mode: 'lite' transparently retry against the standard Fireworks serverless API when the dedicated deployment is outside hours, cooling down, returns a 5xx, or throws before responding. Existing freebuff/non-lite behaviour is fully preserved.
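
The two post-request cases (a 5xx response and a thrown error) could look roughly like the sketch below. It assumes a simplified `HttpResult` type in place of the real `Response`, and `sendWithLiteFallback` with its parameters is an illustrative name, not a function from this PR.

```typescript
// Hypothetical sketch: names and types are illustrative, not from fireworks.ts.
interface HttpResult {
  status: number
  body: string
}

type Send = () => Promise<HttpResult>

async function sendWithLiteFallback(
  sendToDeployment: Send,
  sendToStandardApi: Send,
  isLite: boolean,
): Promise<HttpResult> {
  let result: HttpResult
  try {
    result = await sendToDeployment()
  } catch (error) {
    // The deployment call threw before responding.
    if (!isLite) throw error
    return sendToStandardApi()
  }
  // A 5xx from the deployment triggers the retry, but only in lite mode.
  if (isLite && result.status >= 500) return sendToStandardApi()
  return result
}
```

The fallback call sits outside the `try` block on purpose: if the standard API also fails, that error propagates to the caller rather than triggering another retry.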

Confidence Score: 5/5

Safe to merge — logic is correct, all new paths are covered by tests, and existing behaviour is unchanged.

The only finding is a P2 style suggestion to merge two duplicated fallback branches; no correctness or reliability issues were identified.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| `web/src/llm-api/fireworks.ts` | Adds lite-mode fallback to the standard Fireworks API across four deployment-unavailability scenarios; logic is correct but the two pre-deployment checks contain duplicated fallback branches that could be merged. |
| `web/src/llm-api/__tests__/fireworks-deployment.test.ts` | Adds four well-structured tests covering lite-mode fallback for outside-hours, cooldown, 5xx, and thrown-error scenarios; good coverage of the new code paths. |

Comments Outside Diff (1)

  1. web/src/llm-api/fireworks.ts, lines 733-771

    P2 Consolidate the two pre-deployment fallback checks

    The "outside hours" and "cooling down" blocks share identical lite-mode behaviour (log + return standard API), so they can be merged into a single guard. This removes one repeated if (shouldFallbackToStandardApi) branch and keeps both distinct error responses for non-lite callers:

    if (hasDeployment && (!isDeploymentHours(now) || isDeploymentCoolingDown())) {
      if (shouldFallbackToStandardApi) {
        logger.info(
          { model: originalModel },
          'Falling back to Fireworks standard API (deployment unavailable)',
        )
        return createStandardApiRequest()
      }
      if (!isDeploymentHours(now)) {
        return new Response(
          JSON.stringify({
            error: {
              message: `${originalModel} is only available during ${FREEBUFF_DEPLOYMENT_HOURS_LABEL}. Use minimax/minimax-m2.7 outside those hours.`,
              code: 'DEPLOYMENT_OUTSIDE_HOURS',
              type: 'availability_error',
            },
          }),
          { status: 503, statusText: 'Service Unavailable' },
        )
      }
      return new Response(
        JSON.stringify({
          error: {
            message: `${originalModel} deployment is temporarily unavailable. Use minimax/minimax-m2.7 while it recovers.`,
            code: 'DEPLOYMENT_COOLDOWN',
            type: 'availability_error',
          },
        }),
        { status: 503, statusText: 'Service Unavailable' },
      )
    }
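
The two non-lite branches in the suggestion differ only in their `message` and `code`, so a small helper could build both payloads. This is a sketch under assumptions: `availabilityError` and `ErrorResponse` are hypothetical names, and the helper returns a plain object instead of a `Response` to stay self-contained.

```typescript
// Hypothetical helper; field names mirror the payloads in the suggestion above.
type AvailabilityCode = 'DEPLOYMENT_OUTSIDE_HOURS' | 'DEPLOYMENT_COOLDOWN'

interface ErrorResponse {
  status: number
  statusText: string
  body: string
}

function availabilityError(code: AvailabilityCode, message: string): ErrorResponse {
  return {
    status: 503,
    statusText: 'Service Unavailable',
    // Same JSON envelope for both cases; only message and code vary.
    body: JSON.stringify({
      error: { message, code, type: 'availability_error' },
    }),
  }
}
```

With a helper like this, each non-lite branch reduces to a single call, and the shared 503 status lives in one place.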

Reviews (1): Last reviewed commit: "Fallback lite GLM to Fireworks API"
