TokenRateLimitPolicy counts 404 responses against quota, blocking legitimate requests #1674

@noyitz

Summary

TokenRateLimitPolicy applies rate limiting before Gateway API route matching occurs, causing 404 responses to consume user quotas. This results in legitimate requests being blocked when users make mistakes in URLs or follow incorrect documentation.

Environment

  • Kuadrant Version: (deployed via Helm charts)
  • OpenShift Version: 4.19.9+
  • Gateway API Version: OpenShift native implementation
  • Cluster: OpenShift on AWS
  • Gateway Controller: openshift.io/gateway-controller/v1

Steps to Reproduce

  1. Setup: Deploy Kuadrant with a TokenRateLimitPolicy attached to the Gateway (rate limit: 5 requests per 10s)
  2. Get an authentication token from the MaaS API
  3. Make 5 requests to the wrong URL (missing endpoint path):
    for i in {1..5}; do
      curl -H "Authorization: Bearer $TOKEN" \
           -H "Content-Type: application/json" \
           -d '{"model": "test", "prompt": "Hello"}' \
           "http://maas.example.com/llm/model-name" # Missing /v1/chat/completions
    done
  4. Make a 6th request to the wrong URL:
    curl -H "Authorization: Bearer $TOKEN" \
         "http://maas.example.com/llm/model-name"
  5. Test the correct URL:
    curl -H "Authorization: Bearer $TOKEN" \
         "http://maas.example.com/llm/model-name/v1/chat/completions"
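To make the status codes easier to compare, the steps above can be wrapped in a small helper that prints only the HTTP status of each request. This is a sketch, not part of the original report: the function names `status_of` and `run_repro` are made up for illustration, `TOKEN` is assumed to be exported already, and every request is sent as a POST with the same body as step 3.

```shell
#!/usr/bin/env bash
# Sketch: print only the HTTP status code of each reproduction request.
# status_of and run_repro are illustrative names; TOKEN is assumed to be set.

# status_of URL -> prints the HTTP status code returned for a POST to URL.
status_of() {
  curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer ${TOKEN:-}" \
    -H "Content-Type: application/json" \
    -d '{"model": "test", "prompt": "Hello"}' \
    "$1"
}

# run_repro BASE_URL -> 6 requests to the wrong URL, then 1 to the correct one.
run_repro() {
  local base="$1" i
  for i in 1 2 3 4 5 6; do
    echo "wrong-url request $i: $(status_of "$base")"
  done
  echo "correct-url request: $(status_of "$base/v1/chat/completions")"
}
```

Usage: `run_repro "http://maas.example.com/llm/model-name"`, run within a single rate limit window.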

Expected Behavior

  • Step 3 (all 5 requests): Return 404 Not Found without consuming rate limit quota
  • Step 4: Return 404 Not Found without consuming rate limit quota
  • Step 5: Return 200 OK with successful response

Actual Behavior

  • Step 3 (all 5 requests): Return 404 Not Found and consume quota (5/5 requests used)
  • Step 4: Return 429 Rate Limit Exceeded (quota exhausted)
  • Step 5: Return 429 Rate Limit Exceeded (legitimate request blocked)

Root Cause

TokenRateLimitPolicy is applied at the Gateway level before route matching:

Request → TokenRateLimitPolicy (quota consumed) → Gateway Routing → 404

Should be:

Request → Gateway Routing → Route Match → TokenRateLimitPolicy → Backend
Request → Gateway Routing → No Match → 404 (no quota consumption)
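
The difference between the two orderings can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not Kuadrant code: it uses a plain counter with no window expiry, a single hard-coded route, and a hypothetical `simulate` function.

```shell
#!/usr/bin/env bash
# Toy model of the two evaluation orders; not Kuadrant code.
LIMIT=5
ROUTE="/llm/model-name/v1/chat/completions"   # the only path with a matching route

# simulate ORDER PATH... -> prints one status code per request.
# ORDER is "limit-first" (observed) or "route-first" (expected).
simulate() {
  local order="$1" used=0 path
  shift
  for path in "$@"; do
    if [ "$order" = "limit-first" ]; then
      # Quota is checked and consumed before the route is looked up.
      if [ "$used" -ge "$LIMIT" ]; then echo 429; continue; fi
      used=$((used + 1))
      if [ "$path" = "$ROUTE" ]; then echo 200; else echo 404; fi
    else
      # Unmatched paths return 404 without touching the counter.
      if [ "$path" != "$ROUTE" ]; then echo 404; continue; fi
      if [ "$used" -ge "$LIMIT" ]; then echo 429; continue; fi
      used=$((used + 1))
      echo 200
    fi
  done
}

bad="/llm/model-name"
echo $(simulate limit-first $bad $bad $bad $bad $bad $bad "$ROUTE")
# 404 404 404 404 404 429 429
echo $(simulate route-first $bad $bad $bad $bad $bad $bad "$ROUTE")
# 404 404 404 404 404 404 200
```

The `limit-first` output matches the sequence observed in the Logs section. The `route-first` output is the expected behavior.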

Impact

  • Poor User Experience: Typos in URLs exhaust user quotas
  • Legitimate Requests Blocked: Correct requests fail after URL mistakes
  • Billing/Quota Issues: Users charged for failed requests

Configuration

Gateway

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: maas-default-gateway
  namespace: openshift-ingress
spec:
  gatewayClassName: openshift-default
  listeners:
  - name: https
    port: 443
    protocol: HTTPS
    hostname: maas.apps.cluster.example.com

TokenRateLimitPolicy

apiVersion: kuadrant.io/v1alpha1
kind: TokenRateLimitPolicy
metadata:
  name: gateway-token-rate-limits
  namespace: openshift-ingress
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: maas-default-gateway
  limits:
    per-user:
      rates:
      - limit: 5
        duration: 10s

Reproduction Environment

This was discovered and reproduced with the https://github.com/opendatahub-io/maas-billing project in a customer environment:

  • Gateway: maas.apps.cluster-tvp85.tvp85.sandbox1981.opentlc.com
  • Model endpoint: /llm/facebook-opt-125m-simulated/v1/chat/completions

Logs

Example sequence showing the issue:
Request 1 to /llm/model-name: 404 (quota: 1/5)
Request 2 to /llm/model-name: 404 (quota: 2/5)
Request 3 to /llm/model-name: 404 (quota: 3/5)
Request 4 to /llm/model-name: 404 (quota: 4/5)
Request 5 to /llm/model-name: 404 (quota: 5/5)
Request 6 to /llm/model-name: 429 Rate Limit Exceeded
Request 7 to /llm/model-name/v1/chat/completions: 429 Rate Limit Exceeded

Suggested Fix

Rate limiting policies should only be evaluated after successful route matching, or Kuadrant should provide configuration to exclude certain HTTP status codes (such as 404) from quota consumption.

Possible approaches:

  1. Apply TokenRateLimitPolicy at HTTPRoute level instead of Gateway level
  2. Add configuration to exclude specific status codes from rate limiting
  3. Change policy evaluation order to happen after routing
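
For approach 1, the policy might look roughly like the sketch below. The HTTPRoute name and the namespace are hypothetical placeholders, and the `limits` block is copied as-is from the Gateway-level policy in this issue. Whether attaching at the HTTPRoute level actually prevents unmatched requests from being counted has not been verified here.

```yaml
# Sketch only: names marked "hypothetical" do not come from the reported setup.
apiVersion: kuadrant.io/v1alpha1
kind: TokenRateLimitPolicy
metadata:
  name: route-token-rate-limits      # hypothetical
  namespace: llm                     # hypothetical; must match the HTTPRoute's namespace
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute                  # was: Gateway
    name: model-name-route           # hypothetical HTTPRoute for the model endpoint
  limits:                            # copied unchanged from the Gateway-level policy
    per-user:
      rates:
      - limit: 5
        duration: 10s
```

Field names under `limits` should be checked against the installed TokenRateLimitPolicy CRD before applying.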

Additional Context

  • Related to Gateway API policy attachment and request processing order
  • Affects any application using Kuadrant TokenRateLimitPolicy at Gateway level
  • Issue discovered during MaaS Platform validation guide testing
