10 changes: 10 additions & 0 deletions docs/planning/reports/issue-wave-gh-next32-lane-2.md
Original file line number Diff line number Diff line change
@@ -55,3 +55,13 @@ Worktree: `cliproxyapi-plusplus-wave-cpb-2`
## Blockers

- #160 blocked on missing deterministic reproduction fixture for duplicate-output stream bug in current repo state.

## Wave2 Lane 2 Entry - #241

- Issue: `#241` copilot context length should always be `128K`
- Status: `implemented`
- Mapping:
- normalization at runtime registration: `pkg/llmproxy/registry/model_registry.go`
- regression coverage: `pkg/llmproxy/registry/model_registry_hook_test.go`
- Tests:
- `go test ./pkg/llmproxy/registry -run 'TestRegisterClient_NormalizesCopilotContextLength|TestGetGitHubCopilotModels' -count=1`
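The normalization step described above can be sketched as follows. This is a minimal illustration, not the repository's code: the `ModelInfo` struct here is a hypothetical stand-in with only two fields, and `normalizeContextLength` is an illustrative name. It assumes the intent is to copy each model before overriding the context length so caller-owned values are not mutated.

```go
package main

import "fmt"

// ModelInfo is a hypothetical, trimmed-down stand-in for the registry's model type.
type ModelInfo struct {
	ID            string
	ContextLength int
}

// copilotContextLength is the value issue #241 asks for (128K).
const copilotContextLength = 128000

// normalizeContextLength returns shallow copies of the non-nil models with
// ContextLength pinned to copilotContextLength. The input slice and the
// models it points to are left untouched.
func normalizeContextLength(models []*ModelInfo) []*ModelInfo {
	out := make([]*ModelInfo, 0, len(models))
	for _, m := range models {
		if m == nil {
			continue // skip nil entries instead of panicking
		}
		c := *m // copy so the caller's value is not modified
		c.ContextLength = copilotContextLength
		out = append(out, &c)
	}
	return out
}

func main() {
	in := []*ModelInfo{
		{ID: "gpt-5", ContextLength: 200000},
		nil,
		{ID: "gpt-5-mini", ContextLength: 1048576},
	}
	for _, m := range normalizeContextLength(in) {
		fmt.Printf("%s context_length=%d\n", m.ID, m.ContextLength)
	}
}
```

Copying before overriding keeps the normalization local to registration, which appears to be what the regression test checks through both the hook payload and the stored registry entries.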
12 changes: 11 additions & 1 deletion docs/planning/reports/issue-wave-gh-next32-lane-3.md
@@ -29,6 +29,17 @@ Worktree: `cliproxyapi-plusplus-wave-cpb-3`
- Status: `pending`
- Notes: lane-started

### Wave2 #221 - `kiro账号被封` (Kiro account banned)
- Status: `implemented`
- Source mapping:
- Source issue: `router-for-me/CLIProxyAPIPlus#221` (Kiro account banned handling)
- Fix: broaden Kiro 403 suspension detection to case-insensitive suspended/banned signals so banned accounts consistently trigger cooldown + remediation messaging in both non-stream and stream paths.
- Code: `pkg/llmproxy/runtime/executor/kiro_executor.go`
- Tests: `pkg/llmproxy/runtime/executor/kiro_executor_extra_test.go`
- Test commands:
- `go test ./pkg/llmproxy/runtime/executor -run 'Test(IsKiroSuspendedOrBannedResponse|FormatKiroCooldownError|FormatKiroSuspendedStatusMessage)' -count=1`
- Result: blocked by pre-existing package build failures in `pkg/llmproxy/runtime/executor/codex_websockets_executor.go` (`unused imports`, `undefined: authID`, `undefined: wsURL`).
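A case-insensitive check of the kind described in the fix note might look like the sketch below. The function name `looksSuspendedOrBanned` and the exact signal words are assumptions for illustration; the actual detection logic in `kiro_executor.go` is not shown in this report and may differ.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// looksSuspendedOrBanned is a hypothetical helper: it reports whether a
// response looks like an account suspension or ban, i.e. HTTP 403 with a
// body mentioning "suspended" or "banned" in any letter case.
func looksSuspendedOrBanned(status int, body string) bool {
	if status != http.StatusForbidden {
		return false
	}
	lower := strings.ToLower(body)
	return strings.Contains(lower, "suspended") || strings.Contains(lower, "banned")
}

func main() {
	fmt.Println(looksSuspendedOrBanned(403, `{"reason":"Account SUSPENDED"}`))
	fmt.Println(looksSuspendedOrBanned(403, `{"message":"Your account has been Banned"}`))
	fmt.Println(looksSuspendedOrBanned(403, `{"message":"forbidden"}`))
}
```

Lowercasing the body once keeps the check cheap and makes it easy to share between the stream and non-stream paths.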

## Focused Checks

- `task quality:fmt:check`
@@ -37,4 +48,3 @@ Worktree: `cliproxyapi-plusplus-wave-cpb-3`
## Blockers

- None recorded yet; work is in planning state.

13 changes: 13 additions & 0 deletions docs/planning/reports/issue-wave-gh-next32-lane-4.md
@@ -34,3 +34,16 @@ Worktree: `cliproxyapi-plusplus-wave-cpb-4`

- None recorded yet; work is in planning state.

## Wave2 Updates

### Wave2 Lane 4 - Issue #210
- Issue: `#210` Kiro/Ampcode Bash tool parameter incompatibility
- Mapping:
- `pkg/llmproxy/translator/kiro/claude/truncation_detector.go`
- `pkg/llmproxy/translator/kiro/claude/truncation_detector_test.go`
- Change:
- Extended command-parameter alias compatibility so `execute` and `run_command` accept `cmd` in addition to `command`, matching existing Bash alias handling and preventing false truncation loops.
- Tests:
- `go test ./pkg/llmproxy/translator/kiro/claude -run 'TestDetectTruncation|TestBuildSoftFailureToolResult'`
- Quality gate:
- `task quality` failed due to pre-existing syntax errors in `pkg/llmproxy/executor/kiro_executor.go` (`expected '(' found kiroModelFingerprint`), unrelated to this issue scope.
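The alias handling described under "Change" can be illustrated with a small lookup. This is a sketch under assumptions: `commandAliases` and `commandFromArgs` are hypothetical names, and the real truncation detector may structure its per-tool required-field checks differently.

```go
package main

import "fmt"

// commandAliases lists the parameter names treated as equivalent for
// command-style tools (hypothetical; order gives "command" priority).
var commandAliases = []string{"command", "cmd"}

// commandFromArgs returns the first non-empty string found under any alias.
// A tool call that only supplies "cmd" is therefore not treated as missing
// its command, which is what would otherwise look like a truncated call.
func commandFromArgs(args map[string]any) (string, bool) {
	for _, key := range commandAliases {
		if v, ok := args[key].(string); ok && v != "" {
			return v, true
		}
	}
	return "", false
}

func main() {
	fmt.Println(commandFromArgs(map[string]any{"cmd": "ls -la"}))
	fmt.Println(commandFromArgs(map[string]any{"command": "pwd", "cmd": "ignored"}))
	fmt.Println(commandFromArgs(map[string]any{"path": "/tmp"}))
}
```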
14 changes: 13 additions & 1 deletion docs/planning/reports/issue-wave-gh-next32-lane-5.md
@@ -30,7 +30,19 @@ Worktree: `cliproxyapi-plusplus-wave-cpb-5`
- `task quality:fmt:check`
- `QUALITY_PACKAGES='./pkg/llmproxy/api ./sdk/api/handlers/openai' task quality:quick`

## Wave2 Execution Entry

### #200
- Status: `done`
- Mapping: `router-for-me/CLIProxyAPIPlus issue#200` -> `CP2K-0020` -> Gemini quota auto disable/enable timing now honors fractional/unit retry hints from upstream quota messages.
- Code:
- `pkg/llmproxy/executor/gemini_cli_executor.go`
- `pkg/llmproxy/runtime/executor/gemini_cli_executor.go`
- Tests:
- `pkg/llmproxy/executor/gemini_cli_executor_retry_delay_test.go`
- `pkg/llmproxy/runtime/executor/gemini_cli_executor_retry_delay_test.go`
- `go test ./pkg/llmproxy/executor ./pkg/llmproxy/runtime/executor -run 'TestParseRetryDelay_(MessageDuration|MessageMilliseconds|PrefersRetryInfo)$'`
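The fractional/unit retry hint parsing can be tried in isolation with the sketch below. It reuses the regular expression from this change together with `time.ParseDuration`; the helper name `retryDelayFromMessage` and the package-level compiled pattern are illustrative choices, not the repository's layout.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// retryAfterRe captures a duration such as "1.5s", "250ms", "2m" or "1h"
// following the word "after". Compiled once here for illustration.
var retryAfterRe = regexp.MustCompile(`after\s+([0-9]+(?:\.[0-9]+)?(?:ms|s|m|h))\.?`)

// retryDelayFromMessage extracts the retry delay from an upstream quota
// message. It reports false when no parseable duration is present.
func retryDelayFromMessage(msg string) (time.Duration, bool) {
	m := retryAfterRe.FindStringSubmatch(msg)
	if len(m) < 2 {
		return 0, false
	}
	d, err := time.ParseDuration(m[1])
	if err != nil {
		return 0, false
	}
	return d, true
}

func main() {
	for _, msg := range []string{
		"Quota exceeded. Your quota will reset after 1.5s.",
		"Please retry after 250ms.",
		"Your quota will reset after 2m.",
		"No hint here.",
	} {
		d, ok := retryDelayFromMessage(msg)
		fmt.Println(d, ok)
	}
}
```

Because the captured text already uses Go duration syntax, `time.ParseDuration` handles decimals and units directly, so no manual unit conversion is needed.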

## Blockers

- None recorded yet; work is in planning state.

20 changes: 20 additions & 0 deletions docs/planning/reports/issue-wave-gh-next32-lane-6.md
@@ -34,3 +34,23 @@ Worktree: `cliproxyapi-plusplus-wave-cpb-6`

- None recorded yet; work is in planning state.

## Wave2 Entries

### 2026-02-23 - #179 OpenAI-MLX/vLLM-MLX support
- Status: `done`
- Mapping:
- Source issue: `router-for-me/CLIProxyAPIPlus#179`
- Implemented fix: OpenAI-compatible model discovery now honors `models_endpoint` auth attribute (emitted from `models-endpoint` config), including absolute URL and absolute path overrides.
- Why this is low risk: fallback/default `/v1/models` behavior is unchanged; only explicit override handling is added.
- Files:
- `pkg/llmproxy/executor/openai_models_fetcher.go`
- `pkg/llmproxy/executor/openai_models_fetcher_test.go`
- `pkg/llmproxy/runtime/executor/openai_models_fetcher.go`
- `pkg/llmproxy/runtime/executor/openai_models_fetcher_test.go`
- Tests:
- `go test pkg/llmproxy/executor/openai_models_fetcher.go pkg/llmproxy/executor/proxy_helpers.go pkg/llmproxy/executor/openai_models_fetcher_test.go`
- `go test pkg/llmproxy/runtime/executor/openai_models_fetcher.go pkg/llmproxy/runtime/executor/proxy_helpers.go pkg/llmproxy/runtime/executor/openai_models_fetcher_test.go`
- Verification notes:
- Added regression coverage for `models_endpoint` path override and absolute URL override in both mirrored executor test suites.
- Blockers:
- Package-level `go test ./pkg/llmproxy/executor` and `go test ./pkg/llmproxy/runtime/executor` are currently blocked by unrelated compile errors in existing lane files (`kiro_executor.go`, `codex_websockets_executor.go`).
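The override rules for `models_endpoint` (absolute URL wins, absolute path replaces the base path, anything else is appended to the base) can be sketched as below. `resolveModelsEndpoint` is an illustrative, self-contained helper and not the repository function; it covers only the explicit-override case, not the default `/v1/models` fallback.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// resolveModelsEndpoint is an illustrative helper showing how an explicit
// models endpoint could be combined with a provider base URL.
func resolveModelsEndpoint(baseURL, endpoint string) string {
	endpoint = strings.TrimSpace(endpoint)
	if endpoint == "" {
		return ""
	}
	// An absolute URL (one with a scheme) is used as-is.
	if u, err := url.Parse(endpoint); err == nil && u.IsAbs() {
		return endpoint
	}
	base := strings.TrimRight(strings.TrimSpace(baseURL), "/")
	if base == "" {
		return endpoint
	}
	// An absolute path keeps the base scheme and host but replaces the path.
	if strings.HasPrefix(endpoint, "/") {
		if u, err := url.Parse(base); err == nil && u.Scheme != "" && u.Host != "" {
			u.Path = endpoint
			u.RawPath = ""
			u.RawQuery = ""
			u.Fragment = ""
			return u.String()
		}
		return base + endpoint
	}
	// A relative path is appended to the base URL.
	return base + "/" + endpoint
}

func main() {
	base := "https://api.z.ai/api/coding/paas/v4"
	fmt.Println(resolveModelsEndpoint(base, "/api/coding/paas/v4/models"))
	fmt.Println(resolveModelsEndpoint(base, "https://custom.example.com/models"))
	fmt.Println(resolveModelsEndpoint(base, "models"))
}
```

Replacing the path rather than concatenating matters for bases that already carry a path prefix: appending `/api/coding/paas/v4/models` to the example base would otherwise duplicate the prefix.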
@@ -0,0 +1,28 @@
# Issue Wave GH Next32 Merge Report - Wave 2 (2026-02-23)

## Scope
- Wave 2, one item per lane (6 lanes total).
- Base: `origin/main` @ `f7e56f05`.

## Merged Commits
- `f1ab6855` - `fix(#253): support endpoint override for provider-pinned codex models`
- `05f894bf` - `fix(registry): enforce copilot context length 128K at registration (#241)`
- `947883cb` - `fix(kiro): handle banned account 403 payloads (#221)`
- `9fa8479d` - `fix(kiro): broaden cmd alias handling for command tools (#210)`
- `d921c09b` - `fix(#200): honor Gemini quota reset durations for cooldown`
- `a2571c90` - `fix(#179): honor openai-compat models-endpoint overrides`

## Issue Mapping
- `#253` -> `f1ab6855`
- `#241` -> `05f894bf`
- `#221` -> `947883cb`
- `#210` -> `9fa8479d`
- `#200` -> `d921c09b`
- `#179` -> `a2571c90`

## Validation
- `go test ./sdk/api/handlers/openai -run 'TestResolveEndpointOverride_' -count=1`
- `go test ./pkg/llmproxy/registry -run 'TestRegisterClient_NormalizesCopilotContextLength|TestGetGitHubCopilotModels' -count=1`
- `go test ./pkg/llmproxy/translator/kiro/claude -run 'TestDetectTruncation|TestBuildSoftFailureToolResult' -count=1`
- `go test pkg/llmproxy/executor/openai_models_fetcher.go pkg/llmproxy/executor/proxy_helpers.go pkg/llmproxy/executor/openai_models_fetcher_test.go -count=1`
- `go test pkg/llmproxy/runtime/executor/openai_models_fetcher.go pkg/llmproxy/runtime/executor/proxy_helpers.go pkg/llmproxy/runtime/executor/openai_models_fetcher_test.go -count=1`
9 changes: 4 additions & 5 deletions pkg/llmproxy/executor/gemini_cli_executor.go
@@ -13,7 +13,6 @@ import (
"math/rand"
"net/http"
"regexp"
-"strconv"
"strings"
"time"

@@ -937,14 +936,14 @@ func parseRetryDelay(errorBody []byte) (*time.Duration, error) {
}
}

-// Fallback: parse from error.message "Your quota will reset after Xs."
+// Fallback: parse from error.message (supports units like ms/s/m/h with optional decimals)
message := gjson.GetBytes(errorBody, "error.message").String()
if message != "" {
-re := regexp.MustCompile(`after\s+(\d+)s\.?`)
+re := regexp.MustCompile(`after\s+([0-9]+(?:\.[0-9]+)?(?:ms|s|m|h))\.?`)
if matches := re.FindStringSubmatch(message); len(matches) > 1 {
-seconds, err := strconv.Atoi(matches[1])
+duration, err := time.ParseDuration(matches[1])
if err == nil {
-return new(time.Duration(seconds) * time.Second), nil
+return &duration, nil
}
}
}
54 changes: 54 additions & 0 deletions pkg/llmproxy/executor/gemini_cli_executor_retry_delay_test.go
@@ -0,0 +1,54 @@
package executor

import (
"testing"
"time"
)

func TestParseRetryDelay_MessageDuration(t *testing.T) {
t.Parallel()

body := []byte(`{"error":{"message":"Quota exceeded. Your quota will reset after 1.5s."}}`)
got, err := parseRetryDelay(body)
if err != nil {
t.Fatalf("parseRetryDelay returned error: %v", err)
}
if got == nil {
t.Fatal("parseRetryDelay returned nil duration")
}
if *got != 1500*time.Millisecond {
t.Fatalf("parseRetryDelay = %v, want %v", *got, 1500*time.Millisecond)
}
}

func TestParseRetryDelay_MessageMilliseconds(t *testing.T) {
t.Parallel()

body := []byte(`{"error":{"message":"Please retry after 250ms."}}`)
got, err := parseRetryDelay(body)
if err != nil {
t.Fatalf("parseRetryDelay returned error: %v", err)
}
if got == nil {
t.Fatal("parseRetryDelay returned nil duration")
}
if *got != 250*time.Millisecond {
t.Fatalf("parseRetryDelay = %v, want %v", *got, 250*time.Millisecond)
}
}

func TestParseRetryDelay_PrefersRetryInfo(t *testing.T) {
t.Parallel()

body := []byte(`{"error":{"message":"Your quota will reset after 99s.","details":[{"@type":"type.googleapis.com/google.rpc.RetryInfo","retryDelay":"2s"}]}}`)
got, err := parseRetryDelay(body)
if err != nil {
t.Fatalf("parseRetryDelay returned error: %v", err)
}
if got == nil {
t.Fatal("parseRetryDelay returned nil duration")
}
if *got != 2*time.Second {
t.Fatalf("parseRetryDelay = %v, want %v", *got, 2*time.Second)
}
}
31 changes: 31 additions & 0 deletions pkg/llmproxy/executor/openai_models_fetcher.go
@@ -111,6 +111,9 @@ func resolveOpenAIModelsURL(baseURL string, attrs map[string]string) string {
if modelsURL := strings.TrimSpace(attrs["models_url"]); modelsURL != "" {
return modelsURL
}
if modelsEndpoint := strings.TrimSpace(attrs["models_endpoint"]); modelsEndpoint != "" {
return resolveOpenAIModelsEndpointURL(baseURL, modelsEndpoint)
}
}

trimmedBaseURL := strings.TrimRight(strings.TrimSpace(baseURL), "/")
@@ -134,6 +137,34 @@ func resolveOpenAIModelsURL(baseURL string, attrs map[string]string) string {
return trimmedBaseURL + "/v1/models"
}

func resolveOpenAIModelsEndpointURL(baseURL, modelsEndpoint string) string {
modelsEndpoint = strings.TrimSpace(modelsEndpoint)
if modelsEndpoint == "" {
return ""
}
if parsed, err := url.Parse(modelsEndpoint); err == nil && parsed.IsAbs() {
return modelsEndpoint
}

trimmedBaseURL := strings.TrimRight(strings.TrimSpace(baseURL), "/")
if trimmedBaseURL == "" {
return modelsEndpoint
}

if strings.HasPrefix(modelsEndpoint, "/") {
baseParsed, err := url.Parse(trimmedBaseURL)
if err == nil && baseParsed.Scheme != "" && baseParsed.Host != "" {
baseParsed.Path = modelsEndpoint
baseParsed.RawQuery = ""
baseParsed.Fragment = ""
return baseParsed.String()
}
return trimmedBaseURL + modelsEndpoint
}

return trimmedBaseURL + "/" + strings.TrimLeft(modelsEndpoint, "/")
}

func isVersionSegment(segment string) bool {
if len(segment) < 2 || segment[0] != 'v' {
return false
16 changes: 16 additions & 0 deletions pkg/llmproxy/executor/openai_models_fetcher_test.go
@@ -35,6 +35,22 @@ func TestResolveOpenAIModelsURL(t *testing.T) {
},
want: "https://custom.example.com/models",
},
{
name: "ModelsEndpointPathOverrideUsesBaseHost",
baseURL: "https://api.z.ai/api/coding/paas/v4",
attrs: map[string]string{
"models_endpoint": "/api/coding/paas/v4/models",
},
want: "https://api.z.ai/api/coding/paas/v4/models",
},
{
name: "ModelsEndpointAbsoluteURLOverrideWins",
baseURL: "https://api.z.ai/api/coding/paas/v4",
attrs: map[string]string{
"models_endpoint": "https://custom.example.com/models",
},
want: "https://custom.example.com/models",
},
}

for _, tc := range testCases {
16 changes: 16 additions & 0 deletions pkg/llmproxy/registry/model_registry.go
@@ -211,6 +211,9 @@ func (r *ModelRegistry) RegisterClient(clientID, clientProvider string, models [
defer r.mutex.Unlock()

provider := strings.ToLower(clientProvider)
if provider == "github-copilot" {
models = normalizeCopilotContextLength(models)
}
uniqueModelIDs := make([]string, 0, len(models))
rawModelIDs := make([]string, 0, len(models))
newModels := make(map[string]*ModelInfo, len(models))
@@ -414,6 +417,19 @@ func (r *ModelRegistry) RegisterClient(clientID, clientProvider string, models [
misc.LogCredentialSeparator()
}

func normalizeCopilotContextLength(models []*ModelInfo) []*ModelInfo {
normalized := make([]*ModelInfo, 0, len(models))
for _, model := range models {
if model == nil {
continue
}
copyModel := cloneModelInfo(model)
copyModel.ContextLength = 128000
normalized = append(normalized, copyModel)
}
return normalized
}

func (r *ModelRegistry) addModelRegistration(modelID, provider string, model *ModelInfo, now time.Time) {
if model == nil || modelID == "" {
return
41 changes: 41 additions & 0 deletions pkg/llmproxy/registry/model_registry_hook_test.go
@@ -202,3 +202,44 @@ func TestModelRegistryHook_PanicDoesNotAffectRegistry(t *testing.T) {
t.Fatal("timeout waiting for OnModelsUnregistered hook call")
}
}

func TestRegisterClient_NormalizesCopilotContextLength(t *testing.T) {
r := newTestModelRegistry()
hook := &capturingHook{
registeredCh: make(chan registeredCall, 1),
unregisteredCh: make(chan unregisteredCall, 1),
}
r.SetHook(hook)

r.RegisterClient("client-copilot", "github-copilot", []*ModelInfo{
{ID: "gpt-5", ContextLength: 200000},
{ID: "gpt-5-mini", ContextLength: 1048576},
})

select {
case call := <-hook.registeredCh:
for _, model := range call.models {
if model.ContextLength != 128000 {
t.Fatalf("hook model %q context_length=%d, want 128000", model.ID, model.ContextLength)
}
}
case <-time.After(2 * time.Second):
t.Fatal("timeout waiting for OnModelsRegistered hook call")
}

registration, ok := r.models["gpt-5"]
if !ok || registration == nil || registration.Info == nil {
t.Fatal("expected gpt-5 registration info")
}
if registration.Info.ContextLength != 128000 {
t.Fatalf("registry info context_length=%d, want 128000", registration.Info.ContextLength)
}

clientInfo, ok := r.clientModelInfos["client-copilot"]["gpt-5-mini"]
if !ok || clientInfo == nil {
t.Fatal("expected client model info for gpt-5-mini")
}
if clientInfo.ContextLength != 128000 {
t.Fatalf("client model info context_length=%d, want 128000", clientInfo.ContextLength)
}
}
9 changes: 4 additions & 5 deletions pkg/llmproxy/runtime/executor/gemini_cli_executor.go
@@ -13,7 +13,6 @@ import (
"math/rand"
"net/http"
"regexp"
-"strconv"
"strings"
"time"

@@ -937,14 +936,14 @@ func parseRetryDelay(errorBody []byte) (*time.Duration, error) {
}
}

-// Fallback: parse from error.message "Your quota will reset after Xs."
+// Fallback: parse from error.message (supports units like ms/s/m/h with optional decimals)
message := gjson.GetBytes(errorBody, "error.message").String()
if message != "" {
-re := regexp.MustCompile(`after\s+(\d+)s\.?`)
+re := regexp.MustCompile(`after\s+([0-9]+(?:\.[0-9]+)?(?:ms|s|m|h))\.?`)
if matches := re.FindStringSubmatch(message); len(matches) > 1 {
-seconds, err := strconv.Atoi(matches[1])
+duration, err := time.ParseDuration(matches[1])
if err == nil {
-return new(time.Duration(seconds) * time.Second), nil
+return &duration, nil
}
}
}