KooshaPari · Feb 23, 2026
diff --git a/docs/planning/reports/issue-wave-gh-next32-lane-2.md b/docs/planning/reports/issue-wave-gh-next32-lane-2.md
@@ -55,3 +55,13 @@ Worktree: `cliproxyapi-plusplus-wave-cpb-2`
 ## Blockers
 
 - #160 blocked on missing deterministic reproduction fixture for duplicate-output stream bug in current repo state.
+
+## Wave2 Lane 2 Entry - #241
+
+- Issue: `#241` copilot context length should always be `128K`
+- Status: `implemented`
+- Mapping:
+  - normalization at runtime registration: `pkg/llmproxy/registry/model_registry.go`
+  - regression coverage: `pkg/llmproxy/registry/model_registry_hook_test.go`
+- Tests:
+  - `go test ./pkg/llmproxy/registry -run 'TestRegisterClient_NormalizesCopilotContextLength|TestGetGitHubCopilotModels' -count=1`
diff --git a/docs/planning/reports/issue-wave-gh-next32-lane-3.md b/docs/planning/reports/issue-wave-gh-next32-lane-3.md
@@ -29,6 +29,17 @@ Worktree: `cliproxyapi-plusplus-wave-cpb-3`
 - Status: `pending`
 - Notes: lane-started
 
+### Wave2 #221 - `Kiro account banned`
+- Status: `implemented`
+- Source mapping:
+  - Source issue: `router-for-me/CLIProxyAPIPlus#221` (Kiro account banned handling)
+  - Fix: broaden Kiro 403 suspension detection to case-insensitive suspended/banned signals so banned accounts consistently trigger cooldown + remediation messaging in both non-stream and stream paths.
+  - Code: `pkg/llmproxy/runtime/executor/kiro_executor.go`
+  - Tests: `pkg/llmproxy/runtime/executor/kiro_executor_extra_test.go`
+- Test commands:
+  - `go test ./pkg/llmproxy/runtime/executor -run 'Test(IsKiroSuspendedOrBannedResponse|FormatKiroCooldownError|FormatKiroSuspendedStatusMessage)' -count=1`
+  - Result: blocked by pre-existing package build failures in `pkg/llmproxy/runtime/executor/codex_websockets_executor.go` (`unused imports`, `undefined: authID`, `undefined: wsURL`).
+
 ## Focused Checks
 
 - `task quality:fmt:check`
@@ -37,4 +48,3 @@ Worktree: `cliproxyapi-plusplus-wave-cpb-3`
 ## Blockers
 
 - None recorded yet; work is in planning state.
-
diff --git a/docs/planning/reports/issue-wave-gh-next32-lane-4.md b/docs/planning/reports/issue-wave-gh-next32-lane-4.md
@@ -34,3 +34,16 @@ Worktree: `cliproxyapi-plusplus-wave-cpb-4`
 
 - None recorded yet; work is in planning state.
 
+## Wave2 Updates
+
+### Wave2 Lane 4 - Issue #210
+- Issue: `#210` Kiro/Ampcode Bash tool parameter incompatibility
+- Mapping:
+  - `pkg/llmproxy/translator/kiro/claude/truncation_detector.go`
+  - `pkg/llmproxy/translator/kiro/claude/truncation_detector_test.go`
+- Change:
+  - Extended command-parameter alias compatibility so `execute` and `run_command` accept `cmd` in addition to `command`, matching existing Bash alias handling and preventing false truncation loops.
+- Tests:
+  - `go test ./pkg/llmproxy/translator/kiro/claude -run 'TestDetectTruncation|TestBuildSoftFailureToolResult'`
+- Quality gate:
+  - `task quality` failed due to pre-existing syntax errors in `pkg/llmproxy/executor/kiro_executor.go` (`expected '(' found kiroModelFingerprint`), unrelated to this issue scope.
diff --git a/docs/planning/reports/issue-wave-gh-next32-lane-5.md b/docs/planning/reports/issue-wave-gh-next32-lane-5.md
@@ -30,7 +30,19 @@ Worktree: `cliproxyapi-plusplus-wave-cpb-5`
 - `task quality:fmt:check`
 - `QUALITY_PACKAGES='./pkg/llmproxy/api ./sdk/api/handlers/openai' task quality:quick`
 
+## Wave2 Execution Entry
+
+### #200
+- Status: `done`
+- Mapping: `router-for-me/CLIProxyAPIPlus issue#200` -> `CP2K-0020` -> Gemini quota auto disable/enable timing now honors fractional/unit retry hints from upstream quota messages.
+- Code:
+  - `pkg/llmproxy/executor/gemini_cli_executor.go`
+  - `pkg/llmproxy/runtime/executor/gemini_cli_executor.go`
+- Tests:
+  - `pkg/llmproxy/executor/gemini_cli_executor_retry_delay_test.go`
+  - `pkg/llmproxy/runtime/executor/gemini_cli_executor_retry_delay_test.go`
+  - `go test ./pkg/llmproxy/executor ./pkg/llmproxy/runtime/executor -run 'TestParseRetryDelay_(MessageDuration|MessageMilliseconds|PrefersRetryInfo)$'`
+
 ## Blockers
 
 - None recorded yet; work is in planning state.
-
diff --git a/docs/planning/reports/issue-wave-gh-next32-lane-6.md b/docs/planning/reports/issue-wave-gh-next32-lane-6.md
@@ -34,3 +34,23 @@ Worktree: `cliproxyapi-plusplus-wave-cpb-6`
 
 - None recorded yet; work is in planning state.
 
+## Wave2 Entries
+
+### 2026-02-23 - #179 OpenAI-MLX/vLLM-MLX support
+- Status: `done`
+- Mapping:
+  - Source issue: `router-for-me/CLIProxyAPIPlus#179`
+  - Implemented fix: OpenAI-compatible model discovery now honors `models_endpoint` auth attribute (emitted from `models-endpoint` config), including absolute URL and absolute path overrides.
+  - Why this is low risk: fallback/default `/v1/models` behavior is unchanged; only explicit override handling is added.
+- Files:
+  - `pkg/llmproxy/executor/openai_models_fetcher.go`
+  - `pkg/llmproxy/executor/openai_models_fetcher_test.go`
+  - `pkg/llmproxy/runtime/executor/openai_models_fetcher.go`
+  - `pkg/llmproxy/runtime/executor/openai_models_fetcher_test.go`
+- Tests:
+  - `go test pkg/llmproxy/executor/openai_models_fetcher.go pkg/llmproxy/executor/proxy_helpers.go pkg/llmproxy/executor/openai_models_fetcher_test.go`
+  - `go test pkg/llmproxy/runtime/executor/openai_models_fetcher.go pkg/llmproxy/runtime/executor/proxy_helpers.go pkg/llmproxy/runtime/executor/openai_models_fetcher_test.go`
+- Verification notes:
+  - Added regression coverage for `models_endpoint` path override and absolute URL override in both mirrored executor test suites.
+- Blockers:
+  - Package-level `go test ./pkg/llmproxy/executor` and `go test ./pkg/llmproxy/runtime/executor` are currently blocked by unrelated compile errors in existing lane files (`kiro_executor.go`, `codex_websockets_executor.go`).
diff --git a/docs/planning/reports/issue-wave-gh-next32-merge-wave2-2026-02-23.md b/docs/planning/reports/issue-wave-gh-next32-merge-wave2-2026-02-23.md
@@ -0,0 +1,28 @@
+# Issue Wave GH Next32 Merge Report - Wave 2 (2026-02-23)
+
+## Scope
+- Wave 2, one item per lane (6 lanes total).
+- Base: `origin/main` @ `f7e56f05`.
+
+## Merged Commits
+- `f1ab6855` - `fix(#253): support endpoint override for provider-pinned codex models`
+- `05f894bf` - `fix(registry): enforce copilot context length 128K at registration (#241)`
+- `947883cb` - `fix(kiro): handle banned account 403 payloads (#221)`
+- `9fa8479d` - `fix(kiro): broaden cmd alias handling for command tools (#210)`
+- `d921c09b` - `fix(#200): honor Gemini quota reset durations for cooldown`
+- `a2571c90` - `fix(#179): honor openai-compat models-endpoint overrides`
+
+## Issue Mapping
+- `#253` -> `f1ab6855`
+- `#241` -> `05f894bf`
+- `#221` -> `947883cb`
+- `#210` -> `9fa8479d`
+- `#200` -> `d921c09b`
+- `#179` -> `a2571c90`
+
+## Validation
+- `go test ./sdk/api/handlers/openai -run 'TestResolveEndpointOverride_' -count=1`
+- `go test ./pkg/llmproxy/registry -run 'TestRegisterClient_NormalizesCopilotContextLength|TestGetGitHubCopilotModels' -count=1`
+- `go test ./pkg/llmproxy/translator/kiro/claude -run 'TestDetectTruncation|TestBuildSoftFailureToolResult' -count=1`
+- `go test pkg/llmproxy/executor/openai_models_fetcher.go pkg/llmproxy/executor/proxy_helpers.go pkg/llmproxy/executor/openai_models_fetcher_test.go -count=1`
+- `go test pkg/llmproxy/runtime/executor/openai_models_fetcher.go pkg/llmproxy/runtime/executor/proxy_helpers.go pkg/llmproxy/runtime/executor/openai_models_fetcher_test.go -count=1`
diff --git a/pkg/llmproxy/executor/gemini_cli_executor.go b/pkg/llmproxy/executor/gemini_cli_executor.go
@@ -13,7 +13,6 @@ import (
 	"math/rand"
 	"net/http"
 	"regexp"
-	"strconv"
 	"strings"
 	"time"
 
@@ -937,14 +936,14 @@ func parseRetryDelay(errorBody []byte) (*time.Duration, error) {
 		}
 	}
 
-	// Fallback: parse from error.message "Your quota will reset after Xs."
+	// Fallback: parse from error.message (supports units like ms/s/m/h with optional decimals)
 	message := gjson.GetBytes(errorBody, "error.message").String()
 	if message != "" {
-		re := regexp.MustCompile(`after\s+(\d+)s\.?`)
+		re := regexp.MustCompile(`after\s+([0-9]+(?:\.[0-9]+)?(?:ms|s|m|h))\.?`)
 		if matches := re.FindStringSubmatch(message); len(matches) > 1 {
-			seconds, err := strconv.Atoi(matches[1])
+			duration, err := time.ParseDuration(matches[1])
 			if err == nil {
-				return new(time.Duration(seconds) * time.Second), nil
+				return &duration, nil
 			}
 		}
 	}

diff --git a/pkg/llmproxy/executor/gemini_cli_executor_retry_delay_test.go b/pkg/llmproxy/executor/gemini_cli_executor_retry_delay_test.go
@@ -0,0 +1,54 @@
+package executor
+
+import (
+	"testing"
+	"time"
+)
+
+func TestParseRetryDelay_MessageDuration(t *testing.T) {
+	t.Parallel()
+
+	body := []byte(`{"error":{"message":"Quota exceeded. Your quota will reset after 1.5s."}}`)
+	got, err := parseRetryDelay(body)
+	if err != nil {
+		t.Fatalf("parseRetryDelay returned error: %v", err)
+	}
+	if got == nil {
+		t.Fatal("parseRetryDelay returned nil duration")
+	}
+	if *got != 1500*time.Millisecond {
+		t.Fatalf("parseRetryDelay = %v, want %v", *got, 1500*time.Millisecond)
+	}
+}
+
+func TestParseRetryDelay_MessageMilliseconds(t *testing.T) {
+	t.Parallel()
+
+	body := []byte(`{"error":{"message":"Please retry after 250ms."}}`)
+	got, err := parseRetryDelay(body)
+	if err != nil {
+		t.Fatalf("parseRetryDelay returned error: %v", err)
+	}
+	if got == nil {
+		t.Fatal("parseRetryDelay returned nil duration")
+	}
+	if *got != 250*time.Millisecond {
+		t.Fatalf("parseRetryDelay = %v, want %v", *got, 250*time.Millisecond)
+	}
+}
+
+func TestParseRetryDelay_PrefersRetryInfo(t *testing.T) {
+	t.Parallel()
+
+	body := []byte(`{"error":{"message":"Your quota will reset after 99s.","details":[{"@type":"type.googleapis.com/google.rpc.RetryInfo","retryDelay":"2s"}]}}`)
+	got, err := parseRetryDelay(body)
+	if err != nil {
+		t.Fatalf("parseRetryDelay returned error: %v", err)
+	}
+	if got == nil {
+		t.Fatal("parseRetryDelay returned nil duration")
+	}
+	if *got != 2*time.Second {
+		t.Fatalf("parseRetryDelay = %v, want %v", *got, 2*time.Second)
+	}
+}
diff --git a/pkg/llmproxy/executor/openai_models_fetcher.go b/pkg/llmproxy/executor/openai_models_fetcher.go
@@ -111,6 +111,9 @@ func resolveOpenAIModelsURL(baseURL string, attrs map[string]string) string {
 		if modelsURL := strings.TrimSpace(attrs["models_url"]); modelsURL != "" {
 			return modelsURL
 		}
+		if modelsEndpoint := strings.TrimSpace(attrs["models_endpoint"]); modelsEndpoint != "" {
+			return resolveOpenAIModelsEndpointURL(baseURL, modelsEndpoint)
+		}
 	}
 
 	trimmedBaseURL := strings.TrimRight(strings.TrimSpace(baseURL), "/")
@@ -134,6 +137,34 @@ func resolveOpenAIModelsURL(baseURL string, attrs map[string]string) string {
 	return trimmedBaseURL + "/v1/models"
 }
 
+func resolveOpenAIModelsEndpointURL(baseURL, modelsEndpoint string) string {
+	modelsEndpoint = strings.TrimSpace(modelsEndpoint)
+	if modelsEndpoint == "" {
+		return ""
+	}
+	if parsed, err := url.Parse(modelsEndpoint); err == nil && parsed.IsAbs() {
+		return modelsEndpoint
+	}
+
+	trimmedBaseURL := strings.TrimRight(strings.TrimSpace(baseURL), "/")
+	if trimmedBaseURL == "" {
+		return modelsEndpoint
+	}
+
+	if strings.HasPrefix(modelsEndpoint, "/") {
+		baseParsed, err := url.Parse(trimmedBaseURL)
+		if err == nil && baseParsed.Scheme != "" && baseParsed.Host != "" {
+			baseParsed.Path = modelsEndpoint
+			baseParsed.RawQuery = ""
+			baseParsed.Fragment = ""
+			return baseParsed.String()
+		}
+		return trimmedBaseURL + modelsEndpoint
+	}
+
+	return trimmedBaseURL + "/" + strings.TrimLeft(modelsEndpoint, "/")
+}
+
 func isVersionSegment(segment string) bool {
 	if len(segment) < 2 || segment[0] != 'v' {
 		return false

diff --git a/pkg/llmproxy/executor/openai_models_fetcher_test.go b/pkg/llmproxy/executor/openai_models_fetcher_test.go
@@ -35,6 +35,22 @@ func TestResolveOpenAIModelsURL(t *testing.T) {
 			},
 			want: "https://custom.example.com/models",
 		},
+		{
+			name:    "ModelsEndpointPathOverrideUsesBaseHost",
+			baseURL: "https://api.z.ai/api/coding/paas/v4",
+			attrs: map[string]string{
+				"models_endpoint": "/api/coding/paas/v4/models",
+			},
+			want: "https://api.z.ai/api/coding/paas/v4/models",
+		},
+		{
+			name:    "ModelsEndpointAbsoluteURLOverrideWins",
+			baseURL: "https://api.z.ai/api/coding/paas/v4",
+			attrs: map[string]string{
+				"models_endpoint": "https://custom.example.com/models",
+			},
+			want: "https://custom.example.com/models",
+		},
 	}
 
 	for _, tc := range testCases {
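
The `models_endpoint` override above accepts three forms: an absolute URL, an absolute path, and a relative path. The standalone sketch below restates that resolution rule so the cases can be run in isolation. The function name `resolveModelsEndpoint` and the `main` driver are illustrative and not part of the PR; the branch logic follows `resolveOpenAIModelsEndpointURL` as shown in the diff.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// resolveModelsEndpoint is an illustrative restatement of the override rule:
// absolute URLs win outright, absolute paths replace the base URL's path,
// and relative paths are appended to the base URL.
func resolveModelsEndpoint(baseURL, endpoint string) string {
	endpoint = strings.TrimSpace(endpoint)
	if endpoint == "" {
		return ""
	}
	// An endpoint with a scheme is treated as a complete URL.
	if parsed, err := url.Parse(endpoint); err == nil && parsed.IsAbs() {
		return endpoint
	}

	base := strings.TrimRight(strings.TrimSpace(baseURL), "/")
	if base == "" {
		return endpoint
	}

	if strings.HasPrefix(endpoint, "/") {
		// Keep scheme and host from the base, replace path, drop query and fragment.
		if u, err := url.Parse(base); err == nil && u.Scheme != "" && u.Host != "" {
			u.Path = endpoint
			u.RawQuery = ""
			u.Fragment = ""
			return u.String()
		}
		return base + endpoint
	}

	return base + "/" + endpoint
}

func main() {
	base := "https://api.z.ai/api/coding/paas/v4"
	// Absolute path: prints https://api.z.ai/api/coding/paas/v4/models
	fmt.Println(resolveModelsEndpoint(base, "/api/coding/paas/v4/models"))
	// Absolute URL: prints https://custom.example.com/models
	fmt.Println(resolveModelsEndpoint(base, "https://custom.example.com/models"))
	// Relative path, appended to the base URL.
	fmt.Println(resolveModelsEndpoint(base, "models"))
}
```

Because an absolute path replaces the base path instead of being appended to it, the override has to spell out the full path, as the `ModelsEndpointPathOverrideUsesBaseHost` case does.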

diff --git a/pkg/llmproxy/registry/model_registry.go b/pkg/llmproxy/registry/model_registry.go
@@ -211,6 +211,9 @@ func (r *ModelRegistry) RegisterClient(clientID, clientProvider string, models [
 	defer r.mutex.Unlock()
 
 	provider := strings.ToLower(clientProvider)
+	if provider == "github-copilot" {
+		models = normalizeCopilotContextLength(models)
+	}
 	uniqueModelIDs := make([]string, 0, len(models))
 	rawModelIDs := make([]string, 0, len(models))
 	newModels := make(map[string]*ModelInfo, len(models))
@@ -414,6 +417,19 @@ func (r *ModelRegistry) RegisterClient(clientID, clientProvider string, models [
 	misc.LogCredentialSeparator()
 }
 
+func normalizeCopilotContextLength(models []*ModelInfo) []*ModelInfo {
+	normalized := make([]*ModelInfo, 0, len(models))
+	for _, model := range models {
+		if model == nil {
+			continue
+		}
+		copyModel := cloneModelInfo(model)
+		copyModel.ContextLength = 128000
+		normalized = append(normalized, copyModel)
+	}
+	return normalized
+}
+
 func (r *ModelRegistry) addModelRegistration(modelID, provider string, model *ModelInfo, now time.Time) {
 	if model == nil || modelID == "" {
 		return

diff --git a/pkg/llmproxy/registry/model_registry_hook_test.go b/pkg/llmproxy/registry/model_registry_hook_test.go
@@ -202,3 +202,44 @@ func TestModelRegistryHook_PanicDoesNotAffectRegistry(t *testing.T) {
 		t.Fatal("timeout waiting for OnModelsUnregistered hook call")
 	}
 }
+
+func TestRegisterClient_NormalizesCopilotContextLength(t *testing.T) {
+	r := newTestModelRegistry()
+	hook := &capturingHook{
+		registeredCh:   make(chan registeredCall, 1),
+		unregisteredCh: make(chan unregisteredCall, 1),
+	}
+	r.SetHook(hook)
+
+	r.RegisterClient("client-copilot", "github-copilot", []*ModelInfo{
+		{ID: "gpt-5", ContextLength: 200000},
+		{ID: "gpt-5-mini", ContextLength: 1048576},
+	})
+
+	select {
+	case call := <-hook.registeredCh:
+		for _, model := range call.models {
+			if model.ContextLength != 128000 {
+				t.Fatalf("hook model %q context_length=%d, want 128000", model.ID, model.ContextLength)
+			}
+		}
+	case <-time.After(2 * time.Second):
+		t.Fatal("timeout waiting for OnModelsRegistered hook call")
+	}
+
+	registration, ok := r.models["gpt-5"]
+	if !ok || registration == nil || registration.Info == nil {
+		t.Fatal("expected gpt-5 registration info")
+	}
+	if registration.Info.ContextLength != 128000 {
+		t.Fatalf("registry info context_length=%d, want 128000", registration.Info.ContextLength)
+	}
+
+	clientInfo, ok := r.clientModelInfos["client-copilot"]["gpt-5-mini"]
+	if !ok || clientInfo == nil {
+		t.Fatal("expected client model info for gpt-5-mini")
+	}
+	if clientInfo.ContextLength != 128000 {
+		t.Fatalf("client model info context_length=%d, want 128000", clientInfo.ContextLength)
+	}
+}
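
The normalization above copies each model before overwriting its context length, so the slice passed in by the caller is left untouched. A minimal sketch of that copy-then-override pattern follows. The trimmed-down `ModelInfo` struct, the `copilotContextLength` constant, and the shallow struct copy are assumptions made for illustration; the PR itself calls `cloneModelInfo`, whose implementation is not shown in this diff.

```go
package main

import "fmt"

// ModelInfo is a hypothetical stand-in for the registry type; only the two
// fields needed for this illustration are modelled.
type ModelInfo struct {
	ID            string
	ContextLength int
}

// copilotContextLength is the fixed value the diff assigns (128000).
const copilotContextLength = 128000

// normalizeContextLength returns copies of the non-nil models with the
// context length forced to copilotContextLength. The inputs are not mutated.
func normalizeContextLength(models []*ModelInfo) []*ModelInfo {
	normalized := make([]*ModelInfo, 0, len(models))
	for _, m := range models {
		if m == nil {
			continue // nil entries are dropped, as in the diff
		}
		c := *m // shallow copy; stands in for cloneModelInfo
		c.ContextLength = copilotContextLength
		normalized = append(normalized, &c)
	}
	return normalized
}

func main() {
	in := []*ModelInfo{
		{ID: "gpt-5", ContextLength: 200000},
		nil,
		{ID: "gpt-5-mini", ContextLength: 1048576},
	}
	for _, m := range normalizeContextLength(in) {
		fmt.Println(m.ID, m.ContextLength)
	}
	// The caller's value is unchanged.
	fmt.Println(in[0].ContextLength)
}
```

A shallow copy is enough here because the sketch has no reference-typed fields; a real model struct with slices or maps would need a deeper clone to keep the same isolation.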
diff --git a/pkg/llmproxy/runtime/executor/gemini_cli_executor.go b/pkg/llmproxy/runtime/executor/gemini_cli_executor.go
@@ -13,7 +13,6 @@ import (
 	"math/rand"
 	"net/http"
 	"regexp"
-	"strconv"
 	"strings"
 	"time"
 
@@ -937,14 +936,14 @@ func parseRetryDelay(errorBody []byte) (*time.Duration, error) {
 		}
 	}
 
-	// Fallback: parse from error.message "Your quota will reset after Xs."
+	// Fallback: parse from error.message (supports units like ms/s/m/h with optional decimals)
 	message := gjson.GetBytes(errorBody, "error.message").String()
 	if message != "" {
-		re := regexp.MustCompile(`after\s+(\d+)s\.?`)
+		re := regexp.MustCompile(`after\s+([0-9]+(?:\.[0-9]+)?(?:ms|s|m|h))\.?`)
 		if matches := re.FindStringSubmatch(message); len(matches) > 1 {
-			seconds, err := strconv.Atoi(matches[1])
+			duration, err := time.ParseDuration(matches[1])
 			if err == nil {
-				return new(time.Duration(seconds) * time.Second), nil
+				return &duration, nil
 			}
 		}
 	}
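
The message fallback in `parseRetryDelay` above pairs a regular expression with `time.ParseDuration`, which is what lets it accept fractional values and the `ms`, `s`, `m`, and `h` units. The sketch below isolates only that fallback so it can be tried without the surrounding executor. The helper name `retryDelayFromMessage` and the package-level compiled pattern are illustrative choices, not the PR's code, which compiles the pattern inside the function and reads the message with `gjson`.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// retryAfterPattern uses the same expression as the diff. Compiling it once
// at package scope is an illustrative choice for this sketch.
var retryAfterPattern = regexp.MustCompile(`after\s+([0-9]+(?:\.[0-9]+)?(?:ms|s|m|h))\.?`)

// retryDelayFromMessage is a hypothetical helper that extracts a retry delay
// from an error message. It reports false when no usable hint is found.
func retryDelayFromMessage(message string) (time.Duration, bool) {
	matches := retryAfterPattern.FindStringSubmatch(message)
	if len(matches) < 2 {
		return 0, false
	}
	// The capture group holds a value such as "1.5s" or "250ms", which is
	// already in the syntax that time.ParseDuration accepts.
	d, err := time.ParseDuration(matches[1])
	if err != nil {
		return 0, false
	}
	return d, true
}

func main() {
	messages := []string{
		"Quota exceeded. Your quota will reset after 1.5s.",
		"Please retry after 250ms.",
		"Your quota will reset after 2h.",
		"No retry hint in this message.",
	}
	for _, msg := range messages {
		d, ok := retryDelayFromMessage(msg)
		fmt.Printf("%q -> %v (found=%v)\n", msg, d, ok)
	}
}
```

Listing `ms` before `s` and `m` in the alternation matters: Go's regexp package prefers the leftmost alternative that matches, so `250ms` is captured whole instead of stopping at `250m`.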