From da58c1a9e8fe32df8cbe55cf5a91dc20e7ba953c Mon Sep 17 00:00:00 2001 From: MK Date: Wed, 4 Mar 2026 04:07:23 -0500 Subject: [PATCH 1/5] feat: add file_create tool with disk persistence and k8s-pod-rightsizer skill - Add file_create builtin tool that writes files to disk and returns structured JSON with path for channel upload and cross-tool reference - Files are written to the agent's .forge/files/ directory via FilesDir context value, with fallback to $TMPDIR/forge-files/ - Add FilesDir to LLMExecutorConfig and inject into execution context - Fix Slack file extraction to preserve raw content for typed files - Add k8s-pod-rightsizer embedded skill with apply workflow instructions - Update docs for tools, runtime, and skills --- docs/runtime.md | 14 + docs/skills.md | 29 + docs/tools.md | 31 + forge-cli/runtime/runner.go | 1 + forge-core/runtime/audit.go | 17 +- forge-core/runtime/loop.go | 50 +- forge-core/tools/builtins/builtins_test.go | 193 +++ forge-core/tools/builtins/file_create.go | 120 ++ forge-core/tools/builtins/register.go | 1 + forge-plugins/channels/slack/slack.go | 8 +- .../embedded/k8s-pod-rightsizer/SKILL.md | 516 ++++++++ .../scripts/k8s-pod-rightsizer.sh | 1082 +++++++++++++++++ forge-skills/local/registry_embedded_test.go | 5 +- 13 files changed, 2059 insertions(+), 8 deletions(-) create mode 100644 forge-core/tools/builtins/file_create.go create mode 100644 forge-skills/local/embedded/k8s-pod-rightsizer/SKILL.md create mode 100644 forge-skills/local/embedded/k8s-pod-rightsizer/scripts/k8s-pod-rightsizer.sh diff --git a/docs/runtime.md b/docs/runtime.md index d08fd13..230ddd8 100644 --- a/docs/runtime.md +++ b/docs/runtime.md @@ -161,6 +161,20 @@ forge serve logs The daemon forks `forge run` in the background with `setsid`, writes state to `.forge/serve.json`, and redirects output to `.forge/serve.log`. Passphrase prompting for encrypted secrets happens in the parent process (which has TTY access) before forking. 
+## File Output Directory + +The runtime configures a `FilesDir` for tool-generated files (e.g., from `file_create`). This directory defaults to `<WorkDir>/.forge/files/` and is injected into the execution context so tools can write files that other tools can reference by path. + +``` +<WorkDir>/ + .forge/ + files/ ← file_create output (patches.yaml, reports, etc.) + sessions/ ← conversation persistence + memory/ ← long-term memory +``` + +The `FilesDir` is set via `LLMExecutorConfig.FilesDir` and made available to tools through `runtime.FilesDirFromContext(ctx)`. See [Tools — File Create](tools.md#file-create) for details. + ## Conversation Memory For details on session persistence, context window management, compaction, and long-term memory, see [Memory](memory.md). diff --git a/docs/skills.md b/docs/skills.md index ca3de1f..48c0dbf 100644 --- a/docs/skills.md +++ b/docs/skills.md @@ -150,6 +150,7 @@ forge skills list --tags kubernetes,incident-response | `tavily-search` | — | Search the web using Tavily AI search API | `tavily-search.sh` | | `tavily-research` | — | Deep multi-source research via Tavily API | `tavily-research.sh`, `tavily-research-poll.sh` | | `k8s-incident-triage` | sre | Read-only Kubernetes incident triage using kubectl | — (binary-backed) | +| `k8s-pod-rightsizer` | sre | Analyze workload metrics and produce CPU/memory rightsizing recommendations with optional apply | — (binary-backed) | | `code-review` | developer | AI-powered code review for diffs and files | `code-review-diff.sh`, `code-review-file.sh` | | `code-review-standards` | developer | Initialize and manage code review standards | — (template-based) | | `code-review-github` | developer | Post code review results to GitHub PRs | — (binary-backed) | @@ -218,6 +219,34 @@ The skill accepts two input modes: Requires: `kubectl`, optional `KUBECONFIG`, `K8S_API_DOMAIN`, `DEFAULT_NAMESPACE` environment variables. 
+ +### Kubernetes Pod Rightsizer Skill + +The `k8s-pod-rightsizer` skill analyzes real workload metrics (Prometheus or metrics-server fallback) and produces policy-constrained CPU/memory rightsizing recommendations: + +```bash +forge skills add k8s-pod-rightsizer +``` + +This skill operates in three modes: + +| Mode | Purpose | Mutates Cluster | +|------|---------|-----------------| +| `dry-run` | Report recommendations only (default) | No | +| `plan` | Generate strategic merge patch YAMLs | No | +| `apply` | Execute patches with rollback bundle | Yes (requires `i_accept_risk: true`) | + +**Key features:** + +- Deterministic formulas — no LLM-based guessing for recommendations +- Policy model with per-namespace and per-workload overrides (safety factors, min/max bounds, step constraints) +- Prometheus p95 metrics with metrics-server fallback +- Automatic rollback bundle generation in apply mode +- Workload classification: over-provisioned, under-provisioned, right-sized, limit-bound, insufficient-data + +**Apply workflow:** The skill's built-in `mode=apply` handles rollback bundles, strategic merge patches via `kubectl patch`, and rollout verification. Do not manually run `kubectl apply -f <file>` — use `mode=apply` with `i_accept_risk: true` instead. + +Requires: `bash`, `kubectl`, `jq`, `curl`. Optional: `KUBECONFIG`, `K8S_API_DOMAIN`, `PROMETHEUS_URL`, `PROMETHEUS_TOKEN`, `POLICY_FILE`, `DEFAULT_NAMESPACE`. + ### Codegen React Skill The `codegen-react` skill scaffolds and iterates on **Vite + React** applications with Tailwind CSS: diff --git a/docs/tools.md b/docs/tools.md index 030cfd9..3c91835 100644 --- a/docs/tools.md +++ b/docs/tools.md @@ -24,6 +24,7 @@ Tools are capabilities that an LLM agent can invoke during execution. 
Forge prov | `uuid_generate` | Generate UUID v4 identifiers | | `math_calculate` | Evaluate mathematical expressions | | `web_search` | Search the web for quick lookups and recent information | +| `file_create` | Create a downloadable file, written to the agent's `.forge/files/` directory | | `read_skill` | Load full instructions for an available skill on demand | | `memory_search` | Search long-term memory (when enabled) | | `memory_get` | Read memory files (when enabled) | @@ -80,6 +81,36 @@ tools: | 6 | **Environment isolation** | Only `PATH`, `HOME`, `LANG`, explicit passthrough vars, and proxy vars | | 7 | **Output limits** | Configurable max output size (default: 1MB) to prevent memory exhaustion | +## File Create + +The `file_create` tool generates downloadable files that are both written to disk and uploaded to the user's channel (Slack/Telegram). + +| Field | Description | +|-------|-------------| +| `filename` | Name with extension (e.g., `patches.yaml`, `report.json`) | +| `content` | Full file content as text | + +**Output JSON** includes `filename`, `content`, `mime_type`, and `path`. The `path` field contains the absolute disk location, allowing other tools (e.g., `kubectl apply -f <path>`) to reference the file. + +**File location:** Files are written to the agent's `.forge/files/` directory (under `WorkDir`). The runtime injects this path via `FilesDir` in the executor context. When running outside the full runtime (e.g., tests), the tool falls back to `$TMPDIR/forge-files/`. + +**Allowed extensions:** + +| Extension | MIME Type | +|-----------|-----------| +| `.md` | `text/markdown` | +| `.json` | `application/json` | +| `.yaml`, `.yml` | `text/yaml` | +| `.txt`, `.log` | `text/plain` | +| `.csv` | `text/csv` | +| `.sh` | `text/x-shellscript` | +| `.xml` | `text/xml` | +| `.html` | `text/html` | +| `.py` | `text/x-python` | +| `.ts` | `text/typescript` | + +Filenames with path separators (`/`, `\`) or traversal patterns (`..`) are rejected. 
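For illustration, a call such as `{"filename": "patches.yaml", "content": "---\napiVersion: apps/v1"}` returns JSON along these lines (the `path` shown assumes a hypothetical `WorkDir` of `/srv/agent`; field names and the MIME mapping follow the tables above):

```json
{
  "filename": "patches.yaml",
  "content": "---\napiVersion: apps/v1",
  "mime_type": "text/yaml",
  "path": "/srv/agent/.forge/files/patches.yaml"
}
```

Other tools can then reference the file by that absolute `path`.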
+ ## Memory Tools When [long-term memory](memory.md) is enabled, two additional tools are registered: diff --git a/forge-cli/runtime/runner.go b/forge-cli/runtime/runner.go index a4410b2..3b3f51a 100644 --- a/forge-cli/runtime/runner.go +++ b/forge-cli/runtime/runner.go @@ -383,6 +383,7 @@ func (r *Runner) Run(ctx context.Context) error { Logger: r.logger, ModelName: mc.Client.Model, CharBudget: charBudget, + FilesDir: filepath.Join(r.cfg.WorkDir, ".forge", "files"), } // Initialize memory persistence (enabled by default). diff --git a/forge-core/runtime/audit.go b/forge-core/runtime/audit.go index 2db26eb..f6af274 100644 --- a/forge-core/runtime/audit.go +++ b/forge-core/runtime/audit.go @@ -64,9 +64,10 @@ func (a *AuditLogger) Emit(event AuditEvent) { a.mu.Unlock() } -// Context key types for correlation and task IDs. +// Context key types for correlation IDs, task IDs, and file directories. type correlationIDKey struct{} type taskIDKey struct{} +type filesDirKey struct{} // WithCorrelationID stores a correlation ID in the context. func WithCorrelationID(ctx context.Context, id string) context.Context { @@ -96,6 +97,20 @@ func TaskIDFromContext(ctx context.Context) string { return "" } +// WithFilesDir stores a files directory path in the context. +func WithFilesDir(ctx context.Context, dir string) context.Context { + return context.WithValue(ctx, filesDirKey{}, dir) +} + +// FilesDirFromContext retrieves the files directory from the context. +// Returns "" if not set. +func FilesDirFromContext(ctx context.Context) string { + if dir, ok := ctx.Value(filesDirKey{}).(string); ok { + return dir + } + return "" +} + // GenerateID produces a 16-character hex random ID using crypto/rand. 
func GenerateID() string { b := make([]byte, 8) diff --git a/forge-core/runtime/loop.go b/forge-core/runtime/loop.go index 8de27a4..427f93b 100644 --- a/forge-core/runtime/loop.go +++ b/forge-core/runtime/loop.go @@ -31,6 +31,7 @@ type LLMExecutor struct { modelName string // resolved model name for context budget charBudget int // resolved character budget maxToolResultChars int // computed from char budget + filesDir string // directory for file_create output } // LLMExecutorConfig configures the LLM executor. @@ -45,6 +46,7 @@ type LLMExecutorConfig struct { Logger Logger ModelName string // model name for context-aware budgeting CharBudget int // explicit char budget override (0 = auto from model) + FilesDir string // directory for file_create output (default: $TMPDIR/forge-files) } // NewLLMExecutor creates a new LLMExecutor with the given configuration. @@ -93,11 +95,16 @@ func NewLLMExecutor(cfg LLMExecutorConfig) *LLMExecutor { modelName: cfg.ModelName, charBudget: budget, maxToolResultChars: toolLimit, + filesDir: cfg.FilesDir, } } // Execute processes a message through the LLM agent loop. func (e *LLMExecutor) Execute(ctx context.Context, task *a2a.Task, msg *a2a.Message) (*a2a.Message, error) { + if e.filesDir != "" { + ctx = WithFilesDir(ctx, e.filesDir) + } + mem := NewMemory(e.systemPrompt, e.charBudget, e.modelName) // Try to recover session from disk. If found, the disk snapshot @@ -239,13 +246,31 @@ func (e *LLMExecutor) Execute(ctx context.Context, task *a2a.Task, msg *a2a.Mess return nil, fmt.Errorf("after tool exec hook: %w", err) } - // Track large tool outputs for pass-through in the response. - if len(result) > largeToolOutputThreshold { + // Handle file_create tool: always create a file part. + // For other tools with large output, detect content type. 
+ if tc.Function.Name == "file_create" { + var fc struct { + Filename string `json:"filename"` + Content string `json:"content"` + MimeType string `json:"mime_type"` + } + if err := json.Unmarshal([]byte(result), &fc); err == nil && fc.Filename != "" { + largeToolOutputs = append(largeToolOutputs, a2a.Part{ + Kind: a2a.PartKindFile, + File: &a2a.FileContent{ + Name: fc.Filename, + MimeType: fc.MimeType, + Bytes: []byte(fc.Content), + }, + }) + } + } else if len(result) > largeToolOutputThreshold { + name, mime := detectFileType(result, tc.Function.Name) largeToolOutputs = append(largeToolOutputs, a2a.Part{ Kind: a2a.PartKindFile, File: &a2a.FileContent{ - Name: tc.Function.Name + "-output.md", - MimeType: "text/markdown", + Name: name, + MimeType: mime, Bytes: []byte(result), }, }) @@ -327,6 +352,23 @@ func a2aMessageToLLM(msg a2a.Message) llm.ChatMessage { } } +// detectFileType inspects tool output content and returns an appropriate +// filename and MIME type. JSON and YAML content gets typed extensions; +// everything else defaults to markdown. +func detectFileType(content, toolName string) (filename, mimeType string) { + trimmed := strings.TrimSpace(content) + if len(trimmed) > 0 && (trimmed[0] == '{' || trimmed[0] == '[') { + // Quick check: try to parse as JSON. + if json.Valid([]byte(trimmed)) { + return toolName + "-output.json", "application/json" + } + } + if strings.HasPrefix(trimmed, "---") { + return toolName + "-output.yaml", "text/yaml" + } + return toolName + "-output.md", "text/markdown" +} + // llmMessageToA2A converts an LLM chat message to an A2A message. // Any extra parts (e.g. large tool output files) are appended after the text part. 
func llmMessageToA2A(msg llm.ChatMessage, extraParts ...a2a.Part) *a2a.Message { diff --git a/forge-core/tools/builtins/builtins_test.go b/forge-core/tools/builtins/builtins_test.go index 9add8be..624e237 100644 --- a/forge-core/tools/builtins/builtins_test.go +++ b/forge-core/tools/builtins/builtins_test.go @@ -6,7 +6,10 @@ import ( "net/http" "net/http/httptest" "os" + "path/filepath" "strings" + + "github.com/initializ/forge/forge-core/runtime" "testing" "github.com/initializ/forge/forge-core/tools" @@ -21,6 +24,7 @@ func TestRegisterAll(t *testing.T) { expected := []string{ "http_request", "json_parse", "csv_parse", "datetime_now", "uuid_generate", "math_calculate", "web_search", + "file_create", } for _, name := range expected { if reg.Get(name) == nil { @@ -374,6 +378,195 @@ func TestWebSearchTool_ExplicitPerplexity(t *testing.T) { } } +func TestFileCreateTool(t *testing.T) { + tool := GetByName("file_create") + if tool == nil { + t.Fatal("expected file_create tool to exist") + } + + // Clean up temp files after all subtests. + defer func() { _ = os.RemoveAll(filepath.Join(os.TempDir(), "forge-files")) }() + + t.Run("valid YAML file", func(t *testing.T) { + args, _ := json.Marshal(map[string]string{ + "filename": "patches.yaml", + "content": "---\napiVersion: apps/v1\nkind: Deployment", + }) + result, err := tool.Execute(context.Background(), args) + if err != nil { + t.Fatalf("Execute error: %v", err) + } + var out map[string]string + if err := json.Unmarshal([]byte(result), &out); err != nil { + t.Fatalf("output is not valid JSON: %v", err) + } + if out["filename"] != "patches.yaml" { + t.Errorf("filename: got %q, want %q", out["filename"], "patches.yaml") + } + if out["mime_type"] != "text/yaml" { + t.Errorf("mime_type: got %q, want %q", out["mime_type"], "text/yaml") + } + if out["content"] != "---\napiVersion: apps/v1\nkind: Deployment" { + t.Errorf("content mismatch: got %q", out["content"]) + } + // Verify path field and disk persistence. 
+ if out["path"] == "" { + t.Fatal("expected non-empty path field") + } + diskContent, err := os.ReadFile(out["path"]) + if err != nil { + t.Fatalf("file not found at path %q: %v", out["path"], err) + } + if string(diskContent) != out["content"] { + t.Errorf("disk content mismatch: got %q, want %q", string(diskContent), out["content"]) + } + }) + + t.Run("valid JSON file", func(t *testing.T) { + args, _ := json.Marshal(map[string]string{ + "filename": "report.json", + "content": `{"key":"value"}`, + }) + result, err := tool.Execute(context.Background(), args) + if err != nil { + t.Fatalf("Execute error: %v", err) + } + var out map[string]string + if err := json.Unmarshal([]byte(result), &out); err != nil { + t.Fatalf("output is not valid JSON: %v", err) + } + if out["mime_type"] != "application/json" { + t.Errorf("mime_type: got %q, want %q", out["mime_type"], "application/json") + } + if out["path"] == "" { + t.Fatal("expected non-empty path field") + } + }) + + t.Run("valid Python file", func(t *testing.T) { + args, _ := json.Marshal(map[string]string{ + "filename": "script.py", + "content": "print('hello')", + }) + result, err := tool.Execute(context.Background(), args) + if err != nil { + t.Fatalf("Execute error: %v", err) + } + var out map[string]string + if err := json.Unmarshal([]byte(result), &out); err != nil { + t.Fatalf("output is not valid JSON: %v", err) + } + if out["mime_type"] != "text/x-python" { + t.Errorf("mime_type: got %q, want %q", out["mime_type"], "text/x-python") + } + }) + + t.Run("valid TypeScript file", func(t *testing.T) { + args, _ := json.Marshal(map[string]string{ + "filename": "index.ts", + "content": "const x: number = 1;", + }) + result, err := tool.Execute(context.Background(), args) + if err != nil { + t.Fatalf("Execute error: %v", err) + } + var out map[string]string + if err := json.Unmarshal([]byte(result), &out); err != nil { + t.Fatalf("output is not valid JSON: %v", err) + } + if out["mime_type"] != "text/typescript" { + 
t.Errorf("mime_type: got %q, want %q", out["mime_type"], "text/typescript") + } + }) + + t.Run("path traversal rejected", func(t *testing.T) { + args, _ := json.Marshal(map[string]string{ + "filename": "../evil.sh", + "content": "rm -rf /", + }) + _, err := tool.Execute(context.Background(), args) + if err == nil { + t.Error("expected error for path traversal") + } + }) + + t.Run("unsupported extension rejected", func(t *testing.T) { + args, _ := json.Marshal(map[string]string{ + "filename": "malware.exe", + "content": "bad", + }) + _, err := tool.Execute(context.Background(), args) + if err == nil { + t.Error("expected error for unsupported extension") + } + }) + + t.Run("empty filename rejected", func(t *testing.T) { + args, _ := json.Marshal(map[string]string{ + "filename": "", + "content": "hello", + }) + _, err := tool.Execute(context.Background(), args) + if err == nil { + t.Error("expected error for empty filename") + } + }) + + t.Run("empty content succeeds", func(t *testing.T) { + args, _ := json.Marshal(map[string]string{ + "filename": "empty.txt", + "content": "", + }) + result, err := tool.Execute(context.Background(), args) + if err != nil { + t.Fatalf("Execute error: %v", err) + } + var out map[string]string + if err := json.Unmarshal([]byte(result), &out); err != nil { + t.Fatalf("output is not valid JSON: %v", err) + } + if out["content"] != "" { + t.Errorf("expected empty content, got %q", out["content"]) + } + // Verify empty file exists on disk. 
+ diskContent, err := os.ReadFile(out["path"]) + if err != nil { + t.Fatalf("file not found at path %q: %v", out["path"], err) + } + if len(diskContent) != 0 { + t.Errorf("expected empty file on disk, got %d bytes", len(diskContent)) + } + }) + + t.Run("uses FilesDir from context", func(t *testing.T) { + customDir := filepath.Join(t.TempDir(), ".forge", "files") + ctx := runtime.WithFilesDir(context.Background(), customDir) + args, _ := json.Marshal(map[string]string{ + "filename": "ctx-test.yaml", + "content": "hello: world", + }) + result, err := tool.Execute(ctx, args) + if err != nil { + t.Fatalf("Execute error: %v", err) + } + var out map[string]string + if err := json.Unmarshal([]byte(result), &out); err != nil { + t.Fatalf("output is not valid JSON: %v", err) + } + wantPath := filepath.Join(customDir, "ctx-test.yaml") + if out["path"] != wantPath { + t.Errorf("path: got %q, want %q", out["path"], wantPath) + } + diskContent, err := os.ReadFile(wantPath) + if err != nil { + t.Fatalf("file not found at %q: %v", wantPath, err) + } + if string(diskContent) != "hello: world" { + t.Errorf("disk content: got %q, want %q", string(diskContent), "hello: world") + } + }) +} + func TestAllToolsHaveCategory(t *testing.T) { for _, tool := range All() { if tool.Category() != tools.CategoryBuiltin { diff --git a/forge-core/tools/builtins/file_create.go b/forge-core/tools/builtins/file_create.go new file mode 100644 index 0000000..a1f4895 --- /dev/null +++ b/forge-core/tools/builtins/file_create.go @@ -0,0 +1,120 @@ +package builtins + +import ( + "context" + "encoding/json" + "fmt" + "os" + "path/filepath" + "strings" + + "github.com/initializ/forge/forge-core/runtime" + "github.com/initializ/forge/forge-core/tools" +) + +// allowedExtensions maps file extensions to their MIME types. 
+var allowedExtensions = map[string]string{ + ".md": "text/markdown", + ".json": "application/json", + ".yaml": "text/yaml", + ".yml": "text/yaml", + ".txt": "text/plain", + ".log": "text/plain", + ".csv": "text/csv", + ".sh": "text/x-shellscript", + ".xml": "text/xml", + ".html": "text/html", + ".py": "text/x-python", + ".ts": "text/typescript", +} + +type fileCreateTool struct{} + +func (t *fileCreateTool) Name() string { return "file_create" } +func (t *fileCreateTool) Description() string { + return "Create a downloadable file. The file is written to the agent's files directory (or a temporary directory as fallback) and uploaded to the user's channel (Slack/Telegram). The result includes a 'path' field with the file's location on disk, which can be used with other tools like kubectl apply -f <path>." +} +func (t *fileCreateTool) Category() tools.Category { return tools.CategoryBuiltin } + +func (t *fileCreateTool) InputSchema() json.RawMessage { + return json.RawMessage(`{ + "type": "object", + "properties": { + "filename": { + "type": "string", + "description": "Filename with extension (e.g., patches.yaml, report.json, output.txt, script.py)" + }, + "content": { + "type": "string", + "description": "The full file content as text" + } + }, + "required": ["filename", "content"] + }`) +} + +func (t *fileCreateTool) Execute(ctx context.Context, args json.RawMessage) (string, error) { + var input struct { + Filename string `json:"filename"` + Content string `json:"content"` + } + if err := json.Unmarshal(args, &input); err != nil { + return "", fmt.Errorf("invalid arguments: %w", err) + } + + // Validate filename is not empty. + if strings.TrimSpace(input.Filename) == "" { + return "", fmt.Errorf("filename is required") + } + + // Reject path traversal and directory separators. + if strings.ContainsAny(input.Filename, "/\\") { + return "", fmt.Errorf("filename must not contain path separators") + } + if input.Filename == "." || input.Filename == ".." 
{ + return "", fmt.Errorf("invalid filename") + } + + // Validate extension against allowlist. + ext := strings.ToLower(filepath.Ext(input.Filename)) + mime, ok := allowedExtensions[ext] + if !ok { + supported := make([]string, 0, len(allowedExtensions)) + for k := range allowedExtensions { + supported = append(supported, k) + } + return "", fmt.Errorf("unsupported file extension %q; supported: %s", ext, strings.Join(supported, ", ")) + } + + // Write file to the agent's .forge/files directory if available, + // otherwise fall back to a system temp directory. + dir := runtime.FilesDirFromContext(ctx) + if dir == "" { + dir = filepath.Join(os.TempDir(), "forge-files") + } + if err := os.MkdirAll(dir, 0o755); err != nil { + return "", fmt.Errorf("creating temp directory: %w", err) + } + filePath := filepath.Join(dir, input.Filename) + if err := os.WriteFile(filePath, []byte(input.Content), 0o644); err != nil { + return "", fmt.Errorf("writing file: %w", err) + } + + // Return structured JSON for the runtime to parse. + out, err := json.Marshal(map[string]string{ + "filename": input.Filename, + "content": input.Content, + "mime_type": mime, + "path": filePath, + }) + if err != nil { + return "", fmt.Errorf("marshalling output: %w", err) + } + return string(out), nil +} + +// MimeFromExtension returns the MIME type for a given file extension. +// Returns empty string if the extension is not in the allowlist. 
+func MimeFromExtension(ext string) string { + return allowedExtensions[strings.ToLower(ext)] +} diff --git a/forge-core/tools/builtins/register.go b/forge-core/tools/builtins/register.go index fafae62..17e218b 100644 --- a/forge-core/tools/builtins/register.go +++ b/forge-core/tools/builtins/register.go @@ -12,6 +12,7 @@ func All() []tools.Tool { &uuidGenerateTool{}, &mathCalculateTool{}, &webSearchTool{}, + &fileCreateTool{}, } } diff --git a/forge-plugins/channels/slack/slack.go b/forge-plugins/channels/slack/slack.go index a2e983c..7ed59cf 100644 --- a/forge-plugins/channels/slack/slack.go +++ b/forge-plugins/channels/slack/slack.go @@ -838,7 +838,13 @@ func extractLargestFile(msg *a2a.Message) (content, filename string) { } for _, p := range msg.Parts { if p.Kind == a2a.PartKindFile && p.File != nil && len(p.File.Bytes) > len(content) { - content = unwrapJSONContent(string(p.File.Bytes)) + raw := string(p.File.Bytes) + // Only unwrap JSON content for markdown files. + // Preserve raw content for explicitly typed files (json, yaml, etc.) + if strings.HasSuffix(p.File.Name, ".md") { + raw = unwrapJSONContent(raw) + } + content = raw filename = p.File.Name } } diff --git a/forge-skills/local/embedded/k8s-pod-rightsizer/SKILL.md b/forge-skills/local/embedded/k8s-pod-rightsizer/SKILL.md new file mode 100644 index 0000000..4f5a9dc --- /dev/null +++ b/forge-skills/local/embedded/k8s-pod-rightsizer/SKILL.md @@ -0,0 +1,516 @@ +--- +name: k8s-pod-rightsizer +category: sre +tags: + - kubernetes + - rightsizing + - cost-optimization + - resource-management + - prometheus + - capacity-planning + - kubectl +description: Analyze Kubernetes workload metrics and produce policy-constrained CPU/memory rightsizing recommendations with optional patch generation and rollback-safe apply. 
+metadata: + forge: + requires: + bins: + - bash + - kubectl + - jq + - curl + env: + required: [] + one_of: [] + optional: + - KUBECONFIG + - K8S_API_DOMAIN + - PROMETHEUS_URL + - PROMETHEUS_TOKEN + - POLICY_FILE + - DEFAULT_NAMESPACE + egress_domains: + - "$K8S_API_DOMAIN" + - "$PROMETHEUS_URL" + denied_tools: + - http_request + - web_search + timeout_hint: 300 + trust_hints: + network: true + filesystem: read + shell: true +--- + +# Kubernetes Pod Rightsizer + +Analyzes real Kubernetes workload metrics (Prometheus or metrics-server fallback) and produces policy-constrained recommendations for CPU and memory request/limit adjustments. + +Supports three modes: + +- **dry-run** — Report recommendations only (default, read-only) +- **plan** — Generate strategic merge patch YAMLs +- **apply** — Execute patches with automatic rollback bundle generation + +This skill uses deterministic formulas, never LLM-based guessing. + +--- + +## Tool Usage + +This skill uses `cli_execute` with `kubectl` and `curl` commands. +NEVER use http_request or web_search to interact with Kubernetes or Prometheus. +All cluster operations MUST go through kubectl or the rightsizer script via cli_execute. + +--- + +## Applying Patches + +When the user asks to apply rightsizing patches, use the script's built-in `mode=apply` with `i_accept_risk: true`. + +**NEVER** manually run `kubectl apply -f <file>` — the script's apply mode provides: +- Automatic rollback bundle generation (backup of current specs) +- Strategic merge patches via `kubectl patch` +- Rollout verification after each patch +- Action logging + +**Correct workflow:** +1. First run with `mode=dry-run` to show recommendations +2. If user confirms, run with `mode=apply` and `i_accept_risk: true` +3. 
Use `file_create` to provide the user with a downloadable copy of the patches (optional) + +**Example:** +- User: "apply the rightsizing patches" → `{"namespace": "prod", "mode": "apply", "i_accept_risk": true}` + +--- + +## Tool: k8s_pod_rightsizer + +Analyze workload resource usage and recommend CPU/memory request and limit changes. + +**Input:** namespace (string), workload (string), label_selector (string), mode (string), i_accept_risk (boolean), policy_file (string), lookback (string), output_format (string) + +**Output format:** Markdown tables for recommendations. YAML code blocks for patches. JSON for machine-readable output. + +### CRITICAL: Mode Field Rules + +`mode` controls the **action**, NOT the analysis filter. There are ONLY three valid values: + +| mode | Purpose | +|------|---------| +| `dry-run` | Analyze and report recommendations (default) | +| `plan` | Generate patch YAMLs | +| `apply` | Execute patches (requires `i_accept_risk: true`) | + +**NEVER set mode to a classification like "overprovisioned", "underprovisioned", "rightsized", etc.** These are OUTPUT classifications the tool produces, not input modes. + +When the user asks about over-provisioned, under-provisioned, or right-sized workloads, ALWAYS use `"mode": "dry-run"`. The output will include a `classification` field for each workload (e.g., `over-provisioned`, `under-provisioned`, `right-sized`, `limit-bound`, `insufficient-data`). + +Examples: +- "which workloads are over-provisioned?" → `{"mode": "dry-run"}` — read classification from output +- "generate patches for over-provisioned pods" → `{"mode": "plan"}` — patches are generated only for workloads needing changes +- "find under-provisioned deployments" → `{"mode": "dry-run"}` — read classification from output + +--- + +## Input Modes + +### 1) Human Mode (Natural Language) + +Input is a plain string. 
+ +Examples: + +- `rightsize namespace payments-prod` → `{"namespace": "payments-prod", "mode": "dry-run"}` +- `which workloads are over-provisioned in prod?` → `{"namespace": "prod", "mode": "dry-run"}` +- `check resource usage for label app=checkout in prod` → `{"namespace": "prod", "label_selector": "app=checkout", "mode": "dry-run"}` +- `generate patches for over-provisioned workloads in staging` → `{"namespace": "staging", "mode": "plan"}` +- `apply rightsizing to deployment api-gateway in prod` → `{"namespace": "prod", "workload": "deployment/api-gateway", "mode": "apply", "i_accept_risk": true}` + +Behavior: + +- Parse namespace, workload, or selector intent. +- If namespace omitted, use `$DEFAULT_NAMESPACE` if set. +- Default mode is `dry-run`. ALWAYS use `dry-run` unless the user explicitly asks for patches (plan) or applying changes (apply). +- Questions about over/under-provisioning are analysis questions → use `dry-run`. +- Never require the user to remember JSON fields. + +--- + +### 2) Automation Mode (Structured JSON) + +Input JSON schema: + +```json +{ + "namespace": "payments-prod", + "workload": "deployment/payments-api", + "label_selector": "", + "mode": "dry-run", + "i_accept_risk": false, + "policy_file": "", + "lookback": "24h", + "output_format": "markdown" +} +``` + +Rules: + +- `namespace` is required (or `$DEFAULT_NAMESPACE` must be set). +- `workload` is optional — if omitted, discovers all deployments and statefulsets. +- `label_selector` is optional — filters discovered workloads. +- `mode` must be one of: `dry-run`, `plan`, `apply`. +- `i_accept_risk` must be `true` for `apply` mode. +- `output_format`: `markdown` (default), `json`, or `yaml`. + +--- + +## Execution Workflow + +### Step 0 — Preconditions + +Verify cluster access: + +```bash +kubectl cluster-info --request-timeout=5s +``` + +If RBAC denies access, report the error and stop. 
+ +Check Prometheus availability if `$PROMETHEUS_URL` is set: + +```bash +curl -s "$PROMETHEUS_URL/api/v1/status/buildinfo" +``` + +Fall back to metrics-server if Prometheus is unavailable. + +--- + +### Step 1 — Discover Workloads + +If a specific workload is provided, validate it exists: + +```bash +kubectl get <kind>/<name> -n <namespace> -o json +``` + +Otherwise, discover all deployments and statefulsets: + +```bash +kubectl get deploy,sts -n <namespace> -o json +``` + +Filter by `label_selector` if provided. Skip `kube-system` unless explicitly targeted. Extract container resource specs for each workload. + +--- + +### Step 2 — Collect Metrics + +**Prometheus (preferred):** + +Query p95 CPU and memory usage over the lookback window: + +```promql +quantile_over_time(0.95, rate(container_cpu_usage_seconds_total{namespace="NS",pod=~"WORKLOAD.*",container!="POD"}[5m])[LOOKBACK:1m]) +``` + +```promql +quantile_over_time(0.95, container_memory_working_set_bytes{namespace="NS",pod=~"WORKLOAD.*",container!="POD"}[LOOKBACK]) +``` + +Also collect throttle ratios and OOM kill counts. + +**Metrics-server fallback:** + +```bash +kubectl top pod -n <namespace> --containers +``` + +When using metrics-server fallback, recommendations are advisory-only. Apply mode is blocked. + +--- + +### Step 3 — Compute Recommendations + +All computations use deterministic formulas: + +- **Recommended request** = `p95_usage * safety_factor`, clamped to `[policy_min, policy_max]` +- **Recommended limit** = `recommended_request * burst_multiplier` +- **Step constraint** — changes smaller than `step_percent` of current value are suppressed (avoids churn) + +CPU values are rounded to nearest 10m. Memory values are rounded to nearest MiB. 
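The Step 3 formulas can be sketched with plain integer arithmetic (values are illustrative; the skill itself performs these computations via `jq`, and memory follows the same shape in MiB):

```bash
p95_cpu=250                                           # observed p95 CPU usage, millicores
req=$(( p95_cpu * 125 / 100 ))                        # cpu_safety_factor 1.25 -> 312
req=$(( (req + 5) / 10 * 10 ))                        # round to nearest 10m -> 310
req=$(( req < 50 ? 50 : (req > 8000 ? 8000 : req) ))  # clamp to [cpu_min, cpu_max]
lim=$(( req * 2 ))                                    # cpu_burst_multiplier 2.0 -> 620
cur=400                                               # current request, millicores
delta=$(( (cur - req) * 100 / cur ))                  # change magnitude -> 22%
[ "$delta" -ge 15 ] && echo "recommend request=${req}m limit=${lim}m (was ${cur}m)"
```

A change smaller than the 15% step constraint would produce no output here, which is exactly how recommendation churn is avoided.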
+
+---
+
+### Step 4 — Generate Report
+
+Output format depends on the `output_format` parameter:
+
+- **markdown** — Human-readable tables with workload, container, current vs recommended values, savings estimate, and classification
+- **json** — Machine-readable array of recommendation objects
+- **yaml** — Patch files (plan and apply modes only)
+
+---
+
+### Step 5 — Apply (if mode=apply)
+
+1. Generate rollback bundle (backup of current resource specs)
+2. Show diff preview of all patches
+3. Apply strategic merge patches via `kubectl patch`
+4. Verify rollout status after each patch
+5. Log all actions to `run.log` in the rollback bundle
+
+---
+
+## Policy Model
+
+Policy files define constraints for rightsizing recommendations. Specify one via the `policy_file` input field or the `$POLICY_FILE` environment variable.
+
+### Example Policy
+
+```json
+{
+  "defaults": {
+    "cpu_safety_factor": 1.25,
+    "memory_safety_factor": 1.35,
+    "cpu_burst_multiplier": 2.0,
+    "memory_burst_multiplier": 1.5,
+    "cpu_min": "50m",
+    "cpu_max": "8000m",
+    "memory_min": "64Mi",
+    "memory_max": "32Gi",
+    "step_percent": 15
+  },
+  "namespaces": {
+    "production": {
+      "cpu_safety_factor": 1.4,
+      "memory_safety_factor": 1.5,
+      "step_percent": 20
+    }
+  },
+  "workloads": {
+    "production/payments-api": {
+      "cpu_min": "500m",
+      "memory_min": "512Mi"
+    }
+  }
+}
+```
+
+### Field Reference
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `cpu_safety_factor` | float | 1.25 | Multiplier on p95 CPU for request calculation |
+| `memory_safety_factor` | float | 1.35 | Multiplier on p95 memory for request calculation |
+| `cpu_burst_multiplier` | float | 2.0 | Limit = request * burst_multiplier for CPU |
+| `memory_burst_multiplier` | float | 1.5 | Limit = request * burst_multiplier for memory |
+| `cpu_min` | string | 10m | Floor for CPU request recommendations |
+| `cpu_max` | string | 8000m | Ceiling for CPU request recommendations |
+| `memory_min` | string | 32Mi | Floor for memory request
recommendations |
+| `memory_max` | string | 32Gi | Ceiling for memory request recommendations |
+| `step_percent` | int | 15 | Minimum change percentage to trigger a recommendation |
+
+### Precedence
+
+Policy values resolve in three levels (highest priority first):
+
+1. **Workload override** — `workloads["namespace/name"]`
+2. **Namespace override** — `namespaces["namespace"]`
+3. **Defaults** — `defaults`
+
+Values merge via overlay: workload overrides namespace, which overrides defaults.
+
+---
+
+## Metrics Strategy
+
+### Prometheus (Preferred)
+
+When `$PROMETHEUS_URL` is set, the skill queries Prometheus for high-fidelity metrics:
+
+| Metric | PromQL Pattern |
+|--------|---------------|
+| p95 CPU | `quantile_over_time(0.95, rate(container_cpu_usage_seconds_total{...}[5m])[LOOKBACK:1m])` |
+| p95 Memory | `quantile_over_time(0.95, container_memory_working_set_bytes{...}[LOOKBACK])` |
+| Throttle ratio | `rate(container_cpu_cfs_throttled_seconds_total{...}[LOOKBACK]) / rate(container_cpu_cfs_periods_total{...}[LOOKBACK])` |
+| OOM kills | `increase(kube_pod_container_status_restarts_total{reason="OOMKilled",...}[LOOKBACK])` |
+
+Authentication via `$PROMETHEUS_TOKEN` (Bearer token) if set.
+
+### Metrics-Server Fallback
+
+When Prometheus is unavailable, the skill falls back to:
+
+```bash
+kubectl top pod -n <namespace> --containers
+```
+
+Limitations:
+
+- Point-in-time snapshot only (no percentile data)
+- Recommendations are advisory-only
+- Apply mode is blocked
+- Step constraint is doubled (30% minimum change)
+
+---
+
+## Decision Engine
+
+All computations are deterministic and performed via `jq` arithmetic.
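As one example of that jq-only approach, the three-level policy precedence reduces to jq's object addition, where right-hand keys win. A simplified sketch (the shipped script additionally normalizes unit strings such as `cpu_min` into integers):

```shell
# resolve_effective_policy: defaults <- namespace override <- workload override.
# Later (+) operands win, giving the documented precedence order.
resolve_effective_policy() {
  local policy_json="$1" ns="$2" workload_key="$3"
  echo "$policy_json" | jq -c --arg ns "$ns" --arg wk "$workload_key" '
    (.defaults // {})
    + (.namespaces[$ns] // {})
    + (.workloads[$wk] // {})'
}
```

Given the example policy above, `production/payments-api` ends up with `step_percent: 20` from the namespace layer and `cpu_min: "500m"` from the workload layer, while untouched fields fall through to the defaults.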
+ +### Request Calculation + +``` +raw_request = p95_usage * safety_factor +clamped_request = clamp(raw_request, policy_min, policy_max) +recommended_request = round(clamped_request) +``` + +### Limit Calculation + +``` +recommended_limit = recommended_request * burst_multiplier +clamped_limit = clamp(recommended_limit, recommended_request, policy_max) +``` + +### Step Constraint + +A recommendation is only emitted if: + +``` +abs(recommended - current) / current >= step_percent / 100 +``` + +This prevents churn from minor fluctuations. + +### Rounding + +- CPU: rounded to nearest 10m (e.g., 137m → 140m) +- Memory: rounded to nearest MiB (e.g., 127.3Mi → 128Mi) + +--- + +## Detection Heuristics + +Each container is classified into one of these patterns: + +| Pattern | Condition | +|---------|-----------| +| **Over-provisioned CPU** | CPU request > p95 CPU * safety_factor * 2 | +| **Under-provisioned CPU** | CPU request < p95 CPU * 0.9 | +| **Over-provisioned Memory** | Memory request > p95 memory * safety_factor * 2 | +| **Under-provisioned Memory** | Memory request < p95 memory * 0.9 | +| **Limit-bound (throttled)** | Throttle ratio > 0.1 or OOM kills > 0 | +| **Right-sized** | Within step_percent of recommended values | +| **Insufficient data** | Fewer than 10 data points in lookback window | + +--- + +## Output Formats + +### Markdown Report (default) + +```markdown +| Workload | Container | Resource | Current | Recommended | Change | Classification | +|----------|-----------|----------|---------|-------------|--------|----------------| +| deploy/api | app | CPU req | 1000m | 400m | -60% | Over-provisioned | +| deploy/api | app | CPU lim | 2000m | 800m | -60% | Over-provisioned | +| deploy/api | app | Mem req | 2Gi | 1Gi | -50% | Over-provisioned | +| deploy/api | app | Mem lim | 4Gi | 1536Mi | -63% | Over-provisioned | +``` + +### JSON Output + +```json +[ + { + "workload": "deployment/api", + "container": "app", + "cpu_request": {"current": "1000m", 
"recommended": "400m", "change_percent": -60}, + "cpu_limit": {"current": "2000m", "recommended": "800m", "change_percent": -60}, + "memory_request": {"current": "2Gi", "recommended": "1Gi", "change_percent": -50}, + "memory_limit": {"current": "4Gi", "recommended": "1536Mi", "change_percent": -63}, + "classification": "over-provisioned" + } +] +``` + +### Patch YAMLs (plan/apply modes) + +```yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: api + namespace: payments-prod +spec: + template: + spec: + containers: + - name: app + resources: + requests: + cpu: "400m" + memory: "1Gi" + limits: + cpu: "800m" + memory: "1536Mi" +``` + +--- + +## Rollback + +When `mode=apply`, a rollback bundle is generated before any patches are applied: + +``` +rollback-/ + backup-.json # Current resource specs + patch-.json # Applied patches + rollback-.sh # kubectl patch commands to restore + run.log # Timestamped action log +``` + +To roll back: + +```bash +bash rollback-/rollback-.sh +``` + +--- + +## Safety Constraints + +This skill MUST: + +- Default to `dry-run` mode — never mutate without explicit mode selection. +- Require `i_accept_risk: true` for `apply` mode. +- Generate rollback bundles before applying any patch. +- Never delete workloads, pods, namespaces, or any Kubernetes resource. +- Never modify RBAC, NetworkPolicy, or Secret resources. +- Never scale replicas. +- Only patch `spec.template.spec.containers[].resources`. +- Block `apply` mode when using metrics-server fallback (insufficient data fidelity). +- Validate all policy values before use. +- Cap lookback window at 30 days. +- Skip `kube-system` namespace unless explicitly targeted. +- Respect step constraints to avoid recommendation churn. +- Log all mutations to the rollback bundle run.log. 
+ +--- + +## Autonomous Compatibility + +This skill is designed to be invoked by: + +- Humans via natural language CLI +- Automation pipelines via structured JSON +- Scheduled cost-optimization sweeps + +It must: + +- Be idempotent (repeated runs produce the same recommendations for the same data) +- Produce deterministic results (no LLM-based guessing) +- Be scope-limited (operates only on specified namespace/workload) +- Generate machine-parseable output for downstream processing diff --git a/forge-skills/local/embedded/k8s-pod-rightsizer/scripts/k8s-pod-rightsizer.sh b/forge-skills/local/embedded/k8s-pod-rightsizer/scripts/k8s-pod-rightsizer.sh new file mode 100644 index 0000000..9021cef --- /dev/null +++ b/forge-skills/local/embedded/k8s-pod-rightsizer/scripts/k8s-pod-rightsizer.sh @@ -0,0 +1,1082 @@ +#!/usr/bin/env bash +# k8s-pod-rightsizer.sh — Analyze Kubernetes workload metrics and produce +# policy-constrained CPU/memory rightsizing recommendations. +# +# Usage: ./k8s-pod-rightsizer.sh '{"namespace":"prod","mode":"dry-run"}' +# +# Requires: kubectl, jq, curl (for Prometheus), bash. 
+set -euo pipefail + +############################################################################### +# Constants & Defaults +############################################################################### + +# Default policy values (used when no POLICY_FILE is provided) +DEFAULT_CPU_SAFETY_FACTOR="1.25" +DEFAULT_MEMORY_SAFETY_FACTOR="1.35" +DEFAULT_CPU_BURST_MULTIPLIER="2.0" +DEFAULT_MEMORY_BURST_MULTIPLIER="1.5" +DEFAULT_CPU_MIN_MILLI="10" +DEFAULT_CPU_MAX_MILLI="8000" +DEFAULT_MEMORY_MIN_MI="32" +DEFAULT_MEMORY_MAX_MI="32768" +DEFAULT_STEP_PERCENT="15" +DEFAULT_LOOKBACK="24h" + +# Metrics source flag +METRICS_SOURCE="none" +ADVISORY_ONLY="false" + +# Temp directory with cleanup trap +TMPDIR_WORK=$(mktemp -d) +trap 'rm -rf "$TMPDIR_WORK"' EXIT + +############################################################################### +# Input Parsing & Validation +############################################################################### + +INPUT="${1:-}" +if [ -z "$INPUT" ]; then + echo '{"error":"usage: k8s-pod-rightsizer.sh {\"namespace\":\"...\",\"mode\":\"dry-run\"}"}' >&2 + exit 1 +fi + +if ! 
echo "$INPUT" | jq empty 2>/dev/null; then + echo '{"error":"invalid JSON input"}' >&2 + exit 1 +fi + +NAMESPACE=$(echo "$INPUT" | jq -r '.namespace // empty') +WORKLOAD=$(echo "$INPUT" | jq -r '.workload // empty') +LABEL_SELECTOR=$(echo "$INPUT" | jq -r '.label_selector // empty') +MODE=$(echo "$INPUT" | jq -r '.mode // "dry-run"') +I_ACCEPT_RISK=$(echo "$INPUT" | jq -r '.i_accept_risk // false') +POLICY_FILE_INPUT=$(echo "$INPUT" | jq -r '.policy_file // empty') +LOOKBACK=$(echo "$INPUT" | jq -r '.lookback // empty') +OUTPUT_FORMAT=$(echo "$INPUT" | jq -r '.output_format // "markdown"') + +# Resolve namespace +if [ -z "$NAMESPACE" ]; then + NAMESPACE="${DEFAULT_NAMESPACE:-}" +fi +if [ -z "$NAMESPACE" ]; then + echo '{"error":"namespace is required (provide in input or set DEFAULT_NAMESPACE)"}' >&2 + exit 1 +fi + +# Normalize mode — map common synonyms to canonical values +case "$MODE" in + dry-run|dryrun|dry_run|report|check|analyze|analysis|overprovisioned|over-provisioned|underprovisioned|under-provisioned) + MODE="dry-run" + ;; + plan|patch|patches|generate-patches|generate_patches|diff) + MODE="plan" + ;; + apply|execute|run) + MODE="apply" + ;; + *) + echo "{\"error\":\"invalid mode '$MODE': must be dry-run, plan, or apply\"}" >&2 + exit 1 + ;; +esac + +# Validate apply prerequisites +if [ "$MODE" = "apply" ] && [ "$I_ACCEPT_RISK" != "true" ]; then + echo '{"error":"apply mode requires i_accept_risk: true"}' >&2 + exit 1 +fi + +# Validate output format +case "$OUTPUT_FORMAT" in + markdown|json|yaml) ;; + *) + echo "{\"error\":\"invalid output_format '$OUTPUT_FORMAT': must be markdown, json, or yaml\"}" >&2 + exit 1 + ;; +esac + +# Set lookback with default +if [ -z "$LOOKBACK" ]; then + LOOKBACK="$DEFAULT_LOOKBACK" +fi + +# Validate lookback format and cap at 30d +LOOKBACK_HOURS=$(echo "$LOOKBACK" | jq -Rr ' + if test("^[0-9]+h$") then ltrimstr("") | rtrimstr("h") | tonumber + elif test("^[0-9]+d$") then ltrimstr("") | rtrimstr("d") | tonumber * 24 + else 
-1 + end +') +if [ "$LOOKBACK_HOURS" -lt 0 ] 2>/dev/null; then + echo '{"error":"invalid lookback format: use Nh or Nd (e.g., 24h, 7d)"}' >&2 + exit 1 +fi +if [ "$LOOKBACK_HOURS" -gt 720 ]; then + echo '{"error":"lookback cannot exceed 30d (720h)"}' >&2 + exit 1 +fi + +# Resolve policy file +POLICY_FILE="${POLICY_FILE_INPUT:-${POLICY_FILE:-}}" + +############################################################################### +# Policy Functions +############################################################################### + +policy_load() { + if [ -n "$POLICY_FILE" ] && [ -f "$POLICY_FILE" ]; then + if ! jq empty "$POLICY_FILE" 2>/dev/null; then + echo '{"error":"invalid JSON in policy file"}' >&2 + exit 1 + fi + cat "$POLICY_FILE" + else + # Return empty policy (will use defaults) + echo '{}' + fi +} + +resolve_policy() { + local policy_json="$1" + local ns="$2" + local workload_key="$3" # "namespace/name" or empty + + # Build effective policy: defaults → namespace override → workload override + jq -n --argjson policy "$policy_json" \ + --arg ns "$ns" \ + --arg wk "$workload_key" \ + --argjson d_csf "$DEFAULT_CPU_SAFETY_FACTOR" \ + --argjson d_msf "$DEFAULT_MEMORY_SAFETY_FACTOR" \ + --argjson d_cbm "$DEFAULT_CPU_BURST_MULTIPLIER" \ + --argjson d_mbm "$DEFAULT_MEMORY_BURST_MULTIPLIER" \ + --argjson d_cmin "$DEFAULT_CPU_MIN_MILLI" \ + --argjson d_cmax "$DEFAULT_CPU_MAX_MILLI" \ + --argjson d_mmin "$DEFAULT_MEMORY_MIN_MI" \ + --argjson d_mmax "$DEFAULT_MEMORY_MAX_MI" \ + --argjson d_step "$DEFAULT_STEP_PERCENT" ' + { + cpu_safety_factor: $d_csf, + memory_safety_factor: $d_msf, + cpu_burst_multiplier: $d_cbm, + memory_burst_multiplier: $d_mbm, + cpu_min_milli: $d_cmin, + cpu_max_milli: $d_cmax, + memory_min_mi: $d_mmin, + memory_max_mi: $d_mmax, + step_percent: $d_step + } as $builtin_defaults | + ($policy.defaults // {}) as $user_defaults | + ($policy.namespaces[$ns] // {}) as $ns_override | + (if $wk != "" then ($policy.workloads[$wk] // {}) else {} end) as 
$wk_override | + # Merge user defaults over builtin, converting cpu_min/memory_min string values + ($builtin_defaults + ($user_defaults | to_entries | map( + if .key == "cpu_min" then {key: "cpu_min_milli", value: (.value | tostring | gsub("m$";"") | tonumber)} + elif .key == "cpu_max" then {key: "cpu_max_milli", value: (.value | tostring | gsub("m$";"") | tonumber)} + elif .key == "memory_min" then {key: "memory_min_mi", value: (.value | tostring | gsub("Mi$";"") | tonumber)} + elif .key == "memory_max" then {key: "memory_max_mi", value: (.value | tostring | gsub("Gi$";"") | tonumber * 1024)} + else . + end + ) | from_entries)) as $merged_defaults | + # Apply namespace override + ($merged_defaults + ($ns_override | to_entries | map( + if .key == "cpu_min" then {key: "cpu_min_milli", value: (.value | tostring | gsub("m$";"") | tonumber)} + elif .key == "cpu_max" then {key: "cpu_max_milli", value: (.value | tostring | gsub("m$";"") | tonumber)} + elif .key == "memory_min" then {key: "memory_min_mi", value: (.value | tostring | gsub("Mi$";"") | tonumber)} + elif .key == "memory_max" then {key: "memory_max_mi", value: (.value | tostring | gsub("Gi$";"") | tonumber * 1024)} + else . + end + ) | from_entries)) as $after_ns | + # Apply workload override + ($after_ns + ($wk_override | to_entries | map( + if .key == "cpu_min" then {key: "cpu_min_milli", value: (.value | tostring | gsub("m$";"") | tonumber)} + elif .key == "cpu_max" then {key: "cpu_max_milli", value: (.value | tostring | gsub("m$";"") | tonumber)} + elif .key == "memory_min" then {key: "memory_min_mi", value: (.value | tostring | gsub("Mi$";"") | tonumber)} + elif .key == "memory_max" then {key: "memory_max_mi", value: (.value | tostring | gsub("Gi$";"") | tonumber * 1024)} + else . 
+ end + ) | from_entries)) + ' +} + +validate_policy() { + local eff_policy="$1" + local valid + valid=$(echo "$eff_policy" | jq ' + if .cpu_safety_factor < 1 then "cpu_safety_factor must be >= 1" + elif .memory_safety_factor < 1 then "memory_safety_factor must be >= 1" + elif .cpu_burst_multiplier < 1 then "cpu_burst_multiplier must be >= 1" + elif .memory_burst_multiplier < 1 then "memory_burst_multiplier must be >= 1" + elif .cpu_min_milli < 0 then "cpu_min_milli must be >= 0" + elif .cpu_max_milli < .cpu_min_milli then "cpu_max_milli must be >= cpu_min_milli" + elif .memory_min_mi < 0 then "memory_min_mi must be >= 0" + elif .memory_max_mi < .memory_min_mi then "memory_max_mi must be >= memory_min_mi" + elif .step_percent < 0 or .step_percent > 100 then "step_percent must be between 0 and 100" + else "ok" + end + ' -r) + if [ "$valid" != "ok" ]; then + echo "{\"error\":\"policy validation failed: $valid\"}" >&2 + exit 1 + fi +} + +############################################################################### +# Preflight +############################################################################### + +preflight() { + # Use the user's existing kubeconfig — kubectl reads $KUBECONFIG or ~/.kube/config by default + local kc="${KUBECONFIG:-${HOME}/.kube/config}" + if [ ! -f "$kc" ] && [ -z "${KUBECONFIG:-}" ]; then + echo "{\"error\":\"no kubeconfig found at ${kc} — set KUBECONFIG or configure kubectl\"}" >&2 + exit 1 + fi + + local cluster_err + if ! 
cluster_err=$(kubectl cluster-info --request-timeout=10s 2>&1); then + echo "{\"error\":\"cannot connect to Kubernetes cluster: $(echo "$cluster_err" | head -1 | tr '"' "'")\"}" >&2 + exit 1 + fi +} + +############################################################################### +# Discovery Functions +############################################################################### + +discover_workloads() { + local ns="$1" + local workload_filter="$2" + local label_sel="$3" + + if [ -n "$workload_filter" ]; then + # Specific workload — parse kind/name + local kind name + if echo "$workload_filter" | grep -q '/'; then + kind=$(echo "$workload_filter" | cut -d'/' -f1) + name=$(echo "$workload_filter" | cut -d'/' -f2) + else + # Assume deployment if no kind specified + kind="deployment" + name="$workload_filter" + fi + + local result + if ! result=$(kubectl get "$kind" "$name" -n "$ns" -o json 2>&1); then + echo "{\"error\":\"workload $kind/$name not found in namespace $ns: $result\"}" >&2 + exit 1 + fi + # Wrap single workload into items array + echo "$result" | jq '{items: [.]}' + else + # Discover all deployments and statefulsets + local selector_args="" + if [ -n "$label_sel" ]; then + selector_args="-l $label_sel" + fi + + local deploys sts + # shellcheck disable=SC2086 + deploys=$(kubectl get deploy -n "$ns" $selector_args -o json 2>/dev/null || echo '{"items":[]}') + # shellcheck disable=SC2086 + sts=$(kubectl get sts -n "$ns" $selector_args -o json 2>/dev/null || echo '{"items":[]}') + + # Merge items from both + jq -n --argjson d "$deploys" --argjson s "$sts" '{items: ($d.items + $s.items)}' + fi +} + +extract_containers() { + # Extract container resource specs from workload JSON + local workload_json="$1" + echo "$workload_json" | jq '[ + .items[] | + . 
as $wl | + { + kind: .kind, + name: .metadata.name, + namespace: .metadata.namespace + } as $meta | + .spec.template.spec.containers[] | + { + workload_kind: ($meta.kind | ascii_downcase), + workload_name: $meta.name, + namespace: $meta.namespace, + container: .name, + current_cpu_request: (.resources.requests.cpu // "0"), + current_cpu_limit: (.resources.limits.cpu // "0"), + current_memory_request: (.resources.requests.memory // "0"), + current_memory_limit: (.resources.limits.memory // "0") + } + ]' +} + +############################################################################### +# Unit Conversion Helpers (via jq) +############################################################################### + +# Convert CPU string (e.g., "500m", "1", "2.5") to millicores integer +cpu_to_milli() { + local val="$1" + echo "$val" | jq -Rr ' + if . == "0" or . == "" then 0 + elif test("m$") then rtrimstr("m") | tonumber + else tonumber * 1000 + end | floor + ' +} + +# Convert memory string (e.g., "512Mi", "1Gi", "1073741824") to MiB integer +memory_to_mi() { + local val="$1" + echo "$val" | jq -Rr ' + if . == "0" or . 
== "" then 0 + elif test("Gi$") then rtrimstr("Gi") | tonumber * 1024 + elif test("Mi$") then rtrimstr("Mi") | tonumber + elif test("Ki$") then rtrimstr("Ki") | tonumber / 1024 + elif test("G$") then rtrimstr("G") | tonumber * 1000000000 / 1048576 + elif test("M$") then rtrimstr("M") | tonumber * 1000000 / 1048576 + elif test("K$") then rtrimstr("K") | tonumber * 1000 / 1048576 + else tonumber / 1048576 + end | floor + ' +} + +############################################################################### +# Prometheus Metrics +############################################################################### + +query_prom() { + local promql="$1" + local prom_url="${PROMETHEUS_URL:-}" + + if [ -z "$prom_url" ]; then + return 1 + fi + + local auth_header="" + if [ -n "${PROMETHEUS_TOKEN:-}" ]; then + auth_header="Authorization: Bearer ${PROMETHEUS_TOKEN}" + fi + + local response http_code body + if [ -n "$auth_header" ]; then + response=$(curl -s -w "\n%{http_code}" --max-time 30 \ + -G "${prom_url}/api/v1/query" \ + --data-urlencode "query=${promql}" \ + -H "$auth_header") + else + response=$(curl -s -w "\n%{http_code}" --max-time 30 \ + -G "${prom_url}/api/v1/query" \ + --data-urlencode "query=${promql}") + fi + + http_code=$(echo "$response" | tail -1) + body=$(echo "$response" | sed '$d') + + if [ "$http_code" -ne 200 ]; then + echo "" # Return empty on failure + return 1 + fi + + echo "$body" +} + +get_metrics_prom() { + local ns="$1" + local pod_prefix="$2" + local container="$3" + local lookback_val="$4" + + # p95 CPU usage (cores) + local cpu_query="quantile_over_time(0.95, rate(container_cpu_usage_seconds_total{namespace=\"${ns}\",pod=~\"${pod_prefix}.*\",container=\"${container}\"}[5m])[${lookback_val}:1m])" + local cpu_result + cpu_result=$(query_prom "$cpu_query" 2>/dev/null || echo "") + + # p95 Memory usage (bytes) + local mem_query="quantile_over_time(0.95, 
container_memory_working_set_bytes{namespace=\"${ns}\",pod=~\"${pod_prefix}.*\",container=\"${container}\"}[${lookback_val}])" + local mem_result + mem_result=$(query_prom "$mem_query" 2>/dev/null || echo "") + + # Throttle ratio + local throttle_query="rate(container_cpu_cfs_throttled_seconds_total{namespace=\"${ns}\",pod=~\"${pod_prefix}.*\",container=\"${container}\"}[${lookback_val}]) / rate(container_cpu_cfs_periods_total{namespace=\"${ns}\",pod=~\"${pod_prefix}.*\",container=\"${container}\"}[${lookback_val}])" + local throttle_result + throttle_result=$(query_prom "$throttle_query" 2>/dev/null || echo "") + + # OOM kills + local oom_query="increase(kube_pod_container_status_restarts_total{namespace=\"${ns}\",pod=~\"${pod_prefix}.*\",container=\"${container}\",reason=\"OOMKilled\"}[${lookback_val}])" + local oom_result + oom_result=$(query_prom "$oom_query" 2>/dev/null || echo "") + + # Extract values, defaulting to empty on parse failure + local p95_cpu_cores p95_mem_bytes throttle_ratio oom_kills + + p95_cpu_cores=$(echo "${cpu_result:-}" | jq -r '.data.result[0].value[1] // empty' 2>/dev/null || echo "") + p95_mem_bytes=$(echo "${mem_result:-}" | jq -r '.data.result[0].value[1] // empty' 2>/dev/null || echo "") + throttle_ratio=$(echo "${throttle_result:-}" | jq -r '.data.result[0].value[1] // empty' 2>/dev/null || echo "") + oom_kills=$(echo "${oom_result:-}" | jq -r '.data.result[0].value[1] // empty' 2>/dev/null || echo "") + + # Convert to millicores and MiB + local p95_cpu_milli="0" + local p95_mem_mi="0" + if [ -n "$p95_cpu_cores" ]; then + p95_cpu_milli=$(echo "$p95_cpu_cores" | jq -r '. | tonumber * 1000 | floor') + fi + if [ -n "$p95_mem_bytes" ]; then + p95_mem_mi=$(echo "$p95_mem_bytes" | jq -r '. 
| tonumber / 1048576 | floor') + fi + + jq -n \ + --argjson cpu "$p95_cpu_milli" \ + --argjson mem "$p95_mem_mi" \ + --arg throttle "${throttle_ratio:-0}" \ + --arg oom "${oom_kills:-0}" \ + --arg source "prometheus" '{ + p95_cpu_milli: $cpu, + p95_memory_mi: $mem, + throttle_ratio: ($throttle | tonumber // 0), + oom_kills: ($oom | tonumber | floor // 0), + source: $source + }' +} + +############################################################################### +# kubectl top Fallback +############################################################################### + +get_metrics_top() { + local ns="$1" + local pod_prefix="$2" + local container="$3" + + ADVISORY_ONLY="true" + + local top_output + if ! top_output=$(kubectl top pod -n "$ns" --containers 2>&1); then + echo '{"error":"metrics-server unavailable: '"$(echo "$top_output" | head -1)"'"}' >&2 + return 1 + fi + + # Parse kubectl top output for matching pods/containers + # Format: POD_NAME CONTAINER CPU(cores) MEMORY(bytes) + local cpu_milli mem_mi + cpu_milli=$(echo "$top_output" | grep -E "^${pod_prefix}" | awk -v c="$container" '$2 == c {print $3}' | head -1 || echo "") + mem_mi=$(echo "$top_output" | grep -E "^${pod_prefix}" | awk -v c="$container" '$2 == c {print $4}' | head -1 || echo "") + + # Convert from kubectl top format + if [ -z "$cpu_milli" ]; then + cpu_milli="0" + else + cpu_milli=$(cpu_to_milli "$cpu_milli") + fi + if [ -z "$mem_mi" ]; then + mem_mi="0" + else + mem_mi=$(memory_to_mi "$mem_mi") + fi + + jq -n \ + --argjson cpu "$cpu_milli" \ + --argjson mem "$mem_mi" '{ + p95_cpu_milli: $cpu, + p95_memory_mi: $mem, + throttle_ratio: 0, + oom_kills: 0, + source: "metrics-server" + }' +} + +############################################################################### +# Compute Engine +############################################################################### + +compute_recommendation() { + local container_info="$1" + local metrics="$2" + local eff_policy="$3" + + jq -n --argjson c 
"$container_info" --argjson m "$metrics" --argjson p "$eff_policy" \ + --arg advisory "$ADVISORY_ONLY" ' + # Parse current values to millicores/MiB + def parse_cpu: + if . == "0" or . == "" or . == null then 0 + elif test("m$") then rtrimstr("m") | tonumber + else tonumber * 1000 + end | floor; + def parse_mem: + if . == "0" or . == "" or . == null then 0 + elif test("Gi$") then rtrimstr("Gi") | tonumber * 1024 + elif test("Mi$") then rtrimstr("Mi") | tonumber + elif test("Ki$") then rtrimstr("Ki") | tonumber / 1024 + else tonumber / 1048576 + end | floor; + + # Clamp helper + def clamp(lo; hi): if . < lo then lo elif . > hi then hi else . end; + + # Round CPU to nearest 10m + def round_cpu: ((. + 5) / 10 | floor) * 10 | if . < 10 then 10 else . end; + + # Round memory to nearest MiB (already integer) + def round_mem: if . < 1 then 1 else . | floor end; + + ($c.current_cpu_request | parse_cpu) as $cur_cpu_req | + ($c.current_cpu_limit | parse_cpu) as $cur_cpu_lim | + ($c.current_memory_request | parse_mem) as $cur_mem_req | + ($c.current_memory_limit | parse_mem) as $cur_mem_lim | + + $m.p95_cpu_milli as $p95_cpu | + $m.p95_memory_mi as $p95_mem | + + # Step percent (doubled for advisory mode) + (if $advisory == "true" then ($p.step_percent * 2) else $p.step_percent end) as $step | + + # Compute recommended CPU request + ($p95_cpu * $p.cpu_safety_factor | round_cpu | clamp($p.cpu_min_milli; $p.cpu_max_milli)) as $rec_cpu_req | + # Compute recommended CPU limit + ($rec_cpu_req * $p.cpu_burst_multiplier | round_cpu | clamp($rec_cpu_req; $p.cpu_max_milli)) as $rec_cpu_lim | + # Compute recommended memory request + ($p95_mem * $p.memory_safety_factor | round_mem | clamp($p.memory_min_mi; $p.memory_max_mi)) as $rec_mem_req | + # Compute recommended memory limit + ($rec_mem_req * $p.memory_burst_multiplier | round_mem | clamp($rec_mem_req; $p.memory_max_mi)) as $rec_mem_lim | + + # Compute change percentages + (if $cur_cpu_req > 0 then (($rec_cpu_req - $cur_cpu_req) / 
$cur_cpu_req * 100 | floor) else 100 end) as $cpu_req_pct | + (if $cur_cpu_lim > 0 then (($rec_cpu_lim - $cur_cpu_lim) / $cur_cpu_lim * 100 | floor) else 100 end) as $cpu_lim_pct | + (if $cur_mem_req > 0 then (($rec_mem_req - $cur_mem_req) / $cur_mem_req * 100 | floor) else 100 end) as $mem_req_pct | + (if $cur_mem_lim > 0 then (($rec_mem_lim - $cur_mem_lim) / $cur_mem_lim * 100 | floor) else 100 end) as $mem_lim_pct | + + # Check step constraint — suppress if change is too small + (if $cur_cpu_req > 0 then (($cpu_req_pct | fabs) >= $step) else true end) as $cpu_changed | + (if $cur_mem_req > 0 then (($mem_req_pct | fabs) >= $step) else true end) as $mem_changed | + ($cpu_changed or $mem_changed) as $has_recommendation | + + # Classification + (if $p95_cpu == 0 and $p95_mem == 0 then "insufficient-data" + elif $m.oom_kills > 0 then "limit-bound" + elif $m.throttle_ratio > 0.1 then "limit-bound" + elif ($cur_cpu_req > 0 and $cur_cpu_req > ($p95_cpu * $p.cpu_safety_factor * 2)) then "over-provisioned" + elif ($cur_mem_req > 0 and $cur_mem_req > ($p95_mem * $p.memory_safety_factor * 2)) then "over-provisioned" + elif ($cur_cpu_req > 0 and $cur_cpu_req < ($p95_cpu * 0.9)) then "under-provisioned" + elif ($cur_mem_req > 0 and $cur_mem_req < ($p95_mem * 0.9)) then "under-provisioned" + elif ($has_recommendation | not) then "right-sized" + else "adjust" + end) as $classification | + + { + workload_kind: $c.workload_kind, + workload_name: $c.workload_name, + namespace: $c.namespace, + container: $c.container, + metrics_source: $m.source, + classification: $classification, + has_recommendation: $has_recommendation, + cpu_request: { + current_milli: $cur_cpu_req, + recommended_milli: $rec_cpu_req, + change_percent: $cpu_req_pct + }, + cpu_limit: { + current_milli: $cur_cpu_lim, + recommended_milli: $rec_cpu_lim, + change_percent: $cpu_lim_pct + }, + memory_request: { + current_mi: $cur_mem_req, + recommended_mi: $rec_mem_req, + change_percent: $mem_req_pct + }, + 
memory_limit: {
+        current_mi: $cur_mem_lim,
+        recommended_mi: $rec_mem_lim,
+        change_percent: $mem_lim_pct
+      },
+      throttle_ratio: $m.throttle_ratio,
+      oom_kills: $m.oom_kills,
+      advisory_only: ($advisory == "true")
+    }
+  '
+}
+
+###############################################################################
+# Report Generation
+###############################################################################
+
+format_cpu() {
+  # Convert millicores to display string (values are always shown in millicores)
+  local milli="$1"
+  echo "${milli}m"
+}
+
+format_mem() {
+  # Convert MiB to display string
+  local mi="$1"
+  if [ "$mi" -ge 1024 ]; then
+    local gi
+    gi=$(echo "$mi" | jq -r '. / 1024 | . * 10 | floor / 10')
+    echo "${gi}Gi"
+  else
+    echo "${mi}Mi"
+  fi
+}
+
+generate_markdown_report() {
+  local recommendations_file="$1"
+
+  local count
+  count=$(jq 'length' "$recommendations_file")
+
+  if [ "$count" -eq 0 ]; then
+    echo "# Rightsizing Report"
+    echo ""
+    echo "No workloads found or no recommendations to make."
+    return
+  fi
+
+  local advisory_flag
+  advisory_flag=$(jq -r '.[0].advisory_only' "$recommendations_file")
+
+  echo "# Rightsizing Report"
+  echo ""
+  echo "**Namespace:** ${NAMESPACE}"
+  echo "**Mode:** ${MODE}"
+  echo "**Metrics source:** $(jq -r '.[0].metrics_source' "$recommendations_file")"
+  echo "**Lookback:** ${LOOKBACK}"
+  if [ "$advisory_flag" = "true" ]; then
+    echo ""
+    echo "> **Advisory only** — metrics-server provides point-in-time data only. Use Prometheus for production rightsizing."
+ fi + echo "" + echo "| Workload | Container | Resource | Current | Recommended | Change | Classification |" + echo "|----------|-----------|----------|---------|-------------|--------|----------------|" + + jq -r '.[] | select(.has_recommendation == true) | + "\(.workload_kind)/\(.workload_name)|\(.container)|\(.cpu_request.current_milli)|\(.cpu_request.recommended_milli)|\(.cpu_request.change_percent)|\(.cpu_limit.current_milli)|\(.cpu_limit.recommended_milli)|\(.cpu_limit.change_percent)|\(.memory_request.current_mi)|\(.memory_request.recommended_mi)|\(.memory_request.change_percent)|\(.memory_limit.current_mi)|\(.memory_limit.recommended_mi)|\(.memory_limit.change_percent)|\(.classification)" + ' "$recommendations_file" | while IFS='|' read -r wl ctr cur_cr rec_cr pct_cr cur_cl rec_cl pct_cl cur_mr rec_mr pct_mr cur_ml rec_ml pct_ml cls; do + echo "| ${wl} | ${ctr} | CPU req | $(format_cpu "$cur_cr") | $(format_cpu "$rec_cr") | ${pct_cr}% | ${cls} |" + echo "| ${wl} | ${ctr} | CPU lim | $(format_cpu "$cur_cl") | $(format_cpu "$rec_cl") | ${pct_cl}% | ${cls} |" + echo "| ${wl} | ${ctr} | Mem req | $(format_mem "$cur_mr") | $(format_mem "$rec_mr") | ${pct_mr}% | ${cls} |" + echo "| ${wl} | ${ctr} | Mem lim | $(format_mem "$cur_ml") | $(format_mem "$rec_ml") | ${pct_ml}% | ${cls} |" + done + + # Summary of right-sized / insufficient-data + local right_sized insufficient + right_sized=$(jq '[.[] | select(.classification == "right-sized")] | length' "$recommendations_file") + insufficient=$(jq '[.[] | select(.classification == "insufficient-data")] | length' "$recommendations_file") + if [ "$right_sized" -gt 0 ] || [ "$insufficient" -gt 0 ]; then + echo "" + echo "**Summary:**" + [ "$right_sized" -gt 0 ] && echo "- ${right_sized} container(s) are right-sized (no changes needed)" + [ "$insufficient" -gt 0 ] && echo "- ${insufficient} container(s) have insufficient data for recommendations" + fi +} + +generate_json_report() { + local recommendations_file="$1" + + jq 
'[.[] | { + workload: "\(.workload_kind)/\(.workload_name)", + container: .container, + classification: .classification, + advisory_only: .advisory_only, + cpu_request: { + current: "\(.cpu_request.current_milli)m", + recommended: "\(.cpu_request.recommended_milli)m", + change_percent: .cpu_request.change_percent + }, + cpu_limit: { + current: "\(.cpu_limit.current_milli)m", + recommended: "\(.cpu_limit.recommended_milli)m", + change_percent: .cpu_limit.change_percent + }, + memory_request: { + current: "\(.memory_request.current_mi)Mi", + recommended: "\(.memory_request.recommended_mi)Mi", + change_percent: .memory_request.change_percent + }, + memory_limit: { + current: "\(.memory_limit.current_mi)Mi", + recommended: "\(.memory_limit.recommended_mi)Mi", + change_percent: .memory_limit.change_percent + }, + throttle_ratio: .throttle_ratio, + oom_kills: .oom_kills + }]' "$recommendations_file" +} + +############################################################################### +# Patch Generation +############################################################################### + +generate_patches() { + local recommendations_file="$1" + local output_dir="$2" + + jq -c '.[] | select(.has_recommendation == true)' "$recommendations_file" | while IFS= read -r rec; do + local wl_kind wl_name ns ctr + wl_kind=$(echo "$rec" | jq -r '.workload_kind') + wl_name=$(echo "$rec" | jq -r '.workload_name') + ns=$(echo "$rec" | jq -r '.namespace') + ctr=$(echo "$rec" | jq -r '.container') + + local patch_file="${output_dir}/patch-${wl_kind}-${wl_name}.json" + + # Build strategic merge patch via jq + local rec_cpu_req rec_cpu_lim rec_mem_req rec_mem_lim + rec_cpu_req=$(echo "$rec" | jq -r '.cpu_request.recommended_milli') + rec_cpu_lim=$(echo "$rec" | jq -r '.cpu_limit.recommended_milli') + rec_mem_req=$(echo "$rec" | jq -r '.memory_request.recommended_mi') + rec_mem_lim=$(echo "$rec" | jq -r '.memory_limit.recommended_mi') + + # If patch file already exists (multiple containers), 
merge + if [ -f "$patch_file" ]; then + local existing + existing=$(cat "$patch_file") + echo "$existing" | jq --arg ctr "$ctr" \ + --arg cpu_req "${rec_cpu_req}m" \ + --arg cpu_lim "${rec_cpu_lim}m" \ + --arg mem_req "${rec_mem_req}Mi" \ + --arg mem_lim "${rec_mem_lim}Mi" ' + .spec.template.spec.containers += [{ + name: $ctr, + resources: { + requests: {cpu: $cpu_req, memory: $mem_req}, + limits: {cpu: $cpu_lim, memory: $mem_lim} + } + }] + ' > "$patch_file" + else + jq -n --arg ctr "$ctr" \ + --arg cpu_req "${rec_cpu_req}m" \ + --arg cpu_lim "${rec_cpu_lim}m" \ + --arg mem_req "${rec_mem_req}Mi" \ + --arg mem_lim "${rec_mem_lim}Mi" '{ + spec: { + template: { + spec: { + containers: [{ + name: $ctr, + resources: { + requests: {cpu: $cpu_req, memory: $mem_req}, + limits: {cpu: $cpu_lim, memory: $mem_lim} + } + }] + } + } + } + }' > "$patch_file" + fi + done +} + +generate_rollback() { + local ns="$1" + local recommendations_file="$2" + local rollback_dir="$3" + + mkdir -p "$rollback_dir" + + local timestamp + timestamp=$(date -u +"%Y-%m-%dT%H:%M:%SZ") + echo "[$timestamp] Rollback bundle created" > "${rollback_dir}/run.log" + + # Backup current specs for each unique workload + jq -r '.[] | select(.has_recommendation == true) | "\(.workload_kind)/\(.workload_name)"' "$recommendations_file" | sort -u | while IFS= read -r wl_ref; do + local kind name + kind=$(echo "$wl_ref" | cut -d'/' -f1) + name=$(echo "$wl_ref" | cut -d'/' -f2) + + local backup_file="${rollback_dir}/backup-${kind}-${name}.json" + kubectl get "$kind" "$name" -n "$ns" -o json | jq '{ + spec: { + template: { + spec: { + containers: [.spec.template.spec.containers[] | { + name: .name, + resources: .resources + }] + } + } + } + }' > "$backup_file" + + # Generate rollback script + local rollback_script="${rollback_dir}/rollback-${kind}-${name}.sh" + local patch_content + patch_content=$(cat "$backup_file") + jq -n --arg kind "$kind" --arg name "$name" --arg ns "$ns" \ + --arg patch "$patch_content" '{ + 
command: "kubectl patch \($kind) \($name) -n \($ns) --type=strategic -p",
+ patch: $patch
+ }' > /dev/null # Shape check only; the actual rollback script is written below
+
+ # Write the rollback script; values are expanded now, at generation time
+ {
+ echo '#!/usr/bin/env bash'
+ echo 'set -euo pipefail'
+ echo "kubectl patch ${kind} ${name} -n ${ns} --type=strategic -p '${patch_content}'"
+ } > "$rollback_script"
+ chmod +x "$rollback_script"
+
+ echo "[$timestamp] Backed up $kind/$name" >> "${rollback_dir}/run.log"
+ done
+}
+
+###############################################################################
+# Apply Mode
+###############################################################################
+
+apply_patches() {
+ local ns="$1"
+ local recommendations_file="$2"
+ local patch_dir="$3"
+ local rollback_dir="$4"
+
+ local timestamp
+ timestamp=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
+
+ jq -r '.[] | select(.has_recommendation == true) | "\(.workload_kind)/\(.workload_name)"' "$recommendations_file" | sort -u | while IFS= read -r wl_ref; do
+ local kind name
+ kind=$(echo "$wl_ref" | cut -d'/' -f1)
+ name=$(echo "$wl_ref" | cut -d'/' -f2)
+
+ local patch_file="${patch_dir}/patch-${kind}-${name}.json"
+ if [ ! -f "$patch_file" ]; then
+ echo "[$timestamp] SKIP: no patch file for $kind/$name" >> "${rollback_dir}/run.log"
+ continue
+ fi
+
+ local patch_content
+ patch_content=$(cat "$patch_file")
+
+ echo "[$timestamp] Applying patch to $kind/$name in $ns" >> "${rollback_dir}/run.log"
+
+ local apply_result
+ if ! apply_result=$(kubectl patch "$kind" "$name" -n "$ns" --type=strategic -p "$patch_content" 2>&1); then
+ echo "[$timestamp] FAILED: $kind/$name — $apply_result" >> "${rollback_dir}/run.log"
+ echo "{\"error\":\"failed to patch $kind/$name: $apply_result\"}" >&2
+ echo "Rollback available at: ${rollback_dir}/" >&2
+ exit 1
+ fi
+
+ echo "[$timestamp] SUCCESS: $apply_result" >> "${rollback_dir}/run.log"
+
+ # Verify rollout
+ if !
kubectl rollout status "$kind/$name" -n "$ns" --timeout=120s >> "${rollback_dir}/run.log" 2>&1; then
+ echo "[$timestamp] WARNING: rollout not yet complete for $kind/$name" >> "${rollback_dir}/run.log"
+ fi
+ done
+}
+
+###############################################################################
+# YAML Output (for plan/apply modes)
+###############################################################################
+
+generate_yaml_output() {
+ local patch_dir="$1"
+
+ for patch_file in "${patch_dir}"/patch-*.json; do
+ [ -f "$patch_file" ] || continue
+ local basename_
+ basename_=$(basename "$patch_file" .json)
+ # Extract kind and name from filename: patch-<kind>-<name>.json
+ local kind name
+ kind=$(echo "$basename_" | sed 's/^patch-//' | cut -d'-' -f1)
+ name=$(echo "$basename_" | sed 's/^patch-[^-]*-//')
+
+ echo "---"
+ # Convert patch JSON to YAML-like output via jq
+ jq -r --arg kind "$kind" --arg name "$name" --arg ns "$NAMESPACE" '
+ "apiVersion: apps/v1",
+ "kind: \($kind | gsub("deployment";"Deployment") | gsub("statefulset";"StatefulSet") | gsub("sts";"StatefulSet"))",
+ "metadata:",
+ "  name: \($name)",
+ "  namespace: \($ns)",
+ "spec:",
+ "  template:",
+ "    spec:",
+ "      containers:",
+ (.spec.template.spec.containers[] |
+ "      - name: \(.name)",
+ "        resources:",
+ "          requests:",
+ "            cpu: \"\(.resources.requests.cpu)\"",
+ "            memory: \"\(.resources.requests.memory)\"",
+ "          limits:",
+ "            cpu: \"\(.resources.limits.cpu)\"",
+ "            memory: \"\(.resources.limits.memory)\""
+ )
+ ' "$patch_file"
+ done
+}
+
+###############################################################################
+# Main Orchestration
+###############################################################################
+
+main() {
+ # Step 0: Preflight
+ preflight
+
+ # Load and validate policy
+ local policy_json eff_policy
+ policy_json=$(policy_load)
+
+ # Step 1: Discover workloads
+ local workloads_json containers_json
+ workloads_json=$(discover_workloads "$NAMESPACE" "$WORKLOAD" "$LABEL_SELECTOR")
+
+ local workload_count
+ 
workload_count=$(echo "$workloads_json" | jq '.items | length')
+ if [ "$workload_count" -eq 0 ]; then
+ echo '{"error":"no workloads found matching criteria"}' >&2
+ exit 1
+ fi
+
+ containers_json=$(extract_containers "$workloads_json")
+
+ local container_count
+ container_count=$(echo "$containers_json" | jq 'length')
+ if [ "$container_count" -eq 0 ]; then
+ echo '{"error":"no containers found in matched workloads"}' >&2
+ exit 1
+ fi
+
+ # Step 2: Collect metrics and compute recommendations
+ local recommendations="[]"
+
+ local i=0
+ while [ "$i" -lt "$container_count" ]; do
+ local container_info
+ container_info=$(echo "$containers_json" | jq ".[$i]")
+
+ local wl_kind wl_name ctr_name
+ wl_kind=$(echo "$container_info" | jq -r '.workload_kind')
+ wl_name=$(echo "$container_info" | jq -r '.workload_name')
+ ctr_name=$(echo "$container_info" | jq -r '.container')
+
+ # Resolve policy for this workload
+ local workload_key="${NAMESPACE}/${wl_name}"
+ eff_policy=$(resolve_policy "$policy_json" "$NAMESPACE" "$workload_key")
+ validate_policy "$eff_policy"
+
+ # Collect metrics
+ local metrics=""
+
+ # Try Prometheus first
+ if [ -n "${PROMETHEUS_URL:-}" ]; then
+ metrics=$(get_metrics_prom "$NAMESPACE" "$wl_name" "$ctr_name" "$LOOKBACK" 2>/dev/null || echo "")
+ fi
+
+ # Fallback to kubectl top
+ if [ -z "$metrics" ]; then
+ metrics=$(get_metrics_top "$NAMESPACE" "$wl_name" "$ctr_name" 2>/dev/null || echo "")
+ fi
+
+ if [ -z "$metrics" ]; then
+ # No metrics available — mark as insufficient data
+ metrics='{"p95_cpu_milli":0,"p95_memory_mi":0,"throttle_ratio":0,"oom_kills":0,"source":"none"}'
+ fi
+
+ # Step 3: Compute recommendation
+ local rec
+ rec=$(compute_recommendation "$container_info" "$metrics" "$eff_policy")
+ recommendations=$(echo "$recommendations" | jq --argjson r "$rec" '. 
+ [$r]') + + i=$((i + 1)) + done + + # Block apply mode with metrics-server fallback + if [ "$MODE" = "apply" ] && [ "$ADVISORY_ONLY" = "true" ]; then + echo '{"error":"apply mode is blocked when using metrics-server fallback (insufficient data fidelity). Use Prometheus for apply mode."}' >&2 + exit 1 + fi + + # Save recommendations to temp file + local rec_file="${TMPDIR_WORK}/recommendations.json" + echo "$recommendations" > "$rec_file" + + # Check if there are any actionable recommendations + local actionable_count + actionable_count=$(jq '[.[] | select(.has_recommendation == true)] | length' "$rec_file") + + # Step 4: Generate output + case "$MODE" in + dry-run) + case "$OUTPUT_FORMAT" in + markdown) generate_markdown_report "$rec_file" ;; + json) generate_json_report "$rec_file" ;; + yaml) + echo "# YAML output is only available in plan or apply mode" + echo "# Showing markdown report instead" + echo "" + generate_markdown_report "$rec_file" + ;; + esac + ;; + plan) + if [ "$actionable_count" -eq 0 ]; then + echo "No actionable recommendations — all workloads are right-sized or have insufficient data." + exit 0 + fi + + local patch_dir="${TMPDIR_WORK}/patches" + mkdir -p "$patch_dir" + generate_patches "$rec_file" "$patch_dir" + + case "$OUTPUT_FORMAT" in + markdown) + generate_markdown_report "$rec_file" + echo "" + echo "## Generated Patches" + echo "" + generate_yaml_output "$patch_dir" + ;; + json) generate_json_report "$rec_file" ;; + yaml) generate_yaml_output "$patch_dir" ;; + esac + ;; + apply) + if [ "$actionable_count" -eq 0 ]; then + echo "No actionable recommendations — all workloads are right-sized or have insufficient data." 
+ exit 0 + fi + + local ts + ts=$(date -u +"%Y%m%dT%H%M%SZ") + local rollback_dir="rollback-${ts}" + local patch_dir="${TMPDIR_WORK}/patches" + mkdir -p "$patch_dir" + + # Generate rollback bundle + generate_rollback "$NAMESPACE" "$rec_file" "$rollback_dir" + + # Generate patches + generate_patches "$rec_file" "$patch_dir" + + # Show report + generate_markdown_report "$rec_file" + echo "" + echo "## Applying Patches" + echo "" + echo "Rollback bundle: \`${rollback_dir}/\`" + echo "" + + # Apply + apply_patches "$NAMESPACE" "$rec_file" "$patch_dir" "$rollback_dir" + + echo "" + echo "Patches applied successfully. To rollback:" + echo "" + echo '```bash' + echo "ls ${rollback_dir}/rollback-*.sh" + echo '```' + ;; + esac +} + +main diff --git a/forge-skills/local/registry_embedded_test.go b/forge-skills/local/registry_embedded_test.go index 91c8eff..1c6b006 100644 --- a/forge-skills/local/registry_embedded_test.go +++ b/forge-skills/local/registry_embedded_test.go @@ -16,12 +16,12 @@ func TestEmbeddedRegistry_DiscoverAll(t *testing.T) { t.Fatalf("List error: %v", err) } - if len(skills) != 10 { + if len(skills) != 11 { names := make([]string, len(skills)) for i, s := range skills { names[i] = s.Name } - t.Fatalf("expected 10 skills, got %d: %v", len(skills), names) + t.Fatalf("expected 11 skills, got %d: %v", len(skills), names) } // Verify all expected skills are present @@ -41,6 +41,7 @@ func TestEmbeddedRegistry_DiscoverAll(t *testing.T) { "code-review-github": {displayName: "Code Review Github", hasEnv: true, hasBins: true, hasEgress: true}, "codegen-react": {displayName: "Codegen React", hasEnv: false, hasBins: true, hasEgress: true}, "codegen-html": {displayName: "Codegen Html", hasEnv: false, hasBins: true, hasEgress: true}, + "k8s-pod-rightsizer": {displayName: "K8s Pod Rightsizer", hasEnv: false, hasBins: true, hasEgress: false}, } for _, s := range skills { From 9ac31704054eb9794e0d767d84ad143b73cc2655 Mon Sep 17 00:00:00 2001 From: MK Date: Wed, 4 Mar 2026 
04:09:41 -0500 Subject: [PATCH 2/5] feat: add distinct icons for all embedded skills MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Replace default cardboard box icon with skill-specific icons: ⚖️ k8s-pod-rightsizer, 🔬 tavily-research, 🔎 code-review, 📏 code-review-standards, ⚛️ codegen-react, 🌐 codegen-html --- forge-cli/internal/tui/steps/skills_step.go | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/forge-cli/internal/tui/steps/skills_step.go b/forge-cli/internal/tui/steps/skills_step.go index ca0de1b..818f28f 100644 --- a/forge-cli/internal/tui/steps/skills_step.go +++ b/forge-cli/internal/tui/steps/skills_step.go @@ -413,11 +413,19 @@ func (s *SkillsStep) Apply(ctx *tui.WizardContext) { func skillIcon(name string) string { icons := map[string]string{ - "github": "🐙", - "weather": "🌤️", - "tavily-search": "🔍", - "k8s-incident-triage": "☸️", - "k8s_incident_triage": "☸️", + "github": "🐙", + "weather": "🌤️", + "tavily-search": "🔍", + "tavily-research": "🔬", + "k8s-incident-triage": "☸️", + "k8s_incident_triage": "☸️", + "k8s-pod-rightsizer": "⚖️", + "k8s_pod_rightsizer": "⚖️", + "code-review": "🔎", + "code-review-standards": "📏", + "code-review-github": "🐙", + "codegen-react": "⚛️", + "codegen-html": "🌐", } if icon, ok := icons[name]; ok { return icon From cc71418a57f3236167681bfcde4c81dd1f35f481 Mon Sep 17 00:00:00 2001 From: MK Date: Wed, 4 Mar 2026 04:38:25 -0500 Subject: [PATCH 3/5] refactor: move skill icons from hardcoded map to SKILL.md frontmatter MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add `icon` field to SkillMetadata and SkillDescriptor, flowing through the parser → scanner → registry → TUI pipeline. Icons are now declared in each skill's SKILL.md frontmatter (e.g. `icon: ⚖️`) and automatically picked up via go:embed — no TUI code changes needed when adding skills. 
The hardcoded skillIcon() map is replaced with a simple fallback that returns 📦 for skills missing the field. A test ensures all embedded skills declare an icon. --- forge-cli/cmd/init.go | 1 + forge-cli/internal/tui/steps/skills_step.go | 29 ++++++------------- forge-skills/contract/types.go | 2 ++ .../local/embedded/_template/SKILL.md | 1 + .../embedded/code-review-github/SKILL.md | 1 + .../embedded/code-review-standards/SKILL.md | 1 + .../local/embedded/code-review/SKILL.md | 1 + .../local/embedded/codegen-html/SKILL.md | 1 + .../local/embedded/codegen-react/SKILL.md | 1 + forge-skills/local/embedded/github/SKILL.md | 1 + .../embedded/k8s-incident-triage/SKILL.md | 1 + .../embedded/k8s-pod-rightsizer/SKILL.md | 1 + .../local/embedded/tavily-research/SKILL.md | 1 + .../local/embedded/tavily-search/SKILL.md | 1 + forge-skills/local/embedded/weather/SKILL.md | 1 + forge-skills/local/registry_embedded_test.go | 21 ++++++++++++++ forge-skills/local/scanner.go | 1 + 17 files changed, 46 insertions(+), 20 deletions(-) diff --git a/forge-cli/cmd/init.go b/forge-cli/cmd/init.go index 313fb25..2fd67bd 100644 --- a/forge-cli/cmd/init.go +++ b/forge-cli/cmd/init.go @@ -207,6 +207,7 @@ func collectInteractive(opts *initOptions) error { Name: s.Name, DisplayName: s.DisplayName, Description: s.Description, + Icon: s.Icon, RequiredEnv: s.RequiredEnv, OneOfEnv: s.OneOfEnv, OptionalEnv: s.OptionalEnv, diff --git a/forge-cli/internal/tui/steps/skills_step.go b/forge-cli/internal/tui/steps/skills_step.go index 818f28f..a6ef329 100644 --- a/forge-cli/internal/tui/steps/skills_step.go +++ b/forge-cli/internal/tui/steps/skills_step.go @@ -16,6 +16,7 @@ type SkillInfo struct { Name string DisplayName string Description string + Icon string RequiredEnv []string OneOfEnv []string OptionalEnv []string @@ -70,7 +71,10 @@ func NewSkillsStep(styles *tui.StyleSet, skills []SkillInfo) *SkillsStep { var items []components.MultiSelectItem for _, sk := range skills { - icon := skillIcon(sk.Name) 
+ icon := sk.Icon + if icon == "" { + icon = skillIcon(sk.Name) + } var reqs []string if len(sk.RequiredBins) > 0 { reqs = append(reqs, "bins: "+strings.Join(sk.RequiredBins, ", ")) @@ -411,24 +415,9 @@ func (s *SkillsStep) Apply(ctx *tui.WizardContext) { } } -func skillIcon(name string) string { - icons := map[string]string{ - "github": "🐙", - "weather": "🌤️", - "tavily-search": "🔍", - "tavily-research": "🔬", - "k8s-incident-triage": "☸️", - "k8s_incident_triage": "☸️", - "k8s-pod-rightsizer": "⚖️", - "k8s_pod_rightsizer": "⚖️", - "code-review": "🔎", - "code-review-standards": "📏", - "code-review-github": "🐙", - "codegen-react": "⚛️", - "codegen-html": "🌐", - } - if icon, ok := icons[name]; ok { - return icon - } +// skillIcon returns a default icon for skills that don't declare one +// in their SKILL.md frontmatter. Prefer adding "icon:" to frontmatter +// instead of extending this function. +func skillIcon(_ string) string { return "📦" } diff --git a/forge-skills/contract/types.go b/forge-skills/contract/types.go index 740f688..ea1597d 100644 --- a/forge-skills/contract/types.go +++ b/forge-skills/contract/types.go @@ -7,6 +7,7 @@ type SkillDescriptor struct { Description string Category string Tags []string + Icon string RequiredEnv []string OneOfEnv []string OptionalEnv []string @@ -36,6 +37,7 @@ type SkillMetadata struct { Description string `yaml:"description,omitempty"` Category string `yaml:"category,omitempty"` Tags []string `yaml:"tags,omitempty"` + Icon string `yaml:"icon,omitempty"` Metadata map[string]map[string]any `yaml:"metadata,omitempty"` } diff --git a/forge-skills/local/embedded/_template/SKILL.md b/forge-skills/local/embedded/_template/SKILL.md index ff449f1..de1744b 100644 --- a/forge-skills/local/embedded/_template/SKILL.md +++ b/forge-skills/local/embedded/_template/SKILL.md @@ -1,5 +1,6 @@ --- name: my-skill +# icon: 🔧 # Optional: emoji shown in TUI skill picker # category: ops # Optional: sre, research, ops, dev, security, etc. 
# tags: # Optional: discovery keywords # - example diff --git a/forge-skills/local/embedded/code-review-github/SKILL.md b/forge-skills/local/embedded/code-review-github/SKILL.md index e920a3a..2ac28c5 100644 --- a/forge-skills/local/embedded/code-review-github/SKILL.md +++ b/forge-skills/local/embedded/code-review-github/SKILL.md @@ -1,5 +1,6 @@ --- name: code-review-github +icon: 🐙 category: developer tags: - code-review diff --git a/forge-skills/local/embedded/code-review-standards/SKILL.md b/forge-skills/local/embedded/code-review-standards/SKILL.md index 2f68fb8..3937b57 100644 --- a/forge-skills/local/embedded/code-review-standards/SKILL.md +++ b/forge-skills/local/embedded/code-review-standards/SKILL.md @@ -1,5 +1,6 @@ --- name: code-review-standards +icon: 📏 category: developer tags: - code-review diff --git a/forge-skills/local/embedded/code-review/SKILL.md b/forge-skills/local/embedded/code-review/SKILL.md index 6bab179..4b30c7d 100644 --- a/forge-skills/local/embedded/code-review/SKILL.md +++ b/forge-skills/local/embedded/code-review/SKILL.md @@ -1,5 +1,6 @@ --- name: code-review +icon: 🔎 category: developer tags: - code-review diff --git a/forge-skills/local/embedded/codegen-html/SKILL.md b/forge-skills/local/embedded/codegen-html/SKILL.md index 4d8b8b4..5735a9a 100644 --- a/forge-skills/local/embedded/codegen-html/SKILL.md +++ b/forge-skills/local/embedded/codegen-html/SKILL.md @@ -1,5 +1,6 @@ --- name: codegen-html +icon: 🌐 category: developer tags: - code-generation diff --git a/forge-skills/local/embedded/codegen-react/SKILL.md b/forge-skills/local/embedded/codegen-react/SKILL.md index 68e8bf6..b16f54b 100644 --- a/forge-skills/local/embedded/codegen-react/SKILL.md +++ b/forge-skills/local/embedded/codegen-react/SKILL.md @@ -1,5 +1,6 @@ --- name: codegen-react +icon: ⚛️ category: developer tags: - code-generation diff --git a/forge-skills/local/embedded/github/SKILL.md b/forge-skills/local/embedded/github/SKILL.md index 2e2f8f6..a2f556d 100644 --- 
a/forge-skills/local/embedded/github/SKILL.md +++ b/forge-skills/local/embedded/github/SKILL.md @@ -1,5 +1,6 @@ --- name: github +icon: 🐙 description: Create issues, PRs, and query repositories metadata: forge: diff --git a/forge-skills/local/embedded/k8s-incident-triage/SKILL.md b/forge-skills/local/embedded/k8s-incident-triage/SKILL.md index ae81312..bc12a62 100644 --- a/forge-skills/local/embedded/k8s-incident-triage/SKILL.md +++ b/forge-skills/local/embedded/k8s-incident-triage/SKILL.md @@ -1,5 +1,6 @@ --- name: k8s-incident-triage +icon: ☸️ category: sre tags: - kubernetes diff --git a/forge-skills/local/embedded/k8s-pod-rightsizer/SKILL.md b/forge-skills/local/embedded/k8s-pod-rightsizer/SKILL.md index 4f5a9dc..ebde5ab 100644 --- a/forge-skills/local/embedded/k8s-pod-rightsizer/SKILL.md +++ b/forge-skills/local/embedded/k8s-pod-rightsizer/SKILL.md @@ -1,5 +1,6 @@ --- name: k8s-pod-rightsizer +icon: ⚖️ category: sre tags: - kubernetes diff --git a/forge-skills/local/embedded/tavily-research/SKILL.md b/forge-skills/local/embedded/tavily-research/SKILL.md index 9adcbc9..c5780f1 100644 --- a/forge-skills/local/embedded/tavily-research/SKILL.md +++ b/forge-skills/local/embedded/tavily-research/SKILL.md @@ -1,5 +1,6 @@ --- name: tavily-research +icon: 🔬 description: Deep multi-source research using Tavily Research API metadata: forge: diff --git a/forge-skills/local/embedded/tavily-search/SKILL.md b/forge-skills/local/embedded/tavily-search/SKILL.md index 9ca9813..222b25c 100644 --- a/forge-skills/local/embedded/tavily-search/SKILL.md +++ b/forge-skills/local/embedded/tavily-search/SKILL.md @@ -1,5 +1,6 @@ --- name: tavily-search +icon: 🔍 description: Search the web using Tavily AI search API metadata: forge: diff --git a/forge-skills/local/embedded/weather/SKILL.md b/forge-skills/local/embedded/weather/SKILL.md index 786ac21..926590a 100644 --- a/forge-skills/local/embedded/weather/SKILL.md +++ b/forge-skills/local/embedded/weather/SKILL.md @@ -1,5 +1,6 @@ --- 
name: weather +icon: 🌤️ description: Get current weather and forecasts metadata: forge: diff --git a/forge-skills/local/registry_embedded_test.go b/forge-skills/local/registry_embedded_test.go index 1c6b006..71bcfb4 100644 --- a/forge-skills/local/registry_embedded_test.go +++ b/forge-skills/local/registry_embedded_test.go @@ -81,6 +81,9 @@ func TestEmbeddedRegistry_GitHubDetails(t *testing.T) { if s.Description != "Create issues, PRs, and query repositories" { t.Errorf("Description = %q", s.Description) } + if s.Icon != "🐙" { + t.Errorf("Icon = %q, want 🐙", s.Icon) + } if len(s.RequiredEnv) != 1 || s.RequiredEnv[0] != "GH_TOKEN" { t.Errorf("RequiredEnv = %v", s.RequiredEnv) } @@ -190,6 +193,24 @@ func TestEmbeddedRegistry_TavilyResearchDetails(t *testing.T) { } } +func TestEmbeddedRegistry_AllSkillsHaveIcons(t *testing.T) { + reg, err := NewEmbeddedRegistry() + if err != nil { + t.Fatalf("NewEmbeddedRegistry error: %v", err) + } + + skills, err := reg.List() + if err != nil { + t.Fatalf("List error: %v", err) + } + + for _, s := range skills { + if s.Icon == "" { + t.Errorf("skill %q has no icon — add 'icon:' to its SKILL.md frontmatter", s.Name) + } + } +} + func TestEmbeddedRegistry_LoadContent(t *testing.T) { reg, err := NewEmbeddedRegistry() if err != nil { diff --git a/forge-skills/local/scanner.go b/forge-skills/local/scanner.go index 7c3355e..738dfe1 100644 --- a/forge-skills/local/scanner.go +++ b/forge-skills/local/scanner.go @@ -73,6 +73,7 @@ func Scan(fsys fs.FS) ([]contract.SkillDescriptor, error) { sd.Category = meta.Category sd.Tags = meta.Tags + sd.Icon = meta.Icon // Extract forge-specific fields if meta.Metadata != nil { From bd1d3440721b89d63869f5b8a4cac62284dc8bf5 Mon Sep 17 00:00:00 2001 From: MK Date: Wed, 4 Mar 2026 04:50:54 -0500 Subject: [PATCH 4/5] fix: add missing category and tags to all embedded skills - github: category=developer, tags: github, issues, pull-requests, repositories - tavily-research: category=research, tags: research, 
web-search, tavily, analysis - tavily-search: category=research, tags: web-search, tavily, search - weather: category=utilities, tags: weather, forecast, api Add test to enforce all embedded skills declare category and tags. --- forge-skills/local/embedded/github/SKILL.md | 6 ++++++ .../local/embedded/tavily-research/SKILL.md | 6 ++++++ .../local/embedded/tavily-search/SKILL.md | 5 +++++ forge-skills/local/embedded/weather/SKILL.md | 5 +++++ forge-skills/local/registry_embedded_test.go | 21 +++++++++++++++++++ 5 files changed, 43 insertions(+) diff --git a/forge-skills/local/embedded/github/SKILL.md b/forge-skills/local/embedded/github/SKILL.md index a2f556d..7dd0fed 100644 --- a/forge-skills/local/embedded/github/SKILL.md +++ b/forge-skills/local/embedded/github/SKILL.md @@ -1,6 +1,12 @@ --- name: github icon: 🐙 +category: developer +tags: + - github + - issues + - pull-requests + - repositories description: Create issues, PRs, and query repositories metadata: forge: diff --git a/forge-skills/local/embedded/tavily-research/SKILL.md b/forge-skills/local/embedded/tavily-research/SKILL.md index c5780f1..d0b0f76 100644 --- a/forge-skills/local/embedded/tavily-research/SKILL.md +++ b/forge-skills/local/embedded/tavily-research/SKILL.md @@ -1,6 +1,12 @@ --- name: tavily-research icon: 🔬 +category: research +tags: + - research + - web-search + - tavily + - analysis description: Deep multi-source research using Tavily Research API metadata: forge: diff --git a/forge-skills/local/embedded/tavily-search/SKILL.md b/forge-skills/local/embedded/tavily-search/SKILL.md index 222b25c..2f45ad5 100644 --- a/forge-skills/local/embedded/tavily-search/SKILL.md +++ b/forge-skills/local/embedded/tavily-search/SKILL.md @@ -1,6 +1,11 @@ --- name: tavily-search icon: 🔍 +category: research +tags: + - web-search + - tavily + - search description: Search the web using Tavily AI search API metadata: forge: diff --git a/forge-skills/local/embedded/weather/SKILL.md 
b/forge-skills/local/embedded/weather/SKILL.md index 926590a..d634df2 100644 --- a/forge-skills/local/embedded/weather/SKILL.md +++ b/forge-skills/local/embedded/weather/SKILL.md @@ -1,6 +1,11 @@ --- name: weather icon: 🌤️ +category: utilities +tags: + - weather + - forecast + - api description: Get current weather and forecasts metadata: forge: diff --git a/forge-skills/local/registry_embedded_test.go b/forge-skills/local/registry_embedded_test.go index 71bcfb4..d0bda32 100644 --- a/forge-skills/local/registry_embedded_test.go +++ b/forge-skills/local/registry_embedded_test.go @@ -193,6 +193,27 @@ func TestEmbeddedRegistry_TavilyResearchDetails(t *testing.T) { } } +func TestEmbeddedRegistry_AllSkillsHaveCategoryAndTags(t *testing.T) { + reg, err := NewEmbeddedRegistry() + if err != nil { + t.Fatalf("NewEmbeddedRegistry error: %v", err) + } + + skills, err := reg.List() + if err != nil { + t.Fatalf("List error: %v", err) + } + + for _, s := range skills { + if s.Category == "" { + t.Errorf("skill %q has no category — add 'category:' to its SKILL.md frontmatter", s.Name) + } + if len(s.Tags) == 0 { + t.Errorf("skill %q has no tags — add 'tags:' to its SKILL.md frontmatter", s.Name) + } + } +} + func TestEmbeddedRegistry_AllSkillsHaveIcons(t *testing.T) { reg, err := NewEmbeddedRegistry() if err != nil { From 14b50661ae03acd292066d91d996f4b868427e63 Mon Sep 17 00:00:00 2001 From: MK Date: Wed, 4 Mar 2026 04:53:27 -0500 Subject: [PATCH 5/5] docs: update skills.md with icon, category, and tags requirements - Add icon/category/tags to the SKILL.md format example - Document all frontmatter fields in a table (icon, category, tags required) - Add Icon column to Built-in Skills table, fill in all categories - Fix parser path reference (forge-skills/parser/parser.go) --- docs/skills.md | 52 +++++++++++++++++++++++++++++++++----------------- 1 file changed, 35 insertions(+), 17 deletions(-) diff --git a/docs/skills.md b/docs/skills.md index 48c0dbf..bbbc368 100644 --- 
a/docs/skills.md
+++ b/docs/skills.md
@@ -15,6 +15,12 @@ Skills are defined in a Markdown file (default: `SKILL.md`). The file supports o
 ```markdown
 ---
 name: weather
+icon: 🌤️
+category: utilities
+tags:
+  - weather
+  - forecast
+  - api
 description: Weather data skill
 metadata:
   forge:
@@ -45,13 +51,24 @@
 ### YAML Frontmatter
 
-The `metadata.forge.requires` block declares:
+Top-level fields:
+
+| Field | Required | Description |
+|-------|----------|-------------|
+| `name` | yes | Skill identifier (kebab-case) |
+| `icon` | yes (embedded) | Emoji shown in the TUI skill picker; defaults to 📦 when omitted |
+| `category` | yes (embedded) | Grouping for `forge skills list --category` (e.g., `sre`, `developer`, `research`, `utilities`) |
+| `tags` | yes (embedded) | Discovery keywords for `forge skills list --tags` (kebab-case) |
+| `description` | yes | One-line summary |
+
+The `metadata.forge.requires` block declares runtime dependencies:
+
 - **`bins`** — Binary dependencies that must be in `$PATH` at runtime
 - **`env.required`** — Environment variables that must be set
 - **`env.one_of`** — At least one of these environment variables must be set
 - **`env.optional`** — Optional environment variables for extended functionality
 
-Frontmatter is parsed by `ParseWithMetadata()` in `forge-core/skills/parser.go` and feeds into the compilation pipeline.
+Frontmatter is parsed by `ParseWithMetadata()` in `forge-skills/parser/parser.go` and feeds into the compilation pipeline.
 
 ### Legacy List Format
 
@@ -118,11 +135,12 @@ Skill scripts run in a restricted environment via `SkillCommandExecutor`:
 
 ## Skill Categories & Tags
 
-Skills can declare a `category` and `tags` in their frontmatter for organization and filtering:
+All embedded skills must declare `category`, `tags`, and `icon` in their frontmatter. Categories and tags must be lowercase kebab-case.
```markdown --- name: k8s-incident-triage +icon: ☸️ category: sre tags: - kubernetes @@ -131,7 +149,7 @@ tags: --- ``` -Categories and tags must be lowercase kebab-case. Use them to filter skills: +Use categories and tags to filter skills: ```bash # List skills by category @@ -143,19 +161,19 @@ forge skills list --tags kubernetes,incident-response ## Built-in Skills -| Skill | Category | Description | Scripts | -|-------|----------|-------------|---------| -| `github` | — | Create issues, PRs, and query repositories | — (binary-backed) | -| `weather` | — | Get weather data for a location | — (binary-backed) | -| `tavily-search` | — | Search the web using Tavily AI search API | `tavily-search.sh` | -| `tavily-research` | — | Deep multi-source research via Tavily API | `tavily-research.sh`, `tavily-research-poll.sh` | -| `k8s-incident-triage` | sre | Read-only Kubernetes incident triage using kubectl | — (binary-backed) | -| `k8s-pod-rightsizer` | sre | Analyze workload metrics and produce CPU/memory rightsizing recommendations with optional apply | — (binary-backed) | -| `code-review` | developer | AI-powered code review for diffs and files | `code-review-diff.sh`, `code-review-file.sh` | -| `code-review-standards` | developer | Initialize and manage code review standards | — (template-based) | -| `code-review-github` | developer | Post code review results to GitHub PRs | — (binary-backed) | -| `codegen-react` | developer | Scaffold and iterate on Vite + React apps | `codegen-react-scaffold.sh`, `codegen-react-read.sh`, `codegen-react-write.sh`, `codegen-react-run.sh` | -| `codegen-html` | developer | Scaffold standalone Preact + HTM apps (zero dependencies) | `codegen-html-scaffold.sh`, `codegen-html-read.sh`, `codegen-html-write.sh` | +| Skill | Icon | Category | Description | Scripts | +|-------|------|----------|-------------|---------| +| `github` | 🐙 | developer | Create issues, PRs, and query repositories | — (binary-backed) | +| `weather` | 🌤️ | utilities | 
Get weather data for a location | — (binary-backed) | +| `tavily-search` | 🔍 | research | Search the web using Tavily AI search API | `tavily-search.sh` | +| `tavily-research` | 🔬 | research | Deep multi-source research via Tavily API | `tavily-research.sh`, `tavily-research-poll.sh` | +| `k8s-incident-triage` | ☸️ | sre | Read-only Kubernetes incident triage using kubectl | — (binary-backed) | +| `k8s-pod-rightsizer` | ⚖️ | sre | Analyze workload metrics and produce CPU/memory rightsizing recommendations | — (binary-backed) | +| `code-review` | 🔎 | developer | AI-powered code review for diffs and files | `code-review-diff.sh`, `code-review-file.sh` | +| `code-review-standards` | 📏 | developer | Initialize and manage code review standards | — (template-based) | +| `code-review-github` | 🐙 | developer | Post code review results to GitHub PRs | — (binary-backed) | +| `codegen-react` | ⚛️ | developer | Scaffold and iterate on Vite + React apps | `codegen-react-scaffold.sh`, `codegen-react-read.sh`, `codegen-react-write.sh`, `codegen-react-run.sh` | +| `codegen-html` | 🌐 | developer | Scaffold standalone Preact + HTM apps (zero dependencies) | `codegen-html-scaffold.sh`, `codegen-html-read.sh`, `codegen-html-write.sh` | ### Tavily Research Skill