Pull request overview
Adds a new Open Model Gym scenario to exercise “feature removal” edits and wires it into the Open Model Gym runner matrix, along with incidental npm lockfile updates.
Changes:
- Added a new `remove-feature` scenario (Python codebase) that requires removing an `insert_image` feature end-to-end.
- Updated `config.yaml` runners (rename `goose-full` → `goose`, add `goose-diet` and `pi-lean`) and included the new scenario in the test matrix.
- Updated package-lock metadata/flags in the Open Model Gym suite and MCP harness.
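The scenario's real validation rules live in `remove-feature.yaml` and are not shown in this PR page. To give a rough idea of what checking an end-to-end removal can involve, here is a minimal TypeScript sketch. The `findLeftoverReferences` helper and the in-memory `FileMap` are hypothetical and are not part of the suite. The sketch flags any file that still mentions the removed identifier as a whole word:

```typescript
// Hypothetical check, not the suite's real validator: report files that still
// reference a removed identifier after the agent's edit.
type FileMap = Record<string, string>; // path -> file contents

function findLeftoverReferences(files: FileMap, identifier: string): string[] {
  // Escape regex metacharacters so the identifier is matched literally.
  const escaped = identifier.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // \b on both sides stops e.g. "insert_images" from matching "insert_image".
  const pattern = new RegExp(`\\b${escaped}\\b`);
  return Object.entries(files)
    .filter(([, content]) => pattern.test(content))
    .map(([path]) => path)
    .sort();
}

// Example workspace (made-up files): one file still defines the removed feature.
const workspace: FileMap = {
  "app/images.py": "def insert_image(doc, path):\n    ...\n",
  "app/main.py": "from app.text import insert_text\n",
  "docs/notes.md": "Old API: insert_images (deprecated)\n",
};

const leftovers = findLeftoverReferences(workspace, "insert_image");
console.log(leftovers); // ["app/images.py"]
```

A plain text search like this only catches references by name. A full removal check would also need the scenario's own rules, such as running the remaining tests.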
Reviewed changes
Copilot reviewed 2 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| evals/open-model-gym/suite/scenarios/remove-feature.yaml | New scenario definition with setup files + validation rules for full insert_image removal. |
| evals/open-model-gym/config.yaml | Runner list adjustments and matrix includes the new scenario; updated runner name references. |
| evals/open-model-gym/suite/package-lock.json | Lockfile metadata updated (package name aligns with package.json). |
| evals/open-model-gym/mcp-harness/package-lock.json | Lockfile entries updated (adds peer: true flags for some packages). |
Files not reviewed (2)
- evals/open-model-gym/mcp-harness/package-lock.json: Language not supported
- evals/open-model-gym/suite/package-lock.json: Language not supported
```yaml
- name: goose-diet
  type: goose
  bin: ~/Downloads/goose-diet
  extensions: [developer]
```
`goose-diet` sets `bin: ~/Downloads/goose-diet`, but the suite runner builds shell commands from this string and also runs `which ${bin}` for hashing. `~` expansion therefore isn't guaranteed, and the runner binary can be unresolvable on some shells/platforms. Prefer an absolute path (or add explicit `~` expansion in the runner code) to avoid flaky execution.
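The second option in that comment, explicit `~` expansion in the runner, could look roughly like the TypeScript sketch below. It assumes the runner has the user's home directory available and that `bin` is later interpolated into a POSIX shell command line. `expandHome`, `shellQuote` and `whichCommand` are hypothetical helpers, not functions from the suite. The order of operations matters: a POSIX shell does not expand `~` inside single quotes, so expansion has to happen before the value is quoted.

```typescript
// Hypothetical helpers (not from the suite): make a runner's `bin` value safe to
// pass to `which` or to embed in a POSIX shell command.

// Expand a leading "~" or "~/" using an explicitly supplied home directory.
// Other forms such as "~otheruser/bin" are deliberately left unchanged.
function expandHome(bin: string, home: string): string {
  if (bin === "~") return home;
  if (bin.startsWith("~/")) {
    // Drop a trailing slash on home so the result never contains "//".
    const base = home.endsWith("/") ? home.slice(0, -1) : home;
    return `${base}/${bin.slice(2)}`;
  }
  return bin; // absolute paths and bare names like "goose" pass through
}

// Single-quote a string for a POSIX shell; each embedded ' becomes '\''.
function shellQuote(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

// Build the lookup command for a runner binary: expand first, then quote.
function whichCommand(bin: string, home: string): string {
  return `which ${shellQuote(expandHome(bin, home))}`;
}

console.log(whichCommand("~/Downloads/goose-diet", "/home/alice"));
// which '/home/alice/Downloads/goose-diet'
```

Passing `home` in explicitly keeps the helpers pure and easy to test. A real runner would presumably read it from the environment or the OS.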
```yaml
  extensions: [developer, todo, skills, code_execution, extensionmanager]
  stdio:
    - node mcp-harness/dist/index.js
```
Renaming the runner from `goose-full` to `goose` makes the repository docs inconsistent, because the README still documents `goose-full`. Users following the Quick Start/config examples will likely get “unknown runner” errors. Either update the README in this PR or keep `goose-full` as an alias name in config.
```yaml
- name: goose-full
  type: goose
  bin: goose
  extensions: [developer, todo, skills, code_execution, extensionmanager]
  stdio:
    - node mcp-harness/dist/index.js
```
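The review suggests two ways to avoid “unknown runner” errors: update the README, or keep `goose-full` resolvable. The simplest form of the second option is to keep a `goose-full` entry in `config.yaml`. Alternatively, the runner lookup could map the old name to the new one. The TypeScript sketch below shows that alternative. The `RunnerConfig` shape (which mirrors the YAML fields above), the `RUNNER_ALIASES` table and `findRunner` are all hypothetical; they are not the suite's actual code.

```typescript
// Hypothetical types and lookup (not the suite's real code), mirroring the
// fields used in config.yaml runner entries.
interface RunnerConfig {
  name: string;
  type: string;
  bin: string;
  extensions: string[];
}

// Old runner names that should keep working after a rename.
const RUNNER_ALIASES = new Map<string, string>([["goose-full", "goose"]]);

// Resolve a requested runner name, following an alias if one exists.
function findRunner(runners: RunnerConfig[], requested: string): RunnerConfig {
  const name = RUNNER_ALIASES.get(requested) ?? requested;
  const runner = runners.find((r) => r.name === name);
  if (runner === undefined) {
    throw new Error(`unknown runner: ${requested}`);
  }
  return runner;
}

const runners: RunnerConfig[] = [
  {
    name: "goose",
    type: "goose",
    bin: "goose",
    extensions: ["developer", "todo", "skills", "code_execution", "extensionmanager"],
  },
  { name: "goose-diet", type: "goose", bin: "~/Downloads/goose-diet", extensions: ["developer"] },
];

console.log(findRunner(runners, "goose-full").name); // goose
```

Using a `Map` rather than a plain object avoids accidental matches on inherited keys such as `"toString"`.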
* main: (125 commits)
  - chore: add a new scenario (#7107)
  - fix: Goose Desktop missing Calendar and Reminders entitlements (#7100)
  - Fix 'Edit In Place' and 'Fork Session' features (#6970)
  - Fix: Only send command content to command injection classifier (excluding part of tool call dict) (#7082)
  - Docs: require auth optional for custom providers (#7098)
  - fix: improve text-muted contrast for better readability (#7095)
  - Always sync bundled extensions (#7057)
  - feat: Add tom (Top Of Mind) platform extension (#7073)
  - chore(docs): update GOOSE_SESSION_ID -> AGENT_SESSION_ID (#6669)
  - fix(ci): switch from cargo-audit to cargo-deny for advisory scanning (#7032)
  - chore(deps): bump @isaacs/brace-expansion from 5.0.0 to 5.0.1 in /evals/open-model-gym/suite (#7085)
  - chore(deps): bump @modelcontextprotocol/sdk from 1.25.3 to 1.26.0 in /evals/open-model-gym/mcp-harness (#7086)
  - fix: switch to windows msvc (#7080)
  - fix: allow unlisted models for CLI providers (#7090)
  - Use goose port (#7089)
  - chore: strip posthog for sessions/models/daily only (#7079)
  - tidy: clean up old benchmark and add gym (#7081)
  - fix: use command.process_group(0) for CLI providers, not just MCP (#7083)
  - added build notify (#6891)
  - test(mcp): add image tool test and consolidate MCP test fixtures (#7019)
  - ...
This adds another scenario to simulate a more complex feature removal.