chore: add a new scenario #7107

Merged
michaelneale merged 1 commit into main from micn/more-scenarios
Feb 10, 2026

@michaelneale
Collaborator

This adds another scenario that simulates a more complex feature removal.

@michaelneale marked this pull request as ready for review February 10, 2026 01:32
Copilot AI review requested due to automatic review settings February 10, 2026 01:32
Collaborator

@codefromthecrypt left a comment

Contributor

Copilot AI left a comment

Pull request overview

Adds a new Open Model Gym scenario to exercise “feature removal” edits and wires it into the Open Model Gym runner matrix, along with incidental npm lockfile updates.

Changes:

  • Added a new remove-feature scenario (Python codebase) that requires removing an insert_image feature end-to-end.
  • Updated config.yaml runners (rename goose-full to goose, add goose-diet and pi-lean) and included the new scenario in the test matrix.
  • Updated package-lock metadata/flags in the Open Model Gym suite and MCP harness.
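
As a rough illustration of what "removing a feature end-to-end" could mean for a validator, the sketch below scans a Python tree for leftover mentions of a symbol. It is not the scenario's actual validation logic (the rules in remove-feature.yaml are not shown here); the function name find_leftover_references and its parameters are assumptions made for this example.

```python
from pathlib import Path


def find_leftover_references(root, symbol="insert_image", suffixes=(".py",)):
    """Return (relative path, line number) pairs where `symbol` still appears.

    Hypothetical check: a plain substring scan, not a parser, so it also
    flags mentions in comments and strings.
    """
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        # Only look at regular files with one of the requested suffixes.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if symbol in line:
                hits.append((path.relative_to(root).as_posix(), lineno))
    return hits
```

An empty result would suggest the feature is gone from the scanned files; any hit points at a file and line that still mentions it.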

Reviewed changes

Copilot reviewed 2 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| evals/open-model-gym/suite/scenarios/remove-feature.yaml | New scenario definition with setup files + validation rules for full insert_image removal. |
| evals/open-model-gym/config.yaml | Runner list adjustments and matrix includes the new scenario; updated runner name references. |
| evals/open-model-gym/suite/package-lock.json | Lockfile metadata updated (package name aligns with package.json). |
| evals/open-model-gym/mcp-harness/package-lock.json | Lockfile entries updated (adds peer: true flags for some packages). |
Files not reviewed (2)
  • evals/open-model-gym/mcp-harness/package-lock.json: Language not supported
  • evals/open-model-gym/suite/package-lock.json: Language not supported

Comment on lines +58 to +61
  - name: goose-diet
    type: goose
    bin: ~/Downloads/goose-diet
    extensions: [developer]

Copilot AI Feb 10, 2026

goose-diet sets bin: ~/Downloads/goose-diet, but the suite runner builds shell commands from this string and also runs which ${bin} for hashing. Tilde expansion isn't guaranteed in that context, so the runner binary can be unresolvable on some shells or platforms. Prefer an absolute path (or add explicit ~ expansion in the runner code) to avoid flaky execution.
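
A minimal sketch of the "explicit ~ expansion" option, assuming the binary path is normalised once before any shell command or lookup uses it. The Python below is illustrative only: the suite runner's real code is not shown in this thread, and resolve_runner_bin is a hypothetical helper name.

```python
import os
import shutil
from typing import Optional


def resolve_runner_bin(bin_value: str) -> Optional[str]:
    """Return an absolute path for a runner binary, or None if not found.

    Hypothetical helper: expands a leading ~ and environment variables
    up front, so the lookup does not rely on the invoking shell.
    """
    expanded = os.path.expandvars(os.path.expanduser(bin_value))
    # A value with a directory component is treated as a file path.
    if os.path.dirname(expanded):
        candidate = os.path.abspath(expanded)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
        return None
    # A bare name such as "goose" is searched for on PATH.
    return shutil.which(expanded)
```

With something like this, a config value such as ~/Downloads/goose-diet resolves to an absolute path or to None, which lets the runner fail early with a clear message instead of passing an unexpanded tilde to a later command.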

    extensions: [developer, todo, skills, code_execution, extensionmanager]
    stdio:
      - node mcp-harness/dist/index.js

Copilot AI Feb 10, 2026

The runner rename from goose-full to goose makes the repository docs inconsistent: the README still documents goose-full, so users following the Quick Start/config examples will likely get “unknown runner” errors. Either update the README in this PR or keep goose-full as an alias name in config.

Suggested change
  - name: goose-full
    type: goose
    bin: goose
    extensions: [developer, todo, skills, code_execution, extensionmanager]
    stdio:
      - node mcp-harness/dist/index.js
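
The alias option could also live in the lookup step rather than as a duplicated config entry. The following is a sketch under that assumption, written in Python for illustration; RUNNER_ALIASES and resolve_runner are hypothetical names, and the runner dictionaries only mirror the fields visible in the config snippets in this thread.

```python
# Hypothetical alias table: old runner name -> current runner name.
RUNNER_ALIASES = {"goose-full": "goose"}


def resolve_runner(name, runners, aliases=RUNNER_ALIASES):
    """Look up a runner config by name, following an alias if one exists."""
    by_name = {runner["name"]: runner for runner in runners}
    canonical = aliases.get(name, name)
    if canonical not in by_name:
        known = ", ".join(sorted(by_name))
        raise KeyError(f"unknown runner {name!r}; known runners: {known}")
    return by_name[canonical]
```

This keeps a single definition of the runner while older names from the README continue to resolve; the error message lists the known names to make a stale reference easier to spot.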

@michaelneale added this pull request to the merge queue Feb 10, 2026
Merged via the queue into main with commit a484eee Feb 10, 2026
24 checks passed
@michaelneale deleted the micn/more-scenarios branch February 10, 2026 02:07
michaelneale added a commit that referenced this pull request Feb 10, 2026
* main: (125 commits)
  chore: add a new scenario (#7107)
  fix: Goose Desktop missing Calendar and Reminders entitlements (#7100)
  Fix 'Edit In Place' and 'Fork Session' features (#6970)
  Fix: Only send command content to command injection classifier (excluding part of tool call dict) (#7082)
  Docs: require auth optional for custom providers (#7098)
  fix: improve text-muted contrast for better readability (#7095)
  Always sync bundled extensions (#7057)
  feat: Add tom (Top Of Mind) platform extension (#7073)
  chore(docs): update GOOSE_SESSION_ID -> AGENT_SESSION_ID (#6669)
  fix(ci): switch from cargo-audit to cargo-deny for advisory scanning (#7032)
  chore(deps): bump @isaacs/brace-expansion from 5.0.0 to 5.0.1 in /evals/open-model-gym/suite (#7085)
  chore(deps): bump @modelcontextprotocol/sdk from 1.25.3 to 1.26.0 in /evals/open-model-gym/mcp-harness (#7086)
  fix: switch to windows msvc (#7080)
  fix: allow unlisted models for CLI providers (#7090)
  Use goose port (#7089)
  chore: strip posthog for sessions/models/daily only (#7079)
  tidy: clean up old benchmark and add gym (#7081)
  fix: use command.process_group(0) for CLI providers, not just MCP (#7083)
  added build notify (#6891)
  test(mcp): add image tool test and consolidate MCP test fixtures (#7019)
  ...
zanesq added a commit that referenced this pull request Feb 10, 2026
…tensions-deeplinks

* 'main' of github.com:block/goose:
  [docs] update authors.yaml file (#7114)
  Implement manpage generation for goose-cli (#6980)
  docs: tool output optimization (#7109)
  Fix duplicated output in Code Mode by filtering content by audience (#7117)
  Enable tom (Top Of Mind) platform extension by default (#7111)
  chore: added notification for canary build failure (#7106)
  fix: fix windows bundle random failure and optimise canary build (#7105)
  feat(acp): add model selection support for session/new and session/set_model (#7112)
  fix: isolate claude-code sessions via stream-json session_id (#7108)
  ci: enable agentic provider live tests (claude-code, codex, gemini-cli) (#7088)
  docs: codex subscription support (#7104)
  chore: add a new scenario (#7107)
  fix: Goose Desktop missing Calendar and Reminders entitlements (#7100)
  Fix 'Edit In Place' and 'Fork Session' features (#6970)
  Fix: Only send command content to command injection classifier (excluding part of tool call dict) (#7082)
  Docs: require auth optional for custom providers (#7098)
  fix: improve text-muted contrast for better readability (#7095)
  Always sync bundled extensions (#7057)
tlongwell-block added a commit that referenced this pull request Feb 10, 2026
* origin/main:
  feat: add AGENT=goose environment variable for cross-tool compatibility (#7017)
  fix: strip empty extensions array when deeplink also (#7096)
  [docs] update authors.yaml file (#7114)
  Implement manpage generation for goose-cli (#6980)
  docs: tool output optimization (#7109)
  Fix duplicated output in Code Mode by filtering content by audience (#7117)
  Enable tom (Top Of Mind) platform extension by default (#7111)
  chore: added notification for canary build failure (#7106)
  fix: fix windows bundle random failure and optimise canary build (#7105)
  feat(acp): add model selection support for session/new and session/set_model (#7112)
  fix: isolate claude-code sessions via stream-json session_id (#7108)
  ci: enable agentic provider live tests (claude-code, codex, gemini-cli) (#7088)
  docs: codex subscription support (#7104)
  chore: add a new scenario (#7107)
  fix: Goose Desktop missing Calendar and Reminders entitlements (#7100)
  Fix 'Edit In Place' and 'Fork Session' features (#6970)
  Fix: Only send command content to command injection classifier (excluding part of tool call dict) (#7082)

# Conflicts:
#	crates/goose/src/agents/extension.rs
jh-block added a commit that referenced this pull request Feb 10, 2026
* origin/main: (30 commits)
  docs: GCP Vertex AI org policy filtering & update OnboardingProviderSetup component (#7125)
  feat: replace subagent and skills with unified summon extension (#6964)
  feat: add AGENT=goose environment variable for cross-tool compatibility (#7017)
  fix: strip empty extensions array when deeplink also (#7096)
  [docs] update authors.yaml file (#7114)
  Implement manpage generation for goose-cli (#6980)
  docs: tool output optimization (#7109)
  Fix duplicated output in Code Mode by filtering content by audience (#7117)
  Enable tom (Top Of Mind) platform extension by default (#7111)
  chore: added notification for canary build failure (#7106)
  fix: fix windows bundle random failure and optimise canary build (#7105)
  feat(acp): add model selection support for session/new and session/set_model (#7112)
  fix: isolate claude-code sessions via stream-json session_id (#7108)
  ci: enable agentic provider live tests (claude-code, codex, gemini-cli) (#7088)
  docs: codex subscription support (#7104)
  chore: add a new scenario (#7107)
  fix: Goose Desktop missing Calendar and Reminders entitlements (#7100)
  Fix 'Edit In Place' and 'Fork Session' features (#6970)
  Fix: Only send command content to command injection classifier (excluding part of tool call dict) (#7082)
  Docs: require auth optional for custom providers (#7098)
  ...
Tyler-Hardin pushed a commit to Tyler-Hardin/goose that referenced this pull request Feb 11, 2026