Agent Persona Exploration - 2026-03-22 #22224
Closed
This discussion has been marked as outdated by Agent Persona Explorer. A newer discussion is available at Discussion #22354.
Systematic evaluation of the `agentic-workflows` custom agent across 5 software worker personas and 8 representative automation scenarios. All scenarios were assessed against production-readiness criteria.

### Persona Overview
Agent configuration: `developer.instructions` (`agentic-workflows` custom agent)

### Key Findings
- Most scenarios used a `steps:` shell block to pre-download data before the agent session, correctly minimizing LLM round-trips.
- `workflow_run` was correctly chosen for CI-reaction scenarios, `pull_request` for PR automation, and `schedule` for periodic tasks. No trigger mismatches were observed.

### Top Patterns Observed
- Scenarios used `steps:` to pre-fetch data (logs, artifacts, PR data) into `/tmp/` before the agent runs. This is the dominant architectural pattern.
- Bash access was scoped to specific command patterns (`"gh api *"`, `"jq *"`) rather than `["*"]`. No strict-mode violations occurred.
- Network access was limited to `defaults` or `defaults + github`. The Playwright scenario correctly added the `node` and `playwright` domains.
- Workflows defined explicit `noop` early-exit conditions (e.g., "no migration files changed", "no failed API endpoints found", "no PRs merged this week").
- `hide-older-comments: true` + `max: 1`: applied to every PR comment output to prevent comment spam across re-runs.

### High-Quality Responses (all scored 5.0/5.0)
- **be-2: API Regression Triage.** Standout for security reasoning. Unprompted, the agent identified that test log content is attacker-controlled and explicitly stated: "Never add web-fetch or web-search to this workflow. Test logs are attacker-controlled input." It also provided a failure-pattern correlation table (e.g., `404 on existing route → route path changed or router registration removed`).
- **do-1: Deployment Incident Monitor.** Exemplary use of the `workflow_run` pattern, with a frontmatter `if:` guard that avoids agent cost on successful deployments. It included a complete label creation script (`gh label create`) for all incident categories and defined a "no-op path" to skip duplicate incidents.
- **do-2: Weekly Cost Report.** The most sophisticated response. It used `repo-memory` for an 8-week rolling history, dual-threshold anomaly detection (Z-score > 2.0 and > 40% week-over-week change), and commented-out cloud provider API endpoints (AWS/GCP/Azure) that can be enabled without guessing hostnames.
- **be-1: Schema Review.** Bonus suggestions: use `cache-memory` to learn per-team patterns over time, and create a companion slash-command workflow for on-demand re-reviews. The engine recommendation (`claude`) was justified with reasoning about nuanced migration analysis.

### Areas for Improvement
No critical issues were found. Three minor opportunities were identified:

1. **Standard label catalog.** Multiple scenarios independently defined similar incident/status labels (`incident`, `deployment-failure`, `breaking-change`). A reusable label setup guide or a standard label registry would reduce duplication across repos.
2. **Getting-started checklist standardization.** `fe-1` (Visual Regression) included a thorough prerequisites checklist; other scenarios had ad-hoc checklists. A standard checklist template (CI artifact format, label prerequisites, discussion category setup) could be promoted as a default in the agent's response style.
3. **Discussion category resolution.** Two scenarios (`pm-1`, `do-2`) flagged `category: "General"` as needing manual verification. The agent could proactively suggest the `gh api repos/{owner}/{repo}/discussions/categories` command as a first step, or query the categories automatically via the `github` toolset before generating the workflow.

### Scenario Scores
### Recommendations
1. **Document the pre-step pattern as a canonical best practice.** The `steps:` pre-download pattern appeared in 6/8 scenarios and is clearly the right architecture for data-heavy workflows. It deserves explicit documentation and a named pattern in the developer guide.
2. **Add explicit injection-safety guidance to scheduled/CI-reaction workflows.** The agent correctly flagged injection risk in `be-2` (test logs) but did not raise it in `do-1`, even though deployment logs carry the same risk. A checklist item reminding authors to omit `web-fetch`/`bash` when the input is user-controlled would make this guidance universal.
3. **Standardize `workflow_dispatch` as a default for all `schedule` workflows.** Both scheduled scenarios included `workflow_dispatch` for manual testing, but it is easy to forget. Making it a default recommendation (or emitting a compiler warning when it is absent) would improve the developer experience.