Orchestrator: Improve guardrail error experience for blocked tasks #20

@leandrodamascena

Description

Component

API or orchestration

Describe the feature

When the Bedrock Guardrail blocks a task during hydration, the error is completely generic: "Task context blocked by content policy". The Bedrock response includes assessment details (filter type, confidence level), but screenWithGuardrail() discards all of them and returns only a string. As a result, the guardrail_blocked event and bgagent status show only the generic message, and the user has no idea what triggered the block or how to work around it.

Use case

I submitted a task from a GitHub issue that was literally "create a uv scaffolder, project name xyz, version 0.0.1, license MIT", and it was blocked by the PROMPT_ATTACK filter at HIGH confidence as a false positive. I had no way to understand what triggered it, no way to debug, and no way to move forward. I had to dig into the code to figure out what was going on.

Proposed solution

The API is authenticated and bgagent events <task-id> already exists, so there's a safe channel to surface this:

  1. Pipe the Bedrock assessment through - screenWithGuardrail() should return the filter type and confidence from the response and include them in the guardrail_blocked event metadata. bgagent events would then show something like "PROMPT_ATTACK filter triggered at HIGH confidence" instead of just "Task context blocked by content policy".
  2. Better error in bgagent status - something actionable, such as suggesting that the user resubmit with --task instead of --issue to bypass the issue content that triggered the filter.
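
To make the two points above concrete, here is a rough sketch of what a structured result could look like. It is not the existing implementation: the local `GuardrailResponse` type is my assumption of the relevant part of the Bedrock ApplyGuardrail response (it should be checked against the SDK types), and `GuardrailScreenResult`, `toScreenResult` and `statusHint` are hypothetical names used only for illustration.

```typescript
// Assumed minimal shape of the parts of the Bedrock guardrail response
// that matter here. Verify against the real SDK types before relying on it.
type Confidence = "NONE" | "LOW" | "MEDIUM" | "HIGH";

interface ContentFilter {
  type: string; // e.g. "PROMPT_ATTACK"
  confidence: Confidence;
  action?: string; // e.g. "BLOCKED"
}

interface GuardrailResponse {
  action: "NONE" | "GUARDRAIL_INTERVENED";
  assessments?: { contentPolicy?: { filters?: ContentFilter[] } }[];
}

// Hypothetical structured result that screenWithGuardrail() could return
// instead of a bare string.
interface GuardrailScreenResult {
  blocked: boolean;
  message: string;
  triggers: { filterType: string; confidence: Confidence }[];
}

const GENERIC_MESSAGE = "Task context blocked by content policy";

function toScreenResult(response: GuardrailResponse): GuardrailScreenResult {
  if (response.action !== "GUARDRAIL_INTERVENED") {
    return { blocked: false, message: "", triggers: [] };
  }

  // Collect every content filter that reported a block.
  const triggers: GuardrailScreenResult["triggers"] = [];
  for (const assessment of response.assessments ?? []) {
    for (const filter of assessment.contentPolicy?.filters ?? []) {
      if (filter.action === undefined || filter.action === "BLOCKED") {
        triggers.push({ filterType: filter.type, confidence: filter.confidence });
      }
    }
  }

  const detail = triggers
    .map((t) => `${t.filterType} filter triggered at ${t.confidence} confidence`)
    .join("; ");

  // Fall back to the generic message when no assessment details are present.
  return { blocked: true, message: detail || GENERIC_MESSAGE, triggers };
}

// Hypothetical helper for the bgagent status output: adds an actionable
// hint only when the blocked content came from an issue.
function statusHint(result: GuardrailScreenResult, source: "issue" | "task"): string {
  if (!result.blocked) {
    return "";
  }
  if (source === "issue") {
    return `${result.message}. Try resubmitting with --task instead of --issue.`;
  }
  return `${result.message}.`;
}
```

The `triggers` array could go into the guardrail_blocked event metadata as-is, while `message` replaces the generic string. Falling back to the generic message keeps the current behavior when the response carries no assessment details.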

I saw that the roadmap (Iteration 5) already mentions per-repo guardrail configuration. Even when it lands, I think the assessment details would help users understand and work around false positives.

Other information

No response

Acknowledgements

  • I may be able to implement this feature
  • This might be a breaking change
