EU AI Act compliance scan results — Haystack scored #1, wanted to validate findings #10810

@shotwellj

Hey team — I build AIR Blackbox, an open-source EU AI Act compliance scanner for Python AI frameworks. I ran it against 6 major agent frameworks (Haystack, OpenAI Agents SDK, Semantic Kernel, GPT Researcher, Mem0, DSPy) and Haystack scored #1 overall.

I'm opening this issue because I want to validate whether our scanner's findings are accurate to how you've built things. Some of our pattern matching may produce false positives and I'd appreciate your input.

What the scanner found (highlights)

  • 245/552 files have Pydantic or dataclass validation (44%)
  • 143/552 files use structured logging (26% — highest of all frameworks)
  • 47 files have human-in-the-loop patterns
  • 41 files with retry/backoff logic
  • Docstrings at 29% (1 percentage point below our 30% threshold; easy fix)
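For context on the docstring figure, here is a minimal sketch of how a coverage number like this could be computed with Python's standard `ast` module. It is an illustration only, not AIR Blackbox's actual implementation: the function name `docstring_coverage`, the `SAMPLE` source, and the choice to count per function/class rather than per file are all assumptions. Only the 30% threshold comes from the results above.

```python
import ast


def docstring_coverage(source: str) -> tuple[int, int]:
    """Return (documented, total) over the functions and classes in `source`."""
    tree = ast.parse(source)
    documented = 0
    total = 0
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            total += 1
            # ast.get_docstring returns None when the definition has no docstring.
            if ast.get_docstring(node) is not None:
                documented += 1
    return documented, total


# Hypothetical sample: one class and one function are documented, one method is not.
SAMPLE = '''
class Pipeline:
    """Runs components in order."""
    def run(self, data):
        return data

def helper():
    """Documented helper."""
    return 1
'''

THRESHOLD = 0.30  # the 30% threshold mentioned in the results above

documented, total = docstring_coverage(SAMPLE)
coverage = documented / total
passes = coverage >= THRESHOLD
print(f"{documented}/{total} definitions documented ({coverage:.0%}), passes={passes}")
```

A per-file count (does the file contain any docstring at all?) would give a different percentage from this per-definition count, so the unit matters when comparing against the threshold.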

Where I need your help validating

The scanner detected patterns in all 5 of our OAuth delegation checks, but static pattern matching has limitations. Specifically:

  1. user_id in telemetry and HITL strategies — is this intentional identity binding for tracking which user authorized agent actions, or is it primarily analytics/telemetry?

  2. scope in agent.py — is this controlling agent permissions and what it can access, or is it used for something else?

  3. max_age in agent config — is this time-bounding agent execution (which would be a form of revocation/expiry), or is it unrelated to token lifecycle?

  4. is_allowed in serialization.py — this looks like deserialization safety, not agent action boundaries. Probably a false positive on our end. Can you confirm?

  5. execution_log in test_tool_invoker.py — does Haystack log tool invocations in production, or is this only in tests?
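To show why questions like these come up, here is a minimal sketch of name-based static matching with Python's standard `ast` module. It is not the scanner's actual code: the `WATCHED` set, the `find_identifier_hits` helper, and the `SAMPLE` source are hypothetical and only mirror the identifiers listed above. The sample is a deserialization allowlist check, yet it is flagged purely because of the name `is_allowed`.

```python
import ast

# Hypothetical watch list mirroring the identifiers discussed above.
WATCHED = {"user_id", "scope", "max_age", "is_allowed", "execution_log"}


def find_identifier_hits(source: str) -> list[tuple[str, int]]:
    """Return sorted (identifier, line) pairs for watched names in `source`."""
    hits = set()
    for node in ast.walk(ast.parse(source)):
        name = None
        if isinstance(node, ast.Name):
            name = node.id          # plain variable or function reference
        elif isinstance(node, ast.Attribute):
            name = node.attr        # obj.scope, config.max_age, ...
        elif isinstance(node, ast.arg):
            name = node.arg         # function parameter
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            name = node.name        # function definition
        if name in WATCHED:
            hits.add((name, node.lineno))
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


# Hypothetical deserialization guard: nothing here concerns agent permissions.
SAMPLE = '''
def is_allowed(class_path, allowlist):
    return class_path in allowlist

def load(data, allowlist):
    if not is_allowed(data["type"], allowlist):
        raise ValueError("blocked type")
'''

hits = find_identifier_hits(SAMPLE)
print(hits)
```

The matcher reports where a name occurs but not what it is used for, so a hit only becomes a finding once someone who knows the codebase confirms the intent.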

Full results

24 passing · 10 warnings · 5 failing · 39 total checks
95% automated detection · EU AI Act Articles 9, 10, 11, 12, 14, 15

All 5 failures come from missing documentation files (RISK_ASSESSMENT.md, DATA_GOVERNANCE.md, etc.) or from no vault being configured, not from code issues.

How to reproduce

pip install air-blackbox
air-blackbox comply --scan ./haystack -v

18 code-level checks + 5 OAuth delegation pattern checks. Apache 2.0, runs entirely local — no code leaves your machine.

Why I'm reaching out

I published a full report comparing all 6 frameworks. Before sharing it more broadly, I want to make sure the Haystack findings are accurate. If any of the pattern matches above are false positives, I'd rather fix the scanner than publish incorrect results.

I'd appreciate any feedback; it helps us improve the scanner for the whole ecosystem. Since you're a German company, EU AI Act compliance is probably already on your radar, and I thought these results might be useful regardless.

Labels: P2 (Medium priority, add to the next sprint if no P1 available)