Skip to content

fix: replace eval() with safe AST-based math evaluator in AGENTS.md template#5058

Open
devin-ai-integration[bot] wants to merge 1 commit intomainfrom
devin/1774411188-fix-eval-calculator-template
Open

fix: replace eval() with safe AST-based math evaluator in AGENTS.md template#5058
devin-ai-integration[bot] wants to merge 1 commit intomainfrom
devin/1774411188-fix-eval-calculator-template

Conversation

@devin-ai-integration
Copy link
Copy Markdown
Contributor

Summary

Fixes #5056 — The Calculator tool example in the AGENTS.md template (shipped via crewai create) used eval() on unsanitized LLM input, creating a remote code execution vulnerability in every newly scaffolded project.

Replaced eval(expression) with an AST-walking evaluator that only permits arithmetic operators (+, -, *, /, **) and numeric literals (int, float). No new dependencies required.

Added 28 tests that extract the calculator source directly from the markdown template and verify:

  • Valid arithmetic expressions evaluate correctly (12 cases)
  • Malicious/unsafe inputs are rejected with ValueError or SyntaxError (14 cases)
  • The template no longer contains bare eval() (2 sanity checks)

Review & Testing Checklist for Human

  • Verify the AST evaluator is safe: Confirm the _safe_eval function's whitelist approach (only ast.Constant, ast.BinOp, ast.UnaryOp with explicit operator map) cannot be bypassed. This is the core security claim.
  • Evaluate template verbosity tradeoff: The example grew from 4 lines to ~25 lines. Consider whether this level of detail is appropriate for a @tool decorator example, or if a simpler safe alternative (or removing the calculator entirely) would be better for the template's purpose.
  • Check test extraction approach: Tests use regex to extract code from the markdown file and exec() it. This keeps tests in sync with the template but is fragile to markdown structural changes (heading renames, code block reformatting). Decide if this coupling is acceptable.
  • Run crewai create crew test_project and verify the generated AGENTS.md contains the new safe calculator example.

Notes

  • This only affects newly scaffolded projects. Existing projects that already copied the old template are not patched by this change.
  • The evaluator intentionally omits modulo (%) and floor division (//) — reasonable for a minimal example but worth noting.
  • ZeroDivisionError is not caught; it will bubble up naturally from operator.truediv, which is acceptable behavior.

Link to Devin session: https://app.devin.ai/sessions/a69dae035ed249adb723d4efc6aec062

…emplate

The Calculator tool example in the AGENTS.md template used eval() on
unsanitized LLM input, creating a remote code execution vulnerability
in every new CrewAI project.

Replace eval() with an AST-based evaluator that only supports arithmetic
operators (+, -, *, /, **) and numeric literals, preventing arbitrary
code execution while preserving calculator functionality.

Closes #5056

Co-Authored-By: João <joao@crewai.com>
@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

Prompt hidden (unlisted session)

@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

expressions and rejects malicious code injection attempts.
"""

import ast
"""

import ast
import operator
@HeadyZhang
Copy link
Copy Markdown

The AST-walking approach looks solid, whitelist-only on ast.Constant, ast.BinOp, ast.UnaryOp with an explicit operator map is the right pattern. The 28 test cases cover the critical injection vectors well.

Two minor notes:

  1. Worth adding a test for nested dangerous expressions like calculator("1 + int(open('/etc/passwd').read())") — the current evaluator should reject it (since ast.Call isn't in the whitelist), but an explicit test case documents the security boundary.

  2. As mentioned in the issue — existing projects that already copied the old template aren't patched by this. Might be worth a one-line note in the next release changelog so users on older scaffolds know to update their Calculator tool.

Nice turnaround on the fix. Thanks for the quick response.

Heady

@HeadyZhang
Copy link
Copy Markdown

@joaomdmoura Both PRs (#5058, #5059) have passing tests and review comments. Would appreciate a review when you get a chance. Also happy to discuss integrating agent-audit as a CI security check for incoming PRs if that's of interest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Security] crewai create ships template with eval() on unsanitized LLM input

1 participant