fix(integ): make calculator tool more robust to LLM output variations #1445

cagataycali · 2026-01-10T08:01:14Z

Description

Fixes a flaky integration test in test_structured_output_agent_loop.py where test_tool_use_with_structured_output would fail intermittently.

Root Cause

The LLM sometimes outputs '+' instead of 'add' as the operation string. The original calculator tool only accepted exact string matches like "add", "subtract", etc., causing the function to return 0 when the LLM used symbols.

Example of failure:

# LLM called: calculator(operation='+', a=2, b=2)
# Expected: 4
# Got: 0 (because '+' didn't match 'add')
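To make the failure mode concrete, here is a minimal sketch of how a strict-match tool behaves. The function name `strict_calculator` and its body are a hypothetical reconstruction based only on the description above (exact word matches, 0 on no match, divide computed as b/a); the real fixture may differ in detail.

```python
def strict_calculator(operation: str, a: float, b: float) -> float:
    """Hypothetical stand-in for the pre-fix tool: exact word matches only."""
    if operation == "add":
        return a + b
    elif operation == "subtract":
        return a - b
    elif operation == "multiply":
        return a * b
    elif operation == "divide":
        return b / a  # reversed operand order, as described in this PR
    elif operation == "power":
        return a ** b
    # Anything else, including symbols such as '+', falls through silently.
    return 0


print(strict_calculator("add", 2, 2))  # word form is recognized
print(strict_calculator("+", 2, 2))    # symbol form is not, so the result is 0
```

Because the fallback is a silent 0 rather than an error, the test only fails later, when the structured output is compared with the expected value.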

This was observed in the CI run for PR #1444: https://github.com/strands-agents/sdk-python/actions/runs/20875011420/job/59982925582?pr=1444

Changes

- Accept both word and symbol forms for all operations:
  - 'add' / '+'
  - 'subtract' / '-' / 'sub'
  - 'multiply' / '*' / 'mul'
  - 'divide' / '/' / 'div'
  - 'power' / '**' / 'pow'
- Normalize input with lower() and strip()
- Fix divide operation (was b/a, now a/b to match standard math notation)
- Improve docstring with Args section
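The changes listed above could be sketched roughly as follows. This is an illustrative version, not the code merged in this PR: the alias table `_OPERATIONS`, the use of the `operator` module, the fallback of 0 for unknown operations, and the division-by-zero guard are all assumptions made for the example.

```python
import operator

# Assumed alias table: every accepted spelling maps to one binary function.
_OPERATIONS = {
    "add": operator.add, "+": operator.add,
    "subtract": operator.sub, "-": operator.sub, "sub": operator.sub,
    "multiply": operator.mul, "*": operator.mul, "mul": operator.mul,
    "divide": operator.truediv, "/": operator.truediv, "div": operator.truediv,
    "power": operator.pow, "**": operator.pow, "pow": operator.pow,
}


def calculator(operation: str, a: float, b: float) -> float:
    """Apply a basic arithmetic operation to two numbers.

    Args:
        operation: Word, abbreviation, or symbol (e.g. 'add', 'sub', '*').
        a: Left operand.
        b: Right operand.
    """
    # Normalize so that ' ADD ' and 'add' are treated the same.
    func = _OPERATIONS.get(operation.lower().strip())
    if func is None:
        return 0  # assumed fallback, mirroring the old behaviour for unknown input
    if func is operator.truediv and b == 0:
        return 0  # assumed guard; the PR does not describe divide-by-zero handling
    return func(a, b)  # operands applied in a, b order, so divide is a / b


print(calculator("+", 2, 2))
print(calculator(" Divide ", 8, 2))
```

A lookup table keeps the word, abbreviation, and symbol spellings in one place, so accepting another variant is a one-line change.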

Related Issues

Related to PR #1444, "fix(mcp): propagate contextvars to background thread", whose CI run surfaced this failure.

Documentation PR

No documentation changes required - this is a test fixture.

Type of Change

Bug fix (test reliability)

Testing

Manual testing confirms calculator now handles both 'add' and '+'
Change is minimal and isolated to test file

Checklist

I have read the CONTRIBUTING document
My changes generate no new warnings
This is a test-only change

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Created by strands-coder autonomous agent 🦆

The test_tool_use_with_structured_output test was flaky because the LLM sometimes uses '+' instead of 'add' as the operation string. The calculator tool now accepts both formats for all operations.

Changes:
- Accept both word and symbol forms: add/+, subtract/-, multiply/*, divide//, power/**
- Also accept common abbreviations: sub, mul, div, pow
- Normalize input with lower() and strip()
- Fix divide operation (was b/a, now a/b)
- Improve docstring with Args section

This makes the integ tests more resilient to LLM output variations.

codecov · 2026-01-10T08:10:57Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


cagataycali · 2026-01-12T16:28:46Z

✅ Approved & Ready to Merge!

This PR has been approved by @dbschmigelski and all CI checks are passing! 🎉

Status Summary

✅ CI: All checks SUCCESS
✅ Review: APPROVED
✅ Mergeable: No conflicts
✅ Codecov: 100% coverage

What This Fixes

Makes the calculator integration test more robust by handling LLM output variations gracefully.

Impact

Test stability improvement
No production code changes
Low risk

Ready for merge when maintainers are available! 🦆

github-actions bot added the size/s label Jan 10, 2026

cagataycali temporarily deployed to auto-approve January 10, 2026 08:01 — with GitHub Actions Inactive

dbschmigelski approved these changes Jan 12, 2026


cagataycali mentioned this pull request Jan 12, 2026

fix: add concurrency protection to prevent parallel invocations from corrupting agent state #1453

Merged

7 tasks

yonib05 approved these changes Jan 12, 2026


yonib05 merged commit 3ffc327 into strands-agents:main Jan 12, 2026
15 checks passed

cagataycali deleted the fix/flaky-calculator-test branch January 12, 2026 19:26
