fix: Add duplicate response loop breaker to prevent infinite loops by gdeyoung · Pull Request #1265 · agent0ai/agent-zero

gdeyoung · 2026-03-15T22:04:33Z

Summary

This PR adds a duplicate response loop breaker to prevent the agent from getting stuck in infinite loops when receiving "You have sent the same message again" errors from external APIs.

Problem

The agent can get stuck in an infinite loop when:

1. External LLM API (e.g., ZAI/GLM-5, Ollama) rejects a message as a duplicate
2. Agent receives "You have sent the same message again" error
3. Agent retries with the exact same message
4. Loop continues indefinitely

Solution

- Add duplicate_retries counter to track consecutive duplicate responses
- Break loop after 3 consecutive identical responses with HandledException
- Log clear error message when loop is broken
- Reset duplicate_retries on successful iteration
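The counter logic listed above could look roughly like the sketch below. This is not the actual diff to agent.py: the `filter_duplicates` function is hypothetical, `HandledException` is a local stand-in for the project's exception of the same name, and counting a "duplicate" as a response equal to the previous one is one interpretation of "3 consecutive identical responses".

```python
class HandledException(Exception):
    """Local stand-in for the project's HandledException (assumed, not imported)."""


MAX_DUPLICATE_RETRIES = 3  # threshold named in the PR description


def filter_duplicates(responses, max_duplicates=MAX_DUPLICATE_RETRIES):
    """Hypothetical helper: accept responses until too many repeats in a row.

    Returns the list of accepted (non-duplicate) responses, or raises
    HandledException once `max_duplicates` consecutive repeats are seen.
    """
    accepted = []
    duplicate_retries = 0
    last = None
    for response in responses:
        if last is not None and response == last:
            # Same response as the previous one: count it and maybe bail out.
            duplicate_retries += 1
            if duplicate_retries >= max_duplicates:
                raise HandledException(
                    f"Loop broken after {duplicate_retries} consecutive "
                    "duplicate responses"
                )
            continue
        # A different response counts as a successful iteration: reset.
        duplicate_retries = 0
        last = response
        accepted.append(response)
    return accepted
```

In this sketch a run such as `["a", "a", "b", "b"]` never trips the breaker, because the counter resets as soon as the response changes; only an unbroken run of repeats raises.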

Changes

agent.py: Added duplicate_retries counter and loop breaker logic (13 lines added)

Testing

Tested by triggering duplicate response scenarios - agent now breaks out after 3 attempts with clear error message.

Related Issues

Fixes #1056
Fixes #1000
Related to #1187, #1011

Notes

This is a core code fix that cannot be solved through agent behavior changes alone, as the LLM is not in a coherent state during a loop to recognize and break the pattern.

- Add duplicate_retries counter to track consecutive duplicate responses
- Break loop after 3 consecutive identical responses with HandledException
- Log error message when loop is broken
- Reset duplicate_retries on successful iteration

Fixes agent0ai#1056, agent0ai#1000
- Prevents agent from getting stuck in infinite loop when receiving 'You have sent the same message again' from external APIs

- Add duplicate_retries counter to track consecutive duplicate responses
- Pass retry_count to fw.msg_repeat.md for context
- Enhanced warning message with specific guidance on breaking loops
- Provides 4 concrete alternatives when stuck in a loop
- Reset duplicate_retries on successful iteration

This addresses the ROOT CAUSE by giving the LLM:
1. Context (retry count) so it knows it's in a loop
2. Specific alternatives instead of generic 'do something else'
3. Self-correction capability before circuit breaker kicks in

Works in conjunction with PR agent0ai#1265 (circuit breaker) for defense-in-depth.

Related to agent0ai#1056, agent0ai#1000, agent0ai#1187, agent0ai#1011
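To illustrate the idea of passing the retry count into the repeat warning, here is a minimal sketch. The template text, the `{retry_count}` placeholder syntax, and the `build_repeat_warning` function are assumptions for illustration only; the real fw.msg_repeat.md prompt and the way the framework fills it in may differ, and the four alternatives mentioned in the commit message are not reproduced here because the thread does not list them.

```python
# Assumed template; the wording mirrors the warning quoted in this thread,
# with a retry count added. The actual prompt file may look different.
REPEAT_WARNING_TEMPLATE = (
    "You have sent the same message again "
    "({retry_count} time(s) in a row). "
    "You have to do something else!"
)


def build_repeat_warning(retry_count: int) -> str:
    """Hypothetical helper: render the warning with the current retry count."""
    return REPEAT_WARNING_TEMPLATE.format(retry_count=retry_count)
```

The point of the design is that the model sees how many times it has already repeated itself, rather than the same static warning on every iteration.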

gdeyoung · 2026-03-25T18:15:50Z

🚨 Report: Increased Looping After Today's Update

After today's platform update, I'm experiencing significantly MORE looping issues where patterns get stuck, and it's happening EARLIER in chats.

Evidence: Chat Session Restarts Today (March 25)

Time	Session	Notes
05:03	Session 1	-
08:01	Session 2	-
14:28	Session 3	-
14:32	Session 4	⚠️ 4 min gap - restart
14:49	Session 5	⚠️ 17 min gap - restart
15:49	Session 6	-
16:08	Session 7	-
16:39	Session 8	-
16:42	Session 9	⚠️ 3 min gap - restart
18:05	Current	-

Key Finding: 3 restarts within 21 minutes (14:28 → 14:49) indicates severe looping/hang issues!

Additional Warning on Startup

Seeing this in Docker logs on startup:
/opt/venv-a0/lib/python3.12/site-packages/requests/__init__.py:113: RequestsDependencyWarning:
urllib3 (2.6.3) or chardet (7.3.0)/charset_normalizer (3.4.6) doesn't match a supported version!

Manifestation

The loops manifest as:

Agent getting stuck repeating similar actions
Patterns that don't break naturally
Requiring manual restart to recover

Request

This reinforces the NEED for the circuit breaker fix in this PR. Please prioritize review - this is actively impacting production use.

Related: #1266 (Enhanced duplicate response guidance)

anglerfish27 · 2026-03-27T21:48:04Z

I feel the need to jump in here and post. Today is March 27th. I was using an older version of A0 (.9x) that I had downloaded about a month ago, give or take, to test A0 out as a POC for my job. Sadly I'm no AI expert; you can barely call me a beginner! I'm just a systems admin! So if you ask me deep technical AI questions, or if how I say something sounds weird and causes an eye roll, sorry. I don't know what I don't know! (yet!)

Working with our enterprise co-pilot AI at work, I was able to wire up that version of A0 I had on my Mac laptop (personal) with docker desktop on it and the official latest (at the time) image. I finally got my backend for A0 in the mail, a DGX Spark 10 "super computer" (their words not mine). I got LM studio the CLI version only installed on it (using the Spark headless), and ollama (again CLI only) on it. I had a few models downloaded to LM studio, I was using qwen/qwen3-30b-A3b-2507 for the "chat" portion, and nvidia/nemotron-3-nano for the utility portion, for web I was using qwen3-8B and for embedding I was using ollama as the backend, with a small text embedding model, the name escapes me but irrelevant.

Since things were running in docker on the spark and on my laptop, networking was a mess (for me anyways, I've never used docker so there's that). Co-pilot forged ahead and got it all connected and working by using an open running terminal window that was making the system behave in a bridged mode (oh yeah, I forgot I suck at networking too). After fussing around we got it all working, and it was rather amazing to see the power of this come alive. I mean I was throwing all sorts of hard things at it and giving it documents to reference, and it did an outstanding job at all of it: no running out of context, no hallucinations, it was spot on. I was blown away.

Enter today. I wanted to move off this bridged mode connection because that's not how we would run it in the enterprise. That's when things fell apart. I spent hours fighting with it and co-pilot to try and figure it out; we would get some parts working, others wouldn't. After I realized we were so wildly out of date, I decided to ditch the old version and go for the latest as of today, 1.3.

Working with co-pilot we began wiring it up again. It was issue after issue, mostly around the embedding model with ollama, and co-pilot determined it was in our interest to ditch ollama and go with openai. Being ignorant I said sure, I mean at this point it's broke so...

Well that turned into a mess, and after deleting the container and adding some environment variables to the new container deployed (is there really no way to add env vars after the container is created?) we were able to fix the wiring and embedding was happy. It was an API Key issue, A0 wanted a key even though we didn't need one or use one previously on the spark side but that was ollama not openai for the embedded model.

It was no longer showing up as an issue (api keys). We made sure my Mac could talk to all the endpoints, that models were loaded by lms and openai (that bit us for a bit grr). We passed the point where all curl tests worked for both embedding and chat models both on my Mac laptop (where the docker A0 is) and also locally on the spark.

Thinking we were good, I restarted the container for the last time, and that's when we hit the new and current blocker... the infinite loop of:
You have sent the same message again. You have to do something else!

CP had me try different browsers/incognito. No joy. It just kept on looping this error. CP had me delete the container and add even more env vars to try and disable the loop (and also had me try disabling streaming), thinking that might be causing issues with chat.

Yeah nope. Still stuck in this loop.

Everything is running local off the spark (no cloud services).

I went rogue (on my own) and tried messing with the API endpoints: http:// with it, without it, adding /v1 or removing it, adding v1/models or not, etc... basically all the different variations I could think of. Well, no, that broke things worse. I guess that's showing we are correctly configured for the endpoints. I hope?

CP finally gave up after I posted logs of the failure. Typing a message and then being stuck in this loop was the final straw; I think it ran out of things to try and decided it's a bug in the code, and that it can't be overridden with env vars. Again, I have no idea if this is true. I'm relying on CP to guide me. Yes, I know this sucks because it can be and is wrong sometimes, many times. But it did get it working perfectly with this bridged network, so it knows enough to figure it out and fix the wiring issues we had. Now it's recommending I downgrade to version 1.2, as that allegedly doesn't have the looping issue or mechanism...

If someone wants me to test something out, and has the patience to guide me some, I will be more than glad to test away. I know that having everything local may be a curse or a blessing. But the point is there's no penalty for me to test a million times over: no credits to use, no cost. I have a $4K machine that handles all the models locally. We can try other inference engines if we want. I've got nothing to lose since 1.3 is roasted right now, unless someone knows how to fix it! So while adding my name to the hat of people saying 'yep, it's not fixed', I'm also saying if you want to use me as a test bed I'm willing to be one. Just gotta understand this stuff is crazy new to me. I fully understand all the different configurations I've been through probably make me a high target for "ah, it's just misconfigured on his end", which is valid and very well may be true! Since I was using AI to help me, I can't say with any certainty that I'm not the problem. But I was glad to see that others were posting this issue, and as recently as 2 days ago too, giving me hope there is a bug in there. Hoping the A0 team can figure this out for good. A0 looks so awesome, and when it was working on the old version, man, I was so impressed. I could feed it information or tell it specific websites to ingest and make it an expert in an instant. Which is our goal: to have a reliable system admin in a box, so to speak. With lots of guardrails of course.

- #3: duplicate response loop breaker (breaks after 3 identical responses)
- #4: dynamic output truncation threshold based on context window size
- #2: resolve §§secret() / $$secret() placeholders in MCP server env/args/url/headers
- #19: scheduler update_task tool method + prompt documentation

Already applied (verified, skipping): #22 parallel MCP init, agent0ai#62 context window optimization

Upstream: PR agent0ai#1265, PR agent0ai#857, PR agent0ai#1150, PR agent0ai#1105

Made-with: Cursor

gdeyoung mentioned this pull request Mar 15, 2026

fix: Enhanced duplicate response guidance for self-correction #1266

Closed

gdeyoung closed this Apr 6, 2026

gdeyoung deleted the fix/duplicate-loop-breaker branch April 6, 2026 02:18

gdeyoung wants to merge 1 commit into agent0ai:main from gdeyoung:fix/duplicate-loop-breaker
