Contributor

@longcw longcw commented Aug 8, 2025

No description provided.

@longcw longcw requested a review from a team August 8, 2025 01:57
Contributor Author

longcw commented Aug 12, 2025

@theomonnom can you review this one?

Member

@theomonnom theomonnom left a comment

This looks good; I'm just not sure about the scenario where a SpeechHandle created using say/generate_reply should be "restarted".

@theomonnom
Member

Just saw one of those examples in dz's PR:

        await self.session.say("connecting you to the customer now.")
        await self.session_manager.merge_calls()

Contributor Author

longcw commented Aug 13, 2025

> This looks good; I'm just not sure about the scenario where a SpeechHandle created using say/generate_reply should be "restarted".

We can add the source back to the event and, in the default callback, skip the retry when the source is say.

@longcw longcw requested a review from theomonnom August 13, 2025 07:33
@longcw longcw force-pushed the longc/resume-interrupted-agent-cb branch from 2db6dc1 to 3a0ec93 Compare August 20, 2025 03:22
@longcw longcw changed the title from "add resume_false_interruption callback to AgentSession" to "add resume_false_interruption and pause/resume the audio output" Aug 20, 2025
@longcw longcw force-pushed the longc/resume-interrupted-agent-cb branch from 6a951eb to bd23bf3 Compare August 22, 2025 01:33

self._audio_buf = utils.aio.Chan[rtc.AudioFrame]()
self._audio_bstream = utils.audio.AudioByteStream(
    sample_rate, num_channels, samples_per_channel=sample_rate // 20
Member

The tricky thing about splitting the frames into smaller sizes is that as soon as the asyncio loop slows down or CPU usage is high, the audio may stutter.

You should be able to see that when using the stress command on a Mac.

Member

I remember having a lot of issues around that when initially developing agents. This is why we can push faster than realtime from the Python side (from this PR)

Contributor Author

But we still have a reasonable buffer (200 ms) for the AudioSource; the chunking here just ensures that a single frame is not so big that we pause too late.

Member

I see, so interruptions have a minimum latency of 200 ms here? It would be good to check whether 200 ms is enough. The only real reason we need big buffers is that users may write slow code on the event loop.

Contributor Author

@longcw longcw Aug 26, 2025

We can use 100 ms or 200 ms, but I am wondering what happens if we call audio_source.clear_queue() while await audio_source.capture(frame) has not yet returned. How can we get the queued audio back from the audio_source?

If we can get the audio back, then the queue size and frame size don't matter, and I can implement the pause in a different way.

Member

There is no easy way to get the audio back. But it's completely OK to assume that the playout was realtime. So 2 s passed = 2 s to discard.
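
A minimal sketch of that realtime-playout assumption, not code from this PR: if the audio is treated as having played at realtime speed, the elapsed wall-clock time tells us how many samples to discard before resuming. The helper names `samples_to_discard` and `resume_from`, and the plain list of samples, are hypothetical and only for illustration.

```python
def samples_to_discard(elapsed_s: float, sample_rate: int, total_samples: int) -> int:
    """Samples assumed to have been played after `elapsed_s` seconds of realtime playout."""
    played = int(elapsed_s * sample_rate)
    # Clamp so we never discard a negative amount or more than was pushed.
    return min(max(played, 0), total_samples)


def resume_from(samples: list, elapsed_s: float, sample_rate: int) -> list:
    """Return the part of `samples` that is assumed not to have been played yet."""
    return samples[samples_to_discard(elapsed_s, sample_rate, len(samples)):]


# Toy numbers: 4 s of audio at 8 samples per second, paused after 2 s.
pushed = list(range(32))
remaining = resume_from(pushed, elapsed_s=2.0, sample_rate=8)
```

With these toy numbers, the first 16 samples (2 s) are discarded and the remaining 16 are kept for resuming.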

Contributor Author

Right now the latency to pause the audio is at most frame_size, and we drop at most 200 ms (the audio_source queue size) of audio when pausing.
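
To make those bounds concrete, here is a small arithmetic sketch. It assumes the frame size from the hunk above (samples_per_channel = sample_rate // 20, i.e. 20 frames per second) and a 200 ms queue. The function names and the 48 kHz sample rate are illustrative assumptions, not values taken from the PR.

```python
def frame_duration_ms(sample_rate: int, samples_per_channel: int) -> float:
    """Duration of one audio frame in milliseconds."""
    return 1000.0 * samples_per_channel / sample_rate


def pause_bounds_ms(sample_rate: int, frames_per_second: int, queue_size_ms: int):
    """Return (worst-case pause latency, worst-case dropped audio), both in ms."""
    samples_per_channel = sample_rate // frames_per_second
    # A pause can only take effect between frames, so latency is bounded by one frame.
    latency = frame_duration_ms(sample_rate, samples_per_channel)
    # Audio already queued in the source cannot be recovered, so the queue size bounds the loss.
    return latency, float(queue_size_ms)


latency_ms, dropped_ms = pause_bounds_ms(sample_rate=48000, frames_per_second=20, queue_size_ms=200)
print(latency_ms, dropped_ms)
```

Under these assumptions each frame is 50 ms long, so the pause takes effect within one 50 ms frame and at most 200 ms of queued audio is lost.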

Member

Maybe 200 ms is completely fine for both naturalness and CPU stress.

logger.debug("resumed false interrupted speech", extra={"timeout": timeout})

self._session.emit(
    "agent_false_interruption", AgentFalseInterruptionEvent(resumed=resumed)
Contributor Author

Should we keep this event or replace it with a callback?

Member

We should deprecate it for now to avoid breaking changes; logging a warning is fine.

  video_sampler: NotGivenOr[_VideoSampler | None] = NOT_GIVEN,
  user_away_timeout: float | None = 15.0,
- agent_false_interruption_timeout: float | None = 4.0,
+ agent_false_interruption_timeout: float | None = 2.0,
Member

Suggested change
- agent_false_interruption_timeout: float | None = 2.0,
+ false_interruption_timeout: float | None = 2.0,

I think it's fine to call it this way; user_false_interruption_timeout wouldn't make a ton of sense anyway.

Contributor Author

This will be a breaking change. Should we change the name and log a deprecation warning if the user sets agent_false_interruption_timeout?
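
One way the rename with a deprecation warning could look, as a rough sketch only. The helper `resolve_false_interruption_timeout`, the local `_NotGiven` sentinel, and the logger name are hypothetical stand-ins so the example runs on its own. This is not the code merged in this PR.

```python
import logging
from typing import Optional, Union

logger = logging.getLogger("deprecation-example")  # hypothetical logger name


class _NotGiven:
    """Local stand-in for a NOT_GIVEN sentinel, so that None stays a valid value."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()


def resolve_false_interruption_timeout(
    false_interruption_timeout: Optional[float] = 2.0,
    agent_false_interruption_timeout: Union[Optional[float], _NotGiven] = NOT_GIVEN,
) -> Optional[float]:
    """Prefer the deprecated argument when the caller set it, and warn about it."""
    if not isinstance(agent_false_interruption_timeout, _NotGiven):
        logger.warning(
            "`agent_false_interruption_timeout` is deprecated, "
            "use `false_interruption_timeout` instead"
        )
        return agent_false_interruption_timeout
    return false_interruption_timeout
```

Callers that still pass the old name keep their behavior and see a warning, while new callers only use false_interruption_timeout. The sentinel is needed because None is itself a meaningful value for the timeout.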

Member

@theomonnom theomonnom left a comment

lgtm!!

@longcw longcw merged commit 2c84fc9 into main Aug 27, 2025
25 checks passed
@longcw longcw deleted the longc/resume-interrupted-agent-cb branch August 27, 2025 07:19
akshaym1shra pushed a commit to akshaym1shra/agents that referenced this pull request Aug 28, 2025
akshaym1shra pushed a commit to akshaym1shra/agents that referenced this pull request Sep 5, 2025
akshaym1shra pushed a commit to akshaym1shra/agents that referenced this pull request Sep 19, 2025
akshaym1shra pushed a commit to akshaym1shra/agents that referenced this pull request Sep 29, 2025
akshaym1shra pushed a commit to akshaym1shra/agents that referenced this pull request Nov 20, 2025
akshaym1shra pushed a commit to akshaym1shra/agents that referenced this pull request Nov 28, 2025
akshaym1shra pushed a commit to akshaym1shra/agents that referenced this pull request Jan 11, 2026
4 participants