add resume_false_interruption and pause/resume the audio output #3109
@theomonnom can you review this one?
theomonnom
left a comment
This looks good. I'm just not sure about the scenario where a SpeechHandle created using say/generate_reply should be "restarted".
Just saw one of those examples in dz's PR:
we can add the
2db6dc1 to
3a0ec93
Compare
6a951eb to
bd23bf3
Compare
self._audio_buf = utils.aio.Chan[rtc.AudioFrame]()
self._audio_bstream = utils.audio.AudioByteStream(
    sample_rate, num_channels, samples_per_channel=sample_rate // 20
The tricky thing about splitting the frames into smaller sizes is that as soon as the asyncio loop slows down or CPU usage is high, the audio may stutter.
You should be able to see that when using the stress command on a Mac.
I remember having a lot of issues around that when initially developing agents. This is why we can push faster than realtime from the Python side (from this PR)
But we still have a reasonable buffer (200ms) for the AudioSource; the chunking here just ensures that a single frame is not so big that we pause too late.
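To make the chunk size concrete: `samples_per_channel=sample_rate // 20` works out to 50 ms per frame. Below is a minimal sketch of that kind of chunking on raw 16-bit PCM bytes. The `chunk_pcm` helper and the 24 kHz mono input are assumptions for illustration; this is not the `AudioByteStream` implementation.

```python
def chunk_pcm(data: bytes, num_channels: int, samples_per_channel: int) -> list[bytes]:
    """Split interleaved 16-bit PCM into fixed-size frames (hypothetical helper)."""
    bytes_per_frame = samples_per_channel * num_channels * 2  # 2 bytes per int16 sample
    # Only full frames are returned; a trailing partial frame is dropped here,
    # whereas a real byte stream would buffer it until more data arrives.
    full_frames = len(data) // bytes_per_frame
    return [
        data[i * bytes_per_frame : (i + 1) * bytes_per_frame]
        for i in range(full_frames)
    ]


# Assumed input: one second of silent 24 kHz mono audio.
sample_rate, num_channels = 24000, 1
samples_per_channel = sample_rate // 20  # 1/20 s per frame
one_second = bytes(sample_rate * num_channels * 2)

frames = chunk_pcm(one_second, num_channels, samples_per_channel)
frame_ms = 1000 * samples_per_channel / sample_rate
```

With smaller frames a pause request is noticed sooner, at the cost of more event-loop iterations per second, which is the trade-off discussed in this thread.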
I see, so interruptions have a minimum latency of 200ms here? It would be good to check whether 200ms is enough. The only real reason we need big buffers is that users may write slow code on the event loop.
We can use 100ms or 200ms, but I am wondering: what happens if we call audio_source.clear_queue() while await audio_source.capture(frame) hasn't returned? How can we get the queued audio back from the audio_source?
If we can get the audio back, then the queue size and frame size don't matter, and I can implement the pause in a different way.
There is no easy way to get the audio back. But it's a completely OK assumption that the playout was realtime. So 2s passed = 2s to discard.
Right now we have at most frame_size of latency to pause the audio, and we drop at most 200ms (the audio_source queue size) of audio on pause.
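The accounting in this thread can be sketched as follows, assuming realtime playout as suggested above. The `estimate_pause` function and its parameters are hypothetical and only illustrate the bound; they are not part of the PR.

```python
def estimate_pause(pushed_s: float, elapsed_s: float, queue_s: float = 0.2):
    """Hypothetical pause accounting under the realtime-playout assumption.

    pushed_s:  seconds of audio handed to the audio source so far
    elapsed_s: wall-clock seconds since playout started
    queue_s:   audio source queue size (200 ms in this discussion)
    """
    # Playout is assumed realtime, and it cannot outrun what was pushed.
    played_s = min(pushed_s, elapsed_s)
    # Audio pushed but not yet played sits in the queue and is lost when
    # the queue is cleared; the loss is capped at the queue size.
    dropped_s = min(pushed_s - played_s, queue_s)
    return played_s, dropped_s


# 2.2 s pushed, 2.0 s of wall-clock time passed when pause is requested.
played, dropped = estimate_pause(pushed_s=2.2, elapsed_s=2.0)
```

The time to notice the pause request is a separate term, bounded by the duration of the frame currently being captured.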
Maybe 200ms is completely fine for both naturalness and CPU stress.
logger.debug("resumed false interrupted speech", extra={"timeout": timeout})

self._session.emit(
    "agent_false_interruption", AgentFalseInterruptionEvent(resumed=resumed)
Keep this event, or replace it with a callback?
We should deprecate it for now to avoid breaking changes; logging a warning is fine.
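One way to deprecate an event without breaking listeners is to keep emitting it and warn when a handler subscribes. The sketch below uses a hypothetical `TinyEmitter` as a stand-in for the session's emitter; the class, the `DEPRECATED_EVENTS` set, and the logger name are assumptions, not the project's code.

```python
import logging

logger = logging.getLogger("example")

# Events that still fire but should no longer be relied on (assumed set).
DEPRECATED_EVENTS = {"agent_false_interruption"}


class TinyEmitter:
    """Hypothetical stand-in for the session's event emitter."""

    def __init__(self):
        self._handlers = {}

    def on(self, event, handler):
        # Warn at subscription time so the message appears once per listener,
        # not once per emitted event.
        if event in DEPRECATED_EVENTS:
            logger.warning("event `%s` is deprecated", event)
        self._handlers.setdefault(event, []).append(handler)

    def emit(self, event, payload):
        for handler in self._handlers.get(event, []):
            handler(payload)


emitter = TinyEmitter()
received = []
emitter.on("agent_false_interruption", received.append)
emitter.emit("agent_false_interruption", {"resumed": True})
```

Existing handlers keep working, and the warning gives users time to migrate before the event is removed.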
  video_sampler: NotGivenOr[_VideoSampler | None] = NOT_GIVEN,
  user_away_timeout: float | None = 15.0,
- agent_false_interruption_timeout: float | None = 4.0,
+ agent_false_interruption_timeout: float | None = 2.0,
- agent_false_interruption_timeout: float | None = 2.0,
+ false_interruption_timeout: float | None = 2.0,
I think it's fine to call it this way; user_false_interruption_timeout wouldn't make much sense anyway.
This will be a breaking change. Should we change the name and log a deprecation warning if the user sets agent_false_interruption_timeout?
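A minimal sketch of such a rename, assuming both keyword names are accepted during the deprecation period. The `resolve_false_interruption_timeout` helper, the `_UNSET` sentinel, and the logger name are hypothetical; the project uses its own `NOT_GIVEN` mechanism, which is not reproduced here.

```python
import logging

logger = logging.getLogger("example")

# Sentinel so an explicit None (feature disabled) differs from "not passed".
_UNSET = object()


def resolve_false_interruption_timeout(
    false_interruption_timeout=_UNSET,
    agent_false_interruption_timeout=_UNSET,
    default=2.0,
):
    """Return the effective timeout, honoring the deprecated name if used."""
    if agent_false_interruption_timeout is not _UNSET:
        logger.warning(
            "`agent_false_interruption_timeout` is deprecated, "
            "use `false_interruption_timeout` instead"
        )
        # The new name wins when both are given.
        if false_interruption_timeout is _UNSET:
            false_interruption_timeout = agent_false_interruption_timeout
    if false_interruption_timeout is _UNSET:
        return default
    return false_interruption_timeout


legacy = resolve_false_interruption_timeout(agent_false_interruption_timeout=4.0)
```

Old call sites keep their behavior and only gain a log line, so the rename does not break them.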
theomonnom
left a comment
lgtm!!
…kit#3109) Co-authored-by: David Zhao <dz@livekit.io>
No description provided.