.NET: Python: add ADR for threads#2420
.NET: Python: add ADR for threads#2420eavanvalkenburg wants to merge 26 commits intomicrosoft:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces an Architecture Decision Record (ADR) proposing changes to the thread management system in the Python agent framework. The ADR addresses confusion around the current AgentThread class, which can exist in multiple states and hold different types of data, making it difficult for users to understand and use correctly.
Key Changes
- Proposes two options for restructuring thread management: maintaining the current single-class approach vs. separating into distinct
RemoteThreadandLocalThreadclasses - Documents the current implementation's limitations and the decision drivers focused on ease of use and clarity
- Presents trade-offs between consistency with .NET implementation and Python-specific design patterns
| - We would then add a flag on ChatClients, to indicate which type of thread they support, and it can be both, so two flags are likely needed, although local thread might always be possible. | ||
| - And finally, all Agents would get two methods, `get_service_thread()`/`get_remote_thread` and `get_local_thread()`, both of which might raise an error if the chat client does not support that type of thread. | ||
| - the `run` methods would take both types of threads, but would raise an error if the thread type is not supported by the chat client. And it would also check with the `store` parameter to make sure it is used correctly, or set it correctly. | ||
| - One open question is how to handle when there is a mismatch between the thread type and the `store` parameter, for example passing a `LocalAgentThread` with `store=True`, or a `ServiceAgentThread` with `store=False`. Options are to either raise an error, or to ignore the `store` parameter and always do the right thing based on the thread type. Or to transform the thread into the right type, but that seems more complex and might not always be possible. Although starting with a local thread (which would be a list of chat messages in a ChatMessageStore) and then setting store=True might make sense, the return would be a service thread then, but that adds complexity, this might be useful for workflows that combine different agent types. |
There was a problem hiding this comment.
I like the fact that it's explicit, but this also makes it more complicated with store parameter, because at the moment you only need to change this parameter value True/False, while with a separate thread type you would also need to change that thread type, so basically configuring the same logic in two different places.
If we introduce a separate thread type for remote and local messages, maybe we can remove store parameter in this case. Instead of using store parameter, we will just check a thread type, and by default it will be a local one.
There was a problem hiding this comment.
good point, we do need to look at both conversation_id and store parameters in this case. I'll add
There was a problem hiding this comment.
The only challenge is that chat options work at the chat client level, while this works at the agent level.
There was a problem hiding this comment.
added more clarity on this choice, and the option I think makes most sense.
| - Good, because it is clear which chat clients support which type of thread. | ||
| - Good, because we can make all the logic that deals with threads much clearer, as each class has a single responsibility. | ||
| - Good, because it might also enable a abstracted method to get a list of chat messages from a thread through the chat client. | ||
| - Bad, because it would more fundamentally diverge from dotnet. |
There was a problem hiding this comment.
I think if this API brings more clarity, it can be introduced in .NET as well. Otherwise, I'm not sure if we want to diverge from .NET this much.
There was a problem hiding this comment.
When discussing with Weslie we also came to the conclusion that we might invert the order, and do this for python and follow with .net
There was a problem hiding this comment.
The main problem in .net is that we don't know what kind of thread is supported by the underlying service until we get the first response from the chat client and changing this will be challenging.
There was a problem hiding this comment.
The main problem in .net is that we don't know what kind of thread is supported by the underlying service until we get the first response from the chat client and changing this will be challenging.
@eavanvalkenburg Isn't it the same in Python?
There was a problem hiding this comment.
It is, but since currently all the clients are built by us and part of AF, it's a onetime breaking change to add the flag to the protocol and base_client and then we are good to go, and if someone implements their own, it's a simple addition. For dotnet, this is a bit more tricky
4510170 to
531d751
Compare
| - Bad, because it is unclear which chat clients can support which type of thread. | ||
| - Bad, because dotnet also has subclasses for each type of agent, so already somewhat diverging from dotnet. | ||
|
|
||
| ### 2. Separate classes for `ServiceThread`/`RemoteThread` and `LocalAgentThread`/`LocalThread`, each with their own behaviors and methods. |
There was a problem hiding this comment.
I like the explicit-ness of the separate classes. I think doing it this way will be better for developers, too, as they'd know which type they're working with and what's available in each object.
4200763 to
db563ac
Compare
| This approach would mean: | ||
| - Creating two subclasses of AgentThread, one for service threads and one for local threads, both with different attributes and methods. Tentatively called `RemoteThread` and `LocalThread`. | ||
| - Removing `ChatMessageStore`, instead a `LocalThread` would have a list of ChatMessages as attribute. | ||
| - Moving `ContextProvider` back into Agent, replacing with a field `context_data`/`context_state` or a dict of `context_provider_name/id: context_data/state` on both thread types, which the agent would then use to get the context from the context providers when running in that thread. This makes the thread itself state-only, and the context provider can be stateless and live in the agent. |
There was a problem hiding this comment.
I'm not sure if context provider should be part of the thread at all, it feels to me like an optional component that should be part of the Agent.
There was a problem hiding this comment.
I agree the provider should not be, but a state of some sort is needed, for instance when you are running with a RemoteThread, it might contain the last few messages in order to accurately be able to search for related content, so the last X number of messages of the thread might be kept. Also something like a user_id or session_id could be stored and should be part of the thread, not the agent or some other thing.
| - Moving `ContextProvider` back into Agent, replacing with a field `context_data`/`context_state` or a dict of `context_provider_name/id: context_data/state` on both thread types, which the agent would then use to get the context from the context providers when running in that thread. This makes the thread itself state-only, and the context provider can be stateless and live in the agent. | ||
| - The protocol/interface for ContextProviders would need a slight update, one to return a initial `context_data/state` object, the logic of which is maintained by the provider (whether it matches a app, user or session and what to record in there), and adding that `context_data/state` to the invoked and invoking methods. The `context_data/state` needs to be (de)serializable in a standard way, to make handling the thread easier. | ||
| - We would need to add a flag on `ChatClients`, to indicate if they support `remote/service` threads, and we assume that we always support `local` threads. | ||
| - All Agents would get two methods, `get_service_thread(thread_id: str | None = None, ...)`/`get_remote_thread(thread_id: str | None = None, ...)` and `get_local_thread(messages: list[ChatMessage] | None = None, ...)`, both of which might raise an error if the chat client does not support that type of thread, after creation the agent then calls the context_provider(s) to get a `context_data/state` assigned as well. |
There was a problem hiding this comment.
In previous point there is a statement and we assume that we always support local threads, but in this point we say that both remote and local thread get methods might raise an error if the chat client does not support that type of thread. But if we always support local threads (which makes sense to me), then method get_local_thread should never throw.
There was a problem hiding this comment.
good one, clarified
| - `LocalThread` | ||
| - `ClientSideThread` | ||
|
|
||
| `RemoteThread` and `LocalThread` seem the clearest and most concise options and the most pythonic. |
There was a problem hiding this comment.
I like RemoteThread and LocalThread, but if we are doing this refactoring, we should also make a decision if thread terminology is the final option, or we want to think about conversation and any other potential alternatives. I remember there was some feedback about thread naming.
Also, one potential alternative to RemoteThread is HostedThread. It's also short and works well with tool types like HostedMCPTool.
There was a problem hiding this comment.
Good one, added that. I'll add a note on alternatives to Threads.
34e416b to
752742a
Compare
Motivation and Context
ADR proposing changes to the thread setup in Python.
Description
Contribution Checklist