rpc : enable async operations #7915
Start a dedicated backend thread in the rpc-server and use message passing interface for submitting work to it. This will enable backend async operations and cross-server communication.
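To illustrate the description above, here is a minimal sketch of a dedicated worker thread that receives work through a message queue. It is not the code from this PR: `backend_worker`, `submit`, and the `int` result type are hypothetical names chosen for the example, and a real rpc-server would enqueue decoded RPC commands rather than arbitrary callables.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical sketch: one thread owns the backend, and connection threads
// only post messages to it.
class backend_worker {
public:
    backend_worker() : thread_([this] { run(); }) {}

    ~backend_worker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_one();
        thread_.join(); // the worker drains the queue before exiting
    }

    // May be called from any connection thread. Returns a future that
    // becomes ready once the backend thread has executed the task.
    std::future<int> submit(std::function<int()> task) {
        assert(task && "task must be callable");
        std::packaged_task<int()> job(std::move(task));
        std::future<int> result = job.get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(job));
        }
        cv_.notify_one();
        return result;
    }

private:
    void run() {
        for (;;) {
            std::packaged_task<int()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
                if (queue_.empty()) {
                    return; // stop requested and nothing left to do
                }
                job = std::move(queue_.front());
                queue_.pop_front();
            }
            job(); // runs outside the lock, only ever on this thread
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::packaged_task<int()>> queue_;
    bool stop_ = false;
    std::thread thread_; // declared last so the members above exist before run() starts
};
```

With this shape, a thread serving the scheduler and a thread serving another RPC server could both call `submit` while all backend work stays serialized on one thread.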
I may be wrong, but I suspect that the async queue will need to be implemented on the client side instead.
If we want to copy tensors across RPC servers then we need to handle at least two connections on the server side -- one from the scheduler and one from another RPC server. I considered the following options for implementing this:
I think option 3 adds less complexity than option 2, so I opted for it, but I am open to discussion.
Could you please elaborate?
I wouldn't say that the message queue doesn't require synchronization: it still locks a mutex for every message. Whether that's more efficient than the other methods, I don't know, but it is probably not going to be the bottleneck regardless. Another option could be using select/poll, which still uses a single thread with blocking I/O. To implement the async interface of ggml-backend, my intuition is that it would be simpler to implement the queue on the client side, but I am not completely sure of that. I think it should be possible to create a generic adapter that sits on top of another backend and implements the asynchronous operations by running an asynchronous queue in a different thread. For APIs that support multi-device synchronization natively, such as CUDA, it is still going to be more efficient to use the native implementation, but for other backends it should be possible to provide a generic implementation.
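A rough sketch of what such a generic adapter could look like is below. Everything in it is an assumption made for illustration: `sync_backend`, `async_adapter`, `compute_async`, `synchronize`, and `recording_backend` are hypothetical names, not the ggml-backend interface, and the "graph" is reduced to an integer id.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical synchronous backend interface (not the ggml-backend API).
struct sync_backend {
    virtual ~sync_backend() = default;
    virtual void compute(int graph_id) = 0; // blocks until the work is done
};

// Adapter that adds asynchronous submission on top of any sync_backend
// by running a queue in its own thread.
class async_adapter {
public:
    explicit async_adapter(sync_backend & inner)
        : inner_(inner), worker_([this] { run(); }) {}

    ~async_adapter() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        work_cv_.notify_one();
        worker_.join();
    }

    // Returns immediately; the work runs later on the adapter's thread.
    void compute_async(int graph_id) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push(graph_id);
        }
        work_cv_.notify_one();
    }

    // Blocks until everything submitted so far has finished.
    void synchronize() {
        std::unique_lock<std::mutex> lock(mutex_);
        idle_cv_.wait(lock, [this] { return pending_.empty() && !busy_; });
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            work_cv_.wait(lock, [this] { return stop_ || !pending_.empty(); });
            if (pending_.empty()) {
                return; // stop requested and queue drained
            }
            int graph_id = pending_.front();
            pending_.pop();
            busy_ = true;
            lock.unlock();
            inner_.compute(graph_id); // the wrapped backend runs without the lock held
            lock.lock();
            busy_ = false;
            if (pending_.empty()) {
                idle_cv_.notify_all();
            }
        }
    }

    sync_backend & inner_;
    std::mutex mutex_;
    std::condition_variable work_cv_;
    std::condition_variable idle_cv_;
    std::queue<int> pending_;
    bool busy_ = false;
    bool stop_ = false;
    std::thread worker_; // declared last so the members above exist before run() starts
};

// Toy backend used only to show the call order the adapter produces.
struct recording_backend : sync_backend {
    std::vector<int> seen;
    void compute(int graph_id) override { seen.push_back(graph_id); }
};
```

Work is executed in submission order, and `synchronize` is the only call that blocks the caller. A backend with native multi-device synchronization would skip this adapter and use its own mechanism.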