rpc : enable async operations by rgerganov · Pull Request #7915 · ggml-org/llama.cpp

rgerganov · 2024-06-13T08:00:39Z

Start a dedicated backend thread in the rpc-server and use message passing interface for submitting work to it. This will enable backend async operations and cross-server communication.



slaren · 2024-06-16T20:03:18Z

I may be wrong, but I suspect that the async queue will need to be implemented in the client side instead.

rgerganov · 2024-06-17T07:54:28Z

If we want to copy tensors across RPC servers then we need to handle at least two connections on the server side -- one from the scheduler and one from another RPC server. I considered the following options for implementing this:

1. Using a single thread and async IO. I think this would be hard to implement in a cross-platform way without using 3rd party libraries.
2. Using multiple threads and blocking IO. My assumption is that backend implementations are not guaranteed to be thread-safe, so we need to add synchronization when accessing the backend from multiple threads.
3. Using a single thread for all backend ops and submitting work to it via a thread-safe message queue. No synchronization is needed as the backend is confined to a single thread.

I think option 3 brings less complexity than option 2, so I opted for it, but I am open to discussion.
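For illustration, the single-backend-thread approach could look roughly like the sketch below: one worker thread owns the backend, and connection threads hand it work through a mutex-protected queue. This is not the code from the PR; `backend_worker`, `submit`, and the `int`-returning operations are hypothetical stand-ins for the real rpc-server commands.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical sketch: all backend work runs on one dedicated thread.
class backend_worker {
public:
    backend_worker() : thread_([this] { run(); }) {}

    ~backend_worker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_one();
        thread_.join(); // remaining queued work is drained before exit
    }

    // Called from any connection thread: enqueue an operation and
    // return a future that becomes ready once the worker has run it.
    std::future<int> submit(std::function<int()> op) {
        assert(op && "operation must be callable");
        std::packaged_task<int()> task(std::move(op));
        std::future<int> result = task.get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(task));
        }
        cv_.notify_one();
        return result;
    }

private:
    void run() {
        for (;;) {
            std::packaged_task<int()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
                if (queue_.empty()) {
                    return; // stop requested and nothing left to do
                }
                task = std::move(queue_.front());
                queue_.pop();
            }
            task(); // backend state is only ever touched here
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::packaged_task<int()>> queue_;
    bool stop_ = false;
    std::thread thread_; // declared last so the members above exist before run() starts
};
```

A connection handler would call `submit(...)` and wait on the returned future. Operations run in the order they were enqueued, so the backend calls themselves need no extra locking.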

> I may be wrong, but I suspect that the async queue will need to be implemented in the client side instead.

Could you please elaborate?

slaren · 2024-06-17T16:48:13Z

I wouldn't say that the message queue doesn't require synchronization: it still locks a mutex for every message. Whether that's more efficient than the other methods, I don't know, but it is probably not going to be the bottleneck regardless. Another option could be using select/poll, which is still a single thread with blocking I/O.
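A minimal sketch of the poll-based alternative, assuming a POSIX system where `poll` from `<poll.h>` is available. `read_ready` is a hypothetical helper, and the pipes in `example` merely stand in for the scheduler and peer-server connections; a real server would use sockets and a proper message framing.

```cpp
#include <poll.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: block in poll() until any descriptor is readable (or
// the timeout expires), then read one chunk from each ready descriptor.
// Returns one string per descriptor; entries for idle descriptors stay empty.
std::vector<std::string> read_ready(const std::vector<int> & fds, int timeout_ms) {
    std::vector<pollfd> pfds;
    for (int fd : fds) {
        pollfd p{};
        p.fd     = fd;
        p.events = POLLIN;
        pfds.push_back(p);
    }

    std::vector<std::string> out(fds.size());
    const int ready = poll(pfds.data(), pfds.size(), timeout_ms);
    if (ready <= 0) {
        return out; // timeout or error: nothing was read
    }
    for (std::size_t i = 0; i < pfds.size(); ++i) {
        if (pfds[i].revents & POLLIN) {
            char buf[256];
            const ssize_t n = read(pfds[i].fd, buf, sizeof(buf));
            if (n > 0) {
                out[i].assign(buf, static_cast<std::size_t>(n));
            }
        }
    }
    return out;
}

// Usage sketch: two pipes play the role of two client connections.
void example() {
    int a[2];
    int b[2];
    if (pipe(a) != 0 || pipe(b) != 0) {
        return;
    }
    const ssize_t written = write(b[1], "copy", 4);
    assert(written == 4);
    (void) written;
    const std::vector<std::string> msgs = read_ready({a[0], b[0]}, 1000);
    assert(msgs[0].empty() && msgs[1] == "copy");
    close(a[0]); close(a[1]); close(b[0]); close(b[1]);
}
```

One thread serves both descriptors here, so the backend is still accessed from a single thread without a message queue.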

To implement the async interface of ggml-backend, my intuition is that it would be simpler to implement the queue on the client side, but I am not completely sure of that. I think it should be possible to create a generic adapter that sits on top of another backend and implements the asynchronous operations by running an asynchronous queue in a different thread. For APIs that support multi-device synchronization natively such as CUDA, it is still going to be more efficient to use the native implementation, but for other backends it should be possible to provide a generic implementation.
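The generic adapter idea could be sketched as follows. Everything here is hypothetical: `sync_backend` stands in for a backend that only offers blocking operations, and `async_adapter`, `compute_async`, and `synchronize` are illustrative names rather than the ggml-backend interface.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical blocking backend: records which graphs it has computed.
struct sync_backend {
    std::vector<int> log;
    void compute(int graph_id) { log.push_back(graph_id); }
};

// Hypothetical adapter: wraps a blocking backend and runs queued
// operations on its own thread, in submission order.
class async_adapter {
public:
    explicit async_adapter(sync_backend & inner)
        : inner_(inner), worker_([this] { run(); }) {}

    ~async_adapter() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        work_cv_.notify_all();
        worker_.join();
    }

    // Returns immediately; the operation runs later on the worker thread.
    void compute_async(int graph_id) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push_back(graph_id);
        }
        work_cv_.notify_all();
    }

    // Blocks until every operation queued so far has finished.
    void synchronize() {
        std::unique_lock<std::mutex> lock(mutex_);
        idle_cv_.wait(lock, [this] { return pending_.empty() && !busy_; });
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            work_cv_.wait(lock, [this] { return stop_ || !pending_.empty(); });
            if (pending_.empty()) {
                return; // stop requested and queue drained
            }
            const int graph_id = pending_.front();
            pending_.pop_front();
            busy_ = true;
            lock.unlock();
            inner_.compute(graph_id); // blocking call, outside the lock
            lock.lock();
            busy_ = false;
            if (pending_.empty()) {
                idle_cv_.notify_all();
            }
        }
    }

    sync_backend & inner_;
    std::mutex mutex_;
    std::condition_variable work_cv_;
    std::condition_variable idle_cv_;
    std::deque<int> pending_;
    bool busy_ = false;
    bool stop_ = false;
    std::thread worker_; // declared last so the members above exist before run() starts
};

// Usage sketch: results are safe to read once synchronize() returns.
void example() {
    sync_backend be;
    async_adapter q(be);
    q.compute_async(1);
    q.compute_async(2);
    q.synchronize();
    assert(be.log.size() == 2);
}
```

The wrapped backend is only called from the adapter's worker thread, so the caller sees an asynchronous interface while the backend itself stays single-threaded.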

rgerganov · 2024-06-20T10:50:01Z

PR #8032 is based on this work, trying to make copying tensors across servers more efficient. However, I am observing performance degradation with TinyLlama and 2 CUDA servers running on localhost.

@slaren maybe we should close this PR and continue the discussion on PR #8032?

mofosyne added the Review Complexity : Low label Jun 13, 2024

rpc : enable async operations

b30565e


rgerganov force-pushed the async branch from 6971b32 to b30565e June 14, 2024 08:46

ggerganov approved these changes Jun 15, 2024

rgerganov wants to merge 1 commit into ggml-org:master from rgerganov:async
