
rpc : track allocated buffers#7411

Merged
rgerganov merged 2 commits into ggml-org:master from rgerganov:rpc-track-buffers
May 20, 2024

Conversation

@rgerganov
Member

ref: #7407

@rgerganov rgerganov self-assigned this May 20, 2024
@rgerganov rgerganov requested a review from slaren May 20, 2024 09:39
Comment thread ggml-rpc.cpp


ggml_backend_t backend;
std::unordered_set<ggml_backend_buffer_t> buffers;
Contributor

nice work! Just one thought:

  • should we also handle the lifespan of client_socket and ggml_backend_t in this class?

Member Author

As I noted in PR #7407, we serve only one client at a time and I don't see a reason to add support for multiple clients.

Contributor

@chraac chraac May 20, 2024

> As I noted in PR #7407, we serve only one client at a time and I don't see a reason to add support for multiple clients.

Okay, then we can do it later, when we introduce multi-client support.
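The idea discussed in this thread — the server remembers every buffer it allocates for a client and frees whatever is left when the client disconnects — can be sketched as follows. This is a minimal illustration, not the actual `ggml-rpc.cpp` code: `buffer`, `rpc_server_sketch`, and the method names are hypothetical stand-ins for the real `ggml_backend_buffer_t` handling.

```cpp
#include <cstddef>
#include <unordered_set>

// Hypothetical stand-in for ggml_backend_buffer_t.
struct buffer {};
using buffer_t = buffer *;

class rpc_server_sketch {
public:
    // Allocate a buffer on behalf of the client and track ownership.
    buffer_t alloc_buffer() {
        buffer_t buf = new buffer();
        buffers.insert(buf);
        return buf;
    }

    // Free a buffer explicitly requested by the client; erase() returns
    // the number of removed elements, so unknown handles are ignored.
    void free_buffer(buffer_t buf) {
        if (buffers.erase(buf) > 0) {
            delete buf;
        }
    }

    // When the client disconnects, the server object is destroyed and
    // any buffers the client never freed are released here.
    ~rpc_server_sketch() {
        for (buffer_t buf : buffers) {
            delete buf;
        }
    }

    std::size_t tracked() const { return buffers.size(); }

private:
    std::unordered_set<buffer_t> buffers;
};
```

With this scheme, a client that crashes mid-session no longer leaks backend memory on the server: the destructor sweeps whatever the `std::unordered_set` still holds.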

Comment thread ggml-rpc.cpp
}
#endif
GGML_PRINT_DEBUG("Connecting to %s\n", endpoint);
fprintf(stderr, "Connecting to %s\n", endpoint);
Contributor

nit: wouldn't it be better to use GGML_PRINT_DEBUG instead of stdio directly?

Member Author

This message is helpful for troubleshooting connection problems, especially when using multiple rpc-servers, so I think it's fine to always print it.

@mofosyne mofosyne added the Review Complexity : Medium and server labels May 20, 2024
Comment thread ggml-rpc.cpp
@chraac
Contributor

chraac commented May 20, 2024

Tested on my machine; the memory was freed correctly after the client disconnected:

[screenshot: rpc-server output showing free_mem]

As we can see in the picture above, free_mem remains unchanged when I disconnect and connect again.

Comment thread ggml-rpc.cpp
Contributor

@chraac chraac left a comment

lgtm

@rgerganov rgerganov merged commit db10f01 into ggml-org:master May 20, 2024
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
* rpc : track allocated buffers

ref: ggml-org#7407

* rpc : pack rpc_tensor tightly
phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request Apr 28, 2026
* rpc : track allocated buffers

ref: ggml-org#7407

* rpc : pack rpc_tensor tightly
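The second commit, "rpc : pack rpc_tensor tightly", refers to removing compiler-inserted padding from the struct serialized over the wire, so that its layout is identical on every platform. A hedged sketch of the technique (the field names and sizes here are illustrative, not the actual `rpc_tensor` definition in `ggml-rpc.cpp`):

```cpp
#include <cstdint>

// Force 1-byte alignment so the struct has no padding between
// fields and a fixed, platform-independent wire size.
#pragma pack(push, 1)
struct rpc_tensor_sketch {
    uint64_t id;       // remote tensor identifier (hypothetical field)
    uint32_t type;     // without packing, 4 padding bytes would follow here
    uint64_t ne[4];    // number of elements per dimension
    uint64_t data;     // remote data address
};
#pragma pack(pop)

// 8 + 4 + 4*8 + 8 = 52 bytes; the compile-time check fails if the
// compiler inserted any padding.
static_assert(sizeof(rpc_tensor_sketch) == 52, "rpc_tensor_sketch must be tightly packed");
```

Without the pragma, the compiler would align `ne[0]` to an 8-byte boundary, inflating the struct to 56 bytes and making the serialized layout depend on the target ABI.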

Labels

Review Complexity : Medium · server


5 participants