Clean up per-thread parameter buffer pool and job submission logic by nikhilJain17 · Pull Request #19772 · ggml-org/llama.cpp

nikhilJain17 · 2026-02-20T23:27:04Z

After splitting per-thread state and execution, this is the final cleanup diff.

We allow the parameter buffer pool to grow when a command containing multiple kernels needs more buffers than the pool currently holds. We also remove the inflight_threads logic and use num_kernels instead to decide when to submit a batch of commands.
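To make the two ideas in this description concrete, here is a minimal sketch, not the code from this diff. It assumes a pool that grows up to a fixed cap and a counter that triggers submission once enough kernels have been batched. Every name in it (`param_buf`, `growable_buf_pool`, `kernel_batcher`, `batch_size`) is hypothetical, and the real backend would hold WebGPU buffer handles rather than integer ids. Blocking once the cap is reached is also an assumption of the sketch.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a GPU parameter buffer.
struct param_buf {
    int id;
};

// Pool that starts with init_size buffers and may grow up to max_size.
class growable_buf_pool {
  public:
    growable_buf_pool(size_t init_size, size_t max_size) : total_(init_size), max_size_(max_size) {
        for (size_t i = 0; i < init_size; ++i) {
            free_.push_back(param_buf{ next_id_++ });
        }
    }

    // Reuse a free buffer, otherwise grow if under the cap, otherwise wait.
    param_buf alloc() {
        std::unique_lock<std::mutex> lock(mutex_);
        if (free_.empty() && total_ < max_size_) {
            // Reserve the slot while holding the lock, then create the
            // buffer after releasing it so other threads are not blocked.
            int id = next_id_++;
            ++total_;
            lock.unlock();
            return param_buf{ id };  // device buffer creation would go here
        }
        cv_.wait(lock, [this] { return !free_.empty(); });
        param_buf buf = free_.back();
        free_.pop_back();
        return buf;
    }

    // Return a buffer to the pool and wake one waiting thread.
    void release(param_buf buf) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            free_.push_back(buf);
        }
        cv_.notify_one();
    }

    size_t total() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return total_;
    }

  private:
    mutable std::mutex      mutex_;
    std::condition_variable cv_;
    std::vector<param_buf>  free_;
    size_t                  total_;
    size_t                  max_size_;
    int                     next_id_ = 0;
};

// Counts kernels encoded since the last submission (assumed threshold rule).
class kernel_batcher {
  public:
    explicit kernel_batcher(size_t batch_size) : batch_size_(batch_size) {}

    // Record a command containing num_kernels kernels. Returns true when
    // the caller should submit the batch; the counter is then reset.
    bool add_command(size_t num_kernels) {
        pending_ += num_kernels;
        if (pending_ >= batch_size_) {
            pending_ = 0;
            return true;
        }
        return false;
    }

    size_t pending() const { return pending_; }

  private:
    size_t batch_size_;
    size_t pending_ = 0;
};
```

Counting kernels rather than threads means the submission decision depends only on how much work has been encoded, so a single command with several kernels moves the batch toward submission faster than a command with one.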


reeselevine

Thanks, this looks good, and I think the heuristic for max param buffer pool size is reasonable.

…sion logic (ggml-org#19772)
* Allow webgpu_buf_pool to resize if needed, remove inflight_threads, and replace inflight_threads with num_kernels for submission
* Run clang-format
* Keep track of num batched kernels that have not been submitted yet
* Run clang-format
* Increase buf pool max size
* Increase param buf pool init size
* Remove webgpu buf pool resizing
* Merge with master
* Add buffer pool growth
* Move buffer pool growth outside of lock
* Reduce max pool size to 32
* Run clang-format
* Only resize param buf pool

nikhilJain17 added 4 commits January 30, 2026 12:15

Merge

df60497

Merge

5ae7583

merge

70238e0

Allow webgpu_buf_pool to resize if needed, remove inflight_threads, a…

2806b7d

…nd replace inflight_threads with num_kernels for submission

github-actions bot added the "ggml" label (changes relating to the ggml tensor library for machine learning) Feb 20, 2026

nikhilJain17 added 14 commits February 20, 2026 15:28

Run clang-format

f427995

Keep track of num batched kernels that have not been submitted yet

7cecc8a

Run clang-format

58d2a91

Increase buf pool max size

059e21b

Increase param buf pool init size

4fa05f7

Remove webgpu buf pool resizing

4b2af8e

Merge with master

9f9d064

Merge with master

a3e6d6f

Merge with master

a87f3db

Add buffer pool growth

a8694c0

Move buffer pool growth outside of lock

596e258

Reduce max pool size to 32

a487db9

Run clang-format

8512456

Only resize param buf pool

60ae42e

nikhilJain17 marked this pull request as ready for review March 1, 2026 05:24

nikhilJain17 requested a review from reeselevine as a code owner March 1, 2026 05:24

reeselevine reviewed Mar 2, 2026


reeselevine approved these changes Mar 2, 2026


reeselevine merged commit 4d828bd into ggml-org:master Mar 2, 2026
76 of 78 checks passed

reeselevine mentioned this pull request Mar 3, 2026

ggml webgpu: fix workgroup dispatch limit for large batch sizes #19965

Merged

reeselevine merged 18 commits into ggml-org:master from nikhilJain17:nikhilJain17/webgpu_buf_pool
