[Mirror] server: improve slots scheduling for n_cmpl#80
Conversation
📝 Walkthrough

Refactors orchestration to use task-level parent/child predicates, moves child tasks into …

Changes
Sequence Diagram(s)

sequenceDiagram
participant Client
participant Queue as ResponseQueue
participant Context as ServerContext
participant Slots as SlotManager
Client->>Queue: post_tasks(parent + children)
Queue->>Context: enqueue parent_task (with child metadata)
Context->>Slots: get_free_slots(n_needed, exclude_id_slot)
alt enough slots
Slots-->>Context: child_slots list
Context->>Slots: launch_slots_with_parent_task(parent_slot, child_slots, parent_task)
Slots-->>Context: child slots started
Slots-->>Context: parent slot started
Context->>Queue: update states / notify_started
Queue-->>Client: tasks_started
else insufficient slots
Slots-->>Context: not_enough_slots
Context->>Queue: pop_deferred_task(id_slot) / defer parent_task
Queue-->>Client: task_deferred
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (2 warnings)

📜 Recent review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
🔇 Additional comments (16)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tools/server/server-queue.cpp (1)
338-355: Potential correctness issue: post_tasks() requires task IDs to be assigned before calling it.
id_tasks = server_task::get_list_id(tasks) + add_waiting_task_ids(id_tasks) will misbehave if any task.id / child_task.id is still -1 (the queue's post(std::vector<...>) assigns ids later, but the response reader needs stable ids now). Please either assert IDs are already assigned or assign them here before building id_tasks.
Proposed fix (assert ids are assigned before using them):

     void server_response_reader::post_tasks(std::vector<server_task> && tasks, bool front) {
         GGML_ASSERT(id_tasks.empty() && "post_tasks() can only be called once per reader");
    +    for (const auto & t : tasks) {
    +        GGML_ASSERT(t.id != -1 && "post_tasks(): task.id must be assigned before posting");
    +        for (const auto & c : t.child_tasks) {
    +            GGML_ASSERT(c.id != -1 && "post_tasks(): child_task.id must be assigned before posting");
    +        }
    +    }
         id_tasks = server_task::get_list_id(tasks);
    -    states.reserve(tasks.size());
    +    states.reserve(id_tasks.size());
         size_t index = 0;
         for (auto & task : tasks) {
             task.index = index++;
             states.push_back(task.create_state());
             // for child tasks
             for (auto & child_task : task.child_tasks) {
                 child_task.index = index++;
                 states.push_back(child_task.create_state());
             }
         }
         GGML_ASSERT(states.size() == id_tasks.size());
         queue_results.add_waiting_task_ids(id_tasks);
         queue_tasks.post(std::move(tasks), front);
     }

tools/server/server-task.h (1)
123-211: Fix invariant: n_children vs child_tasks.size() can desync (possible OOB later).
Right now is_parent() is n_children > 0, but the actual child storage is child_tasks. If any code sets one without the other, downstream code that indexes children by n_children can read past child_tasks. Recommend making n_children derived (or at least kept in sync inside add_child).
Proposed fix (make "parent" derived from stored children):

     struct server_task {
     @@
    -    int n_children = 0; // number of tasks reusing this prompt
    +    int n_children = 0; // number of tasks reusing this prompt (keep in sync with child_tasks.size())
     @@
         void add_child(int id_parent, int id_child) {
             server_task copy;
     @@
             if (copy.params.sampling.seed != LLAMA_DEFAULT_SEED) {
                 copy.params.sampling.seed += (uint32_t) child_tasks.size() + 1;
             }
             child_tasks.push_back(std::move(copy));
    +        n_children = (int) child_tasks.size();
         }
     @@
         bool is_parent() const {
    -        return n_children > 0;
    +        return !child_tasks.empty();
         }
     };
🤖 Fix all issues with AI agents
In @tools/server/server-context.cpp:
- Around line 3075-3086: Remove the unused local variable child_tasks declared
in the block that prepares child tasks in server-context (the line
"std::vector<server_task> child_tasks;"); delete that declaration and keep the
existing logic that sets task.n_children, calls task.add_child(...), and pushes
task into tasks (no other modifications). Ensure there are no remaining
references to child_tasks after removal and rebuild to confirm the
unused-variable warning is gone.
- Around line 1653-1786: The bug is that try_reserve_n_slots counts the
already-reserved parent slot (slot->task_id_next == task.id) toward child slots;
change try_reserve_n_slots to accept an optional exclude_slot_id (e.g.,
signature try_reserve_n_slots(size_t n_required, int task_id, int
exclude_slot_id = -1) or similar) and skip any slot whose id equals
exclude_slot_id when scanning/reserving, then call it from process_single_task
with the parent slot id (try_reserve_n_slots(task.n_children, task.id,
slot->id)); update any other call sites accordingly and ensure
launch_slots_with_child_tasks still assumes the parent slot is excluded from
child reservations.
🧹 Nitpick comments (2)
tools/server/tests/unit/test_chat_completion.py (1)
494-507: Add a request timeout to avoid hangs if the scheduler regresses.
This test intentionally stresses "slot reuse"; if it deadlocks, pytest may hang. Consider passing a short timeout= to server.make_request(...) (if supported by the test utility) so failures are bounded-time.

tools/server/server-queue.cpp (1)
220-227: OK, but consider reducing per-id debug spam.
Looping RES_DBG(...) for every id in a batch can get very noisy for large batches; consider one aggregate log (or guard behind a higher debug level) if this shows up in practice.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
tools/server/server-context.cpp
tools/server/server-queue.cpp
tools/server/server-queue.h
tools/server/server-task.h
tools/server/tests/unit/test_chat_completion.py
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: ngxson
Repo: ngxson/llama.cpp PR: 0
File: :0-0
Timestamp: 2025-12-22T23:32:42.603Z
Learning: For mirrored PRs (those with [Mirror] in title or mirroring upstream), ngxson wants AI-assisted code review with these preferences: check the upstream PR URL for description, be nit-picky about obvious mistakes, don't tag contributors directly (inform ngxson instead), and alert about suspicious AI-generated code comments like "CRITICAL: ..." or "You are correct here...".
When reviewing mirrored PRs, the PR is from an upstream contributor, not from ngxson himself. Any issues should be reported to ngxson so he can relay them to the contributor.
Learnt from: ngxson
Repo: ngxson/llama.cpp PR: 14
File: examples/lookahead/lookahead.cpp:97-98
Timestamp: 2025-03-13T22:56:47.417Z
Learning: ngxson prefers to prioritize simplification in some cases and defer adding error handling to a later time, as indicated by their response to a suggestion about adding error checks for llama_decode_ext calls.
📚 Learning: 2025-12-20T21:18:21.151Z
Learnt from: ngxson
Repo: ngxson/llama.cpp PR: 62
File: tools/server/server-context.cpp:591-603
Timestamp: 2025-12-20T21:18:21.151Z
Learning: In ngxson/llama.cpp server implementation, holding mutex_tasks during callback_sleeping_state() (which calls handle_sleeping_state) is intentional behavior. No tasks should be allowed to be pushed into the queue during sleeping state transition, even though this blocks all queue operations (post, defer, wait_until_no_sleep) for the duration of model unload/reload (potentially minutes).
Applied to files:
tools/server/server-queue.cpp
🧬 Code graph analysis (2)
tools/server/server-queue.h (1)
tools/server/server-task.h (1)
unordered_set(169-178)
tools/server/tests/unit/test_chat_completion.py (2)
tools/server/server-context.cpp (1)
res(1063-1088)
tools/server/tests/utils.py (1)
make_request(288-319)
🔇 Additional comments (4)
tools/server/server-queue.cpp (1)
328-336: Good guardrail: prevent posting "parent" via post_task().
The new GGML_ASSERT(!task.is_parent()) makes the API expectation explicit and avoids silently mis-scheduling parent/child sets.

tools/server/server-context.cpp (2)
87-90: Slot reservation primitive looks fine.
task_id_next + reset clearing it is a clear way to encode "only accept task X next" at the slot level.
Also applies to: 183-183

2060-2064: Good: explicitly forbid context shift for shared prompts.
Rejecting ctx-shift when is_parent() / is_child() prevents hard-to-reason-about KV mutations across linked tasks.

tools/server/server-queue.h (1)
124-140: API migration to add_waiting_task_ids(...) is complete.
Verification confirms no remaining calls to the old API. The new method is declared, implemented, and actively used in the codebase.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tools/server/server-context.cpp (1)
1684-1721: Potential partial failure scenario worth noting.
If launch_slot_with_task fails for a child task (line 1701), any previously-launched child slots continue processing without a parent being launched. This could result in orphaned task results.
The caller does call unreserve_slots on failure (line 1779), which prevents further slot reservations, but it doesn't cancel already-launched children.
This is likely acceptable for now given the complexity of proper cleanup, but worth documenting as a known limitation or addressing in a follow-up if it causes issues in practice.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tools/server/server-context.cpp
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: ngxson
Repo: ngxson/llama.cpp PR: 0
File: :0-0
Timestamp: 2025-12-22T23:32:42.603Z
Learning: For mirrored PRs (those with [Mirror] in title or mirroring upstream), ngxson wants AI-assisted code review with these preferences: check the upstream PR URL for description, be nit-picky about obvious mistakes, don't tag contributors directly (inform ngxson instead), and alert about suspicious AI-generated code comments like "CRITICAL: ..." or "You are correct here...".
When reviewing mirrored PRs, the PR is from an upstream contributor, not from ngxson himself. Any issues should be reported to ngxson so he can relay them to the contributor.
Learnt from: ngxson
Repo: ngxson/llama.cpp PR: 14
File: examples/lookahead/lookahead.cpp:97-98
Timestamp: 2025-03-13T22:56:47.417Z
Learning: ngxson prefers to prioritize simplification in some cases and defer adding error handling to a later time, as indicated by their response to a suggestion about adding error checks for llama_decode_ext calls.
🔇 Additional comments (12)
tools/server/server-context.cpp (12)
87-90: LGTM! The new task_id_next field provides a clean mechanism for slot reservation. The comment clearly explains its purpose for coordinating parent + child task scheduling.

183-183: LGTM! Properly resets the reservation field when a slot is reset.

345-345: LGTM! The task pointer is asserted non-null earlier in the function (line 335), so this access is safe.

1218-1224: LGTM! The slot.task is assigned via make_unique immediately before this access, so it's guaranteed to be non-null.

1619-1619: LGTM! The assertion is appropriate since the caller (process_single_task at line 1732) checks task.cli_input != nullptr before invoking this function.

1653-1674: LGTM! The reservation logic is sound:
- Skips the current (parent) slot
- Stops early when enough slots are reserved
- Only reserves slots that are either already reserved for this task or completely free
- Returns success only when all required slots are actually available (not processing)
The design correctly handles the case where slots may be reserved but still finishing previous work.
1676-1682: LGTM! Simple and correct cleanup function that releases all slot reservations for a given task.

1767-1790: LGTM! The parent task scheduling logic correctly implements a "sync barrier" pattern:
- Reserves the current slot for this task
- Attempts to reserve required child slots (reusing any existing reservations for this task)
- Only proceeds when all slots are both reserved and available
- If not all slots are ready, clears only the current slot's reservation and defers
The design handles the case where a deferred task may be assigned to a different slot on retry - existing reservations from prior attempts are correctly reused via the task_id matching.

1806-1806: LGTM! Proper cleanup of slot reservations when a task is cancelled. This prevents reservation leaks.
2064-2064: LGTM! The slot is in SLOT_STATE_GENERATING at this point (verified by line 2049), which guarantees slot.task is valid.

2764-2764: LGTM! Consistent with the refactor moving predicates to server_task. The slot is in SLOT_STATE_DONE_PROMPT state, ensuring slot.task is valid.

3058-3088: LGTM! The child task creation is refactored to use task.add_child(...), which attaches children to the parent task. This aligns with the new launch_slots_with_child_tasks helper that expects parent_task.child_tasks to contain the child tasks for coordinated launch.
@coderabbitai review

✅ Actions performed: Review triggered.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tools/server/server-context.cpp (1)
1006-1027: Missing reservation check in LRU slot selection.
The LRU fallback loop does not check slot.task_id_next, unlike the LCP-based selection above it. This could cause a reserved slot to be selected for an unrelated task.
🐛 Proposed fix to add reservation check in LRU selection:

     // find the slot that has been least recently used
     if (ret == nullptr) {
         int64_t t_last = -1;
         for (server_slot & slot : slots) {
             // skip the slot if it is not available
             if (slot.is_processing()) {
                 continue;
             }
    +        // skip slot reserved for other tasks
    +        if (slot.task_id_next != -1 && slot.task_id_next != task.id) {
    +            continue;
    +        }
    +
             // select the current slot if the criteria match
             if (!ret || slot.t_last_used <= t_last) {
                 t_last = slot.t_last_used;
                 ret = &slot;
             }
         }
     }
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tools/server/server-context.cpp
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: ngxson
Repo: ngxson/llama.cpp PR: 0
File: :0-0
Timestamp: 2025-12-22T23:32:42.603Z
Learning: For mirrored PRs (those with [Mirror] in title or mirroring upstream), ngxson wants AI-assisted code review with these preferences: check the upstream PR URL for description, be nit-picky about obvious mistakes, don't tag contributors directly (inform ngxson instead), and alert about suspicious AI-generated code comments like "CRITICAL: ..." or "You are correct here...".
When reviewing mirrored PRs, the PR is from an upstream contributor, not from ngxson himself. Any issues should be reported to ngxson so he can relay them to the contributor.
Learnt from: ngxson
Repo: ngxson/llama.cpp PR: 14
File: examples/lookahead/lookahead.cpp:97-98
Timestamp: 2025-03-13T22:56:47.417Z
Learning: ngxson prefers to prioritize simplification in some cases and defer adding error handling to a later time, as indicated by their response to a suggestion about adding error checks for llama_decode_ext calls.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
- GitHub Check: ggml-ci-arm64-cpu-kleidiai
- GitHub Check: ubuntu-cpu-cmake (arm64, ubuntu-22.04-arm)
- GitHub Check: ggml-ci-arm64-cpu-high-perf
- GitHub Check: windows-latest-cmake (cpu-x64 (static), x64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmak...
- GitHub Check: openEuler-latest-cmake-cann (aarch64, 910b, Release)
- GitHub Check: server (Release, LLAMA_ARG_BACKEND_SAMPLING=1)
- GitHub Check: server (UNDEFINED, RelWithDebInfo)
- GitHub Check: server (ADDRESS, RelWithDebInfo)
- GitHub Check: server (Release)
- GitHub Check: server-windows
- GitHub Check: server-build (Release)
- GitHub Check: server-build (UNDEFINED, RelWithDebInfo)
- GitHub Check: server-build (ADDRESS, RelWithDebInfo)
- GitHub Check: WebUI Checks
- GitHub Check: pyright type-check
🔇 Additional comments (12)
tools/server/server-context.cpp (12)
87-90: LGTM! The new task_id_next field for slot reservation is well-documented with a clear comment explaining its purpose.

345-352: LGTM! The release() method correctly passes task_id_next to the callback, enabling deferred task processing when a reserved slot becomes available.

970-973: LGTM! Correctly skips slots reserved for other tasks while allowing a slot to be used if it's reserved for the requesting task.

1624-1657: LGTM! The GGML_ASSERT is appropriate as the function is only called when cli_input is set (verified by the caller at line 1755).

1659-1690: LGTM! The reservation logic correctly handles:
- Reusing slots already reserved for this task
- Only succeeding when all required slots are both reserved and not processing
- Proper unreservation cleanup in unreserve_slots

1692-1744: LGTM! The multi-slot launch logic is well-structured with proper cleanup paths:
- Child slots are launched first, then the parent
- On failure, all launched slots are released and reservations are cleared
- Reservations are automatically cleared via slot.reset() in launch_slot_with_task

1792-1816: LGTM! The parent task scheduling logic correctly:
- Reserves the current slot for this task
- Attempts to reserve additional child slots
- Launches all slots together on success, or defers while keeping partial reservations
2091-2095: LGTM! Correctly updated to use slot.task->is_parent() and slot.task->is_child() after the refactoring that moved these predicates to the task.

2789-2814: LGTM! The parent task prompt completion handling correctly uses the updated slot.task->is_parent() API.

3085-3116: LGTM! The refactoring correctly uses task.add_child() to attach child tasks to the parent. Only the parent task is pushed to the queue, and children are launched together via launch_slots_with_child_tasks.

1226-1230: LGTM! Child slots correctly start in SLOT_STATE_WAIT_OTHER to wait for the parent's prompt processing, while parent/standalone slots start in SLOT_STATE_STARTED.

829-832: The callback implementation is correct. When task_id_next is -1, pop_deferred_task() explicitly pops the front available deferred task from the queue (not a no-op). The function safely handles this case by checking if the deferred queue is not empty before proceeding, and the -1 value is intentionally used to pop the next available task without requiring a specific task ID.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@tools/server/server-context.cpp`:
- Around line 1662-1704: The function launch_slots_with_parent_task currently
returns bool but uses integer return codes (0 for success, 2 for failure),
causing inverted logic; change its signature from bool to int and keep the
existing return values (0 = success, non-zero = failure), and update all call
sites that invoke launch_slots_with_parent_task (and any checks like
!launch_slots_with_parent_task(...)) to interpret the int correctly (e.g., check
for != 0 on failure or == 0 on success); no change needed to
launch_slot_with_task or release_slots logic besides adjusting callers to use
the int result.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tools/server/server-context.cpp
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: ngxson
Repo: ngxson/llama.cpp PR: 0
File: :0-0
Timestamp: 2025-12-22T23:32:42.603Z
Learning: For mirrored PRs (those with [Mirror] in title or mirroring upstream), ngxson wants AI-assisted code review with these preferences: check the upstream PR URL for description, be nit-picky about obvious mistakes, don't tag contributors directly (inform ngxson instead), and alert about suspicious AI-generated code comments like "CRITICAL: ..." or "You are correct here...".
When reviewing mirrored PRs, the PR is from an upstream contributor, not from ngxson himself. Any issues should be reported to ngxson so he can relay them to the contributor.
Learnt from: ngxson
Repo: ngxson/llama.cpp PR: 14
File: examples/lookahead/lookahead.cpp:97-98
Timestamp: 2025-03-13T22:56:47.417Z
Learning: ngxson prefers to prioritize simplification in some cases and defer adding error handling to a later time, as indicated by their response to a suggestion about adding error checks for llama_decode_ext calls.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: build-linux-cross / debian-13-loongarch64-vulkan-cross
- GitHub Check: build-linux-cross / debian-13-loongarch64-cpu-cross
- GitHub Check: openEuler-latest-cmake-cann (x86, 310p, Release)
- GitHub Check: openEuler-latest-cmake-cann (x86, 910b, Release)
- GitHub Check: build-cmake-pkg / linux
- GitHub Check: windows-msys2 (CLANG64, clang-x86_64, Release)
- GitHub Check: ubuntu-24-cmake-webgpu
- GitHub Check: ubuntu-24-wasm-webgpu
- GitHub Check: ubuntu-24-cmake-vulkan-deb
- GitHub Check: ubuntu-latest-cmake-sanitizer (ADDRESS, Debug)
- GitHub Check: ubuntu-cpu-cmake (arm64, ubuntu-22.04-arm)
- GitHub Check: ubuntu-24-cmake-vulkan
- GitHub Check: macOS-latest-cmake-arm64
- GitHub Check: macOS-latest-cmake-x64
- GitHub Check: ubuntu-latest-llguidance
- GitHub Check: server-build (ADDRESS, RelWithDebInfo)
- GitHub Check: server-build (UNDEFINED, RelWithDebInfo)
- GitHub Check: pyright type-check
- GitHub Check: WebUI Checks
- GitHub Check: server-build (Release)
🔇 Additional comments (10)
tools/server/server-context.cpp (10)
170-170: LGTM! Good documentation improvement to clarify the parameter represents slot_id.

340-342: LGTM! The refactor to use slot.task->is_child() instead of slot-level checks is cleaner. The task is guaranteed non-null by the assert at line 330.

824-826: LGTM! The callback properly uses the slot_id parameter to pop deferred tasks specifically associated with the released slot.

939-947: LGTM! Good naming improvement to distinguish id_slot from other ID types.

1215-1219: LGTM! Correct state assignment: child slots wait for the parent to process the prompt before they can proceed.

1614-1614: LGTM! The assert makes the precondition explicit. The caller at line 1715 already checks task.cli_input != nullptr before calling this function.

1746-1762: LGTM, pending the return value fix. The parent task scheduling logic is correct:
- Get free slots for child tasks (excluding parent slot)
- If not enough slots, defer the task
- Launch all slots together using launch_slots_with_parent_task

This addresses the previous review comment about counting the parent slot toward child slot requirements.

2030-2034: LGTM! Correctly prevents context shift for parent/child slots, which would desynchronize the shared prompt state.

2730-2748: LGTM! The parent-child state synchronization is correct:
- Parent completes prompt processing (SLOT_STATE_DONE_PROMPT)
- Finds all children waiting (SLOT_STATE_WAIT_OTHER with matching id_parent)
- Copies KV cache and state to each child
- Children transition to SLOT_STATE_DONE_PROMPT to begin generation
3042-3049: LGTM! The refactored child task creation using task.add_child() is cleaner than the previous approach. Child tasks are now embedded within the parent task, which allows launch_slots_with_parent_task to orchestrate them together.
    bool launch_slots_with_parent_task(server_slot & parent_slot, std::vector<server_slot *> & child_slots, server_task && parent_task) {
        GGML_ASSERT(!parent_slot.is_processing());
        GGML_ASSERT(parent_task.is_parent());
        GGML_ASSERT(child_slots.size() == parent_task.child_tasks.size());

        int id_parent = parent_task.id;

        SRV_INF("launching slots for parent task id_task = %d with %zu child tasks\n", id_parent, parent_task.child_tasks.size());

        // to be called in case of failure to release all launched slots
        auto release_slots = [this, id_parent]() {
            for (auto & slot : slots) {
                if (slot.is_processing() && (
                    slot.task->id == id_parent ||
                    slot.task->id_parent == id_parent
                )) {
                    slot.release();
                }
            }
        };

        // launch all child tasks first
        size_t idx = 0;
        GGML_ASSERT(child_slots.size() == parent_task.child_tasks.size());
        for (auto * slot : child_slots) {
            int id_child = parent_task.child_tasks[idx].id;
            if (!launch_slot_with_task(*slot, std::move(parent_task.child_tasks[idx]))) {
                SRV_ERR("failed to launch slot with child task, id_task = %d\n", id_child);
                release_slots();
                return 2;
            }
            idx++;
        }

        // finally, launch the parent task
        if (!launch_slot_with_task(parent_slot, std::move(parent_task))) {
            SRV_ERR("failed to launch slot with task, id_task = %d\n", id_parent);
            release_slots();
            return 2;
        }

        return 0;
    }
Critical: Return values are inverted due to bool/int mismatch.
The function returns bool but uses integer return values (return 0 for success, return 2 for failure). Due to implicit conversion:
- return 0 → false (but should indicate success)
- return 2 → true (but should indicate failure)
At the call site (line 1755), !launch_slots_with_parent_task(...) will:
- On success (returns 0/false): !false = true → enters error branch incorrectly
- On failure (returns 2/true): !true = false → skips error branch incorrectly
🐛 Proposed fix
// launch multiple slots for parent + child tasks
bool launch_slots_with_parent_task(server_slot & parent_slot, std::vector<server_slot *> & child_slots, server_task && parent_task) {
GGML_ASSERT(!parent_slot.is_processing());
GGML_ASSERT(parent_task.is_parent());
GGML_ASSERT(child_slots.size() == parent_task.child_tasks.size());
int id_parent = parent_task.id;
SRV_INF("launching slots for parent task id_task = %d with %zu child tasks\n", id_parent, parent_task.child_tasks.size());
// to be called in case of failure to release all launched slots
auto release_slots = [this, id_parent]() {
for (auto & slot : slots) {
if (slot.is_processing() && (
slot.task->id == id_parent ||
slot.task->id_parent == id_parent
)) {
slot.release();
}
}
};
// launch all child tasks first
size_t idx = 0;
GGML_ASSERT(child_slots.size() == parent_task.child_tasks.size());
for (auto * slot : child_slots) {
int id_child = parent_task.child_tasks[idx].id;
if (!launch_slot_with_task(*slot, std::move(parent_task.child_tasks[idx]))) {
SRV_ERR("failed to launch slot with child task, id_task = %d\n", id_child);
release_slots();
- return 2;
+ return false;
}
idx++;
}
// finally, launch the parent task
if (!launch_slot_with_task(parent_slot, std::move(parent_task))) {
SRV_ERR("failed to launch slot with task, id_task = %d\n", id_parent);
release_slots();
- return 2;
+ return false;
}
- return 0;
+ return true;
}

🤖 Prompt for AI Agents
}🤖 Prompt for AI Agents
In `@tools/server/server-context.cpp` around lines 1662 - 1704, The function
launch_slots_with_parent_task currently returns bool but uses integer return
codes (0 for success, 2 for failure), causing inverted logic; change its
signature from bool to int and keep the existing return values (0 = success,
non-zero = failure), and update all call sites that invoke
launch_slots_with_parent_task (and any checks like
!launch_slots_with_parent_task(...)) to interpret the int correctly (e.g., check
for != 0 on failure or == 0 on success); no change needed to
launch_slot_with_task or release_slots logic besides adjusting callers to use
the int result.
Mirror from upstream PR: ggml-org#18789
Note: @coderabbitai use my 'Mirror PR' preset for reviewing this.
Summary by CodeRabbit