Add miscellaneous updates #8
Merged
v1nc3nt27 pushed a commit to v1nc3nt27/vllm that referenced this pull request on Sep 12, 2023
xiangyuT pushed a commit to xiangyuT/vllm that referenced this pull request on Oct 24, 2023
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request on Feb 13, 2024
mzusman added a commit to mzusman/vllm that referenced this pull request on Apr 16, 2024
* Return support for other models apart from jamba
* Support n>1
* A little cleanup
* Rename
* Apply whitespace suggestions from code review
* Add max batch size to the main func
* Fixed attention kv cache bug
* log where requests id are deleted from the dict to debug mode
* Fix typo
* Align with v0.3.3 vllm code
* Remove comments
* Take out model config from CUDAGraph object
* Fix
* Fix typo
* Make the kv cache selection cleaner
* Another typo
* Took the num layers calc outside
* Remove the -1
* Set as num layer / period

---------

Co-authored-by: Mor Zusman <morz@ai21.com>
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
sfc-gh-hazhang pushed a commit to sfc-gh-hazhang/vllm that referenced this pull request on May 7, 2024
remove dummy path in arctic
ykim362 pushed a commit to ykim362/vllm that referenced this pull request on Jun 17, 2024
…128k Support Phi3SuScaledRotaryEmbedding for 128k model
This was referenced Jul 5, 2024
zeroorhero pushed a commit to zeroorhero/vllm that referenced this pull request on Sep 23, 2024
update overhead benchmark
This was referenced Oct 12, 2024
This was referenced Jan 27, 2026
tjtanaa pushed a commit to tjtanaa/vllm that referenced this pull request on Jan 29, 2026
Add PR and issue templates from vLLM project
Srinivasoo7 pushed a commit to Srinivasoo7/vllm that referenced this pull request on Mar 4, 2026
…Manager

- Add store_threshold >= 2 validation in FilterReusedOffloadingManager constructor (mirrors the existing max_tracker_size >= 1 guard)
- Fix cpu.py gate from > 1 to >= 2; update comment to clarify that values < 2 disable filtering
- Add internal assertions to test_filter_reused_manager to verify tracker eviction and count reset (Comments vllm-project#8 and vllm-project#9)
- Remove tests/v1/kv_offload/__init__.py (not needed for pytest discovery)
- Remove accidentally tracked dev-workflow files (.patch, diff*.txt, error.txt, log files, mypy/test output files)

Signed-off-by: Srinivasoo7 <158864704+Srinivasoo7@users.noreply.github.com>
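A minimal Python sketch of the two constructor guards and the `>= 2` enable gate described in this commit. It assumes, without checking the vLLM source, that the filter counts how often each block hash is seen in an LRU tracker capped at `max_tracker_size` and only stores a block once its count reaches `store_threshold`. `ReuseFilterSketch` and `maybe_enable_filter` are hypothetical names, not the real `FilterReusedOffloadingManager` or `cpu.py` code.

```python
from collections import OrderedDict
from typing import Optional


class ReuseFilterSketch:
    """Hypothetical reuse-based offloading filter (assumed semantics).

    A block is stored only once it has been seen `store_threshold` times.
    Counts live in an LRU tracker bounded by `max_tracker_size`; when an
    entry is evicted, its count is forgotten.
    """

    def __init__(self, store_threshold: int, max_tracker_size: int) -> None:
        # Guards mirroring the ones named in the commit message.
        if store_threshold < 2:
            raise ValueError("store_threshold must be >= 2")
        if max_tracker_size < 1:
            raise ValueError("max_tracker_size must be >= 1")
        self.store_threshold = store_threshold
        self.max_tracker_size = max_tracker_size
        self.counts: "OrderedDict[int, int]" = OrderedDict()

    def should_store(self, block_hash: int) -> bool:
        # Bump the count and mark this block as most recently used.
        count = self.counts.pop(block_hash, 0) + 1
        self.counts[block_hash] = count
        # Evict the least recently used entry once the tracker overflows;
        # if that block shows up again, it starts counting from zero.
        if len(self.counts) > self.max_tracker_size:
            self.counts.popitem(last=False)
        return count >= self.store_threshold


def maybe_enable_filter(
    store_threshold: int, max_tracker_size: int
) -> Optional[ReuseFilterSketch]:
    # Gate: values < 2 disable filtering (every block would be stored).
    if store_threshold >= 2:
        return ReuseFilterSketch(store_threshold, max_tracker_size)
    return None
```

With `store_threshold=2` and `max_tracker_size=2`, a block is stored on its second sighting, and a block pushed out of the tracker loses its count, which is the kind of eviction and count-reset behavior the added test assertions are described as checking.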
This was referenced Mar 20, 2026
khairulkabir1661 pushed a commit to khairulkabir1661/vllm that referenced this pull request on Mar 26, 2026
## Summary

Cherry-pick upstream bug fixes for RHAIIS 3.3.1 onto `rhai/0.13.0`. All fixes are from upstream vLLM `main` and address critical bugs affecting RHAIIS 3.3.0. Other releases (3.2.2, EAx) will be done separately.

**Jira Epic:** [INFERENG-4743](https://issues.redhat.com/browse/INFERENG-4743)

## Cherry-picked commits (chronological order)

| # | Upstream PR | Jira | Summary |
|---|------------|------|---------|
| 1 | vllm-project#30550 | [INFERENG-5106](https://issues.redhat.com/browse/INFERENG-5106) | Support using chat template as custom score template for reranking models |
| 2 | vllm-project#31406 | [INFERENG-4800](https://issues.redhat.com/browse/INFERENG-4800) | Add encoder-only/cross attention support to Triton Attention backend |
| 3 | vllm-project#34243 | [INFERENG-4746](https://issues.redhat.com/browse/INFERENG-4746) | Fix Llama-4 attn quantization by correctly permuting scales for rope (int8, fp8) |
| 4 | vllm-project#34454 | [INFERENG-5032](https://issues.redhat.com/browse/INFERENG-5032) | Fix structured output in multi-turn GPT-OSS (content:null with json_object) |
| 5 | vllm-project#34507 | [INFERENG-5038](https://issues.redhat.com/browse/INFERENG-5038) | Fix fused MoE int32 overflow in stride*offset for large models |
| 6 | vllm-project#35085 | [INFERENG-5028](https://issues.redhat.com/browse/INFERENG-5028) | Gracefully disable AllReduceFusionPass on GPUs without multicast support |
| 7 | vllm-project#35456 | [INFERENG-5035](https://issues.redhat.com/browse/INFERENG-5035) | Replace assert with ValueError for response_format validation (completions) |
| 8 | vllm-project#35510 | [INFERENG-5035](https://issues.redhat.com/browse/INFERENG-5035) | Add response_format validation to chat completions endpoint |

## Conflict resolutions

<details>
<summary><b>#1 — llama-nemotron-embed / score-template support (vllm-project#30550)</b>: Clean cherry-pick, no conflicts</summary>

Applied cleanly onto `rhai/0.13.0`.

</details>

<details>
<summary><b>#2 — Triton Attention (vllm-project#31406)</b>: Clean cherry-pick, no conflicts</summary>

Applied cleanly onto `rhai/0.13.0`.

</details>

<details>
<summary><b>#3 — Llama-4 attn quant (vllm-project#34243)</b>: Clean cherry-pick, no conflicts</summary>

Applied cleanly. 4 intermediate upstream commits touch `llama4.py` but the fix targets a self-contained block.

</details>

<details>
<summary><b>vllm-project#4 — GPT-OSS multi-turn (vllm-project#34454)</b>: Clean cherry-pick, no conflicts</summary>

Applied cleanly despite 3 intermediate upstream commits that refactored imports in `gptoss_reasoning_parser.py`. The fix logic (adding `eom_token_id` early-exit check in `is_reasoning_end`) was independent of the import changes.

</details>

<details>
<summary><b>vllm-project#5 — Fused MoE int32 overflow (vllm-project#34507)</b>: Conflicts in 2 files</summary>

**`vllm/model_executor/layers/fused_moe/fused_moe.py`**: ~30 intermediate upstream commits refactored `fused_moe_kernel` with conditional `naive_block_assignment` logic that doesn't exist in `rhai/0.13.0`. Resolved by keeping our simpler code and applying only the int64 cast fix:

- `fused_moe_kernel_gptq_awq`: added `.to(tl.int64)` to `tl.load()` result
- `fused_moe_kernel`: added `offs_token = offs_token.to(tl.int64)` before `token_mask`

**`tests/kernels/moe/test_moe.py`**: Upstream test changes depend on `make_dummy_moe_config()` from intermediate refactors. Resolved by keeping our existing test code (no test changes).

</details>

<details>
<summary><b>vllm-project#6 — AllReduceFusionPass multicast (vllm-project#35085)</b>: Conflict due to file rename + API change</summary>

Upstream moved `collective_fusion.py` → `compilation/passes/fusion/allreduce_rms_fusion.py` and changed the API from `trtllm_create_ipc_workspace_for_all_reduce_fusion()` to `create_allreduce_fusion_workspace()`. Resolved by applying the try/except wrapper around our existing `trtllm_create_ipc_workspace_for_all_reduce_fusion()` call in `collective_fusion.py`. The error handling logic (catching RuntimeError with "multicast" in message, logging warning, returning early) is identical to upstream.

</details>

<details>
<summary><b>vllm-project#7 — response_format validation for completions (vllm-project#35456)</b>: Conflict due to file restructuring</summary>

Upstream split `protocol.py` into `completion/protocol.py` and `chat_completion/protocol.py`. Our branch still has the monolithic `protocol.py`. Resolved by:

- Removing the non-existent `vllm/entrypoints/openai/completion/protocol.py`
- Manually adding `validate_response_format` model_validator to `CompletionRequest` in our `protocol.py`
- Using `ValueError` instead of upstream's `VLLMValidationError` (which doesn't exist in our branch; `ValueError` is already handled as 400 Bad Request in `serving_engine.py`)
- Test additions from upstream applied cleanly to `test_completion_error.py`

</details>

<details>
<summary><b>vllm-project#8 — response_format validation for chat completions (vllm-project#35510)</b>: Conflict due to file restructuring</summary>

Same file restructuring issue as vllm-project#7. Resolved by:

- Removing the non-existent `vllm/entrypoints/openai/chat_completion/protocol.py`
- Manually adding `validate_response_format` model_validator to `ChatCompletionRequest` in our `protocol.py`
- Only accepting the `test_json_schema_response_format_missing_schema` test from the conflict (discarding ~140 lines of intermediate upstream tests that reference non-existent paths in our branch)

</details>

## Test plan

- [ ] Verify `llama-nemotron-embed-1b-v2` works correctly with the backported score-template / bidirectional model support
- [ ] Verify Llama-4 quantized model loads correctly with int8/fp8 attention quantization
- [ ] Verify GPT-OSS multi-turn chat with `json_object` response_format returns valid content
- [ ] Verify large MoE models (e.g. Qwen3.5-397B) don't crash with int32 overflow
- [ ] Verify MoE model loading on H200 GPUs (without multicast) gracefully falls back
- [ ] Verify `response_format: {type: "json_schema"}` without `json_schema` field returns 400 (not 500) for both `/v1/completions` and `/v1/chat/completions`
- [ ] Verify encoder models (e.g. Whisper) work with Triton attention backend on ROCm

[INFERENG-4743]: https://redhat.atlassian.net/browse/INFERENG-4743?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
[INFERENG-4800]: https://redhat.atlassian.net/browse/INFERENG-4800?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
[INFERENG-4746]: https://redhat.atlassian.net/browse/INFERENG-4746?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
[INFERENG-5032]: https://redhat.atlassian.net/browse/INFERENG-5032?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
[INFERENG-5038]: https://redhat.atlassian.net/browse/INFERENG-5038?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
[INFERENG-5106]: https://redhat.atlassian.net/browse/INFERENG-5106?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
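The int32 overflow fixed by vllm-project#34507 comes from computing a memory offset as a token index times a row stride in 32-bit arithmetic. The pure-Python illustration below assumes a hypothetical stride of 7168 elements per token row and a hypothetical token index of 300,000 (real values depend on the model and kernel). It shows how the product wraps to a negative offset once it exceeds 2^31 - 1, and why casting to int64 before multiplying, as the backported `.to(tl.int64)` change does, keeps it exact.

```python
def wrap_int32(value: int) -> int:
    """Reduce an arbitrary Python int to its two's-complement int32 value."""
    return (value + 2**31) % 2**32 - 2**31


def row_offset(token_index: int, stride: int, use_int64: bool) -> int:
    # Models an `offs_token * stride` product inside a kernel: with int32
    # math the product wraps around; with int64 operands it stays exact.
    product = token_index * stride
    return product if use_int64 else wrap_int32(product)


stride = 7168    # hypothetical elements per token row
token = 300_000  # hypothetical token index in a large batch
print(row_offset(token, stride, use_int64=False))  # -2144567296
print(row_offset(token, stride, use_int64=True))   # 2150400000
```

A negative offset like this points far outside the intended buffer, which is consistent with the crashes the test plan describes for very large MoE models.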
yuezhu1 pushed a commit to yuezhu1/vllm that referenced this pull request on Mar 30, 2026
…llm-project#8)

Add optional `get_desired_lora_slots()` method to the `LoRAResolver` ABC with a default `return None` so all existing subclasses remain unaffected. The engine will call this hook between batches when dynamic_lora_slots=True to let resolver implementations signal a desired GPU slot count. The returned value is clamped to [min_loras, max_loras] by the engine (implemented in vllm-project#13).

Closes vllm-project#8

Co-authored-by: Claude
Signed-off-by: Chen Wang <Chen.Wang1@ibm.com>
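The optional hook described in this commit can be sketched as follows. `LoRAResolverSketch`, its placeholder `resolve` method, and the `next_slot_count` helper are hypothetical stand-ins, not vLLM's actual `LoRAResolver` interface or engine code; only the default `return None` and the clamp to `[min_loras, max_loras]` come from the commit message.

```python
from abc import ABC, abstractmethod
from typing import Optional


class LoRAResolverSketch(ABC):
    """Hypothetical resolver base class with an optional slot hook."""

    @abstractmethod
    def resolve(self, lora_name: str) -> Optional[str]:
        """Placeholder abstract method (not vLLM's real signature)."""

    def get_desired_lora_slots(self) -> Optional[int]:
        # Default: no preference, so existing subclasses are unaffected.
        return None


def next_slot_count(resolver: LoRAResolverSketch, current: int,
                    min_loras: int, max_loras: int) -> int:
    """Hypothetical engine-side step run between batches."""
    desired = resolver.get_desired_lora_slots()
    if desired is None:
        return current  # keep the current slot count
    return max(min_loras, min(desired, max_loras))  # clamp to [min, max]


class LegacyResolver(LoRAResolverSketch):
    """Example subclass written before the hook existed."""

    def resolve(self, lora_name: str) -> Optional[str]:
        return None


class FixedResolver(LoRAResolverSketch):
    """Example subclass that always asks for a fixed number of slots."""

    def __init__(self, slots: int) -> None:
        self.slots = slots

    def resolve(self, lora_name: str) -> Optional[str]:
        return None

    def get_desired_lora_slots(self) -> Optional[int]:
        return self.slots
```

Because the default returns `None`, `LegacyResolver` leaves the slot count unchanged, while a `FixedResolver(16)` with `max_loras=8` is clamped down to 8.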
Damon-Salvetore pushed a commit to Damon-Salvetore/vllm that referenced this pull request on Mar 31, 2026
…rk-slidesparse

Update framework_slidesparse.md: restructure it into a seven-stage engineering workflow and refine the implementation details
jinhuang12 pushed a commit to jinhuang12/vllm that referenced this pull request on Apr 8, 2026
…d check

Replace all "diminishing returns" / discretionary language with mechanical f-threshold stop condition across SKILL.md, orchestration docs, hooks, and conformance tests.

Key changes:
- Stage 7 marked AUTONOMOUS with decision tree (no user interaction)
- Non-Negotiable vllm-project#8 + Campaign Stop Condition already in place; align all downstream references (Task Graph, Example 1, Resume Protocol)
- Escalation Protocol: STOP → HALT (clarify ≠ campaign termination)
- Resume Protocol step 9: prohibit autonomous pause (user-request only)
- Stop hook: add paused-status exit + replace stale nudge language
- Gate hook + test: update "diminishing returns" labels
- README: fix stale 3% default → 1.0%, add Non-Negotiable vllm-project#8
- integration-logic.md: fix 5 discretionary-language spots
- test-orchestrator.md: update all § references and expected behaviors

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Natfii referenced this pull request in Navi-AI-Lab/nvllm on Apr 14, 2026
…impl

CutePagedAttentionImpl becomes a pipeline state object:
- bind_fusion_weights() stores static weights + allocates persistent I/O buffers with fixed addresses (graph-safe)
- forward() reads from self instead of per-forward side-channels
- gate_buf added for output gate fusion (Qwen3NextAttention)

Blockers #6, #7, #8 from the CUDA graphs checklist.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
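The persistent-buffer pattern in this commit can be illustrated in plain Python, with list objects standing in for the fixed-address device buffers that a captured CUDA graph replays. This is a simplified, hypothetical sketch: the class name, the scalar `scale` weight, and the plain multiplicative gate are assumptions, not the nvllm `CutePagedAttentionImpl` code. The point is only that `forward()` writes into buffers allocated once in `bind_fusion_weights()` instead of creating new ones.

```python
from typing import List


class PersistentBufferSketch:
    """Hypothetical 'pipeline state object' with buffers allocated once."""

    def bind_fusion_weights(self, scale: float, max_tokens: int) -> None:
        self.scale = scale                   # static "weight"
        self.in_buf = [0.0] * max_tokens     # persistent input buffer
        self.gate_buf = [0.0] * max_tokens   # persistent gate buffer
        self.out_buf = [0.0] * max_tokens    # persistent output buffer

    def forward(self, x: List[float], gate: List[float]) -> List[float]:
        n = len(x)
        if n > len(self.in_buf) or len(gate) != n:
            raise ValueError("input does not fit the pre-allocated buffers")
        # Copy new data into the existing buffers instead of rebinding them,
        # so the buffer objects (the "addresses") never change.
        self.in_buf[:n] = x
        self.gate_buf[:n] = gate
        for i in range(n):
            self.out_buf[i] = self.in_buf[i] * self.scale * self.gate_buf[i]
        # Only the first n entries are valid for this call.
        return self.out_buf
```

Every call returns the same `out_buf` object, so a consumer that captured a reference to it during graph capture keeps reading valid data on replay.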
djmmoss pushed a commit to djmmoss/vllm that referenced this pull request on Apr 17, 2026
MTP large logprob fixes
starpit added a commit to starpit/vllm that referenced this pull request on Apr 27, 2026
…nccl`

First step of TP support per project_tp_design_notes. Adds the universal `Instruction::AllReduce(u32)` variant + eval arm + the `ForwardCtx::tp_group: Option<&Arc<NcclGroup>>` field, all gated behind a new `nccl` cargo feature on `ferrite-forward`. At tp=1 the upcoming lowering pass emits zero AllReduce rows, so this is a strict superset of the current `cuda` build.

Variant placement mirrors `Add` / `FusedAddRmsNorm` — one-tile in-place same-shape, so shape-aware coloring will collapse it to the input slot with no `View` row (validated by task vllm-project#3's coloring test). Eval arm calls `NcclGroup::all_reduce_inplace` and `expect`s both the group reference and the call result; the `None` case is unreachable when canonical fanout (task vllm-project#7) only emits AllReduce rows for tp_world_size > 1 canonicals.

Plumbs the feature forward through `vllm-executor`'s `nccl` feature so the cuda_worker `ForwardCtx` construction sites compile under the full feature set; `tp_group: None` for now (task vllm-project#8 wires the real `Arc<NcclGroup>` through).

Also stubs the missing `Self::Ferrite(_) => {}` arm in `CudaModel::set_tp_group` — that match was non-exhaustive under `--features nccl` because the ferrite stack was previously TP-oblivious and nobody compiled the nccl path through it.

Verified: `cargo check -p ferrite-forward --features cuda` (variant absent) and `--features nccl` (variant present) both green; `cargo check -p vllm-executor --features cuda` and `--features nccl` both green; clippy -D warnings clean on both.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
starpit added a commit to starpit/vllm that referenced this pull request on Apr 27, 2026
…tp_group

Task vllm-project#8 plumbing. The Ferrite forward path now actually receives the worker's NCCL communicator instead of swallowing it.

- `FerriteModel` gains a `tp_group: Option<Arc<NcclGroup>>` field (gated on `feature = "nccl"`), mirroring the same shape the hand-written CudaModel arms already carry.
- `CudaModel::set_tp_group` arm `Self::Ferrite(m) => m.tp_group = Some(group)` replaces the task-vllm-project#1 stub.
- Both `ForwardCtx` construction sites (forward + forward_backbone) pass `tp_group: m.tp_group.as_ref()` so the universal `Instruction::AllReduce` eval arm has the group reference it expects when the lowering pass starts emitting AllReduce rows (task vllm-project#7's canonical fanout will activate that).
- `FerriteModel` construction in cuda_worker initializes `tp_group: None`; the worker's later `set_tp_group` call wires it.

Also cleans up an `AllReduceImpl::interpreter_arm` method I had dropped into `impl Implementation for AllReduceImpl` — that method isn't on the `Implementation` trait (the universal-eval pivot in `de15e035a` left only `opcode_shape` + `fan_out` as the codegen override surface). Removed with a comment pointing at the production eval path.

Verified: `cargo check -p vllm-executor --features cuda` and `--features nccl` both green; `cargo clippy -D warnings` clean on ferrite-forward-macro and vllm-executor; macro tests 190/190 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
starpit added a commit to starpit/vllm that referenced this pull request on Apr 27, 2026
…+ Embedding)

Tensor-parallel safetensors → GPU loaders matching Python vLLM's ColumnParallelLinear / RowParallelLinear / VocabParallelEmbedding weight_loader semantics. Used by codegen at tp>1 (task vllm-project#5 wires the dispatch from the macro side; this commit lands only the runtime helpers).

Added on the kernel-side `LinearLayer` / `Linear` / `Embedding`:

- `Linear::load_sharded(weights, prefix, dim, rank, world)`. The load-bearing bias rules:
  - `dim = 0` (column-parallel: q/k/v/gate/up/lm_head/embed): bias shards along dim 0 too — each rank holds its own slice. Mirrors Python `ColumnParallelLinear.weight_loader` → `loaded_weight.narrow(output_dim=0, …)`.
  - `dim = 1` (row-parallel: o_proj, down_proj): bias is **replicated full-size on rank 0 only**, `None` on other ranks. The forward path adds bias before the cross-rank AllReduce-sum; only rank 0's contribution survives the sum, giving exactly one bias add to the residual stream. Mirrors Python `RowParallelLinear.forward` line 1543 `bias_ = None if (self.tp_rank > 0 …) else self.bias`.
- `LinearLayer::load_dense_sharded(weights, prefix, dim, rank, world)`. Thin wrapper over `Linear::load_sharded`. The codegen entry point for non-fused (single-prefix) sharded loads.
- `LinearLayer::load_dense_concat_sharded(weights, prefixes, stream, rank, world)`. Sharded variant of `load_dense_concat` for the fused QKV / gate_up paths. Always column-parallel (no row-parallel concat exists in any current arch). Each source weight slices along dim 0 to `[out_i / world, hidden]` then packs into one contiguous `[(sum out_i) / world, hidden]` GPU buffer via per-source `take_shard_into`. Biases follow the column-parallel rule (sliced along dim 0) — matches Python `MergedColumnParallelLinear` / `QKVParallelLinear`. Per-rank divisibility is guaranteed by the macro's outer-loop fanout `skip` of indivisible (variant, tp) tuples (commit `889c44b2f`).
- `Embedding::load_sharded(weights, prefix, rank, world)`. Vocab-parallel: slices the embedding table along dim 0 (`[vocab_size, hidden]` → `[vocab_size / world, hidden]`). Mirrors Python `VocabParallelEmbedding`. Same dim-0 cut as `Linear::load_sharded(dim=0)` — that's what makes `tie_weights(lm_head.weight = embed_tokens.weight)` self-consistent at tp>1.

Defers FP8 / Marlin / BNB sharded variants — the verify model (commandr) is dense bf16.

World == 1 short-circuits to byte-equivalent behavior with the existing unsharded paths in every helper, plus shard-kind-aware bias rules. No tests added at this layer (CUDA stream + safetensors fixtures aren't worth the infra spend; the integration test is task vllm-project#8). `take_shard` / `take_shard_into` on `GpuWeights` already exist (used by the prior hand-written TP path); these wrappers are pure call-site plumbing on top.

Build clean: ferrite-kernels checks + clippy at default features. The macro-side consumer that chooses sharded vs unsharded based on shard_kind comes in tasks vllm-project#4 + vllm-project#5; until then these helpers have no runtime caller (intentionally — wholesale codegen migration per the no_piecemeal_codegen_migration rule).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
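The bias rules in this commit can be checked numerically with a small pure-Python sketch (integer matrices, no GPU). The helper names are hypothetical and are not the ferrite or vLLM APIs, and shard sizes are assumed to divide evenly. It follows the commit's description: column-parallel shards slice weight and bias along dim 0 and are concatenated, while row-parallel shards slice the weight along dim 1, keep the full bias on rank 0 only, and are combined with an AllReduce-style sum. Concatenating column-parallel outputs is done here only to verify equivalence; in a tensor-parallel model, q/k/v or gate/up outputs normally stay sharded and feed the following row-parallel layer directly.

```python
from typing import List, Optional, Tuple

Matrix = List[List[int]]


def matvec(w: Matrix, x: List[int]) -> List[int]:
    """y = W @ x for a row-major integer matrix."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def dense_forward(w: Matrix, b: List[int], x: List[int]) -> List[int]:
    """Unsharded reference: y = W @ x + b."""
    return [y + bi for y, bi in zip(matvec(w, x), b)]


def shard_column(w: Matrix, b: List[int], rank: int,
                 world: int) -> Tuple[Matrix, List[int]]:
    """dim = 0: keep a slice of output rows and the matching bias slice."""
    n = len(w) // world
    return w[rank * n:(rank + 1) * n], b[rank * n:(rank + 1) * n]


def shard_row(w: Matrix, b: List[int], rank: int,
              world: int) -> Tuple[Matrix, Optional[List[int]]]:
    """dim = 1: keep a slice of input columns; full bias on rank 0 only."""
    k = len(w[0]) // world
    w_shard = [row[rank * k:(rank + 1) * k] for row in w]
    return w_shard, (list(b) if rank == 0 else None)


def column_parallel_forward(w: Matrix, b: List[int], x: List[int],
                            world: int) -> List[int]:
    out: List[int] = []
    for rank in range(world):
        w_s, b_s = shard_column(w, b, rank, world)
        out.extend(dense_forward(w_s, b_s, x))  # gather = concatenate shards
    return out


def row_parallel_forward(w: Matrix, b: List[int], x: List[int],
                         world: int) -> List[int]:
    k = len(x) // world
    partials = []
    for rank in range(world):
        w_s, b_s = shard_row(w, b, rank, world)
        y = matvec(w_s, x[rank * k:(rank + 1) * k])  # this rank's input slice
        if b_s is not None:  # bias added before the reduction, rank 0 only
            y = [yi + bi for yi, bi in zip(y, b_s)]
        partials.append(y)
    return [sum(vals) for vals in zip(*partials)]  # AllReduce-sum
```

If every rank added the full bias in `row_parallel_forward`, the sum would contain `world` copies of it, which is why only rank 0 keeps it.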
starpit added a commit to starpit/vllm that referenced this pull request on Apr 27, 2026
Phase 2 of task vllm-project#6 — lm_head side. Closes the all-gather hole the Instruction::AllGather + AllGatherImpl + OpKind::AllGather foundation in `9d70ff563` left for the lowering pass to fill.

Lowering (tp_lowering.rs):
- New `insert_lm_head_allgather(fuf, program, tp_world_size)`. At tp>1 walks the FUF for `OpKind::Gemm` nodes whose weight path's last segment is `"lm_head"`, and appends an `OpKind::AllGather` reading the gemm's output. Rewires every consumer of the lm_head Gemm to read the AllGather instead. At tp=1 it's a strict no-op.
- Refactor: extract `rewire_consumers(fuf, old, new)` so the AllReduce and AllGather inserters share the consumer-rewiring walk (was duplicated inline in the AllReduce loop). Behavior unchanged.
- Wired into `compile()` at the activation site right after `insert_all_reduces` — both passes are gated on tp_world_size > 1 internally, no extra outer-loop branch.

backbone_output_for (codegen.rs):
- Updated to walk past the AllGather node when present. lm_head's hidden-state input was `last_node.inputs.first()`; with the AllGather inserted, `last_node` is now the AllGather, and its first input is the lm_head Gemm. Skip one hop back to recover the lm_head Gemm, then read its first input as before. At tp=1 the unchanged path is taken (no AllGather node exists). Without this, `forward_backbone` (used by pipeline-parallel intermediate ranks) would mistakenly return the lm_head gemm output instead of the hidden state.

FUF output shape on the AllGather node is left equal to the lm_head Gemm's output. The FUF carries pre-shard SYMBOLIC dims (e.g. `vocab_size` Bound, not `vocab_size / tp`); the runtime allocation comes from the kernel's `alloc_tensor` call, which reads `weight.dim(0)` (sharded) for the gemm and the gather's own world-size multiplier internally. The fresh slot for AllGather output is enforced by `AllGatherImpl::output_alias` returning `None` (already pinned by the `all_gather_impl_claims_single_tile_input_with_fresh_output_slot` test).

Macro tests: 203/203 (was 200/200) at both default `--features cuda` and `--features nccl`. Three new tests:
- `lowering_inserts_allgather_after_lm_head_gemm_at_tp_gt_1`
- `lowering_no_allgather_at_tp_eq_1`
- `lowering_skips_non_lm_head_gemms_for_allgather` — defends the name gate so a future change can't silently start emitting AllGathers on q_proj / o_proj / etc. (only one match per FUF in every current arch — multiple lm_head gemms would still be handled, but no arch produces them).

Full ferrite-models umbrella build clean at `--features cuda,nccl` across all 11 arches × {1,2,4,8} tp variants in 7m19s. cuda_worker's `!use_tp` ferrite gate stays for one more commit — lifting it is the last step before task vllm-project#8 verify.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
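The graph rewrite in this commit can be sketched in Python over a toy node list. `Node`, the index-based `inputs`, and the dotted weight paths are assumptions for illustration; this is not the `tp_lowering.rs` code, and it ignores scheduling order and output shapes. It shows the behaviors the new tests are described as covering: an AllGather is appended after each lm_head Gemm only when `tp_world_size > 1`, its consumers are rewired to read it, and other Gemms are left alone.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    op: str                                          # e.g. "Gemm", "AllGather"
    inputs: List[int] = field(default_factory=list)  # producer node indices
    weight: str = ""                                 # dotted weight path (Gemm)


def rewire_consumers(nodes: List[Node], old: int, new: int) -> None:
    """Point every consumer of node `old` at node `new`, except `new` itself."""
    for idx, node in enumerate(nodes):
        if idx == new:
            continue  # the AllGather must keep reading the original Gemm
        node.inputs = [new if i == old else i for i in node.inputs]


def insert_lm_head_allgather(nodes: List[Node], tp_world_size: int) -> None:
    if tp_world_size <= 1:
        return  # strict no-op at tp=1
    # Collect matches first so newly appended nodes are not rescanned.
    targets = [
        i for i, n in enumerate(nodes)
        if n.op == "Gemm" and n.weight.split(".")[-1] == "lm_head"
    ]
    for gemm in targets:
        nodes.append(Node(op="AllGather", inputs=[gemm]))
        rewire_consumers(nodes, gemm, len(nodes) - 1)
```

Sharing `rewire_consumers` between inserters mirrors the refactor the commit describes; the skip for the new node is what prevents the AllGather from being rewired to read itself.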
starpit added a commit to starpit/vllm that referenced this pull request on Apr 27, 2026
End-to-end TP is now wired: codegen routes load calls through `_sharded` helpers (52615e881), lowering injects AllReduce after vocab-parallel Embed (aefa1de37), AllGather after lm_head Gemm (8a3ea25bc), and tp_rank threads through the full try_load → Weights::load → load_with chain (60d9b9d4e). At tp=1 every code path is byte-equivalent to the pre-TP build via the sharded-helpers' `world == 1` short-circuits.

cuda_worker's `!use_tp` ferrite eligibility gate served as belt-and-suspenders during the multi-commit landing. With the chain complete, drop the gate so `vllm chat ... --tensor-parallel-size 2` reaches `try_load` with the matching `(arch, tp_world_size, tp_rank)` triple and gets the per-(model, tp) sharded registration.

Build clean: vllm-executor + vllm-cuda check at `--features cuda` (1m44s) and `--features cuda,nccl` (7m35s, full ferrite-models umbrella for the latter).

Next: task vllm-project#8 — verify on commandr at tp=2 with `vllm chat CohereForAI/c4ai-command-r-v01 --tensor-parallel-size 2`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This PR contains several miscellaneous updates to the system, with two notable changes:
… `swap_space` size provided by the user (defaulting to 20 GiB).