llama-mmap: add MADV_HUGEPAGE hint for THP on Linux #22022

Open
Marxist-Leninist wants to merge 1 commit into ggml-org:master from Marxist-Leninist:feat/madv-hugepage-upstream

@Marxist-Leninist
Contributor

Calls madvise(MADV_HUGEPAGE) on the read-only mmap region used for model weights on Linux.

When THP is in 'madvise' mode (the default on many desktop distros), this opts the mapping
into transparent huge page promotion. For a 4 GB model weight map the page count drops from
~1M 4 KB pages to ~2K 2 MB pages (roughly 1.3M and 2.5K at 5 GB), which reduces TLB pressure
and the number of minor faults on the hot inference path.
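
The page counts above follow from a ceiling division of the mapping size by the page size. A minimal sketch that checks the arithmetic at compile time (the helper name `pages_needed` and the size constants are illustrative, not part of this PR, and sizes are taken as binary GiB/MiB/KiB):

```cpp
#include <cstdint>

// Illustrative helper (not from the PR): number of pages needed to cover
// `bytes` with pages of `page_size` bytes, rounded up.
constexpr std::uint64_t pages_needed(std::uint64_t bytes, std::uint64_t page_size) {
    return (bytes + page_size - 1) / page_size;
}

constexpr std::uint64_t KiB = 1024, MiB = 1024 * KiB, GiB = 1024 * MiB;

// 4 GiB mapping: ~1M base pages vs ~2K huge pages.
static_assert(pages_needed(4 * GiB, 4 * KiB) == 1048576, "4 GiB in 4 KiB pages");
static_assert(pages_needed(4 * GiB, 2 * MiB) == 2048,    "4 GiB in 2 MiB pages");

// 1 GiB mapping: ~262K base pages vs 512 huge pages.
static_assert(pages_needed(1 * GiB, 4 * KiB) == 262144, "1 GiB in 4 KiB pages");
static_assert(pages_needed(1 * GiB, 2 * MiB) == 512,    "1 GiB in 2 MiB pages");
```

These are upper bounds on the benefit: they assume every 2 MB-aligned range of the mapping actually gets promoted.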

No configuration required. The call is advisory: if THP is disabled
(transparent_hugepage/enabled == never) or pages cannot be promoted, the hint has no effect.
If madvise itself fails (for example with EINVAL on a kernel built without THP support), a
debug-level log message is emitted. NUMA-aware allocations are skipped (existing !numa
guard). Guarded by defined(MADV_HUGEPAGE) for portability.
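
For readers who have not opened the diff, the shape of such a guarded, advisory call can be sketched as follows. This is not the PR's code: the function name `hint_huge_pages`, its return convention, and the use of `stderr` in place of the project's logging are assumptions made for illustration.

```cpp
#include <sys/mman.h>
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper, not taken from the PR.
// Returns 0 if the hint was accepted, the errno value if madvise failed,
// and -1 if the hint was skipped (NUMA in use, or MADV_HUGEPAGE unavailable).
static int hint_huge_pages(void * addr, std::size_t len, bool numa) {
#if defined(__linux__) && defined(MADV_HUGEPAGE)
    if (numa) {
        return -1; // mirrors the "!numa" guard described above
    }
    if (madvise(addr, len, MADV_HUGEPAGE) != 0) {
        const int err = errno;
        // Advisory only: report the failure and carry on.
        std::fprintf(stderr, "debug: madvise(MADV_HUGEPAGE) failed: %s\n", std::strerror(err));
        return err;
    }
    return 0;
#else
    (void) addr; (void) len; (void) numa;
    return -1; // compiled out on platforms without MADV_HUGEPAGE
#endif
}
```

Because the result is never treated as fatal, a caller can invoke this right after mmap succeeds and ignore the return value.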

Tested on Linux x86_64 with THP=madvise. Neutral on an unloaded machine (pages stay resident);
expected to reduce re-fault latency spikes under memory pressure.

Complements #21821, which adds HugeTLB support; that approach requires pre-allocated pages in
the HugeTLB pool, while THP requires no system configuration.

Issue madvise(MADV_HUGEPAGE) on the read-only file mapping used for
model weights on Linux. For a 1 GB model this drops the potential
page count from ~262K 4KB pages to ~512 2MB pages, reducing TLB
pressure and (more importantly) reducing the number of re-faults
when pages get evicted under memory pressure.

No-op on kernels where THP is disabled. In 'madvise' mode (the
common modern default for desktop distros), THP is opt-in and
requires the caller to ask. Guarded by defined(MADV_HUGEPAGE) so it
compiles cleanly on non-Linux.
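
To see which THP mode a machine is in, one can read /sys/kernel/mm/transparent_hugepage/enabled, where the kernel marks the active mode with brackets (for example `always [madvise] never`). A small sketch of that check; the function names `parse_thp_mode` and `read_thp_mode` are illustrative and not part of this change:

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Illustrative helper: extracts the bracketed token from a line such as
// "always [madvise] never". Returns "" if no bracketed token is found.
static std::string parse_thp_mode(const std::string & line) {
    const std::size_t open = line.find('[');
    if (open == std::string::npos) {
        return "";
    }
    const std::size_t close = line.find(']', open + 1);
    if (close == std::string::npos) {
        return "";
    }
    return line.substr(open + 1, close - open - 1);
}

// Reads the system-wide THP mode. Returns "" if the file cannot be read
// (for example on non-Linux systems or kernels without THP).
static std::string read_thp_mode() {
    std::ifstream f("/sys/kernel/mm/transparent_hugepage/enabled");
    std::string line;
    if (!std::getline(f, line)) {
        return "";
    }
    return parse_thp_mode(line);
}
```

A result of "madvise" is the case this change targets; "never" means the hint will not produce huge pages.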

Benchmark on a Skylake-SP VM, Bonsai-8B Q1_0, -fa on -ctk q8_0
-ctv q8_0 -t 12 -ub 128: neutral on this machine (~9.5 t/s tg128
both before and after) because the VM isn't memory-constrained.
The change is intended for systems where the mapping does get
evicted and re-faulted under pressure.
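
Since tokens per second did not move on this machine, page-fault counters may be a more direct signal on a memory-constrained system. A rough sketch using getrusage; the `FaultCounts` struct and `fault_counts` helper are hypothetical names, and the counters are process-wide, so they include faults unrelated to the weight mapping:

```cpp
#include <sys/resource.h>
#include <sys/time.h>

// Hypothetical helper type: process-wide fault counters at one point in time.
struct FaultCounts {
    long minor; // faults served without I/O (page was still in memory)
    long major; // faults that required I/O (page had to be read back in)
};

// Snapshot the current process's fault counters via getrusage(2).
// Returns zeros if the call fails.
static FaultCounts fault_counts() {
    struct rusage ru {};
    if (getrusage(RUSAGE_SELF, &ru) != 0) {
        return { 0, 0 };
    }
    return { ru.ru_minflt, ru.ru_majflt };
}

// Usage idea: take one snapshot before a generation run and one after,
// then compare the deltas with and without the MADV_HUGEPAGE hint.
```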
@doctorjei

doctorjei commented Apr 19, 2026

I want to reinforce that these are complementary and not redundant; THP and pre-allocated huge pages can serve different goals / solve different problems.

Also related to #2251, #12444, and #7420.
