Llama-4 mapping #20
Conversation
Walkthrough

The pull request extends the tensor mapping configuration in `gguf-py/gguf/tensor_mapping.py` to cover Llama-4.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant U as User
    participant TM as TensorNameMap
    U->>TM: Request tensor mapping for a given key
    alt Lookup in mappings_cfg
        TM->>TM: Retrieve mapping (embed_tokens, lm_head, norm)
    end
    alt Lookup in block_mappings_cfg
        TM->>TM: Retrieve mapping (layers: input_layernorm, self_attn, feed_forward, etc.)
    end
    TM-->>U: Return the mapped configuration
```
Files selected for processing (1): gguf-py/gguf/tensor_mapping.py
@danielhanchen oh sorry I didn't notice this PR. Thanks a lot!! I'll "forward" this PR to upstream llama.cpp (doing a git cherry-pick)
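The cherry-pick workflow mentioned in the comment above can be sketched as follows. This is an illustrative, self-contained demo in a throwaway repository: the branch names, file name, and commit message are placeholders, and in the real workflow the source commit would come from this PR's branch and the target would be a branch tracking upstream llama.cpp.

```shell
# Demo of "forwarding" a commit between branches via git cherry-pick,
# run inside a throwaway repo; every name here is a placeholder.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main
git -c user.email=demo@example.com -c user.name=demo commit -q --allow-empty -m "base"
git checkout -q -b fork                 # stands in for the fork's feature branch
echo "llama-4 tensor mapping" > tensor_mapping.txt
git add tensor_mapping.txt
git -c user.email=demo@example.com -c user.name=demo commit -q -m "Llama-4 mapping"
sha=$(git rev-parse HEAD)               # the commit to forward
git checkout -q main                    # stands in for the upstream branch
git -c user.email=demo@example.com -c user.name=demo cherry-pick "$sha"
git log --oneline -1                    # the forwarded commit now tops main
```

Note that `cherry-pick` creates a new commit with a new hash on the target branch; only the patch content is carried over.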
…better shader parameter handling (ggml-org#20173)

* K quant speedup (#20)
* Basic JIT compilation for mul_mat, get_rows, and scale (#17)
* scale jit working
* preliminary working jit for getrows and mulmat, needs refining
* simplified mul_mat preprocessing switch statement
* get_rows fixes, mul_mat refinement
* formatted + last edits
* removed some extraneous prints
* fixed get_rows, fixed workgroup dispatch in mul_mat. no gibberish
* small fix
* some changes, working
* get_rows and mul_mat jit fixed and working
* Update formatting
* formatting
* Add header

Co-authored-by: Neha Abbas <nehaabbas@ReeseLevines-MacBook-Pro.local>
Co-authored-by: Reese Levine <reeselevine1@gmail.com>

* Start work on all-encompassing shader library
* refactor argmax, set_rows
* Refactor all but flashattention, mat mul
* no gibberish, all k quants added, merged
* vec memory fix
* q6_k matching metal on my machine, tests passing
* Set tile size for q6_k separately
* Separate out fast shaders

Co-authored-by: neha-ha <137219201+neha-ha@users.noreply.github.com>

* Move towards writeBuffer for params
* Move away from multiple buffers for set_rows errors, remove host buffer for parameter buffers, minor cleanups
* Remove extra file
* Formatting

Co-authored-by: neha-ha <137219201+neha-ha@users.noreply.github.com>