Add osx-specific python install instructions by inthecloud247 · Pull Request #47 · ggml-org/llama.cpp

inthecloud247 · 2023-03-12T13:04:27Z

The existing instructions won't work on a default macOS 12.4 setup that uses the system Python install. This fixes that.

ggerganov · 2023-03-12T20:29:37Z

I removed the problematic argument in 7c9e54e, so this change should no longer be needed.

…-org#47) *Make sure to read the [contributing guidelines](https://github.com/ggml-org/llama.cpp/blob/master/CONTRIBUTING.md) before submitting a PR*

* feat: add mixed precision dot product implementation and function declaration
* feat: implement mixed precision vector dot product and conversion functions
* fix: update data type handling in matrix multiplication implementation
* fix: adjust row count handling in matrix multiplication implementation for accurate slicing
* fix: optimize matrix multiplication implementation by unroll loop
* update performance tracking for matrix multiplication implementation
* add fetching
* wip
* fix: support F16 * F32 multiplication in is_mul_mat_supported function
* fix: improve src0 fetching logic in vec_dot_product_mixed_impl for better alignment handling
* fix test failure for row width 67
* try fix failed test
* fix: rename aligned_address to align_down for clarity in vector alignment handling
* wip
* qnn fix: update device capabilities for quantized types in qnn-lib to improve compatibility
* fix test failure at width == 193
* fix: replace zero vector initialization with previous vector in mixed dot product implementation
* wip
* fix: improve handling of last vector in mixed dot product implementation
* wip
* wip
* wip
* wip
* Enhance mul_mat_f32 function to support quantized types and improve static assertions
* rename
* Refactor dequantization functions to use npu_device_fp16_t and improve type handling
* Optimize dequantization in dequantize_row_q8_0 by replacing qf32 multiplication with qf16
* Optimize dequantization in dequantize_row_q4_0 by replacing qf32 multiplication with qf16
* Add hvx_vsf_convert_vhf function for improved vector conversion
* add perf logs
* Refactor dequantize_row_q4_0 for alignment
* Update logging in supports_op_impl and supports_op to use ggml_op_desc for better clarity
* Add support for ROPE operation in NPU capabilities and related functions
* Implement ROPE operation in tensor and op_rope, including cache initialization and correction dimension calculations
* enable ROPE by adding operation validation
* add support to freq is null case
* wip
* Refactor rope_f32 to improve indexing by introducing total_planes calculation
* reformat
* Refactor rope_f32 to optimize data access patterns by introducing row and plane pointers
* Add performance tracking to rope_f32 function for enhanced profiling
* Refactor rope_f32 to use a templated implementation
* Refactor rope_impl to replace loop with memcpy for improved performance
* Refactor mul_mat_impl to support quantization as a template parameter
* wip
* wip
* Refactor rope_impl to optimize plane indexing in the processing loop
* Add aligned vector dot product implementation for mixed precision types
* wip
* Enhance matrix multiplication for F32 and F16 types with alignment checks
* Optimize vec_dot_product_mix_aligned_impl for improved performance with additional vector sums
* Add alignment checks for matrix multiplication and vector dot products
* Refactor matrix multiplication to use function pointers for improved readability and maintainability
* Fix alignment check in is_dot_product_aligned to ensure correct vector size handling
* Remove unused f16_to_f32_table parameter from quantization and dequantization functions
* wip
* Add L2 fetch for src1 plane rows in matrix multiplication implementation
* wip
* Refactor hvx_vsf_convert_vhf to accept an additional parameter for flexibility in vector multiplication
* Refactor vec_dot_product_mix_aligned_impl to improve variable naming for clarity
* Refactor load_dual_block_generic and dequantize_row_q4_0 to improve performance
* Refactor vector operation functions to improve clarity and consistency in variable usage
* wip
* wip
* Refactor dequantize_row_q4_0_impl for improved clarity and performance in vector operations
* wip
* Update load_dual_block_generic to use intrinsics
* Refactor load_dual_block_generic and load_qual_block_generic for improved performance and clarity
* wip
* wip
* Optimize dequantize_row_q8_0 for improved performance by unrolling for loop
* wip
* wip
* fix typo

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

Add osx-specific python install instructions

b577e34

ggerganov closed this Mar 12, 2023

dmahurin pushed a commit to dmahurin/llama.cpp that referenced this pull request May 31, 2023

Bugfix: Wrong size of embeddings. Closes ggml-org#47

dc1c906

dmahurin pushed a commit to dmahurin/llama.cpp that referenced this pull request Jun 1, 2023

Bugfix: Wrong size of embeddings. Closes ggml-org#47

d568014

Deadsg pushed a commit to Deadsg/llama.cpp that referenced this pull request Dec 19, 2023

Bugfix: Wrong size of embeddings. Closes ggml-org#47

38f442d

Bearsaerker mentioned this pull request Mar 12, 2025

Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache. #12352

Closed

phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request Apr 28, 2026

iq2_tn: slightly better performance on AVX2 (ggml-org#47)

49cbbc9

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

inthecloud247 wants to merge 1 commit into ggml-org:master from inthecloud247:fix-osx-python
