This repository was archived by the owner on Jun 24, 2024. It is now read-only.
According to the comments in the pull, it should trade a small amount of performance for lower memory usage. However, at least one user commented that they actually saw higher memory use (model size not specified).
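For a rough sense of the scale involved: one common way such a tradeoff shows up is in the element type used for the key/value memory, where halving the element size (e.g. f32 → f16) halves the cache footprint at some compute cost. The sketch below is a back-of-the-envelope calculation only; the parameters (roughly 7B-like) and the f16 interpretation are assumptions, not taken from the linked PR or code.

```rust
// Hypothetical illustration of KV-cache memory footprint vs element size.
// The dimensions below are illustrative, not read from the linked lib.rs.
fn kv_cache_bytes(n_layer: usize, n_ctx: usize, n_embd: usize, elem_size: usize) -> usize {
    // Two tensors (keys and values), each holding
    // n_layer * n_ctx * n_embd elements of `elem_size` bytes.
    2 * n_layer * n_ctx * n_embd * elem_size
}

fn main() {
    let (n_layer, n_ctx, n_embd) = (32, 512, 4096);
    let f32_bytes = kv_cache_bytes(n_layer, n_ctx, n_embd, 4);
    let f16_bytes = kv_cache_bytes(n_layer, n_ctx, n_embd, 2);
    println!("f32 KV cache: {} MiB", f32_bytes / (1024 * 1024)); // 512 MiB
    println!("f16 KV cache: {} MiB", f16_bytes / (1024 * 1024)); // 256 MiB
}
```

If the upstream change is along these lines, any *increase* in memory use reported by users would have to come from somewhere else (scratch buffers, allocator behavior, measurement differences), which is worth keeping in mind when comparing reports.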
This may be something to keep an eye on: ggml-org/llama.cpp#439
Looks like the corresponding code is here: https://github.com/rustformers/llama-rs/blob/bf7bdbcfff3114dcbdafb6eb7eed58f04f19b1c3/llama-rs/src/lib.rs#L1203