This repository was archived by the owner on Jun 24, 2024. It is now read-only.
I'm trying to run LLaMA on a Mac using Metal, but I noticed that the accelerators doc states that Metal cannot be used when feeding in a prompt with more than one token. Is this an underlying limitation of ggml, or of llm?
I'd love to help enable this, but I'm not sure where to begin.