
feature: GPU/CUDA support? #69

@tensiondriven

Description


Please close this if it's off-topic or ill-informed.

LocalAI seems to be focused on providing an OpenAI-compatible API for models running on the CPU (llama.cpp, ggml). I was excited about this project because I want to use my local models with projects like BabyAGI, AutoGPT, LangChain, etc., which typically either support only the OpenAI API or support OpenAI first.
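For context, the appeal of the OpenAI-compatible API is that existing clients only need their base URL changed. A minimal sketch of what that looks like from Python; the port (8080) and model name are assumptions on my part, not something LocalAI necessarily ships with:

```python
# Sketch only: point the standard OpenAI client at a local endpoint.
# The port (8080) and model name ("ggml-gpt4all-j") are assumptions;
# substitute whatever your LocalAI instance actually exposes.
import openai

openai.api_base = "http://localhost:8080/v1"  # LocalAI instead of api.openai.com
openai.api_key = "not-needed"                 # no real key required locally

response = openai.ChatCompletion.create(
    model="ggml-gpt4all-j",
    messages=[{"role": "user", "content": "Hello from a local model!"}],
)
print(response["choices"][0]["message"]["content"])
```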

I know it would add a lot of work to support every model under the sun across CPU, CUDA, ROCm, and Triton, so I'm not proposing that, but it seems that leaving CUDA off the table really limits this project's usability.

Am I simply wrong, and typical .pt / safetensors models will work fine with LocalAI, or is this a valid concern?

When I read about LocalAI on GitHub, I imagined this project was more of a "dumb adapter": an HTTP server that would route requests to models running inside projects like text-generation-webui or others. But I see it actually does the work to stand up the models itself, which is impressive.

Perhaps (either in this project or a separate one) it would be useful to provide a service that presents an HTTP API / CLI and has a simple plugin architecture, so that multiple models with different backends/requirements can interface with it. That way this project could support a variety of models without suffering the integration and maintenance headaches that projects like text-generation-webui are taking on. A rough sketch of what I mean follows below.
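To make the idea concrete, here is a very rough sketch of the kind of plugin interface I have in mind. Everything here is hypothetical (names, endpoints, the external server's API), not anything LocalAI or text-generation-webui actually exposes:

```python
# Hypothetical sketch of a pluggable backend registry, not LocalAI's actual design.
from typing import Callable, Dict

# A backend plugin is just "take a prompt and options, return generated text".
Backend = Callable[[str, dict], str]

_BACKENDS: Dict[str, Backend] = {}

def register_backend(name: str, backend: Backend) -> None:
    """Register a backend (llama.cpp, a CUDA runner, text-generation-webui, ...)."""
    _BACKENDS[name] = backend

def complete(backend_name: str, prompt: str, **options) -> str:
    """Route an incoming completion request to whichever backend serves the model."""
    return _BACKENDS[backend_name](prompt, options)

# Example plugin: forward the request to an external server that already
# handles CUDA, so this layer stays a "dumb adapter".
def webui_proxy(prompt: str, options: dict) -> str:
    import requests  # assumes a hypothetical /generate endpoint on the external server
    resp = requests.post("http://localhost:5000/generate",
                         json={"prompt": prompt, **options})
    return resp.json()["text"]

register_backend("text-generation-webui", webui_proxy)
```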

Metadata

Labels: up for grabs (tickets that no one is currently working on)
