
feature: GPU/CUDA support? #69

@tensiondriven

Description


Please close this if it's off-topic or ill-informed.

LocalAI seems to be focused on providing an OpenAI-compatible API for models running on the CPU (llama.cpp, ggml). I was excited about this project because I want to use my local models with projects like BabyAGI, AutoGPT, LangChain, etc., which typically either support only the OpenAI API or support OpenAI first.
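For context, the appeal of the OpenAI-compatible API is that existing clients only need their base URL changed. A minimal sketch of what that looks like from Python; the port (8080) and model name are assumptions on my part, not something LocalAI necessarily ships with:

```python
# Sketch only: point the standard OpenAI client at a local endpoint.
# The port (8080) and model name ("ggml-gpt4all-j") are assumptions;
# substitute whatever your LocalAI instance actually exposes.
import openai

openai.api_base = "http://localhost:8080/v1"  # LocalAI instead of api.openai.com
openai.api_key = "not-needed"                 # no real key required locally

response = openai.ChatCompletion.create(
    model="ggml-gpt4all-j",
    messages=[{"role": "user", "content": "Hello from a local model!"}],
)
print(response["choices"][0]["message"]["content"])
```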

I know it would add a lot of work to support every model under the sun across CPU, CUDA, ROCm, and Triton, so I'm not proposing that, but it seems that leaving CUDA off the table really limits this project's usability.

Am I simply wrong, and typical .pt / safetensors models will work fine with LocalAI, or is this a valid concern?

When I read about LocalAI on GitHub, I imagined this project was more of a "dumb adapter": an HTTP server that would route requests to models running inside projects like text-generation-webui or others. But I see it actually does the work to stand up the models itself, which is impressive.

Perhaps (either in this project or a separate one) it would be useful to provide a service that presents an HTTP API / CLI and has a simple plugin architecture, so that multiple models with different backends/requirements can interface with it. That way this project could support a variety of models without suffering the integration and maintenance headaches that projects like text-generation-webui are taking on. A rough sketch of what I mean follows below.
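To make the idea concrete, here is a very rough sketch of the kind of plugin interface I have in mind. Everything here is hypothetical (names, endpoints, the external server's API), not anything LocalAI or text-generation-webui actually exposes:

```python
# Hypothetical sketch of a pluggable backend registry, not LocalAI's actual design.
from typing import Callable, Dict

# A backend plugin is just "take a prompt and options, return generated text".
Backend = Callable[[str, dict], str]

_BACKENDS: Dict[str, Backend] = {}

def register_backend(name: str, backend: Backend) -> None:
    """Register a backend (llama.cpp, a CUDA runner, text-generation-webui, ...)."""
    _BACKENDS[name] = backend

def complete(backend_name: str, prompt: str, **options) -> str:
    """Route an incoming completion request to whichever backend serves the model."""
    return _BACKENDS[backend_name](prompt, options)

# Example plugin: forward the request to an external server that already
# handles CUDA, so this layer stays a "dumb adapter".
def webui_proxy(prompt: str, options: dict) -> str:
    import requests  # assumes a hypothetical /generate endpoint on the external server
    resp = requests.post("http://localhost:5000/generate",
                         json={"prompt": prompt, **options})
    return resp.json()["text"]

register_backend("text-generation-webui", webui_proxy)
```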

Metadata

Labels: up for grabs (tickets that no one is currently working on)
