Experimental in-database LLM and deep-learning inference via a native relational prediction operator, built on top of DuckDB, a high-performance analytical database system.
You can get an up-and-running instance of iPDb either by building from source or by building with Docker.
This step is optional if you are only running inference with LLMs (via APIs or local models).
First, configure the ONNX Runtime library, which we'll use for pre-trained models stored in .onnx format. You have multiple options to get an up-and-running instance of the ONNX Runtime library.
Detailed instructions are available here https://onnxruntime.ai/docs/build/inferencing.html.
Download and extract a compatible release for your development platform from the onnxruntime GitHub releases.
The above link points to version 1.19.2 rather than the latest release, as this is the version tested to work (on linux-amd64).
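For example, on linux-amd64 the tested release can be fetched and extracted roughly as follows; the exact asset name is an assumption based on the onnxruntime release naming and should be verified on the releases page:

```sh
# Download and extract the tested ONNX Runtime release (asset name may differ)
wget https://github.com/microsoft/onnxruntime/releases/download/v1.19.2/onnxruntime-linux-x64-1.19.2.tgz
tar -xzf onnxruntime-linux-x64-1.19.2.tgz
```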
For both of the above options, set the ONNX Runtime installation path as an environment variable.
```sh
export ONNX_INSTALL_PREFIX=<onnx_runtime_installed_path>
```

This step is optional if you are only running inference with LLM APIs or pre-trained ONNX models.
First, fetch and build the llama.cpp library, which we'll use for running inference with local large language models. You have multiple options to get an up-and-running instance of the llama.cpp library.
Clone the llama.cpp repository.
```sh
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
```

Build the project using CMake (CPU build).
```sh
cmake -B build
cmake --build build --config Release -j 8
```
Release builds are preferred to achieve the best inference performance.
Please refer to the detailed build guide in the llama.cpp repository for other backends (e.g., GPU), debug builds, and troubleshooting.
Set the llama.cpp installation path as an environment variable.
```sh
export LIBLLAMA_INSTALL_PREFIX="<llama_cpp_repo>/build"
```

DuckDB uses a make script for its builds. We have extended it with additional parameters to configure iPDb.
```sh
make debug GEN=ninja -j12 CORE_EXTENSIONS='httpfs' ENABLE_PREDICT=1 PREDICTOR_IMPL=llama_cpp ENABLE_LLM_API=True DISABLE_SANITIZER=1
```
- `GEN=ninja -j12` (optional; drop if you don't have ninja set up) uses ninja for build parallelization (with `-j12` for 12 workers).
- `CORE_EXTENSIONS='httpfs'` enables the `httpfs` extension used for HTTPS communication in API LLM calls.
- `ENABLE_PREDICT=1` enables the ML extension.
- `PREDICTOR_IMPL=onnx` chooses the internal ML platform. Available options:
  - `onnx` - use ONNX Runtime to infer pre-trained `.onnx` models (Step 1.1 required).
  - `llama_cpp` - use llama.cpp to infer LLMs in `GGUF` format.
- `ENABLE_LLM_API=1` enables LLM calling with OpenAI-compatible APIs over the network (sets the `CORE_EXTENSIONS='httpfs'` option automatically).
The previously available `torchscript` option, which inferred pre-trained PyTorch models exported with TorchScript, is [DEPRECATED] and no longer supported.
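For example, if you are targeting ONNX inference instead of llama.cpp, a release build (preferred for inference performance, as noted above) could be configured along these lines; this sketch assumes DuckDB's standard `release` make target:

```sh
# Release build with the ONNX Runtime backend (Step 1.1 required)
make release GEN=ninja -j12 ENABLE_PREDICT=1 PREDICTOR_IMPL=onnx
```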
The majority of publicly available remote LLMs require an API key from the respective provider to use their capabilities. iPDb lets users define the LLM API key using the CREATE SECRET statement, where users can define either in-memory or persisted API keys that can be reused with different models. Please acquire the respective key and use the following SQL syntax for defining API keys.
```sql
CREATE PERSISTENT SECRET openai_key (TYPE http, bearer_token '<openai_api_key>');
CREATE PERSISTENT SECRET google_key (TYPE http, bearer_token '<google_api_key>');
```

Alternatively, the API key can be set in an environment variable as follows so that iPDb can identify it before inference. However, this limits the models to a single vendor, as only one API key is available.
```sh
export OPENAI_API_KEY="<api_key>"
```

Add the above to your shell config file, e.g., `.bashrc`, for permanent availability.
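If you prefer not to persist the key on disk, a temporary in-memory secret can be created for the current session by dropping the PERSISTENT keyword (following DuckDB's standard secret syntax):

```sql
-- In-memory secret: available only for the current database session
CREATE SECRET openai_key (TYPE http, bearer_token '<openai_api_key>');
```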
Currently, the Docker script is written to build only with ONNX model support. Please build from source to enable LLM inference.
Make sure Docker is set up correctly.
Clone the iPDb repository. It should include a Dockerfile and a .dockerignore.
Build the Docker image (we'll create an image named iPDb):

```sh
docker build -t iPDb .
```

Run the Docker image:
```sh
docker run -it -v <absolute_path_to_data_repo>:/data --name="iPDb_container" iPDb /bin/bash
```

For subsequent runs, you can simply start the stopped container using its name.

```sh
docker start -ai iPDb_container
```

Both of the above commands will open an interactive shell in the Docker container. Here, we mount a data directory where we will store the pre-trained models. If you just want to check that iPDb works, run the above command without `-v <absolute_path_to_mldb_repo>:/data`.
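For example, a quick smoke test without mounting a model directory would be:

```sh
docker run -it --name="iPDb_container" iPDb /bin/bash
```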
Run iPDb:
```sh
./build/debug/iPDb <your_database>
```

Once you have a working iPDb instance, you can experiment with SQL queries that are capable of in-database inference.
Within the iPDb shell,
- Create and populate tables with feature data (say, iris data).
- Upload the model to the database via the `CREATE MODEL` statement.

```sql
CREATE TABULAR MODEL iris_cls PATH `<model_path>` ON TABLE iris OUTPUT (class INTEGER);
```
- Run a prediction query.
```sql
SELECT * FROM PREDICT(iris_cls, iris) AS p WHERE p.class = 2;
```
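Putting these steps together, a minimal end-to-end session might look like the sketch below. The iris table schema, column names, and sample rows are illustrative assumptions; adapt them to the inputs your .onnx model expects.

```sql
-- Illustrative feature table (schema and values are assumptions)
CREATE TABLE iris (
    sepal_length DOUBLE,
    sepal_width  DOUBLE,
    petal_length DOUBLE,
    petal_width  DOUBLE
);
INSERT INTO iris VALUES
    (5.1, 3.5, 1.4, 0.2),
    (6.7, 3.0, 5.2, 2.3);

-- Register the pre-trained ONNX model against the table
CREATE TABULAR MODEL iris_cls PATH `<model_path>` ON TABLE iris OUTPUT (class INTEGER);

-- Predict and filter on the predicted class
SELECT * FROM PREDICT(iris_cls, iris) AS p WHERE p.class = 2;
```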
Syntax trees and examples of both the CREATE MODEL and PREDICT statements are available here (opened via draw.io).
Make sure you have built iPDb with the options that enable remote LLM calling: `ENABLE_PREDICT=1` and `ENABLE_LLM_API=1`.
Additionally, make sure that either a SECRET (refer to the "API LLM Calling" section) or the OPENAI_API_KEY environment variable is correctly set with the OpenAI API key.
Within the iPDb shell,
- Create and populate tables with data (say, a `job` table with a `description` column containing a job listing document).
- Upload the model to the database via the `CREATE MODEL` statement.

```sql
CREATE LLM MODEL o4_mini PATH 'o4-mini' ON PROMPT API 'https://api.openai.com' SECRET openai_key;
```
Notice that PATH accepts a model name recognized by the API (e.g., o4-mini, gpt-4.1).
Furthermore, the model is registered as ON PROMPT; this lets the query execution pipeline infer the LLM's input/output columns from the prompt during query execution.
Additionally, the user should set API to the base URL of the respective LLM provider.
- Run a prediction query.
```sql
SELECT * FROM LLM o4_mini (PROMPT 'extract the {s:location} and {d:salary} for job {description}', job);
```
Here, notice that we have an additional PROMPT clause within the PREDICT statement. Inside the prompt, the user can define input columns by referencing each column in braces, i.e., {column} (e.g., {description} in the above query). Similarly, the user can define output columns in braces in the format {data_type:column_name} (e.g., {s:location} returns a VARCHAR column with the location and {d:salary} returns an INTEGER column with the salary).
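As a minimal end-to-end sketch, assuming the secret from the "API LLM Calling" section and an illustrative `job` table (the schema and listing text below are assumptions):

```sql
-- Illustrative input table (schema and content are assumptions)
CREATE TABLE job (id INTEGER, description VARCHAR);
INSERT INTO job VALUES
    (1, 'Senior data engineer in Berlin, 90000 EUR per year, hybrid work.');

-- Register the remote LLM (requires the openai_key secret or OPENAI_API_KEY)
CREATE LLM MODEL o4_mini PATH 'o4-mini' ON PROMPT API 'https://api.openai.com' SECRET openai_key;

-- Extract structured columns from the free-text description
SELECT * FROM LLM o4_mini (PROMPT 'extract the {s:location} and {d:salary} for job {description}', job);
```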
- More details on building DuckDB from source are here.
- Follow the LibTorch tutorial on native model inference.
- If you are new to deep-learning inference, start with beginner-friendly Python-based training and inference examples here.
- An introduction to native C++ model exporting and inference with TorchScript is available here. The most feature-rich version of iPDb is implemented via ONNX Runtime instead of LibTorch, but this guide explains the workflow better than the available ONNX examples (the concepts are the same).
- The high-level DuckDB execution model is explained in the official documentation and these slides.
- Read the source for the implementation of iPDb. Extensive documentation of this extension is WIP.
The native relational prediction operator is realized using the following systems and frameworks.
- DuckDB (1.3.0) - Relational database: https://duckdb.org
- ONNX Runtime (1.19) - Efficient, generalizable deep learning runtime: https://onnx.ai
- llama.cpp (latest) - Local LLM inference https://github.com/ggml-org/llama.cpp
Please refer to the Official DuckDB repository for documentation of the base source code (https://github.com/duckdb/duckdb).