DebasisX/inference_engine

inference_engine

A small research-oriented inference engine for GPT-style models.

This repo builds a Python-callable shared library (the x module) that loads safetensors weights and runs a simple transformer stack.


✅ Goals

  • Build the C++ inference engine as a shared library (x.so) via CMake.
  • Load a model from weights/<model_name> (e.g., weights/gpt2).
  • Expose a minimal Python API via pybind11.

🚀 Quick start (run the included test)

From the repo root:

python test.py

This will:

  • build the shared module (if it hasn’t been built yet)
  • load weights/gpt2
  • run a short generation and print the returned tokens.
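
The build-if-missing step could be sketched roughly as follows. This is not the contents of test.py, which is not reproduced here: `build_commands` and `ensure_built` are hypothetical helpers, and the module location (build/x.so) and CMake source directory (model_app) are taken from the build instructions in this README.

```python
import subprocess
from pathlib import Path


def build_commands(repo_root):
    """Return the configure and build commands as argument lists."""
    root = Path(repo_root)
    build_dir = root / "build"
    return [
        # Configure: same effect as `cd build && cmake ../model_app`.
        ["cmake", "-S", str(root / "model_app"), "-B", str(build_dir)],
        # Build: same effect as `cmake --build . --config Release`.
        ["cmake", "--build", str(build_dir), "--config", "Release"],
    ]


def ensure_built(repo_root, run=subprocess.check_call):
    """Build x.so only if it is missing; return True if a build was run."""
    module = Path(repo_root) / "build" / "x.so"  # assumed output location
    if module.exists():
        return False
    for cmd in build_commands(repo_root):
        run(cmd)
    return True
```

Passing `run` as a parameter keeps the helper easy to dry-run: `ensure_built(".", run=print)` prints the commands instead of executing them.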

🧱 Build the shared library from scratch

1) Prerequisites

  • CMake (>= 3.14)
  • A C++17 compiler (g++, clang++, etc.)
  • Python 3 with development headers (e.g., python3-dev / python3-devel)
  • pybind11 (can be installed via pip)

Install pybind11:

pip install pybind11

2) Clean build directory (from repo root)

rm -rf build
mkdir -p build
cd build
cmake ../model_app
cmake --build . --config Release

Optional: enable AMX + OpenMP

If you have an AMX-capable CPU and want to exercise the AMX-accelerated tiled GEMM path, pass compiler flags to enable AMX and OpenMP. For example:

cmake -DCMAKE_CXX_FLAGS="-fopenmp -mavx512f -mavx512bw -mavx512vbmi -mavx512vnni -mamx-tile -mamx-int8 -mamx-bf16" ../model_app
cmake --build . --config Release
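
Before enabling these flags, it may help to confirm that the CPU actually reports AMX. The sketch below is not part of this repo: it assumes a Linux host where /proc/cpuinfo lists feature flags such as amx_tile, amx_int8, and amx_bf16, and the helper names are hypothetical.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supports_amx(flags):
    """True if tile support and at least one AMX compute extension are listed."""
    return "amx_tile" in flags and bool({"amx_int8", "amx_bf16"} & flags)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            flags = parse_cpu_flags(f.read())
    except OSError:
        flags = set()  # not Linux, or /proc is unavailable
    print("AMX:", supports_amx(flags), "| AVX-512F:", "avx512f" in flags)
```

A binary compiled with AMX flags will typically crash with an illegal-instruction error on a CPU that lacks them, so a check like this is a cheap guard before switching build configurations.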

Either configuration produces the shared module x.so in build/ (and, in this repo layout, in the repo root).

Note: CMakeLists.txt sets the target name to x and forces PREFIX "" so the output is x.so (not libx.so).
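
The empty prefix matters because `import x` looks for a file named x plus one of the interpreter's extension suffixes, so libx.so would not be found. A minimal sketch of locating the built module and making it importable, assuming it sits in build/ or the repo root (`find_extension` and `add_to_path` are hypothetical helpers, not part of this repo):

```python
import importlib.machinery
import sys
from pathlib import Path


def find_extension(name, search_dirs):
    """Return the first file named `name` + a valid extension suffix, or None."""
    for directory in search_dirs:
        # On Linux builds of CPython, the plain ".so" suffix is typically included.
        for suffix in importlib.machinery.EXTENSION_SUFFIXES:
            candidate = Path(directory) / (name + suffix)
            if candidate.is_file():
                return candidate
    return None


def add_to_path(module_file):
    """Put the module's directory at the front of sys.path and return it."""
    directory = str(Path(module_file).parent.resolve())
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return directory
```

For example, `add_to_path(find_extension("x", ["build", "."]))` followed by `from x import engine` would work from the repo root once the build has run.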


🔁 Rebuild after changes (update / test)

If you modify any C++ source (e.g., model_app/src/** or headers in model_app/include/**):

cd build
cmake --build . --config Release

If you add new source files or change CMake configuration, rerun CMake:

cd build
cmake ../model_app
cmake --build . --config Release
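
`cmake --build` already rebuilds only what changed, so the following is just an illustration of the staleness rule behind it: the module is out of date when any source file is newer than x.so. The helper names and the file patterns are assumptions for this sketch, not part of the repo.

```python
from pathlib import Path


def newest_mtime(directories, patterns=("*.cpp", "*.h", "*.hpp", "CMakeLists.txt")):
    """Return the latest modification time among matching files (0.0 if none)."""
    latest = 0.0
    for directory in directories:
        for pattern in patterns:
            for path in Path(directory).rglob(pattern):
                latest = max(latest, path.stat().st_mtime)
    return latest


def is_stale(module_path, source_dirs):
    """True if the module is missing or older than the newest source file."""
    module = Path(module_path)
    if not module.exists():
        return True
    return newest_mtime(source_dirs) > module.stat().st_mtime
```

A script could call `is_stale("build/x.so", ["model_app"])` and trigger the rebuild commands above only when it returns True.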

🧪 How to use the shared library from Python

Example (similar to test.py):

from x import engine

# Create the engine
eng = engine()

# Initialize by pointing to a model directory under `weights/`
eng.initialize("gpt2", "./weights/gpt2")

# Run generation
tokens = eng.generate("Hello", max_tokens=16)
print(tokens)

Model weights

This repo includes a weights/gpt2/ directory containing a sample quantized GPT-2 model. To use a different model, add a folder under weights/ (or replace the existing one) with the same file structure.
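
A small pre-flight check can catch a misplaced model folder before the engine is initialized. The exact file structure the loader expects is not documented here, so this sketch only assumes that a model directory holds at least one .safetensors file; `safetensors_files` and `resolve_model` are hypothetical helpers.

```python
from pathlib import Path


def safetensors_files(model_dir):
    """Return the sorted .safetensors files directly inside model_dir."""
    directory = Path(model_dir)
    if not directory.is_dir():
        raise FileNotFoundError(f"model directory not found: {directory}")
    return sorted(directory.glob("*.safetensors"))


def resolve_model(model_name, weights_root="weights"):
    """Return (model_name, model_dir) after checking that weights are present."""
    model_dir = Path(weights_root) / model_name
    if not safetensors_files(model_dir):
        raise FileNotFoundError(f"no .safetensors files in {model_dir}")
    return model_name, str(model_dir)
```

The returned pair matches the two arguments shown in the usage example, so `eng.initialize(*resolve_model("gpt2"))` would be one way to wire it in.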


📌 Notes

  • The current loader supports safetensors files and relies on a lightweight header parser in model_app/src/loader/loader.cpp.
  • The runtime is intentionally minimal and primarily for experimentation.
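
For readers unfamiliar with the format the loader parses: a safetensors file starts with an 8-byte little-endian unsigned integer giving the header length, followed by a JSON header that maps each tensor name to its dtype, shape, and byte offsets into the data section. The Python sketch below illustrates that published layout only. It is not a port of model_app/src/loader/loader.cpp, and the function names are hypothetical.

```python
import json
import struct


def read_safetensors_header(data):
    """Parse the JSON header from the raw bytes of a safetensors file.

    Returns (header, data_start), where data_start is the byte offset
    at which tensor data begins.
    """
    (header_len,) = struct.unpack("<Q", data[:8])  # u64, little-endian
    header = json.loads(data[8:8 + header_len].decode("utf-8"))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return header, 8 + header_len


def tensor_bytes(data, header, data_start, name):
    """Return the raw bytes of one tensor; offsets are relative to data_start."""
    begin, end = header[name]["data_offsets"]
    return data[data_start + begin:data_start + end]
```

Each header entry looks like `{"dtype": "F32", "shape": [2], "data_offsets": [0, 8]}`, so a loader can map tensors straight out of the file without deserializing anything beyond the header.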

About

A shared library for running AI model inference on the CPU from Python, backed by optimized C++: tiled-matrix GEMM using AMX and AVX2 intrinsics. Parallel scheduling strategies are planned.
