inference_engine

A small research-oriented inference engine for GPT-style models.

This repo builds a Python-callable shared library (the x module) that loads safetensors weights and runs a simple transformer stack.

✅ Goals

Build the C++ inference engine as a shared library (x.so) via CMake.
Load a model from weights/<model_name> (e.g., weights/gpt2).
Expose a minimal Python API via pybind11.

🚀 Quick start (run the included test)

From the repo root:

python test.py

This will:

build the shared module (if it hasn’t been built yet)
load weights/gpt2
run a short generation and print the returned tokens.
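The build-if-missing step can be sketched as below. This is a minimal illustration, not the contents of test.py: the helper names build_commands and ensure_built are hypothetical, and the sketch assumes x.so lives in either the repo root or build/, as described in the build section.

```python
import subprocess
from pathlib import Path


def build_commands(repo_root: Path) -> list[list[str]]:
    """Return configure and build commands equivalent to the manual build steps."""
    build_dir = repo_root / "build"
    return [
        # -S/-B select the source and build directories without needing `cd build`.
        ["cmake", "-S", str(repo_root / "model_app"), "-B", str(build_dir)],
        ["cmake", "--build", str(build_dir), "--config", "Release"],
    ]


def ensure_built(repo_root: Path, run=subprocess.check_call) -> bool:
    """Build x.so only if it is missing; return True when a build was triggered."""
    if (repo_root / "x.so").exists() or (repo_root / "build" / "x.so").exists():
        return False
    (repo_root / "build").mkdir(exist_ok=True)
    for cmd in build_commands(repo_root):
        run(cmd)  # raises CalledProcessError if cmake fails
    return True
```

After ensure_built(Path(".")) returns, the module can be imported and used as shown in the Python usage section.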

🧱 Build the shared library from scratch

1) Prerequisites

CMake (>= 3.14)
A C++17 compiler (g++, clang++, etc.)
Python 3 with development headers (e.g., python3-dev / python3-devel)
pybind11 (can be installed via pip)

Install pybind11:

pip install pybind11

2) Clean build directory (from repo root)

rm -rf build
mkdir -p build
cd build
cmake ../model_app
cmake --build . --config Release

Optional: enable AMX + OpenMP

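If you have an AMX-capable CPU and want to exercise the AMX-accelerated tiled GEMM path, pass compiler flags that enable AMX and OpenMP when configuring. GCC and Clang expose AMX through -mamx-tile plus the data-type flags -mamx-int8 and -mamx-bf16. For example:

cmake -DCMAKE_CXX_FLAGS="-fopenmp -mavx512f -mavx512bw -mavx512vbmi -mavx512vnni -mamx-tile -mamx-int8 -mamx-bf16" ../model_app
cmake --build . --config Release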
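Before enabling the AMX path, it can help to confirm that the host advertises AMX at all. The sketch below is an assumption-laden helper that is not part of this repo: it reads the Linux /proc/cpuinfo flags line and looks for the kernel's amx_tile, amx_int8, and amx_bf16 names. The function names amx_flags and host_amx_flags are hypothetical.

```python
from pathlib import Path

# Linux /proc/cpuinfo flag names for the AMX feature bits.
AMX_FLAGS = ("amx_tile", "amx_int8", "amx_bf16")


def amx_flags(cpuinfo_text: str) -> dict[str, bool]:
    """Map each AMX flag to whether the first 'flags' line advertises it."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            present = set(line.split(":", 1)[1].split())
            return {flag: flag in present for flag in AMX_FLAGS}
    return {flag: False for flag in AMX_FLAGS}


def host_amx_flags(path: str = "/proc/cpuinfo") -> dict[str, bool]:
    """Check the current host; reports all False when the file is absent (non-Linux)."""
    p = Path(path)
    if not p.exists():
        return {flag: False for flag in AMX_FLAGS}
    return amx_flags(p.read_text())
```

If host_amx_flags() reports amx_tile as False, a binary compiled with the AMX flags will likely fault on that machine once it reaches an AMX instruction, so the default build is the safer choice there.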

The build produces the shared module x.so in build/ (and, in this repo layout, in the repo root).

Note: CMakeLists.txt sets the target name to x and forces PREFIX "" so the output is x.so (not libx.so).
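The empty prefix matters because Python resolves `import x` by looking for a file named x followed by one of its known extension suffixes; on Linux CPython builds that list includes plain ".so", so x.so is found while libx.so would not be. The sketch below illustrates that lookup with a hypothetical helper, find_extension, which is not part of this repo.

```python
import importlib.machinery
from pathlib import Path
from typing import Optional


def find_extension(module_name: str, directory: Path) -> Optional[Path]:
    """Return the first file in `directory` that Python would load as `module_name`."""
    for suffix in importlib.machinery.EXTENSION_SUFFIXES:
        candidate = directory / f"{module_name}{suffix}"
        if candidate.is_file():
            return candidate
    return None
```

To import the copy in build/ rather than the repo root, one option is to put that directory first on sys.path (for example, sys.path.insert(0, "build")) before running `from x import engine`.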

🔁 Rebuild after changes (update / test)

If you modify any C++ source (e.g., model_app/src/** or headers in model_app/include/**), an incremental build from the existing build directory is enough:

cd build
cmake --build . --config Release

If you add new source files or change CMake configuration, rerun CMake:

cd build
cmake ../model_app
cmake --build . --config Release

🧪 How to use the shared library from Python

Example (similar to test.py):

from x import engine

# Create the engine
eng = engine()

# Initialize with a model name and the path to its directory under `weights/`
eng.initialize("gpt2", "./weights/gpt2")

# Run generation
tokens = eng.generate("Hello", max_tokens=16)
print(tokens)

Model weights

This repo includes a weights/gpt2/ directory containing a sample quantized GPT-2 model. To use a different model, add a folder under weights/ (or replace the existing one) that follows the same file structure.
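A quick pre-flight check on a new model folder can catch a wrong path before the engine is initialized. The sketch below only assumes that a model directory holds at least one .safetensors file; the exact file structure the loader expects is not spelled out here, and list_safetensors is a hypothetical helper rather than part of the x module.

```python
from pathlib import Path


def list_safetensors(model_dir: str) -> list[Path]:
    """Return the sorted .safetensors files in a model directory, or raise if none exist."""
    root = Path(model_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"model directory not found: {root}")
    files = sorted(root.glob("*.safetensors"))
    if not files:
        raise FileNotFoundError(f"no .safetensors files in {root}")
    return files
```

Calling list_safetensors("./weights/gpt2") before eng.initialize(...) gives a clear Python-side error for a missing or empty folder.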

📌 Notes

The current loader supports safetensors files and relies on a lightweight header parser in model_app/src/loader/loader.cpp.
The runtime is intentionally minimal and primarily for experimentation.
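For orientation, the safetensors header that the loader parses has a simple layout: an 8-byte little-endian unsigned length, followed by that many bytes of UTF-8 JSON describing each tensor's dtype, shape, and data offsets. The Python sketch below reads that header for inspection; it illustrates the file format only and is not a port of loader.cpp. The function names are hypothetical.

```python
import json
import struct


def read_safetensors_header(path: str) -> dict:
    """Parse the JSON header: 8-byte little-endian length, then that many bytes of JSON."""
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) != 8:
            raise ValueError("file too short to contain a safetensors header length")
        (header_len,) = struct.unpack("<Q", prefix)
        raw = f.read(header_len)
        if len(raw) != header_len:
            raise ValueError("truncated safetensors header")
    return json.loads(raw)


def tensor_summary(header: dict) -> dict:
    """Map tensor name -> (dtype, shape), skipping the optional __metadata__ entry."""
    return {
        name: (info["dtype"], tuple(info["shape"]))
        for name, info in header.items()
        if name != "__metadata__"
    }
```

Running tensor_summary(read_safetensors_header(path)) on a file under weights/ lists the tensor names, dtypes, and shapes, which is a convenient way to compare a new model's layout against the bundled one.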
