ModelFlex is a small full‑stack application that helps you optimize ML models for deployment. It supports TensorFlow, PyTorch and ONNX formats and applies device‑specific optimizations (CPU, GPU, TPU when applicable).
This README focuses on how to run the project locally, what the backend API exposes, and troubleshooting tips specific to this repository layout.
- `frontend/` — React + Vite frontend (dev server runs on :5173)
- `backend/` — FastAPI backend (ASGI app at `app.main:app`)
- `backend/app/main.py` — API endpoints: `/api/optimize`, `/api/download/<filename>`
- `backend/app/model_optimizer.py` — model optimization logic
- `backend/run.py` — simple runner that starts Uvicorn
- `backend/requirements.txt` — Python dependencies
- `README.md` — this file
- Backend — create a venv and install dependencies:

```powershell
cd backend
python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
pip install -r requirements.txt
```

- Start the backend (from `backend/`):

```powershell
# either using the run helper
python run.py

# or directly with uvicorn
python -m uvicorn app.main:app --reload
```

The backend will be available at http://127.0.0.1:8000.
- Frontend:

```powershell
cd frontend
npm install
npm run dev
```

The frontend dev server defaults to http://localhost:5173 and expects the backend at http://127.0.0.1:8000 (CORS is configured for :5173).
`POST /api/optimize`

- Content-Type: `multipart/form-data`
- Form fields:
  - `file` — the uploaded model file
  - `target_device` — string (one of `cpu`, `gpu`, `tpu`)

Response (200):

```
{
  "optimized_model": "opt_<uuid>.<ext>",
  "metrics": {
    "original_size_mb": float,
    "optimized_size_mb": float,
    "size_reduction_percent": float,
    "original_latency_ms": float,
    "optimized_latency_ms": float
  }
}
```

`GET /api/download/<filename>`

- Returns the optimized model file as an octet stream.
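The two endpoints above can be driven from a short Python script. The sketch below is illustrative and not part of the repository; it assumes the backend is running locally and that the `requests` package is installed. The helper names (`optimize_model`, `download_url`) are invented for this example.

```python
# Minimal client sketch for the ModelFlex API (illustrative, not part of the repo).
# Assumes the backend is reachable at http://127.0.0.1:8000.
import requests

BASE_URL = "http://127.0.0.1:8000"


def download_url(filename: str, base_url: str = BASE_URL) -> str:
    """Build the download URL for an optimized model, mirroring the frontend."""
    return f"{base_url}/api/download/{filename}"


def optimize_model(path: str, target_device: str = "cpu") -> dict:
    """Upload a model file to /api/optimize and return the parsed JSON response."""
    with open(path, "rb") as f:
        resp = requests.post(
            f"{BASE_URL}/api/optimize",
            files={"file": f},
            data={"target_device": target_device},
        )
    resp.raise_for_status()
    return resp.json()


# Example usage (requires the backend to be running):
#   result = optimize_model("model.onnx", "cpu")
#   print(result["metrics"]["size_reduction_percent"])
#   print(download_url(result["optimized_model"]))
```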
- `backend/app/model_optimizer.py` currently attempts real optimizations for TensorFlow, PyTorch, and ONNX. Depending on the model and installed packages, conversion/quantization may be slow and may require additional system libraries (e.g. for GPU support or specific TensorFlow builds).
- TPU optimization requires representative data for full integer quantization. The implementation falls back to FP16 quantization when full integer quantization fails.
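The fallback described above is a generic try-then-degrade pattern. The sketch below illustrates it in isolation; the converter callables are placeholders standing in for the real conversion calls in `model_optimizer.py`, whose exact implementation may differ.

```python
# Sketch of the quantization fallback pattern: try full integer quantization
# first, and fall back to FP16 if it fails. The converter callables are
# placeholders, not the repo's actual functions.

def quantize_with_fallback(convert_full_int, convert_fp16):
    """Return (artifact, mode): "int8" if full-integer conversion succeeds, else "fp16"."""
    try:
        return convert_full_int(), "int8"
    except Exception:
        # Full integer quantization typically fails without a representative
        # dataset; FP16 quantization needs no calibration data.
        return convert_fp16(), "fp16"
```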
- If Uvicorn fails to import the app as `app.main:app`, confirm you are running the command from the `backend/` folder and that `backend` is the current working directory.
- On Windows PowerShell, use `.\.venv\Scripts\Activate.ps1` to activate the virtualenv before installing or running.
- TensorFlow/PyTorch/ONNX installations can be large and sometimes fail on Windows without the right wheel for your Python version. If `pip install -r requirements.txt` errors on a package, try installing that package separately or use a CPU-only variant (for example `pip install tensorflow-cpu`, if available for your platform).
- If quantization fails with "representative_dataset must be specified", the optimizer now includes a synthetic representative dataset, but for best results provide a real sample dataset or adjust the optimizer to accept a dataset path.
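A synthetic representative dataset like the one mentioned above is usually just a generator of random inputs shaped like the model's input tensor. The sketch below shows the general shape of such a generator; the input shape and sample count are illustrative assumptions, not values taken from this repo.

```python
# Sketch of a synthetic representative dataset for full integer quantization.
# TFLite's converter expects a callable that yields lists of input arrays; here
# we generate random float32 samples matching an assumed input shape.
import numpy as np


def make_representative_dataset(input_shape=(1, 224, 224, 3), num_samples=100):
    """Return a zero-argument generator function yielding synthetic samples."""
    def representative_dataset():
        for _ in range(num_samples):
            # One random sample per step, shaped like the model's input.
            yield [np.random.rand(*input_shape).astype(np.float32)]
    return representative_dataset

# With TensorFlow installed, this would be wired up roughly as:
#   converter.representative_dataset = make_representative_dataset()
```

Random data is enough to satisfy the converter, but calibration ranges derived from it are arbitrary, which is why real sample data gives better int8 accuracy.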
- Start the backend and confirm the OpenAPI docs load at http://127.0.0.1:8000/docs.
- Use the frontend to perform an end-to-end flow, or test the backend with curl/httpie:

```powershell
# example using curl (in PowerShell use curl.exe, since bare `curl` is an alias for Invoke-WebRequest)
curl.exe -F "file=@C:\path\to\model.onnx" -F "target_device=cpu" http://127.0.0.1:8000/api/optimize
```

- The FastAPI app saves uploaded files to the `uploads/` folder and writes optimized artifacts to `optimized/`, both inside `backend/`.
- The frontend expects the optimized filename in the JSON response and requests the download from `/api/download/<filename>`.
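For scripted smoke tests it can help to assert on the response shape rather than eyeballing the JSON. The validator below is illustrative and not part of the repo; the field names are taken from the response schema documented above.

```python
# Illustrative validator for the /api/optimize JSON response (not part of the
# repo). Field names follow the documented response schema.
EXPECTED_METRICS = {
    "original_size_mb",
    "optimized_size_mb",
    "size_reduction_percent",
    "original_latency_ms",
    "optimized_latency_ms",
}


def validate_optimize_response(payload: dict) -> bool:
    """Return True if the payload matches the documented /api/optimize shape."""
    if not isinstance(payload.get("optimized_model"), str):
        return False
    metrics = payload.get("metrics")
    if not isinstance(metrics, dict):
        return False
    # All documented metric fields must be present and numeric.
    return EXPECTED_METRICS <= metrics.keys() and all(
        isinstance(metrics[k], (int, float)) for k in EXPECTED_METRICS
    )
```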
If you want to help improve ModelFlex:
- Open a new branch for your change
- Add tests for new behavior (if applicable)
- Run frontend and backend locally to verify end‑to‑end
- Create a PR with a clear description of changes
License: MIT
Timestamp: October 22, 2025