This repository contains the POPROX recommender code — the end-to-end logic for producing recommendations using article data, user histories and profiles, and trained models.
This repository includes a devcontainer configuration that we recommend using for development and testing of the recommender code. It is not a good solution for running evaluations (see below for non-container setup), but is the easiest and most reliable way to set up your development environment across platforms.
To use the devcontainer, you need:
- VS Code (other editors supporting DevContainer may also work, but this is the best-supported and best-tested).
- Docker (Podman or other container engines will probably also work, but we test with Docker).
With those installed, open the repository in VS Code, and it should prompt you to re-open in the dev container; if it does not, open the command palette (Ctrl+Shift+P) and choose “Dev Containers: Rebuild and Reopen in Container”.
On Linux, install the Docker Engine, and add your user to the docker
group so you can create containers without root.
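On most distributions that looks something like the following (the exact group setup, and whether you need to log out and back in afterwards, can vary by distribution):

$ sudo usermod -aG docker $USER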
On Windows, install Docker Desktop, Rancher Desktop, or similar.
On macOS, you can install Docker Desktop or Rancher Desktop as on Windows, or you can use Colima, which we recommend for simplicity and licensing clarity. To install and use Colima:
$ brew install colima docker
$ colima start -m 4

It should also be possible to use Lima directly, but we have not tested or documented support for that.
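Once Colima (or Lima) is running, you can confirm that the docker CLI can reach it before opening the devcontainer:

$ docker info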
We manage software environments for this repository with uv, and model and
data files with dvc. The uv.lock file provides a locked set of dependencies
for reproducibly running the recommender code on Linux and macOS (we use the
devcontainer for development on Windows).
The devcontainer automatically installs the development environment. If you want to manually
install the software, first install uv (see the install instructions), then run:
$ uv sync --group cpu
Resolved 304 packages in 10ms
Built poprox-recommender @ file:///Users/mde48/POPROX/poprox-recommender
Built poprox-concepts @ git+https://github.com/CCRI-POPROX/poprox-concepts.git@d0e27c90f6eddcc4f041e5d871ff4a13c2ec70f7
Built antlr4-python3-runtime==4.9.3
Built pylatex==1.4.2
Prepared 223 packages in 19.45s
Installed 278 packages in 1.80s
Note: if you have a CUDA-enabled Linux system, you can use the cuda dependency
group to get CUDA-enabled PyTorch for POPROX batch inference and model
training. To install it, run:
$ uv sync --group cuda

The devcontainer also automatically activates the environment in its terminal. To activate it manually, run the following in each shell session:
$ . .venv/bin/activate

You can also run things from inside the environment with uv run:
$ uv run dvc pull

A few useful commands for the terminal:

- Run the tests:

  $ pytest tests
To get the data and models, there are two steps:
1. Obtain the credentials for the S3 bucket and put them in .env (the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY), as in the example below.
2. Run dvc pull.
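For example, the .env file might look like the following (placeholder values shown here; substitute the actual credentials you were given):

AWS_ACCESS_KEY_ID=AKIAXXXXXXXXEXAMPLE
AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxxxxxEXAMPLE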
Local endpoint testing requires building and running the Docker image:
$ docker buildx build -t poprox-recommender:test .
$ docker run -d -p 9000:8080 --name=recommender poprox-recommender:test

You can then send a request to the endpoint:
$ python scripts/send-request.py -p 9000

Pass the -h option to send-request.py to see its command-line options.
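If you want to exercise the endpoint without the helper script, and assuming the image uses the standard AWS Lambda runtime interface emulator (which the 9000:8080 port mapping suggests), a raw invocation looks roughly like this, where request.json is a hypothetical file containing a recommendation request body (see scripts/send-request.py for how a valid request is built):

$ curl -X POST http://localhost:9000/2015-03-31/functions/function/invocations -d @request.json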
The default setup for this package is CPU-only, which works for basic testing
and deployment but is very inefficient for evaluation. The current models work
on both CUDA (Linux with NVIDIA cards) and MPS (macOS on Apple Silicon). To
make use of a GPU, install with the cuda dependency group (plus the eval group,
which is included by default). Run the evaluation with dvc:
$ dvc repro measure-mind-small

You can also run measure-mind-val or measure-mind-subset.
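Before starting a long evaluation run, it can be worth confirming that PyTorch actually sees your accelerator; one quick check from inside the environment (not a script that ships with this repository) is:

$ uv run python -c "import torch; print('cuda:', torch.cuda.is_available(), 'mps:', torch.backends.mps.is_available())"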
Timing information for generating recommendations with the MIND validation set:
| Machine | CPU | GPU | Rec. Time | Rec. Power | Eval Time |
|---|---|---|---|---|---|
| Cruncher | EPYC 7662 (2 GHz) | A40 (CUDA) | 45m¹ | 418.5 Wh | 24m |
| Screamer | i9 14900K (3.2 GHz) | 4090 (CUDA) | 28m16s² | | 14m |
| Ranger | Apple M2 Pro | - | <20hr³ | | 30m³ |
| Ranger | Apple M2 Pro | M2 (MPS) | <12hr³ | | |
Footnotes:
1. Using 12 worker processes
2. Using 8 worker processes
3. Estimated based on early progress; not run to completion
Model training, along with its preprocessing stages, is also orchestrated by DVC. To avoid accidentally retraining models, the training stages are frozen; we unfreeze them when we need to retrain.
To re-train the model, run:
$ dvc pull
$ dvc unfreeze models/dvc.yaml:train-nrms-mind
Modifying stage 'train-nrms-mind' in 'models/dvc.yaml'
$ dvc repro models/dvc.yaml:train-nrms-mind
$ dvc freeze models/dvc.yaml:train-nrms-mind
Modifying stage 'train-nrms-mind' in 'models/dvc.yaml'
$ git add models
$ git commit
$ dvc push
$ git push

If you are not using the devcontainer, set up pre-commit to make sure that
code formatting rules are applied as you make changes:

$ pre-commit install

If you update the dependencies in poprox-recommender, or add code that requires
a newer version of poprox-concepts, you need to regenerate the lock file with
uv sync. To update just poprox-concepts, run:
$ uv sync -P poprox-concepts

To update all dependencies, run:
$ uv sync -U

The devcontainer automatically configures several VS Code extensions and
settings; we also provide an extensions.json listing recommended extensions
for this repository.