Prometheus example with torch model by rz4 · Pull Request #98 · AI-ModCon/BaseSIM_APEIRON

rz4 · 2026-04-13T20:30:36Z

Summary

Adds an example harness for a torch implementation of Prometheus' TemporalPredict.
A script is provided to reproduce the model in torch and compare it against the baseline.
The reproduced torch model runs on APEIRON.

Motivation & Context

The baseline Prometheus model was provided as a Keras model,
which is not directly compatible with APEIRON.

Approach

Write the model implementation in Keras, using the npm1_pwr_model.keras model file as the reference.
Train the Keras model from scratch, then compare it with the base model.
Write the model implementation in torch.
Train the torch model from scratch, then compare it with the base model.
Select an early checkpoint from training as the starting model for APEIRON.
Adjust the APEIRON config to detect drift.
Run APEIRON and compare the resulting checkpoint with the base Prometheus model.
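For reviewers who want a picture of the torch side of these steps, here is a minimal sketch of a module whose layers match the TorchTemporalModel summary printed in the training log. The layer names and sizes come from that summary; the forward pass (one output per timestep, suggested by the Y=(N, 10, 1) shapes in the log) and the constructor arguments are assumptions, not necessarily what the PR implements.

```python
import torch
from torch import nn


class TorchTemporalModel(nn.Module):
    """Stacked-LSTM regressor; layer sizes follow the logged module summary."""

    def __init__(self, n_features: int = 12, dropout: float = 0.1) -> None:
        super().__init__()
        self.lstm1 = nn.LSTM(n_features, 128, batch_first=True)
        self.drop1 = nn.Dropout(dropout)
        self.lstm2 = nn.LSTM(128, 64, batch_first=True)
        self.drop2 = nn.Dropout(dropout)
        self.lstm3 = nn.LSTM(64, 32, batch_first=True)
        self.head = nn.Linear(32, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assumed forward pass: keep the full sequence through every LSTM
        # and apply the linear head at each timestep.
        x, _ = self.lstm1(x)
        x = self.drop1(x)
        x, _ = self.lstm2(x)
        x = self.drop2(x)
        x, _ = self.lstm3(x)
        return self.head(x)  # shape: (batch, seq_len, 1)


model = TorchTemporalModel()
model.eval()  # disable dropout for a deterministic shape check
with torch.no_grad():
    out = model(torch.zeros(4, 10, 12))  # (batch, window of 10 steps, 12 features)
```

With a window of 10 steps and 12 features, the output has one value per timestep, which lines up with the X and Y shapes reported for each case in the log.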

Screenshots / Logs (optional)

Comparing APEIRON-trained torch model with baseline

Training Log (Torch Reproduction)

Loading data ...
  23 training files, 4 test files
TorchTemporalModel(
  (lstm1): LSTM(12, 128, batch_first=True)
  (drop1): Dropout(p=0.1, inplace=False)
  (lstm2): LSTM(128, 64, batch_first=True)
  (drop2): Dropout(p=0.1, inplace=False)
  (lstm3): LSTM(64, 32, batch_first=True)
  (head): Linear(in_features=32, out_features=1, bias=True)
)

--- Case 0 [2025-02-27] (torch): X=(12029, 10, 12), Y=(12029, 10, 1) (stats from 1 case(s)) ---
  Epoch   1/1  train_loss=0.3614  val_loss=0.1466
  checkpoint -> ./output/prometheus_torch/checkpoints/torch_case_00.pt
  stats      -> ./output/prometheus_torch/checkpoints/torch_case_00.stats.json

=== Evaluation [torch_case_00] (torch) ===
  test 0 [2025-03-20]: R2=-0.9407  MAE=119793.776
  test 1 [2025-05-12]: R2=-11.7010  MAE=228090.305
  test 2 [2025-07-23]: R2=-11.0850  MAE=226998.640
  test 3 [2025-09-18]: R2=-13.6690  MAE=230475.408

--- Case 1 [2025-03-12] (torch): X=(12405, 10, 12), Y=(12405, 10, 1) (stats from 2 case(s)) ---
  Epoch   1/1  train_loss=0.0236  val_loss=0.0054
  checkpoint -> ./output/prometheus_torch/checkpoints/torch_case_01.pt
  stats      -> ./output/prometheus_torch/checkpoints/torch_case_01.stats.json

=== Evaluation [torch_case_01] (torch) ===
  test 0 [2025-03-20]: R2=-0.2049  MAE=76539.053
  test 1 [2025-05-12]: R2=-0.1590  MAE=28893.670
  test 2 [2025-07-23]: R2=-0.1835  MAE=29299.993
  test 3 [2025-09-18]: R2=-0.1579  MAE=24972.951

--- Case 2 [2025-03-19] (torch): X=(14681, 10, 12), Y=(14681, 10, 1) (stats from 3 case(s)) ---
  Epoch   1/1  train_loss=0.0439  val_loss=0.0078
  checkpoint -> ./output/prometheus_torch/checkpoints/torch_case_02.pt
  stats      -> ./output/prometheus_torch/checkpoints/torch_case_02.stats.json

=== Evaluation [torch_case_02] (torch) ===
  test 0 [2025-03-20]: R2=0.9812  MAE=5532.674
  test 1 [2025-05-12]: R2=0.4989  MAE=14322.213
  test 2 [2025-07-23]: R2=0.3201  MAE=32789.465
  test 3 [2025-09-18]: R2=0.1602  MAE=34926.435

...

--- Case 19 [2025-09-02] (torch): X=(6416, 10, 12), Y=(6416, 10, 1) (stats from 20 case(s)) ---
  Epoch   1/1  train_loss=0.0010  val_loss=0.0012
  checkpoint -> ./output/prometheus_torch/checkpoints/torch_case_19.pt
  stats      -> ./output/prometheus_torch/checkpoints/torch_case_19.stats.json

=== Evaluation [torch_case_19] (torch) ===
  test 0 [2025-03-20]: R2=0.9714  MAE=3851.500
  test 1 [2025-05-12]: R2=0.9887  MAE=2989.728
  test 2 [2025-07-23]: R2=0.9926  MAE=2802.097
  test 3 [2025-09-18]: R2=0.9316  MAE=4084.549

--- Case 20 [2025-09-16] (torch): X=(18274, 10, 12), Y=(18274, 10, 1) (stats from 21 case(s)) ---
  Epoch   1/1  train_loss=0.0007  val_loss=0.0006
  checkpoint -> ./output/prometheus_torch/checkpoints/torch_case_20.pt
  stats      -> ./output/prometheus_torch/checkpoints/torch_case_20.stats.json

=== Evaluation [torch_case_20] (torch) ===
  test 0 [2025-03-20]: R2=0.9760  MAE=4034.054
  test 1 [2025-05-12]: R2=0.9908  MAE=2303.845
  test 2 [2025-07-23]: R2=0.9974  MAE=1943.545
  test 3 [2025-09-18]: R2=0.9867  MAE=2424.433

--- Case 21 [2025-09-17] (torch): X=(31086, 10, 12), Y=(31086, 10, 1) (stats from 22 case(s)) ---
  Epoch   1/1  train_loss=0.0007  val_loss=0.0010
  checkpoint -> ./output/prometheus_torch/checkpoints/torch_case_21.pt
  stats      -> ./output/prometheus_torch/checkpoints/torch_case_21.stats.json

=== Evaluation [torch_case_21] (torch) ===
  test 0 [2025-03-20]: R2=0.9723  MAE=5257.041
  test 1 [2025-05-12]: R2=0.9895  MAE=2249.922
  test 2 [2025-07-23]: R2=0.9983  MAE=1910.335
  test 3 [2025-09-18]: R2=0.9844  MAE=2494.421

--- Case 22 [2025-09-25] (torch): X=(7377, 10, 12), Y=(7377, 10, 1) (stats from 23 case(s)) ---
  Epoch   1/1  train_loss=0.0335  val_loss=0.0300
  checkpoint -> ./output/prometheus_torch/checkpoints/torch_case_22.pt
  stats      -> ./output/prometheus_torch/checkpoints/torch_case_22.stats.json

=== Evaluation [torch_case_22] (torch) ===
  test 0 [2025-03-20]: R2=0.9100  MAE=23678.136
  test 1 [2025-05-12]: R2=0.6486  MAE=37530.078
  test 2 [2025-07-23]: R2=0.6880  MAE=36466.123
  test 3 [2025-09-18]: R2=0.6278  MAE=36732.376

Saved final torch model to ./output/prometheus_torch/reproduced_prometheus.pt
Saved final stats sidecar to ./output/prometheus_torch/reproduced_prometheus.stats.json
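The evaluation lines above report R2 and MAE per test file. As a reading aid, here is a small sketch of the standard definitions of both metrics in plain Python. It is an assumption that the PR's script uses these exact definitions; the function names and the toy data are illustrative only. The sketch also shows why R2 can be negative, as in the early cases: predictions that are worse than always predicting the mean of the targets give a value below zero.

```python
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, in the same units as the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when the targets have zero variance")
    return 1.0 - ss_res / ss_tot


# Toy data (not from the PR) illustrating the three regimes.
y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r2_score(y_true, [1.0, 2.0, 3.0, 4.0])    # exact predictions
mean_only = r2_score(y_true, [2.5, 2.5, 2.5, 2.5])  # always predict the mean
reversed_r2 = r2_score(y_true, [4.0, 3.0, 2.0, 1.0])  # worse than the mean
reversed_mae = mae(y_true, [4.0, 3.0, 2.0, 1.0])
```

Under these definitions, exact predictions give 1.0, the mean predictor gives 0.0, and the reversed predictions give a negative score.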

API / CLI Changes

foo.bar(x: int) -> str (new)
baz(qux: PathLike) (removed strict: bool)

Breaking Changes

None

Performance (optional)

N/A

Security & Privacy

No secrets committed
Input validation added where needed

Dependencies

TensorFlow

Testing Plan

N/A

Documentation

The README in the examples folder explains how to set up the data and reproduce the model in torch.

Checklist

Code formatted (Ruff) → ruff format --check
Lint passes (Ruff) → ruff check .
Types pass (mypy/pyright) → mypy src
Tests pass (pytest) → pytest -q

Backward compatibility considered
Adequate comments for tricky parts
CI green

Risk & Rollback Plan

Probably not needed in the beginning

Notes for Reviewers

Rafael Zamora-Resendiz (AMCRD) and others added 4 commits April 13, 2026 15:41

Prometheus Model Reproduced With Torch

7d1df9d

Update README.md

91751bd

Passes ruff and mypy.

e6afa34

Merge branch 'example-prometheus-torch' of https://github.com/AI-ModC…

8cecd29

…on/BaseSIM_APEIRON into example-prometheus-torch

rz4 wants to merge 4 commits into main from example-prometheus-torch
