feat: add HF-backed checkpoint storage #63

Draft
SoheylM wants to merge 3 commits into main from codex/hf-checkpoint-backend

Conversation

@SoheylM
Contributor

@SoheylM SoheylM commented Apr 22, 2026

Summary

  • add a shared checkpoint backend with HF, W&B, local, and auto resolution paths
  • route training entrypoints through the new checkpoint packaging flow while keeping W&B compatibility
  • update evaluators and surrogate-model loading to prefer HF packages with W&B fallback
  • document the new workflow and add a migration playbook for historical W&B checkpoints
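The "auto resolution" path in the first bullet could be sketched roughly as follows. This is a minimal illustration only: the PR adds `engiopt/checkpoint_store.py`, but the function and URI-scheme names below (`resolve_backend`, `hf://`, `wandb://`) are assumptions for the sketch, not the module's actual API.

```python
import os

# Hypothetical sketch of checkpoint backend resolution: explicit HF or
# W&B URI schemes win, an existing path means local storage, and
# anything else falls through to "auto" (try HF first, then legacy W&B).
# Names are illustrative, not the real engiopt/checkpoint_store.py API.
def resolve_backend(uri: str) -> str:
    """Map a checkpoint URI to a storage backend name."""
    if uri.startswith("hf://"):
        return "huggingface"
    if uri.startswith("wandb://"):
        return "wandb"
    if uri.startswith(("./", "/")) or os.path.exists(uri):
        return "local"
    # Unqualified reference: resolve at load time, preferring HF
    # packages with a W&B fallback for historical checkpoints.
    return "auto"
```

A caller would then dispatch on the returned name, e.g. `resolve_backend("hf://org/repo")` yields `"huggingface"` while a bare run name yields `"auto"`.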

Validation

  • conda run -n engibench python -m py_compile on modified Python files
  • conda run -n engibench ruff check on modified files
  • conda run -n engibench mypy --follow-imports=skip engiopt/checkpoint_store.py engiopt/surrogate_model/run_pe_optimization.py engiopt/vqgan/evaluate_vqgan.py
  • local checkpoint package resolution smoke test

Follow-up

  • run authenticated smoke tests for HF-backed save and legacy W&B restore before review

@SoheylM
Contributor Author

SoheylM commented Apr 23, 2026

@mkeeler43 requesting your review on this one. This PR adds an HF-backed checkpoint backend for EngiOpt so new model weights can live on Hugging Face instead of saturating W&B artifact storage, while keeping backward-compatible restore support for existing W&B-hosted checkpoints. It also packages run config and metadata with the checkpoint so new HF-backed runs remain reproducible and tied back to the originating W&B run.
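The packaging idea described here (bundling weights, run config, and provenance metadata that points back to the originating W&B run) could look something like the sketch below. Everything here is an assumption for illustration: `package_checkpoint`, the archive layout, and the metadata keys are hypothetical, not the PR's actual implementation.

```python
import io
import json
import tarfile
import time

# Hypothetical packaging sketch: bundle model weights, the run config,
# and provenance metadata (including the originating W&B run id) into a
# single gzipped tarball, so a checkpoint hosted on Hugging Face stays
# reproducible and traceable. Not the PR's actual code.
def package_checkpoint(weights: bytes, config: dict, wandb_run_id: str) -> bytes:
    meta = {"wandb_run_id": wandb_run_id, "created_at": time.time()}
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, payload in [
            ("model.bin", weights),
            ("config.json", json.dumps(config).encode()),
            ("metadata.json", json.dumps(meta).encode()),
        ]:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()
```

On restore, reading `metadata.json` back out of the archive recovers the W&B run id, which is what keeps an HF-hosted package tied to its originating run.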

@SoheylM SoheylM requested a review from mkeeler43 April 23, 2026 07:58