Rewrite the Package with API Breaking Changes #7

Open
mandli wants to merge 12 commits into main from v2

@mandli mandli commented Apr 30, 2026

v2 Design

Guiding principles

  1. Separate what to run (Job) from where to put things (JobPaths) from
    how to execute (Executor protocol). The current design fuses all three.
  2. Schedulers are executor backends, not subclasses of BatchController.
  3. The controller's job is orchestration: compute paths, set up directories,
    write data, dispatch to executor, track results.
  4. Job carries no Clawpack-specific internals; Clawpack coupling lives in
    subclasses and in write_data_objects() — same as today, but cleaner.

Package layout

batch/
├── __init__.py          # exports Job, BatchController, JobPaths, JobResult
├── job.py               # Job, JobPaths, JobResult dataclasses
├── controller.py        # BatchController
├── executors/
│   ├── __init__.py      # Executor protocol
│   ├── local.py         # SerialExecutor, ParallelExecutor
│   └── slurm.py         # SLURMExecutor, SLURMResources
└── sweep.py             # parameter sweep helpers (new)
pyproject.toml
CHANGELOG.md

Core dataclasses (job.py)

Key changes from v1:

  • restart is a first-class attribute on Job, not accessed via rundata.clawdata
  • paths is assigned back to the job by the controller — available for downstream use
  • output_path / data_path / log_path are gone from __init__ (they were dead)
  • No __future__ imports; modern super() calls and f-strings throughout
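
As a rough illustration of the changes listed above, the three dataclasses might look something like the sketch below. Only job.type, job.name, job.prefix, job.restart, job.paths, and a returncode on JobResult come from this description; the JobPaths field names (taken from the old output_path / data_path / log_path arguments), the `job` directory field, and the `ok` property are assumptions.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class JobPaths:
    """Where a job's files live. Field names are assumed, not confirmed."""
    job: Path
    output: Path
    data: Path
    log: Path


@dataclass
class Job:
    """What to run. No path arguments are accepted by __init__."""
    type: str
    name: str
    prefix: str
    restart: bool = False              # first-class, not rundata.clawdata.restart
    paths: Optional[JobPaths] = None   # assigned back by the controller


@dataclass
class JobResult:
    """Outcome of one job as reported by an executor."""
    job: Job
    paths: JobPaths
    returncode: Optional[int] = None   # None if the job has not finished

    @property
    def ok(self) -> bool:
        # Hypothetical convenience helper: True only for a zero exit status.
        return self.returncode == 0
```

Because paths starts as None and is filled in later, downstream code can tell whether the controller has processed a job yet.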

Executor protocol (executors/__init__.py)

SerialExecutor and ParallelExecutor (local runners) and SLURMExecutor all
implement this protocol. BatchController accepts any Executor — no subclassing
needed to add a new scheduler.
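
A minimal sketch of how such a protocol could be expressed with typing.Protocol is shown below. The method name `submit` is an assumption (only wait_all() is named in this description), and EchoExecutor and run_jobs are hypothetical helpers used purely to show that a backend conforms structurally, without inheriting from anything.

```python
from typing import Any, List, Protocol, runtime_checkable


@runtime_checkable
class Executor(Protocol):
    """Anything with these two methods can act as a backend."""

    def submit(self, job: Any) -> None:
        ...

    def wait_all(self) -> List[Any]:
        ...


class EchoExecutor:
    """Toy backend: 'runs' a job by remembering it. Does not subclass Executor."""

    def __init__(self) -> None:
        self._pending: List[Any] = []

    def submit(self, job: Any) -> None:
        self._pending.append(job)

    def wait_all(self) -> List[Any]:
        # Hand back everything submitted so far and reset the queue.
        done, self._pending = self._pending, []
        return done


def run_jobs(jobs: List[Any], executor: Executor) -> List[Any]:
    """Dispatch every job, then block until all have finished."""
    for job in jobs:
        executor.submit(job)
    return executor.wait_all()
```

With this shape, adding a new scheduler means writing one class with the two methods; the controller code does not change.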

  • Local executors (executors/local.py) - Key improvements over v1:

    • No shell=True; $CLAW resolved explicitly in Python
    • Log handles properly closed when process finishes
    • _drain() rebuilds the list (no modify-while-iterating bug)
    • returncode propagated to JobResult; callers can detect failures
    • wait_all() is a separate method, not a flag on run()
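
The first four points above can be sketched roughly as follows. This is an illustration under assumptions, not the code in executors/local.py: the function names resolve_claw, launch, and drain are hypothetical, and os.path.expandvars is used here as one possible way to resolve $CLAW without a shell.

```python
import os
import subprocess
import sys
from typing import IO, List, Tuple

Running = Tuple[subprocess.Popen, IO[str]]


def resolve_claw(command: List[str]) -> List[str]:
    """Expand $CLAW (and any other environment variable) in Python."""
    return [os.path.expandvars(arg) for arg in command]


def launch(command: List[str], log_path: str) -> Running:
    """Start a process from an argument list (no shell=True), logging to a file."""
    log = open(log_path, "w")
    try:
        proc = subprocess.Popen(
            resolve_claw(command), stdout=log, stderr=subprocess.STDOUT
        )
    except Exception:
        log.close()  # do not leak the handle if the launch itself fails
        raise
    return proc, log


def drain(running: List[Running]) -> Tuple[List[Running], List[int]]:
    """Split `running` into still-running entries and finished return codes.

    A new list is built instead of removing items from the list being
    iterated over, and each finished process has its log handle closed.
    """
    still_running: List[Running] = []
    returncodes: List[int] = []
    for proc, log in running:
        code = proc.poll()
        if code is None:
            still_running.append((proc, log))
        else:
            log.close()
            returncodes.append(code)
    return still_running, returncodes
```

A caller would replace its working list with the first element of the tuple returned by drain() and record the return codes on the matching JobResult objects.
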
  • SLURM executor (executors/slurm.py) - Key design points:

    • render_slurm_script() is a pure function — easy to test and to override
    • --parsable flag on sbatch gives a clean job ID without shell parsing
    • dry_run=True generates and logs the script without submitting
    • slurm_resources on the Job object overrides the executor's default — per-job
      resource overrides don't require a new controller subclass
    • SLURMResources has extra_directives: list[str] for anything not covered
      (GRES, licenses, etc.) without needing to subclass
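
To make the "pure function" point concrete, here is a hedged sketch of what rendering and job-ID parsing could look like. Only the names SLURMResources, extra_directives, and render_slurm_script come from this description; the other fields (nodes, ntasks, time, partition), the function signature, and parse_job_id are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SLURMResources:
    """Resource request. All fields except extra_directives are assumed."""
    nodes: int = 1
    ntasks: int = 1
    time: str = "01:00:00"
    partition: Optional[str] = None
    extra_directives: list[str] = field(default_factory=list)


def render_slurm_script(job_name: str, command: str,
                        resources: SLURMResources) -> str:
    """Build the batch script text. No I/O, so it is trivial to unit test."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={resources.nodes}",
        f"#SBATCH --ntasks={resources.ntasks}",
        f"#SBATCH --time={resources.time}",
    ]
    if resources.partition:
        lines.append(f"#SBATCH --partition={resources.partition}")
    # Anything not modelled above (GRES, licenses, ...) passes straight through.
    lines.extend(f"#SBATCH {directive}" for directive in resources.extra_directives)
    lines += ["", command, ""]
    return "\n".join(lines)


def parse_job_id(sbatch_stdout: str) -> str:
    """Extract the job ID from `sbatch --parsable` output.

    With --parsable, sbatch prints the job ID, optionally followed by a
    semicolon and the cluster name.
    """
    return sbatch_stdout.strip().split(";")[0]
```

A dry run then reduces to calling render_slurm_script() and logging the result instead of handing it to sbatch.
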
  • BatchController (controller.py) - Key changes:

    • setup() and run() are distinct methods
    • wait=True default — no more silent child-process massacre on script exit
    • _make_paths() is isolated and testable
    • _setup_directories() takes job.restart directly (no rundata.clawdata)
    • Returns typed list[JobResult], not list[dict]
    • Failure reporting via logger, not silent
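
The controller changes above could fit together roughly as in the sketch below. It is self-contained for illustration only: Job, JobPaths, and JobResult are trimmed stand-ins, FakeExecutor and its `submit` method are hypothetical, the directory layout in _make_paths() is assumed, and directory creation and write_data_objects() are deliberately left out.

```python
import logging
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional

logger = logging.getLogger("batch")


@dataclass(frozen=True)
class JobPaths:
    output: Path
    data: Path
    log: Path


@dataclass
class Job:
    name: str
    restart: bool = False
    paths: Optional[JobPaths] = None


@dataclass
class JobResult:
    job: Job
    paths: Optional[JobPaths]
    returncode: int


class FakeExecutor:
    """Stand-in backend that reports a fixed return code for every job."""

    def __init__(self, returncode: int = 0) -> None:
        self.returncode = returncode
        self.submitted: List[Job] = []

    def submit(self, job: Job) -> None:
        self.submitted.append(job)

    def wait_all(self) -> List[JobResult]:
        return [JobResult(j, j.paths, self.returncode) for j in self.submitted]


class BatchController:
    def __init__(self, jobs, executor, base_path=".") -> None:
        self.jobs = list(jobs)
        self.executor = executor
        self.base_path = Path(base_path)

    def _make_paths(self, job: Job) -> JobPaths:
        # Pure path computation: touches no files, so it can be tested alone.
        root = self.base_path / job.name
        return JobPaths(root / "output", root / "data", root / "log.txt")

    def setup(self) -> None:
        """Compute paths only; a real setup would also create dirs and write data."""
        for job in self.jobs:
            job.paths = self._make_paths(job)

    def run(self, wait: bool = True) -> List[JobResult]:
        self.setup()
        for job in self.jobs:
            self.executor.submit(job)
        results = self.executor.wait_all() if wait else []
        for result in results:
            if result.returncode != 0:
                logger.error("job %s failed with code %s",
                             result.job.name, result.returncode)
        return results
```

Calling setup() alone covers the old only_write_data use case: nothing is dispatched to the executor until run() is called.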

Other Proposed Changes

  • Updated Job subclass pattern (storm.py cleaned up)
  • Parameter sweep helpers (sweep.py) — new capability
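
The contents of sweep.py are not described here, so the following is only a guess at the kind of helper it might contain: a Cartesian-product generator over named parameter lists, plus a naming helper. Both function names and the naming scheme are hypothetical.

```python
import itertools
from typing import Any, Dict, Iterator, List


def parameter_grid(params: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per combination of the given parameter values."""
    names = list(params)
    for values in itertools.product(*(params[name] for name in names)):
        yield dict(zip(names, values))


def sweep_name(prefix: str, combo: Dict[str, Any]) -> str:
    """Build a unique, readable job name from one parameter combination."""
    return prefix + "_" + "_".join(f"{key}{value}" for key, value in combo.items())
```

Each yielded dict could then be passed to a Job subclass constructor, with sweep_name() supplying a distinct prefix or name per run.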

Migration guide (v1 → v2)

| v1 pattern | v2 equivalent |
| --- | --- |
| job.type, job.name, job.prefix | unchanged |
| job.rundata.clawdata.restart | job.restart: bool |
| BatchController(jobs) then ctrl.run() | BatchController(jobs, executor=SerialExecutor()) |
| ctrl.parallel = True; ctrl.max_processes = N | ParallelExecutor(max_workers=N) |
| ctrl.plot = True | SerialExecutor(plot=True) or ParallelExecutor(plot=True) |
| ctrl.run(only_write_data=True) | ctrl.setup() |
| ctrl.wait = True | ctrl.run(wait=True) (now the default) |
| StampedeBatchController | BatchController(jobs, executor=SLURMExecutor(...)) |
| Return value paths[i]['output'] | results[i].paths.output |
