Also double-checked a number of naming conventions and added a bit more documentation.
# v2 Design

## Guiding principles
- Separate what to run (`Job`) from where to put things (`JobPaths`) from how to execute (the `Executor` protocol). The current design fuses all three.
- `BatchController` orchestrates: write data, dispatch to the executor, track results.
- `Job` carries no Clawpack-specific internals; Clawpack coupling lives in subclasses and in `write_data_objects()` (same as today, but cleaner).

## Package layout
## Core dataclasses (`job.py`)

Key changes from v1:

- `restart` is a first-class attribute on `Job`, not accessed via `rundata.clawdata`
- `paths` is assigned back to the job by the controller, so it is available for downstream use
- `output_path` / `data_path` / `log_path` are gone from `__init__` (they were dead)
- `__future__` imports, modern `super()`, f-strings
## Executor protocol (`executors/__init__.py`)

`SerialExecutor` and `ParallelExecutor` (the local runners) and `SLURMExecutor` all implement this protocol. `BatchController` accepts any `Executor`: no subclassing needed to add a new scheduler.
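To make the idea concrete, here is a minimal sketch of such a structural protocol. The method names and signatures are assumptions for illustration, not the actual clawutil v2 API; only the class names `Job`, `JobResult`, `Executor`, and `SerialExecutor` come from the design above.

```python
# Hypothetical sketch of the Executor protocol idea; signatures are
# assumptions, not the real clawutil v2 API.
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass
class Job:
    """Minimal stand-in for the real Job dataclass."""
    name: str
    restart: bool = False  # first-class attribute, per the design above


@dataclass
class JobResult:
    """Minimal stand-in; v2 returns list[JobResult], not list[dict]."""
    name: str
    returncode: int


@runtime_checkable
class Executor(Protocol):
    """Anything with these methods counts; no controller subclass needed."""

    def submit(self, job: Job) -> None: ...
    def wait_all(self) -> list[JobResult]: ...


class SerialExecutor:
    """Toy local runner that 'runs' each job immediately, in order."""

    def __init__(self) -> None:
        self._results: list[JobResult] = []

    def submit(self, job: Job) -> None:
        # A real executor would launch the simulation here.
        self._results.append(JobResult(job.name, returncode=0))

    def wait_all(self) -> list[JobResult]:
        return self._results
```

Because `Executor` is a structural `Protocol`, a `SLURMExecutor` satisfies it simply by having the same methods; nothing needs to inherit from a common base class.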
## Local executors (`executors/local.py`)

Key improvements over v1:

- No `shell=True`; `$CLAW` is resolved explicitly in Python
- `_drain()` rebuilds the list (no modify-while-iterating bug)
- `returncode` is propagated to `JobResult`; callers can detect failures
- `wait_all()` is a separate method, not a flag on `run()`
## SLURM executor (`executors/slurm.py`)

Key design points:

- `render_slurm_script()` is a pure function: easy to test and to override
- The `--parsable` flag on `sbatch` gives a clean job ID without shell parsing
- `dry_run=True` generates and logs the script without submitting
- `slurm_resources` on the `Job` object overrides the executor's default; per-job resource overrides don't require a new controller subclass
- `SLURMResources` has `extra_directives: list[str]` for anything not covered (GRES, licenses, etc.) without needing to subclass
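A sketch of what a pure script renderer in this spirit could look like. The field names on `SLURMResources` here (`partition`, `nodes`, `time`) and the exact signature of `render_slurm_script()` are assumptions; only `SLURMResources`, `extra_directives: list[str]`, and the pure-function design come from the points above.

```python
# Hypothetical sketch of a pure SLURM script renderer; field names and
# the function signature are assumptions, not the real clawutil v2 API.
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SLURMResources:
    partition: str = "normal"
    nodes: int = 1
    time: str = "01:00:00"
    # Catch-all for GRES, licenses, etc. -- covered without subclassing.
    extra_directives: list[str] = field(default_factory=list)


def render_slurm_script(job_name: str, command: str,
                        res: SLURMResources) -> str:
    """Pure function: same inputs always produce the same script text."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={res.partition}",
        f"#SBATCH --nodes={res.nodes}",
        f"#SBATCH --time={res.time}",
    ]
    # Anything not modeled as a field goes through extra_directives.
    lines += [f"#SBATCH {d}" for d in res.extra_directives]
    lines += ["", command, ""]
    return "\n".join(lines)
```

Because the renderer does no I/O, `dry_run=True` can simply log its return value, and tests can assert on the script text directly; submission is then roughly `sbatch --parsable script.sh`, whose stdout is just the job ID.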
## BatchController (`controller.py`)

Key changes:

- `setup()` and `run()` are distinct methods
- `wait=True` by default: no more silent child-process massacre on script exit
- `_make_paths()` is isolated and testable
- `_setup_directories()` takes `job.restart` directly (no `rundata.clawdata`)
- Results are `list[JobResult]`, not `list[dict]`

## Other Proposed Changes
- `Job` subclass pattern (`storm.py` cleaned up)
- `sweep.py`: new capability

## Migration guide (v1 → v2)
| v1 | v2 |
| --- | --- |
| `job.type`, `job.name`, `job.prefix` | unchanged |
| `job.rundata.clawdata.restart` | `job.restart: bool` |
| `BatchController(jobs)` then `ctrl.run()` | `BatchController(jobs, executor=SerialExecutor())` |
| `ctrl.parallel = True; ctrl.max_processes = N` | `ParallelExecutor(max_workers=N)` |
| `ctrl.plot = True` | `SerialExecutor(plot=True)` or `ParallelExecutor(plot=True)` |
| `ctrl.run(only_write_data=True)` | `ctrl.setup()` |
| `ctrl.wait = True` | `ctrl.run(wait=True)` (now the default) |
| `StampedeBatchController` | `BatchController(jobs, executor=SLURMExecutor(...))` |
| `paths[i]['output']` | `results[i].paths.output` |
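Put together, a migrated driver script might read as follows. This is a sketch built on toy stand-ins: the stub classes below exist only to make the example self-contained, and every name is modeled on the migration table rather than taken from the real clawutil API.

```python
# Hypothetical end-to-end v2 flow; the stub classes stand in for the
# real ones and exist only to make this sketch runnable.
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Paths:
    output: str


@dataclass
class JobResult:
    name: str
    returncode: int
    paths: Paths


@dataclass
class Job:
    name: str
    restart: bool = False


class SerialExecutor:
    def run_job(self, job: Job) -> JobResult:
        # Stand-in for actually launching the simulation.
        return JobResult(job.name, 0, Paths(output=f"_output_{job.name}"))


class BatchController:
    def __init__(self, jobs: list[Job], executor: SerialExecutor) -> None:
        self.jobs, self.executor = jobs, executor

    def setup(self) -> None:
        # Writes data files / creates directories in the real design;
        # replaces v1's run(only_write_data=True).  No-op in this sketch.
        pass

    def run(self, wait: bool = True) -> list[JobResult]:
        return [self.executor.run_job(j) for j in self.jobs]


ctrl = BatchController([Job("storm01")], executor=SerialExecutor())
ctrl.setup()                     # v1: ctrl.run(only_write_data=True)
results = ctrl.run()             # wait=True is now the default
print(results[0].paths.output)   # v1: paths[0]['output']
```

The point of the shape: the driver picks an executor once at construction, and everything downstream consumes typed `JobResult` objects instead of dicts.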