feat: API with scheduler support by ltalirz · Pull Request #121 · glasagent/amorphouspy

ltalirz · 2026-02-08T17:44:11Z

No description provided.

github-actions · 2026-02-08T17:46:00Z

Coverage Report

File	Stmts	Miss	Cover	Missing
__init__.py	0	0	100%
app.py	38	5	86%	40–42, 44–45
config.py	15	3	80%	17–19
database.py	144	32	77%	74, 104–106, 133–135, 153, 163–165, 177–179, 212–214, 225–226, 228–229, 238, 240–241, 243, 245–247, 289, 292, 313–314
jobs.py	33	20	39%	41, 43, 50–52, 54, 63–66, 69–73, 75, 98–99, 101, 107
models.py	85	14	83%	64, 66, 71–73, 78–80, 82–83, 154–156, 164
visualization.py	93	13	86%	39, 56–59, 148, 152, 177, 179–180, 230–232
routers
__init__.py	0	0	100%
meltquench.py	115	29	74%	69–70, 85, 89–91, 122–123, 222–224, 275, 308–309, 311–312, 317–318, 323, 329–330, 332–339
workflows
__init__.py	2	0	100%
meltquench.py	29	20	31%	67–68, 71–76, 79–81, 84, 96–99, 117–118, 120, 122
TOTAL	554	136	75%

ltalirz · 2026-02-09T00:03:27Z

@jan-janssen For some reason, when I run my API integration test for the first time, it fails: it keeps pinging the /check endpoint until I kill the test (see also CI).

There are no errors in the API logs. When I locally restart the API and run the test again, it passes immediately (i.e. the simulation worked the first time and I can see that it produced the cache files; they are just not picked up the first time).
Note: this also works if I remove the high-level caching that I introduced (tasks.db), so the problem should be at a lower level.

I'm a bit lost at the moment - if you have any ideas/spot anything, let me know.
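For illustration, a minimal sketch of a polling helper with a deadline, so that an integration test fails with a clear timeout instead of pinging /check until it is killed. The `check` callable and the `"status"` values (`"running"`, `"completed"`, `"failed"`) are assumptions made for this sketch, not the actual response model of this API.

```python
import time
from typing import Callable


def poll_until_done(
    check: Callable[[], dict],
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
) -> dict:
    """Call `check` repeatedly until it reports a terminal status.

    `check` is a hypothetical stand-in for "GET /check/<task_id>" and is
    assumed to return a dict with a "status" key.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        response = check()
        # Assumed terminal states; adapt to the real response model.
        if response.get("status") in ("completed", "failed"):
            return response
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still not finished after {timeout_s} s")
        time.sleep(interval_s)
```

In a test, `check` could wrap the HTTP client call, e.g. `lambda: client.get(f"/check/{task_id}").json()`, where `client` and `task_id` are again placeholders.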

Relevant code should be in

workflow:

amorphouspy/amorphouspy_api/src/amorphouspy_api/workflows/meltquench.py

Line 31 in 38e3043

def run_meltquench_workflow(

how it is submitted by the API:

amorphouspy/amorphouspy_api/src/amorphouspy_api/app.py

Line 91 in 38e3043

def submit_to_executor(request_data: dict) -> dict:

amorphouspy/amorphouspy_api/src/amorphouspy_api/app.py

Line 364 in 38e3043

async def check(task_id: str) -> TaskResponse:

The code is likely overly cautious/complex in some parts (leftovers from attempts to get it working that didn't succeed; these can be removed again later).

jan-janssen · 2026-02-09T09:25:15Z

@ltalirz You were a bit too fast. From my perspective, there are currently two conflicting tests: test_check_running_then_complete(), which wants to wait for the future to complete, and test_submit_meltquench_and_check(), which requires the future to return the result directly.
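The two expectations can be illustrated with a plain `concurrent.futures` future: a status check that never blocks reports "running" first and "completed" later, whereas calling `result()` directly blocks until the value is available. This is only a sketch using the standard library; `check_status` and the status strings are hypothetical and not taken from the code in this PR.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


def check_status(future: Future) -> dict:
    """Report the state of a future without blocking on it."""
    if future.cancelled():
        return {"status": "cancelled"}
    if not future.done():
        return {"status": "running"}
    exc = future.exception()  # does not block because the future is done
    if exc is not None:
        return {"status": "failed", "error": str(exc)}
    return {"status": "completed", "result": future.result()}


gate = threading.Event()


def slow_task() -> int:
    # Wait until the main thread opens the gate, then return a value.
    gate.wait(timeout=5)
    return 42


with ThreadPoolExecutor(max_workers=1) as pool:
    fut = pool.submit(slow_task)
    first = check_status(fut)   # task is still waiting on the gate
    gate.set()
    fut.result(timeout=5)       # blocking wait, as in "wait for the future"
    second = check_status(fut)  # now the result is available
```

A test that asserts on `first` needs the task to still be in flight, while a test that asserts on `second` needs it to have finished, so both cannot hold for the same single request.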

ltalirz · 2026-02-09T11:16:31Z

There are a few things to be cleaned up. I will likely not have time to finish it today, but I should be able to have another look tomorrow.

@jan-janssen One question: in 669a33c we introduced workers with separate subprocesses in order to work around the signal handling of pyiron

We no longer need this here, correct?
My plan was to delete this now and just make the job submission async (it should anyhow be fast).
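One common way to make a blocking submission call awaitable without stalling the event loop is `asyncio.to_thread` (Python 3.9+). The sketch below assumes a hypothetical blocking `submit_job` function; it is not the submission code of this PR, only an illustration of the pattern.

```python
import asyncio
import time


def submit_job(request_data: dict) -> str:
    """Hypothetical blocking submission; returns a made-up task id."""
    time.sleep(0.05)  # stands in for talking to the executor/scheduler
    return f"task-{request_data['name']}"


async def submit_async(request_data: dict) -> str:
    # Run the blocking call in a worker thread so the loop stays responsive.
    return await asyncio.to_thread(submit_job, request_data)


async def main() -> tuple[str, int]:
    ticks = 0

    async def heartbeat() -> None:
        # Counts how often the event loop gets to run during submission.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    hb = asyncio.create_task(heartbeat())
    task_id = await submit_async({"name": "demo"})
    hb.cancel()
    return task_id, ticks
```

Running `asyncio.run(main())` returns the task id together with a non-zero tick count, which shows that other coroutines kept running while the submission was in progress.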

jan-janssen · 2026-02-09T11:36:34Z

> We no longer need this here, correct? My plan was to delete this now and just make the job submission async (it should anyhow be fast).

Yes, that works fine.

As suggested in #124, it seems to be an issue with orphan processes being killed by the testing framework, so the transition to flux as the backend should solve this issue.

ltalirz · 2026-02-13T16:21:03Z

@jan-janssen I fixed the basic logic for submission and /check; I also switched to the TestClusterExecutor

With that I now see

INFO:     127.0.0.1:42466 - "GET /check/83c4627d-411a-443a-ba70-78364bc9e323 HTTP/1.1" 200 OK
Exception in thread Thread-61 (execute_tasks_h5):
Traceback (most recent call last):
  File "/home/runner/miniconda3/envs/test/lib/python3.13/threading.py", line 1044, in _bootstrap_inner
    self.run()
    ~~~~~~~~^^
  File "/home/runner/miniconda3/envs/test/lib/python3.13/threading.py", line 995, in run
    self._target(*self._args, **self._kwargs)
    ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/miniconda3/envs/test/lib/python3.13/site-packages/executorlib/task_scheduler/file/shared.py", line 151, in execute_tasks_h5
    process_dict[k] for k in future_wait_key_lst
    ~~~~~~~~~~~~^^^
KeyError: 'get_structure_dictf69a7790ce10f1dde00df1103f771a09'

I can reproduce this locally as well.

Maybe some of the logic in /check is still not correct...

ltalirz · 2026-02-13T16:21:58Z

relevant code:
https://github.com/glasagent/amorphouspy/pull/121/changes#diff-fb310cd2ccc4a6f9566da2791956c1f5e5ddcb3c9e85b85e5ef96cddb0210449R140

and
https://github.com/glasagent/amorphouspy/pull/121/changes#diff-fb310cd2ccc4a6f9566da2791956c1f5e5ddcb3c9e85b85e5ef96cddb0210449R281

jan-janssen · 2026-02-14T21:16:45Z

> @jan-janssen I fixed the basic logic for submission and /check; I also switched to the TestClusterExecutor

This is fixed in pyiron/executorlib#913

feat: API with SLURM support

2e23dd1

ltalirz added 5 commits February 8, 2026 19:08

fix tests

c4cacce

chore: clean up api tests

7793026

get API to actually work

a912df7

fix workflow setup

abd8266

simplify

081ba24

ltalirz force-pushed the feat/api-slurm branch from cf3fb2f to 081ba24 Compare February 8, 2026 22:53

ltalirz added 4 commits February 8, 2026 23:57

fix api tests

7eee40f

try fixing api integration test

d5d092f

add response model

9e85497

cleanup

38e3043

jan-janssen reviewed Feb 9, 2026

View reviewed changes

Comment thread amorphouspy_api/src/amorphouspy_api/jobs.py Outdated

jan-janssen reviewed Feb 9, 2026

View reviewed changes

Comment thread amorphouspy_api/src/tests/test_meltquench.py Outdated

jan-janssen and others added 3 commits February 9, 2026 10:12

fix: Use executorlib with context (#122)

b1b0fb9

shut down sqlite connection

c265e98

fix test logfile warning

94cda98

ltalirz added 3 commits February 9, 2026 10:27

fix test

6ad56bf

do not block event loop

14178f3

fix api unit tests

63c89ae

jan-janssen and others added 5 commits February 9, 2026 16:30

Test with flux (#124)

6af3546

fix: back to multi-connection task store

ae84d20

put back gitignore

2367fe3

bring back api

220fba0

move stuff around

4bac361

ltalirz force-pushed the feat/api-slurm branch from ff90e90 to 4bac361 Compare February 9, 2026 16:53

ltalirz added 3 commits February 13, 2026 17:00

fix broken api unit tests

4db69e7

fix

14cfb90

fix future resolution

a0745df

bump executorlib

51be467

ltalirz and others added 5 commits February 15, 2026 23:30

feat: use get_future_from_cache

68a5918

delete wrong test

1ae065f

fix

5746ec3

Revert "fix"

8289817

This reverts commit 5746ec3.

move to flux for integration

0dc6ac0

ltalirz force-pushed the feat/api-slurm branch from e664f86 to 0dc6ac0 Compare February 16, 2026 08:09

ltalirz added 2 commits February 16, 2026 09:16

try2

d480fe5

add pysqa

4cd6554

ltalirz changed the title ~~feat: API with SLURM support~~ feat: API with scheduler support Feb 16, 2026

ltalirz added 6 commits February 18, 2026 15:22

fix: do not touch future after executor shutdown

3dcd4fb

fix warning in integration test

49a8be2

switch to lifetime management

138460c

reduce code duplication

0f988df

drop executor from /check

09fe51f

always set error

ab14f8e

ltalirz marked this pull request as ready for review February 18, 2026 15:45

Merge branch 'main' into feat/api-slurm

44e3d92

ltalirz merged commit 149be76 into main Feb 18, 2026
5 checks passed

ltalirz deleted the feat/api-slurm branch February 18, 2026 15:49

This was referenced Feb 18, 2026

Switch from SingleNodeExecutor to TestClusterExecutor #128

Closed

SingleNodeExecutor with shutdown(wait=False, cancel_futures=False) #129

Closed

Switch from TestClusterExecutor to FluxClusterExecutor #131

Closed

Update executorlib version to 1.8.2 #132

Closed

ltalirz merged 49 commits into main from feat/api-slurm
