
Use normpath and improve override argument parsing in madengine discover #7

Merged: gargrahul merged 5 commits into main from use_normpath on Jun 19, 2025

@Rohan138 (Contributor) commented Jun 17, 2025

Before: `madengine discover` displayed a relative Dockerfile path without normalizing it. For example, `"docker": "../../docker/pyt_vllm"` in `scripts/vllm/models.json` was listed as `scripts/vllm/../../docker/pyt_vllm`. After: the path is normalized with `normpath` and listed as `docker/pyt_vllm`.
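
A minimal sketch of the before/after behaviour, assuming the `models.json` entry is `../../docker/pyt_vllm` and that it is resolved against the directory containing `models.json`. `resolve_docker_path` is a hypothetical name, not madengine's actual API. `posixpath` is used so the output is identical on every OS; on Linux, `os.path.normpath` behaves the same way.

```python
import posixpath


def resolve_docker_path(models_json: str, docker_entry: str) -> str:
    """Hypothetical helper: resolve a dockerfile path relative to models.json."""
    base_dir = posixpath.dirname(models_json)        # e.g. "scripts/vllm"
    joined = posixpath.join(base_dir, docker_entry)  # e.g. "scripts/vllm/../../docker/pyt_vllm"
    return posixpath.normpath(joined)                # collapses the ".." segments lexically


print(resolve_docker_path("scripts/vllm/models.json", "../../docker/pyt_vllm"))
# prints: docker/pyt_vllm
```

Note that `normpath` is purely lexical: it does not touch the filesystem or resolve symlinks, so a `..` that follows a symlinked directory can point somewhere different from what the OS would resolve.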

Also, overrides passed through the `:arg` option are now parsed automatically, so `arg=val` is converted to `--arg val`. This makes argument overrides on the CLI cleaner to specify.
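
A minimal sketch of the `arg=val` to `--arg val` conversion. The helper names `overrides_to_cli_args` and `overrides_to_cli_string` are hypothetical, and the sketch assumes the overrides have already been split into a list of `key=value` strings; how madengine extracts them from the `:arg` suffix is not described here.

```python
import shlex
from typing import List


def overrides_to_cli_args(overrides: List[str]) -> List[str]:
    """Hypothetical helper: expand ["arg=val", ...] into ["--arg", "val", ...]."""
    args: List[str] = []
    for item in overrides:
        key, sep, value = item.partition("=")  # split on the first "=" only
        if sep and key and not key.startswith("-"):
            # "batch_size=32" -> "--batch_size", "32"
            args.extend([f"--{key}", value])
        else:
            # Anything that is not a plain key=value pair (e.g. "--dry-run") passes through.
            args.append(item)
    return args


def overrides_to_cli_string(overrides: List[str]) -> str:
    """Render the expanded overrides as a shell-quoted argument string."""
    return shlex.join(overrides_to_cli_args(overrides))


print(overrides_to_cli_string(["batch_size=32", "--dry-run"]))
# prints: --batch_size 32 --dry-run
```

Keeping the result as a list means each value stays a separate argv entry; `shlex.join` only adds quotes when a value needs them, so `prompt=hello world` becomes `--prompt 'hello world'`.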

@amathews-amd requested a review from gargrahul on Jun 17, 2025 21:18
@Rohan138 changed the title from "Use normpath to make the docker paths render correctly" to "Use normpath and improve override argument parsing in madengine discover" on Jun 18, 2025
@gargrahul requested a review from coketaste on Jun 19, 2025 20:35
@gargrahul merged commit 8a1ca13 into main on Jun 19, 2025
@coketaste deleted the use_normpath branch on April 9, 2026 15:02
raviguptaamd added a commit to raviguptaamd/madengine that referenced this pull request on May 1, 2026
Address all 9 inline comments from copilot-pull-request-reviewer[bot]:

#1 build_orchestrator.py: _execute_with_prebuilt_image now keys
   manifest['built_models'] by model_name (not use_image), so multiple
   models that share the same pre-built image are all preserved in the
   manifest.

#2 build_orchestrator.py: warn when discovered models have differing
   distributed/slurm configs in the prebuilt-image flow; the post-merge
   step still uses models[0]'s config, but operators are now warned.

#3 build_orchestrator.py: _execute_build_on_compute() now raises
   ConfigurationError early when registry is None instead of falling
   through to registry.replace/.split/.lower on a NoneType.

#4 build_orchestrator.py: the credentials-required error now emits
   per-registry hints (docker.io / ghcr.io / gcr.io / quay.io / nvcr.io)
   instead of Docker-Hub-only PAT guidance.

#5 container_runner.py: document the shell=True trust boundary on the
   inner subprocess.run; cmd is constructed internally, and any user
   model_args are routed through shlex-quoted assembly in the caller.

#6 slurm.py: drop the duplicate `from typing import Optional` import.

#7 slurm.py: the slurm_multi wrapper no longer hard-codes
   `#SBATCH --exclusive`; it honours self.slurm_config.get('exclusive', True)
   to match the standard SLURM template behaviour.

#8 slurm_node_selector.py: cleanup_node()'s srun_cmd is now built once
   and includes both --job-name (when provided) and --reservation (when
   set); the second in-try reassignment that dropped --job-name is gone.

#9 run_orchestrator.py: replace the shallow `merged.update(...)` with
   a real recursive _deep_merge so the behaviour matches the comment
   ("deep-merge"): nested dicts under slurm/k8s/distributed/etc. are merged
   per-leaf, and runtime --additional-context still wins on conflicts.

Made-with: Cursor
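
The difference between the shallow `merged.update(...)` and a recursive deep merge (item 9 above) can be illustrated with a minimal sketch. `deep_merge` here is a hypothetical stand-in, not the actual `_deep_merge` in run_orchestrator.py, and the context keys (`partition`, `nodes`, `launcher`) are made up for the example.

```python
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with override merged into base, recursing into nested dicts."""
    merged = dict(base)  # top-level copy, so base itself is not mutated
    for key, value in override.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            # Both sides are dicts: merge per leaf instead of replacing wholesale.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, or type mismatches: the override wins.
            merged[key] = value
    return merged


# Hypothetical contexts: model defaults vs. runtime --additional-context.
model_ctx = {"slurm": {"partition": "gpu", "nodes": 2}, "distributed": {"launcher": "torchrun"}}
runtime_ctx = {"slurm": {"nodes": 4}}

shallow = dict(model_ctx)
shallow.update(runtime_ctx)  # shallow update: the whole "slurm" dict is replaced
deep = deep_merge(model_ctx, runtime_ctx)

print(shallow["slurm"])  # prints: {'nodes': 4}
print(deep["slurm"])     # prints: {'partition': 'gpu', 'nodes': 4}
```

Sub-dicts that the override never touches are shared with `base` rather than copied; if callers mutate the merged result, a `copy.deepcopy` of the inputs first avoids surprising side effects.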
