Skip to content

Fix small issues with madengine#5

Merged
gargrahul merged 21 commits intomainfrom
genesu/test-build
Jun 23, 2025
Merged

Fix small issues with madengine#5
gargrahul merged 21 commits intomainfrom
genesu/test-build

Conversation

@GeneDer
Copy link
Copy Markdown
Member

@GeneDer GeneDer commented Jun 10, 2025

  • install jq before using it
  • copy scripts directory to current working directory correctly
  • added additional_docker_run_options to allow additional options to be passed during docker run command

GeneDer added 14 commits June 9, 2025 10:53
Signed-off-by: Gene Su <Gene.Su@amd.com>
Signed-off-by: Gene Su <Gene.Su@amd.com>
Signed-off-by: Gene Su <Gene.Su@amd.com>
Signed-off-by: Gene Su <Gene.Su@amd.com>
Signed-off-by: Gene Su <Gene.Su@amd.com>
Signed-off-by: Gene Su <Gene.Su@amd.com>
Signed-off-by: Gene Su <Gene.Su@amd.com>
Signed-off-by: Gene Su <Gene.Su@amd.com>
Signed-off-by: Gene Su <Gene.Su@amd.com>
Signed-off-by: Gene Su <Gene.Su@amd.com>
Signed-off-by: Gene Su <Gene.Su@amd.com>
Signed-off-by: Gene Su <Gene.Su@amd.com>
Signed-off-by: Gene Su <Gene.Su@amd.com>
Signed-off-by: Gene Su <Gene.Su@amd.com>
Comment thread src/madengine/tools/run_models.py Outdated
Comment thread src/madengine/tools/run_models.py
@GeneDer
Copy link
Copy Markdown
Member Author

GeneDer commented Jun 12, 2025

@ppalaniappan-amd @Rohan138 can you help to take a look at this PR?

coketaste
coketaste previously approved these changes Jun 13, 2025
Copy link
Copy Markdown
Collaborator

@coketaste coketaste left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Signed-off-by: Gene Su <Gene.Su@amd.com>
Copy link
Copy Markdown
Contributor

@Rohan138 Rohan138 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, CI passed, @gargrahul good to merge

@gargrahul gargrahul merged commit f7ecad6 into main Jun 23, 2025
@GeneDer GeneDer deleted the genesu/test-build branch June 23, 2025 18:41
@Rohan138 Rohan138 mentioned this pull request Jun 24, 2025
raviguptaamd added a commit to raviguptaamd/madengine that referenced this pull request May 1, 2026
Address all 9 inline comments from copilot-pull-request-reviewer[bot]:

ROCm#1 build_orchestrator.py — _execute_with_prebuilt_image now keys
   manifest['built_models'] by model_name (not use_image), so multiple
   models that share the same pre-built image are all preserved in the
   manifest.

ROCm#2 build_orchestrator.py — warn when discovered models have differing
   distributed/slurm configs in the prebuilt-image flow; the post-merge
   step still uses models[0]'s config but operators are now told.

ROCm#3 build_orchestrator.py — _execute_build_on_compute() now raises
   ConfigurationError early when registry is None instead of falling
   into registry.replace/.split/.lower with NoneType.

ROCm#4 build_orchestrator.py — credentials-required error now emits
   per-registry hints (docker.io / ghcr.io / gcr.io / quay.io / nvcr.io)
   instead of Docker-Hub-only PAT guidance.

ROCm#5 container_runner.py — document the shell=True trust boundary on the
   inner subprocess.run; cmd is internally constructed and any user
   model_args are routed through shlex-quoted assembly in the caller.

ROCm#6 slurm.py — drop duplicate `from typing import Optional` import.

ROCm#7 slurm.py — slurm_multi wrapper no longer hard-codes
   `#SBATCH --exclusive`; honours self.slurm_config.get('exclusive', True)
   to match the standard SLURM template behaviour.

ROCm#8 slurm_node_selector.py — cleanup_node()'s srun_cmd is now built once
   and includes both --job-name (when provided) and --reservation (when
   set); the second in-try reassignment that dropped --job-name is gone.

ROCm#9 run_orchestrator.py — replace the shallow `merged.update(...)` with
   a real recursive _deep_merge so the comment ("deep-merge") matches the
   behaviour: nested dicts under slurm/k8s/distributed/etc. are merged
   per-leaf, runtime --additional-context still wins on conflicts.

Made-with: Cursor
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants