Add support of deprecated models #4

Merged
gargrahul merged 5 commits into main from deprecated-model
Jun 11, 2025

Conversation

@coketaste
Collaborator

Add support of deprecated models

@coketaste coketaste self-assigned this Jun 4, 2025
Comment thread src/madengine/mad.py Outdated
parser_run.add_argument('--tags', nargs='+', default=[], help="tags to run (can be multiple).")

# Deprecated Tag
parser_run.add_argument('--ignore_deprecated_flag', action='store_true', help="Force run deprecated models even if marked deprecated.")
Collaborator

Please use "-" not "_" in between words.
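For reference, a minimal sketch of what the requested rename could look like. The spelling `--ignore-deprecated-flag` is an assumption based on the comment above; the name actually merged is not shown in this thread. argparse converts hyphens in a long option to underscores when it derives the attribute name, so code that reads `args.ignore_deprecated_flag` would not need to change.

```python
import argparse

# Stand-in for the real parser_run in src/madengine/mad.py (simplified here).
parser_run = argparse.ArgumentParser()
parser_run.add_argument('--tags', nargs='+', default=[], help="tags to run (can be multiple).")

# Hyphenated spelling (assumed name); argparse stores it as args.ignore_deprecated_flag.
parser_run.add_argument(
    '--ignore-deprecated-flag',
    action='store_true',
    help="Force run deprecated models even if marked deprecated.",
)

args = parser_run.parse_args(['--tags', 'demo', '--ignore-deprecated-flag'])
defaults = parser_run.parse_args([])
```
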

@coketaste coketaste assigned Chetan-AMD and coketaste and unassigned coketaste Jun 9, 2025
@coketaste coketaste marked this pull request as ready for review June 9, 2025 17:30
@gargrahul gargrahul merged commit 8cd083d into main Jun 11, 2025
@coketaste coketaste deleted the deprecated-model branch April 9, 2026 15:02
srinivamd added a commit to srinivamd/madengine that referenced this pull request Apr 23, 2026
…play

- Copilot review comment ROCm#4: timeout option declared as int but
  assigned None after conversion; add elif timeout == 0: timeout = None
  sentinel so "no timeout" is expressed as None (not a magic int)
- Fix validation message to mention 0 is valid (--timeout 0 = no timeout)
- Fix 2 panel displays: show "Disabled" when timeout is None instead
  of "None s" which was confusing to users
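The timeout handling described in that commit message could be sketched roughly as below. The helper names `normalize_timeout` and `format_timeout` and the exact validation message are hypothetical, chosen only for illustration; the referencing commit's actual code is not shown on this page.

```python
from typing import Optional


def normalize_timeout(timeout: Optional[int]) -> Optional[int]:
    """Map the CLI value to an internal one: 0 means "no timeout" (None)."""
    if timeout is not None and timeout < 0:
        # The message mentions that 0 is a valid value.
        raise ValueError("timeout must be >= 0 (use 0 for no timeout)")
    if timeout == 0:
        return None  # sentinel: no timeout, rather than a magic int
    return timeout


def format_timeout(timeout: Optional[int]) -> str:
    """Render the timeout for a display panel without printing 'None s'."""
    return "Disabled" if timeout is None else f"{timeout} s"
```

With these helpers, `--timeout 0` would display as "Disabled" and `--timeout 30` as "30 s".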
raviguptaamd added a commit to raviguptaamd/madengine that referenced this pull request May 1, 2026
Address all 9 inline comments from copilot-pull-request-reviewer[bot]:

ROCm#1 build_orchestrator.py — _execute_with_prebuilt_image now keys
   manifest['built_models'] by model_name (not use_image), so multiple
   models that share the same pre-built image are all preserved in the
   manifest.

ROCm#2 build_orchestrator.py — warn when discovered models have differing
   distributed/slurm configs in the prebuilt-image flow; the post-merge
   step still uses models[0]'s config but operators are now told.

ROCm#3 build_orchestrator.py — _execute_build_on_compute() now raises
   ConfigurationError early when registry is None instead of falling
   into registry.replace/.split/.lower with NoneType.

ROCm#4 build_orchestrator.py — credentials-required error now emits
   per-registry hints (docker.io / ghcr.io / gcr.io / quay.io / nvcr.io)
   instead of Docker-Hub-only PAT guidance.

ROCm#5 container_runner.py — document the shell=True trust boundary on the
   inner subprocess.run; cmd is internally constructed and any user
   model_args are routed through shlex-quoted assembly in the caller.

ROCm#6 slurm.py — drop duplicate `from typing import Optional` import.

ROCm#7 slurm.py — slurm_multi wrapper no longer hard-codes
   `#SBATCH --exclusive`; honours self.slurm_config.get('exclusive', True)
   to match the standard SLURM template behaviour.

ROCm#8 slurm_node_selector.py — cleanup_node()'s srun_cmd is now built once
   and includes both --job-name (when provided) and --reservation (when
   set); the second in-try reassignment that dropped --job-name is gone.

ROCm#9 run_orchestrator.py — replace the shallow `merged.update(...)` with
   a real recursive _deep_merge so the comment ("deep-merge") matches the
   behaviour: nested dicts under slurm/k8s/distributed/etc. are merged
   per-leaf, runtime --additional-context still wins on conflicts.

Made-with: Cursor

3 participants