Copilot AI commented Dec 19, 2025

All examples hardcode port 29500 for torch.distributed initialization. When a process crashes, the port can remain bound, which blocks the next run and slows development iteration.

Changes

New irisrun CLI tool (irisrun_cli/)

  • Finds a free TCP port by binding a socket on localhost
  • Spawns worker processes with the RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT environment variables set
  • Offers a torchrun-like interface: irisrun --nproc_per_node=N script.py [args]
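A minimal Python sketch of the two launcher steps listed above, assuming a single node and a plain subprocess model. It is not the code in irisrun_cli/; find_free_port, rank_env, and launch are hypothetical names used only for illustration. Binding to port 0 asks the kernel for any unused port, and the probe socket is closed again so the workers can use that port for rendezvous.

```python
import os
import socket
import subprocess
import sys
import time

# Hypothetical helpers for illustration; not the irisrun_cli implementation.


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding a socket to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0: the kernel picks any free port
        # Leaving the with-block closes the socket, freeing the port for the workers.
        return sock.getsockname()[1]


def rank_env(rank: int, world_size: int, master_addr: str, master_port: int) -> dict:
    """Copy the parent environment and add the torch.distributed rendezvous variables."""
    env = dict(os.environ)
    env.update(
        RANK=str(rank),
        WORLD_SIZE=str(world_size),
        MASTER_ADDR=master_addr,
        MASTER_PORT=str(master_port),
    )
    return env


def launch(nproc: int, script: str, script_args: list, master_addr: str = "127.0.0.1") -> list:
    """Start nproc copies of script, one per rank, and return their exit codes."""
    port = find_free_port(master_addr)
    procs = [
        subprocess.Popen([sys.executable, script, *script_args],
                         env=rank_env(rank, nproc, master_addr, port))
        for rank in range(nproc)
    ]
    try:
        # Poll until every worker has exited, or stop early as soon as any worker fails.
        while True:
            codes = [p.poll() for p in procs]
            if all(c is not None for c in codes) or any(c not in (None, 0) for c in codes):
                break
            time.sleep(0.1)
    finally:
        # Send SIGTERM to surviving workers so they do not keep running after a failure.
        for p in procs:
            if p.poll() is None:
                p.terminate()
    return [p.wait() for p in procs]
```

Because the probe socket is closed before the workers start, another process could in principle grab the port in between; this sketch accepts that small race. Stopping all ranks once one fails keeps a crash on one rank from leaving the others blocked at rendezvous.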

Updated examples/00_load/load_bench.py

  • Uses the launcher's settings when RANK and WORLD_SIZE are set in the environment; otherwise falls back to the hardcoded port 29500
  • Maintains backward compatibility with direct execution

Documentation

  • docs/irisrun.md: Usage reference and migration guide
  • examples/00_load/README.md: Added irisrun examples

Usage

# Recommended: auto port allocation
irisrun --nproc_per_node=2 examples/00_load/load_bench.py --verbose

# Legacy: hardcoded port 29500
python examples/00_load/load_bench.py --num_ranks 2 --verbose
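The torchrun-style command line above can be parsed with argparse by taking the script path as a positional argument and collecting everything after it with argparse.REMAINDER, which is how torchrun forwards arguments to the training script. This is a sketch: only --nproc_per_node comes from this PR; the default of 1 and the name parse_launcher_args are assumptions.

```python
import argparse


def parse_launcher_args(argv):
    """Hypothetical irisrun-style parser: launcher flags first, then the script and its args."""
    parser = argparse.ArgumentParser(prog="irisrun")
    # Default of 1 is an assumption, not taken from irisrun_cli.
    parser.add_argument("--nproc_per_node", type=int, default=1,
                        help="number of worker processes to start")
    parser.add_argument("script", help="path of the Python script to run on every rank")
    # REMAINDER keeps flags such as --verbose for the script instead of the launcher.
    parser.add_argument("script_args", nargs=argparse.REMAINDER,
                        help="arguments forwarded unchanged to the script")
    return parser.parse_args(argv)
```

For example, parse_launcher_args(["--nproc_per_node=2", "examples/00_load/load_bench.py", "--verbose"]) yields nproc_per_node=2 and script_args=["--verbose"], which could then be handed to a launch step like the one in irisrun.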

Example script adaptation pattern:

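import os

import torch.distributed as dist


def _worker(local_rank, world_size, init_url, args):
    if "RANK" in os.environ and "WORLD_SIZE" in os.environ:
        # irisrun mode: rank, world size, and rendezvous address come from the launcher
        rank = int(os.environ["RANK"])
        world_size = int(os.environ["WORLD_SIZE"])
        init_method = f"tcp://{os.environ['MASTER_ADDR']}:{os.environ['MASTER_PORT']}"
    else:
        # standalone mode: use function parameters (init_url carries the hardcoded port)
        rank = local_rank
        init_method = init_url
    # Backend is illustrative; keep whichever backend the example already uses.
    dist.init_process_group(backend="nccl", init_method=init_method, rank=rank, world_size=world_size)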
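Pulling the mode-selection logic of _worker into a pure function makes both paths testable without starting torch.distributed. A sketch under that assumption: resolve_dist_config is a hypothetical helper name, and its env parameter exists only so tests can pass a plain dict instead of os.environ.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_dist_config(local_rank: int, world_size: int, init_url: str,
                        env: Optional[Mapping[str, str]] = None) -> Tuple[int, int, str]:
    """Hypothetical helper: return (rank, world_size, init_method), preferring launcher env vars."""
    env = os.environ if env is None else env
    if "RANK" in env and "WORLD_SIZE" in env:
        # Launched by irisrun: the launcher chose the port, so read everything from env.
        init_method = f"tcp://{env['MASTER_ADDR']}:{env['MASTER_PORT']}"
        return int(env["RANK"]), int(env["WORLD_SIZE"]), init_method
    # Direct execution: keep the legacy arguments (e.g. the hardcoded port 29500).
    return local_rank, world_size, init_url
```

Inside _worker, this would become rank, world_size, init_method = resolve_dist_config(local_rank, world_size, init_url), followed by the existing process-group initialization.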
Original prompt


<issue_title>Introduce irisrun</issue_title>
<issue_description>All examples initialize torch.distributed with hardcoded ports. When something crashes, the ports are not closed cleanly, which makes development difficult. Introduce irisrun, similar to torchrun, and port just one example.</issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 6 commits December 19, 2025 18:15
Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
…ncoding

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
…ind_free_port

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Copilot AI changed the title [WIP] Introduce irisrun for cleaner port management Introduce irisrun for automatic port management in distributed examples Dec 19, 2025
Copilot AI requested a review from mawad-amd December 19, 2025 18:29


Development

Successfully merging this pull request may close these issues.

Introduce irisrun

2 participants