[hotfix] fix localhost measurement #4320

Merged
FrankLeeeee merged 1 commit into hpcaitech:main from Gy-Lu:main
Aug 1, 2023
@Gy-Lu Gy-Lu commented Jul 25, 2023

📌 Checklist before creating the PR

  • I have created an issue for this PR for traceability
  • The title follows the standard format: [doc/gemini/tensor/...]: A concise description
  • I have added relevant tags if possible for us to better distinguish different PRs

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

fixed #4318

📝 What does this PR do?

Summarize your work here.
if you have any plots/diagrams/screenshots/tables, please attach them here.

The error message in the issue shows that every node considers itself to be the localhost, which is the root of the problem.
Printing the localaddrs and hostaddrs on the second node gives:

[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('127.0.1.1', 22)), 
(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_DGRAM: 2>, 17, '', ('127.0.1.1', 22)), 
(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_RAW: 3>, 0, '', ('127.0.1.1', 22)), 
(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('192.168.0.64', 22)), 
(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_DGRAM: 2>, 17, '', ('192.168.0.64', 22)), 
(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_RAW: 3>, 0, '', ('192.168.0.64', 22))]

 [(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('127.0.1.1', 22)), 
(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_DGRAM: 2>, 17, '', ('127.0.1.1', 22)), 
(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_RAW: 3>, 0, '', ('127.0.1.1', 22)), 
(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('192.168.0.189', 22)), 
(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_DGRAM: 2>, 17, '', ('192.168.0.189', 22)), 
(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_RAW: 3>, 0, '', ('192.168.0.189', 22))]

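Under the current implementation, the check returns True (meaning the host is treated as localhost) on the first iteration, because both lists begin with the same 127.0.1.1 entry. This is incorrect: the two nodes have different addresses (192.168.0.64 and 192.168.0.189).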
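
The failure mode can be reproduced without any network access. The sketch below builds address entries shaped like the output above and compares them in two ways. `any_overlap` mirrors the behavior described here (it returns True as soon as one entry is shared), while `non_loopback_overlap` is a hypothetical stricter check that drops loopback addresses before comparing. Neither function is taken from this PR's diff; the names, the reduced address lists, and the filtering strategy are assumptions made for illustration only.

```python
import ipaddress
import socket


def entry(ip, port=22):
    """Build one tuple shaped like a socket.getaddrinfo() result (SOCK_STREAM only)."""
    return (socket.AF_INET, socket.SOCK_STREAM, 6, "", (ip, port))


# Reduced versions of the two address lists printed above.
addrs_a = [entry("127.0.1.1"), entry("192.168.0.64")]
addrs_b = [entry("127.0.1.1"), entry("192.168.0.189")]


def any_overlap(local, target):
    """True as soon as a single entry is shared between the two lists."""
    return any(addr in target for addr in local)


def is_loopback(addr):
    """True if the entry's IP address lies in the loopback range (127.0.0.0/8)."""
    return ipaddress.ip_address(addr[4][0]).is_loopback


def non_loopback_overlap(local, target):
    """Hypothetical stricter check: ignore loopback entries before comparing."""
    local_real = [a for a in local if not is_loopback(a)]
    target_real = [a for a in target if not is_loopback(a)]
    return any(addr in target_real for addr in local_real)


print(any_overlap(addrs_a, addrs_b))           # shared 127.0.1.1 entry matches
print(non_loopback_overlap(addrs_a, addrs_b))  # 192.168.0.64 vs 192.168.0.189
print(non_loopback_overlap(addrs_a, addrs_a))  # same machine
```

With these inputs the overlap check reports a match purely because of the shared 127.0.1.1 entry, whereas the filtered comparison only matches when the routable addresses agree.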

💥 Checklist before requesting a review

  • I have linked my PR to an issue (instruction)
  • My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
  • I have performed a self-review of my code
  • I have added thorough tests.
  • I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

  • 🌝 Yes, I do.
  • 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

@Gy-Lu Gy-Lu requested review from FrankLeeeee and ver217 July 25, 2023 07:06
Comment thread colossalai/cli/launcher/hostinfo.py
@github-actions
Contributor

The code coverage for the changed files is 0%.

Name                                  Stmts   Miss  Cover
---------------------------------------------------------
colossalai/cli/launcher/hostinfo.py      44     44     0%
---------------------------------------------------------
TOTAL                                    44     44     0%

@Gy-Lu Gy-Lu closed this Jul 31, 2023
@Gy-Lu Gy-Lu reopened this Jul 31, 2023
@FrankLeeeee FrankLeeeee merged commit 03654c0 into hpcaitech:main Aug 1, 2023
Development

Successfully merging this pull request may close these issues.

[BUG]: Multi-rank on same device