Skip to content

feat(cli): dimos restart command (DIM-683)#1476

Merged
spomichter merged 7 commits intodevfrom
feat/dim-683-restart-v2
Mar 9, 2026
Merged

feat(cli): dimos restart command (DIM-683)#1476
spomichter merged 7 commits intodevfrom
feat/dim-683-restart-v2

Conversation

@spomichter
Copy link
Contributor

@spomichter spomichter commented Mar 7, 2026

Summary

Adds dimos restart command that stops the running DimOS instance and re-invokes with the same arguments.

Supersedes #1445 (which targeted the now-merged daemon branch and referenced removed _stop_entry function).

Usage

dimos restart          # graceful stop + restart
dimos restart --force  # SIGKILL + restart

How it works

  1. Reads the most recent alive run from the registry
  2. Saves its original_argv (exact sys.argv from the original run)
  3. Calls stop_entry() (SIGTERM or SIGKILL with --force)
  4. Waits for old process to fully exit (up to 2s poll)
  5. Uses os.execvp() to replay the original command

No CLI flag reconstruction — just replays sys.argv verbatim. If the original run used --daemon, the restart inherits it automatically.

Changes

  • dimos/core/run_registry.py: Add original_argv field to RunEntry (+1 line)
  • dimos/robot/cli/dimos.py: Store sys.argv in RunEntry, add restart command (+30 lines)

Related

Contributor License Agreement

  • I have read and approved the CLA

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 7, 2026

Greptile Summary

This PR implements dimos restart, a CLI command that stops the currently-running DimOS instance and re-executes it with the exact original sys.argv using os.execvp. The approach is clean — it avoids fragile CLI-flag reconstruction by persisting the raw argument vector in a new original_argv field on RunEntry.

Key points:

  • original_argv has a default factory (list), so existing registry files without the field still deserialize correctly and are caught by the explicit if not entry.original_argv guard.
  • Both daemon and non-daemon RunEntry creation paths now capture sys.argv, which means --daemon is naturally inherited on restart without any special handling.
  • OSError from os.execvp is caught and reported cleanly, with a poll loop that waits up to 2 s after the stop signal before exec-ing the new process.
  • The wait loop runs after stop_entry() (which already implements its own 5-second timeout and SIGKILL escalation), providing defense-in-depth process synchronization.

Confidence Score: 5/5

  • PR is safe to merge. No functional issues or gaps identified.
  • The implementation is well-structured with proper error handling, backward compatibility, and defensive synchronization. All critical paths are handled: original_argv uses safe defaults for deserialization, OSError from os.execvp is caught and reported cleanly, and a post-stop poll ensures the old process has exited before the new one starts. Both daemon and non-daemon code paths correctly capture sys.argv, ensuring --daemon is naturally inherited on restart without special handling.
  • No files require special attention.

Last reviewed commit: 7406c2c

Comment on lines +285 to +288
if isinstance(value, bool):
if value:
cmd.append(flag)
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Boolean False config overrides are silently dropped on restart.

When a boolean config override is explicitly False (e.g., the original run was started with --no-verbose or another --no-<flag> style option), the current code only appends --<flag> for True values and appends nothing for False, causing the --no-<flag> form to be lost when reconstructing the command.

Looking at create_dynamic_callback() (lines 65-77), boolean GlobalConfig fields use the --flag/--no-flag Typer pattern. Any False override stored in config_overrides must emit --no-<flag> to preserve the original invocation.

Suggested change
if isinstance(value, bool):
if value:
cmd.append(flag)
else:
if isinstance(value, bool):
if value:
cmd.append(flag)
else:
cmd.append(f"--no-{key.replace('_', '-')}")

Re-stop the running instance and re-invoke with the same blueprint
args and config overrides.  Uses os.execvp to replace the process
so the new run inherits the terminal.

  dimos restart          # graceful stop + restart
  dimos restart --force  # SIGKILL + restart
  dimos restart --daemon # restart in background
Store sys.argv in RunEntry.original_argv when a run starts. The
restart command now replays the exact original command via os.execvp
instead of fragile config_overrides → CLI flag reconstruction.

Removes --daemon flag from restart (already in saved argv if used).
Tested: daemon start → restart → new instance starts with same args.
@spomichter spomichter force-pushed the feat/dim-683-restart-v2 branch from 581e015 to fd0b8a9 Compare March 9, 2026 15:42
@spomichter
Copy link
Contributor Author

@greptile

typer.echo(f" {msg}")

typer.echo(f" Running: {' '.join(argv)}")
os.execvp(argv[0], argv)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unhandled OSError from os.execvp

os.execvp raises FileNotFoundError (a subclass of OSError) if argv[0] is not found on PATH, and PermissionError if it is not executable. Since the old instance has already been stopped at this point, an unhandled exception here leaves the system with nothing running and exposes a raw Python traceback to the user instead of a clean error message.

Suggested change
os.execvp(argv[0], argv)
try:
os.execvp(argv[0], argv)
except OSError as exc:
typer.echo(f"Error: failed to restart — {exc}", err=True)
raise typer.Exit(1)

Comment on lines +282 to +286
msg, _ok = stop_entry(entry, force=force)
typer.echo(f" {msg}")

typer.echo(f" Running: {' '.join(argv)}")
os.execvp(argv[0], argv)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No wait after SIGKILL before exec in force mode

When --force is used, stop_entry sends SIGKILL and returns immediately without waiting for the old process to exit. os.execvp then runs the new dimos run which will call check_port_conflicts(). Because entry.remove() has already been called inside stop_entry, the registry check passes, but the underlying gRPC port may still be transiently held by the dying process. In practice SIGKILL is near-instantaneous on Linux, but a small polling wait before exec would make the force path as robust as the SIGTERM path (which already waits up to 5 s):

    msg, _ok = stop_entry(entry, force=force)
    typer.echo(f"  {msg}")

    # For SIGKILL the port is released almost immediately, but a brief
    # poll ensures the new run's port-conflict check never races the old process.
    from dimos.core.run_registry import is_pid_alive
    for _ in range(20):          # up to 2 s
        if not is_pid_alive(entry.pid):
            break
        time.sleep(0.1)

    typer.echo(f"  Running: {' '.join(argv)}")
    os.execvp(argv[0], argv)

Comment on lines +262 to +266
@main.command()
def restart(
force: bool = typer.Option(False, "--force", "-f", help="Force kill before restarting"),
) -> None:
"""Restart the running DimOS instance with the same arguments."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--daemon usage in PR description is unimplemented

The PR description explicitly advertises dimos restart --daemon as a supported invocation, but the restart command only declares a --force / -f option. Running dimos restart --daemon today produces a Typer "Unexpected option" error.

If background restart is genuinely desired, a --daemon flag needs to be added here (and os.execvp replaced with a subprocess spawn for that code path). If the feature is out of scope for this PR the description should be corrected to avoid user confusion.

- Wrap os.execvp in try/except OSError so a missing binary gives a
  clean error instead of a traceback (after the old instance is dead)
- Poll up to 2s for old process to exit before exec, preventing
  port conflict races when --force sends SIGKILL
@spomichter spomichter force-pushed the feat/dim-683-restart-v2 branch from a0a7e5f to 9b0efa8 Compare March 9, 2026 16:16
@spomichter
Copy link
Contributor Author

@greptile

cli_args: list[str] = field(default_factory=list)
config_overrides: dict[str, object] = field(default_factory=dict)
grpc_port: int = 9877
original_argv: list[str] = field(default_factory=list)
Copy link
Collaborator

@Dreamsorcerer Dreamsorcerer Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor suggestion, but it might be good to make these kind of things immutable in future generally. This also avoids needing the default_factory:

Suggested change
original_argv: list[str] = field(default_factory=list)
original_argv: Sequence[str] = ()

@spomichter spomichter merged commit 6a0205a into dev Mar 9, 2026
2 checks passed
@spomichter spomichter deleted the feat/dim-683-restart-v2 branch March 9, 2026 19:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants