SwanLab: add id and resume support for resuming runs (fixes #43698) #43739

Closed
Rayyan-Oumlil wants to merge 1 commit into huggingface:main from Rayyan-Oumlil:fix/43698-swanlab-id-resume

Conversation

@Rayyan-Oumlil

Fixes #43698

Summary

When using Trainer with SwanLab and resuming training (trainer.train(resume_from_checkpoint=...)), the integration previously had no way to pass id and resume to swanlab.init(), so a new experiment was always created instead of continuing the existing one.

This PR adds support for resuming a previous SwanLab run via environment variables (same pattern as MLflow's MLFLOW_RUN_ID):

  • SWANLAB_RUN_ID (or SWANLAB_ID): The 21-character SwanLab run ID to resume. Users set this when resuming so the same experiment continues.
  • SWANLAB_RESUME: Resume mode — "allow" / True (resume if run exists, else create new), "must" (resume only), or "never" / False (always create new).

Changes

  • In SwanLabCallback.setup(), read SWANLAB_RUN_ID/SWANLAB_ID and SWANLAB_RESUME from the environment and pass them to swanlab.init().
  • Document the new env vars in the callback docstring.
  • Parse SWANLAB_RESUME for common values (true/false, allow/must/never) so both string and boolean-like env values work.
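
The changes above can be sketched as a standalone helper. This is a minimal illustration rather than the PR's actual code: the function name `swanlab_init_args_from_env` is hypothetical, the precedence of SWANLAB_RUN_ID over SWANLAB_ID is an assumption, and the parsing branches follow the diff quoted in the review further down.

```python
import os
from typing import Mapping, Optional


def swanlab_init_args_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect `id` and `resume` keyword arguments for swanlab.init() from env vars.

    Hypothetical helper for illustration; not part of transformers or SwanLab.
    """
    env = os.environ if env is None else env
    init_args: dict = {}

    # Assumption: SWANLAB_RUN_ID wins when both names are set.
    run_id = env.get("SWANLAB_RUN_ID") or env.get("SWANLAB_ID")
    if run_id:
        init_args["id"] = run_id

    resume = env.get("SWANLAB_RESUME")
    if resume is not None:
        lowered = resume.lower()
        if lowered in ("true", "1"):
            init_args["resume"] = True
        elif lowered in ("false", "0"):
            init_args["resume"] = False
        elif lowered in ("allow", "must", "never"):
            init_args["resume"] = lowered
        else:
            # Unrecognized value: pass it through unchanged.
            init_args["resume"] = resume
    return init_args
```

The resulting dictionary would then be merged into the keyword arguments the callback passes to swanlab.init(). Taking the environment as a parameter keeps the parsing testable without mutating os.environ.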

Usage

When resuming training, set the env vars before calling trainer.train(resume_from_checkpoint=...):

import os
os.environ["SWANLAB_RUN_ID"] = "14pk4qbyav4toobziszli"  # from previous run
os.environ["SWANLAB_RESUME"] = "allow"
trainer.train(resume_from_checkpoint="./checkpoint-100")

This keeps metrics and history in a single SwanLab run across restarts.

Copilot AI review requested due to automatic review settings February 4, 2026 15:11

Copilot AI left a comment

Pull request overview

This PR adds support for resuming SwanLab runs when using Trainer with resume_from_checkpoint, fixing issue #43698. Previously, resuming training always created a new SwanLab experiment because there was no way to pass id and resume parameters to swanlab.init().

Changes:

  • Added support for SWANLAB_RUN_ID/SWANLAB_ID and SWANLAB_RESUME environment variables
  • Updated SwanLabCallback.setup() to read these env vars and pass them to swanlab.init()
  • Added documentation for the new environment variables in the callback's docstring

Comment on lines +2281 to +2285
- **SWANLAB_RUN_ID** (`str`, *optional*, defaults to `None`):
    The SwanLab run ID (21-character string) to resume. When set together with `SWANLAB_RESUME`, enables
    resuming a previous run so that `trainer.train(resume_from_checkpoint=...)` continues the same
    experiment instead of creating a new one. The run ID can be found in the experiment's Environment tab
    or in the URL on the SwanLab dashboard.

Copilot AI Feb 4, 2026

The documentation mentions "SWANLAB_ID" as an alternative to "SWANLAB_RUN_ID", but this is only documented in the implementation code (line 2317), not in the docstring. For consistency and clarity, the docstring should document both environment variable names.

Consider updating the docstring at line 2281 to mention both names, for example:
"SWANLAB_RUN_ID (or SWANLAB_ID) (str, optional, defaults to None):"

This would make it clear to users that either environment variable name can be used.

Comment on lines +2321 to +2329
if swanlab_resume is not None:
    if swanlab_resume.lower() in ("true", "1"):
        init_args["resume"] = True
    elif swanlab_resume.lower() in ("false", "0"):
        init_args["resume"] = False
    elif swanlab_resume.lower() in ("allow", "must", "never"):
        init_args["resume"] = swanlab_resume.lower()
    else:
        init_args["resume"] = swanlab_resume

Copilot AI Feb 4, 2026

The environment variable parsing for boolean values doesn't follow the established pattern in this codebase. Other integrations in this file use either ENV_VARS_TRUE_VALUES (e.g., MLflowCallback at lines 1300-1301) or utility functions like is_env_variable_true() for consistent parsing.

For consistency with the codebase, consider using the standard approach:

swanlab_resume_env = os.getenv("SWANLAB_RESUME", None)
if swanlab_resume_env is not None:
    resume_upper = swanlab_resume_env.upper()
    if resume_upper in ENV_VARS_TRUE_VALUES or resume_upper == "ALLOW":
        init_args["resume"] = "allow"  # or True, depending on SwanLab's API
    elif resume_upper in ("FALSE", "0", "NEVER"):
        init_args["resume"] = "never"  # or False
    elif resume_upper == "MUST":
        init_args["resume"] = "must"
    else:
        init_args["resume"] = swanlab_resume_env

This approach:

  1. Follows the established pattern (see src/transformers/integrations/integration_utils.py:1300-1301)
  2. Handles case-insensitivity consistently
  3. Avoids potential AttributeError if the value is not a string
  4. Uses the same values that other integrations recognize ("TRUE", "1", "FALSE", "0")
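
For reference, the suggested normalization can be exercised as a pure function. This is a sketch under stated assumptions: `normalize_swanlab_resume` is a hypothetical name, `ENV_VARS_TRUE_VALUES` is redefined locally (assumed to match the set transformers exposes) so the snippet runs on its own, and whether swanlab.init() expects string modes or booleans should be checked against SwanLab's API.

```python
from typing import Optional

# Assumption: mirrors the truthy set that transformers exposes as ENV_VARS_TRUE_VALUES;
# redefined here so the sketch runs without transformers installed.
ENV_VARS_TRUE_VALUES = {"1", "ON", "YES", "TRUE"}


def normalize_swanlab_resume(raw: Optional[str]) -> Optional[str]:
    """Map a SWANLAB_RESUME value onto "allow", "never", or "must".

    Hypothetical helper; returns None when the variable is unset and
    passes unrecognized values through unchanged.
    """
    if raw is None:
        return None
    upper = raw.upper()
    if upper in ENV_VARS_TRUE_VALUES or upper == "ALLOW":
        return "allow"
    if upper in ("FALSE", "0", "NEVER"):
        return "never"
    if upper == "MUST":
        return "must"
    return raw


# Example: boolean-like and mode-like spellings collapse to the same mode strings.
for value in ("true", "0", "Must", "allow"):
    print(value, "->", normalize_swanlab_resume(value))
```

Unlike the diff under review, this variant always returns a mode string rather than a mix of booleans and strings, which keeps the value passed as `resume` in one type.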

@MekkCyber left a comment

Thank you @Rayyan-Oumlil! Sorry, this PR is superseded by #43719.

@Rayyan-Oumlil closed this by deleting the head repository on Feb 21, 2026

Development

Successfully merging this pull request may close these issues.

SwanLab integration uses outdated swanlab.init() signature
