need to be able to restart the tracker #217

@stas00

Please make it easier to restart the tracker. In the HuggingFace BigScience project we need to restart the tracker in various places. After writing the code I discovered that start() and stop() aren't a pair: one can't call start() again after stop(). The current stop() therefore behaves like a destroy(), but one can't tell that from the name. Calling start() after stop() gives:

[codecarbon WARNING @ 19:55:24] <class 'Exception'>
Traceback (most recent call last):
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/site-packages/codecarbon/core/util.py", line 10, in suppress
    yield
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/site-packages/codecarbon/emissions_tracker.py", line 314, in stop
    self._scheduler.shutdown()
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/site-packages/apscheduler/schedulers/background.py", line 41, in shutdown
    super(BackgroundScheduler, self).shutdown(*args, **kwargs)
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/site-packages/apscheduler/schedulers/blocking.py", line 24, in shutdown
    super(BlockingScheduler, self).shutdown(wait)
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 189, in shutdown
    raise SchedulerNotRunningError
apscheduler.schedulers.SchedulerNotRunningError: Scheduler is not running

Any chance you could adjust the code so that the following code could work?

cc = codecarbon.OfflineEmissionsTracker(...)
cc.start()
cc.stop(); cc.start() # restart
...
cc.stop(); cc.start() # restart
cc.stop()

Or the longer version, showing the wrapper I'm trying to write to fit into Megatron-LM's setup:

from pathlib import Path  # needed for Path(output_dir).mkdir(...) below

_GLOBAL_CODECARBON_TRACKER = None
def _set_codecarbon_tracker(args):
    global _GLOBAL_CODECARBON_TRACKER
    if hasattr(args, 'codecarbon_dir'):
        import codecarbon
        print('> setting codecarbon ...')
        output_dir = args.codecarbon_dir
        output_file = f"emissions-{args.rank:03d}.csv"

        log_level = "info"
        country_iso_code = "FRA"

        Path(output_dir).mkdir(parents=True, exist_ok=True)
        _GLOBAL_CODECARBON_TRACKER = codecarbon.OfflineEmissionsTracker(
            output_dir=output_dir,
            output_file=output_file,
            log_level=log_level,
            country_iso_code=country_iso_code,
        )


def codecarbon_tracker_start():
    global _GLOBAL_CODECARBON_TRACKER
    if _GLOBAL_CODECARBON_TRACKER is None:
        return

    print('codecarbon START')
    _GLOBAL_CODECARBON_TRACKER.start()


def codecarbon_tracker_stop():
    global _GLOBAL_CODECARBON_TRACKER
    if _GLOBAL_CODECARBON_TRACKER is None:
        return

    print('codecarbon STOP')
    _GLOBAL_CODECARBON_TRACKER.stop()


def codecarbon_tracker_restart():
    global _GLOBAL_CODECARBON_TRACKER
    if _GLOBAL_CODECARBON_TRACKER is None:
        return

    # output_dir = _GLOBAL_CODECARBON_TRACKER._output_dir
    # output_file = _GLOBAL_CODECARBON_TRACKER._output_file
    # log_level = _GLOBAL_CODECARBON_TRACKER._log_level
    # country_iso_code = _GLOBAL_CODECARBON_TRACKER._country_iso_code

    codecarbon_tracker_stop()
    codecarbon_tracker_start()

Otherwise I have to re-create the tracker every time, and to do that I can't even access the arguments passed to the tracker without reaching into private variables.
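For illustration, the re-create workaround could look roughly like the sketch below. It keeps its own copy of the constructor arguments, so nothing private on the tracker needs to be read. `RestartableTracker` and its `factory` parameter are hypothetical names made up for this example and are not part of codecarbon; the sketch only assumes the wrapped object has `start()` and `stop()` methods.

```python
from typing import Any, Callable, Optional


class RestartableTracker:
    """Hypothetical helper (not a codecarbon API): builds a fresh tracker on each start()."""

    def __init__(self, factory: Callable[..., Any], **kwargs: Any) -> None:
        # Keep our own copy of the constructor arguments so a restart
        # never has to read private attributes of the tracker.
        self._factory = factory
        self._kwargs = kwargs
        self._tracker: Optional[Any] = None

    def start(self) -> None:
        if self._tracker is not None:
            return  # already running, nothing to do
        self._tracker = self._factory(**self._kwargs)
        self._tracker.start()

    def stop(self) -> Any:
        if self._tracker is None:
            return None  # never started, or already stopped
        # Drop the reference first so a stopped tracker is never reused.
        tracker, self._tracker = self._tracker, None
        return tracker.stop()

    def restart(self) -> Any:
        # Stop the current tracker, then start a brand-new one.
        result = self.stop()
        self.start()
        return result
```

It would be used as `cc = RestartableTracker(codecarbon.OfflineEmissionsTracker, output_dir=output_dir, output_file=output_file, country_iso_code="FRA")`, followed by `cc.start()`, `cc.restart()`, and `cc.stop()`. This still pays the cost of constructing a new tracker on every restart, and how successive runs end up in the same output file depends on codecarbon's own behavior, so it is a stopgap rather than a replacement for a real restart().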

Even better would be an API method restart() that simply flushes the data to the file and continues. That would probably be a simpler change to make.

Thank you!

@JetRunner, @whobbes
