Nanny reporting errors - may be working anyway? #5386

Description

@keflavich

I'm seeing a slew of messages like this in a log file:

Traceback (most recent call last):
  File "/orange/adamginsburg/miniconda3/envs/python39/lib/python3.9/site-packages/tornado/ioloop.py", line 905, in _run
    return self.callback()
  File "/orange/adamginsburg/miniconda3/envs/python39/lib/python3.9/site-packages/distributed/nanny.py", line 417, in memory_monitor
    process = self.process.process
AttributeError: 'NoneType' object has no attribute 'process'
tornado.application - ERROR - Exception in callback <bound method Nanny.memory_monitor of <Nanny: None, threads: 8>>

This is on a supercomputer with Slurm handling job control. The messages appear only when I run the job in batch mode, not when I run it interactively.
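For reference, the traceback suggests the `memory_monitor` callback fires while `Nanny.process` is still `None` (before the worker process has started, or after it has died). A minimal sketch of the failure mode and the guard that would avoid it, using hypothetical simplified names rather than the actual distributed source:

```python
# Minimal sketch (hypothetical names): a periodic callback that touches
# self.process must guard against the subprocess not existing yet.
class Nanny:
    def __init__(self):
        self.process = None  # worker subprocess handle; None until started

    def memory_monitor(self):
        # Without this guard, calling the monitor before the worker starts
        # raises: AttributeError: 'NoneType' object has no attribute 'process'
        if self.process is None:
            return None
        return self.process.process

nanny = Nanny()
print(nanny.memory_monitor())  # None instead of an AttributeError
```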

The code I'm using is here: https://github.com/ALMA-IMF/reduction/blob/master/analysis/cube_stats_grid.py

A dask LocalCluster is started with:

    from dask.distributed import Client, LocalCluster

    cluster = LocalCluster(n_workers=1,
                           threads_per_worker=int(nthreads),
                           memory_target_fraction=0.95,
                           memory_limit=memlimit)
    client = Client(cluster)

after which the actual computation is handled by spectral-cube.

Versions:

dask==2021.1.1
dask-image==0.6.0
dask-jobqueue==0.7.2

Is this a bug that I can fix? Is there a workaround to silence these error messages? I can't tell whether the code is running successfully, both because there are too many of these errors to search through and because the job is being killed for unrelated reasons.
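One possible way to quiet the messages, assuming they are all emitted through the `tornado.application` logger (as the last line of the log excerpt suggests), is to raise that logger's level before starting the cluster. This is a sketch of a workaround, not a confirmed fix, and it suppresses everything that logger reports at ERROR, not just the Nanny traceback:

```python
import logging

# Assumption: the repeated tracebacks come via the tornado.application
# logger; raising its threshold hides them (and any other ERROR it logs).
logging.getLogger("tornado.application").setLevel(logging.CRITICAL)
```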
