
Rebase AWS dev on top of current ROOT master #3

Open
vepadulano wants to merge 577 commits into CloudPyRDF:AWS-dev from vepadulano:AWS-dev

Conversation

@vepadulano

Lately there have been many changes to DistRDF in master. I suggest rebasing this fork onto upstream. This PR is just a reminder.

alja and others added 30 commits October 12, 2021 12:34
 - Remove unused `import pty`
 - Disable stream capture (currently not working on Windows). The output coming from C++ (ROOT) will only print in the command prompt and not in the notebook (to be investigated)
 - Disable tab completion (currently not working on Windows).
 - Add the `delete=False` argument in `NamedTemporaryFile()` and manually delete the created file to prevent the following error:
    ```
    Error in callback <bound method CaptureDrawnPrimitives._post_execute of <JupyROOT.helpers.utils.CaptureDrawnPrimitives object at 0x0C321C58>> (for post_execute):
    ---------------------------------------------------------------------------
    PermissionError                           Traceback (most recent call last)
    ~\build\release\bin\JupyROOT\helpers\utils.py in _post_execute(self)
        461
        462     def _post_execute(self):
    --> 463         NotebookDraw()
        464
        465     def register(self):

    ~\build\release\bin\JupyROOT\helpers\utils.py in NotebookDraw()
        450 def NotebookDraw():
        451     DrawGeometry()
    --> 452     DrawCanvases()
        453     DrawRCanvases()
        454

    ~\build\release\bin\JupyROOT\helpers\utils.py in DrawCanvases()
        441     drawers = GetCanvasDrawers()
        442     for drawer in drawers:
    --> 443         drawer.Draw()
        444
        445 def DrawRCanvases():

    ~\build\release\bin\JupyROOT\helpers\utils.py in Draw(self)
        598
        599     def Draw(self):
    --> 600         self._display()
        601         return 0
        602

    ~\build\release\bin\JupyROOT\helpers\utils.py in _display(self)
        583             self._jsDisplay()
        584          else:
    --> 585             self._pngDisplay()
        586
        587     def GetDrawableObjects(self):

    ~\build\release\bin\JupyROOT\helpers\utils.py in _pngDisplay(self)
        572
        573     def _pngDisplay(self):
    --> 574         img = self._getPngImage()
        575         IPython.display.display(img)
        576

    ~\build\release\bin\JupyROOT\helpers\utils.py in _getPngImage(self)
        566         with _setIgnoreLevel(ROOT.kError):
        567             self.drawableObject.SaveAs(ofile.name)
    --> 568         img = IPython.display.Image(filename=ofile.name, format='png', embed=True)
        569         #ofile.close()
        570         #os.unlink(ofile.name)

    c:\python38-32\lib\site-packages\IPython\core\display.py in __init__(self, data, url, filename, format, embed, width, height, retina, unconfined, metadata)
       1229         self.retina = retina
       1230         self.unconfined = unconfined
    -> 1231         super(Image, self).__init__(data=data, url=url, filename=filename,
       1232                 metadata=metadata)
       1233

    c:\python38-32\lib\site-packages\IPython\core\display.py in __init__(self, data, url, filename, metadata)
        635             self.metadata = {}
        636
    --> 637         self.reload()
        638         self._check_data()
        639

    c:\python38-32\lib\site-packages\IPython\core\display.py in reload(self)
       1261         """Reload the raw data from file or URL."""
       1262         if self.embed:
    -> 1263             super(Image,self).reload()
       1264             if self.retina:
       1265                 self._retina_shape()

    c:\python38-32\lib\site-packages\IPython\core\display.py in reload(self)
        660         """Reload the raw data from file or URL."""
        661         if self.filename is not None:
    --> 662             with open(self.filename, self._read_flags) as f:
        663                 self.data = f.read()
        664         elif self.url is not None:

    PermissionError: [Errno 13] Permission denied: 'C:\\Users\\sftnight\\AppData\\Local\\Temp\\tmpk49jx00w.png'
    ```
    For the details, see the documentation of `NamedTemporaryFile` in [tempfile](https://docs.python.org/3/library/tempfile.html):
    ```
    Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows NT or later).
    ```
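The workaround described above can be sketched in plain Python. The function name `save_png_portably` is hypothetical; the real code lives in `JupyROOT.helpers.utils`:

```python
import os
import tempfile

def save_png_portably(data: bytes) -> bytes:
    """Write data to a temp file, reopen it by name, then clean up.

    On Windows, NamedTemporaryFile(delete=True) keeps the file open,
    and opening it a second time by name raises PermissionError.
    The workaround: create with delete=False, close the handle first,
    reopen by name, and remove the file manually.
    """
    ofile = tempfile.NamedTemporaryFile(suffix=".png", delete=False)
    try:
        ofile.write(data)
        ofile.close()  # release the handle before the second open
        # The second open now works on Unix *and* Windows.
        with open(ofile.name, "rb") as f:
            return f.read()
    finally:
        os.unlink(ofile.name)  # manual cleanup replaces delete=True
```

With `delete=True` the `open(ofile.name, ...)` call would fail on Windows NT or later, exactly as the traceback above shows.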
…elements from collections

AddCollectionToRow() was modified so that the first 10 elements of long collections
are printed, followed by "...". Previously, only the first and last
elements of collections of size >= 3 were printed.

The Print() logic is now simplified, since after "..." there is no guarantee
that there is a next element.

An additional parameter, nMaxCollectionElements, was added to the RDisplay()
constructor. It determines the number of elements shown from a long collection;
the default is 10.

Corresponding tests were adapted and additional ones were added to check
the new functionality.

Regarding root-project#6981 - to display the whole collection, please use AsString().

Co-authored-by: Enrico Guiraud <enrico.guiraud@cern.ch>
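The truncation rule described above can be sketched in pure Python; the real logic lives in the C++ RDisplay implementation, and `format_collection` is a hypothetical stand-in:

```python
def format_collection(elements, n_max=10):
    """Render a collection column as described above: show at most
    n_max leading elements, then append "..." if any were omitted.
    (Pure-Python sketch; the actual code is C++.)"""
    shown = [str(e) for e in elements[:n_max]]
    if len(elements) > n_max:
        shown.append("...")
    return "{ " + ", ".join(shown) + " }"
```

Short collections are printed whole; only collections longer than `n_max` get the trailing ellipsis.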
Instead of suppressing the emission of weak symbols that have an existing
in-process definition, remove the definition of the weak symbol from the
module. This prevents relocation "surprises" as seen e.g. on Apple M1, where
weak symbol definitions are expected to be found within the module, and
relocations are thus expected to be local. By removing the definition, the
symbol becomes a reference (declaration) and relocation will work as for any
non-local symbol.

This also reduces the amount of duplicate work, as in-process weak symbols
are now suppressed early.
Fixes test failures (crashes) in
    projectroot.test.test_stressiterators_interpreted
    projectroot.roottest.cling.dict.roottest_cling_dict_assertTmpltDefArgCtor
    projectroot.test.test_stresshistogram_interpreted
    projectroot.roottest.root.meta.roottest_root_meta_runnamespace_auto
This is now hit for variables on Windows (which don't see the GlobalVariable case).
Before the `#ifndef WIN32`, this case was not hit by roottest on Windows,
macOS, or Linux - so the coverage of "make it a declaration" is good.
Co-authored-by: Vincenzo Eduardo Padulano <v.e.padulano@gmail.com>
* Fix "an histogram" typos in TH1 documentation

* Small grammar fixes in TH1 documentation
 - remove useless `ioctlsocket` on a file descriptor which is not a socket. See the Microsoft [_pipe documentation](https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/pipe?view=msvc-160):
   > A pipe resembles a file because it has a file pointer, a file descriptor, or both, and it can be read from or written to by using the Standard Library input and output functions. However, a pipe does not represent a specific file or device. Instead, it represents temporary storage in memory that is independent of the program's own memory and is controlled entirely by the operating system.

 - use a large buffer and use `_fstat` on the file descriptor to check whether there is something to read from the pipe. This prevents blocking. See also the Microsoft [_pipe documentation](https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/pipe?view=msvc-160):
   > All read and write operations on the pipe wait until there is enough data or enough buffer space to complete the I/O request.
Following the last commit, `StreamCapture()` can now be enabled on Windows.
Convert all backslashes into forward slashes. This fixes the following startup exception, due to mixing forward and back slashes:
```
[E 10:34:29.657 NotebookApp] Exception while loading config file C:\Users\sftnight\.rootnb\jupyter_notebook_config.py
    Traceback (most recent call last):
      File "C:\Python38-32\lib\site-packages\traitlets\config\application.py", line 738, in _load_config_files
        config = loader.load_config()
      File "C:\Python38-32\lib\site-packages\traitlets\config\loader.py", line 614, in load_config
        self._read_file_as_dict()
      File "C:\Python38-32\lib\site-packages\traitlets\config\loader.py", line 646, in _read_file_as_dict
        exec(compile(f.read(), conf_filename, 'exec'), namespace, namespace)
      File "C:\Users\sftnight\.rootnb\jupyter_notebook_config.py", line 6
        c.NotebookApp.extra_static_paths = ['C:/Users/sftnight/build/release\js\']
```
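The fix amounts to normalizing the path before it is written into the config file; a minimal sketch (the helper name is hypothetical):

```python
def to_forward_slashes(path: str) -> str:
    """Normalize a Windows path so it can be embedded in a Python
    config file: a trailing backslash or a sequence like '\\j' would
    otherwise break the string literal, as in the traceback above."""
    return path.replace("\\", "/")
```

For example, `C:/Users/sftnight/build/release/js` is a valid path on Windows too, so nothing else needs to change.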
vepadulano and others added 30 commits November 12, 2021 13:05
In df1a5a6, a memory leak when reading RooRealVars from ROOT files was
fixed by introducing a `RooRealVar::_sharedPropList` map that holds
`std::weak_ptr`s to all shared properties objects that were created by
any RooRealVar.

However, a mechanism to clean up the weak pointers from the map once they
had expired was still missing, which caused the
`RooRealVar::_sharedPropList` map to grow without bound,
especially in toy studies where many datasets are created (in the RooFit
dataset constructors, RooAbsReals are cloned, which triggers the
creation of the shared properties object).

In this commit, the `RooRealVar::_sharedPropList` map is cleaned of
expired pointers in the RooRealVar destructor, which should fix a number
of RooFit problems that looked like memory leaks.

To avoid the static initialization order fiasco, the `_sharedPropList`
is now allocated on the heap and only deleted when the process exits,
using the `RooSentinel` mechanism that is already used for the RooFit
memory pools.
The two classes that derive from `RooSharedProperties` are
`RooRealVarSharedProperties` and `RooCategorySharedProperties`.
For both of them, copying is discouraged because it makes no sense to
copy a shared object like this (instead, one should copy
`std::shared_ptr`s to that object).

To warn users who accidentally copied a shared properties object, there
were some printouts in the copy constructors of
`RooRealVarSharedProperties` and `RooCategorySharedProperties`, but it
is better to delete the copy constructor/assignment operator to begin
with. This commit does exactly that.

The commit further simplifies the code by removing the
source files for the shared properties objects and inlining everything
in the header, at the cost of only an added inclusion of `RooAbsBinning.h`
in `RooRealVarSharedProperties.h`. This is acceptable, because
`RooRealVarSharedProperties.h` is an implementation detail of
`RooRealVar` and is only included in `RooRealVar.cxx`.
This should have no effect, as it will be overridden by whatever CodeGen has anyway.
This is hit by stressShapes.cxx in WIN32 where `vshape[ivol-1]` gets accessed.
It is a broken assert, as relocation addresses can point outside the section as
long as the calculated address points inside the section.
This is https://bugs.llvm.org/show_bug.cgi?id=30584

It will be superseded by switching to JITLink.
When determining whether a weak symbol was already emitted (and its
subsequent definition can thus be replaced by a declaration) look
for emitted symbols in the symbol table and the in-process symbols,
instead of possibly generating symbols through the JIT. This prevents
symbols being emitted that refer into
IncrementalExecutor::m_PendingModules elements that have since been
removed.
`RunFunction()` might trigger unloading of `lastT`, so store its setting
for "want value printing" in a local variable before calling RunFunction().

Fixes valgrind complaining about `./stressInterpreter` which is reloading
the "outer" stressInterpreter.cxx from inside `RunFunction()`.
This reverts commit 0cdfa69.
We need to hide json includes: modules cannot cope with them, and it introduces a runtime requirement to find json.
This reverts commit 89c5d0c.
This file does not exist on all platforms for external nlohmann.
Will have to use "proper" fwd decls instead.
This reverts commit 1c38aa0.

We cannot rely on json_fwd (it does not exist for all nlohmann packages).
Will have to use proper fwd decls instead.
This commit implements a new backend for distributed RDataFrame that allows
sending computations to a [Dask](https://dask.org/) cluster. For example:

```python
import ROOT
from dask.distributed import Client
RDataFrame = ROOT.RDF.Experimental.Distributed.Dask.RDataFrame

if __name__ == "__main__":
    client = Client("SCHEDULER_ADDRESS")
    df = RDataFrame("mytree","myfile.root", daskclient=client)
    df.Define("x","someoperation").Histo1D("x")
```

The implementation of the Dask backend uses the `dask.delayed` API to make the
mapper and reducer functions of the `BaseBackend` lazy and builds a MapReduce
computation tree: each lazy mapper is assigned to one DistRDF range, then any
two mappers are reduced into one object until there is only the final result
left. The result of this computation is returned to the user as usual.
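The pairwise reduction tree can be sketched with plain functions standing in for `dask.delayed` objects (a simplification: in the real backend every call is lazy and scheduled by Dask, and `tree_reduce` is a hypothetical name):

```python
def tree_reduce(tasks, reducer):
    """Repeatedly merge any two results until one remains, as
    described above. Each element of `tasks` stands in for the
    (lazy) result of one mapper over a DistRDF range."""
    while len(tasks) > 1:
        # Merge neighbours pairwise; carry an odd leftover forward.
        merged = [reducer(a, b) for a, b in zip(tasks[::2], tasks[1::2])]
        if len(tasks) % 2:
            merged.append(tasks[-1])
        tasks = merged
    return tasks[0]
```

With `dask.delayed`, `reducer` would itself be a delayed call, so the whole tree is built up front and only computed when the final result is requested.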

Similarly to the Spark backend, the user can specify an optional object
representing the connection to the cluster (in this case with the `daskclient`
argument). If it is not given by the user, the dataframe will create a default
one connected to a `LocalCluster` instance with N worker processes, where N is
the total number of logical cores on the user machine.

The Dask backend will try to optimize the `npartitions` argument if it is not
passed by the user, by setting it equal to the total number of cores available
in the Dask cluster at the moment the dataframe is initialized. If this number
is not available, it falls back to the default minimum number of partitions (2).

Processing with Dask always involves spawning a few extra Python threads (used
internally by the tool e.g. for communication). This required changing the
triggering of the RDataFrame computation graph inside each distributed task.
This now happens with the Python GIL released and with
`ROOT.EnableThreadSafety()` called, to make sure that no two tasks can step on
each other (either two C++ tasks or something else like a Dask thread calling
into PyROOT for garbage collection).

Also, `ROOT.gROOT.SetBatch(True)` is now called at the start of each distributed
task, since the distributed workers won't need ROOT graphics while processing.

Using the Dask backend from a Python script prints a progress bar in the
terminal, as happens with Spark. The same functionality cannot be reproduced
in a notebook: the Dask docs advise putting the `progressbar` call last in a
notebook cell, but the `DaskBackend` class wraps this call inside the
`ProcessAndMerge` function, so the progress bar can't be shown for now.

The `mapper` and `reducer` functions have been reworked so that all types of
result pointers coming from an execution of the computation graph can be treated
equally. These are usually `RResultPtr`, but there are special cases for
`AsNumpy` and `Snapshot` which were so far treated with `if else` conditions.
This commit uses `functools.singledispatch` to define a function that behaves
differently based on the type of the result pointer and can thus both return the
correct type of mergeable value and call the proper merge function.

Co-authored-by: Enric Tejedor Saavedra <enric.tejedor.saavedra@cern.ch>
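A minimal sketch of the `functools.singledispatch` approach, with hypothetical stand-in result types in place of `RResultPtr` and the `Snapshot`/`AsNumpy` results:

```python
from functools import singledispatch

class ResultPtr:
    """Stand-in for the generic result pointer (e.g. a histogram)."""
    def __init__(self, value):
        self.value = value

class SnapshotResult:
    """Stand-in for the special Snapshot result (a list of files)."""
    def __init__(self, filenames):
        self.filenames = filenames

@singledispatch
def get_mergeable(result):
    """Dispatch on the result type instead of if/else chains."""
    raise NotImplementedError(f"no mergeable for {type(result).__name__}")

@get_mergeable.register
def _(result: ResultPtr):
    # Generic case: return the value held by the result pointer.
    return result.value

@get_mergeable.register
def _(result: SnapshotResult):
    # Special case: merging snapshots means concatenating file lists.
    return result.filenames
```

Each `register` overload replaces one branch of the old `if else` chain, so adding a new result type means adding one function instead of editing the mapper and reducer.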
A new build option has been added, similarly to what was done for
test_distrdf_pyspark. The option checks at configure time that the `dask` and
`distributed` Python packages are installed on the system and available to the
Python interpreter that is being used for the ROOT build. The two packages have
also been added to the requirements.txt file with the appropriate comment.