Rebase AWS dev on top of current ROOT master #3
Open
vepadulano wants to merge 577 commits into CloudPyRDF:AWS-dev
Conversation
- Remove unused `import pty`
- Disable stream capture (currently not working on Windows). The output coming from C++ (ROOT) will only print in the command prompt and not in the notebook (to be investigated)
- Disable tab completion (currently not working on Windows).
- Add the `delete=False` argument in `NamedTemporaryFile()` and manually delete the created file to prevent the following error:
```
Error in callback <bound method CaptureDrawnPrimitives._post_execute of <JupyROOT.helpers.utils.CaptureDrawnPrimitives object at 0x0C321C58>> (for post_execute):
---------------------------------------------------------------------------
PermissionError Traceback (most recent call last)
~\build\release\bin\JupyROOT\helpers\utils.py in _post_execute(self)
461
462 def _post_execute(self):
--> 463 NotebookDraw()
464
465 def register(self):
~\build\release\bin\JupyROOT\helpers\utils.py in NotebookDraw()
450 def NotebookDraw():
451 DrawGeometry()
--> 452 DrawCanvases()
453 DrawRCanvases()
454
~\build\release\bin\JupyROOT\helpers\utils.py in DrawCanvases()
441 drawers = GetCanvasDrawers()
442 for drawer in drawers:
--> 443 drawer.Draw()
444
445 def DrawRCanvases():
~\build\release\bin\JupyROOT\helpers\utils.py in Draw(self)
598
599 def Draw(self):
--> 600 self._display()
601 return 0
602
~\build\release\bin\JupyROOT\helpers\utils.py in _display(self)
583 self._jsDisplay()
584 else:
--> 585 self._pngDisplay()
586
587 def GetDrawableObjects(self):
~\build\release\bin\JupyROOT\helpers\utils.py in _pngDisplay(self)
572
573 def _pngDisplay(self):
--> 574 img = self._getPngImage()
575 IPython.display.display(img)
576
~\build\release\bin\JupyROOT\helpers\utils.py in _getPngImage(self)
566 with _setIgnoreLevel(ROOT.kError):
567 self.drawableObject.SaveAs(ofile.name)
--> 568 img = IPython.display.Image(filename=ofile.name, format='png', embed=True)
569 #ofile.close()
570 #os.unlink(ofile.name)
c:\python38-32\lib\site-packages\IPython\core\display.py in __init__(self, data, url, filename, format, embed, width, height, retina, unconfined, metadata)
1229 self.retina = retina
1230 self.unconfined = unconfined
-> 1231 super(Image, self).__init__(data=data, url=url, filename=filename,
1232 metadata=metadata)
1233
c:\python38-32\lib\site-packages\IPython\core\display.py in __init__(self, data, url, filename, metadata)
635 self.metadata = {}
636
--> 637 self.reload()
638 self._check_data()
639
c:\python38-32\lib\site-packages\IPython\core\display.py in reload(self)
1261 """Reload the raw data from file or URL."""
1262 if self.embed:
-> 1263 super(Image,self).reload()
1264 if self.retina:
1265 self._retina_shape()
c:\python38-32\lib\site-packages\IPython\core\display.py in reload(self)
660 """Reload the raw data from file or URL."""
661 if self.filename is not None:
--> 662 with open(self.filename, self._read_flags) as f:
663 self.data = f.read()
664 elif self.url is not None:
PermissionError: [Errno 13] Permission denied: 'C:\\Users\\sftnight\\AppData\\Local\\Temp\\tmpk49jx00w.png'
```
For details, see the documentation of `NamedTemporaryFile` in [tempfile](https://docs.python.org/3/library/tempfile.html); a minimal sketch of the resulting pattern follows the quote:
```
Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows NT or later).
```
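A minimal sketch of the resulting pattern (hypothetical names; the real code writes the canvas image via ROOT):
```python
import os
import tempfile

# delete=False lets the file be reopened by name on Windows; with the
# default delete=True, NamedTemporaryFile keeps the handle open and
# Windows refuses to open the file a second time while it is held.
ofile = tempfile.NamedTemporaryFile(suffix='.png', delete=False)
try:
    ofile.close()                        # release the handle first
    # ... write the image here, e.g. canvas.SaveAs(ofile.name) ...
    with open(ofile.name, 'rb') as f:    # reopening now works on Windows
        data = f.read()
finally:
    os.unlink(ofile.name)                # manual cleanup replaces delete=True
```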
…elements from collections. `AddCollectionToRow()` was modified so that the first 10 elements of long collections are printed, followed by "...". Previously, only the first and last elements of collections of size >= 3 were printed. The `Print()` logic is now simplified, since after "..." there is no guarantee that there is going to be a next element. An additional parameter, `nMaxCollectionElements`, was added to the constructor of `RDisplay()`; it determines the number of elements to be shown from a long collection (the default is 10). The corresponding tests were adapted and new ones were added to check the new functionality. Regarding root-project#6981: to display the whole collection, please use `AsString()`. Co-authored-by: Enrico Guiraud <enrico.guiraud@cern.ch>
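A hedged PyROOT sketch of the new parameter (the dataframe and column here are made up; the exact `Display` signature may differ between ROOT versions):
```python
import ROOT

# Empty-source dataframe with 5 entries and a 20-element vector column.
df = ROOT.RDataFrame(5).Define("v", "std::vector<int>(20, 1)")

# columnList, nRows, nMaxCollectionElements: show only the first 3
# elements of each collection; longer collections end with "...".
d = df.Display(["v"], 5, 3)
d.Print()
```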
Instead of suppressing the emission of weak symbols that have an existing in-process definition, remove the definition of the weak symbol from the module. This prevents relocation "surprises" as seen e.g. on Apple M1, where weak symbol definitions are expected to be found within the module, and relocations are thus expected to be local. By removing the definition, the symbol becomes a reference (declaration) and relocation will work as for any non-local symbol. This also reduces the amount of duplicate work, as in-process weak symbols will now be suppressed early.
Fixes test failures (crashes) in
projectroot.test.test_stressiterators_interpreted
projectroot.roottest.cling.dict.roottest_cling_dict_assertTmpltDefArgCtor
projectroot.test.test_stresshistogram_interpreted
projectroot.roottest.root.meta.roottest_root_meta_runnamespace_auto
This is now hit for variables on Windows (which don't see the GlobalVariable case). Before the `#ifndef WIN32`, this case was not hit by roottest on Windows, macOS, or Linux, so the coverage of "make it a declaration" is good.
(cherry picked from commit 5af1fa4)
Co-authored-by: Vincenzo Eduardo Padulano <v.e.padulano@gmail.com>
This reverts commit a9ef733.
* Fix "an histogram" typos in TH1 documentation * Small grammar fixes in TH1 documentation
- Remove the useless `ioctlsocket` on a file descriptor which is not a socket. See the Microsoft [_pipe documentation](https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/pipe?view=msvc-160):
> A pipe resembles a file because it has a file pointer, a file descriptor, or both, and it can be read from or written to by using the Standard Library input and output functions. However, a pipe does not represent a specific file or device. Instead, it represents temporary storage in memory that is independent of the program's own memory and is controlled entirely by the operating system.
- Use a large buffer and use `_fstat` on the file descriptor to check whether there is something to read from the pipe. This prevents blocking (a rough Python analog follows this list). See also the Microsoft [_pipe documentation](https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/pipe?view=msvc-160):
> All read and write operations on the pipe wait until there is enough data or enough buffer space to complete the I/O request.
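A rough Python analog of the non-blocking check, grounded only in the description above (`os.fstat` wraps `_fstat` on Windows; on other platforms `st_size` for a pipe is typically 0):
```python
import os

read_fd, write_fd = os.pipe()
os.write(write_fd, b"captured output")

# On Windows, _fstat on a pipe file descriptor reports the number of
# bytes currently buffered in st_size, so the read can be skipped when
# the pipe is empty instead of blocking on it.
pending = os.fstat(read_fd).st_size
if pending > 0:
    data = os.read(read_fd, 65536)   # large buffer, single read
    print(data.decode())
```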
Following the last commit, the `StreamCapture()` can now be enabled on Windows
Convert all backslashes into forward slashes. This fixes the following startup exception, caused by mixing forward and backward slashes (a sketch of the conversion follows the traceback):
```
[E 10:34:29.657 NotebookApp] Exception while loading config file C:\Users\sftnight\.rootnb\jupyter_notebook_config.py
Traceback (most recent call last):
File "C:\Python38-32\lib\site-packages\traitlets\config\application.py", line 738, in _load_config_files
config = loader.load_config()
File "C:\Python38-32\lib\site-packages\traitlets\config\loader.py", line 614, in load_config
self._read_file_as_dict()
File "C:\Python38-32\lib\site-packages\traitlets\config\loader.py", line 646, in _read_file_as_dict
exec(compile(f.read(), conf_filename, 'exec'), namespace, namespace)
File "C:\Users\sftnight\.rootnb\jupyter_notebook_config.py", line 6
c.NotebookApp.extra_static_paths = ['C:/Users/sftnight/build/release\js\']
```
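A minimal sketch of the conversion (the path is a placeholder; the actual fix is in the notebook startup code):
```python
import pathlib

win_path = r"C:\Users\sftnight\build\release\js"
# as_posix() turns every backslash into a forward slash, so the path can
# be embedded in the generated jupyter_notebook_config.py without the
# trailing backslash escaping the closing quote.
print(pathlib.PureWindowsPath(win_path).as_posix())
# -> C:/Users/sftnight/build/release/js
```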
In df1a5a6, a memory leak when reading RooRealVars from ROOT files was fixed by introducing a `RooRealVar::_sharedPropList` map that holds `std::weak_ptr`s to all shared properties objects that were created by any RooRealVar. However, a mechanism to clean up the weak pointers from the map once they expired was still missing, which caused the `RooRealVar::_sharedPropList` map to grow forever, and by a lot, especially in toy studies where many datasets are created (in the RooFit dataset constructors, RooAbsReals are cloned, which triggers the creation of the shared properties object). In this commit, the `RooRealVar::_sharedPropList` map is cleaned of expired pointers in the RooRealVar destructor, which should fix a number of RooFit problems that looked like memory leaks. To avoid the static initialization order fiasco, the `_sharedPropList` is now allocated on the heap and only deleted when the process exits, using the `RooSentinel` mechanism that is already used for the RooFit memory pools.
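A rough Python analog of the cleanup, using `weakref` as a stand-in for `std::weak_ptr` (none of these names are the actual RooFit ones):
```python
import weakref

class SharedProperties:
    """Stand-in for a RooSharedProperties object."""

# Mimics RooRealVar::_sharedPropList: key -> weak reference.
_shared_prop_list = {}

def register(key, props):
    _shared_prop_list[key] = weakref.ref(props)

def cleanup_expired():
    # Drop entries whose referent is gone, so the map no longer grows
    # forever; the commit does this in the RooRealVar destructor.
    for key in [k for k, r in _shared_prop_list.items() if r() is None]:
        del _shared_prop_list[key]

p = SharedProperties()
register("uuid-1", p)
del p                # the last owner goes away; the weak ref expires
cleanup_expired()
assert "uuid-1" not in _shared_prop_list
```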
The two classes that derive from `RooSharedProperties` are `RooRealVarSharedProperties` and `RooCategorySharedProperties`. For both of them, copying is discouraged because it makes no sense to copy a shared object like this (instead, one should copy `std::shared_ptr`s to that object). To warn users who accidentally copied a shared properties object, there were some printouts in the copy constructors of `RooRealVarSharedProperties` and `RooCategorySharedProperties`, but it is better to delete the copy constructor/assignment operator to begin with, which is what this commit does. The commit further simplifies the code by removing the source files for the shared properties objects and inlining everything in the headers, at the cost of only an added inclusion of `RooAbsBinning.h` in `RooRealVarSharedProperties.h`. This is acceptable, because `RooRealVarSharedProperties.h` is an implementation detail of `RooRealVar` and is only included in `RooRealVar.cxx`.
This should have no effect, as it will be overridden by whatever CodeGen has anyway.
Fixes CodeGen for opt levels > 0.
This is hit by stressShapes.cxx on WIN32, where `vshape[ivol-1]` gets accessed. It is a broken assert, as relocation addresses can point outside the section as long as the calculated address points inside the section. This is https://bugs.llvm.org/show_bug.cgi?id=30584. It will be superseded by switching to JITLink.
When determining whether a weak symbol was already emitted (and its subsequent definition can thus be replaced by a declaration) look for emitted symbols in the symbol table and the in-process symbols, instead of possibly generating symbols through the JIT. This prevents symbols being emitted that refer into IncrementalExecutor::m_PendingModules elements that have since been removed.
`RunFunction()` might trigger unloading of `lastT`, so store its setting for "want value printing" in a local variable before calling `RunFunction()`. Fixes valgrind complaining about `./stressInterpreter`, which is reloading the "outer" stressInterpreter.cxx from inside `RunFunction()`.
This reverts commit 0cdfa69. We need to hide json includes: modules cannot cope with them, and it introduces a runtime requirement to find json.
This reverts commit 89c5d0c. This file does not exist on all platforms for external nlohmann. Will have to use "proper" fwd decls instead.
This reverts commit 1c38aa0. We cannot rely on json_fwd (it does not exist for all nlohmann packages). Will have to use proper fwd decls instead.
This commit implements a new backend for distributed RDataFrame that allows sending computations to a [Dask](https://dask.org/) cluster. For example:

```python
import ROOT
from dask.distributed import Client

RDataFrame = ROOT.RDF.Experimental.Distributed.Dask.RDataFrame

if __name__ == "__main__":
    client = Client("SCHEDULER_ADDRESS")
    df = RDataFrame("mytree", "myfile.root", daskclient=client)
    df.Define("x", "someoperation").Histo1D("x")
```

The implementation of the Dask backend uses the `dask.delayed` API to make the mapper and reducer functions of the `BaseBackend` lazy and builds a MapReduce computation tree: each lazy mapper is assigned to one DistRDF range, then any two mappers are reduced into one object until only the final result is left. The result of this computation is returned to the user as usual. Similarly to the Spark backend, the user can specify an optional object representing the connection to the cluster (in this case with the `daskclient` argument). If it is not given by the user, the dataframe will create a default one connected to a `LocalCluster` instance with N worker processes, where N is the total number of logical cores on the user machine. The Dask backend will try to optimize the `npartitions` argument if it is not passed by the user, by setting it equal to the total number of cores available in the Dask cluster at the moment the dataframe is initialized. If this number is not available, it will fall back to the default minimum number of partitions (2).

Processing with Dask always involves spawning a few extra Python threads (used internally by the tool, e.g. for communication). This required changing how the RDataFrame computation graph is triggered inside each distributed task. This now happens with the Python GIL released and with `ROOT.EnableThreadSafety()` called, to make sure that no two tasks can step on each other (either two C++ tasks or something else, like a Dask thread calling into PyROOT for garbage collection). Also, `ROOT.gROOT.SetBatch(True)` is now called at the start of each distributed task, since the distributed workers won't need ROOT graphics while processing.

Using the Dask backend from a Python script will print a progress bar in the terminal, as happens with Spark. The same functionality cannot be reproduced in a notebook, since the Dask docs recommend putting the `progressbar` function as the last call in a notebook cell, but the `DaskBackend` class wraps this call inside the `ProcessAndMerge` function, so the progress bar can't be shown for now.

The `mapper` and `reducer` functions have been reworked so that all types of result pointers coming from an execution of the computation graph can be treated equally. These are usually `RResultPtr`, but there are special cases for `AsNumpy` and `Snapshot`, which were so far treated with `if`/`else` conditions. This commit uses `functools.singledispatch` to define a function that behaves differently based on the type of the result pointer and can thus both return the correct type of mergeable value and call the proper merge function.

Co-authored-by: Enric Tejedor Saavedra <enric.tejedor.saavedra@cern.ch>
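As a rough illustration of the `functools.singledispatch` approach (the wrapper classes and merge rules here are hypothetical, not the actual DistRDF internals):
```python
import functools

# Hypothetical stand-ins for the three result flavors mentioned above.
class ResultPtr:
    def __init__(self, value): self.value = value

class AsNumpyResult:
    def __init__(self, arrays): self.arrays = arrays   # column -> list

class SnapshotResult:
    def __init__(self, files): self.files = files      # output file names

@functools.singledispatch
def merge(a, b):
    raise TypeError(f"no merge rule for {type(a).__name__}")

@merge.register
def _(a: ResultPtr, b: ResultPtr):
    return ResultPtr(a.value + b.value)        # e.g. histogram addition

@merge.register
def _(a: AsNumpyResult, b: AsNumpyResult):
    return AsNumpyResult({k: a.arrays[k] + b.arrays[k] for k in a.arrays})

@merge.register
def _(a: SnapshotResult, b: SnapshotResult):
    return SnapshotResult(a.files + b.files)   # concatenate partial outputs

print(merge(ResultPtr(1), ResultPtr(2)).value)   # 3
```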
A new build option has been added, similarly to what was done for test_distrdf_pyspark. The option checks at configure time that the `dask` and `distributed` Python packages are installed on the system and available to the Python interpreter that is being used for the ROOT build. The two packages have also been added to the requirements.txt file with the appropriate comment.
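A minimal sketch of the kind of availability check the option performs, assuming it simply asks the build's Python interpreter whether the packages are importable:
```python
import importlib.util

# Both packages must be importable by the interpreter used for the build.
missing = [pkg for pkg in ("dask", "distributed")
           if importlib.util.find_spec(pkg) is None]
if missing:
    raise SystemExit(f"test_distrdf_dask disabled, missing: {missing}")
```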
Lately there have been many changes in DistRDF in master. I suggest this fork be rebased on upstream. This PR is just a reminder.