Box-Tools · akashdhruv · Jun 21, 2023
diff --git a/DESIGN.rst b/DESIGN.rst
diff --git a/README.rst b/README.rst
@@ -13,7 +13,7 @@
 An overview of BoxKit is available in ``paper/paper.md`` that provides a
 summary and statement of need for the package. You can compile it into a
 pdf by running ``make`` in the ``paper`` directory. Please note that the
-``Makefile`` requires a functioning Docker service on the machine.
+``Makefile`` requires Docker to be installed and running on the machine.
 
 **************
  Installation
@@ -82,6 +82,15 @@ source code and is an effective method for debugging. Note that the
 
    pip install click
 
+The ``setup`` command acts as a wrapper over ``setup.py`` to provide a
+developer-friendly interface. The ``--help`` option provides
+instructions on how to configure installation with different options,
+
+.. code::
+
+   ./setup --help
+   ./setup develop --help
+
 *******
  Usage
 *******
@@ -98,21 +107,84 @@ be read by executing,
 
 .. code:: python
 
+   # Read dataset from a Flash-X simulation
    dset = boxkit.read_dataset(path_to_hdf5_file, source="flash")
 
 New datasets can be created using the ``create_dataset`` method
 
 .. code:: python
 
-   dset = boxkit.create_dataset(*args, **kwargs)
+   # Create a dataset using custom attributes
+   dset = boxkit.create_dataset(**attributes)
+
+The following example shows how to create a block-structured dataset in
+BoxKit and use its interface. Similar functionality exists for datasets
+that are read from a simulation source like Flash-X
+(https://flash-x.org).
+
+.. code:: python
+
+   # Create a two-dimensional dataset with 25 blocks of size 4x4
+   dset = boxkit.create_dataset(xmin=0, xmax=1, ymin=0, ymax=1, nxb=4, nyb=4, nblockx=5, nblocky=5)
+
+.. code::
+
+   print(dset)
+
+   Dataset:
+   - type         : <class 'boxkit.library._dataset.Dataset'>
+   - file         : None
+   - keys         : []
+   - dtype      : []
+   - bound(z-y-x) : [0.0, 1.0] x [0.0, 0.8] x [0.0, 1.6]
+   - shape(z-y-x) : 1 x 4 x 4
+   - guard(z-y-x) : 0 x 0 x 0
+   - nblocks      : 25
+   - dtype        : {}
+
+Next, add a solution variable using,
+
+.. code:: python
+
+   # Add a solution variable to the dataset
+   dset.addvar("soln")
+
+This creates a numpy memmap for the solution variable and stores it on disk.
+The data can be accessed directly using ``dset["soln"]``. When a dataset
+is read from an HDF5 source using ``read_dataset``, as with Flash-X
+simulations, its representation on disk is in the form of
+``h5py`` objects.
+
+.. code::
+
+   print(numpy.shape(dset["soln"]))
+   (25, 1, 4, 4)
+
+The example dataset here contains 25 blocks that are arranged using a
+space-filling Morton order as shown below,
+
+|morton|
+
+Solution data local to individual blocks can be accessed by looping over
+a dataset's ``blocklist``,
+
+.. code:: python
+
+   for block in dset.blocklist:
+       print(block["soln"])
 
-A full of list of arguments can be found in the documentation.
+For instructions on using the parallelization wrapper, please read
+``paper/paper.md``. Detailed information on the full functionality is
+available in the documentation (https://akashdhruv.github.io/BoxKit/).
 
-*************
- Performance
-*************
+**************
+ Contribution
+**************
 
-|performance|
+Developers are encouraged to fork the repository and contribute to the
+source code in the form of pull requests to the ``development`` branch.
+Please read the documentation (https://akashdhruv.github.io/BoxKit/) for an
+overview of the software design and the developer guide.
 
 *********
  Testing
@@ -146,15 +218,6 @@ for an example.
      url          = {https://doi.org/10.5281/zenodo.8063195}
    }
 
-**************
- Contribution
-**************
-
-Developers are encouraged to fork the repository and contribute to the
-source code in the form of pull requests to the ``development`` branch.
-Please read ``DESIGN.rst`` for an overview of software design and
-developer guide
-
 ****************
  Help & Support
 ****************
@@ -178,5 +241,5 @@ features, and ask questions about usage
 .. |icon| image:: ./media/icon.svg
    :width: 30
 
-.. |performance| image:: ./media/performance.png
-   :width: 1000
+.. |morton| image:: ./media/morton.png
+   :width: 150
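
Note on the Morton ordering mentioned in the README changes above: the following is a minimal sketch of how a space-filling Morton (Z-order) arrangement of a 5x5 block grid can be generated, assuming the standard two-dimensional bit-interleaving construction. The helper names `morton_index` and `morton_sorted_blocks` are hypothetical and are not part of BoxKit. BoxKit's internal block ordering may differ in detail, for example in which axis occupies the low bit.

```python
def morton_index(ix: int, iy: int, nbits: int = 16) -> int:
    """Interleave the bits of (ix, iy) into a single Z-order key.

    Hypothetical helper for illustration only, not a BoxKit function.
    """
    key = 0
    for bit in range(nbits):
        key |= ((ix >> bit) & 1) << (2 * bit)      # x bits fill even positions
        key |= ((iy >> bit) & 1) << (2 * bit + 1)  # y bits fill odd positions
    return key


def morton_sorted_blocks(nblockx: int, nblocky: int) -> list:
    """Return (ix, iy) block coordinates sorted by their Morton key."""
    coords = [(ix, iy) for iy in range(nblocky) for ix in range(nblockx)]
    return sorted(coords, key=lambda c: morton_index(c[0], c[1]))


# Ordering for the 5x5 block layout used in the README example
order = morton_sorted_blocks(5, 5)
print(order[:4])  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

Sorting by the interleaved key keeps blocks that are close in space close together in the list, which is the usual motivation for storing block-structured data in this order.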
diff --git a/media/morton.png b/media/morton.png
diff --git a/media/workflow.drawio b/media/workflow.drawio
diff --git a/media/workflow.png b/media/workflow.png
diff --git a/paper/paper.bib b/paper/paper.bib
@@ -39,7 +39,7 @@ @dataset{HASSAN2023
 }
 
 @misc{argonne,
-  author = {{ANL}},
+  title = {{ANL}},
   year = 2023,
   url = {https://www.anl.gov/topic/business/laboratory-directed-research-and-development-ldrd}
 }
@@ -62,3 +62,9 @@ @ARTICLE{yt
    adsurl = {http://adsabs.harvard.edu/abs/2011ApJS..192....9T},
   adsnote = {Provided by the SAO/NASA Astrophysics Data System}
 }
+
+@misc{summit,
+  title = {{ORNL}},
+  year = 2023,
+  howpublished = {\url{https://www.olcf.ornl.gov/summit/}},
+}
diff --git a/paper/paper.md b/paper/paper.md
@@ -33,7 +33,7 @@ Non-Uniform Memory Access (NUMA) and distributed computing architectures.
 
 Simulation sofware instruments like Flash-X [@DUBEY2022] store output in 
 the form of Hierarchical Data Format (HDF5) datasets. Each dataset is often
-terabytes (TB) in size and requires cache efficient techniques to enable its 
+gigabytes (GB) in size and requires cache efficient techniques to enable its 
 integration with Python packages. BoxKit datastructures act as a wrapper around 
 simulation output stored in HDF5 files and provide metadata for AMR blocks that 
 describe the simulation domain. The wrapper objects are lightweight in nature and
@@ -45,14 +45,14 @@ application to numerical simulations.
 
 ![BoxKit is designed to integrate simulation software instruments like Flash-X 
 with Python-based machine learning and data analysis packages. Large simulation 
-datasets (~TB) can leverage BoxKit to improve performance of offline training/analysis. 
+datasets (~10 GB) can leverage BoxKit to improve performance of offline training/analysis. 
 This mechanism is part of a broader workflow to  integrate simulations with machine 
 learning using a Fortran-Python bridge shown with dotted lines. \label{fig:workflow}](../media/workflow.png)
 
 BoxKit also offers wrappers to scale the process of deploying workflows on NUMA and distributed
 computing architectures by providing decorators that can parallelize Python operations over a
-single datastructure to operate over a list. This can be understood better using the 
-workflow described in Figure \autoref{fig:workflow} that has been applied to data analysis and 
+single data structure to operate over a list. This can be understood better using the 
+workflow described in \autoref{fig:workflow} that has been applied to data analysis and 
 machine learning applications in chemical and thermal science engineering [@DHRUV2023; @HASSAN2023].
 Output from Flash-X boiling simulations is created and stored on multinode clusters. Processing 
 this output through BoxKit allows for scaling a simple operation over block to a list of blocks as
@@ -69,14 +69,52 @@ def operation_on_block(block, *args):
 operation_on_block((block for block in list_of_blocks), *args)
 ```
 
-The `Action` wrapperer converts the function, `operation_on_block`, into a parallel method which 
+The `Action` wrapper converts the function, `operation_on_block`, into a parallel method which 
 can be deployed on a multinode cluster with the desired backend (JobLib/Dask). BoxKit does not
 interfere with parallelization schema of target applications like SciKit, OpticalFlow, and PyTorch 
 which function independently using available resources.
 
-We aim to use BoxKit as part of a broader workflow that integrates Fortran/C++ based applications
-with state-of-art machine learning packages available in Python, described using dotted line in 
-Figure \autoref{fig:workflow}.
+![Preliminary performance analysis of BoxKit on a single 
+22-core IBM Power9 node (L1 cache - 32+32 kibibytes (KiB) per core, 
+L2 cache - 512 KiB per core) for operations involving 
+calculation of temporal mean across multiple datasets (left), 
+and merging block-structured AMR datasets into contiguous 
+arrays (right). \label{fig:performance}](../media/performance.png)
+
+\autoref{fig:performance} provides results of performance tests performed 
+on a single 22-core node on Summit [@summit] for two basic operations: 
+(1) calculation of the temporal mean of heat flux, $q(x,y,z,t)$, in Flash-X boiling 
+simulations, and (2) a block merge operation to convert AMR data into contiguous 
+arrays.
+
+Calculation of the temporal mean requires operating on data across multiple 
+datasets, with each dataset approximately 10 GB in size. The problem is 
+expressed mathematically below, where $Nt$ represents the total number of 
+datasets and $t_n$ is the time associated with the $n$-th dataset,
+
+\begin{equation}\label{eq:mean}
+\overline{q}(x,y,z) = \frac{1}{Nt} \sum_{n=1}^{Nt} q(x,y,z,t_n)
+\end{equation}
+
+Loading all the datasets into cache memory at the same time is very 
+inefficient for this problem. BoxKit's metadata wrappers are used instead 
+to efficiently load data chunks from disk, operate locally in space, 
+and scale the computation across multiple threads. Based on the graph in 
+\autoref{fig:performance}, the parallel performance scales better as $Nt$
+increases.
+
+Mapping of AMR data to contiguous arrays becomes important for applications
+where global operations in space are required. An example of this is SciKit's 
+``skimage_measure`` method, which can be used to measure bubble shape and size 
+for Flash-X boiling simulations. BoxKit improves performance of this operation 
+by ~5x.
+
+# Ongoing work
+
+Our ongoing work focuses on using BoxKit to improve performance of Scientific
+Machine Learning (SciML) applications and using it as part of a broader workflow 
+that integrates Fortran/C++ based applications with state-of-the-art machine learning 
+packages available in Python, shown by dotted lines in \autoref{fig:workflow}.
 
 # Acknowledgements