30 changes: 0 additions & 30 deletions DESIGN.rst

This file was deleted.

99 changes: 81 additions & 18 deletions README.rst
@@ -13,7 +13,7 @@
An overview of BoxKit is available in ``paper/paper.md`` that provides a
summary and statement of need for the package. You can compile it into a
pdf by running ``make`` in the ``paper`` directory. Please note that the
``Makefile`` requires a functioning Docker service on the machine.
``Makefile`` requires Docker to be installed and running on the machine.

**************
Installation
@@ -82,6 +82,15 @@ source code and is an effective method for debugging. Note that the

pip install click

The ``setup`` command acts as a wrapper over ``setup.py`` to provide a
developer-friendly interface. The ``--help`` option provides
instructions on how to configure installation with different options:

.. code::

./setup --help
./setup develop --help

*******
Usage
*******
@@ -98,21 +107,84 @@ be read by executing,

.. code:: python

# Read dataset from a Flash-X simulation
dset = boxkit.read_dataset(path_to_hdf5_file, source="flash")

New datasets can be created using the ``create_dataset`` method

.. code:: python

dset = boxkit.create_dataset(*args, **kwargs)
# Create a dataset using custom attributes
dset = boxkit.create_dataset(**attributes)

The following example shows how to create a block-structured dataset in
BoxKit and use its interface. Similar functionality exists for datasets
that are read from a simulation source like Flash-X
(https://flash-x.org).

.. code:: python

# Create a two-dimensional dataset with 25 blocks of size 4x4
dset = boxkit.create_dataset(xmin=0,xmax=1,ymin=0,ymax=1,nxb=4,nyb=4,nblockx=5,nblocky=5)

.. code::

print(dset)

Dataset:
- type : <class 'boxkit.library._dataset.Dataset'>
- file : None
- keys : []
- dtype : []
- bound(z-y-x) : [0.0, 1.0] x [0.0, 0.8] x [0.0, 1.6]
- shape(z-y-x) : 1 x 4 x 4
- guard(z-y-x) : 0 x 0 x 0
- nblocks : 25
- dtype : {}

Next add a solution variable using,

.. code:: python

# Add a solution variable to the dataset
dset.addvar("soln")

This creates a NumPy memmap for the solution variable and stores it on
disk. The data can be accessed directly using ``dset["soln"]``. When a
dataset is read from an HDF5 source using ``read_dataset``, as with
Flash-X simulations, its on-disk data is represented by ``h5py``
objects.

.. code::

print(numpy.shape(dset["soln"]))
(25, 1, 4, 4)

The example dataset here contains 25 blocks that are arranged in a
space-filling Morton order, as shown below:

|morton|

Solution data local to individual blocks can be accessed by looping over
a dataset's ``blocklist``

.. code:: python

for block in dset.blocklist:
    print(block["soln"])

A full list of arguments can be found in the documentation.
For instructions on using the parallelization wrapper, please read
``paper/paper.md``. Detailed information on the full functionality is
available in the documentation (https://akashdhruv.github.io/BoxKit/).

*************
Performance
*************
**************
Contribution
**************

|performance|
Developers are encouraged to fork the repository and contribute to the
source code in the form of pull requests to the ``development`` branch.
Please read the documentation (https://akashdhruv.github.io/BoxKit/) for
an overview of the software design and the developer guide.

*********
Testing
@@ -146,15 +218,6 @@ for an example.
url = {https://doi.org/10.5281/zenodo.8063195}
}

**************
Contribution
**************

Developers are encouraged to fork the repository and contribute to the
source code in the form of pull requests to the ``development`` branch.
Please read ``DESIGN.rst`` for an overview of software design and
developer guide

****************
Help & Support
****************
@@ -178,5 +241,5 @@ features, and ask questions about usage
.. |icon| image:: ./media/icon.svg
:width: 30

.. |performance| image:: ./media/performance.png
:width: 1000
.. |morton| image:: ./media/morton.png
:width: 150
Binary file added media/morton.png
2 changes: 1 addition & 1 deletion media/workflow.drawio

Large diffs are not rendered by default.

Binary file modified media/workflow.png
8 changes: 7 additions & 1 deletion paper/paper.bib
@@ -39,7 +39,7 @@ @dataset{HASSAN2023
}

@misc{argonne,
author = {{ANL}},
title = {{ANL}},
year = 2023,
url = {https://www.anl.gov/topic/business/laboratory-directed-research-and-development-ldrd}
}
@@ -62,3 +62,9 @@ @ARTICLE{yt
adsurl = {http://adsabs.harvard.edu/abs/2011ApJS..192....9T},
adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}

@misc{summit,
title = {{ORNL}},
year = 2023,
howpublished = {\url{https://www.olcf.ornl.gov/summit/}},
}
54 changes: 46 additions & 8 deletions paper/paper.md
@@ -33,7 +33,7 @@ Non-Uniform Memory Access (NUMA) and distributed computing architectures.

Simulation software instruments like Flash-X [@DUBEY2022] store output in
the form of Hierarchical Data Format (HDF5) datasets. Each dataset is often
terabytes (TB) in size and requires cache efficient techniques to enable its
gigabytes (GB) in size and requires cache efficient techniques to enable its
integration with Python packages. BoxKit datastructures act as a wrapper around
simulation output stored in HDF5 files and provide metadata for AMR blocks that
describe the simulation domain. The wrapper objects are lightweight in nature and
@@ -45,14 +45,14 @@ application to numerical simulations.

![BoxKit is designed to integrate simulation software instruments like Flash-X
with Python-based machine learning and data analysis packages. Large simulation
datasets (~TB) can leverage BoxKit to improve performance of offline training/analysis.
datasets (~10 GB) can leverage BoxKit to improve performance of offline training/analysis.
This mechanism is part of a broader workflow to integrate simulations with machine
learning using a Fortran-Python bridge shown with dotted lines. \label{fig:workflow}](../media/workflow.png)

BoxKit also offers wrappers to scale the process of deploying workflows on NUMA and distributed
computing architectures by providing decorators that can parallelize Python operations over a
single datastructure to operate over a list. This can be understood better using the
workflow described in Figure \autoref{fig:workflow} that has been applied to data analysis and
single data structure to operate over a list. This can be understood better using the
workflow described in \autoref{fig:workflow} that has been applied to data analysis and
machine learning applications in chemical and thermal science engineering [@DHRUV2023; @HASSAN2023].
Output from Flash-X boiling simulations is created and stored on multinode clusters. Processing
this output through BoxKit allows for scaling a simple operation over block to a list of blocks as
@@ -69,14 +69,52 @@ def operation_on_block(block, *args):
operation_on_block((block for block in list_of_blocks), *args)
```

The `Action` wrapperer converts the function, `operation_on_block`, into a parallel method which
The `Action` wrapper converts the function, `operation_on_block`, into a parallel method which
can be deployed on a multinode cluster with the desired backend (JobLib/Dask). BoxKit does not
interfere with parallelization schema of target applications like SciKit, OpticalFlow, and PyTorch
which function independently using available resources.
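
The idea of turning a per-block function into one that accepts a list can be sketched with the standard library alone. This is a minimal illustration and not BoxKit's `Action` implementation: the decorator name `parallel_over_list`, its `nthreads` parameter, and the use of `ThreadPoolExecutor` in place of the JobLib/Dask backends are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import wraps


def parallel_over_list(nthreads=2):
    """Hypothetical decorator (not BoxKit's API): apply a per-item function to an iterable."""

    def decorator(func):
        @wraps(func)
        def wrapper(items, *args):
            # Run func(item, *args) for every item; pool.map preserves input order
            with ThreadPoolExecutor(max_workers=nthreads) as pool:
                return list(pool.map(lambda item: func(item, *args), items))

        return wrapper

    return decorator


@parallel_over_list(nthreads=4)
def scale_block(block, factor):
    # Stand-in for an operation on a single block
    return [value * factor for value in block]


list_of_blocks = [[1, 2], [3, 4], [5, 6]]
scaled_blocks = scale_block((block for block in list_of_blocks), 10)
```

The function body is written for one block, while the call site passes an iterable of blocks, which mirrors the calling convention shown above.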

We aim to use BoxKit as part of a broader workflow that integrates Fortran/C++ based applications
with state-of-art machine learning packages available in Python, described using dotted line in
Figure \autoref{fig:workflow}.
![Preliminary performance analysis of BoxKit on a single
22 core IBM Power9 node (L1 cache - 32+32 kilobytes (KiB) per core,
L2 cache - 512 KiB per core) for operations involving
calculation of temporal mean across multiple datasets (left),
and merging block-structured AMR datasets into contiguous
arrays (right). \label{fig:performance}](../media/performance.png)

\autoref{fig:performance} provides results of performance tests performed
on a single 22 core node on Summit [@summit] for two basic operations:
(1) Calculation of temporal mean of heat flux in Flash-X boiling simulations
$q(x,y,z,t)$, and (2) a block merge operation to convert AMR data into contiguous
arrays.

Calculation of temporal mean requires operation on data across multiple
datasets, with each dataset approximately 10 GB in size. Following is
the mathematical representation of the problem, where $Nt$ represents the
total number of datasets and $t_n$ is the time associated with the $n$-th
dataset,

\begin{equation}\label{eq:mean}
\overline q = \frac{\sum_{n=1}^{Nt} q(x,y,z,t_n)}{Nt}
\end{equation}

Loading all the datasets into cache memory at the same time is very
inefficient for this problem and requires use of BoxKit's metadata
wrappers to efficiently load data chunks from disk, operate locally in space,
and scale its computation across multiple threads. Based on the graph in
\autoref{fig:performance} the parallel performance scales better as $Nt$
increases.
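
The chunked approach described above can be illustrated with plain NumPy memmaps. This is a sketch under stated assumptions rather than BoxKit's implementation: the file names, the `(nblocks, nz, ny, nx)` layout, and the synthetic values are hypothetical, and the loop runs serially instead of across threads.

```python
import os
import tempfile

import numpy as np

# Assumed layout: each dataset is an array of shape (nblocks, nz, ny, nx)
shape = (4, 1, 8, 8)
num_datasets = 3  # plays the role of Nt

# Write small synthetic datasets to disk (stand-ins for simulation output)
tmpdir = tempfile.mkdtemp()
paths = []
for n in range(num_datasets):
    path = os.path.join(tmpdir, f"dataset_{n}.dat")
    data = np.memmap(path, dtype="float64", mode="w+", shape=shape)
    data[:] = float(n + 1)  # synthetic heat flux values
    data.flush()
    paths.append(path)

# Accumulate the temporal mean while reading one block of one dataset at a time
mean = np.zeros(shape)
for path in paths:
    data = np.memmap(path, dtype="float64", mode="r", shape=shape)
    for block in range(shape[0]):
        mean[block] += data[block]
mean /= num_datasets
```

Only the running sum is held in memory in full; each dataset is touched one block at a time, which is the property that makes the operation a candidate for block-level parallelism.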

Mapping of AMR data to contiguous arrays becomes important for applications
where global operations in space are required. An example of this is
scikit-image's ``skimage.measure`` module, which can be used to measure bubble
shape and size for Flash-X boiling simulations. BoxKit improves performance of
this operation by ~5x.
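
As a rough illustration of what a block merge involves, the sketch below assembles uniform two-dimensional blocks into a single contiguous array with NumPy. It assumes blocks are stored in row-major order (x varying fastest) and share one refinement level; the helper name `merge_blocks` is hypothetical, and BoxKit's own ordering and merge routine may differ.

```python
import numpy as np


def merge_blocks(blocks, nblocky, nblockx):
    """Merge blocks of shape (nblocks, nyb, nxb) into one 2D array.

    Assumes row-major block order (x fastest), not a space-filling curve.
    """
    nblocks, nyb, nxb = blocks.shape
    assert nblocks == nblocky * nblockx
    grid = blocks.reshape(nblocky, nblockx, nyb, nxb)
    # Move each block's row axis next to the block-row axis, then flatten
    return grid.transpose(0, 2, 1, 3).reshape(nblocky * nyb, nblockx * nxb)


# 2 x 2 blocks of size 2 x 2, each filled with its block index
blocks = np.stack([np.full((2, 2), index) for index in range(4)])
merged = merge_blocks(blocks, nblocky=2, nblockx=2)
```

Once the data is in this form, routines that expect a single global array can operate on it directly.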

# Ongoing work

Our ongoing work focuses on using BoxKit to improve performance of Scientific
Machine Learning (SciML) applications and using it as part of a broader workflow
that integrates Fortran/C++ based applications with state-of-the-art machine learning
packages available in Python, shown by dotted lines in \autoref{fig:workflow}.

# Acknowledgements
