From ac687948fbb681d7eb70a431f0af14da9f21556a Mon Sep 17 00:00:00 2001
From: Peter Kalverla
Date: Mon, 18 Jan 2021 10:18:41 +0100
Subject: [PATCH 1/7] Consistent grammar of 'multi-model' in docs and auto-wrap lines

---
 doc/recipe/preprocessor.rst | 66 +++++++++++++++++--------------------
 1 file changed, 30 insertions(+), 36 deletions(-)

diff --git a/doc/recipe/preprocessor.rst b/doc/recipe/preprocessor.rst
index 6a5b72e673..84a5a03b0e 100644
--- a/doc/recipe/preprocessor.rst
+++ b/doc/recipe/preprocessor.rst
@@ -456,17 +456,17 @@ Missing values masks
 --------------------

 Missing (masked) values can be a nuisance especially when dealing with
-multimodel ensembles and having to compute multimodel statistics; different
+multi-model ensembles and having to compute multi-model statistics; different
 numbers of missing data from dataset to dataset may introduce biases and
-artificially assign more weight to the datasets that have less missing
-data. This is handled in ESMValTool via the missing values masks: two types of
-such masks are available, one for the multimodel case and another for the
-single model case.
+artificially assign more weight to the datasets that have less missing data.
+This is handled in ESMValTool via the missing values masks: two types of such
+masks are available, one for the multi-model case and another for the single
+model case.
-The multimodel missing values mask (``mask_fillvalues``) is a preprocessor step
+The multi-model missing values mask (``mask_fillvalues``) is a preprocessor step
 that usually comes after all the single-model steps (regridding, area selection
 etc) have been performed; in a nutshell, it combines missing values masks from
-individual models into a multimodel missing values mask; the individual model
+individual models into a multi-model missing values mask; the individual model
 masks are built according to common criteria: the user chooses a time window in
 which missing data points are counted, and if the number of missing data points
 relative to the number of total data points in a window is less than a chosen
@@ -492,11 +492,11 @@ See also :func:`esmvalcore.preprocessor.mask_fillvalues`.
 Common mask for multiple models
 -------------------------------

-It is possible to use ``mask_fillvalues`` to create a combined multimodel
-mask (all the masks from all the analyzed models combined into a single
-mask); for that purpose setting the ``threshold_fraction`` to 0 will not
-discard any time windows, essentially keeping the original model masks and
-combining them into a single mask; here is an example:
+It is possible to use ``mask_fillvalues`` to create a combined multi-model mask
+(all the masks from all the analyzed models combined into a single mask); for
+that purpose setting the ``threshold_fraction`` to 0 will not discard any time
+windows, essentially keeping the original model masks and combining them into a
+single mask; here is an example:

 .. code-block:: yaml

@@ -530,13 +530,12 @@ Horizontal regridding

 Regridding is necessary when various datasets are available on a variety of
 `lat-lon` grids and they need to be brought together on a common grid (for
-various statistical operations e.g. multimodel statistics or for e.g. direct
+various statistical operations e.g. multi-model statistics or for e.g. direct
 inter-comparison or comparison with observational datasets). Regridding is
 conceptually a very similar process to interpolation (in fact, the regridder
 engine uses interpolation and extrapolation, with various schemes). The primary
 difference is that interpolation is based on sample data points, while
-regridding is based on the horizontal grid of another cube (the reference
-grid).
+regridding is based on the horizontal grid of another cube (the reference grid).

 The underlying regridding mechanism in ESMValTool uses the
 `cube.regrid() `_
@@ -651,20 +650,15 @@ Multi-model statistics
 ======================
 Computing multi-model statistics is an integral part of model analysis and
 evaluation: individual models display a variety of biases depending on model
-set-up, initial conditions, forcings and implementation; comparing model data
-to observational data, these biases have a significantly lower statistical
-impact when using a multi-model ensemble. ESMValTool has the capability of
-computing a number of multi-model statistical measures: using the preprocessor
-module ``multi_model_statistics`` will enable the user to ask for either a
-multi-model ``mean``, ``median``, ``max``, ``min``, ``std``, and / or
-``pXX.YY`` with a set of argument parameters passed to
-``multi_model_statistics``. Percentiles can be specified like ``p1.5`` or
-``p95``. The decimal point will be replaced by a dash in the output file.
-
-Note that current multimodel statistics in ESMValTool are local (not global),
-and are computed along the time axis. As such, can be computed across a common
-overlap in time (by specifying ``span: overlap`` argument) or across the full
-length in time of each model (by specifying ``span: full`` argument).
+set-up, initial conditions, forcings and implementation; comparing model data to
+observational data, these biases have a significantly lower statistical impact
+when using a multi-model ensemble. ESMValTool has the capability of computing a
+number of multi-model statistical measures: using the preprocessor module
+``multi_model_statistics`` will enable the user to ask for either a multi-model
+``mean``, ``median``, ``max``, ``min``, ``std``, and / or ``pXX.YY`` with a set
+of argument parameters passed to ``multi_model_statistics``. Percentiles can be
+specified like ``p1.5`` or ``p95``. The decimal point will be replaced by a dash
+in the output file.

 Restrictive computation is also available by excluding any set of models that
 the user will not want to include in the statistics (by setting ``exclude:
@@ -681,7 +675,7 @@ days in a year may vary between calendars, (sub-)daily data are not supported.
 .. code-block:: yaml

     preprocessors:
-      multimodel_preprocessor:
+      multi_model_preprocessor:
         multi_model_statistics:
           span: overlap
           statistics: [mean, median]
@@ -702,11 +696,11 @@ entry contains the resulting cube with the requested statistic operations.

 .. note::

-   Note that the multimodel array operations, albeit performed in
+   Note that the multi-model array operations, albeit performed in
    per-time/per-horizontal level loops to save memory, could, however, be
    rather memory-intensive (since they are not performed lazily as
    yet). The Section on :ref:`Memory use` details the memory intake
-   for different run scenarios, but as a thumb rule, for the multimodel
+   for different run scenarios, but as a thumb rule, for the multi-model
    preprocessor, the expected maximum memory intake could be approximated as
    the number of datasets multiplied by the average size in memory for one
    dataset.
@@ -1512,14 +1506,14 @@ In the most general case, we can set upper limits on the maximum memory the
 analysis will require:


-``Ms = (R + N) x F_eff - F_eff`` - when no multimodel analysis is performed;
+``Ms = (R + N) x F_eff - F_eff`` - when no multi-model analysis is performed;

-``Mm = (2R + N) x F_eff - 2F_eff`` - when multimodel analysis is performed;
+``Mm = (2R + N) x F_eff - 2F_eff`` - when multi-model analysis is performed;

 where

 * ``Ms``: maximum memory for non-multimodel module
-* ``Mm``: maximum memory for multimodel module
+* ``Mm``: maximum memory for multi-model module
 * ``R``: computational efficiency of module; `R` is typically 2-3
 * ``N``: number of datasets
 * ``F_eff``: average size of data per dataset where ``F_eff = e x f x F``
@@ -1538,7 +1532,7 @@ where
 ``Mm = 1.5 x (N - 2)`` GB

 As a rule of thumb, the maximum required memory at a certain time for
-multimodel analysis could be estimated by multiplying the number of datasets by
+multi-model analysis could be estimated by multiplying the number of datasets by
 the average file size of all the datasets; this memory intake is high but also
 assumes that all data is fully realized in memory; this aspect will gradually
 change and the amount of realized data will decrease with the increase of

From ebccb1667735b242140266691ae2a4c045a4c222 Mon Sep 17 00:00:00 2001
From: Peter Kalverla
Date: Mon, 18 Jan 2021 10:37:11 +0100
Subject: [PATCH 2/7] Some small refinements to the documentation

---
 doc/recipe/preprocessor.rst | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/doc/recipe/preprocessor.rst b/doc/recipe/preprocessor.rst
index 84a5a03b0e..88ba4c9ff7 100644
--- a/doc/recipe/preprocessor.rst
+++ b/doc/recipe/preprocessor.rst
@@ -663,10 +663,15 @@ in the output file.
 Restrictive computation is also available by excluding any set of models that
 the user will not want to include in the statistics (by setting ``exclude:
 [excluded models list]`` argument). The implementation has a few restrictions
-that apply to the input data: model datasets must have consistent shapes, and
-from a statistical point of view, this is needed since weights are not yet
-implemented; also higher dimensional data is not supported (i.e. anything with
-dimensionality higher than four: time, vertical axis, two horizontal axes).
+that apply to the input data: model datasets must have consistent shapes, apart
+from the time dimension; and cubes with more than four dimensions (time,
+vertical axis, two horizontal axes) are not supported.
+
+Input datasets may have different time coordinates. Statistics can be computed
+across overlapping times only (``span: overlap``) or across the full time span
+of the combined models (``span: full``). The preprocessor sets a common time
+coordinate on all datasets. As the number of days in a year may vary between
+calendars, (sub-)daily data with different calendars are not supported.

 Input datasets may have different time coordinates. The multi-model statistics
 preprocessor sets a common time coordinate on all datasets. As the number of
@@ -696,14 +701,12 @@ entry contains the resulting cube with the requested statistic operations.

 .. note::

-   Note that the multi-model array operations, albeit performed in
-   per-time/per-horizontal level loops to save memory, could, however, be
-   rather memory-intensive (since they are not performed lazily as
-   yet). The Section on :ref:`Memory use` details the memory intake
-   for different run scenarios, but as a thumb rule, for the multi-model
-   preprocessor, the expected maximum memory intake could be approximated as
-   the number of datasets multiplied by the average size in memory for one
-   dataset.
+   The multi-model array operations can be rather memory-intensive (since they
+   are not performed lazily as yet). The Section on :ref:`Memory use` details
+   the memory intake for different run scenarios, but as a thumb rule, for the
+   multi-model preprocessor, the expected maximum memory intake could be
+   approximated as the number of datasets multiplied by the average size in
+   memory for one dataset.

 .. _time operations:

From 4a66a72174273e1ffc52710b5dbaf04e29907365 Mon Sep 17 00:00:00 2001
From: Peter Kalverla
Date: Mon, 18 Jan 2021 11:44:01 +0100
Subject: [PATCH 3/7] Refactor multi_model_statistics

Separate the ESMValCore internals, dealing with products, from the core
function operating on cubes. This makes it easier to add a new
preprocessor for ensemble statistics, and also to make a new core
function (lazily) compute statistics across cubes.
---
 esmvalcore/preprocessor/_multimodel.py        | 170 +++++++++++-------
 .../multimodel_statistics/test_multimodel.py  |   9 +-
 .../_multimodel/test_multimodel.py            |  32 ++--
 3 files changed, 133 insertions(+), 78 deletions(-)

diff --git a/esmvalcore/preprocessor/_multimodel.py b/esmvalcore/preprocessor/_multimodel.py
index 750768195a..b79586a835 100644
--- a/esmvalcore/preprocessor/_multimodel.py
+++ b/esmvalcore/preprocessor/_multimodel.py
@@ -1,14 +1,10 @@
-"""multimodel statistics.
+"""Statistics across cubes.

-Functions for multi-model operations
-supports a multitude of multimodel statistics
-computations; the only requisite is the ingested
-cubes have (TIME-LAT-LON) or (TIME-PLEV-LAT-LON)
-dimensions; and obviously consistent units.
+This module contains functions to compute statistics across multiple cubes or
+products.

-It operates on different (time) spans:
-- full: computes stats on full dataset time;
-- overlap: computes common time overlap between datasets;
+Wrapper functions separate esmvalcore internals, operating on products, from
+generalized functions that operate on iris cubes.
 """

 import logging
@@ -295,66 +291,47 @@ def _assemble_data(cubes, statistic, span='overlap'):
     return stats_cube


-def multi_model_statistics(products, span, statistics, output_products=None):
-    """Compute multi-model statistics.
+def multicube_statistics(cubes, statistics, span):
+    """Compute statistics across input cubes.

-    Multimodel statistics computed along the time axis. Can be
-    computed across a common overlap in time (set span: overlap)
-    or across the full length in time of each model (set span: full).
-    Restrictive computation is also available by excluding any set of
-    models that the user will not want to include in the statistics
-    (set exclude: [excluded models list]).
+    This function deals with non-homogeneous cubes by taking the time union
+    (span='full') or intersection (span='overlap'), and extending or subsetting
+    the cubes as necessary. Apart from the time coordinate, cubes must have
+    consistent shapes.

-    Restrictions needed by the input data:
-    - model datasets must have consistent shapes,
-    - higher dimensional data is not supported (ie dims higher than four:
-    time, vertical axis, two horizontal axes).
+    This function operates directly on numpy (masked) arrays and rebuilds the
+    resulting cubes from scratch. Therefore, it is not suitable for lazy
+    evaluation. This function is restricted to maximum four-dimensional data:
+    time, vertical axis, two horizontal axes.

     Parameters
     ----------
-    products: list
-        list of data products or cubes to be used in multimodel stat
-        computation;
-        cube attribute of product is the data cube for computing the stats.
+    cubes: list
+        list of cubes across which the statistics will be computed;
+    statistics: list
+        statistical metrics to be computed. Available options: mean, median,
+        max, min, std, or pXX.YY (for percentile XX.YY; decimal part optional).
     span: str
         overlap or full; if overlap, statitsticss are computed on common time-
         span; if full, statistics are computed on full time spans, ignoring
         missing data.
-    output_products: dict
-        dictionary of output products. MUST be specified if products are NOT
-        cubes
-    statistics: list of str
-        list of statistical measure(s) to be computed. Available options:
-        mean, median, max, min, std, or pXX.YY (for percentile XX.YY; decimal
-        part optional).

     Returns
     -------
-    set or dict or list
-        `set` of data products if `output_products` is given
-        `dict` of cubes if `output_products` is not given
-        `list` of input cubes if there is no overlap between cubes when
-        using `span='overlap'`
+    dict
+        dictionary of statistics cubes with statistics' names as keys.

     Raises
     ------
     ValueError
         If span is neither overlap nor full.
     """
-    logger.debug('Multimodel statistics: computing: %s', statistics)
-    if len(products) < 2:
-        logger.info("Single dataset in list: will not compute statistics.")
-        return products
-    if output_products:
-        cubes = [cube for product in products for cube in product.cubes]
-        statistic_products = set()
-    else:
-        cubes = products
-        statistic_products = {}
+    logger.debug('Multicube statistics: computing: %s', statistics)

     # Reset time coordinates and make cubes share the same calendar
     _unify_time_coordinates(cubes)

+    # Check whether input is valid
     if span == 'overlap':
         # check if we have any time overlap
         times = [cube.coord('time').points for cube in cubes]
@@ -362,7 +339,7 @@ def multi_model_statistics(products, span, statistics, output_products=None):
         if len(overlap) <= 1:
             logger.info("Time overlap between cubes is none or a single point."
                         "check datasets: will not compute statistics.")
-            return products
+            return cubes
         logger.debug("Using common time overlap between "
                      "datasets to compute statistics.")
     elif span == 'full':
@@ -372,22 +349,89 @@ def multi_model_statistics(products, span, statistics, output_products=None):
             "Unexpected value for span {}, choose from 'overlap', 'full'".
             format(span))

+    # Compute statistics
+    statistics_cubes = {}
     for statistic in statistics:
-        # Compute statistic
         statistic_cube = _assemble_data(cubes, statistic, span)
+        statistics_cubes[statistic] = statistic_cube

-        if output_products:
-            # Add to output product and log provenance
-            statistic_product = output_products[statistic]
-            statistic_product.cubes = [statistic_cube]
-            for product in products:
-                statistic_product.wasderivedfrom(product)
-            logger.info("Generated %s", statistic_product)
-            statistic_products.add(statistic_product)
-        else:
-            statistic_products[statistic] = statistic_cube
+    return statistics_cubes
+
+
+def _multiproduct_statistics(products,
+                             statistics,
+                             output_products,
+                             span='overlap'):
+    """Compute statistics across ESMValCore products.
+
+    Extract cubes from products, calculate the statistics across cubes and
+    assign the resulting output cubes to the output_products.
+
+    This function separates the ESMValCore internals (products) from the actual
+    statistics function that operates on Iris cubes.
+
+    Parameters
+    ----------
+    products: list
+        list of PreprocessorFile's
+    statistics: list
+        list of strings describing the statistics that will be computed
+    output_products: dict
+        dict of PreprocessorFile's with statistic names as keys.
+    span: str
+        'full' or 'overlap', whether to calculate the statistics on the time
+        union ('full') or time intersection ('overlap') of the cubes.

-    if output_products:
-        products |= statistic_products
-        return products
-    return statistic_products
+    Returns
+    -------
+    set
+        set of PreprocessorFiles containing the computed statistics cubes.
+    """
+    # Extract cubes from products
+    cubes = [cube for product in products for cube in product.cubes]
+
+    # Compute statistics
+    if len(cubes) < 2:
+        logger.info('Found only 1 cube; no statistics computed for %r',
+                    list(products)[0])
+        statistics_cubes = {statistic: cubes[0] for statistic in statistics}
+    else:
+        statistics_cubes = multicube_statistics(cubes=cubes,
+                                                statistics=statistics,
+                                                span=span)
+
+    # Add cubes to output products and log provenance
+    statistics_products = set()
+    for statistic, cube in statistics_cubes.items():
+        statistics_product = output_products[statistic]
+        statistics_product.cubes = [cube]
+
+        for product in products:
+            statistics_product.wasderivedfrom(product)
+
+        logger.info("Generated %s", statistics_product)
+        statistics_products.add(statistics_product)
+
+    return statistics_products
+
+
+def multi_model_statistics(products: set,
+                           statistics: list,
+                           output_products: set,
+                           span: str = 'overlap'):
+    """Entry point for ESMValCore's multi-model statistics preprocessor.
+
+    Compute statistics across cubes from different models.
+
+    See Also
+    --------
+    multicube_statistics : core statistics function.
+    """
+    statistics_products = _multiproduct_statistics(
+        products=products,
+        statistics=statistics,
+        output_products=output_products,
+        span=span,
+    )
+
+    return statistics_products
diff --git a/tests/sample_data/multimodel_statistics/test_multimodel.py b/tests/sample_data/multimodel_statistics/test_multimodel.py
index a527a9b913..090e353817 100644
--- a/tests/sample_data/multimodel_statistics/test_multimodel.py
+++ b/tests/sample_data/multimodel_statistics/test_multimodel.py
@@ -8,7 +8,8 @@
 import numpy as np
 import pytest

-from esmvalcore.preprocessor import extract_time, multi_model_statistics
+from esmvalcore.preprocessor import extract_time
+from esmvalcore.preprocessor._multimodel import multicube_statistics

 esmvaltool_sample_data = pytest.importorskip("esmvaltool_sample_data")

@@ -118,11 +119,11 @@ def calendar(cube):
     return cube_dict


-def multimodel_test(cubes, span, statistic):
+def multimodel_test(cubes, statistic, span):
     """Run multimodel test with some simple checks."""
     statistics = [statistic]

-    result = multi_model_statistics(cubes, span=span, statistics=statistics)
+    result = multicube_statistics(cubes, statistics=statistics, span=span)
     assert isinstance(result, dict)
     assert statistic in result

@@ -139,7 +140,7 @@ def multimodel_regression_test(cubes, span, name):
     are being written.
     """
     statistic = 'mean'
-    result = multimodel_test(cubes, span=span, statistic=statistic)
+    result = multimodel_test(cubes, statistic=statistic, span=span)
     result_cube = result[statistic]

     filename = Path(__file__).with_name(f'{name}-{span}-{statistic}.nc')
diff --git a/tests/unit/preprocessor/_multimodel/test_multimodel.py b/tests/unit/preprocessor/_multimodel/test_multimodel.py
index 42b28a9525..0e7bd562dd 100644
--- a/tests/unit/preprocessor/_multimodel/test_multimodel.py
+++ b/tests/unit/preprocessor/_multimodel/test_multimodel.py
@@ -7,17 +7,19 @@
 from cf_units import Unit

 import tests
-from esmvalcore.preprocessor import multi_model_statistics
-from esmvalcore.preprocessor._multimodel import (_assemble_data,
-                                                 _compute_statistic,
-                                                 _get_time_slice, _plev_fix,
-                                                 _put_in_cube,
-                                                 _unify_time_coordinates)
+from esmvalcore.preprocessor._multimodel import (
+    _assemble_data,
+    _compute_statistic,
+    _get_time_slice,
+    _plev_fix,
+    _put_in_cube,
+    _unify_time_coordinates,
+    multicube_statistics,
+)


 class Test(tests.Test):
     """Test class for preprocessor/_multimodel.py."""
-
     def setUp(self):
         """Prepare tests."""
         # Make various time arrays
@@ -92,7 +94,9 @@ def test_compute_statistic(self):

     def test_compute_full_statistic_mon_cube(self):
         data = [self.cube1, self.cube2]
-        stats = multi_model_statistics(data, 'full', ['mean'])
+        stats = multicube_statistics(cubes=data,
+                                     statistics=['mean'],
+                                     span='full')
         expected_full_mean = np.ma.ones((5, 3, 2, 2))
         expected_full_mean.mask = np.ones((5, 3, 2, 2))
         expected_full_mean.mask[1] = False
@@ -100,7 +104,9 @@ def test_compute_full_statistic_mon_cube(self):

     def test_compute_full_statistic_yr_cube(self):
         data = [self.cube4, self.cube5]
-        stats = multi_model_statistics(data, 'full', ['mean'])
+        stats = multicube_statistics(cubes=data,
+                                     statistics=['mean'],
+                                     span='full')
         expected_full_mean = np.ma.ones((4, 3, 2, 2))
         expected_full_mean.mask = np.zeros((4, 3, 2, 2))
         expected_full_mean.mask[2:4] = True
@@ -108,13 +114,17 @@ def test_compute_full_statistic_yr_cube(self):

     def test_compute_overlap_statistic_mon_cube(self):
         data = [self.cube1, self.cube1]
-        stats = multi_model_statistics(data, 'overlap', ['mean'])
+        stats = multicube_statistics(cubes=data,
+                                     statistics=['mean'],
+                                     span='overlap')
         expected_ovlap_mean = np.ma.ones((2, 3, 2, 2))
         self.assert_array_equal(stats['mean'].data, expected_ovlap_mean)

     def test_compute_overlap_statistic_yr_cube(self):
         data = [self.cube4, self.cube4]
-        stats = multi_model_statistics(data, 'overlap', ['mean'])
+        stats = multicube_statistics(cubes=data,
+                                     statistics=['mean'],
+                                     span='overlap')
         expected_ovlap_mean = np.ma.ones((2, 3, 2, 2))
         self.assert_array_equal(stats['mean'].data, expected_ovlap_mean)

From 12d419e198e5a59aaa2d17fc851c311aba750fc9 Mon Sep 17 00:00:00 2001
From: Peter Kalverla
Date: Mon, 25 Jan 2021 17:29:35 +0100
Subject: [PATCH 4/7] Update docstrings and use the public
 multi_model_statistics function as entry point for both multi-cube and
 multi-product statistics. This is to maintain the alignment between the
 recipe API and the public preprocessor documentation.

---
 esmvalcore/preprocessor/_multimodel.py        | 159 +++++++++---------
 .../multimodel_statistics/test_multimodel.py  |   6 +-
 .../_multimodel/test_multimodel.py            |  26 +--
 3 files changed, 101 insertions(+), 90 deletions(-)

diff --git a/esmvalcore/preprocessor/_multimodel.py b/esmvalcore/preprocessor/_multimodel.py
index b79586a835..58b9e3addf 100644
--- a/esmvalcore/preprocessor/_multimodel.py
+++ b/esmvalcore/preprocessor/_multimodel.py
@@ -1,11 +1,4 @@
-"""Statistics across cubes.
-
-This module contains functions to compute statistics across multiple cubes or
-products.
-
-Wrapper functions separate esmvalcore internals, operating on products, from
-generalized functions that operate on iris cubes.
-"""
+"""This module contains functions to compute multi-cube statistics."""

 import logging
 import re
@@ -291,23 +284,22 @@ def _assemble_data(cubes, statistic, span='overlap'):
     return stats_cube


-def multicube_statistics(cubes, statistics, span):
-    """Compute statistics across input cubes.
+def _multicube_statistics(cubes, statistics, span):
+    """Compute statistics over multiple cubes.

-    This function deals with non-homogeneous cubes by taking the time union
-    (span='full') or intersection (span='overlap'), and extending or subsetting
-    the cubes as necessary. Apart from the time coordinate, cubes must have
-    consistent shapes.
+    Can be used e.g. for ensemble or multi-model statistics.

-    This function operates directly on numpy (masked) arrays and rebuilds the
-    resulting cubes from scratch. Therefore, it is not suitable for lazy
-    evaluation. This function is restricted to maximum four-dimensional data:
+    This function was designed to work on (max) four-dimensional data:
     time, vertical axis, two horizontal axes.

+    Apart from the time coordinate, cubes must have consistent shapes. There
+    are two options to combine time coordinates of different lengths, see
+    the `span` argument.
+
     Parameters
     ----------
     cubes: list
-        list of cubes across which the statistics will be computed;
+        list of cubes over which the statistics will be computed;
     statistics: list
         statistical metrics to be computed. Available options: mean, median,
         max, min, std, or pXX.YY (for percentile XX.YY; decimal part optional).
@@ -326,7 +318,12 @@ def multicube_statistics(cubes, statistics, span):
     ValueError
         If span is neither overlap nor full.
     """
-    logger.debug('Multicube statistics: computing: %s', statistics)
+    if len(cubes) < 2:
+        logger.info('Found only 1 cube; no statistics computed for %r',
+                    list(cubes)[0])
+        return {statistic: cubes[0] for statistic in statistics}
+    else:
+        logger.debug('Multicube statistics: computing: %s', statistics)

     # Reset time coordinates and make cubes share the same calendar
     _unify_time_coordinates(cubes)
@@ -358,49 +355,21 @@ def multicube_statistics(cubes, statistics, span):
     return statistics_cubes


-def _multiproduct_statistics(products,
-                             statistics,
-                             output_products,
-                             span='overlap'):
-    """Compute statistics across ESMValCore products.
+def _multiproduct_statistics(
+    products,
+    statistics,
+    output_products,
+    span=None,
+):
+    """Compute multi-cube statistics on ESMValCore products.

-    Extract cubes from products, calculate the statistics across cubes and
+    Extract cubes from products, calculate multicube statistics and
     assign the resulting output cubes to the output_products.
-
-    This function separates the ESMValCore internals (products) from the actual
-    statistics function that operates on Iris cubes.
-
-    Parameters
-    ----------
-    products: list
-        list of PreprocessorFile's
-    statistics: list
-        list of strings describing the statistics that will be computed
-    output_products: dict
-        dict of PreprocessorFile's with statistic names as keys.
-    span: str
-        'full' or 'overlap', whether to calculate the statistics on the time
-        union ('full') or time intersection ('overlap') of the cubes.
-
-    Returns
-    -------
-    set
-        set of PreprocessorFiles containing the computed statistics cubes.
     """
-    # Extract cubes from products
     cubes = [cube for product in products for cube in product.cubes]
-
-    # Compute statistics
-    if len(cubes) < 2:
-        logger.info('Found only 1 cube; no statistics computed for %r',
-                    list(products)[0])
-        statistics_cubes = {statistic: cubes[0] for statistic in statistics}
-    else:
-        statistics_cubes = multicube_statistics(cubes=cubes,
-                                                statistics=statistics,
-                                                span=span)
-
-    # Add cubes to output products and log provenance
+    statistics_cubes = _multicube_statistics(cubes=cubes,
+                                             statistics=statistics,
+                                             span=span)
     statistics_products = set()
     for statistic, cube in statistics_cubes.items():
         statistics_product = output_products[statistic]
@@ -415,23 +384,63 @@ def _multiproduct_statistics(products,
     return statistics_products


-def multi_model_statistics(products: set,
-                           statistics: list,
-                           output_products: set,
-                           span: str = 'overlap'):
-    """Entry point for ESMValCore's multi-model statistics preprocessor.
+def multi_model_statistics(products, span, statistics, output_products=None):
+    """Compute multi-model statistics.

-    Compute statistics across cubes from different models.
+    This function computes multi-model statistics on cubes or products.
+    Products (or: preprocessorfiles) are used internally by ESMValCore to store
+    workflow and provenance information, and this option should typically be
+    ignored.

-    See Also
-    --------
-    multicube_statistics : core statistics function.
-    """
-    statistics_products = _multiproduct_statistics(
-        products=products,
-        statistics=statistics,
-        output_products=output_products,
-        span=span,
-    )
+    This function was designed to work on (max) four-dimensional data: time,
+    vertical axis, two horizontal axes. Apart from the time coordinate, cubes
+    must have consistent shapes. There are two options to combine time
+    coordinates of different lengths, see the `span` argument.
-    return statistics_products
+    Parameters
+    ----------
+    products: list
+        Cubes (or products) over which the statistics will be computed.
+    statistics: list
+        Statistical metrics to be computed. Available options: mean, median,
+        max, min, std, or pXX.YY (for percentile XX.YY; decimal part optional).
+    span: str
+        Overlap or full; if overlap, statitstics are computed on common time-
+        span; if full, statistics are computed on full time spans, ignoring
+        missing data.
+    output_products: dict
+        For internal use only. A dict with statistics names as keys and
+        preprocessorfiles as values. If products are passed as input, the
+        statistics cubes will be assigned to these output products.
+
+    Returns
+    -------
+    dict
+        A dictionary of statistics cubes with statistics' names as keys. (If
+        input type is products, then it will return a set of output_products.)
+
+    Raises
+    ------
+    ValueError
+        If span is neither overlap nor full, or if input type is neither cubes
+        nor products.
+    """
+    if all(isinstance(p, iris.cube.Cube) for p in products):
+        return _multicube_statistics(
+            cubes=products,
+            statistics=statistics,
+            span=span,
+        )
+    elif all(type(p).__name__ == 'PreprocessorFile' for p in products):
+        # Avoid circular input: https://stackoverflow.com/q/16964467
+        return _multiproduct_statistics(
+            products=products,
+            statistics=statistics,
+            output_products=output_products,
+            span=span,
+        )
+    else:
+        raise ValueError(
+            "Input type for multi_model_statistics not understood. Expected "
+            "iris.cube.Cube or esmvalcore.preprocessor.PreprocessorFile, "
+            "got {}".format(products))
diff --git a/tests/sample_data/multimodel_statistics/test_multimodel.py b/tests/sample_data/multimodel_statistics/test_multimodel.py
index 090e353817..9a094c4986 100644
--- a/tests/sample_data/multimodel_statistics/test_multimodel.py
+++ b/tests/sample_data/multimodel_statistics/test_multimodel.py
@@ -9,7 +9,7 @@
 import pytest

 from esmvalcore.preprocessor import extract_time
-from esmvalcore.preprocessor._multimodel import multicube_statistics
+from esmvalcore.preprocessor._multimodel import multi_model_statistics

 esmvaltool_sample_data = pytest.importorskip("esmvaltool_sample_data")

@@ -123,7 +123,9 @@ def multimodel_test(cubes, statistic, span):
     """Run multimodel test with some simple checks."""
     statistics = [statistic]

-    result = multicube_statistics(cubes, statistics=statistics, span=span)
+    result = multi_model_statistics(products=cubes,
+                                    statistics=statistics,
+                                    span=span)
     assert isinstance(result, dict)
     assert statistic in result

diff --git a/tests/unit/preprocessor/_multimodel/test_multimodel.py b/tests/unit/preprocessor/_multimodel/test_multimodel.py
index 0e7bd562dd..0d6cbb9b9b 100644
--- a/tests/unit/preprocessor/_multimodel/test_multimodel.py
+++ b/tests/unit/preprocessor/_multimodel/test_multimodel.py
@@ -14,7 +14,7 @@
     _plev_fix,
     _put_in_cube,
     _unify_time_coordinates,
-    multicube_statistics,
+    multi_model_statistics,
 )


@@ -94,9 +94,9 @@ def test_compute_statistic(self):

     def test_compute_full_statistic_mon_cube(self):
         data = [self.cube1, self.cube2]
-        stats = multicube_statistics(cubes=data,
-                                     statistics=['mean'],
-                                     span='full')
+        stats = multi_model_statistics(products=data,
+                                       statistics=['mean'],
+                                       span='full')
         expected_full_mean = np.ma.ones((5, 3, 2, 2))
         expected_full_mean.mask = np.ones((5, 3, 2, 2))
         expected_full_mean.mask[1] = False
@@ -104,9 +104,9 @@ def test_compute_full_statistic_mon_cube(self):

     def test_compute_full_statistic_yr_cube(self):
         data = [self.cube4, self.cube5]
-        stats = multicube_statistics(cubes=data,
-                                     statistics=['mean'],
-                                     span='full')
+        stats = multi_model_statistics(products=data,
+                                       statistics=['mean'],
+                                       span='full')
         expected_full_mean = np.ma.ones((4, 3, 2, 2))
         expected_full_mean.mask = np.zeros((4, 3, 2, 2))
         expected_full_mean.mask[2:4] = True
@@ -114,17 +114,17 @@ def test_compute_full_statistic_yr_cube(self):

     def test_compute_overlap_statistic_mon_cube(self):
         data = [self.cube1, self.cube1]
-        stats = multicube_statistics(cubes=data,
-                                     statistics=['mean'],
-                                     span='overlap')
+        stats = multi_model_statistics(products=data,
+                                       statistics=['mean'],
+                                       span='overlap')
         expected_ovlap_mean = np.ma.ones((2, 3, 2, 2))
         self.assert_array_equal(stats['mean'].data, expected_ovlap_mean)

     def test_compute_overlap_statistic_yr_cube(self):
         data = [self.cube4, self.cube4]
-        stats = multicube_statistics(cubes=data,
-                                     statistics=['mean'],
-                                     span='overlap')
+        stats = multi_model_statistics(products=data,
+                                       statistics=['mean'],
+                                       span='overlap')
         expected_ovlap_mean = np.ma.ones((2, 3, 2, 2))
         self.assert_array_equal(stats['mean'].data, expected_ovlap_mean)

From 44ee17c3b9c26385ce1c6a151c189bbc8fa3e21d Mon Sep 17 00:00:00 2001
From: Peter Kalverla
Date: Mon, 25 Jan 2021 17:49:26 +0100
Subject: [PATCH 5/7] Codacy

---
 esmvalcore/preprocessor/_multimodel.py | 24 +++++++++---------------
 1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/esmvalcore/preprocessor/_multimodel.py b/esmvalcore/preprocessor/_multimodel.py
index 58b9e3addf..80a930bbf5 100644
--- a/esmvalcore/preprocessor/_multimodel.py
+++ b/esmvalcore/preprocessor/_multimodel.py
@@ -1,4 +1,4 @@
-"""This module contains functions to compute multi-cube statistics."""
+"""Functions to compute multi-cube statistics."""

 import logging
 import re
@@ -322,8 +322,8 @@ def _multicube_statistics(cubes, statistics, span):
         logger.info('Found only 1 cube; no statistics computed for %r',
                     list(cubes)[0])
         return
{statistic: cubes[0] for statistic in statistics} - else: - logger.debug('Multicube statistics: computing: %s', statistics) + + logger.debug('Multicube statistics: computing: %s', statistics) # Reset time coordinates and make cubes share the same calendar _unify_time_coordinates(cubes) @@ -355,12 +355,7 @@ def _multicube_statistics(cubes, statistics, span): return statistics_cubes -def _multiproduct_statistics( - products, - statistics, - output_products, - span=None, -): +def _multiproduct_statistics(products, statistics, output_products, span=None): """Compute multi-cube statistics on ESMValCore products. Extract cubes from products, calculate multicube statistics and @@ -431,7 +426,7 @@ def multi_model_statistics(products, span, statistics, output_products=None): statistics=statistics, span=span, ) - elif all(type(p).__name__ == 'PreprocessorFile' for p in products): + if all(type(p).__name__ == 'PreprocessorFile' for p in products): # Avoid circular input: https://stackoverflow.com/q/16964467 return _multiproduct_statistics( products=products, @@ -439,8 +434,7 @@ def multi_model_statistics(products, span, statistics, output_products=None): output_products=output_products, span=span, ) - else: - raise ValueError( - "Input type for multi_model_statistics not understood. Expected " - "iris.cube.Cube or esmvalcore.preprocessor.PreprocessorFile, " - "got {}".format(products)) + raise ValueError( + "Input type for multi_model_statistics not understood. 
Expected " + "iris.cube.Cube or esmvalcore.preprocessor.PreprocessorFile, " + "got {}".format(products)) From 47bce33e29a05048f8146a1c1e15d704e645d0aa Mon Sep 17 00:00:00 2001 From: Bouwe Andela Date: Wed, 27 Jan 2021 13:58:37 +0100 Subject: [PATCH 6/7] Update iris documentation URL for sphinx --- doc/conf.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/conf.py b/doc/conf.py index 06986c70ca..9c42ec7eb8 100644 --- a/doc/conf.py +++ b/doc/conf.py @@ -418,7 +418,7 @@ (f'https://docs.esmvaltool.org/projects/esmvalcore/en/{rtd_version}/', None), 'esmvaltool': (f'https://docs.esmvaltool.org/en/{rtd_version}/', None), - 'iris': ('https://scitools.org.uk/iris/docs/latest/', None), + 'iris': ('https://scitools-iris.readthedocs.io/en/latest/', None), 'matplotlib': ('https://matplotlib.org/', None), 'numpy': ('https://numpy.org/doc/stable/', None), 'python': ('https://docs.python.org/3/', None), From ab05006ce5916714b1ad5270b3b2367510ee2283 Mon Sep 17 00:00:00 2001 From: Peter Kalverla Date: Wed, 27 Jan 2021 13:59:40 +0100 Subject: [PATCH 7/7] import multi_model_statistics from preprocessor module in tests --- tests/unit/preprocessor/_multimodel/test_multimodel.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/unit/preprocessor/_multimodel/test_multimodel.py b/tests/unit/preprocessor/_multimodel/test_multimodel.py index 0d6cbb9b9b..1899e09097 100644 --- a/tests/unit/preprocessor/_multimodel/test_multimodel.py +++ b/tests/unit/preprocessor/_multimodel/test_multimodel.py @@ -7,6 +7,7 @@ from cf_units import Unit import tests +from esmvalcore.preprocessor import multi_model_statistics from esmvalcore.preprocessor._multimodel import ( _assemble_data, _compute_statistic, @@ -14,7 +15,6 @@ _plev_fix, _put_in_cube, _unify_time_coordinates, - multi_model_statistics, )
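Note on the dispatch in `multi_model_statistics`: the patches above route the input either to `_multicube_statistics` or to `_multiproduct_statistics` depending on the type of the elements in `products`, and raise `ValueError` otherwise. The following is a minimal sketch of that pattern only, not the ESMValCore implementation. The classes `Cube` and `PreprocessorFile` and the function `dispatch` are hypothetical stand-ins introduced here so the example runs without iris or esmvalcore installed.

```python
"""Sketch of the input-type dispatch pattern; not the ESMValCore code."""


class Cube:
    """Hypothetical stand-in for iris.cube.Cube."""


class PreprocessorFile:
    """Hypothetical stand-in for esmvalcore's PreprocessorFile."""


def dispatch(products):
    """Return the name of the code path a list of inputs would take."""
    # First branch: every element is a cube.
    if all(isinstance(p, Cube) for p in products):
        return "cubes"
    # Second branch: compare by class name instead of isinstance, so the
    # module defining this function does not have to import the class
    # (presumably the point of the Stack Overflow link in the patch).
    if all(type(p).__name__ == "PreprocessorFile" for p in products):
        return "products"
    # Mixed or unknown input types end up here.
    raise ValueError(
        "Input type not understood. Expected Cube or PreprocessorFile, "
        "got {}".format(products))
```

Because each branch returns or raises, no `elif`/`else` is needed, which is the same simplification made in patch 5/7. Note that an empty list satisfies the first `all(...)` check and therefore takes the cube path in this sketch.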