49 changes: 6 additions & 43 deletions README.md
@@ -67,48 +67,11 @@ $ ewb --default
```python
from extremeweatherbench import cases, inputs, metrics, evaluate, utils

# Select model
model = 'FOUR_v200_GFS'

# Set up path to directory of file - zarr or kerchunk/virtualizarr json/parquet
forecast_dir = f'gs://extremeweatherbench/{model}.parq'

# Preprocessing function exclusive to handling the CIRA parquets
def preprocess_bb_cira_forecast_dataset(ds: xr.Dataset) -> xr.Dataset:
"""Preprocess CIRA kerchunk (parquet) data in the ExtremeWeatherBench bucket.
A preprocess function that renames the time coordinate to lead_time,
creates a valid_time coordinate, and sets the lead time range and resolution not
present in the original dataset.
Args:
ds: The forecast dataset to rename.
Returns:
The renamed forecast dataset.
"""
ds = ds.rename({"time": "lead_time"})

# The evaluation configuration is used to set the lead time range and resolution.
ds["lead_time"] = np.array(
[i for i in range(0, 241, 6)], dtype="timedelta64[h]"
).astype("timedelta64[ns]")

return ds

# Define a forecast object; in this case, a KerchunkForecast
fcnv2_forecast = inputs.KerchunkForecast(
name="fcnv2_forecast", # identifier for this forecast in results
source=forecast_dir, # source path
variables=["surface_air_temperature"], # variables to use in the evaluation
variable_mapping=inputs.CIRA_metadata_variable_mapping, # mapping to use for variables in forecast dataset to EWB variable names
storage_options={"remote_protocol": "s3", "remote_options": {"anon": True}}, # storage options for access
preprocess=preprocess_bb_cira_forecast_dataset # required preprocessing function for CIRA references
)
# Load in a forecast; here, we load GFS-initialized FCNv2 from the CIRA MLWP archive, provided as a built-in default for convenience
fcnv2_heatwave_forecast = defaults.cira_fcnv2_heatwave_forecast

# Load in ERA5; source defaults to the ARCO ERA5 dataset from Google and variable mapping is provided by default as well
era5_heatwave_target = inputs.ERA5(
variables=["surface_air_temperature"], # variable to use in the evaluation
storage_options={"remote_options": {"anon": True}}, # storage options for access
chunks=None, # define chunks for the ERA5 data
)
# Load in ERA5 with another default convenience variable
era5_heatwave_target = defaults.era5_heatwave_target

# EvaluationObjects are used to evaluate a single forecast source against a single target source with a defined event type. Event types are declared with each case. One or more metrics can be evaluated with each EvaluationObject.
heatwave_evaluation_list = [
@@ -120,7 +83,7 @@ heatwave_evaluation_list = [
metrics.MaximumLowestMeanAbsoluteError(),
],
target=era5_heatwave_target,
forecast=fcnv2_forecast,
forecast=fcnv2_heatwave_forecast,
),
]
# Load in the EWB default list of event cases
@@ -134,7 +97,7 @@ ewb_instance = evaluate.ExtremeWeatherBench(

# Execute a parallel run and return the evaluation results as a pandas DataFrame
heatwave_outputs = ewb_instance.run(
parallel_config={'backend':'loky','n_jobs':16} # Uses 16 jobs with the loky backend
parallel_config={'n_jobs':16} # Uses 16 jobs with the loky backend as default
)

# Save the results
39 changes: 19 additions & 20 deletions docs/recipes/cira_forecast.md
@@ -2,22 +2,10 @@

We have a dedicated virtual reference icechunk store for CIRA data **up to May 26th, 2025** available at `gs://extremeweatherbench/cira-icechunk`. Compared to using parquet virtual references, we have seen a speed improvement of around 2x with ~25% more memory usage.

## Loading the store

```python

from extremeweatherbench import cases, inputs, metrics, evaluate, defaults
import datetime
import icechunk

storage = icechunk.gcs_storage(
bucket="extremeweatherbench", prefix="cira-icechunk", anonymous=True
)
```

## Accessing a CIRA Model from the store

```python
from extremeweatherbench import inputs

group_list = inputs.list_groups_in_icechunk_datatree(storage)
```
@@ -39,22 +27,33 @@ group_list = inputs.list_groups_in_icechunk_datatree(storage)

```python

# Find FCNv2's name in the group list
fcnv2_group = [n for n in group_list if 'FOUR_v200_GFS' in n][0]

# Helper function to access the virtual dataset
fcnv2 = inputs.open_icechunk_dataset_from_datatree(
fcnv2 = inputs.get_cira_icechunk(model_name='FOUR_v200_IFS')
```

`fcnv2` is a `ForecastBase` object ready to be used within EWB's evaluation framework.

> **Detailed Explanation**: `inputs.get_cira_icechunk` is syntactic sugar for this:
```python
import icechunk

storage = icechunk.gcs_storage(
bucket="extremeweatherbench", prefix="cira-icechunk", anonymous=True
)

fcnv2_icechunk_ds = inputs.open_icechunk_dataset_from_datatree(
storage=storage,
group=fcnv2_group,
group="FOUR_v200_IFS",
authorize_virtual_chunk_access=inputs.CIRA_CREDENTIALS
)
fcnv2_icechunk_forecast_object = inputs.XarrayForecast(

fcnv2 = inputs.XarrayForecast(
ds=fcnv2_icechunk_ds,
variable_mapping=inputs.CIRA_metadata_variable_mapping
)
```

`fcnv2_icechunk_forecast_object` is a `ForecastBase` object ready to be used within EWB's evaluation framework.
This is a three-step process: accessing the icechunk storage, loading the dataset from the datatree/zarr group format, and finally wrapping that `Dataset` in a `ForecastBase` object.

## Set up metrics and target for evaluation

28 changes: 21 additions & 7 deletions docs/usage.md
@@ -37,7 +37,7 @@ To run an evaluation, there are three components required: a forecast, a target,
```python
from extremeweatherbench import inputs
```
There are two built-in `ForecastBase` classes to set up a forecast: `ZarrForecast` and `KerchunkForecast`. Here is an example of a `ZarrForecast`, using Weatherbench2's HRES zarr store:
There are three built-in `ForecastBase` classes to set up a forecast: `ZarrForecast`, `XarrayForecast`, and `KerchunkForecast`. Here is an example of a `ZarrForecast`, using Weatherbench2's HRES zarr store:

```python
hres_forecast = inputs.ZarrForecast(
@@ -56,9 +56,9 @@ There are required arguments, namely:
- `variables`*
- `variable_mapping`

* `variables` can be defined within one or more metrics instead of in a `ForecastBase` object.
* `variables` can alternatively be defined within one or more metrics, instead of in a `ForecastBase` object.

A forecast needs a `source`, which is a link to the zarr store in this case. A `name` is required to identify the outputs. It also needs `variables` defined, which are based on CF Conventions. A list of variable namings exists in `defaults.py` as `DEFAULT_VARIABLE_NAMES`. Each forecast will likely have different names for their variables, so a `variable_mapping` dictionary is also essential to process the variables, as well as the coordinates and dimensions. EWB uses `lead_time`, `init_time`, and `valid_time` as time coordinates. The HRES data is mapped from `prediction_timedelta` to `lead_time`, as an example. `storage_options` define access patterns for the data if needed. These are passed to the opening function, e.g. `xarray.open_zarr`.
> **Detailed Explanation**: A forecast needs a `source`, which is a link to the zarr store in this case. A `name` is required to identify the outputs. It also needs `variables` defined, which are based on CF Conventions. A list of variable namings exists in `defaults.py` as `DEFAULT_VARIABLE_NAMES`. Each forecast will likely have different names for their variables, so a `variable_mapping` dictionary is also essential to process the variables, as well as the coordinates and dimensions. EWB uses `lead_time`, `init_time`, and `valid_time` as time coordinates. The HRES data is mapped from `prediction_timedelta` to `lead_time`, as an example. `storage_options` define access patterns for the data if needed. These are passed to the opening function, e.g. `xarray.open_zarr`.
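As a rough sketch of what a `variable_mapping` does (the source-side names below are illustrative assumptions for an HRES-like dataset, not the library's shipped mapping), it is essentially a rename table applied to variables, coordinates, and dimensions:

```python
# Hypothetical variable_mapping for an HRES-like source; the keys are
# assumed source names, the values are EWB/CF-style names.
hres_variable_mapping = {
    "2m_temperature": "surface_air_temperature",  # variable rename
    "prediction_timedelta": "lead_time",          # dimension rename
    "time": "init_time",                          # coordinate rename
}

def apply_mapping(names, mapping):
    """Map source names to EWB names, passing unknown names through unchanged."""
    return [mapping.get(name, name) for name in names]

print(apply_mapping(["2m_temperature", "prediction_timedelta", "latitude"],
                    hres_variable_mapping))
# ['surface_air_temperature', 'lead_time', 'latitude']
```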

Next, a target dataset must be defined as well to evaluate against. For this evaluation, we'll use ERA5:

@@ -71,7 +71,19 @@ era5_heatwave_target = inputs.ERA5(
)
```

Similarly to forecasts, we need to define the `source`, which here is the ARCO ERA5 provided by Google. `variables` are again required to be set for the `inputs.ERA5` class; `variable_mapping` defaults to `inputs.ERA5_metadata_variable_mapping` for many existing variables and likely is not required to be set unless your use case is for less common variables. Both forecasts and targets, if relevant, have an optional `chunks` parameter which defaults to what should be the most efficient value - usually `None` or `'auto'`, but can be changed as seen above.
Note that EWB provides defaults for arguments, so most users will be able to instead write this (if defining variables with the intent of it applying to all metrics):

```python
era5_heatwave_target = inputs.ERA5(variables=['surface_air_temperature'])
```

Or (if defining variables as arguments to the metrics):

```python
era5_heatwave_target = inputs.ERA5()
```

> **Detailed Explanation**: Similarly to forecasts, we need to define the `source`, which here is the ARCO ERA5 provided by Google. `variables` are used to subset `inputs.ERA5` in an evaluation; `variable_mapping` defaults to `inputs.ERA5_metadata_variable_mapping`, which covers many existing variables and likely does not need to be set unless your use case involves less common variables. Both forecasts and targets, where relevant, have an optional `chunks` parameter which defaults to what should be the most efficient value - usually `None` or `'auto'` - but can be changed as seen above. If using the ARCO ERA5 with `chunks=None`, it is critical to order your subsetting by variables -> time -> `.sel` or `.isel` latitude & longitude -> rechunk. [See this GitHub comment](https://github.com/pydata/xarray/issues/8902#issuecomment-2036435045).
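To make that ordering concrete, here is a minimal sketch on a small synthetic dataset standing in for the ARCO ERA5 zarr store (names are ERA5-like but the data is random and the grid is coarse; the real store is far larger):

```python
import numpy as np
import xarray as xr

# Tiny synthetic stand-in for the ARCO ERA5 dataset.
ds = xr.Dataset(
    {
        "2m_temperature": (("time", "latitude", "longitude"), np.random.rand(4, 8, 16)),
        "total_precipitation": (("time", "latitude", "longitude"), np.random.rand(4, 8, 16)),
    },
    coords={
        "time": np.arange("2021-06-24", "2021-06-28", dtype="datetime64[D]").astype("datetime64[ns]"),
        "latitude": np.linspace(30, 60, 8),
        "longitude": np.linspace(-130, -100, 16),
    },
)

# With chunks=None, subset in this order: variables -> time -> space -> rechunk.
subset = (
    ds[["2m_temperature"]]                                     # 1. variables first
    .sel(time=slice("2021-06-25", "2021-06-26"))               # 2. then time
    .sel(latitude=slice(40, 55), longitude=slice(-125, -110))  # 3. then lat/lon
)
# 4. only now rechunk if needed, e.g. subset.chunk({"time": 1}) (requires dask)
print(subset["2m_temperature"].shape)  # (2, 3, 8)
```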

We then set up an `EvaluationObject` list:

@@ -98,11 +110,11 @@ Plugging these all in:

```python
from extremeweatherbench import cases, evaluate
case_yaml = cases.load_ewb_events_yaml_into_case_list()
case_list = cases.load_ewb_events_yaml_into_case_list()


ewb_instance = evaluate.ExtremeWeatherBench(
cases=case_yaml,
cases=case_list,
evaluation_objects=heatwave_evaluation_list,
)

@@ -111,6 +123,8 @@ outputs = ewb_instance.run()
outputs.to_csv('your_file_name.csv')
```

Where the EWB default events YAML file is loaded in using a built-in utility helper function, then applied to an instance of `evaluate.ExtremeWeatherBench` along with the `EvaluationObject` list. Finally, we run the evaluation with the `.run()` method, where defaults are typically sufficient to run with a small to moderate-sized virtual machine. after subsetting and prior to metric calculation.
Here the EWB default events YAML file is loaded using a built-in helper function, then passed to an instance of `evaluate.ExtremeWeatherBench` along with the `EvaluationObject` list. Finally, we trigger the evaluation with the `.run()` method, whose defaults are typically sufficient for a small to moderate-sized virtual machine.

Running locally is feasible but is typically heavily bottlenecked by IO and network bandwidth. Even on a gigabit connection, the rate of data access is significantly slower compared to within a cloud provider VM.

The outputs are returned as a pandas DataFrame and can be manipulated in the script, a notebook, or post-hoc after saving it.
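As a sketch of post-hoc manipulation (the column names below are illustrative assumptions, not the real output schema), aggregating results by lead time might look like:

```python
from collections import defaultdict

# Hypothetical rows mimicking EWB output; real column names may differ.
rows = [
    {"case_id": 1, "lead_time": 24, "metric": "MaximumMAE", "value": 1.8},
    {"case_id": 1, "lead_time": 48, "metric": "MaximumMAE", "value": 2.4},
    {"case_id": 2, "lead_time": 24, "metric": "MaximumMAE", "value": 2.0},
    {"case_id": 2, "lead_time": 48, "metric": "MaximumMAE", "value": 3.0},
]

# Mean error per lead time across cases.
by_lead = defaultdict(list)
for row in rows:
    by_lead[row["lead_time"]].append(row["value"])
mean_by_lead = {lt: round(sum(v) / len(v), 3) for lt, v in by_lead.items()}
print(mean_by_lead)  # {24: 1.9, 48: 2.7}
```

With the real DataFrame the equivalent would be a pandas `groupby` on the lead time column.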