diff --git a/docs/examples/redshift.rst b/docs/examples/redshift.rst
index 41ed4ad54..922c52b8d 100644
--- a/docs/examples/redshift.rst
+++ b/docs/examples/redshift.rst
@@ -16,9 +16,7 @@ Pipeline Overview
 * `Test Pretrained`_
 * `Download SDSS`_
 * `Download Train Predict`_
-
-.. * `Visualize Predictions`_
-.. * `Train Visualize`_
+* `Visualize Predictions`_

 .. figure:: application-pipelines.png
     :align: center
@@ -31,10 +29,6 @@ Train Test Single
 ~~~~~~~~~~~~~~~~~
 Trains and evaluates a single CNN model. Uses predefined artifacts that contain the training and testing data. For this and all training pipelines, the artifacts should each contain a single numpy array. Input arrays should be a 4D array of shape **(n, y, x, c)** where n=number of images, y=image height, x=image width, and c=number of color channels. Output (label) arrays should be of shape **(n,)** .

-.. Visualize Predictions
-.. ~~~~~~~~~~~~~~~~~~~~~
-
-
 Train Test Compare
 ~~~~~~~~~~~~~~~~~~
 Trains and evaluates two CNN models and compares effectiveness of the models.
@@ -55,10 +49,6 @@ Test Pretrained
 ~~~~~~~~~~~~~~~
 Evaluates the performance of a pre-existing model that is saved as an artifact.

-.. Train Visualize
-.. ~~~~~~~~~~~~~~~
-
-
 Download SDSS
 ~~~~~~~~~~~~~
 Download SDSS images and save them as artifacts. Can be used in conjunction with the other pipelines that rely on artifacts rather than images retrieved at execution time.
@@ -66,3 +56,10 @@ Download SDSS images and save them as artifacts. Can be used in conjunction with
 Download Train Predict
 ~~~~~~~~~~~~~~~~~~~~~~
 Download SDSS images and use some images to train a model before using the model to predict the redshift value of the remaining galaxies.
+
+Visualize Predictions
+~~~~~~~~~~~~~~~~~~~~~
+This pipeline produces a visualization that can be helpful for understanding the effectiveness of your redshift estimation model.
It generates a set of graphs like the one below that show the output probability distribution function (pdf) for the redshift values of a set of randomly selected galaxy images. A pair of vertical lines in each subplot indicates the actual redshift value (green) and the predicted redshift value (red) for that galaxy. This lets users see how far the model's predictions are from the correct answers and can help identify biases or weak points the model may have (for example, consistent underestimation or inaccuracy for galaxies in a specific redshift range).
+
+.. figure:: vis-pred-plot.png
+    :align: center
\ No newline at end of file
diff --git a/docs/examples/rs-tutorial.rst b/docs/examples/rs-tutorial.rst
index 20fb93951..75ca0caee 100644
--- a/docs/examples/rs-tutorial.rst
+++ b/docs/examples/rs-tutorial.rst
@@ -10,9 +10,8 @@ Pipeline Overview
 4. `Train CIFAR-10`_
 5. `Train-Test`_
 6. `Train-Test-Compare`_
-7. `Download-Train-Evaluate`_
-
-.. 6. `Visualize Predictions`_
+7. `Train-PredVis`_
+8. `Download-Train-Evaluate`_

 Pipelines
 ---------
@@ -134,7 +133,7 @@ This pipeline gives a very basic example of how to create, train, and evaluate a

 Train-Test
 ~~~~~~~~~~
-.. figure:: train-basic.png
+.. figure:: train-single.png
     :align: center

 This pipeline provides an example of how one might train and evaluate a redshift estimation model. In particular, the procedure implemented here is a simplified version of work by `Pasquet et. al. (2018) `_. For readers unfamiliar with cosmological redshift, `this article `_ provides a simple and brief introduction to the topic. For the training process, there are two primary additions that should be noted.
@@ -146,10 +145,6 @@ Second, a class has been provided to give examples of how researchers may define

 The evaluation node has also been updated to provide metrics more in line with redshift estimation.
Specifically, it calculates the fraction of outlier predictions, the model’s prediction bias, the deviation in the MAD scores of the model output, and the average Continuous Ranked Probability Score (CRPS) of the output.

-.. Visualize Predictions
-.. ~~~~~~~~~~~~~~~~~~~~~
-
-
 Train-Test-Compare
 ~~~~~~~~~~~~~~~~~~
 .. figure:: train-compare.png
@@ -158,9 +153,53 @@ Train-Test-Compare
 This pipeline gives a more complicated example of how to create visualizations that may be helpful for understanding the effectiveness of a model. The **EvalCompare** node provides a simple comparison visualization of two models.


+Train-PredVis
+~~~~~~~~~~~~~
+.. figure:: vis-pred.png
+    :align: center
+
+This pipeline shows another, more complex visualization that can help you understand the effectiveness of your redshift estimation model. It generates a set of graphs like the one below that show the output probability distribution function (pdf) for the redshift values of a set of randomly selected galaxy images. A pair of vertical lines in each subplot indicates the actual redshift value (green) and the predicted redshift value (red) for that galaxy.
+
+As shown in this example, any visualization that can be created using the `matplotlib.pyplot `_ Python library can be created and displayed by a pipeline. To display a visualization, call the **pyplot.show()** function after building it. It can then be viewed from the `Executions view <../fundamentals/interface.rst#Executions>`_.
+
+
+.. 
code-block:: python
+
+    import numpy as np
+    from matplotlib import pyplot as plt
+
+    class PredVis():
+        def __init__(self, num_bins=180, num_rows=1, num_cols=1, max_val=0.4):
+            self.num_rows = num_rows
+            self.num_cols = num_cols
+            self.xrange = np.linspace(0, max_val, num_bins, endpoint=False)  # exactly num_bins x positions
+            return
+
+        def execute(self, pt, gt, pdfs):
+            fig, splts = plt.subplots(self.num_rows, self.num_cols, sharex=True, sharey=True, squeeze=False)  # always a 2D axes array
+
+            num_samples = self.num_rows * self.num_cols
+
+            random_indices = np.random.choice(len(gt), num_samples, replace=False)  # one random galaxy per subplot
+
+            s_pdfs = np.take(pdfs, random_indices, axis=0)
+            s_pt = np.take(pt, random_indices, axis=0)
+            s_gt = np.take(gt, random_indices, axis=0)
+
+            for i in range(num_samples):
+                col = i % self.num_cols
+                row = i // self.num_cols
+                splts[row, col].plot(self.xrange, s_pdfs[i], '-')
+                splts[row, col].axvline(s_pt[i], color='red')  # predicted redshift
+                splts[row, col].axvline(s_gt[i], color='green')  # actual redshift
+            plt.show()
+
+.. figure:: vis-pred-plot.png
+    :align: center
+
 Download-Train-Evaluate
 ~~~~~~~~~~~~~~~~~~~~~~~
 .. figure:: download.png
     :align: center

-This pipeline provides an example of how data can be retrieved and utilized in the same pipeline. The previous pipelines use manually uploaded artifacts. In many real cases, users may desire to retrieve novel data or more specific data using SciServer’s CasJobs API. In such cases, the **DownloadSDSS** node here makes downloading data relatively simple for users. It should be noted that the data downloaded is not in a form easily usable by our models and first requires moderate preprocessing, which is performed in the **Preprocessing** node. This general structure of download-process-train is a common pattern, as data is rarely supplied in a clean, immediately usable format.
+This pipeline provides an example of how data can be retrieved and utilized in the same pipeline. The previous pipelines use manually uploaded artifacts.
In many real cases, users may desire to retrieve novel data or more specific data using SciServer’s CasJobs API. In such cases, the **DownloadSDSS** node here makes downloading data relatively simple for users. It should be noted that the data downloaded is not in a form easily usable by our models and first requires moderate preprocessing, which is performed in the **Preprocessing** node. This general structure of download-process-train is a common pattern, as data is rarely supplied in a clean, immediately usable format.
\ No newline at end of file
diff --git a/docs/examples/vis-pred-plot.png b/docs/examples/vis-pred-plot.png
new file mode 100644
index 000000000..0aed10ce0
Binary files /dev/null and b/docs/examples/vis-pred-plot.png differ
diff --git a/docs/examples/vis-pred.png b/docs/examples/vis-pred.png
new file mode 100644
index 000000000..9dae7ba22
Binary files /dev/null and b/docs/examples/vis-pred.png differ
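
The Train-Test text in rs-tutorial.rst names four evaluation metrics (outlier fraction, prediction bias, MAD deviation, and mean CRPS) without showing how they might be computed. The sketch below is one possible reading, not the code in the evaluation node. The function names, the 0.05 outlier threshold, the 1.4826 MAD scale factor, and the assumption that each pdf is a row of per-bin probabilities summing to 1 on a uniform grid from 0 to ``max_val`` are all assumptions made for illustration, following common photometric-redshift conventions.

```python
import numpy as np


def redshift_metrics(z_pred, z_true, outlier_threshold=0.05):
    """Point-estimate metrics on normalized residuals (hypothetical helper)."""
    z_pred = np.asarray(z_pred, dtype=float)
    z_true = np.asarray(z_true, dtype=float)
    # Normalized residual: (z_pred - z_true) / (1 + z_true)
    dz = (z_pred - z_true) / (1.0 + z_true)
    return {
        # Mean residual
        "bias": float(np.mean(dz)),
        # Median absolute deviation, scaled to match a Gaussian sigma
        "mad_deviation": float(1.4826 * np.median(np.abs(dz - np.median(dz)))),
        # Fraction of residuals beyond the (assumed) threshold
        "outlier_fraction": float(np.mean(np.abs(dz) > outlier_threshold)),
    }


def mean_crps(pdfs, z_true, max_val=0.4):
    """Average CRPS for binned pdfs, via a rectangle-rule approximation."""
    pdfs = np.asarray(pdfs, dtype=float)
    z_true = np.asarray(z_true, dtype=float)
    num_bins = pdfs.shape[1]
    width = max_val / num_bins
    # Upper edge of every bin; the CDF is evaluated at these points
    upper = np.linspace(0.0, max_val, num_bins + 1)[1:]
    cdf = np.cumsum(pdfs, axis=1)
    # Step function that jumps from 0 to 1 at the true redshift
    step = (upper[None, :] >= z_true[:, None]).astype(float)
    # Integrate the squared difference between CDF and step over redshift
    crps = np.sum((cdf - step) ** 2, axis=1) * width
    return float(np.mean(crps))
```

With the ``PredVis`` inputs above, ``redshift_metrics(pt, gt)`` and ``mean_crps(pdfs, gt)`` would summarize numerically what the subplots show visually: lower values are better for all four quantities.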