3 changes: 3 additions & 0 deletions .gitignore
@@ -105,3 +105,6 @@ _build
docs/pytometry.*
lamin_sphinx
docs/conf.py

# data
docs/tutorials/*.fcs
1 change: 1 addition & 0 deletions docs/tutorials/index.md
@@ -11,4 +11,5 @@ This makes it both easy for the user to understand the documentation, and for th

quickstart
read_fcs
preprocessing
```
262 changes: 262 additions & 0 deletions docs/tutorials/preprocessing.ipynb
@@ -0,0 +1,262 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Preprocess flow data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this notebook, we load an fcs file into the anndata format, move the forward scatter (FCS) and sideward scatter (SSC) information to the `.obs` section of the anndata file and perform compensation on the data. Next, we apply different types of normalisation to the data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import readfcs\n",
"import pytometry as pm"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Read data from `readfcs` package example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from urllib.request import urlretrieve\n",
"\n",
"path_data, _ = urlretrieve(readfcs.datasets.example(), \"example.fcs\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"adata = pm.io.read_fcs(path_data)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"adata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reduce features "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We split the data matrix into the marker intensity part and the FSC/SSC part. Moreover, we move all height related features to the `.obs` part of the anndata file. Notably. the function `split_signal` checks if a feature name is either FSC/SSC or whether a name endswith `-A` for area related features and `-H` for height related features. "
]
},
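{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the rule concrete, here is a plain-Python sketch of the suffix convention described above (an illustration only, not the actual `split_signal` implementation):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def classify_channel(name: str) -> str:\n",
"    \"\"\"Toy restatement of the naming rule described above.\"\"\"\n",
"    if name.startswith((\"FSC\", \"SSC\")):\n",
"        return \"scatter\"  # forward/side scatter channels\n",
"    if name.endswith(\"-A\"):\n",
"        return \"area\"\n",
"    if name.endswith(\"-H\"):\n",
"        return \"height\"\n",
"    return \"unknown\"\n",
"\n",
"[classify_channel(n) for n in (\"FSC-A\", \"SSC-H\", \"CD3-A\", \"CD19-H\")]"
]
},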
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pm.pp.split_signal(adata)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let us check the `var_names` of the features and the channel names. In this example, the channel names have been cleaned such that none of the markers have the `-A` or `-H` suffix. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"adata.var"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let us modify the feature column `signal_type` manually."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"adata.var[\"signal_type\"] = adata.var[\"signal_type\"].cat.add_categories([\"area\"])\n",
"adata.var[\"signal_type\"][3:] = \"area\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"adata.var"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Repeat to split the data matrix."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pm.pp.split_signal(adata)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"adata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This time, we did not get the warning that all features are returned. Indeed, the data matrix was reduced by three features (`FSC-A`, `FSC-H` and `SSC-A`). "
]
},
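{
"cell_type": "markdown",
"metadata": {},
"source": [
"The scatter features removed from the data matrix are now stored as per-cell annotations, so we can still inspect them via `.obs`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"adata.obs.head()"
]
},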
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Compensation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we compensate the data using the compensation matrix that is included in the FCS file header. Alternatively, one may provide a custom compensation matrix."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pm.pp.compensate(adata)"
]
},
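{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of compensating with a user-supplied matrix. The parameter name `comp_matrix` is an assumption here, as is the identity-matrix example; consult the `pm.pp.compensate` API reference for the exact signature before running it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# hypothetical spillover matrix: the identity corresponds to no spillover\n",
"channels = adata.var_names\n",
"custom_spill = pd.DataFrame(np.eye(len(channels)), index=channels, columns=channels)\n",
"\n",
"# `comp_matrix` is an assumed parameter name -- check the API docs\n",
"# pm.pp.compensate(adata, comp_matrix=custom_spill)"
]
},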
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Normalize data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the next step, we normalize the data. By default, normalization is an inplace operation, i.e. we only create a new anndata object, if we set the argument `copy=True`. We demonstrate three different normalization methods that are build in `pytometry`:\n",
"* arcsinh \n",
"* logicle \n",
"* bi-exponential"
]
},
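{
"cell_type": "markdown",
"metadata": {},
"source": [
"The arcsinh normalization divides each intensity by a cofactor $c$ and applies the inverse hyperbolic sine,\n",
"\n",
"$$x' = \\operatorname{arcsinh}(x/c) = \\ln\\left(x/c + \\sqrt{(x/c)^2 + 1}\\right),$$\n",
"\n",
"which behaves linearly for $x \\ll c$ and logarithmically for $x \\gg c$. Below we use $c = 150$, a common choice for fluorescence flow cytometry data."
]
},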
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"adata_arcsinh = pm.tl.normalize_arcsinh(adata, cofactor=150, copy=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"adata_logicle = pm.tl.normalize_logicle(adata, copy=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"adata_biex = pm.tl.normalize_biExp(adata, copy=True)"
]
}
],
"metadata": {
"interpreter": {
"hash": "48c3c4927e81daf79217bae0bb1c93e3ab00a11990990ff2e155253980f357b0"
},
"kernelspec": {
"display_name": "Python 3.9.7 ('pyto_dev')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
6 changes: 2 additions & 4 deletions pytometry/preprocessing/_process_data.py
@@ -173,10 +173,8 @@ def compensate(
# check for nan values
nan_val = np.isnan(adata.X[:, indexes]).sum()
if nan_val > 0:
raise Warning(
f"{nan_val} NaN values found after compensation. Please adjust"
" compensation matrix."
)
assert f"{nan_val} NaN values found after compensation. Please adjust "
"compensation matrix."

return adata if copy else None
