3 changes: 3 additions & 0 deletions .gitignore
@@ -105,3 +105,6 @@ _build
docs/pytometry.*
lamin_sphinx
docs/conf.py

# data
docs/tutorials/*.fcs
1 change: 1 addition & 0 deletions docs/tutorials/index.md
@@ -11,4 +11,5 @@ This makes it both easy for the user to understand the documentation, and for th

quickstart
read_fcs
preprocessing
```
262 changes: 262 additions & 0 deletions docs/tutorials/preprocessing.ipynb
@@ -0,0 +1,262 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Preprocess flow data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this notebook, we load an fcs file into the anndata format, move the forward scatter (FCS) and sideward scatter (SSC) information to the `.obs` section of the anndata file and perform compensation on the data. Next, we apply different types of normalisation to the data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import readfcs\n",
"import pytometry as pm"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Read data from `readfcs` package example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from urllib.request import urlretrieve\n",
"\n",
"path_data, _ = urlretrieve(readfcs.datasets.example(), \"example.fcs\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"adata = pm.io.read_fcs(path_data)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"adata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reduce features "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We split the data matrix into the marker intensity part and the FSC/SSC part. Moreover, we move all height related features to the `.obs` part of the anndata file. Notably. the function `split_signal` checks if a feature name is either FSC/SSC or whether a name endswith `-A` for area related features and `-H` for height related features. "
]
},
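{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the rule concrete, here is a plain-Python sketch of the suffix convention described above (an illustration only, not the actual `split_signal` implementation):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def classify_channel(name: str) -> str:\n",
"    \"\"\"Toy restatement of the naming rule described above.\"\"\"\n",
"    if name.startswith((\"FSC\", \"SSC\")):\n",
"        return \"scatter\"  # forward/side scatter channels\n",
"    if name.endswith(\"-A\"):\n",
"        return \"area\"\n",
"    if name.endswith(\"-H\"):\n",
"        return \"height\"\n",
"    return \"unknown\"\n",
"\n",
"[classify_channel(n) for n in (\"FSC-A\", \"SSC-H\", \"CD3-A\", \"CD19-H\")]"
]
},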
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pm.pp.split_signal(adata)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let us check the `var_names` of the features and the channel names. In this example, the channel names have been cleaned such that none of the markers have the `-A` or `-H` suffix. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"adata.var"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let us modify the feature column `signal_type` manually."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"adata.var[\"signal_type\"] = adata.var[\"signal_type\"].cat.add_categories([\"area\"])\n",
"adata.var[\"signal_type\"][3:] = \"area\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"adata.var"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Repeat to split the data matrix."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pm.pp.split_signal(adata)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"adata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This time, we did not get the warning that all features are returned. Indeed, the data matrix was reduced by three features (`FSC-A`, `FSC-H` and `SSC-A`). "
]
},
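{
"cell_type": "markdown",
"metadata": {},
"source": [
"The scatter features removed from the data matrix are now stored as per-cell annotations, so we can still inspect them via `.obs`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"adata.obs.head()"
]
},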
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Compensation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we compensate the data using the compensation matrix that is included in the FCS file header. Alternatively, one may provide a custom compensation matrix."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pm.pp.compensate(adata)"
]
},
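{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of compensating with a user-supplied matrix. The parameter name `comp_matrix` is an assumption here, as is the identity-matrix example; consult the `pm.pp.compensate` API reference for the exact signature before running it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# hypothetical spillover matrix: the identity corresponds to no spillover\n",
"channels = adata.var_names\n",
"custom_spill = pd.DataFrame(np.eye(len(channels)), index=channels, columns=channels)\n",
"\n",
"# `comp_matrix` is an assumed parameter name -- check the API docs\n",
"# pm.pp.compensate(adata, comp_matrix=custom_spill)"
]
},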
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Normalize data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the next step, we normalize the data. By default, normalization is an inplace operation, i.e. we only create a new anndata object, if we set the argument `copy=True`. We demonstrate three different normalization methods that are build in `pytometry`:\n",
"* arcsinh \n",
"* logicle \n",
"* bi-exponential"
]
},
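{
"cell_type": "markdown",
"metadata": {},
"source": [
"The arcsinh normalization divides each intensity by a cofactor $c$ and applies the inverse hyperbolic sine,\n",
"\n",
"$$x' = \\operatorname{arcsinh}(x/c) = \\ln\\left(x/c + \\sqrt{(x/c)^2 + 1}\\right),$$\n",
"\n",
"which behaves linearly for $x \\ll c$ and logarithmically for $x \\gg c$. Below we use $c = 150$, a common choice for fluorescence flow cytometry data."
]
},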
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"adata_arcsinh = pm.tl.normalize_arcsinh(adata, cofactor=150, copy=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"adata_logicle = pm.tl.normalize_logicle(adata, copy=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"adata_biex = pm.tl.normalize_biExp(adata, copy=True)"
]
}
],
"metadata": {
"interpreter": {
"hash": "48c3c4927e81daf79217bae0bb1c93e3ab00a11990990ff2e155253980f357b0"
},
"kernelspec": {
"display_name": "Python 3.9.7 ('pyto_dev')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
6 changes: 2 additions & 4 deletions pytometry/preprocessing/_process_data.py
@@ -173,10 +173,8 @@ def compensate(
# check for nan values
nan_val = np.isnan(adata.X[:, indexes]).sum()
if nan_val > 0:
raise Warning(
f"{nan_val} NaN values found after compensation. Please adjust"
" compensation matrix."
)
assert f"{nan_val} NaN values found after compensation. Please adjust "
"compensation matrix."

return adata if copy else None
