Merged
@@ -7,7 +7,7 @@
"source": [
"# CytoTable mise en place (general overview)\n",
"\n",
"This notebook includes a quick demonstration of CytoTable to help you understand the basics of using the package and the biological basis of each step.\n",
"This notebook will help you understand the basics of using CytoTable and the biological basis of each step.\n",
"We provide a high-level overview of the related concepts to give greater context about where and how the data are changed in order to gain new insights.\n",
"\n",
"The name of the notebook comes from the french _mise en place_:\n",
@@ -89,17 +89,18 @@
"id": "832c700f-63e0-4f22-853c-9bf6d5328a5c",
"metadata": {},
"source": [
"## Phase 1: Cells are stained and images are captured by microscopes\n",
"## Phase 1: Cells are imaged by microscopes, with optional fluorescence staining\n",
"\n",
"![Image showing cells being stained and captured as images using a microscope.](../_static/cell_to_image.png)\n",
"\n",
"__Figure 1.__ _Cells are stained in order to highlight cellular compartments and organelles. Microscopes are used to observe and capture data for later use._\n",
"__Figure 1.__ _A microscope images cells to highlight cell processes. Often, fluorescence dyes paint the cells to mark specific proteins, compartments, and/or organelles._\n",
"\n",
"CytoTable uses data created from multiple upstream steps involving images of \n",
"stained biological objects (typically cells).\n",
"Cells are cultured in multi-well plates, perturbed, and then fixed before being stained with a panel of six fluorescent dyes that highlight key cellular compartments and organelles, including the nucleus, nucleoli/RNA, endoplasmic reticulum, mitochondria, actin cytoskeleton, Golgi apparatus, and plasma membrane. These multiplexed stains are imaged across fluorescence channels using automated high-content microscopy, producing rich images that capture the morphology of individual cells for downstream analysis ([Bray et al., 2016](https://doi.org/10.1038/nprot.2016.105); [Gustafsdottir et al., 2013](https://doi.org/10.1371/journal.pone.0080999)).\n",
"CytoTable processes microscopy-based data that are created from multiple upstream steps.\n",
"CytoTable does not require any specific sample preparation, and can work with any microscopy experimental design.\n",
"However, most often, CytoTable processes fluorescence microscopy images from the Cell Painting assay.\n",
"In the Cell Painting assay, scientists stain cells with a panel of six fluorescent dyes that mark key cellular compartments and organelles, including the nucleus, nucleoli/RNA, endoplasmic reticulum, mitochondria, actin cytoskeleton, Golgi apparatus, and plasma membrane ([Bray et al., 2016](https://doi.org/10.1038/nprot.2016.105); [Gustafsdottir et al., 2013](https://doi.org/10.1371/journal.pone.0080999)). Scientists then use microscopes to image these cells across fluorescence channels, and use image analysis software to produce high-content morphology profiles of individual cells for downstream analysis .\n",
"\n",
"We use the ExampleHuman dataset provided from CellProfiler Examples ([Moffat et al., 2006](https://doi.org/10.1016/j.cell.2006.01.040), [CellProfiler Examples Link](https://github.com/CellProfiler/examples/tree/master/ExampleHuman)) to help describe this process below."
"We use the ExampleHuman dataset provided from CellProfiler Examples ([Moffat et al., 2006](https://doi.org/10.1016/j.cell.2006.01.040), [CellProfiler Examples Link](https://github.com/CellProfiler/examples/tree/master/ExampleHuman)) to describe this process below."
]
},
{
@@ -185,17 +186,17 @@
"id": "23897ed5-53aa-41a2-a8b2-494498045262",
"metadata": {},
"source": [
"## Phase 2: Images are segmented to build numeric feature datasets via CellProfiler\n",
"## Phase 2: CellProfiler segments cells and measures numeric features\n",
"\n",
"![Image showing CellProfiler being used to create image segmentations, measurements, and exporting numeric feature data to a file.](../_static/image_to_features.png)\n",
"\n",
"\n",
"__Figure 2.__ _CellProfiler is configured to use images and performs segmentation to evaluate numeric representations of cells. This data is captured for later use in tabular file formats such as CSV or SQLite tables._\n",
"__Figure 2.__ _CellProfiler takes in microscopy images and performs single-cell segmentation to distinguish cells from background. CellProfiler then measures \"hand-engineered\" computer vision features from every single cell. These data are captured for later use in a CSV table or SQLite database._\n",
"\n",
"After acquisition, the multiplexed images are processed using image-analysis software such as CellProfiler, which segments cells and their compartments into distinct regions of interest. From these segmented images, hundreds to thousands of quantitative features are extracted per cell, capturing properties such as size, shape, intensity, texture, and spatial organization.\n",
"After acquisition, scientists process the images using image-analysis software such as CellProfiler. CellProfiler segments single cells and their biological compartments into distinct regions of interest. From these segmented cells, CellProfiler extracts hundreds to thousands of quantitative features per cell, capturing properties such as size, shape, intensity, texture, and spatial organization.\n",
"These high-dimensional feature datasets provide a numerical representation of cell morphology that serves as the foundation for downstream profiling and analysis ([Carpenter et al., 2006](https://doi.org/10.1186/gb-2006-7-10-r100)).\n",
"\n",
"CellProfiler was used in conjunction with the `.cppipe` file to produce the following images and data tables from the ExampleHuman dataset."
"We use CellProfiler (with a prespecified configuration `.cppipe` file) to produce the following images and data tables from the ExampleHuman dataset."
]
},
{
@@ -1266,7 +1267,7 @@
}
],
"source": [
"# show the tables generated from the resulting CSV files\n",
"# show the tables generated from the resulting CSV files\n",
"for profiles in pathlib.Path(source_path).glob(\"*.csv\"):\n",
" print(f\"\\nProfiles from CellProfiler: {profiles}\")\n",
" display(pd.read_csv(profiles).head())"
@@ -1278,13 +1279,13 @@
"id": "5f5b7cd6-9511-4349-bacf-e6304a099025",
"metadata": {},
"source": [
"## Phase 3: Numeric feature datasets from CellProfiler are harmonized by CytoTable\n",
"## Phase 3: CytoTable harmonizes the feature datasets that CellProfiler generates\n",
"\n",
"![Image showing feature data being read by CytoTable and exported to a CytoTable file.](../_static/features_to_cytotable.png)\n",
"\n",
"The high-dimensional feature tables produced by CellProfiler often vary in format depending on the imaging pipeline, experiment, or storage system. CytoTable standardizes these single-cell morphology datasets by harmonizing outputs into consistent, analysis-ready formats such as Parquet or AnnData. This unification ensures that data from diverse experiments can be readily integrated and processed by downstream profiling tools like Pycytominer or coSMicQC, enabling scalable and reproducible cytomining workflows.\n",
"CellProfiler produces high-dimensional feature tables that vary in format depending on the imaging pipeline, experiment, or storage system. Sometimes these feature tables are thousands of columns and hundreds of thousands of rows. CytoTable harmonizes these outputs into consistent, analysis-ready formats such as Parquet or AnnData. This unification ensures that data from diverse experiments can be readily integrated and processed by downstream profiling tools like Pycytominer or coSMicQC, enabling scalable and reproducible bioinformatics workflows.\n",
"\n",
"We use CytoTable below to process the numeric feature data observed above."
"We use CytoTable below to process the numeric feature data we generated above."
]
},
{
@@ -1298,8 +1299,8 @@
"output_type": "stream",
"text": [
"example.parquet\n",
"CPU times: user 215 ms, sys: 159 ms, total: 374 ms\n",
"Wall time: 13.1 s\n"
"CPU times: user 239 ms, sys: 167 ms, total: 406 ms\n",
"Wall time: 13.3 s\n"
]
}
],
@@ -1594,13 +1595,13 @@
{
"data": {
"text/plain": [
"<pyarrow._parquet.FileMetaData object at 0x151376570>\n",
"<pyarrow._parquet.FileMetaData object at 0x1762a7dd0>\n",
" created_by: parquet-cpp-arrow version 21.0.0\n",
" num_columns: 312\n",
" num_rows: 289\n",
" num_row_groups: 1\n",
" format_version: 2.6\n",
" serialized_size: 87760"
" serialized_size: 87761"
]
},
"execution_count": 9,
@@ -1623,7 +1624,7 @@
"data": {
"text/plain": [
"{b'data-producer': b'https://github.com/cytomining/CytoTable',\n",
" b'data-producer-version': b'1.1.0.post6.dev0+4ddbbe1'}"
" b'data-producer-version': b'1.1.0.post13.dev0+2f51ec3'}"
]
},
"execution_count": 10,
@@ -1990,7 +1991,7 @@
"Nuclei_Number_Object_Number: int64\n",
"-- schema metadata --\n",
"data-producer: 'https://github.com/cytomining/CytoTable'\n",
"data-producer-version: '1.1.0.post6.dev0+4ddbbe1'"
"data-producer-version: '1.1.0.post13.dev0+2f51ec3'"
]
},
"execution_count": 12,
@@ -2020,7 +2021,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.11"
"version": "3.10.16"
}
},
"nbformat": 4,
@@ -93,9 +93,9 @@
"\n",
"![Image showing cells being stained and captured as images using a microscope.](../_static/cell_to_image.png)\n",
"\n",
"__Figure 1.__ _A microscope images cells to highlight cell processes. Often, fluorescence dyes paint the cells to mark specific proteins, compartments, and/or organelles._\n",
"__Figure 1.__ _A microscope images cells to highlight cell processes. Often, fluorescence dyes stain the cells to mark specific proteins, compartments, and/or organelles._\n",
"\n",
"CytoTable processes microscopy-based data that are created from multiple upstream steps.\n",
"CytoTable processes microscopy-based data that are created from multiple upstream steps (image analysis).\n",
"CytoTable does not require any specific sample preparation, and can work with any microscopy experimental design.\n",
"However, most often, CytoTable processes fluorescence microscopy images from the Cell Painting assay.\n",
"In the Cell Painting assay, scientists stain cells with a panel of six fluorescent dyes that mark key cellular compartments and organelles, including the nucleus, nucleoli/RNA, endoplasmic reticulum, mitochondria, actin cytoskeleton, Golgi apparatus, and plasma membrane ([Bray et al., 2016](https://doi.org/10.1038/nprot.2016.105); [Gustafsdottir et al., 2013](https://doi.org/10.1371/journal.pone.0080999)). Scientists then use microscopes to image these cells across fluorescence channels, and use image analysis software to produce high-content morphology profiles of individual cells for downstream analysis .\n",
@@ -191,7 +191,7 @@
"![Image showing CellProfiler being used to create image segmentations, measurements, and exporting numeric feature data to a file.](../_static/image_to_features.png)\n",
"\n",
"\n",
"__Figure 2.__ _CellProfiler takes in microscopy images and performs single-cell segmentation to distinguish cells from background. CellProfiler then measures \"hand-engineered\" computer vision features from every single cell. These data are captured for later use in a CSV table or SQLite database._\n",
"__Figure 2.__ _CellProfiler takes in microscopy images and performs single-cell segmentation to distinguish cells from background. CellProfiler then measures \"hand-engineered\" computer vision features from every single cell. These data are captured for later use in multiple CSV tables or SQLite database._\n",
"\n",
"After acquisition, scientists process the images using image-analysis software such as CellProfiler. CellProfiler segments single cells and their biological compartments into distinct regions of interest. From these segmented cells, CellProfiler extracts hundreds to thousands of quantitative features per cell, capturing properties such as size, shape, intensity, texture, and spatial organization.\n",
"These high-dimensional feature datasets provide a numerical representation of cell morphology that serves as the foundation for downstream profiling and analysis ([Carpenter et al., 2006](https://doi.org/10.1186/gb-2006-7-10-r100)).\n",
34 changes: 12 additions & 22 deletions docs/source/examples/cytotable_mise_en_place_general_overview.py
@@ -3,16 +3,15 @@
# jupytext:
# text_representation:
# extension: .py
# format_name: percent
# format_version: '1.3'
# jupytext_version: 1.17.2
# format_name: light
# format_version: '1.5'
# jupytext_version: 1.17.3
# kernelspec:
# display_name: Python 3 (ipykernel)
# language: python
# name: python3
# ---

# %% [markdown]
# # CytoTable mise en place (general overview)
#
# This notebook will help you understand the basics of using CytoTable and the biological basis of each step.
@@ -24,7 +23,7 @@
# > refer to organizing and arranging the ingredients ..."
# > - [Wikipedia](https://en.wikipedia.org/wiki/Mise_en_place)

# %%
# +
import pathlib
from collections import Counter

@@ -38,31 +37,29 @@
# setup variables for use throughout the notebook
source_path = "../../../tests/data/cellprofiler/examplehuman"
dest_path = "./example.parquet"
# -

# %%
# remove the dest_path if it's present
if pathlib.Path(dest_path).is_file():
    pathlib.Path(dest_path).unlink()
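
The cleanup step above checks for the output file and then deletes it. A minimal sketch of the same idea wrapped in a helper follows; `remove_if_present` is a hypothetical name used only for illustration (it is not part of CytoTable), and `missing_ok=True` assumes Python 3.8 or later.

```python
import pathlib
import tempfile


def remove_if_present(path):
    """Delete `path` when it is a regular file; report whether a file was removed.

    Hypothetical helper for illustration only (not a CytoTable function).
    """
    target = pathlib.Path(path)
    if target.is_file():
        # missing_ok=True (Python 3.8+) tolerates the file vanishing between
        # the is_file() check and the unlink() call
        target.unlink(missing_ok=True)
        return True
    return False


# usage: create a throwaway file, then remove it twice to show both outcomes
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_file = pathlib.Path(tmp_dir) / "example.parquet"
    demo_file.touch()
    print(remove_if_present(demo_file))  # True
    print(remove_if_present(demo_file))  # False
```

Keeping the `is_file()` guard means a directory at the same path is left alone, which matches the behavior of the notebook cell.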

# %%
# show the files we will use as source data with CytoTable
list(pathlib.Path(source_path).glob("*"))

# %% [markdown]
# ## Phase 1: Cells are imaged by microscopes, with optional fluorescence staining
#
# ![Image showing cells being stained and captured as images using a microscope.](../_static/cell_to_image.png)
#
# __Figure 1.__ _A microscope images cells to highlight cell processes. Often, fluorescence dyes paint the cells to mark specific proteins, compartments, and/or organelles._
# __Figure 1.__ _A microscope images cells to highlight cell processes. Often, fluorescence dyes stain the cells to mark specific proteins, compartments, and/or organelles._
#
# CytoTable processes microscopy-based data that are created from multiple upstream steps.
# CytoTable processes microscopy-based data that are created from multiple upstream steps (image analysis).
# CytoTable does not require any specific sample preparation, and can work with any microscopy experimental design.
# However, most often, CytoTable processes fluorescence microscopy images from the Cell Painting assay.
# In the Cell Painting assay, scientists stain cells with a panel of six fluorescent dyes that mark key cellular compartments and organelles, including the nucleus, nucleoli/RNA, endoplasmic reticulum, mitochondria, actin cytoskeleton, Golgi apparatus, and plasma membrane ([Bray et al., 2016](https://doi.org/10.1038/nprot.2016.105); [Gustafsdottir et al., 2013](https://doi.org/10.1371/journal.pone.0080999)). Scientists then use microscopes to image these cells across fluorescence channels, and use image analysis software to produce high-content morphology profiles of individual cells for downstream analysis.
#
# We use the ExampleHuman dataset provided from CellProfiler Examples ([Moffat et al., 2006](https://doi.org/10.1016/j.cell.2006.01.040), [CellProfiler Examples Link](https://github.com/CellProfiler/examples/tree/master/ExampleHuman)) to describe this process below.

# %%
# +
# display the images we will gather features from
image_name_map = {"d0.tif": "DNA", "d1.tif": "PH3", "d2.tif": "Cells"}

@@ -73,34 +70,31 @@
stain = val
print(f"\nImage with stain: {stain}")
display(Image.open(image))
# -

# %% [markdown]
# ## Phase 2: CellProfiler segments cells and measures numeric features
#
# ![Image showing CellProfiler being used to create image segmentations, measurements, and exporting numeric feature data to a file.](../_static/image_to_features.png)
#
#
# __Figure 2.__ _CellProfiler takes in microscopy images and performs single-cell segmentation to distinguish cells from background. CellProfiler then measures "hand-engineered" computer vision features from every single cell. These data are captured for later use in a CSV table or SQLite database._
# __Figure 2.__ _CellProfiler takes in microscopy images and performs single-cell segmentation to distinguish cells from background. CellProfiler then measures "hand-engineered" computer vision features from every single cell. These data are captured for later use in multiple CSV tables or a SQLite database._
#
# After acquisition, scientists process the images using image-analysis software such as CellProfiler. CellProfiler segments single cells and their biological compartments into distinct regions of interest. From these segmented cells, CellProfiler extracts hundreds to thousands of quantitative features per cell, capturing properties such as size, shape, intensity, texture, and spatial organization.
# These high-dimensional feature datasets provide a numerical representation of cell morphology that serves as the foundation for downstream profiling and analysis ([Carpenter et al., 2006](https://doi.org/10.1186/gb-2006-7-10-r100)).
#
# We use CellProfiler (with a prespecified configuration `.cppipe` file) to produce the following images and data tables from the ExampleHuman dataset.

# %%
# show the segmentations through an overlay with outlines
for image in pathlib.Path(source_path).glob("*Overlay.png"):
    print("Image outlines from segmentation (composite)")
    print("Color key: (dark blue: nuclei, light blue: cells, yellow: PH3)")
    display(Image.open(image))

# %%
# show the tables generated from the resulting CSV files
for profiles in pathlib.Path(source_path).glob("*.csv"):
    print(f"\nProfiles from CellProfiler: {profiles}")
    display(pd.read_csv(profiles).head())

# %% [markdown]
# ## Phase 3: CytoTable harmonizes the feature datasets that CellProfiler generates
#
# ![Image showing feature data being read by CytoTable and exported to a CytoTable file.](../_static/features_to_cytotable.png)
@@ -109,7 +103,7 @@
#
# We use CytoTable below to process the numeric feature data we generated above.

# %%
# +
# %%time

# run cytotable convert
@@ -122,25 +116,21 @@
    preset="cellprofiler_csv",
)
print(pathlib.Path(result).name)
# -

# %%
# show the table head using pandas
pq.read_table(source=result).to_pandas().head()

# %%
# show metadata for the result file
pq.read_metadata(result)

# %%
# show schema metadata which includes CytoTable information
# note: this information will travel with the file.
pq.read_schema(result).metadata

# %%
# show schema column name summaries
print("Column name prefix counts:")
dict(Counter(w.split("_", 1)[0] for w in pq.read_schema(result).names))

# %%
# show full schema details
pq.read_schema(result)
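
The column-name summary in the script above splits each schema name at its first underscore and counts the resulting prefixes. The sketch below shows that step in isolation on a small hand-written list. `prefix_counts` is a hypothetical helper name, and every column name except `Nuclei_Number_Object_Number` (which appears in the schema output earlier) is an assumed example rather than a name taken from the real file.

```python
from collections import Counter


def prefix_counts(column_names):
    """Count columns by the text before the first underscore.

    Hypothetical helper mirroring the one-liner in the notebook;
    names without an underscore are counted under their full name.
    """
    return dict(Counter(name.split("_", 1)[0] for name in column_names))


# assumed example column names, for illustration only
example_columns = [
    "Metadata_ImageNumber",
    "Image_FileName_DNA",
    "Cells_AreaShape_Area",
    "Cells_Intensity_MeanIntensity_DNA",
    "Nuclei_AreaShape_Area",
    "Nuclei_Number_Object_Number",
]

print(prefix_counts(example_columns))
# {'Metadata': 1, 'Image': 1, 'Cells': 2, 'Nuclei': 2}
```

Passing `maxsplit=1` to `split` keeps the work per name small and makes it explicit that only the leading compartment prefix matters, not the rest of the feature name.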
2 changes: 1 addition & 1 deletion docs/source/index.md
@@ -13,7 +13,7 @@ caption: 'Contents:'
maxdepth: 3
---
overview
tutorial
tutorials
examples
presentations
contributing
2 changes: 1 addition & 1 deletion docs/source/overview.md
@@ -1,7 +1,7 @@
# Overview

This page provides a brief overview of CytoTable topics.
For a brief introduction on how to use CytoTable, please see the [tutorial](tutorial.md) page.
For a brief introduction on how to use CytoTable, please see the [tutorials](tutorials.md) page.

## Presets and Manual Overrides
