Feature/filtering qc metrics highly variable genes#8

Merged
TRextabat merged 11 commits into develop from feature/filtering-qc_metrics-highly_variable_genes
Apr 9, 2025
@Oykupnrbs Oykupnrbs commented Apr 6, 2025

Pull Request Description:

This pull request adds three simple and reusable functions for preprocessing single-cell RNA-seq data with Scanpy.

What’s inside:

  • filter_data(): Removes cells with too few genes, and genes found in too few cells.

  • calculate_qc_metrics(): Finds mitochondrial genes and calculates quality control values.

  • select_highly_variable_genes(): Picks genes that are most variable for further analysis.
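
To make the filtering step concrete, here is a minimal sketch of the rule filter_data() is described as applying, written against a plain list-of-lists count matrix instead of an AnnData object. The function name filter_counts and the thresholds min_genes and min_cells are hypothetical and chosen for illustration; the PR's actual implementation is not shown in this conversation and presumably relies on Scanpy.

```python
def filter_counts(counts, min_genes=2, min_cells=2):
    """Filter a count matrix (rows = cells, columns = genes).

    Hypothetical helper for illustration only; not the PR's filter_data().
    """
    # Step 1: keep cells that express at least `min_genes` genes.
    kept_cells = [
        i for i, row in enumerate(counts)
        if sum(1 for value in row if value > 0) >= min_genes
    ]

    # Step 2: among the kept cells, keep genes seen in at least `min_cells` cells.
    n_genes = len(counts[0]) if counts else 0
    kept_genes = [
        j for j in range(n_genes)
        if sum(1 for i in kept_cells if counts[i][j] > 0) >= min_cells
    ]

    # Build the filtered matrix from the surviving rows and columns.
    filtered = [[counts[i][j] for j in kept_genes] for i in kept_cells]
    return filtered, kept_cells, kept_genes


counts = [
    [1, 0, 3],  # cell 0: 2 genes detected
    [0, 0, 0],  # cell 1: empty, dropped
    [2, 0, 1],  # cell 2: 2 genes detected
    [0, 5, 1],  # cell 3: 2 genes detected
]
filtered, kept_cells, kept_genes = filter_counts(counts)
print(kept_cells, kept_genes)
```

Cells are filtered before genes here, so a gene that only appears in dropped cells is removed as well. Whether the PR applies the two steps in this order is an assumption.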

Demo Code:

  • Creates random fake data to test the functions.

  • Adds example mitochondrial genes to test QC.

  • Runs all three functions step by step with print statements to see results.
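
The demo steps above can be sketched without Scanpy as follows. This is only an illustration of the idea: the helper names make_fake_counts and percent_mito, the gene names, and the count range are all made up here, and the PR's demo itself works on an AnnData object. The percentage computed is the share of each cell's counts that comes from genes with the mitochondrial prefix, which is the quantity Scanpy reports as pct_counts_mt.

```python
import random


def make_fake_counts(n_cells=5, n_genes=8, n_mito=2, seed=0):
    """Build random counts plus gene names, with a few fake 'MT-' genes first."""
    rng = random.Random(seed)  # seeded so the fake data is reproducible
    gene_names = [f"MT-FAKE{i}" for i in range(n_mito)]
    gene_names += [f"GENE{i}" for i in range(n_genes - n_mito)]
    counts = [[rng.randint(0, 20) for _ in gene_names] for _ in range(n_cells)]
    return counts, gene_names


def percent_mito(counts, gene_names, prefix="MT-"):
    """Return, per cell, the percentage of counts from prefixed genes."""
    mito_cols = [j for j, name in enumerate(gene_names) if name.startswith(prefix)]
    result = []
    for row in counts:
        total = sum(row)
        mito = sum(row[j] for j in mito_cols)
        # Guard against empty cells to avoid dividing by zero.
        result.append(100.0 * mito / total if total else 0.0)
    return result


counts, gene_names = make_fake_counts()
for cell_index, pct in enumerate(percent_mito(counts, gene_names)):
    print(f"cell {cell_index}: {pct:.1f}% mitochondrial counts")
```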

Why this is useful:

  • Code is now easier to read and use in other projects.

  • The demo shows how everything works in a simple way.

Demo Output:
[Image: output1]

Oykupnrbs and others added 9 commits April 2, 2025 21:18
…election

- Added filter_data() to filter out low-quality cells and genes
- Added calculate_qc_metrics() to compute mitochondrial gene percentages and other QC stats
- Added select_highly_variable_genes() to identify informative genes for downstream analysis

These reusable functions improve code clarity and support scalable single-cell preprocessing workflows.
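
As a rough illustration of what "identifying informative genes" means, the sketch below ranks genes by dispersion (variance divided by mean) and keeps the top few. This is a deliberately simplified stand-in, not the method used by select_highly_variable_genes() or by Scanpy; the function name top_variable_genes and the n_top parameter are hypothetical.

```python
from statistics import mean, pvariance


def top_variable_genes(counts, gene_names, n_top=2):
    """Rank genes by dispersion (variance / mean) and return the top names.

    Simplified illustration only; rows of `counts` are cells, columns are genes.
    """
    scores = []
    for j, name in enumerate(gene_names):
        column = [row[j] for row in counts]
        mu = mean(column)
        # Genes with zero mean carry no signal; give them a score of 0.
        dispersion = pvariance(column) / mu if mu > 0 else 0.0
        scores.append((dispersion, name))

    # Highest dispersion first.
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scores[:n_top]]


counts = [
    [1, 10, 5],
    [1, 0, 5],
    [1, 20, 6],
]
print(top_variable_genes(counts, ["A", "B", "C"]))
```

Gene A is constant across cells and so ranks last, while gene B varies the most relative to its mean.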
@TRextabat TRextabat left a comment

That is great work, thanks @Oykupnrbs @Ekin-hub-code

@TRextabat TRextabat self-requested a review April 9, 2025 16:15
Comment on lines +25 to +50
def calculate_qc_metrics(adata: AnnData) -> AnnData:
    """
    Computes quality control (QC) metrics and adds them to the dataset.
    Checks for mitochondrial genes with both uppercase and lowercase 'mt-' prefix.

    Args:
        adata (AnnData): The filtered dataset.

    Returns:
        AnnData: Dataset with QC metrics added.
    """
    if adata.var_names.str.startswith("MT-").any():
        adata.var["mt"] = adata.var_names.str.startswith("MT-")
    elif adata.var_names.str.startswith("mt-").any():
        adata.var["mt"] = adata.var_names.str.startswith("mt-")
    else:
        # In case neither is found, check in a case-insensitive way just to be sure
        adata.var["mt"] = adata.var_names.str.upper().str.startswith("MT-")

    sc.pp.calculate_qc_metrics(
        adata,
        qc_vars=["mt"],
        percent_top=None,
        log1p=False,
        inplace=True
    )
You could add the "mt" and "MT" prefixes as an argument in the future; we may want to skip mt filtration in some cases.
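
One possible shape for this suggestion is sketched below. It is only an illustration: the helper name flag_mito_genes and the prefixes parameter are hypothetical and not part of the merged code. The helper works on a plain list of gene names; in the function under review, the resulting mask would be what gets assigned to adata.var["mt"].

```python
def flag_mito_genes(var_names, prefixes=("MT-", "mt-")):
    """Return one boolean per gene name: True if it starts with any prefix.

    Hypothetical helper. Passing an empty tuple (or None) disables
    mitochondrial flagging, so no gene is marked.
    """
    if not prefixes:
        return [False] * len(var_names)
    # str.startswith accepts a tuple and matches any of its entries.
    return [name.startswith(tuple(prefixes)) for name in var_names]


genes = ["MT-CO1", "mt-Nd1", "ACTB"]
print(flag_mito_genes(genes))                    # both prefixes
print(flag_mito_genes(genes, prefixes=("MT-",))) # uppercase only
print(flag_mito_genes(genes, prefixes=()))       # mt flagging skipped
```

Exposing the prefixes as a parameter would also replace the if/elif/else chain with a single expression, since both spellings are checked at once.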

@TRextabat TRextabat merged commit 461d960 into develop Apr 9, 2025
3 participants