Feature/filtering qc metrics highly variable genes#8
Merged
…election
- Added filter_data() to filter out low-quality cells and genes
- Added calculate_qc_metrics() to compute mitochondrial gene percentages and other QC stats
- Added select_highly_variable_genes() to identify informative genes for downstream analysis

These reusable functions improve code clarity and support scalable single-cell preprocessing workflows.
TRextabat (Member) reviewed on Apr 9, 2025 and left a comment:
That is great work, thanks @Oykupnrbs @Ekin-hub-code!
TRextabat approved these changes on Apr 9, 2025
Comment on lines +25 to +50
    def calculate_qc_metrics(adata: AnnData) -> AnnData:
        """
        Computes quality control (QC) metrics and adds them to the dataset.
        Checks for mitochondrial genes with both uppercase and lowercase 'mt-' prefix.

        Args:
            adata (AnnData): The filtered dataset.

        Returns:
            AnnData: Dataset with QC metrics added.
        """
        if adata.var_names.str.startswith("MT-").any():
            adata.var["mt"] = adata.var_names.str.startswith("MT-")
        elif adata.var_names.str.startswith("mt-").any():
            adata.var["mt"] = adata.var_names.str.startswith("mt-")
        else:
            # In case neither is found, check in a case-insensitive way just to be sure
            adata.var["mt"] = adata.var_names.str.upper().str.startswith("MT-")

        sc.pp.calculate_qc_metrics(
            adata,
            qc_vars=["mt"],
            percent_top=None,
            log1p=False,
            inplace=True
        )
Member
You could make the mitochondrial prefix (mt- / MT-) an argument in the future; we may sometimes want to skip mt filtering.
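One possible shape for that suggestion is sketched below. It is not code from this pull request: the helper name `flag_genes_by_prefix` and its `prefixes` and `case_sensitive` parameters are hypothetical, and the sketch works on plain strings so it runs without Scanpy installed.

```python
from typing import Iterable, List, Optional, Sequence


def flag_genes_by_prefix(
    gene_names: Iterable[str],
    prefixes: Optional[Sequence[str]] = ("MT-",),
    case_sensitive: bool = False,
) -> List[bool]:
    """Return one boolean per gene: True if the name starts with any prefix.

    Hypothetical helper, not part of the PR. Passing prefixes=None (or an
    empty sequence) flags nothing, which is one way to skip mitochondrial
    flagging altogether.
    """
    names = list(gene_names)
    if not prefixes:
        return [False] * len(names)
    if case_sensitive:
        wanted = tuple(prefixes)
        return [name.startswith(wanted) for name in names]
    # Case-insensitive match: compare upper-cased names and prefixes.
    wanted = tuple(p.upper() for p in prefixes)
    return [name.upper().startswith(wanted) for name in names]


if __name__ == "__main__":
    genes = ["MT-CO1", "mt-Nd1", "ACTB", "Mtor"]
    print(flag_genes_by_prefix(genes))                 # default: case-insensitive "MT-"
    print(flag_genes_by_prefix(genes, prefixes=None))  # flagging disabled
```

Inside `calculate_qc_metrics()`, the returned list could presumably be assigned to `adata.var["mt"]` in place of the if/elif/else chain, with `prefixes` exposed as a function argument. A single case-insensitive comparison would also cover datasets that mix `MT-` and `mt-` names, which the current chain does not.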
Pull Request Description:
This pull request adds three simple and reusable functions for preprocessing single-cell RNA-seq data with Scanpy.
What’s inside:
filter_data(): Removes cells with too few genes, and genes found in too few cells.
calculate_qc_metrics(): Finds mitochondrial genes and calculates quality control values.
select_highly_variable_genes(): Picks genes that are most variable for further analysis.
Demo Code:
Creates random fake data to test the functions.
Adds example mitochondrial genes to test QC.
Runs all three functions step by step with print statements to see results.
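
The demo itself is not reproduced on this page. As a rough illustration of the first two steps, the following pure-Python sketch builds a small fake count matrix, filters it, and computes the percentage of mitochondrial counts per cell. All names here (`make_fake_counts`, `filter_counts`, `percent_mito`) and the thresholds are assumptions made for illustration; the PR's functions operate on AnnData objects through Scanpy instead. Highly variable gene selection is not sketched here.

```python
import random
from typing import List, Tuple

Matrix = List[List[int]]  # rows are cells, columns are genes


def make_fake_counts(n_cells: int, gene_names: List[str], seed: int = 0) -> Matrix:
    """Draw small random integer counts, with zeros over-represented."""
    rng = random.Random(seed)
    return [
        [rng.choice([0, 0, 1, 2, 5]) for _ in gene_names]
        for _ in range(n_cells)
    ]


def filter_counts(
    counts: Matrix, gene_names: List[str], min_genes: int, min_cells: int
) -> Tuple[Matrix, List[str]]:
    """Drop cells with fewer than min_genes detected genes, then drop genes
    detected in fewer than min_cells of the remaining cells."""
    kept_cells = [row for row in counts if sum(c > 0 for c in row) >= min_genes]
    keep_gene = [
        sum(row[j] > 0 for row in kept_cells) >= min_cells
        for j in range(len(gene_names))
    ]
    new_names = [g for g, keep in zip(gene_names, keep_gene) if keep]
    new_counts = [[c for c, keep in zip(row, keep_gene) if keep] for row in kept_cells]
    return new_counts, new_names


def percent_mito(counts: Matrix, gene_names: List[str], prefix: str = "MT-") -> List[float]:
    """Percentage of each cell's counts that come from genes matching prefix."""
    is_mt = [g.upper().startswith(prefix.upper()) for g in gene_names]
    result = []
    for row in counts:
        total = sum(row)
        mito = sum(c for c, m in zip(row, is_mt) if m)
        # Sketch-level choice: report 0.0 for cells with no counts at all.
        result.append(100.0 * mito / total if total else 0.0)
    return result


if __name__ == "__main__":
    genes = ["MT-CO1", "MT-ND1", "ACTB", "GAPDH", "CD3E"]
    counts = make_fake_counts(n_cells=6, gene_names=genes)
    counts, genes = filter_counts(counts, genes, min_genes=2, min_cells=2)
    print(genes)
    print(percent_mito(counts, genes))
```

The order used here (cells first, then genes) is a choice made for the sketch; the page does not show which order `filter_data()` applies.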
Why this is useful:
Code is now easier to read and use in other projects.
The demo shows how everything works in a simple way.
Demo Output:
