Loading your data
In order to conduct an analysis with the compassR package, you must first create a CompassSettings object. The constructor has the following parameters.
| Parameter | Description |
|---|---|
| `metabolic_model_directory` (Default: See below.) | The path to the directory containing the specifications of your metabolic model. |
| `gene_metadata_file` (Default: `"gene_metadata.csv"`) | The name of the file in the metabolic model directory, containing tabular gene metadata. Each row should represent a single gene. The columns are up to you, so long as one of them provides a unique identifier for each gene. |
| `metabolite_metadata_file` (Default: `"metabolite_metadata.csv"`) | The name of the file in the metabolic model directory, containing tabular metabolite metadata. Each row should represent a single metabolite. The columns are up to you, so long as one of them provides a unique identifier for each metabolite. |
| `reaction_metadata_file` (Default: `"reaction_metadata.csv"`) | The name of the file in the metabolic model directory, containing tabular reaction metadata. Each row should represent a single reaction. The columns are up to you, so long as one of them provides a unique identifier for each reaction. |
| `user_data_directory` | The path to the directory containing the data specific to the analysis you hope to conduct. |
| `cell_metadata_file` (Default: `"cell_metadata.csv"`) | The name of the file in the user data directory, containing tabular cell metadata.[1] Each row should represent a single cell. The columns are up to you, so long as one of them provides a unique identifier for each cell. |
| `compass_reaction_scores_file` (Default: `"reactions.tsv"`) | The name of the file in the user data directory, containing the raw reaction consistencies matrix. (This is the output of the COMPASS algorithm.) |
| `linear_gene_expression_matrix_file` (Default: `"linear_gene_expression_matrix.tsv"`) | The name of the file in the user data directory, containing the linear gene expression matrix. (This is the input to the COMPASS algorithm.[2]) |
| `cell_id_col_name` | The name of the column that uniquely identifies each cell in the cell metadata file. |
| `gene_id_col_name` | The name of the column that uniquely identifies each gene in the gene metadata file. |

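
If COMPASS was run in several batches, the per-batch linear gene expression matrices need to be merged into one matrix whose columns follow the row order of the cell metadata. The following is a minimal base R sketch of that step on toy data. The matrices, the gene and cell names, and the `cell_id` column name are all illustrative assumptions, not part of the package.

```r
# Toy cell metadata: one row per cell, with a unique ID column.
# The column name "cell_id" and all values are illustrative assumptions.
cell_metadata <- data.frame(
  cell_id = c("cell_1", "cell_2", "cell_3", "cell_4"),
  condition = c("a", "a", "b", "b"),
  stringsAsFactors = FALSE
)

# Two toy linear gene expression matrices (genes x cells), one per COMPASS batch.
batch_1 <- matrix(
  c(1, 0, 3, 2, 5, 1),
  nrow = 3,
  dimnames = list(c("gene_a", "gene_b", "gene_c"), c("cell_3", "cell_1"))
)
batch_2 <- matrix(
  c(4, 4, 0, 7, 2, 2),
  nrow = 3,
  dimnames = list(c("gene_a", "gene_b", "gene_c"), c("cell_2", "cell_4"))
)

# The batches must describe the same genes in the same order before cbind().
stopifnot(identical(rownames(batch_1), rownames(batch_2)))

# The cell IDs must be unique for the reordering below to be unambiguous.
stopifnot(anyDuplicated(cell_metadata$cell_id) == 0)

# Concatenate side by side, then reorder columns to follow the cell metadata rows.
combined <- cbind(batch_1, batch_2)
combined <- combined[, cell_metadata$cell_id, drop = FALSE]

# Column names now read left to right as the cell IDs read top to bottom.
stopifnot(identical(colnames(combined), cell_metadata$cell_id))
```

Indexing by `cell_metadata$cell_id` fails with a "subscript out of bounds" error if a cell in the metadata has no matching column, which surfaces a mismatch early rather than silently misaligning cells.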
Note that if you do not specify a metabolic model, the package uses the modified version of RECON2 that ships with it.[3] In the vast majority of cases this is sufficient, and you do not need to specify the metabolic_model_directory, gene_metadata_file, metabolite_metadata_file, or reaction_metadata_file. When using the default RECON2 model, gene_id_col_name should be "HGNC.symbol" for human genes or "MGI.symbol" for mouse genes.
You can also override any of the optional arguments that are documented in the man pages, if you so choose. Most of them concern postprocessing minutiae, like which reactions ought to be dropped, or how aggressively reactions ought to be clustered to define the metareactions.
In summary, the vast majority of the time you will instantiate your CompassSettings object like so ...

    compass_settings <- CompassSettings$new(
        user_data_directory = "path/to/your/data",
        cell_id_col_name = "cell_id",
        gene_id_col_name = "HGNC.symbol"
    )

... but if you want more flexibility, it's there for you to take advantage of.
Once you have created a CompassSettings object, you're ready to create a CompassData object.

    compass_data <- CompassData$new(compass_settings)

This object will act as your interface with the following tables:
| Table | Type | Description |
|---|---|---|
| `reaction_consistencies` | Data frame | Each row is a reaction and each column is a cell. `reaction_consistencies[i, j]` is the consistency (or "compatibility") between reaction i and cell j. |
| `metareaction_consistencies` | Data frame | Each row is a metareaction and each column is a cell. `metareaction_consistencies[i, j]` is the consistency (or "compatibility") between metareaction i and cell j. |
| `metabolic_genes` | Tibble | Each row describes a gene in terms of its ID and whether it's a metabolic gene. |
| `gene_expression_statistics` | Tibble | Each row describes a cell in terms of its ID, total expression, metabolic expression, and metabolic activity.[4] |
| `cell_metadata` | Tibble | The cell metadata from cell_metadata.csv. In this example it's the Th17 cell data from the papers linked above. |
| `gene_metadata` | Tibble | The gene metadata from the metabolic model (RECON2, by default). |
| `metabolite_metadata` | Tibble | The metabolite metadata from the metabolic model (RECON2, by default). |
| `reaction_metadata` | Tibble | The reaction metadata from the metabolic model (RECON2, by default). |
| `reaction_partitions` | Tibble | Each row describes a reaction in terms of its ID, undirected ID, direction, and which metareaction (i.e. reaction group) it belongs to.[5] |

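
To illustrate how a reaction ID relates to its undirected ID and direction, here is a small base R sketch. It assumes that the direction is appended to the undirected ID as a trailing `_pos` or `_neg` suffix; the reaction IDs and column names below are hypothetical and chosen only for illustration.

```r
# Hypothetical reaction IDs; the "<undirected ID>_<direction>" layout is an assumption.
reaction_ids <- c("RXN_A_pos", "RXN_A_neg", "RXN_B_pos")

# Strip the trailing direction suffix to recover the undirected ID.
undirected_ids <- sub("_(pos|neg)$", "", reaction_ids)

# Keep only the trailing direction suffix.
directions <- sub("^.*_(pos|neg)$", "\\1", reaction_ids)

# Assemble a toy table with hypothetical column names.
toy_partitions <- data.frame(
  reaction_id = reaction_ids,
  undirected_id = undirected_ids,
  direction = directions,
  stringsAsFactors = FALSE
)

print(toy_partitions)
```

Anchoring the pattern at the end of the string means an underscore inside the undirected ID (as in `RXN_A`) is left untouched.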
Note that every field in the metadata tables is read as a character string, and must be coerced manually into another data type if desired.
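
As a rough illustration of that coercion step, the sketch below builds a toy metadata table whose fields are all character strings and converts them with base R. The column names and values are assumptions made up for this example; they do not come from the package's data.

```r
# Toy metadata as it would look with every field read as character.
# Column names and values are illustrative assumptions.
toy_cell_metadata <- data.frame(
  cell_id = c("cell_1", "cell_2", "cell_3"),
  cell_type = c("type_a", "type_b", "type_a"),
  n_reads = c("1200", "950", "3010"),
  passed_qc = c("TRUE", "FALSE", "TRUE"),
  stringsAsFactors = FALSE
)

# Coerce each column to the type that suits the downstream analysis.
toy_cell_metadata$cell_type <- factor(toy_cell_metadata$cell_type)
toy_cell_metadata$n_reads <- as.integer(toy_cell_metadata$n_reads)
toy_cell_metadata$passed_qc <- as.logical(toy_cell_metadata$passed_qc)

# Inspect the resulting column types.
str(toy_cell_metadata)
```

The same assignments should work on a tibble, since `$<-` behaves the same way for column replacement.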
If you want to view a summary of the above at any time, you can simply evaluate your CompassData object in the interactive R interpreter and it will display like so:

    CompassData:
        reaction_consistencies data frame (6377 reactions x 290 cells)
        metareaction_consistencies data frame (1181 metareactions x 290 cells)
        metabolic_genes tibble (8813 genes x 2 fields)
        gene_expression_statistics tibble (290 cells x 4 fields)
        cell_metadata tibble (290 cells x 6 fields)
        gene_metadata tibble (1733 genes x 6 fields)
        metabolite_metadata tibble (5063 metabolites x 9 fields)
        reaction_metadata tibble (7440 reactions x 8 fields)
        reaction_partitions tibble (6377 reactions x 4 fields)

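
The four fields of gene_expression_statistics are a cell's ID, total expression, metabolic expression, and metabolic activity, where metabolic activity is the ratio of metabolic expression to total expression. The sketch below reproduces that relationship on toy data. It assumes that the "extent" of expression is the sum of linear expression values over the relevant genes, which may differ from what the package computes internally; the matrix and the metabolic gene flags are made up.

```r
# Toy linear gene expression matrix (genes x cells); all values are made up.
expression <- matrix(
  c(2, 0, 6, 2,
    1, 3, 4, 2),
  nrow = 4,
  dimnames = list(c("gene_a", "gene_b", "gene_c", "gene_d"), c("cell_1", "cell_2"))
)

# Hypothetical flags marking which genes are metabolic.
is_metabolic <- c(gene_a = TRUE, gene_b = FALSE, gene_c = TRUE, gene_d = FALSE)

# Align the flags with the matrix rows before using them as a logical index.
is_metabolic <- is_metabolic[rownames(expression)]

# Assumption: the "extent" of expression is the column sum over the relevant genes.
total_expression <- colSums(expression)
metabolic_expression <- colSums(expression[is_metabolic, , drop = FALSE])

# Metabolic activity is the ratio of metabolic expression to total expression.
metabolic_activity <- metabolic_expression / total_expression

toy_statistics <- data.frame(
  cell_id = colnames(expression),
  total_expression = total_expression,
  metabolic_expression = metabolic_expression,
  metabolic_activity = metabolic_activity,
  stringsAsFactors = FALSE
)

print(toy_statistics)
```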
Footnotes:
1. The presentation of the algorithm assumes a single-cell data set. However, you may choose to group cells together (e.g. via metacell or micropooling) to reduce computational overhead. You may also apply COMPASS to bulk transcriptome data sets (e.g. bulk RNA-seq or microarray data sets) of ample size.
2. If you ran the COMPASS algorithm in several batches, then you have one linear gene expression matrix per COMPASS run. To combine them into a single linear gene expression matrix, you should concatenate them side-by-side (e.g. using the `cbind` function in R), so that the column names read from left to right in the same order that the row names of the cell metadata file read from top to bottom.
3. Should you want to explore the specifics of the RECON2 model that ships with the package, you can load it from the directory to which the expression `system.file("extdata", "RECON2", package = "compassR")` evaluates.
4. A cell's "total expression" is the extent to which it expresses any of its genes. Its "metabolic expression" is the extent to which it expresses its metabolic genes. And finally, its "metabolic activity" is the ratio of its metabolic expression to its total expression.
5. A reaction's ID is composed of its undirected ID (i.e. shorthand for the reaction's name) and direction (i.e. `"pos"` for forward reactions and `"neg"` for backward reactions). Meanwhile, a reaction's metareaction ID refers to the ID of the metareaction to which the reaction belongs. (A metareaction, in this context, is a group of similar reactions.)