This repository stores the custom code we used to re-analyze the protocol from Wang et al., Nature Protocols, https://doi.org/10.1038/s41596-024-00968-2 (2024).
It is intended to let researchers reproduce our work and to make it easier to prepare the input data and install the MCScanX tool.
Refer to the Usage documents for details.
Note
If you are new to Snakemake, please refer to this page on how to set up Snakemake. Make sure to test the example data below before running the workflow on your own data.
# Test whether Snakemake was installed successfully
mamba activate snakemake
snakemake --help
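A slightly more explicit check can report whether the command is on the PATH before anything else is run. This is a minimal sketch; `check_tool` is a hypothetical helper name, not part of this repository, and it relies only on the standard `command -v` shell builtin.

```shell
# Hypothetical helper: report whether a command is available on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# If snakemake is not found, the environment is probably not activated yet.
check_tool snakemake || echo "Try: mamba activate snakemake"
```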
Note
Begin with a config.yaml file like the one below, which lists the required input files.
NCBI assemblies for six species are used in this analysis: B. carinata, A. suecica, A. arenosa, T. arvense, A. thaliana, and B. oleracea.
config.yaml
species_name:
  - Athaliana
ncbi_genomes:
  Athaliana:
    ncbi_assembly: "data/ncbi_download/GCF_000001735.4.zip"
    assembly_id: "GCF_000001735.4"
    feature_table: "data/ncbi_download/GCF_000001735.4_TAIR10.1_feature_table.txt.gz"
    species: Athaliana
Note
Optional: To download an additional assembly archive (XX.zip) from NCBI, substitute your own assembly accession for the example one (GCF_000001735.4) in the command below. The accession appears twice: once in the URL path and once in the filename parameter.
curl -OJX GET "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCF_000001735.4/download?include_annotation_type=GENOME_FASTA,GENOME_GFF,RNA_FASTA,CDS_FASTA,PROT_FASTA,SEQUENCE_REPORT&filename=GCF_000001735.4.zip"
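Because the accession has to be replaced in two places, it can help to build the URL from a single argument. The sketch below assumes the same endpoint and query string as the command above; `ncbi_download_url` is a hypothetical helper name and is not part of this repository.

```shell
# Hypothetical helper: print the NCBI Datasets download URL for one accession.
# The endpoint and annotation types are copied from the curl command above;
# only the accession (path and filename parameter) changes.
ncbi_download_url() {
    printf 'https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/%s/download?include_annotation_type=GENOME_FASTA,GENOME_GFF,RNA_FASTA,CDS_FASTA,PROT_FASTA,SEQUENCE_REPORT&filename=%s.zip\n' "$1" "$1"
}

# Usage (commented out so that nothing is downloaded by accident):
# curl -OJX GET "$(ncbi_download_url "GCF_000001735.4")"
```

After downloading, the archive path and accession still need to be entered in config.yaml by hand.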
Now, you can run the pipeline using the following commands:
# Download the package
git clone https://github.com/zx0223winner/MCScanX_Assistant.git
# enter the working directory
cd MCScanX_Assistant
Note
Because the sample files are large, please download the test data, MCScanX_Assistant_data.tar.gz, through the Google Drive link. It contains the standard NCBI genome assembly input files for the six species used in this analysis (B. carinata, A. suecica, A. arenosa, T. arvense, A. thaliana, and B. oleracea).
(Optional) Please download the test results, MCScanX_Assistant_results.tar.gz, through the Google Drive link. This file contains the complete results of a run so that users can check their own output against it.
# Then decompress MCScanX_Assistant_data.tar.gz inside the MCScanX_Assistant directory.
# This creates a data folder with the test files ready to use.
tar -xvzf MCScanX_Assistant_data.tar.gz
# Then do a dry run with the following command.
snakemake --use-conda --cores all -s workflow/Snakefile_Input_preparing -n
# If everything is OK, test the pipeline by running the workflows one after another:
snakemake --use-conda --cores all -s workflow/Snakefile_Input_preparing
snakemake --use-conda --cores all -s workflow/Snakefile_Ks_distribution_plot
snakemake --use-conda --cores all -s workflow/Snakefile_MCScanX_6species
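The three commands above can also be wrapped in a small loop that stops at the first failure, so a later workflow never starts on incomplete output. This is a minimal sketch: `run_workflows` and the `SNAKEMAKE` variable are hypothetical names introduced here for illustration, while the Snakefile paths and flags are the ones listed above.

```shell
# Hypothetical override so that the executable can be swapped out if needed.
SNAKEMAKE="${SNAKEMAKE:-snakemake}"

# Run the three workflows in order; return non-zero as soon as one fails.
# Any extra arguments are passed through to every snakemake call.
run_workflows() {
    for snakefile in \
        workflow/Snakefile_Input_preparing \
        workflow/Snakefile_Ks_distribution_plot \
        workflow/Snakefile_MCScanX_6species
    do
        echo "==> $snakefile"
        "$SNAKEMAKE" --use-conda --cores all -s "$snakefile" "$@" || return 1
    done
}

# Usage (commented out; run it from the MCScanX_Assistant directory):
# run_workflows
```

Passing `-n` to all three at once may not work on a fresh checkout if a later workflow expects files produced by an earlier one, so the dry run is best done per workflow as shown above.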
The code within this repository is licensed under the MIT License. Please refer to the license file for more information on the terms and conditions of using and contributing to this project.
If you use the code in this repository, please cite the link.
GitHub tool logo generation. Author reviewed and edited the content as needed and takes full responsibility for the content.
