BioMedBigDataCenter/KGCoV

KGCoV

To help analyze the spread and evolution of the virus, this repo collates and analyzes data on the viral genome, its variations, and their locations in time and space, drawn from GISAID and GenBank. Information from Wikipedia pages and published research papers was categorized and mined to extract epidemiological data, which was then integrated with the public dataset. Genomic and epidemiological data were matched with public information, and data quality was verified by manual curation.

This repo contains the code used in the article "Linking genomic and epidemiologic information to advance the study of COVID-19". For further information, please refer to the publication.

File Description

case.py

This script is used for quality control of the epidemiological data.

Some input files can be downloaded; the others, which support the findings of this study, are available from the corresponding author upon reasonable request.

genome.py

This script is used for quality control of the genome data.

All raw input data that support the findings of this study are available from the corresponding author upon reasonable request.
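The README does not say which checks genome.py applies. As a rough illustration only, the sketch below shows two checks commonly used for viral genome quality control: a minimum sequence length and a cap on the fraction of ambiguous `N` bases. The function name `passes_qc` and both thresholds are hypothetical and are not taken from this repo.

```python
def passes_qc(sequence, min_length=29000, max_n_fraction=0.01):
    """Return True if a genome sequence passes two illustrative checks.

    Both thresholds are assumptions made for this sketch, not values
    used by genome.py.
    """
    seq = sequence.upper()
    # Reject empty or truncated sequences.
    if not seq or len(seq) < min_length:
        return False
    # Reject sequences with too many ambiguous bases.
    return seq.count("N") / len(seq) <= max_n_fraction


full_length = "ACGT" * 8000                # 32,000 bases, no ambiguous calls
truncated = "ACGT" * 10                    # far too short
ambiguous = "N" * 1000 + "ACGT" * 8000     # about 3% ambiguous bases

print(passes_qc(full_length), passes_qc(truncated), passes_qc(ambiguous))
```

Keeping the thresholds as keyword arguments makes it easy to tighten or relax the filter without touching the logic.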

case_genome.py

This script is used to match the epidemiological data with the genome data.

Both input files are generated by the first two steps (case.py and genome.py).

One manually curated file (curated_case_genome.tsv), which supports the findings of this study, is available from the corresponding author upon reasonable request.
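The matching step can be pictured as a join of two tab-separated tables on a shared identifier. The following is a minimal sketch under stated assumptions, not the logic of case_genome.py: the column names (`accession_id`, `country`, `onset_date`, `length`), the sample rows, and the helpers `read_tsv` and `match_case_genome` are all hypothetical.

```python
import csv
import io


def read_tsv(text):
    """Parse tab-separated text into a list of dicts keyed by header."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def match_case_genome(case_rows, genome_rows, key="accession_id"):
    """Inner-join case and genome rows on a shared key (assumed column)."""
    genomes_by_key = {row[key]: row for row in genome_rows if row.get(key)}
    matched = []
    for case in case_rows:
        genome = genomes_by_key.get(case.get(key))
        if genome is not None:
            # Case fields win on conflicts; genome fields fill in the rest.
            matched.append({**genome, **case})
    return matched


# Made-up example rows, for illustration only.
case_tsv = (
    "accession_id\tcountry\tonset_date\n"
    "EPI_1\tChina\t2020-01-05\n"
    "EPI_3\tItaly\t2020-02-20\n"
)
genome_tsv = (
    "accession_id\tlength\n"
    "EPI_1\t29903\n"
    "EPI_2\t29870\n"
)

matched = match_case_genome(read_tsv(case_tsv), read_tsv(genome_tsv))
print(matched)
```

Records that appear in only one table are dropped here; pairs that cannot be matched automatically are the kind of case a manually curated file could cover.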

variant.py

This script is used to collect mutation information and to perform quality control on it.

indicator_distribution_of_mutiple_countries.pdf

Every plot contains exactly 6 lines, which may overlap with each other.
