Skip to content
Open
3 changes: 2 additions & 1 deletion data/enpen/collection.json
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@
]
},
"dataset_order": [
"enpen/enterovirus/ev-d68"
"enpen/enterovirus/ev-d68",
"enpen/enterovirus/cva16"
]
}
5 changes: 5 additions & 0 deletions data/enpen/enterovirus/cva16/CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
## Unreleased

Initial release of a Coxsackievirus A16 dataset for lineage classification!

Read more about Nextclade datasets in the documentation: https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html
54 changes: 54 additions & 0 deletions data/enpen/enterovirus/cva16/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
# Coxsackievirus A16 dataset based on reference "G-10"

| Key | Value |
|----------------------|-----------------------------------------------------------------------|
| authors | [Nadia Neuner-Jehle](https://eve-lab.org/people/nadia-neuner-jehle/), [Alejandra González-Sánchez](https://www.vallhebron.com/en/professionals/alejandra-gonzalez-sanchez), [Emma B. Hodcroft](https://eve-lab.org/people/emma-hodcroft/), [ENPEN](https://escv.eu/european-non-polio-enterovirus-network-enpen/) |
| name | Coxsackievirus A16 |
| reference | [U05876.1](https://www.ncbi.nlm.nih.gov/nuccore/U05876) |
| workflow | https://github.com/enterovirus-phylo/nextclade_a16 |
| path | `enpen/enterovirus/cva16` |
| clade definitions | A–F |

## Scope of this dataset

This dataset uses the historical G-10 prototype sequence ([U05876.1](https://www.ncbi.nlm.nih.gov/nuccore/U05876)), which may differ from contemporary global CVA16 strains. It is intended for broad subgenogroup classification, mutation quality control, and phylogenetic analysis of CVA16 diversity.

*Note: The G-10 reference differs substantially from currently circulating strains.* This is common for enterovirus datasets, in contrast to some other virus datasets (e.g., seasonal influenza), where the reference is updated more frequently to reflect recent lineages.

To address this, the dataset is *rooted* on a Static Inferred Ancestor — a phylogenetically reconstructed ancestral sequence near the tree root. This provides a stable reference point that can be used, optionally, as an alternative for mutation calling.

## Features

This dataset supports:

- Assignment of subgenotypes
- Phylogenetic placement
- Sequence quality control (QC)

## Subgenogroups of Coxsackievirus A16

Subgenogroups B1a, B1b and B1c are the major phylogenetic divisions of CVA16 and are commonly used in virological surveillance and literature. They are defined by phylogenetic clustering and do not necessarily indicate antigenic differences. In recent years, recombinant forms were identified and labeled C-F (also known as B2, B3, and D). These recombinant forms cluster with the prototype strain, clade A.

These designations are based on the phylogenetic structure and mutations, and are widely used in molecular epidemiology, similar to subgenotype systems for other enteroviruses. Unlike influenza (H1N1, H3N2) or SARS-CoV-2, there is no universal, standardized global lineage nomenclature for enteroviruses. Naming follows conventions from published studies and surveillance practices.

## Reference types

This dataset includes several reference points used in analyses:
- *Reference:* RefSeq or similarly established reference sequence. Here G-10.

- *Parent:* The nearest ancestral node of a sample in the tree, used to infer branch-specific mutations.

- *Clade founder:* The inferred ancestral node defining a clade (e.g., B1a, B2). Mutations "since clade founder" describe changes that define that clade.

- *Static Inferred Ancestor:* Reconstructed ancestral sequence inferred with an outgroup, representing the likely founder of CVA16. Serves as a stable reference.

- *Tree root:* Corresponds to the root of the tree, it may change in future updates as more data become available.

All references use the coordinate system of the G-10 sequence.

## Issues & Contact
- For questions or suggestions, please [open an issue](https://github.com/enterovirus-phylo/nextclade_a16/issues) or email: eve-group[at]swisstph.ch

## What is a Nextclade dataset?

A Nextclade dataset includes the reference sequence, genome annotations, tree, clade definitions, and QC rules. Learn more in the [Nextclade documentation](https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html).
17 changes: 17 additions & 0 deletions data/enpen/enterovirus/cva16/genome_annotation.gff3
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
##gff-version 3
#!gff-spec-version 1.21
#!processor NCBI annotwriter
##sequence-region U05876.1 1 7413
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=31704
U05876.1 Genbank region 1 7413 . + . ID=U05876.1:1..7413;Dbxref=taxon:31704;gb-acronym=CV-A16;gbkey=Src;mol_type=genomic RNA;nat-host=Homo sapiens;strain=G-10
U05876.1 Genbank CDS 751 957 . + . Name=VP4;gbkey=Prot;product=VP4;ID=id-AAA50478.1:1..69
U05876.1 Genbank CDS 958 1719 . + . Name=VP2;gbkey=Prot;product=VP2;ID=id-AAA50478.1:70..323
U05876.1 Genbank CDS 1720 2445 . + . Name=VP3;gbkey=Prot;product=VP3;ID=id-AAA50478.1:324..565
U05876.1 Genbank CDS 2446 3336 . + . Name=VP1;gbkey=Prot;product=VP1;ID=id-AAA50478.1:566..862
U05876.1 Genbank CDS 3337 3786 . + . Name=2A;product=2A;gbkey=Prot;ID=id-AAA50478.1:863..1012
U05876.1 Genbank CDS 3787 4083 . + . Name=2B;product=2B;gbkey=Prot;ID=id-AAA50478.1:1013..1111
U05876.1 Genbank CDS 4084 5070 . + . Name=2C;product=2C;gbkey=Prot;ID=id-AAA50478.1:1112..1440
U05876.1 Genbank CDS 5071 5328 . + . Name=3A;product=3A;gbkey=Prot;ID=id-AAA50478.1:1441..1526
U05876.1 Genbank CDS 5329 5394 . + . Name=3B;product=3B;gbkey=Prot;ID=id-AAA50478.1:1527..1548
U05876.1 Genbank CDS 5395 5943 . + . Name=3C;product=3C;gbkey=Prot;ID=id-AAA50478.1:1549..1731
U05876.1 Genbank CDS 5944 7329 . + . Name=3D;product=3D;gbkey=Prot;ID=id-AAA50478.1:1732..2193
Loading