Working with annotations
An annotation is anything that can be represented as a genomic interval. An annotation:
- is found on a particular chromosome or reference
- has a start coordinate
- has an end coordinate
- is either positive-stranded, negative-stranded, or double-stranded
In this codebase, annotations are marked as such by implementing the Annotated interface.
The simplest annotation is represented by the Annotation class. (The similarity between the general term "annotation" and the class Annotation is confusing. I'll use normal text for annotations in general, and monospace type for the Annotation class and for Annotation objects.)
Making an Annotation with one block is straightforward. Pass the reference name, the start coordinate, the end coordinate, and the strand to the constructor.
Annotated annot = new Annotation("chr1", 3000, 4000, Strand.POSITIVE);
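The intervals represented by any Annotated object are closed-open (half-open), as in Python slices: the start coordinate is included and the end coordinate is excluded. The annotation above contains positions 3000 through 3999; position 4000 itself is not part of it.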
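To make the closed-open convention concrete, here is a minimal sketch. The HalfOpenInterval class is hypothetical and is not part of this codebase; it only illustrates which positions an interval [start, end) covers and why its length is simply end - start.

```java
// Hypothetical helper, not part of the codebase: models closed-open [start, end) coordinates.
final class HalfOpenInterval {
    final int start; // inclusive
    final int end;   // exclusive

    HalfOpenInterval(int start, int end) {
        if (end < start) {
            throw new IllegalArgumentException("end must be >= start");
        }
        this.start = start;
        this.end = end;
    }

    // True if the position lies inside [start, end).
    boolean contains(int position) {
        return position >= start && position < end;
    }

    // Number of positions covered; no "+ 1" is needed with closed-open coordinates.
    int length() {
        return end - start;
    }
}
```

With these definitions, new HalfOpenInterval(3000, 4000) contains 3000 and 3999 but not 4000, and its length is 1000.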
An Annotation with more than one block can be made using a builder. The following constructs an annotation with two blocks, one at chr1:1000-2000(+) and the other at chr1:3000-4000(+).
Annotated annot = Annotation.builder()
    .addAnnotation(new Annotation("chr1", 1000, 2000, Strand.POSITIVE))
    .addAnnotation(new Annotation("chr1", 3000, 4000, Strand.POSITIVE))
    .build();
The builder merges blocks that overlap or are adjacent. Despite having three blocks added to it, the following builder produces the single-block Annotation chr1:1000-4000(+) when build() is called.
Annotated annot = Annotation.builder()
    .addAnnotation(new Annotation("chr1", 1000, 2000, Strand.POSITIVE))
    .addAnnotation(new Annotation("chr1", 2000, 3000, Strand.POSITIVE))
    .addAnnotation(new Annotation("chr1", 3000, 4000, Strand.POSITIVE))
    .build();
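The merging behavior can be sketched independently of the builder. The BlockMerger class below is hypothetical and is not how the codebase necessarily implements it; it represents each closed-open block as a two-element array {start, end} and shows why adjacent blocks collapse into one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the codebase's builder: merges closed-open blocks given as {start, end}.
final class BlockMerger {
    static List<int[]> merge(List<int[]> blocks) {
        // Sort a copy by start coordinate so the caller's list is left untouched.
        List<int[]> sorted = new ArrayList<>(blocks);
        sorted.sort((a, b) -> Integer.compare(a[0], b[0]));

        List<int[]> merged = new ArrayList<>();
        for (int[] block : sorted) {
            int[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            // With closed-open coordinates, start <= previous end covers both overlap and adjacency.
            if (last != null && block[0] <= last[1]) {
                last[1] = Math.max(last[1], block[1]);
            } else {
                // Copy the block so the input arrays are never mutated.
                merged.add(new int[] {block[0], block[1]});
            }
        }
        return merged;
    }
}
```

Passing the three blocks {1000, 2000}, {2000, 3000}, and {3000, 4000} to merge yields the single block {1000, 4000}, while {1000, 2000} and {3000, 4000} remain separate because a gap lies between them.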
Previous versions of the codebase made a distinction between a SingleInterval, representing a single continuous genomic block, and a BlockedAnnotation, composed of multiple blocks or exons. The current implementation of the Annotation class eliminates this distinction.
All Annotated objects have methods to extract constituent introns and exons. These introns and exons are themselves annotations, i.e., they implement the Annotated interface.
You can get all introns from an Annotated object as a single annotation, but it will be wrapped in an Optional to deal with the case where there are no introns.
Annotated annot = Annotation.builder()
    .addAnnotation(new Annotation("chr1", 1000, 2000, Strand.POSITIVE))
    .addAnnotation(new Annotation("chr1", 3000, 4000, Strand.POSITIVE))
    .build();
Optional<Annotated> annotIntrons = annot.getIntrons();
annotIntrons.get().equals(new Annotation("chr1", 2000, 3000, Strand.POSITIVE)); // true

Annotated noIntrons = new Annotation("chr1", 1000, 5000, Strand.POSITIVE);
Optional<Annotated> empty = noIntrons.getIntrons();
empty.isPresent(); // false
empty.equals(Optional.empty()); // true
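The relationship between blocks and introns can be sketched as "the gaps between consecutive blocks". The IntronFinder class below is hypothetical and is not the codebase's getIntrons(); it assumes the blocks are sorted by start, do not overlap, and are given as closed-open {start, end} arrays. It returns an empty Optional when there are no gaps, mirroring the behavior described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch, not the codebase's getIntrons(): finds the gaps between
// sorted, non-overlapping closed-open blocks given as {start, end}.
final class IntronFinder {
    static Optional<List<int[]>> introns(List<int[]> blocks) {
        List<int[]> gaps = new ArrayList<>();
        for (int i = 1; i < blocks.size(); i++) {
            int previousEnd = blocks.get(i - 1)[1];
            int nextStart = blocks.get(i)[0];
            // A gap exists only if the next block starts after the previous one ends.
            if (nextStart > previousEnd) {
                gaps.add(new int[] {previousEnd, nextStart});
            }
        }
        // No gaps means no introns, which is reported as an empty Optional.
        return gaps.isEmpty() ? Optional.empty() : Optional.of(gaps);
    }
}
```

For the blocks {1000, 2000} and {3000, 4000}, the result holds the single gap {2000, 3000}. For a single block there is nothing to compare, so the result is Optional.empty(). In calling code, preferring isPresent(), ifPresent(), or orElse() over a bare get() avoids an exception when no introns exist.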
Get the individual exons:
Iterator<Annotated> exons = annot.getBlockIterator();
while (exons.hasNext()) {
    Annotated exon = exons.next();
    // Do something with exon here.
}
Get the individual introns:
Iterator<Annotated> introns = annot.getIntronIterator();
while (introns.hasNext()) {
    Annotated intron = introns.next();
    // Do something with intron here.
}
Perform an action on the exons:
annot.getBlockStream().forEach(x -> doSomethingWithExon(x));
Perform an action on the introns:
annot.getIntronStream().forEach(x -> doSomethingWithIntron(x));
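Streams are useful for more than forEach. Assuming getBlockStream() returns a standard java.util.stream.Stream (this page suggests so but does not state it), operations such as map, filter, and sum should apply as usual. Because the accessor names for start and end are not shown here, the sketch below uses a hypothetical stand-in: blocks as closed-open {start, end} arrays, summed to get the total exonic length.

```java
import java.util.List;

// Hypothetical stand-in for a stream of blocks; not the codebase's API.
final class ExonLength {
    // Sums end - start over closed-open blocks given as {start, end}.
    static int totalLength(List<int[]> blocks) {
        return blocks.stream()
            .mapToInt(block -> block[1] - block[0])
            .sum();
    }
}
```

For the blocks {1000, 2000} and {3000, 4000}, totalLength returns 2000.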