Working with annotations

Mason M Lai edited this page Jun 22, 2017 · 4 revisions

An annotation is anything that can be represented as a genomic interval. An annotation

  • is found on a particular chromosome or reference
  • has a start coordinate
  • has an end coordinate
  • is either positive-stranded, negative-stranded, or double-stranded

In this codebase, annotations are marked as such by implementing the Annotated interface.

Constructing annotations

The simplest annotation is represented by the Annotation class. (The similarity between the general term "annotation" and the class name Annotation can be confusing. This page uses normal text for annotations in general, and monospace type for the Annotation class and for Annotation objects.)

Single block

Making an Annotation with one block is straightforward: pass the reference name, start coordinate, end coordinate, and strand to the constructor.

Annotated annot = new Annotation("chr1", 3000, 4000, Strand.POSITIVE);

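The intervals represented by any Annotated object are closed-open (half-open), as in Python slices: the start coordinate is included and the end coordinate is excluded. The annotation above contains all positions from 3000 up to, but not including, 4000 (that is, 3000 through 3999).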
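To make the closed-open convention concrete, here is a minimal sketch. HalfOpenInterval is a hypothetical stand-in written for this example, not a class from the codebase; it only illustrates which positions fall inside an interval whose start is inclusive and whose end is exclusive.

```java
// Hypothetical stand-in for illustration; NOT the codebase's Annotation class.
final class HalfOpenInterval {
    final int start; // inclusive
    final int end;   // exclusive

    HalfOpenInterval(int start, int end) {
        if (start > end) {
            throw new IllegalArgumentException("start must not exceed end");
        }
        this.start = start;
        this.end = end;
    }

    // A position is inside the interval if start <= position < end.
    boolean contains(int position) {
        return position >= start && position < end;
    }

    // With a half-open interval, the length is simply end - start.
    int length() {
        return end - start;
    }

    public static void main(String[] args) {
        HalfOpenInterval block = new HalfOpenInterval(3000, 4000);
        System.out.println(block.contains(3000)); // true
        System.out.println(block.contains(3999)); // true
        System.out.println(block.contains(4000)); // false
        System.out.println(block.length());       // 1000
    }
}
```

One practical benefit of this convention is that two blocks such as 1000-2000 and 2000-3000 touch without sharing a position.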

Multiple blocks

An Annotation with more than one block can be made using a builder. The following constructs an annotation with two blocks, one at chr1:1000-2000(+) and the other at chr1:3000-4000(+).

Annotated annot = Annotation.builder()
        .addAnnotation(new Annotation("chr1", 1000, 2000, Strand.POSITIVE))
        .addAnnotation(new Annotation("chr1", 3000, 4000, Strand.POSITIVE))
        .build();

The builder will merge blocks if they overlap or are adjacent. Despite having three blocks added to it, this builder produces the single-block Annotation chr1:1000-4000(+) when build() is called.

Annotated annot = Annotation.builder()
        .addAnnotation(new Annotation("chr1", 1000, 2000, Strand.POSITIVE))
        .addAnnotation(new Annotation("chr1", 2000, 3000, Strand.POSITIVE))
        .addAnnotation(new Annotation("chr1", 3000, 4000, Strand.POSITIVE))
        .build();
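The merging behavior described above can be sketched independently of the codebase. The class below is a hypothetical illustration, not the builder's actual implementation (which this page does not show); it represents each block as a {start, end} pair with closed-open coordinates and merges blocks that overlap or are adjacent.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; the real builder's internals may differ.
final class BlockMergeSketch {

    // Each block is a {start, end} pair using closed-open coordinates.
    static List<int[]> merge(List<int[]> blocks) {
        List<int[]> sorted = new ArrayList<>(blocks);
        sorted.sort((a, b) -> Integer.compare(a[0], b[0]));

        List<int[]> merged = new ArrayList<>();
        for (int[] block : sorted) {
            int[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && block[0] <= last[1]) {
                // Overlapping (start < previous end) or adjacent (start == previous end).
                last[1] = Math.max(last[1], block[1]);
            } else {
                // Copy the pair so the caller's arrays are never modified.
                merged.add(new int[] {block[0], block[1]});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<int[]> blocks = new ArrayList<>();
        blocks.add(new int[] {1000, 2000});
        blocks.add(new int[] {2000, 3000});
        blocks.add(new int[] {3000, 4000});
        for (int[] b : merge(blocks)) {
            System.out.println(b[0] + "-" + b[1]); // prints 1000-4000
        }
    }
}
```

Because the end coordinate is exclusive, a block ending at 2000 and a block starting at 2000 are adjacent rather than overlapping, which is why the comparison uses <= instead of <.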

Previous versions of the codebase made a distinction between a SingleInterval, representing a single continuous genomic block, and a BlockedAnnotation, composed of multiple blocks or exons. The current implementation of the Annotation class eliminates this distinction.

Extracting introns/exons

All Annotated objects have methods to extract constituent introns and exons. These introns and exons are themselves annotations, i.e., they implement the Annotated interface.

Introns as a single annotation

You can get all introns from an Annotated object as a single annotation. The result is wrapped in an Optional, which is empty when the annotation has no introns.

Annotated annot = Annotation.builder()
        .addAnnotation(new Annotation("chr1", 1000, 2000, Strand.POSITIVE))
        .addAnnotation(new Annotation("chr1", 3000, 4000, Strand.POSITIVE))
        .build();

Optional<Annotated> annotIntrons = annot.getIntrons();
annotIntrons.get().equals(new Annotation("chr1", 2000, 3000, Strand.POSITIVE))  // true

Annotated noIntrons = new Annotation("chr1", 1000, 5000, Strand.POSITIVE);
Optional<Annotated> empty = noIntrons.getIntrons();
empty.isPresent()              // false
empty.equals(Optional.empty()) // true
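As a rough illustration of where introns come from and why an Optional is a natural return type, the sketch below computes the gaps between consecutive blocks. IntronGapSketch and its {start, end} pairs are assumptions made for this example, not part of the codebase. It also shows ifPresent, a standard java.util.Optional method that avoids calling get() on an empty Optional.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Illustrative sketch only; not the codebase's getIntrons() implementation.
final class IntronGapSketch {

    // blocks: sorted, non-overlapping {start, end} pairs, closed-open.
    static Optional<List<int[]>> introns(List<int[]> blocks) {
        List<int[]> gaps = new ArrayList<>();
        for (int i = 1; i < blocks.size(); i++) {
            int gapStart = blocks.get(i - 1)[1]; // end of the previous block
            int gapEnd = blocks.get(i)[0];       // start of the next block
            if (gapStart < gapEnd) {
                gaps.add(new int[] {gapStart, gapEnd});
            }
        }
        // No gaps means no introns, so return an empty Optional.
        return gaps.isEmpty() ? Optional.empty() : Optional.of(gaps);
    }

    public static void main(String[] args) {
        List<int[]> twoBlocks = new ArrayList<>();
        twoBlocks.add(new int[] {1000, 2000});
        twoBlocks.add(new int[] {3000, 4000});
        // ifPresent runs only when a value exists.
        introns(twoBlocks).ifPresent(gaps ->
                System.out.println(gaps.get(0)[0] + "-" + gaps.get(0)[1])); // prints 2000-3000

        List<int[]> oneBlock = new ArrayList<>();
        oneBlock.add(new int[] {1000, 5000});
        System.out.println(introns(oneBlock).isPresent()); // prints false
    }
}
```

In real code, prefer ifPresent, map, or orElse over a bare get(), which throws NoSuchElementException when the Optional is empty.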

With an iterator

Get the individual exons:

Iterator<Annotated> exons = annot.getBlockIterator();
while (exons.hasNext()) {
    Annotated exon = exons.next();
    // Do something with exon here.
}

Get the individual introns:

Iterator<Annotated> introns = annot.getIntronIterator();
while (introns.hasNext()) {
    Annotated intron = introns.next();
    // Do something with intron here.
}

With a stream

Perform an action on the exons:

annot.getBlockStream().forEach(x -> doSomethingWithExon(x));

Perform an action on the introns:

annot.getIntronStream().forEach(x -> doSomethingWithIntron(x));
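Assuming these methods return a standard java.util.stream.Stream (the forEach calls above suggest so, but this page does not state the return type), the usual stream operations such as map and collect should apply as well. The sketch below demonstrates the idea on plain {start, end} pairs; BlockStreamSketch and its helper methods are hypothetical and exist only for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch only; uses {start, end} pairs instead of Annotated objects.
final class BlockStreamSketch {

    // Sum of block lengths; with closed-open coordinates each length is end - start.
    static int totalLength(List<int[]> blocks) {
        return blocks.stream().mapToInt(b -> b[1] - b[0]).sum();
    }

    // Format each block as reference:start-end.
    static List<String> labels(String reference, List<int[]> blocks) {
        return blocks.stream()
                .map(b -> reference + ":" + b[0] + "-" + b[1])
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<int[]> blocks = Arrays.asList(new int[] {1000, 2000}, new int[] {3000, 4000});
        System.out.println(totalLength(blocks));    // prints 2000
        System.out.println(labels("chr1", blocks)); // prints [chr1:1000-2000, chr1:3000-4000]
    }
}
```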
