Strict vs. ambiguous types for nucleotides #18

@lynn

Right now we have a single Nucleotide type that supports A/T/C/G/N (N being "any of ATCG").

We'd like to make this type "strict" (supporting only A/T/C/G), and add a separate NucleotideAmbiguous type covering all of the IUPAC ambiguity codes (ATCGYRWSKMDVHBN).

The two types would share enum values for A/T/C/G, so "converting" a sequence from ambiguous to strict would just be a validation pass with no copy (and strict to ambiguous would be free).

Thought: assign ATCG the enum values {1,2,4,8}, and represent ambiguities as bitwise ORs.
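A rough sketch of how the shared values and the bitwise-OR idea could fit together, assuming Rust. Everything beyond the names `Nucleotide` and `NucleotideAmbiguous` and the values {1,2,4,8} is hypothetical: the newtype-over-`u8` representation, `from_ascii`, `is_strict`, `matches`, and `as_strict` are illustrative names, not existing API.

```rust
// Bit assigned to each base, as proposed above.
const A: u8 = 1;
const T: u8 = 2;
const C: u8 = 4;
const G: u8 = 8;

/// Strict nucleotide: exactly one bit set.
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Nucleotide {
    A = 1,
    T = 2,
    C = 4,
    G = 8,
}

/// Ambiguous nucleotide: a non-empty set of bases stored as a 4-bit mask (1..=15).
/// Assumption: modelled as a newtype here; a 15-variant enum would work as well.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NucleotideAmbiguous(u8);

impl NucleotideAmbiguous {
    /// Parse one IUPAC code; each ambiguity code is the OR of the bases it allows.
    pub fn from_ascii(c: u8) -> Option<Self> {
        let bits = match c.to_ascii_uppercase() {
            b'A' => A,
            b'T' => T,
            b'C' => C,
            b'G' => G,
            b'R' => A | G,
            b'Y' => C | T,
            b'S' => C | G,
            b'W' => A | T,
            b'K' => G | T,
            b'M' => A | C,
            b'B' => C | G | T,
            b'D' => A | G | T,
            b'H' => A | C | T,
            b'V' => A | C | G,
            b'N' => A | T | C | G,
            _ => return None,
        };
        Some(Self(bits))
    }

    /// True if this code names exactly one base.
    pub fn is_strict(self) -> bool {
        matches!(self.0, 1 | 2 | 4 | 8)
    }

    /// True if the concrete base `n` is one of the bases this code allows.
    pub fn matches(self, n: Nucleotide) -> bool {
        (self.0 & (n as u8)) != 0
    }
}

/// Reinterpret an ambiguous sequence as a strict one without copying.
/// Returns `None` if any element is a real ambiguity code.
pub fn as_strict(seq: &[NucleotideAmbiguous]) -> Option<&[Nucleotide]> {
    if seq.iter().all(|n| n.is_strict()) {
        // SAFETY: both types are one byte with alignment 1, and every byte
        // was just checked to be 1, 2, 4 or 8, all valid `Nucleotide` values.
        Some(unsafe {
            std::slice::from_raw_parts(seq.as_ptr() as *const Nucleotide, seq.len())
        })
    } else {
        None
    }
}
```

With this layout, "does this ambiguity code admit base X" is a single AND, and the ambiguous-to-strict conversion is one pass over the bytes followed by a pointer cast.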

Consideration: CodonIdx becomes 12-bit instead of 9-bit; the translation tables will now each have length 4096, up from 293. (But this is surely still small enough to keep one cached.)
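To make the arithmetic concrete, here is a small sketch of the index packing, again assuming Rust. The bit layout (first base in the high bits), `codon_idx`, `toy_table`, and the `b'X'` placeholder are assumptions for illustration. The explanation of 293 is also a guess at the current encoding: five values 0..=4 packed at 3 bits each would give a maximum index of 292.

```rust
// Bit assigned to each base, as proposed above.
const A: u8 = 1;
const T: u8 = 2;
const G: u8 = 8;

/// Table length for a 12-bit codon index: three 4-bit masks.
pub const TABLE_LEN: usize = 1 << 12;

/// Assumed current layout: values 0..=4 at 3 bits each, so the largest
/// index is 0b100_100_100 and the table needs one more entry than that.
pub const OLD_TABLE_LEN: usize = ((4 << 6) | (4 << 3) | 4) + 1;

/// Pack three 4-bit nucleotide masks into a 12-bit index.
/// Hypothetical layout: first base in the highest nibble.
pub fn codon_idx(n1: u8, n2: u8, n3: u8) -> usize {
    debug_assert!(n1 < 16 && n2 < 16 && n3 < 16);
    ((n1 as usize) << 8) | ((n2 as usize) << 4) | (n3 as usize)
}

/// Toy table: only ATG (methionine, the start codon) is filled in;
/// every other slot holds the placeholder byte b'X'.
pub fn toy_table() -> [u8; TABLE_LEN] {
    let mut table = [b'X'; TABLE_LEN];
    table[codon_idx(A, T, G)] = b'M';
    table
}
```

Since a mask of 0 (the empty set) never occurs, only 15³ = 3375 of the 4096 slots would actually be reachable; at one byte per entry the whole table is 4 KiB.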
