Right now we have a single Nucleotide type that supports A/T/C/G/N (N being "any of ATCG").
We'd like to make this type "strict" (supporting only ATCG) and add a NucleotideAmbiguous type that covers all of the ambiguity codes (ATCGYRWSKMDVHBN).
The two types would share the same enum values, so "converting" a sequence between them would be zero-copy: at most a validation pass over the existing buffer, with no re-encoding.
Thought: assign ATCG the enum values {1,2,4,8}, and represent ambiguities as bitwise ORs.
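A minimal sketch of how that encoding might look, assuming Rust and a plain u8 representation. The constant names and the helpers is_strict, all_strict, and matches are hypothetical, not existing API; only three of the ambiguity codes are spelled out, and the rest would follow the same pattern.

```rust
// Hypothetical bitmask encoding: one bit per strict base.
const A: u8 = 1;
const T: u8 = 2;
const C: u8 = 4;
const G: u8 = 8;

// Ambiguity codes are the bitwise OR of the bases they can stand for.
const R: u8 = A | G; // purine: A or G
const Y: u8 = C | T; // pyrimidine: C or T
const N: u8 = A | T | C | G; // any base

/// True if `code` denotes exactly one base, i.e. it is also a valid
/// value of the strict type. Zero and values above N are rejected.
fn is_strict(code: u8) -> bool {
    code <= N && code.count_ones() == 1
}

/// The "zero-copy check": if every code is strict, the same buffer
/// could be reinterpreted as a strict sequence without re-encoding.
fn all_strict(seq: &[u8]) -> bool {
    seq.iter().all(|&c| is_strict(c))
}

/// True if ambiguity code `amb` can stand for the strict base `base`.
fn matches(amb: u8, base: u8) -> bool {
    amb & base != 0
}
```

With this layout, going from strict to ambiguous needs no check at all, and going the other way is a single pass of all_strict. Matching a base against an ambiguity code is one AND.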
Consideration: CodonIdx becomes 12-bit instead of 9-bit; the translation tables will now each have length 4096, up from 293. (But this is surely still small enough to keep one cached.)
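The table sizes can be checked with a small sketch. It assumes the current index packs three 3-bit fields whose largest value is 4 (which is one way to arrive at 293), and that the new index packs three 4-bit fields; the function codon_idx and both constants are hypothetical names for illustration.

```rust
/// Assumed current layout: three 3-bit fields, each holding 0..=4.
/// The largest index is (4 << 6) | (4 << 3) | 4 = 292, so 293 entries.
const OLD_TABLE_LEN: usize = ((4 << 6) | (4 << 3) | 4) + 1;

/// Proposed layout: three 4-bit bitmask codes, so 2^12 entries.
const NEW_TABLE_LEN: usize = 1 << 12;

/// Packs three 4-bit nucleotide codes into a 12-bit table index.
/// Each input is masked to its low four bits before packing.
fn codon_idx(n1: u8, n2: u8, n3: u8) -> usize {
    ((n1 as usize & 0xF) << 8) | ((n2 as usize & 0xF) << 4) | (n3 as usize & 0xF)
}
```

At one byte per entry, a 4096-entry table is 4 KiB, which supports the point that a single cached table remains cheap.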