Handle all ambiguity codes + add "strict" types #22
fn dna(dna: &str) -> DnaSequence<NucleotideAmbiguous> {
    DnaSequence::from_str(dna).unwrap()
}
I hate to say it since it'll be a pain, but I think the tests that use dna should be migrated to test on both Nucleotide and NucleotideAmbiguous to ensure the behavior is consistent.
Also not sure if you've used quickcheck, but if you have or are interested in trying it, I think it could be useful to have a property test that checks that behavior is consistent in the general case for DnaSequence<Nucleotide> and DnaSequence<NucleotideAmbiguous> where each NucleotideAmbiguous is A, T, C, or G.
If you want to write it and haven't used quickcheck before, LMK and I can point you at our property tests in the private repos and give some pointers.
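As a rough illustration of the property suggested above, here is a dependency-free sketch. It does not use quickcheck: exhaustive enumeration of short sequences stands in for random generation. The enums, `complement`, and the reverse-complement helpers are hypothetical stand-ins, not quickdna's actual definitions.

```rust
// Hypothetical stand-ins for quickdna's types; the real definitions may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Nucleotide { A, T, C, G }

#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum NucleotideAmbiguous { A, T, C, G, N } // remaining ambiguity codes omitted

impl Nucleotide {
    const ALL: [Nucleotide; 4] = [Nucleotide::A, Nucleotide::T, Nucleotide::C, Nucleotide::G];

    fn complement(self) -> Self {
        match self {
            Self::A => Self::T,
            Self::T => Self::A,
            Self::C => Self::G,
            Self::G => Self::C,
        }
    }
}

impl NucleotideAmbiguous {
    fn complement(self) -> Self {
        match self {
            Self::A => Self::T,
            Self::T => Self::A,
            Self::C => Self::G,
            Self::G => Self::C,
            Self::N => Self::N,
        }
    }
}

impl From<Nucleotide> for NucleotideAmbiguous {
    fn from(n: Nucleotide) -> Self {
        match n {
            Nucleotide::A => Self::A,
            Nucleotide::T => Self::T,
            Nucleotide::C => Self::C,
            Nucleotide::G => Self::G,
        }
    }
}

fn reverse_complement_strict(seq: &[Nucleotide]) -> Vec<Nucleotide> {
    seq.iter().rev().map(|n| n.complement()).collect()
}

fn reverse_complement_ambiguous(seq: &[NucleotideAmbiguous]) -> Vec<NucleotideAmbiguous> {
    seq.iter().rev().map(|n| n.complement()).collect()
}

/// Property: lifting to the ambiguous type and then reverse-complementing
/// gives the same result as reverse-complementing and then lifting.
fn property_holds(seq: &[Nucleotide]) -> bool {
    let lifted: Vec<NucleotideAmbiguous> =
        seq.iter().copied().map(NucleotideAmbiguous::from).collect();
    let via_ambiguous = reverse_complement_ambiguous(&lifted);
    let via_strict: Vec<NucleotideAmbiguous> = reverse_complement_strict(seq)
        .into_iter()
        .map(NucleotideAmbiguous::from)
        .collect();
    via_ambiguous == via_strict
}

/// Every sequence over A/T/C/G with length 0..=max_len.
fn all_sequences(max_len: usize) -> Vec<Vec<Nucleotide>> {
    let mut out: Vec<Vec<Nucleotide>> = vec![Vec::new()];
    let mut frontier: Vec<Vec<Nucleotide>> = vec![Vec::new()];
    for _ in 0..max_len {
        let mut next = Vec::new();
        for prefix in &frontier {
            for n in Nucleotide::ALL {
                let mut extended = prefix.clone();
                extended.push(n);
                next.push(extended);
            }
        }
        out.extend(next.iter().cloned());
        frontier = next;
    }
    out
}

fn main() {
    let seqs = all_sequences(4);
    assert!(seqs.iter().all(|s| property_holds(s)));
    println!("checked {} sequences", seqs.len());
}
```

With quickcheck, `all_sequences` would presumably be replaced by an `Arbitrary` implementation for the strict type, and `property_holds` would become the property function.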
Test failures are because you also need to add the new functions to the #[pymodule]:
fn quickdna(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(_check_table, m)?)?;
    m.add_function(wrap_pyfunction!(_translate, m)?)?;
    m.add_function(wrap_pyfunction!(_reverse_complement, m)?)?;
    // here
    Ok(())
}
pub const M_AMBIGUITY: [Self; 2] = [Self::A, Self::C];
pub const R_AMBIGUITY: [Self; 2] = Self::PURINES;
pub const W_AMBIGUITY: [Self; 2] = [Self::A, Self::T];
Removing these will break all downstream libraries that use them. It will be pretty annoying to bump the version of quickdna.
We could keep them for backwards compatibility, but migrating should just be a matter of going from M_AMBIGUITY to NucleotideAmbiguous::M.possibilities()
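A minimal sketch of what keeping the old constant next to the new method could look like. The enum layout, the slice return type of `possibilities()`, and the `#[deprecated]` note are assumptions for illustration; the actual quickdna signatures may differ. The base sets follow the standard IUPAC ambiguity codes.

```rust
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Nucleotide { A, T, C, G }

#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum NucleotideAmbiguous { A, T, C, G, W, M, R, Y, S, K, B, V, D, H, N }

impl Nucleotide {
    // Old-style constant, kept only so existing callers keep compiling.
    #[deprecated(note = "use NucleotideAmbiguous::M.possibilities() instead")]
    pub const M_AMBIGUITY: [Nucleotide; 2] = [Nucleotide::A, Nucleotide::C];
}

impl NucleotideAmbiguous {
    /// Hypothetical signature: the concrete bases this code can stand for.
    fn possibilities(self) -> &'static [Nucleotide] {
        use Nucleotide::*;
        match self {
            Self::A => &[A],
            Self::T => &[T],
            Self::C => &[C],
            Self::G => &[G],
            Self::W => &[A, T],
            Self::M => &[A, C],
            Self::R => &[A, G],
            Self::Y => &[C, T],
            Self::S => &[C, G],
            Self::K => &[G, T],
            Self::B => &[C, G, T],
            Self::V => &[A, C, G],
            Self::D => &[A, G, T],
            Self::H => &[A, C, T],
            Self::N => &[A, C, G, T],
        }
    }
}

fn main() {
    // Old call site: still compiles, but would warn without the allow.
    #[allow(deprecated)]
    let old = Nucleotide::M_AMBIGUITY;
    // Migrated call site.
    let new = NucleotideAmbiguous::M.possibilities();
    assert_eq!(old.to_vec(), new.to_vec());
}
```

Marking the constants as deprecated rather than deleting them would let dependent crates migrate on their own schedule, at the cost of carrying both APIs for a release or two.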
vgel left a comment
Looks great, ready to merge IMO, waiting to approve until we resolve whether we want to keep the *_AMBIGUITY fields in for backwards compatibility.
def test_translate():
    assert DnaSequence("AAAGGGAAA").translate(table=1) == ProteinSequence("KGK")
    assert DnaSequence("AAAGGGAAA").translate(
        table=1) == ProteinSequence("KGK")
Note to self: also add a Python formatter and linter to this project, given that there is some amount of Python code. So far we only have Rust tooling.
This is actually the result of my Python formatter (autopep8 I think). I don't like it either though. I like black which is more like cargo fmt (i.e. quite strict/opinionated).
we use black, maybe it outputs slightly different formats
You will have to rebase this to get #23. Otherwise CI won't let you merge.
can you fix the README? It currently says:
ace44d1 to
2421562
Closes #18.
Changes
Types
- Nucleotide no longer has N in it, and just represents one of ACTG.
- NucleotideAmbiguous represents ACTG or one of the 11 ambiguity codes WMRYSKBVDHN.
- NucleotideLike for common behavior between the two, like "to ASCII" and "complement".
- Codon (3x Nucleotide) and CodonAmbiguous (3x NucleotideAmbiguous) are different types now.
- possibilities() methods for iterating over the possible realizations.
Translation
N.
- DnaSequence is generic over the type of the contained nucleotides. So, DnaSequence::<Nucleotide>::from_str(s) is our "strict mode", and DnaSequence::<NucleotideAmbiguous>::from_str(s) is the lax mode.
Tests
- test_dna_parses_strict, which verifies that this "strict mode" indeed only accepts "aAtTcCgG \t".
- test_translate_ambiguous, which verifies that TTRTTV maps to protein LX:
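To make the TTRTTV case concrete, here is a self-contained sketch of one way ambiguous translation can work: expand each codon into its possible realizations, translate each with NCBI table 1, and emit X when they disagree. This is an assumption about the approach, not the PR's actual implementation; `possibilities`, `translate_codon`, and `translate` are hypothetical free functions working on raw ASCII bytes rather than quickdna's types.

```rust
/// NCBI translation table 1, indexed by 16*b1 + 4*b2 + b3 with bases ordered T, C, A, G.
const TABLE_1: &[u8; 64] =
    b"FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

/// Concrete bases an IUPAC code can stand for, or None for an unknown character.
fn possibilities(code: u8) -> Option<&'static [u8]> {
    let bases: &'static [u8] = match code.to_ascii_uppercase() {
        b'A' => b"A",
        b'T' => b"T",
        b'C' => b"C",
        b'G' => b"G",
        b'W' => b"AT",
        b'M' => b"AC",
        b'R' => b"AG",
        b'Y' => b"CT",
        b'S' => b"CG",
        b'K' => b"GT",
        b'B' => b"CGT",
        b'V' => b"ACG",
        b'D' => b"AGT",
        b'H' => b"ACT",
        b'N' => b"ACGT",
        _ => return None,
    };
    Some(bases)
}

/// Position of a concrete base in the T, C, A, G ordering used by TABLE_1.
fn base_index(base: u8) -> usize {
    match base {
        b'T' => 0,
        b'C' => 1,
        b'A' => 2,
        _ => 3, // only ever called with A/T/C/G, so this arm is G
    }
}

/// Translate one 3-byte codon; returns b'X' if its realizations disagree.
fn translate_codon(codon: &[u8]) -> Option<u8> {
    let p0 = possibilities(codon[0])?;
    let p1 = possibilities(codon[1])?;
    let p2 = possibilities(codon[2])?;
    let mut result: Option<u8> = None;
    for &a in p0 {
        for &b in p1 {
            for &c in p2 {
                let aa = TABLE_1[16 * base_index(a) + 4 * base_index(b) + base_index(c)];
                match result {
                    None => result = Some(aa),
                    Some(prev) if prev != aa => return Some(b'X'),
                    _ => {}
                }
            }
        }
    }
    result
}

/// Translate a DNA string codon by codon; a trailing partial codon is ignored.
fn translate(dna: &str) -> Option<String> {
    dna.as_bytes()
        .chunks_exact(3)
        .map(|codon| translate_codon(codon).map(char::from))
        .collect()
}

fn main() {
    // TTR is TTA or TTG, both L; TTV is TTA, TTC or TTG, which gives L, F, L, so X.
    assert_eq!(translate("TTRTTV").as_deref(), Some("LX"));
    assert_eq!(translate("AAAGGGAAA").as_deref(), Some("KGK"));
}
```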