Merged
2 changes: 1 addition & 1 deletion exercises/hamming/Cargo.toml
@@ -1,3 +1,3 @@
[package]
name = "hamming"
version = "0.0.0"
version = "2.1.1"
27 changes: 7 additions & 20 deletions exercises/hamming/README.md
@@ -1,33 +1,20 @@
# Hamming

Calculate the Hamming difference between two DNA strands.
Calculate the Hamming Distance between two DNA strands.

A mutation is simply a mistake that occurs during the creation or
copying of a nucleic acid, in particular DNA. Because nucleic acids are
vital to cellular functions, mutations tend to cause a ripple effect
throughout the cell. Although mutations are technically mistakes, a very
rare mutation may equip the cell with a beneficial attribute. In fact,
the macro effects of evolution are attributable by the accumulated
result of beneficial microscopic mutations over many generations.
Your body is made up of cells that contain DNA. Those cells regularly wear out and need replacing, which they achieve by dividing into daughter cells. In fact, the average human body experiences about 10 quadrillion cell divisions in a lifetime!

The simplest and most common type of nucleic acid mutation is a point
mutation, which replaces one base with another at a single nucleotide.
When cells divide, their DNA replicates too. Sometimes during this process mistakes happen and single pieces of DNA get encoded with the incorrect information. If we compare two strands of DNA and count the differences between them we can see how many mistakes occurred. This is known as the "Hamming Distance".

By counting the number of differences between two homologous DNA strands
taken from different genomes with a common ancestor, we get a measure of
the minimum number of point mutations that could have occurred on the
evolutionary path between the two strands.

This is called the 'Hamming distance'.

It is found by comparing two DNA strands and counting how many of the
nucleotides are different from their equivalent in the other string.
We read DNA using the letters C,A,G and T. Two strands might look like this:

GAGCCTACTAACGGGAT
CATCGTAATGACGGCCT
^ ^ ^  ^ ^    ^^

The Hamming distance between these two DNA strands is 7.
They have 7 differences, and therefore the Hamming Distance is 7.

The Hamming Distance is useful for lots of things in science, not just biology, so it's a nice phrase to be familiar with :)
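
The counting described above can be sketched in Rust as follows. This is only one possible solution under stated assumptions, not the exercise's reference implementation: it keeps the `hamming_distance` signature from the stub in `src/lib.rs`, compares the strands character by character, and returns `None` when the lengths differ.

```rust
/// Sketch: count the positions at which two strands differ.
/// Returns `None` if the strands have different lengths.
pub fn hamming_distance(s1: &str, s2: &str) -> Option<usize> {
    // Compare character counts (not byte lengths) so that the length
    // check matches the character-by-character comparison below.
    if s1.chars().count() != s2.chars().count() {
        return None;
    }
    // Pair up the characters and count the pairs that differ.
    Some(s1.chars().zip(s2.chars()).filter(|(a, b)| a != b).count())
}
```

For the two strands shown above, this sketch returns `Some(7)`.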

# Implementation notes

2 changes: 1 addition & 1 deletion exercises/hamming/src/lib.rs
@@ -1,5 +1,5 @@
/// Return the Hamming distance between the strings,
/// or None if the lengths are mismatched.
pub fn hamming_distance(s1: &str, s2: &str) -> Option<usize> {
unimplemented!("What is the Hamming Distance between {:?} and {:?}", s1, s2);
unimplemented!("What is the Hamming Distance between {} and {}", s1, s2);
}
121 changes: 113 additions & 8 deletions exercises/hamming/tests/hamming.rs
@@ -1,42 +1,147 @@
extern crate hamming;

fn process_distance_case(strand_pair: [&str; 2], expected_distance: Option<usize>) {
Member

I'm curious why you wrote this as `[&str; 2]` instead of `(&str, &str)`. I tend to think of function arguments as tuple-like instead of array-like, so the latter would have seemed more natural to me.

I suspect that the two forms are equivalent in memory, and this does work; you certainly don't need to change it. I'm just curious about your design process.

Contributor Author

I guess I decided to go with the array because the two variables have the same type (perhaps a slightly C-like design).

If the strands were represented with different types, e.g. `LeftStrand` / `RightStrand`, then I would surely have used the tuple.
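
To make the two shapes discussed in this thread concrete, here is a small sketch that puts them side by side. The names `check_array`, `check_tuple` and `same_size` are hypothetical, and the local `hamming_distance` is only a stand-in so that the snippet compiles on its own. Note that Rust does not specify the layout of tuples, so the size comparison reflects what current compilers produce in practice rather than a language guarantee.

```rust
use std::mem::size_of;

// Stand-in for `hamming::hamming_distance`, included only so that this
// sketch is self-contained.
fn hamming_distance(s1: &str, s2: &str) -> Option<usize> {
    if s1.chars().count() != s2.chars().count() {
        return None;
    }
    Some(s1.chars().zip(s2.chars()).filter(|(a, b)| a != b).count())
}

// Array form, as used by `process_distance_case` in this PR.
fn check_array(strand_pair: [&str; 2]) -> Option<usize> {
    hamming_distance(strand_pair[0], strand_pair[1])
}

// Tuple form, as suggested in the review comment; the pattern in the
// parameter position destructures the pair directly.
fn check_tuple((left, right): (&str, &str)) -> Option<usize> {
    hamming_distance(left, right)
}

// Compares the sizes of the two argument types. Tuple layout is not
// specified by the language, so treat the result as an observation.
fn same_size() -> bool {
    size_of::<[&str; 2]>() == size_of::<(&str, &str)>()
}
```

Both helpers would be called in much the same way, e.g. `check_array(["ACT", "GGA"])` versus `check_tuple(("ACT", "GGA"))`, so the choice is mostly a matter of taste when both elements have the same type.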

assert_eq!(
hamming::hamming_distance(strand_pair[0], strand_pair[1]),
expected_distance
);
}

#[test]
fn test_no_difference_between_empty_strands() {
assert_eq!(hamming::hamming_distance("", ""), Some(0));
fn test_empty_strands() {
process_distance_case(["", ""], Some(0));
}

#[test]
#[ignore]
fn test_no_difference_between_identical_strands() {
assert_eq!(hamming::hamming_distance("GGACTGA", "GGACTGA"), Some(0));
process_distance_case(["GGACTGA", "GGACTGA"], Some(0));
}

#[test]
#[ignore]
fn test_complete_hamming_distance_in_small_strand() {
assert_eq!(hamming::hamming_distance("ACT", "GGA"), Some(3));
process_distance_case(["ACT", "GGA"], Some(3));
}

#[test]
#[ignore]
fn test_small_hamming_distance_in_the_middle_somewhere() {
assert_eq!(hamming::hamming_distance("GGACG", "GGTCG"), Some(1));
process_distance_case(["GGACG", "GGTCG"], Some(1));
}

#[test]
#[ignore]
fn test_larger_distance() {
assert_eq!(hamming::hamming_distance("ACCAGGG", "ACTATGG"), Some(2));
process_distance_case(["ACCAGGG", "ACTATGG"], Some(2));
}

#[test]
#[ignore]
fn test_first_string_is_longer() {
assert_eq!(hamming::hamming_distance("AAA", "AA"), None);
process_distance_case(["AAA", "AA"], None);
}

#[test]
#[ignore]
fn test_second_string_is_longer() {
assert_eq!(hamming::hamming_distance("A", "AA"), None);
process_distance_case(["A", "AA"], None);
}

#[test]
#[ignore]
/// non-unique character in first strand
fn test_nonunique_character_in_first_strand() {
process_distance_case(["AAG", "AAA"], Some(1));
}

#[test]
#[ignore]
/// identical strands
fn test_identical_strands() {
process_distance_case(["A", "A"], Some(0));
}

#[test]
#[ignore]
/// complete distance in small strands
fn test_complete_distance_in_small_strands() {
process_distance_case(["AG", "CT"], Some(2));
}

#[test]
#[ignore]
/// disallow first strand longer
fn test_disallow_first_strand_longer() {
process_distance_case(["AATG", "AAA"], None);
}

#[test]
#[ignore]
/// large distance
fn test_large_distance() {
process_distance_case(["GATACA", "GCATAA"], Some(4));
}

#[test]
#[ignore]
/// long identical strands
fn test_long_identical_strands() {
process_distance_case(["GGACTGA", "GGACTGA"], Some(0));
}

#[test]
#[ignore]
/// complete distance in single nucleotide strands
fn test_complete_distance_in_single_nucleotide_strands() {
process_distance_case(["A", "G"], Some(1));
}

#[test]
#[ignore]
/// small distance
fn test_small_distance() {
process_distance_case(["GGACG", "GGTCG"], Some(1));
}

#[test]
#[ignore]
/// non-unique character in second strand
fn test_nonunique_character_in_second_strand() {
process_distance_case(["AAA", "AAG"], Some(1));
}

#[test]
#[ignore]
/// small distance in long strands
fn test_small_distance_in_long_strands() {
process_distance_case(["ACCAGGG", "ACTATGG"], Some(2));
}

#[test]
#[ignore]
/// disallow second strand longer
fn test_disallow_second_strand_longer() {
process_distance_case(["ATA", "AGTG"], None);
}

#[test]
#[ignore]
/// small distance in small strands
fn test_small_distance_in_small_strands() {
process_distance_case(["AT", "CT"], Some(1));
}

#[test]
#[ignore]
/// large distance in off-by-one strand
fn test_large_distance_in_offbyone_strand() {
process_distance_case(["GGACGGATTCTG", "AGGACGGATTCT"], Some(9));
}

#[test]
#[ignore]
/// same nucleotides in different positions
fn test_same_nucleotides_in_different_positions() {
process_distance_case(["TAG", "GAT"], Some(2));
}