Get nerd-sniped and do some micro-optimizations. #41
adam-azarchs wants to merge 11 commits into master
Conversation
Use std::unique_ptr in more places. Replace C-style arrays with C++ std::array. Use defaulted constructors/destructors where appropriate. Make it easier for the compiler to vectorize comparisons in the inner loop of stitchAlignToTranscript.
We're linking statically, and this lets the compiler be better at dead-code elimination.
src/lib.rs
Outdated
```rust
let (chr, pos, cigar, cigar_ops) = if record.len() > 0 {
    let rec = &record[0];
    (rec.tid().to_string(), rec.pos().to_string(), format!("{}", rec.cigar()), rec.cigar().len().to_string())
} else {
    ("NA".to_string(), "NA".to_string(), "NA".to_string(), "NA".to_string())
};
println!("{:?},{:?},{:?},{},{},{},{}", std::str::from_utf8(&read).unwrap(), new_now.duration_since(now), record.len(), chr, pos, cigar, cigar_ops);
```
Lots of unnecessary memcpy going on here.
```diff
-let (chr, pos, cigar, cigar_ops) = if record.len() > 0 {
-    let rec = &record[0];
-    (rec.tid().to_string(), rec.pos().to_string(), format!("{}", rec.cigar()), rec.cigar().len().to_string())
-} else {
-    ("NA".to_string(), "NA".to_string(), "NA".to_string(), "NA".to_string())
-};
-println!("{:?},{:?},{:?},{},{},{},{}", std::str::from_utf8(&read).unwrap(), new_now.duration_since(now), record.len(), chr, pos, cigar, cigar_ops);
+if record.len() > 0 {
+    let rec = &record[0];
+    println!("{:?},{:?},{:?},{},{},{},{}", std::str::from_utf8(&read).unwrap(), new_now.duration_since(now), record.len(), rec.tid(), rec.pos(), rec.cigar(), rec.cigar().len())
+} else {
+    println!("{:?},{:?},{:?},NA,NA,NA,NA", std::str::from_utf8(&read).unwrap(), new_now.duration_since(now), record.len())
+};
```
Though I guess it doesn't really matter if this is just temporary benchmarking code.
But, maybe this should be an actual benchmark?
Hey @adam-azarchs, yeah, agreed it doesn't matter since this is just quick benchmarking code. Just for context, we're spending a bunch of time in

In general, on a per-read basis we take about ~280 microseconds to align a read, for a throughput of about 3.5K reads a second, which means that with datasets of ~380M reads we're taking almost 30 core hours to get our alignments out. Obviously, with cloud that's nothing, but I suspect we could probably do much better for stand-alone users and consume less energy.

I benchmarked this branch again, and as before found that although it's better for readability (and I kind of want to merge it just for that), it didn't move the needle on speed. It was actually ~4% slower on average, though (not shown) that wasn't statistically significant, so it may have been a little faster or a little slower, but it wasn't a game changer. The plot below shows how long it took to align each of 10K of the same reads on master and on your branch: there's a lot of reproducible variation between reads, but not much between branches.

There's a lot of heuristics at work in this code, but stepping through it, it looks like we should have a much higher throughput than we do, so I'm looking into that a bit more.
force-pushed from 0e34e5c to b34d0ff
I think I know why there was maybe a performance regression: the loop vectorizer was having trouble figuring out that comparing two
I'm wondering whether #53 will make any difference to what we see here.
I think at next
Low utilization means we're I/O bound. There's definitely low-hanging fruit to be found there. To start with, I think we should give the mmap strategy another try, this time with a less buggy implementation.

Use std::unique_ptr in more places.
Replace C-style arrays with C++ std::array.
Use defaulted constructors/destructors where appropriate.
Make it easier for the compiler to vectorize comparisons in the inner
loop of stitchAlignToTranscript.
Muck around a little with compiler flags.