Conversation
setup.py
Outdated
| CLASSIFIERS = [ | ||
| "Development Status :: 4 - Beta", | ||
| "Natural Language :: English", | ||
| "License :: OSI Approved :: GNU General Public License v2 (GPLv2)", |
There was a problem hiding this comment.
The license checked into the repo is MIT. Other Mint repos are BSD-3 so would prefer to use that for consistency, unless there's a good reason not to.
There was a problem hiding this comment.
No reason. I'll switch it.
src/sctools/bam.py
Outdated
|
|
||
| def tag(self, output_bam_name, tag_generators): | ||
| """ | ||
| given a bam file and tag generators in the same order, adds tags to the .bam file, |
There was a problem hiding this comment.
what do you mean by "in the same order" here? same order as something in the bam file?
There was a problem hiding this comment.
Correct. All files need equivalent sort order. I'll make the doc string clearer.
| outbam = pysam.AlignmentFile(output_bam_name, 'wb', header=inbam.header) | ||
| try: | ||
| # zip up all the iterators | ||
| for *tag_sets, sam_record in zip(*tag_generators, inbam): |
There was a problem hiding this comment.
I'm not familiar with pysam. I'm assuming this does not pull all reads in the bam into memory at once, although naively that's that it looks like. Can you confirm?
There was a problem hiding this comment.
Confirmed; it's an iterator, and pysam is the python wrapper for htslib.
This line is zipping up three iterators that are lazily read. You get 2 fastq records (8 lines) and one sam line per iteration.
src/sctools/reader.py
Outdated
|
|
||
| :return Iterator: iterator over tuples of records, one from each passed Reader object. | ||
| """ | ||
| # iterators = [iter(r) for r in readers] |
There was a problem hiding this comment.
Is this commented out line needed?
src/sctools/reader.py
Outdated
|
|
||
|
|
||
| def zip_readers(*readers, indices=None): | ||
| """zip together multiple fastq objects, yielding records simultaneously. |
There was a problem hiding this comment.
This is actually a more general function, right? It doesn't have to be a fastq object, just a Reader.
There was a problem hiding this comment.
Good point. I'll edit the docstring!
setup.py
Outdated
| install_requires=[ | ||
| 'pysam', | ||
| 'numpy', | ||
| 'google-cloud', |
There was a problem hiding this comment.
This brings in a huge web of dependencies. We were doing the same in secondary-analysis but changed it to install a more targeted set of google libraries. (We ran into an issue with one of the many packages that google-cloud installs, which it turned out we didn't actually need.) This also makes it a lot faster to install.
There was a problem hiding this comment.
Ah, very cool. Do you think it would be adequate to pull the same set of libraries you're using in the secondary-analysis repo?
There was a problem hiding this comment.
Not sure, depends which actual google modules you're using. Are you using it just for reading from buckets? In that case you probably just need google-cloud-storage.
There was a problem hiding this comment.
I'll verify, but I think that's all.
| return the length of the Reader object. this should typically not be used with sys.stdin, | ||
| as it will consume the input | ||
| """ | ||
| return sum(1 for _ in self) |
There was a problem hiding this comment.
Would be helpful to clarify that this gives the sum of line counts for all files in Reader and requires fully reading all files.
| for file_ in self._files: | ||
|
|
||
| # set correct mode | ||
| if self._mode == 'r' and any(file_.endswith(s) for s in ['.gz', '.bz2']): |
There was a problem hiding this comment.
Why do we treat this as a special case? We already set mode = 'r' when self._mode == 'r' by default in the else. In this special case we set it to 'rt' instead. And according to this, r and rt are the same thing:
https://stackoverflow.com/questions/23051062/open-files-in-rt-and-wt-modes
There was a problem hiding this comment.
the gzip and bzip2 libraries do make a distinction between r and rt; r means bytes, by default for these packages, which is annoying/confusing. This line fixes it so r means str and rb means bytes across the reader, which is, I think, normally what the user expects.
There was a problem hiding this comment.
... if they have proper extensions. I will look into a better way of detecting compressing type.
There was a problem hiding this comment.
Okay, in that case I think what you have is perfectly reasonable here. Maybe just add a comment explaining that rt really does differ from r in this case.
src/sctools/reader.py
Outdated
| """return the collective size of all files being read in bytes""" | ||
| return sum(os.stat(f).st_size for f in self._files) | ||
|
|
||
| def select_indices(self, indices): |
There was a problem hiding this comment.
Indices is a set of line numbers that we want to filter down to, right? Or is it more general than that? If it's line numbers it might be clearer to call this select_lines.
There was a problem hiding this comment.
select_record_indices maybe? It's not pulling out lines/records themselves, just indices to them.
src/sctools/gtf.py
Outdated
| to create records specific to exons, transcripts, and genes | ||
| """ | ||
|
|
||
| __slots__ = ['_fields', '_attribute'] |
There was a problem hiding this comment.
Should this be _attributes plural? Looks like it's a dict of k/v pairs below.
src/sctools/gtf.py
Outdated
| """ | ||
| fields = record.strip(';\n').split('\t') | ||
| self._fields = fields[:8] | ||
| self._attribute = { |
There was a problem hiding this comment.
Worth a comment explaining the parsing that's happening here, and/or a short example of what fields[8] looks like.
src/sctools/gtf.py
Outdated
| :param key: item to access | ||
| :return str: value of item | ||
| """ | ||
| try: |
There was a problem hiding this comment.
you could replace 94-97 with return self._attribute.get(key)
| def size(self): | ||
| size = self.end - self.start | ||
| if size < 0: | ||
| raise ValueError('invalid record: negative size %d (start > end)' % size) |
There was a problem hiding this comment.
Why not calculate this in __init__ and throw an error there? That would protect against creation of invalid records. Or is there value in being able to parse invalid ones?
There was a problem hiding this comment.
Actually it's more about limiting unnecessary processing of records; I wanted to maximize speed here and so do no parsing of the records until the data is specifically requested.
Often only a few fields of the record are of interest, or a particular type of record. For example, often times on gene records are wanted, and therefore there is no value in processing the much more frequent exon or transcript records.
I grudgingly added the variable field parsing to init (field[8:], which you correctly wanted an explanation for) because it contains gene id information that is normally of interest.
There was a problem hiding this comment.
Added explanation in docstring.
| super().__init__(files, mode, header_comment_char) # has different default args from super | ||
|
|
||
| def __iter__(self): | ||
| for line in super().__iter__(): |
There was a problem hiding this comment.
Would for line in super(): accomplish the same thing?
There was a problem hiding this comment.
Actually I take that back, I like being explicit here.
src/sctools/fastq.py
Outdated
| :param [str|bytes] record: list of four strings or bytes objects defining a | ||
| single fastq record | ||
| """ | ||
| self._data = list(record) |
There was a problem hiding this comment.
Is there a special reason why we are making a new list rather than using record as is, or are you just being careful? If there's a special reason, would be worth a comment.
There was a problem hiding this comment.
My docstring in line 21 is inaccurate; it should say "iterable" instead of list. This will often be read as a tuple. However, I require the ability to change the record, and so need it to be mutable. Hence the list conversion. Good catch.
|
|
||
| @name.setter | ||
| def name(self, value): | ||
| if not isinstance(value, (bytes, str)): |
There was a problem hiding this comment.
Currently you can create a record with an invalid name type but setting name to an invalid type after the fact throws an error. Would it be worth having __init__ call the setters so that this error is also thrown at record creation time? Or is there a reason why you would want to allow invalid records to be constructed?
There was a problem hiding this comment.
I can't think of a value for invalid records. I don't think calling the setters at __init__ time will introduce a significant overhead, so I'll switch to this. Thanks for the suggestion. 👍
|
|
||
| @name2.setter | ||
| def name2(self, value): | ||
| if not isinstance(value, (bytes, str)): |
There was a problem hiding this comment.
Good to have these constraints. In addition to restricting types, are there other constraints that would be worth adding? Do you frequently encounter fields with mangled format that you would want to reject for example?
There was a problem hiding this comment.
Yea; technically they are supposed to start with '@'. I can add this in.
There was a problem hiding this comment.
That said, I want to be careful to balance validation with speed; all quality strings should contain only ASCII letters and the name field has a specific structure defined by the sequencer that generates the fastq file. However, these are expensive to validate and might be better served by external validation tools (or expanded later, if we find we want to apply tools that require the structures of those fields to be intact/correct"
src/sctools/fastq.py
Outdated
| self._data[3] = value | ||
|
|
||
| def __bytes__(self): | ||
| try: |
There was a problem hiding this comment.
Do you expect _data to usually be in bytes, and/or do you prefer to optimize speed for the bytes case, rather than for string? That appears to be the choice you're making here, just want to confirm.
There was a problem hiding this comment.
Would it be worth moving implementation of __bytes__ and __str__ into the BytesRecord and StrRecord classes, so you don't have to try catch?
There was a problem hiding this comment.
I expect it to almost always be bytes (and it will always be bytes for our pipelines), as
it's faster to convert from bytes to 2-bit encodings; which you will see later in the package.
I used to have these inside the respective classes as you're suggesting. The reason I moved away from it was essentially just to reduce the amount of code in the package. My logic was that anyone using strings no longer cares about speed, and therefore it's OK to add small inefficiencies for string records.
I am very willing to move it back if you feel that would be better!
There was a problem hiding this comment.
.. I will also make a note to be clear bytes is the preferred implementation.
There was a problem hiding this comment.
It's such a small amount of code and you have to have it somewhere - seems a little clearer to put it in BytesRecord and StrRecord.
There was a problem hiding this comment.
Adjusted to make bytes and str both 1st class. StrRecord now subclasses Record, which operates on bytes. This should also indicate to the user (subtly) that bytes is the preferred implementation.
src/sctools/fastq.py
Outdated
|
|
||
| def average_quality(self): | ||
| """return the average quality of this record""" | ||
| return sum(c for c in self.quality[:-1]) / (len(self.quality[:-1]) - 1) - 33 |
There was a problem hiding this comment.
Why minus 33? A comment might be helpful here.
There was a problem hiding this comment.
Absolutely correct. It's because of an old solexa/illumina conversion.
src/sctools/fastq.py
Outdated
| :return int: mean | ||
| :return float: standard deviation | ||
| """ | ||
| pass # implement |
There was a problem hiding this comment.
Maybe better to return NotImplementedError here like you do elsewhere. Or in this case, maybe just take out this function.
There was a problem hiding this comment.
I will remove it. Thanks.
src/sctools/bam.py
Outdated
| chromosome = str(chromosome) # try to convert | ||
| if chromosome not in valid_chromosomes: | ||
| raise ValueError('chromsome %s not valid. Must be one of %r' % | ||
| (chromosome, valid_chromosomes)) |
There was a problem hiding this comment.
Do you mean to explicitly limit this to GRCh37/38? It will break for hg19 and b37.
There was a problem hiding this comment.
No; I should expand it to accept chr19 type chromosomes. Thanks for the comment 👍
src/sctools/bam.py
Outdated
| for i, record in enumerate(fin): | ||
|
|
||
| if not record.is_unmapped: # record is mapped | ||
| if chromosome in record.reference_name and specific < n_specific: |
There was a problem hiding this comment.
Is there a chance of a if "1" in "15" or something like that here?
There was a problem hiding this comment.
I've only used this for chromosomes 19 and 21; I think you're right. I'll adjust it to use ==.
I was, ironically, trying to support the chr19 vs 19 problem you identify above but there are ways to get that without this bug :)
src/sctools/bam.py
Outdated
| if chromosome in record.reference_name and specific < n_specific: | ||
| chromosome_indices.append(i) | ||
| specific += 1 | ||
| elif nonspecific < include_other: |
There was a problem hiding this comment.
It seems like other_indices is meant to point to reads that aren't aligned to chromosome, but here it can also include reads aligned to chromosome if there are more than n_specific such reads. If that's intentional, it might be helpful to update the documentation.
There was a problem hiding this comment.
Not intentional, thanks for pointing this out.
src/sctools/barcode.py
Outdated
|
|
||
| def __init__(self, barcode_set, barcode_length): | ||
| """ | ||
| :param Counter barcode_set: dictionary |
There was a problem hiding this comment.
The name barcode_set could make people think it should be a Set rather than a Mapping.
There was a problem hiding this comment.
True; I'll make this.. barcodes
src/sctools/barcode.py
Outdated
| 'median': distances[int(len(distances) * .5)], | ||
| 'average': sum(distances) / len(distances), | ||
| '75th percentile': distances[int(len(distances) * .75)], | ||
| 'maximum': distances[-1] |
There was a problem hiding this comment.
Any reason not to use numpy.percentile for these? I think we've already got numpy, and that'll return correct values.
There was a problem hiding this comment.
I was trying to avoid using numpy in the package but eventually capitulated for the entropy calculations. I'll switch it in.
src/sctools/encodings.py
Outdated
| except KeyError: | ||
| if byte != 78: | ||
| raise | ||
| return random.randint(0, 4) |
There was a problem hiding this comment.
This can return a 4. I think you want randrange.
Also, there are lots of ways to indicate ambiguous bases other than "N".
There was a problem hiding this comment.
In python 3 randint's upper bound is exclusive.
You're right about the other codes. I guess there is no reason not to include the full IUPAC list; I have only ever seen N ambiguous nucleotides in fastq sequence, which was the main purpose here.
👍 for generalizability, however.
There was a problem hiding this comment.
I'm committed to python 2.7 forever, so I'm not an expert on this, but I think randint is inclusive in python3: https://docs.python.org/3/library/random.html#random.randint
There was a problem hiding this comment.
Oh I'm sorry, i was assuming I was using numpy here, which is exclusive. I even went and tested the numpy version >.<
You're completely correct. I'll change this.
src/sctools/encodings.py
Outdated
| class ThreeBitEncodingMap: | ||
|
|
||
| # C: 1, A: 2, G: 3, T: 4, N: 6; # note, not using 0 | ||
| map_ = {65: 2, 97: 2, 67: 1, 99: 1, 71: 3, 103: 3, 84: 4, 116: 4, 78: 6, 110: 6} |
There was a problem hiding this comment.
Would it maybe be clearer to say ord("A") rather than 65?
There was a problem hiding this comment.
Yes; that's wonderful. I've been struggling with how to write this clearly.
src/sctools/reader.py
Outdated
| """ | ||
| # iterators = [iter(r) for r in readers] | ||
| if indices: | ||
| iterators = zip(*[r.select_indices(indices) for r in readers]) |
There was a problem hiding this comment.
Do you want to use the generator syntax here? I'm not actually sure if it makes a difference.
There was a problem hiding this comment.
Do you mean iterators = zip(*(r.select_indices(indices) for r in readers))?
I'll make sure it doesn't break, but yea; that would probably be more memory efficient.
There was a problem hiding this comment.
Yeah. I mean it probably has no real effect, but you're so careful about using iterators everywhere else.
There was a problem hiding this comment.
No, I think this could really matter at scale. This is a good catch. 👍
| """ | ||
|
|
||
| @staticmethod | ||
| def record_grouper(iterable): |
There was a problem hiding this comment.
I understand this function better after talking in person. A comment about its purpose would probably be helpful, here or maybe in the class docstring.
There was a problem hiding this comment.
added docstring to function.
src/sctools/fastq.py
Outdated
| """ | ||
| :param FastqRecord record: record to extract from | ||
| :param Tag tag: defines tag to extract | ||
| :return tuple: |
There was a problem hiding this comment.
Could use a more detailed comment about what's returned
src/sctools/fastq.py
Outdated
|
|
||
|
|
||
| # this could easily be expanded beyond fastq generated tags, we just don't have any use cases yet. | ||
| Tag = namedtuple('Tag', ['start', 'end', 'sequence_tag', 'quality_tag']) |
There was a problem hiding this comment.
I think I mostly understand now, but it was not clear to me at first what a fastq tag is and what TagGenerator was doing. More descriptive comments about what Tag and TagGenerator are for would be helpful.
There was a problem hiding this comment.
I feel like there is one too many levels of abstraction in this set of functions; this is what I was most concerned about being unclear. I'll put some time into improving the documentation and also maybe the code.
src/sctools/encodings.py
Outdated
| class ThreeBitEncodingMap: | ||
|
|
||
| # C: 1, A: 2, G: 3, T: 4, N: 6; # note, not using 0 | ||
| map_ = {65: 2, 97: 2, 67: 1, 99: 1, 71: 3, 103: 3, 84: 4, 116: 4, 78: 6, 110: 6} |
There was a problem hiding this comment.
Hmm, unless it would bog down performance, what about creating the map_ this way?
{ord('A'): 2, ord('a'): 2, ... }
That way you are documenting the mapping in code rather than a comment, and less chance for a typo to introduce an error.
Or you could have raw_map = {'A': 2, 'a': 2, ... } and then process that to create map_ = {65: 2, ... }
There was a problem hiding this comment.
Markus made the same (really good) suggestion. I'm going to change this; much much clearer the way you're both suggesting.
src/sctools/encodings.py
Outdated
| except KeyError: | ||
| if byte != 78: | ||
| raise | ||
| return random.randint(0, 4) |
There was a problem hiding this comment.
This returns 0 to 4 inclusive, but I think you meant 0-3. random.randrange(0, 4) or random.randint(0, 3)
There was a problem hiding this comment.
Correct; I converted from numpy but forgot that numpy.randint and random.randint are different.
src/sctools/barcode.py
Outdated
| for i in reversed(range(self._barcode_length)): | ||
| binary_base_representations, counts = np.unique(keys & 3, return_counts=True) | ||
| if weighted: # todo weighted not working, values multiplication does not work | ||
| base_counts_by_position[i, binary_base_representations] = counts # * values |
There was a problem hiding this comment.
Maybe this should throw a NotImplementedError, or else take out this option altogether for now.
There was a problem hiding this comment.
it should throw a NotImplementedError or I can extract it to another open branch and add an issue. I think this will be useful, but we can drop it from this upload as I haven't figured out how to do this efficiently, yet.
src/sctools/barcode.py
Outdated
|
|
||
| def __init__(self, barcode_set, barcode_length): | ||
| """ | ||
| :param Counter barcode_set: dictionary |
There was a problem hiding this comment.
Would be helpful to indicate that this is a dict of barcodes to counts. Do you need it to be a collections.Counter specifically, or any dict of barcodes to counts?
There was a problem hiding this comment.
I think any mapping to counts is probably Ok; i'll verify and make explicit.
| def from_whitelist(cls, file_, barcode_length): | ||
| """Creates a barcode set from a whitelist file | ||
|
|
||
| :param str file_: location of the whitelist file. Should be formatted one barcode per line. |
There was a problem hiding this comment.
It looks like the barcodes are expected to be strings like 'ACGT', not two bit or three bit encoded, etc. Would be helpful to specify that or give an example barcode string, since elsewhere in this package we expect two bit or three bit encoded strings.
There was a problem hiding this comment.
Added comment in file_ docstring.
src/sctools/barcode.py
Outdated
| def from_iterable_strings(cls, iterable, barcode_length): | ||
| """construct an ObservedBarcodeSet from an iterable of string barcodes""" | ||
| tbe = TwoBit(barcode_length) | ||
| return cls(Counter(tbe.encode(b) for b in iterable)) |
There was a problem hiding this comment.
Don't you need to include barcode_length as a param to cls? Same for line 128.
| if not isinstance(barcode_set, Mapping): | ||
| raise TypeError('barcode set must be a dict-like object mapping barcodes to counts') | ||
| self._data = barcode_set | ||
| self._barcode_length = barcode_length |
There was a problem hiding this comment.
Should we throw an error if barcode_length is None?
src/sctools/barcode.py
Outdated
| return base4_entropy(self.base_frequency(weighted=weighted)) | ||
|
|
||
|
|
||
| class PriorBarcodeSet(BarcodeBase): |
There was a problem hiding this comment.
I'm not clear on why two classes are needed here, rather than three factory methods that create instances of BarcodeBase from a file, from an iterable of strings, and an iterable of bit encoded strings. Once you've constructed them, they share all of their code in BarcodeBase.
There was a problem hiding this comment.
Good idea. I might have some questions about implementation here, but I'll make this change.
src/sctools/platform.py
Outdated
| 'using picard FastqToSam') | ||
| parser.add_argument('-o', '--output-bamfile', required=True, | ||
| help='filename for tagged bam') | ||
| args = vars(parser.parse_args()) |
There was a problem hiding this comment.
why convert to a dict here? Isn't it easier to use args.u2 etc below instead of args['u2']?
There was a problem hiding this comment.
For testing; it's hard to create namespace objects but easy to create keyword dictionaries; I'll make it clear that args can be passed that way in the docstring.
src/sctools/platform.py
Outdated
| } | ||
|
|
||
| @classmethod | ||
| def get_tag(cls, sequencing_read): |
There was a problem hiding this comment.
Should this be get_tags plural? Returns a tuple of tags. Or get_tag_set.
src/sctools/platform.py
Outdated
| """ | ||
| tag_generators = [] | ||
| for k, v in files_with_tags.items(): | ||
| tag_generators.append(fastq.TagGenerator(cls.get_tag(k), files=v)) |
There was a problem hiding this comment.
The docstring says the key is the filename and the value is the tag, opposite of what is happening here.
| self._file = alignment_file | ||
| self._open_mode = open_mode | ||
|
|
||
| def indices_by_chromosome(self, n_specific, chromosome, include_other=0): |
There was a problem hiding this comment.
Maybe call it include_unaligned instead of include_other. Would it make sense to interpret -1 to mean include all unaligned reads? Is that a useful thing to be able to do?
There was a problem hiding this comment.
Oh I see, this is not just unaligned reads but any read that does not match the chromosome we asked for.
There was a problem hiding this comment.
i'll find a way to clarify; sounds like it produced some confusion.
There was a problem hiding this comment.
Adjusted docstring:
"""
Return the list of first n_specific indices of reads aligned to selected chromosome.
If desired, will also return non-specific indices in a second list (can serve as negative
control reads).
:param int n_specific: number of aligned reads to return indices for
:param str chromosome: only reads from this chromosome are considered valid
:param int include_other: optional, (default=0), the number of reads to include that are
NOT aligned to chromosome (could be aligned or unaligned read)
:return [int]: list of indices to reads aligning to chromosome
:return [int]: list of indices to reads NOT aligning to chromosome, only returned if
include_other is not 0.
"""
src/sctools/bam.py
Outdated
| :param [fastq.TagGenerator] tag_generators: generators that yield Tag objects | ||
| (see fastq.Tag) | ||
| """ | ||
| inbam = pysam.AlignmentFile(self.bam_file, 'rb', check_sq=False) |
There was a problem hiding this comment.
How about using with pysam.AlignmentFile the way you do on line 57?
|
|
||
| with pysam.AlignmentFile(self._file, self._open_mode) as fin: | ||
| specific, nonspecific = 0, 0 # counters | ||
| chromosome = str(chromosome) |
There was a problem hiding this comment.
You already converted to string on 52.
There was a problem hiding this comment.
This is a bit different; here I'm being lenient in case the user passed an int as chromosome. On 52 I was building up the set of allowable chromosomes.
| chromosome_indices = [] | ||
| other_indices = [] | ||
|
|
||
| for i, record in enumerate(fin): |
There was a problem hiding this comment.
It feels like some of the contents here, especially the if else block would be worth moving out into its own function that could be unit tested on its own.
There was a problem hiding this comment.
I simplified the code. Since each list hashes its length, I don't need the counters; I can just test the length of chromosome_indices and other_indices directly.
However, removing this into its own function would only serve to separate the type/value checking from the index extraction; I think this belongs where it is, but hope that the code simplification makes it clearer and reduces your desire to extract the code.
dshiga
left a comment
There was a problem hiding this comment.
I really like how you've written this, very clear and concise and well documented! I made a bunch of comments on particular lines of code. Once you've made new commits, let me know and I'll take another look.
|
Thanks very much @dshiga. I'll begin addressing your comments shortly! |
src/sctools/bam.py
Outdated
| """ | ||
|
|
||
| # acceptable chromosomes | ||
| valid_chromosomes = [str(i) for i in range(1, 23)] + ['M', 'X', 'Y'] |
There was a problem hiding this comment.
I believe "MT" is also used to represent mitochrondia too.
src/sctools/bam.py
Outdated
|
|
||
| if len(chromosome_indices) < n_specific or len(other_indices) < include_other: | ||
| warnings.warn('Only %d unaligned and %d reads aligned to chromosome %s were found in' | ||
| '%s' % (len(chromosome_indices), len(other_indices), |
There was a problem hiding this comment.
I think you've swapped chromosome_indices and other_indices.
src/sctools/encodings.py
Outdated
| try: | ||
| return self.map_[byte] | ||
| except KeyError: | ||
| if byte != 78: |
There was a problem hiding this comment.
Just to keep everything consistent, you might want byte in (ord("N"), ord("n"))
| def __getitem__(self, item): | ||
| return self._data[item] | ||
|
|
||
| def summarize_hamming_distances(self): |
There was a problem hiding this comment.
Could you add a little test of this method?
|
Marcus' most recent comment revealed some non-functional, untested code. I ran a coverage report and added tests to everything I thought important to test. Most of the things that are not currently tested are errors that are supposed to be thrown when the classes receive incorrect input. If anyone's interested, please take a look at the attached report and let me know if you'd like tests added for anything that's currently not covered. If not, I'll merge this before the demo tomorrow at around 1~10:45a. |
* first draft for generalizing attaching barcodes * updated generalized class * Fixed PR #1 * styling comment changes * styling comment changes * styling comment changes * barcode end postion bug * Fixed PR #2 * Fixed PR #2, made arg validation more human readable * add comments/documentation to the class * updated comments/documentation style for return values * Fixed PR #57 comments * Updated doc strings, error handling * provided descriptions for input files and purpose of class * updated class description
Initial upload of the
scsequtilsclass, converted and uploaded to humancellatlas assctoolsI put together a basic readme of the main package classes and their use, as well as the command line argument(s) (currently only one) that I'm using for optimus.
The main functions are in the
sctoolspackage, and tests for each module are insctools/testThere are quite a few testing files whose size I have tried to keep small; there is no need to review any code in
sctools/test/data/