initial upload by ambrosejcarr · Pull Request #2 · HumanCellAtlas/sctools

ambrosejcarr · 2017-10-31T18:22:40Z

Initial upload of the scsequtils class, converted and uploaded to humancellatlas as sctools

I put together a basic readme of the main package classes and their use, as well as the command line argument(s) (currently only one) that I'm using for optimus.

The main functions are in the sctools package, and tests for each module are in sctools/test

There are quite a few testing files whose size I have tried to keep small; there is no need to review any code in sctools/test/data/

dshiga · 2017-11-02T20:16:21Z

setup.py

+CLASSIFIERS = [
+    "Development Status :: 4 - Beta",
+    "Natural Language :: English",
+    "License :: OSI Approved :: GNU General Public License v2 (GPLv2)",


The license checked into the repo is MIT. Other Mint repos are BSD-3 so would prefer to use that for consistency, unless there's a good reason not to.

No reason. I'll switch it.

dshiga · 2017-11-02T20:21:50Z

src/sctools/bam.py

+
+    def tag(self, output_bam_name, tag_generators):
+        """
+        given a bam file and tag generators in the same order, adds tags to the .bam file,


what do you mean by "in the same order" here? same order as something in the bam file?

Correct. All files need equivalent sort order. I'll make the doc string clearer.

dshiga · 2017-11-02T20:34:42Z

src/sctools/bam.py

+        outbam = pysam.AlignmentFile(output_bam_name, 'wb', header=inbam.header)
+        try:
+            # zip up all the iterators
+            for *tag_sets, sam_record in zip(*tag_generators, inbam):


I'm not familiar with pysam. I'm assuming this does not pull all reads in the bam into memory at once, although naively that's what it looks like. Can you confirm?

Confirmed; it's an iterator, and pysam is the python wrapper for htslib.

This line is zipping up three iterators that are lazily read. You get 2 fastq records (8 lines) and one sam line per iteration.
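To illustrate the laziness described in this reply, here is a minimal sketch that uses plain Python generators as stand-ins for the tag generators and the pysam iterator. The names `counting_source`, `fake_tags_1`, `fake_tags_2` and `fake_bam` are hypothetical and not part of sctools; the point is only that `zip` pulls one item from each iterator per step.

```python
def counting_source(name, items, log):
    """Yield items one at a time, recording every read in `log`."""
    for item in items:
        log.append((name, item))
        yield item


log = []
# Hypothetical stand-ins: two tag generators and one alignment iterator.
fake_tags_1 = counting_source('tags1', ['CB:AAA', 'CB:CCC'], log)
fake_tags_2 = counting_source('tags2', ['UB:GG', 'UB:TT'], log)
fake_bam = counting_source('bam', ['read1', 'read2'], log)

zipped = zip(fake_tags_1, fake_tags_2, fake_bam)
reads_before_iteration = list(log)  # still empty: zip has not touched its inputs

# Same unpacking pattern as the reviewed line.
*tag_sets, sam_record = next(zipped)
reads_after_one_step = list(log)  # exactly one item pulled from each source
```

Only the first element of each source has been read after one step, so memory use stays proportional to a single record per input.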

dshiga · 2017-11-08T20:36:26Z

src/sctools/reader.py

+
+    :return Iterator: iterator over tuples of records, one from each passed Reader object.
+    """
+    # iterators = [iter(r) for r in readers]


Is this commented out line needed?

Nope, will remove.

dshiga · 2017-11-08T20:48:29Z

src/sctools/reader.py

+
+
+def zip_readers(*readers, indices=None):
+    """zip together multiple fastq objects, yielding records simultaneously.


This is actually a more general function, right? It doesn't have to be a fastq object, just a Reader.

Good point. I'll edit the docstring!

dshiga · 2017-11-08T20:57:41Z

setup.py

+    install_requires=[
+        'pysam',
+        'numpy',
+        'google-cloud',


This brings in a huge web of dependencies. We were doing the same in secondary-analysis but changed it to install a more targeted set of google libraries. (We ran into an issue with one of the many packages that google-cloud installs, which it turned out we didn't actually need.) This also makes it a lot faster to install.

Ah, very cool. Do you think it would be adequate to pull the same set of libraries you're using in the secondary-analysis repo?

Not sure, depends which actual google modules you're using. Are you using it just for reading from buckets? In that case you probably just need google-cloud-storage.

I'll verify, but I think that's all.

dshiga · 2017-11-08T21:12:04Z

src/sctools/reader.py

+        return the length of the Reader object. this should typically not be used with sys.stdin,
+        as it will consume the input
+        """
+        return sum(1 for _ in self)


Would be helpful to clarify that this gives the sum of line counts for all files in Reader and requires fully reading all files.
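A small sketch of why this matters for streams such as sys.stdin. `LineReader` is a hypothetical, stripped-down stand-in for the Reader class, and `io.StringIO` stands in for a stream that cannot be rewound by reopening a file.

```python
import io


class LineReader:
    """Hypothetical stand-in: iterates over a single already-open handle."""

    def __init__(self, handle):
        self._handle = handle

    def __iter__(self):
        return iter(self._handle)

    def __len__(self):
        # Counting requires reading every line.
        return sum(1 for _ in self)


reader = LineReader(io.StringIO('line1\nline2\nline3\n'))
n_lines = len(reader)          # reads the stream to the end
left_over = list(reader)       # nothing remains to iterate over
```

With real files the Reader can reopen them on the next iteration, which is presumably why the docstring singles out sys.stdin.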

dshiga · 2017-11-08T21:17:29Z

src/sctools/reader.py

+        for file_ in self._files:
+
+            # set correct mode
+            if self._mode == 'r' and any(file_.endswith(s) for s in ['.gz', '.bz2']):


Why do we treat this as a special case? We already set mode = 'r' when self._mode == 'r' by default in the else. In this special case we set it to 'rt' instead. And according to this, r and rt are the same thing:
https://stackoverflow.com/questions/23051062/open-files-in-rt-and-wt-modes

The gzip and bz2 libraries do distinguish between r and rt: for these packages r means bytes by default, which is annoying/confusing. This line fixes it so that r means str and rb means bytes across the reader, which is, I think, what the user normally expects.

https://docs.python.org/3/library/gzip.html#gzip.open

... if they have proper extensions. I will look into a better way of detecting compression type.

Okay, in that case I think what you have is perfectly reasonable here. Maybe just add a comment explaining that rt really does differ from r in this case.
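For reference, a short self-contained check of the behaviour discussed above, using only the standard library. The file name is arbitrary; the sketch writes a tiny gzip file to a temporary directory and reads it back in both modes.

```python
import gzip
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.txt.gz')
    with gzip.open(path, 'wb') as f:
        f.write(b'ACGT\n')

    # For gzip.open, plain 'r' is binary mode and yields bytes.
    with gzip.open(path, 'r') as f:
        read_with_r = f.readline()

    # 'rt' must be requested explicitly to get str.
    with gzip.open(path, 'rt') as f:
        read_with_rt = f.readline()
```

Here `read_with_r` is `b'ACGT\n'` and `read_with_rt` is `'ACGT\n'`, which is the difference the special case in the reader smooths over.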

dshiga · 2017-11-08T21:28:49Z

src/sctools/reader.py

+        """return the collective size of all files being read in bytes"""
+        return sum(os.stat(f).st_size for f in self._files)
+
+    def select_indices(self, indices):


Indices is a set of line numbers that we want to filter down to, right? Or is it more general than that? If it's line numbers it might be clearer to call this select_lines.

select_record_indices maybe? It's not pulling out lines/records themselves, just indices to them.

dshiga · 2017-11-09T14:46:22Z

src/sctools/gtf.py

+    to create records specific to exons, transcripts, and genes
+    """
+
+    __slots__ = ['_fields', '_attribute']


Should this be _attributes plural? Looks like it's a dict of k/v pairs below.

dshiga · 2017-11-09T15:05:56Z

src/sctools/gtf.py

+        """
+        fields = record.strip(';\n').split('\t')
+        self._fields = fields[:8]
+        self._attribute = {


Worth a comment explaining the parsing that's happening here, and/or a short example of what fields[8] looks like.
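As an illustration of what such a comment could document, here is a sketch of splitting a GTF line into its eight fixed fields and the key/value attributes in the ninth column. The record contents (`GENE0001`, `EXAMPLE1`) are made up, and the loop is an assumption about the parsing, since the dict comprehension itself is not shown in this excerpt.

```python
# Illustrative GTF line; identifiers are invented for this example.
record = (
    'chr19\tEXAMPLE\tgene\t100\t500\t.\t+\t.\t'
    'gene_id "GENE0001"; gene_name "EXAMPLE1";\n'
)

fields = record.strip(';\n').split('\t')
fixed_fields = fields[:8]

# fields[8] looks like: gene_id "GENE0001"; gene_name "EXAMPLE1"
attributes = {}
for pair in fields[8].split(';'):
    key, _, value = pair.strip().partition(' ')
    attributes[key] = value.strip('"')
```

After this runs, `attributes` maps `gene_id` to `GENE0001` and `gene_name` to `EXAMPLE1`.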

dshiga · 2017-11-09T15:11:00Z

src/sctools/gtf.py

+        :param key: item to access
+        :return str: value of item
+        """
+        try:


you could replace 94-97 with return self._attribute.get(key)

dshiga · 2017-11-09T15:12:47Z

src/sctools/gtf.py

+    def size(self):
+        size = self.end - self.start
+        if size < 0:
+            raise ValueError('invalid record: negative size %d (start > end)' % size)


Why not calculate this in __init__ and throw an error there? That would protect against creation of invalid records. Or is there value in being able to parse invalid ones?

Actually it's more about limiting unnecessary processing of records; I wanted to maximize speed here and so do no parsing of the records until the data is specifically requested.

Often only a few fields of the record are of interest, or a particular type of record. For example, often only gene records are wanted, and therefore there is no value in processing the much more frequent exon or transcript records.

I grudgingly added the variable field parsing to init (field[8:], which you correctly wanted an explanation for) because it contains gene id information that is normally of interest.

Added explanation in docstring.

dshiga · 2017-11-09T15:14:57Z

src/sctools/gtf.py

+        super().__init__(files, mode, header_comment_char)  # has different default args from super
+
+    def __iter__(self):
+        for line in super().__iter__():


Would for line in super(): accomplish the same thing?

Actually I take that back, I like being explicit here.

left unchanged.

dshiga · 2017-11-09T15:42:05Z

src/sctools/fastq.py

+        :param [str|bytes] record: list of four strings or bytes objects defining a
+          single fastq record
+        """
+        self._data = list(record)


Is there a special reason why we are making a new list rather than using record as is, or are you just being careful? If there's a special reason, would be worth a comment.

My docstring in line 21 is inaccurate; it should say "iterable" instead of list. This will often be read as a tuple. However, I require the ability to change the record, and so need it to be mutable. Hence the list conversion. Good catch.

dshiga · 2017-11-09T15:47:52Z

src/sctools/fastq.py

+
+    @name.setter
+    def name(self, value):
+        if not isinstance(value, (bytes, str)):


Currently you can create a record with an invalid name type but setting name to an invalid type after the fact throws an error. Would it be worth having __init__ call the setters so that this error is also thrown at record creation time? Or is there a reason why you would want to allow invalid records to be constructed?

I can't think of a value for invalid records. I don't think calling the setters at __init__ time will introduce a significant overhead, so I'll switch to this. Thanks for the suggestion. 👍
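A minimal sketch of the change agreed on here, assuming a simplified record with only the name validated. `Record` below is a hypothetical reduction of the fastq record class, not the code in the PR.

```python
class Record:
    """Hypothetical, simplified fastq record holding four fields."""

    def __init__(self, record):
        name, sequence, name2, quality = record
        self._data = [None, sequence, name2, quality]
        self.name = name  # routed through the validating setter

    @property
    def name(self):
        return self._data[0]

    @name.setter
    def name(self, value):
        if not isinstance(value, (bytes, str)):
            raise TypeError('fastq name must be bytes or str')
        self._data[0] = value


valid = Record((b'@read1\n', b'ACGT\n', b'+\n', b'IIII\n'))

try:
    Record((5, b'ACGT\n', b'+\n', b'IIII\n'))
    rejected_at_creation = False
except TypeError:
    rejected_at_creation = True
```

Because `__init__` assigns through the property, the same check guards both construction and later mutation.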

dshiga · 2017-11-09T15:51:21Z

src/sctools/fastq.py

+
+    @name2.setter
+    def name2(self, value):
+        if not isinstance(value, (bytes, str)):


Good to have these constraints. In addition to restricting types, are there other constraints that would be worth adding? Do you frequently encounter fields with mangled format that you would want to reject for example?

Yea; technically they are supposed to start with '@'. I can add this in.

That said, I want to be careful to balance validation with speed; all quality strings should contain only ASCII letters and the name field has a specific structure defined by the sequencer that generates the fastq file. However, these are expensive to validate and might be better served by external validation tools (or expanded later, if we find we want to apply tools that require the structures of those fields to be intact/correct).

dshiga · 2017-11-09T18:27:09Z

src/sctools/fastq.py

+            self._data[3] = value
+
+    def __bytes__(self):
+        try:


Do you expect _data to usually be in bytes, and/or do you prefer to optimize speed for the bytes case, rather than for string? That appears to be the choice you're making here, just want to confirm.

Would it be worth moving implementation of __bytes__ and __str__ into the BytesRecord and StrRecord classes, so you don't have to try catch?

I expect it to almost always be bytes (and it will always be bytes for our pipelines), as it's faster to convert from bytes to 2-bit encodings, which you will see later in the package.

I used to have these inside the respective classes as you're suggesting. The reason I moved away from it was essentially just to reduce the amount of code in the package. My logic was that anyone using strings no longer cares about speed, and therefore it's OK to add small inefficiencies for string records.

I am very willing to move it back if you feel that would be better!

.. I will also make a note to be clear bytes is the preferred implementation.

It's such a small amount of code and you have to have it somewhere - seems a little clearer to put it in BytesRecord and StrRecord.

Adjusted to make bytes and str both 1st class. StrRecord now subclasses Record, which operates on bytes. This should also indicate to the user (subtly) that bytes is the preferred implementation.

dshiga · 2017-11-09T18:32:04Z

src/sctools/fastq.py

+
+    def average_quality(self):
+        """return the average quality of this record"""
+        return sum(c for c in self.quality[:-1]) / (len(self.quality[:-1]) - 1) - 33


Why minus 33? A comment might be helpful here.

Absolutely correct. It's because of an old solexa/illumina conversion.
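For context: in the Sanger / Illumina 1.8+ fastq encoding, each quality character stores a Phred score as its ASCII code minus 33. The sketch below is an assumption about the intent of `average_quality`, not the code in the PR; it strips the trailing newline and divides by the number of quality characters.

```python
def average_quality(quality: bytes, offset: int = 33) -> float:
    """Mean Phred score of a fastq quality line given as bytes."""
    scores = quality.rstrip(b'\n')
    if not scores:
        raise ValueError('empty quality string')
    # Iterating over bytes yields integer ASCII codes.
    return sum(scores) / len(scores) - offset


uniform = average_quality(b'IIII\n')   # 'I' is ASCII 73, so each score is 40
mixed = average_quality(b'I5\n')       # codes 73 and 53 average to 63
```

A one-line comment naming the offset (Phred+33) next to the `- 33` would answer the question raised here.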

dshiga · 2017-11-09T18:56:03Z

src/sctools/fastq.py

+        :return int: mean
+        :return float: standard deviation
+        """
+        pass  # implement


Maybe better to raise NotImplementedError here like you do elsewhere. Or in this case, maybe just take out this function.

I will remove it. Thanks.

mckinsel · 2017-11-09T17:55:29Z

src/sctools/bam.py

+            chromosome = str(chromosome)  # try to convert
+        if chromosome not in valid_chromosomes:
+            raise ValueError('chromsome %s not valid. Must be one of %r' %
+                             (chromosome, valid_chromosomes))


Do you mean to explicitly limit this to GRCh37/38? It will break for hg19 and b37.

No; I should expand it to accept chr19 type chromosomes. Thanks for the comment 👍

mckinsel · 2017-11-09T17:55:55Z

src/sctools/bam.py

+            for i, record in enumerate(fin):
+
+                if not record.is_unmapped:  # record is mapped
+                    if chromosome in record.reference_name and specific < n_specific:


Is there a chance of a if "1" in "15" or something like that here?

I've only used this for chromosomes 19 and 21; I think you're right. I'll adjust it to use ==.

I was, ironically, trying to support the chr19 vs 19 problem you identify above but there are ways to get that without this bug :)
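The substring problem and one possible way to keep the chr19 vs 19 leniency can be sketched as follows. `same_chromosome` is a hypothetical helper, not a function in sctools.

```python
def same_chromosome(requested, reference_name):
    """Exact match after dropping an optional 'chr' prefix from both names."""
    def normalize(name):
        name = str(name)
        return name[3:] if name.startswith('chr') else name

    return normalize(requested) == normalize(reference_name)


# The bug under discussion: substring containment matches too much.
substring_hit = '1' in '15'

exact_1_vs_15 = same_chromosome('1', '15')
int_19_vs_chr19 = same_chromosome(19, 'chr19')
```

`substring_hit` is True, which is the false positive; the helper rejects `1` vs `15` but still accepts `19` vs `chr19`.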

mckinsel · 2017-11-09T18:03:30Z

src/sctools/bam.py

+                    if chromosome in record.reference_name and specific < n_specific:
+                        chromosome_indices.append(i)
+                        specific += 1
+                    elif nonspecific < include_other:


It seems like other_indices is meant to point to reads that aren't aligned to chromosome, but here it can also include reads aligned to chromosome if there are more than n_specific such reads. If that's intentional, it might be helpful to update the documentation.

Not intentional, thanks for pointing this out.
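One way the corrected selection could look, sketched over plain `(is_unmapped, reference_name)` tuples instead of pysam records so that it runs without a bam file. The function body is an assumption about the fix, not the committed code.

```python
def indices_by_chromosome(records, n_specific, chromosome, include_other=0):
    """Return (chromosome_indices, other_indices) for an iterable of
    (is_unmapped, reference_name) tuples."""
    chromosome_indices, other_indices = [], []
    for i, (is_unmapped, reference_name) in enumerate(records):
        if not is_unmapped and reference_name == chromosome:
            # Surplus reads on the requested chromosome are skipped,
            # never diverted into other_indices.
            if len(chromosome_indices) < n_specific:
                chromosome_indices.append(i)
        elif len(other_indices) < include_other:
            other_indices.append(i)
    return chromosome_indices, other_indices


records = [
    (False, '19'), (False, '19'), (False, '19'),  # three reads on chr 19
    (True, None),                                 # unmapped read
    (False, '21'),                                # read on another chromosome
]
on_19, not_on_19 = indices_by_chromosome(records, 2, '19', include_other=2)
```

The third chromosome-19 read (index 2) is ignored rather than counted as "other", which is the behaviour the review asks for.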

mckinsel · 2017-11-09T18:11:37Z

src/sctools/barcode.py

+
+    def __init__(self, barcode_set, barcode_length):
+        """
+        :param Counter barcode_set: dictionary


The name barcode_set could make people think it should be a Set rather than a Mapping.

True; I'll make this.. barcodes

mckinsel · 2017-11-09T18:14:44Z

src/sctools/barcode.py

+            'median': distances[int(len(distances) * .5)],
+            'average': sum(distances) / len(distances),
+            '75th percentile': distances[int(len(distances) * .75)],
+            'maximum': distances[-1]


Any reason not to use numpy.percentile for these? I think we've already got numpy, and that'll return correct values.

I was trying to avoid using numpy in the package but eventually capitulated for the entropy calculations. I'll switch it in.
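To show the kind of discrepancy being pointed out, here is a small comparison on an even-length list. It uses the standard-library `statistics` module rather than numpy so it runs without extra dependencies; to my understanding `numpy.percentile` with its default linear interpolation gives the same values as the `inclusive` method used here.

```python
import statistics

distances = [1, 2, 3, 4]  # already sorted, as in the reviewed code

# Index-based estimates, as in the reviewed snippet.
index_median = distances[int(len(distances) * .5)]
index_q3 = distances[int(len(distances) * .75)]

# Interpolated estimates.
true_median = statistics.median(distances)
_, _, true_q3 = statistics.quantiles(distances, n=4, method='inclusive')
```

The index-based median picks 3 while the interpolated median is 2.5, and the index-based 75th percentile returns the maximum (4) rather than 3.25.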

mckinsel · 2017-11-09T18:41:31Z

src/sctools/encodings.py

+            except KeyError:
+                if byte != 78:
+                    raise
+                return random.randint(0, 4)


This can return a 4. I think you want randrange.

Also, there are lots of ways to indicate ambiguous bases other than "N".

In python 3 randint's upper bound is exclusive.

You're right about the other codes. I guess there is no reason not to include the full IUPAC list; I have only ever seen N ambiguous nucleotides in fastq sequence, which was the main purpose here.

👍 for generalizability, however.

I'm committed to python 2.7 forever, so I'm not an expert on this, but I think randint is inclusive in python3: https://docs.python.org/3/library/random.html#random.randint

Oh I'm sorry, I was assuming I was using numpy here, which is exclusive. I even went and tested the numpy version >.<

You're completely correct. I'll change this.
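The difference can be checked directly with the standard library. The sketch below draws many values from a seeded generator and collects the distinct results; the seed and sample size are arbitrary choices.

```python
import random

rng = random.Random(0)

# randint includes both endpoints, so 4 can be returned.
randint_values = {rng.randint(0, 4) for _ in range(10_000)}

# randrange excludes the upper bound, like range().
randrange_values = {rng.randrange(0, 4) for _ in range(10_000)}
```

For picking one of four bases, `random.randrange(0, 4)` or `random.randint(0, 3)` gives the intended range 0 to 3.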

mckinsel · 2017-11-09T18:49:42Z

src/sctools/encodings.py

+    class ThreeBitEncodingMap:
+
+        # C: 1, A: 2, G: 3, T: 4, N: 6;  # note, not using 0
+        map_ = {65: 2, 97: 2, 67: 1, 99: 1, 71: 3, 103: 3, 84: 4, 116: 4, 78: 6, 110: 6}


Would it maybe be clearer to say ord("A") rather than 65?

Yes; that's wonderful. I've been struggling with how to write this clearly.

mckinsel · 2017-11-09T19:00:38Z

src/sctools/reader.py

+    """
+    # iterators = [iter(r) for r in readers]
+    if indices:
+        iterators = zip(*[r.select_indices(indices) for r in readers])


Do you want to use the generator syntax here? I'm not actually sure if it makes a difference.

Do you mean iterators = zip(*(r.select_indices(indices) for r in readers))?

I'll make sure it doesn't break, but yea; that would probably be more memory efficient.

Yeah. I mean it probably has no real effect, but you're so careful about using iterators everywhere else.

No, I think this could really matter at scale. This is a good catch. 👍
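One detail that may be worth checking before relying on this: in a call like `zip(*iterable)`, Python unpacks the outer iterable completely before `zip` runs, so the list and generator forms both call `select_indices` on every reader up front. What stays lazy either way are the per-reader iterators themselves. A sketch with a hypothetical `select` helper standing in for `select_indices`:

```python
calls = []


def select(reader):
    """Hypothetical stand-in for Reader.select_indices; records each call."""
    calls.append(reader)
    return iter(reader)


readers = [['a1', 'a2'], ['b1', 'b2']]

zipped = zip(*(select(r) for r in readers))
calls_before_iteration = len(calls)  # outer generator already exhausted

first = next(zipped)  # inner iterators are advanced one record at a time
```

So the generator form is harmless and consistent with the rest of the module, but the memory saving is limited to not building a short list of iterators.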

dshiga · 2017-11-09T19:39:00Z

src/sctools/fastq.py

+    """
+
+    @staticmethod
+    def record_grouper(iterable):


I understand this function better after talking in person. A comment about its purpose would probably be helpful, here or maybe in the class docstring.

added docstring to function.

dshiga · 2017-11-09T19:44:56Z

src/sctools/fastq.py

+        """
+        :param FastqRecord record: record to extract from
+        :param Tag tag: defines tag to extract
+        :return tuple:


Could use a more detailed comment about what's returned

dshiga · 2017-11-09T20:04:05Z

src/sctools/fastq.py

+
+
+# this could easily be expanded beyond fastq generated tags, we just don't have any use cases yet.
+Tag = namedtuple('Tag', ['start', 'end', 'sequence_tag', 'quality_tag'])


I think I mostly understand now, but it was not clear to me at first what a fastq tag is and what TagGenerator was doing. More descriptive comments about what Tag and TagGenerator are for would be helpful.

I feel like there is one too many levels of abstraction in this set of functions; this is what I was most concerned about being unclear. I'll put some time into improving the documentation and also maybe the code.

dshiga · 2017-11-09T20:14:38Z

src/sctools/encodings.py

+    class ThreeBitEncodingMap:
+
+        # C: 1, A: 2, G: 3, T: 4, N: 6;  # note, not using 0
+        map_ = {65: 2, 97: 2, 67: 1, 99: 1, 71: 3, 103: 3, 84: 4, 116: 4, 78: 6, 110: 6}


Hmm, unless it would bog down performance, what about creating the map_ this way?
{ord('A'): 2, ord('a'): 2, ... }
That way you are documenting the mapping in code rather than a comment, and less chance for a typo to introduce an error.
Or you could have raw_map = {'A': 2, 'a': 2, ... } and then process that to create map_ = {65: 2, ... }

Markus made the same (really good) suggestion. I'm going to change this; much much clearer the way you're both suggesting.
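A sketch of the `raw_map` variant suggested above, with upper- and lower-case entries generated from a single readable table. The variable names are illustrative; the resulting dictionary is identical to the literal in the reviewed code.

```python
# Readable source of truth: C: 1, A: 2, G: 3, T: 4, N: 6 (0 is not used).
raw_map = {'C': 1, 'A': 2, 'G': 3, 'T': 4, 'N': 6}

# Expand to byte values, accepting both cases of each base.
map_ = {}
for base, code in raw_map.items():
    map_[ord(base)] = code
    map_[ord(base.lower())] = code

original_literal = {65: 2, 97: 2, 67: 1, 99: 1, 71: 3, 103: 3,
                    84: 4, 116: 4, 78: 6, 110: 6}
```

Since the map is built once at class-definition time, this should not affect encoding speed.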

dshiga · 2017-11-09T20:41:31Z

src/sctools/encodings.py

+            except KeyError:
+                if byte != 78:
+                    raise
+                return random.randint(0, 4)


This returns 0 to 4 inclusive, but I think you meant 0-3. random.randrange(0, 4) or random.randint(0, 3)

Correct; I converted from numpy but forgot that numpy.random.randint and random.randint are different.

dshiga · 2017-11-09T21:03:46Z

src/sctools/barcode.py

+        for i in reversed(range(self._barcode_length)):
+            binary_base_representations, counts = np.unique(keys & 3, return_counts=True)
+            if weighted:  # todo weighted not working, values multiplication does not work
+                base_counts_by_position[i, binary_base_representations] = counts  # * values


Maybe this should throw a NotImplementedError, or else take out this option altogether for now.

it should throw a NotImplementedError or I can extract it to another open branch and add an issue. I think this will be useful, but we can drop it from this upload as I haven't figured out how to do this efficiently, yet.

dshiga · 2017-11-09T21:07:35Z

src/sctools/barcode.py

+
+    def __init__(self, barcode_set, barcode_length):
+        """
+        :param Counter barcode_set: dictionary


Would be helpful to indicate that this is a dict of barcodes to counts. Do you need it to be a collections.Counter specifically, or any dict of barcodes to counts?

I think any mapping to counts is probably OK; I'll verify and make explicit.

dshiga · 2017-11-09T21:10:16Z

src/sctools/barcode.py

+    def from_whitelist(cls, file_, barcode_length):
+        """Creates a barcode set from a whitelist file
+
+        :param str file_: location of the whitelist file. Should be formatted one barcode per line.


It looks like the barcodes are expected to be strings like 'ACGT', not two bit or three bit encoded, etc. Would be helpful to specify that or give an example barcode string, since elsewhere in this package we expect two bit or three bit encoded strings.

Added comment in file_ docstring.

dshiga · 2017-11-09T21:24:11Z

src/sctools/barcode.py

+    def from_iterable_strings(cls, iterable, barcode_length):
+        """construct an ObservedBarcodeSet from an iterable of string barcodes"""
+        tbe = TwoBit(barcode_length)
+        return cls(Counter(tbe.encode(b) for b in iterable))


Don't you need to include barcode_length as a param to cls? Same for line 128.

dshiga · 2017-11-09T21:25:04Z

src/sctools/barcode.py

+        if not isinstance(barcode_set, Mapping):
+            raise TypeError('barcode set must be a dict-like object mapping barcodes to counts')
+        self._data = barcode_set
+        self._barcode_length = barcode_length


Should we throw an error if barcode_length is None?

dshiga · 2017-11-09T21:31:15Z

src/sctools/barcode.py

+        return base4_entropy(self.base_frequency(weighted=weighted))
+
+
+class PriorBarcodeSet(BarcodeBase):


I'm not clear on why two classes are needed here, rather than three factory methods that create instances of BarcodeBase from a file, from an iterable of strings, and an iterable of bit encoded strings. Once you've constructed them, they share all of their code in BarcodeBase.

Good idea. I might have some questions about implementation here, but I'll make this change.
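A rough sketch of the single-class design with factory methods, which also passes `barcode_length` through to the constructor and rejects `None`, as raised in the two preceding comments. `BarcodeSet` is a hypothetical name, and for brevity the sketch stores plain string barcodes instead of the two-bit encoding used in the PR.

```python
from collections import Counter
from collections.abc import Mapping


class BarcodeSet:
    """Hypothetical merged class: one constructor, several factories."""

    def __init__(self, barcodes, barcode_length):
        if not isinstance(barcodes, Mapping):
            raise TypeError('barcodes must be a dict-like object mapping barcodes to counts')
        if barcode_length is None:
            raise ValueError('barcode_length is required')
        self._data = barcodes
        self._barcode_length = barcode_length

    def __getitem__(self, item):
        return self._data[item]

    @classmethod
    def from_iterable_strings(cls, iterable, barcode_length):
        """Build from an iterable of string barcodes such as 'ACGT'."""
        return cls(Counter(iterable), barcode_length)

    @classmethod
    def from_whitelist(cls, file_, barcode_length):
        """Build from a file with one string barcode per line."""
        with open(file_) as f:
            barcodes = (line.strip() for line in f if line.strip())
            return cls(Counter(barcodes), barcode_length)


observed = BarcodeSet.from_iterable_strings(['ACGT', 'ACGT', 'TTTT'], 4)
```

A third factory for already-encoded barcodes would follow the same pattern.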

dshiga · 2017-11-09T21:36:54Z

src/sctools/platform.py

+                     'using picard FastqToSam')
+            parser.add_argument('-o', '--output-bamfile', required=True,
+                                help='filename for tagged bam')
+            args = vars(parser.parse_args())


why convert to a dict here? Isn't it easier to use args.u2 etc below instead of args['u2']?

For testing; it's hard to create namespace objects but easy to create keyword dictionaries; I'll make it clear that args can be passed that way in the docstring.
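The testing convenience described here can be sketched with argparse alone. `build_parser` and `run` are hypothetical names; only the `-o/--output-bamfile` option is taken from the reviewed snippet.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('-o', '--output-bamfile', required=True,
                        help='filename for tagged bam')
    return parser


def run(args):
    """Accept a plain dict so tests can skip argparse entirely."""
    return 'writing %s' % args['output_bamfile']


# What the command line path produces after vars(...).
cli_args = vars(build_parser().parse_args(['-o', 'tagged.bam']))

# What a unit test can pass directly.
test_args = {'output_bamfile': 'tagged.bam'}
```

Both dictionaries are equal, so `run` behaves the same whether it is driven from the command line or from a test.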

dshiga · 2017-11-09T21:38:01Z

src/sctools/platform.py

+    }
+
+    @classmethod
+    def get_tag(cls, sequencing_read):


Should this be get_tags plural? Returns a tuple of tags. Or get_tag_set.

dshiga · 2017-11-09T21:41:40Z

src/sctools/platform.py

+        """
+        tag_generators = []
+        for k, v in files_with_tags.items():
+            tag_generators.append(fastq.TagGenerator(cls.get_tag(k), files=v))


The docstring says the key is the filename and the value is the tag, opposite of what is happening here.

dshiga · 2017-11-09T21:54:35Z

src/sctools/bam.py

+        self._file = alignment_file
+        self._open_mode = open_mode
+
+    def indices_by_chromosome(self, n_specific, chromosome, include_other=0):


Maybe call it include_unaligned instead of include_other. Would it make sense to interpret -1 to mean include all unaligned reads? Is that a useful thing to be able to do?

Oh I see, this is not just unaligned reads but any read that does not match the chromosome we asked for.

I'll find a way to clarify; sounds like it produced some confusion.

Adjusted docstring:

"""
Return the list of first n_specific indices of reads aligned to selected chromosome. If desired,
will also return non-specific indices in a second list (can serve as negative control reads).

:param int n_specific: number of aligned reads to return indices for
:param str chromosome: only reads from this chromosome are considered valid
:param int include_other: optional, (default=0), the number of reads to include that are NOT
  aligned to chromosome (could be aligned or unaligned read)
:return [int]: list of indices to reads aligning to chromosome
:return [int]: list of indices to reads NOT aligning to chromosome, only returned if
  include_other is not 0.
"""

dshiga · 2017-11-09T21:56:46Z

src/sctools/bam.py

+        :param [fastq.TagGenerator] tag_generators: generators that yield Tag objects
+          (see fastq.Tag)
+        """
+        inbam = pysam.AlignmentFile(self.bam_file, 'rb', check_sq=False)


How about using with pysam.AlignmentFile the way you do on line 57?

dshiga · 2017-11-09T21:57:59Z

src/sctools/bam.py

+
+        with pysam.AlignmentFile(self._file, self._open_mode) as fin:
+            specific, nonspecific = 0, 0  # counters
+            chromosome = str(chromosome)


You already converted to string on 52.

This is a bit different; here I'm being lenient in case the user passed an int as chromosome. On 52 I was building up the set of allowable chromosomes.

dshiga · 2017-11-09T22:02:01Z

src/sctools/bam.py

+            chromosome_indices = []
+            other_indices = []
+
+            for i, record in enumerate(fin):


It feels like some of the contents here, especially the if else block would be worth moving out into its own function that could be unit tested on its own.

I simplified the code. Since each list stores its length, I don't need the counters; I can just test the length of chromosome_indices and other_indices directly.

However, removing this into its own function would only serve to separate the type/value checking from the index extraction; I think this belongs where it is, but hope that the code simplification makes it clearer and reduces your desire to extract the code.

dshiga

I really like how you've written this, very clear and concise and well documented! I made a bunch of comments on particular lines of code. Once you've made new commits, let me know and I'll take another look.

ambrosejcarr · 2017-11-09T22:07:11Z

Thanks very much @dshiga. I'll begin addressing your comments shortly!

dshiga

Looks great!

mckinsel · 2017-11-14T18:19:52Z

src/sctools/bam.py

+        """
+
+        # acceptable chromosomes
+        valid_chromosomes = [str(i) for i in range(1, 23)] + ['M', 'X', 'Y']


I believe "MT" is also used to represent mitochondria.

mckinsel · 2017-11-14T18:20:57Z

src/sctools/bam.py

+
+        if len(chromosome_indices) < n_specific or len(other_indices) < include_other:
+            warnings.warn('Only %d unaligned and %d reads aligned to chromosome %s were found in' 
+                          '%s' % (len(chromosome_indices), len(other_indices),


I think you've swapped chromosome_indices and other_indices.

mckinsel · 2017-11-14T18:21:59Z

src/sctools/encodings.py

+            try:
+                return self.map_[byte]
+            except KeyError:
+                if byte != 78:


Just to keep everything consistent, you might want byte in (ord("N"), ord("n"))

mckinsel · 2017-11-14T18:24:22Z

src/sctools/barcode.py

+    def __getitem__(self, item):
+        return self._data[item]
+
+    def summarize_hamming_distances(self):


Could you add a little test of this method?

mckinsel

👍

ambrosejcarr · 2017-11-14T22:10:23Z

@dshiga @mckinsel

Marcus' most recent comment revealed some non-functional, untested code. I ran a coverage report and added tests to everything I thought important to test. Most of the things that are not currently tested are errors that are supposed to be thrown when the classes receive incorrect input.

If anyone's interested, please take a look at the attached report and let me know if you'd like tests added for anything that's currently not covered. If not, I'll merge this before the demo tomorrow at around 10:45a.

cov_html.zip

* first draft for generalizing attaching barcodes * updated generalized class * Fixed PR #1 * styling comment changes * styling comment changes * styling comment changes * barcode end postion bug * Fixed PR #2 * Fixed PR #2, made arg validation more human readable * add comments/documentation to the class * updated comments/documentation style for return values * Fixed PR #57 comments * Updated doc strings, error handling * provided descriptions for input files and purpose of class * updated class description

dshiga reviewed Nov 2, 2017

ambrosejcarr mentioned this pull request Nov 4, 2017

reduce size of test files to shrink repo #1

Closed

dshiga reviewed Nov 8, 2017

dshiga reviewed Nov 9, 2017

mckinsel reviewed Nov 9, 2017

dshiga reviewed Nov 9, 2017

dshiga approved these changes Nov 13, 2017

mckinsel reviewed Nov 14, 2017

mckinsel approved these changes Nov 14, 2017

initial upload

d0ab92d

ambrosejcarr force-pushed the upload branch from 3e238e1 to d0ab92d Compare November 15, 2017 15:00

ambrosejcarr merged commit a011a61 into master Nov 15, 2017

ambrosejcarr deleted the upload branch November 15, 2017 15:07

benjamincarlin added a commit that referenced this pull request Feb 19, 2019

Fixed PR #2

2accfbc

benjamincarlin added a commit that referenced this pull request Feb 19, 2019

Fixed PR #2, made arg validation more human readable

29956d2


