I noticed that the following lines appear to automatically reverse complement reverse (-) strand sequences on write to Zarr:
https://github.com/ML4GLand/SeqData/blob/main/seqdata/_io/readers/fasta.py#L220
https://github.com/ML4GLand/SeqData/blob/main/seqdata/_io/readers/fasta.py#L249
I wanted to better understand the logic here. If I pass in a BED file with strand information, are we effectively writing everything as if it were on the forward (+) strand?
If so, I think this may be a problem. If I read this same dataset back in, I'm going to get the (+) strand returned, but the metadata for the dataset is going to indicate that the sequence comes from the original strand, which could be (-).
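To make the concern concrete, here is a toy sketch in plain Python. It is not SeqData's actual code: `reverse_complement`, `write_records`, the genome string, and the regions are all made up for illustration, and I'm only assuming the writer behaves roughly like this based on the lines linked above.

```python
# Toy illustration only; none of these names come from SeqData.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# Hypothetical reference and BED-like records: (chrom, start, end, strand).
genome = {"chr1": "ATGCCGTAAG"}
regions = [("chr1", 0, 4, "+"), ("chr1", 4, 8, "-")]


def write_records(genome, regions):
    """Mimic a writer that reverse complements (-) strand regions before storing.

    The strand label is stored unchanged next to the (possibly flipped) sequence.
    """
    stored = []
    for chrom, start, end, strand in regions:
        seq = genome[chrom][start:end]
        if strand == "-":
            seq = reverse_complement(seq)
        stored.append({"seq": seq, "strand": strand})
    return stored


stored = write_records(genome, regions)
for record in stored:
    print(record["strand"], record["seq"])
```

In this toy case the second record is stored as `TACG` while its strand label still says `-`. If I then see that label on read and reverse complement the sequence myself, I end up back at `CGTA`, the raw reference slice, which is not what I wanted. So it would help to know whether the stored strand column is meant to describe the original region or the orientation of the stored sequence.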