New tool(s) to convert yeast chrnames between roman and arabic numerals

We are transitioning from our custom sacCer3_cegr reference genome to the standard sacCer3 genome from SGD for yeast analysis. The standard genome names chromosomes with Roman numerals, while our custom reference genome uses Arabic numerals. This tool will ease the transition by converting files in either direction between the two chromosome naming systems.

**Arabic --> Roman**
chr1 --> chrI
chr2 --> chrII
...
chr16 --> chrXVI
chrM --> chrmt

**Roman --> Arabic**
...
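The two lookup tables could be generated rather than typed out by hand. The following is only a minimal sketch: the names `ROMAN` and `build_maps` are hypothetical, and it assumes the Roman --> Arabic direction is the exact inverse of the Arabic --> Roman list above (including the `chrM`/`chrmt` pair).

```python
# Roman numerals for the 16 nuclear chromosomes, in order (chr1..chr16).
ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII",
         "IX", "X", "XI", "XII", "XIII", "XIV", "XV", "XVI"]


def build_maps():
    """Return (arabic_to_roman, roman_to_arabic) lookup tables."""
    arabic_to_roman = {f"chr{i}": f"chr{r}" for i, r in enumerate(ROMAN, start=1)}
    # Mitochondrial pair as listed in this issue.
    arabic_to_roman["chrM"] = "chrmt"
    # Assumption: the reverse direction is the exact inverse mapping.
    roman_to_arabic = {roman: arabic for arabic, roman in arabic_to_roman.items()}
    return arabic_to_roman, roman_to_arabic
```

Building the reverse table by inversion keeps the two directions from drifting apart if a name is ever added.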

- Should the user be expected to specify the format of the input file to be converted (GFF/BED/BAM/TAB)?
- A user option for specifying a custom delimiter may be useful. Should this feature be added?

GFF/BED/BAM/TAB-formatted files can be converted by looking up each token in a HashMap of chromosome names. This assumes every chromosome name occupies its own column. However, some file formats have a comments column that can contain chromosome information, such as interaction records with coordinates on other chromosomes. Those chromosome names are embedded in a larger token, so a per-token lookup will not convert them.
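To make the limitation concrete, here is a rough sketch of the per-token lookup for a single line of text, with a Python dict standing in for the HashMap. The function name `convert_tokens`, the `delimiter` parameter, and the example BED line are all hypothetical and not part of any existing tool.

```python
ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII",
         "IX", "X", "XI", "XII", "XIII", "XIV", "XV", "XVI"]
ARABIC_TO_ROMAN = {f"chr{i}": f"chr{r}" for i, r in enumerate(ROMAN, start=1)}
ARABIC_TO_ROMAN["chrM"] = "chrmt"


def convert_tokens(line, mapping, delimiter="\t"):
    """Replace each delimiter-separated token that exactly equals a key."""
    tokens = line.rstrip("\n").split(delimiter)
    # Tokens that are not chromosome names pass through unchanged.
    return delimiter.join(mapping.get(tok, tok) for tok in tokens)


# Hypothetical BED-like line with a chromosome name inside the last column.
bed_line = "chr2\t100\t200\tfeat\t0\t+\tinteracts=chr1:500-600"
converted = convert_tokens(bed_line, ARABIC_TO_ROMAN)
```

In this example the first column becomes `chrII`, but `interacts=chr1:500-600` is left untouched because `chr1` is not a token of its own.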

- Should we simply implement a global replace on each line? Keep in mind that the order of conversions matters if replacements are applied one after another (for example, chrII needs to be replaced before chrI, and chr10 before chr1). A simple global replace may still mis-convert some edge cases.
- It might be useful to let the user optionally request that chromosome names in the comments column also be converted (GFF col9, BED col7+, or perhaps a user-specified column range for TAB-format files).
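One possible way to sidestep the ordering problem is a single-pass regular expression instead of sequential replaces. The sketch below is an assumption about how this could work, not a settled design: `make_global_replacer` is a hypothetical name, and the boundary rule (a name must not be flanked by a letter, digit, or underscore) is a guess at what counts as a separate chromosome name.

```python
import re

ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII",
         "IX", "X", "XI", "XII", "XIII", "XIV", "XV", "XVI"]
ARABIC_TO_ROMAN = {f"chr{i}": f"chr{r}" for i, r in enumerate(ROMAN, start=1)}
ARABIC_TO_ROMAN["chrM"] = "chrmt"
ROMAN_TO_ARABIC = {v: k for k, v in ARABIC_TO_ROMAN.items()}


def make_global_replacer(mapping):
    """Build a function that converts every standalone name in a line."""
    # Longest names first, so chrII is tried before chrI.
    names = sorted(mapping, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in names)
    # Assumed boundary rule: no letter, digit, or underscore on either side.
    pattern = re.compile(r"(?<![A-Za-z0-9_])(" + alternation + r")(?![A-Za-z0-9_])")
    # One pass over the line: converted text is never re-scanned.
    return lambda line: pattern.sub(lambda m: mapping[m.group(1)], line)


to_arabic = make_global_replacer(ROMAN_TO_ARABIC)
to_roman = make_global_replacer(ARABIC_TO_ROMAN)
```

Because each match is substituted in the same pass, output is never converted a second time. Names glued to other word characters, such as a hypothetical `chr16_random`, are deliberately left alone under the assumed boundary rule; whether that is the desired behavior is one of the edge cases to decide.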


New tool(s) to convert yeast chrnames between roman and arabic numerals #49
