@ali-yz ali-yz commented Feb 15, 2025

Fix the bug that let genes from the same species end up in one OG, by checking both gene.xref and species.
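Why comparing on both fields matters can be shown with a minimal sketch. It assumes a hypothetical `Gene` record with `xref` and `species` attributes and hypothetical helper names; it is not the actual FastOMA code, only an illustration of the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; not FastOMA's real class."""
    xref: str
    species: str


def same_gene_by_xref(a: Gene, b: Gene) -> bool:
    # Name-only comparison: conflates genes that share a name across species.
    return a.xref == b.xref


def same_gene(a: Gene, b: Gene) -> bool:
    # Comparison on both fields: a gene is identified by (xref, species).
    return (a.xref, a.species) == (b.xref, b.species)


def species_with_multiple_genes(members):
    """Return the species that contribute more than one distinct gene to a group."""
    xrefs_by_species = {}
    for gene in members:
        xrefs_by_species.setdefault(gene.species, set()).add(gene.xref)
    return sorted(sp for sp, xrefs in xrefs_by_species.items() if len(xrefs) > 1)


if __name__ == "__main__":
    a = Gene("g1", "sp1")
    b = Gene("g1", "sp2")  # same name, different species
    print(same_gene_by_xref(a, b), same_gene(a, b))
    print(species_with_multiple_genes([a, b, Gene("g2", "sp1")]))
```

A helper like `species_with_multiple_genes` could serve as a sanity check on an OG, since each species should appear at most once in a marker group.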

This is the diff of the outputs of the following command before and after the fix is applied.

Command:

fastoma-collect-subhogs --pickle-folder pickle_folders/ --roothogs-folder omamer_rhogs/ --gene-id-pickle-file gene_id_dic_xml.pickle --out FastOMA_HOGs.orthoxml --marker-groups-fasta OrthologousGroups.tsv --roothog-tsv RootHOGs.tsv --species-tree species_tree_checked.nwk --id-transform noop -vv

Diff:

(FastOMA_Runs) ➜  case_input_copy diff FastOMA_HOGs.orthoxml ../case_input/FastOMA_HOGs.orthoxml                
2c2
< <orthoXML xmlns="http://orthoXML.org/2011/" origin="FastOMA 0.3.5+dev" originVersion="2025-02-15 22:09:37" version="0.5">
---
> <orthoXML xmlns="http://orthoXML.org/2011/" origin="FastOMA 0.3.5+dev" originVersion="2025-02-15 21:23:59" version="0.5">
(FastOMA_Runs) ➜  case_input_copy diff OrthologousGroupsFasta/OG_0000001.fa ../case_input/OrthologousGroupsFasta/OG_0000001.fa           
29a30,39
> >mRNA:Solyc09g090150.4.1 mRNA:Solyc09g090150.4.1||LA128||1002026622 mRNA:Solyc09g090150.4.1 [LA128]
> LFILQSSQLKIMEIIKFSLCFFLLFSCCFSQIEQQQSFLWQKLQYQQQHRRGRAKTDCRI
> SSLSAREPTYKFNSEAGTTEFWDRNLEEFECAGVAAVRNEIQPNGLLLPHYNNAPQLLYI
> VQGSGILGTVIPGCAETFESPQRERSMRGEEGRSEGGSQYRTGGDRHQKVRRFRQGDVLA
> LPAGITLWLYNNRQEQLVTVALLDVSNPANQLDLQFRHFFLAGNPNPKGLSGSRYEEEIQ
> SRKQHEQGGQPQQQQPGNLFDGFDLDILAEVFNVDQNLAKNLQGREDQRGQIIRAENLDV
> LSPEFEEEQPHRPGRGSRPNGLEETICAMRLRENLGRTSRADVYNPRGGRISTLNSHKLP
> ILNWLQLSAEKGNLYQNAVMAPYWNLNAHSIIYIIRGTGRIQVVGDTGNSVFDDEVREGQ
> MIVVPQNFAIMKKAGDQGLEYIAFKTNDQAITSALAGRLSAILAMPEEVLMNSYQISRQE
> ARSLKYNREETCVFAGRKSTGYSTRAMEYALTAVEAFLKV
564a575,584
> >mRNA:Solyc03g005580.2.1 mRNA:Solyc03g005580.2.1||SWEET||1058015128 mRNA:Solyc03g005580.2.1 [SWEET]
> MASKSSFLCFFFCFLVVCQISFAQIFERQQIWQRLQHQQQHRALRSKTECQIERLNAQEP
> NRRFESEAGVVEFWDATQEQFECAGVQAVRHEIRRNGLLLPYYSNTPQLFYIVQGSGVHS
> TIFPGCAETFETESPLDRRAQSGDRGQRSLDRHQKVRRFQAGDILALPAGVTHWTYNDGE
> EPIISVSLIDTSNVANQLDLTFRKFFLAGNPQRGVQQQVLGRQQETTSQYGRRGSEQEKG
> GNMLSGFDPQVLSEAFNVDVEVIRKIQEEAPERGIIVLAENLRFLLPEEKEEEEEEREWH
> SRRGFPLNGLEETFCTMKLRENIGHPTRSDVYNPRGGRISTVNSNSLPVLNWLQLSAERG
> TLYNNAIVAPHWNLNAHSIIYIIRGSGRFQVVGNAGKSVFDDQVRQGQLIVVPQNFAIVK
> KAGEQGLDYIAFKTNDNAMISPLAGRLSAIRAMPEEVLMNSYQISRQEAKSLKFNRDELS
> VFGPGARSSRQYA

@sinamajidian
Collaborator

Great, thanks!
We might want to report the species name here, since the same gene names occur in different species:

            for gene_xref, _ in group_members:
                tsv.write(f"{group_name}\t{gene_xref}\t{omamer_roothog}\n")

But to keep the changes minimal and stay with the current format, I prefer the current version.
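For reference, the species-reporting variant could look roughly like the sketch below. It assumes that the second element of each `group_members` tuple is the species name (the snippet above discards it as `_`), and the function name and `with_species` flag are hypothetical. The species is appended as a last column so the existing three columns keep their positions.

```python
import io


def write_group_rows(tsv, group_name, group_members, omamer_roothog, with_species=False):
    """Write one TSV row per group member.

    Assumes group_members yields (gene_xref, species) pairs; what the second
    element actually holds in FastOMA is an assumption of this sketch.
    """
    for gene_xref, species in group_members:
        fields = [group_name, gene_xref, omamer_roothog]
        if with_species:
            # Hypothetical extra column, appended so existing columns do not shift.
            fields.append(species)
        tsv.write("\t".join(fields) + "\n")


if __name__ == "__main__":
    members = [("g1", "sp1"), ("g1", "sp2")]  # same gene name in two species
    out = io.StringIO()
    write_group_rows(out, "OG_0000001", members, "rhog_1", with_species=True)
    print(out.getvalue(), end="")
```

With `with_species=False` the output matches the current three-column format, so the extra column would stay opt-in.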

@sinamajidian sinamajidian requested a review from alpae February 15, 2025 19:11
@sinamajidian sinamajidian merged commit 3bec89a into DessimozLab:dev Feb 15, 2025
1 check passed