28 changes: 28 additions & 0 deletions .gitignore
@@ -0,0 +1,28 @@
# Build artifacts
build/
dist/
*.egg-info/
*.egg

# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
*.manifest
*.spec

# Virtual environments
venv/
env/
.venv/

# IDE
.idea/
.vscode/
*.swp
*.swo

# ProBioPred output
ProBioPred_out/
43 changes: 43 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,43 @@
# Contributing to ProBioPred

Thank you for your interest in contributing to ProBioPred!

## How to contribute

1. **Fork** the repository on GitHub
2. **Clone** your fork locally
3. **Create a branch** for your changes: `git checkout -b your-feature-name`
4. **Make your changes** and test them
5. **Commit** with a clear message: `git commit -m "Add/fix: description"`
6. **Push** to your fork: `git push origin your-feature-name`
7. **Open a Pull Request** against the `main` branch

## Development setup

```bash
conda create -n probiopred python=3.10
conda activate probiopred
conda install -c bioconda blast
conda install -c conda-forge libsvm
pip install -e .
```

You will also need RGI (Resistance Gene Identifier) for antibiotic resistance detection. See the main README for full installation instructions.
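The setup above depends on three external command-line tools (BLAST, LIBSVM and RGI) in addition to the Python package. A minimal sketch of a pre-flight check is shown below. The executable names `blastn`, `svm-predict` and `rgi` are assumptions about what ends up on `PATH` after installation, and `find_missing_tools` is a hypothetical helper, not part of ProBioPred.

```python
import shutil

# Executable names are assumptions: `blastn` (BLAST+), `svm-predict` (LIBSVM)
# and `rgi` (Resistance Gene Identifier). Adjust to the tools ProBioPred calls.
REQUIRED_TOOLS = ("blastn", "svm-predict", "rgi")


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be located on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing external tools:", ", ".join(missing))
    else:
        print("All external tools found on PATH.")
```

Running the script inside the activated `probiopred` environment should list any tool that still needs to be installed.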

## Code style

- Use Python 3
- Follow PEP 8 where practical
- Add docstrings to new functions

## Reporting issues

If you find a bug or have a suggestion, please open an issue on GitHub with:

- A clear description of the problem
- Steps to reproduce (for bugs)
- Your environment (OS, Python version, etc.)

## Questions

For questions about ProBioPred, you can open an issue or contact the maintainers.
8 changes: 4 additions & 4 deletions README.md
@@ -9,7 +9,7 @@ The input genome should be in standard FASTA format to run this tool.

ProBioPred uses available genetic information and Support Vector Machine (SVM) models to predict potential probiotic candidates. In brief, based on an extensive literature survey and available databases, ProBioPred uses information on genes imparting **probiotic properties**, **virulence factors** and **antibiotic resistance genes** to generate and train models that predict whether a genome is a potential probiotic candidate. ProBioPred can also serve as a tool to predict probiotic genes, virulence factors and antibiotic resistance genes, which can be browsed on the website or downloaded. These models can be used to analyze genome sequences with ProBioPred either [online](http://210.212.161.142/ProBioPred/) or as a stand-alone tool.

-![](https://github.com/microDM/ProBioPred/blob/master/performance.jpeg)
+![](https://github.com/microDM/ProBioPred/blob/main/performance.jpeg)


## Installing ProBioPred
@@ -45,7 +45,7 @@ pip install .
usage: proBioPred.py [-h] -i PATH -g GENUS [-o PATH] [-t THREADS]

Wrapper for running ProBioPred. Searches for probiotic, virulent and
-antibiotic resistence genes in query genome. Then predicts the probability
+antibiotic resistance genes in query genome. Then predicts the probability
score of genome being probiotic or non-probiotic based on SVM model.

optional arguments:
@@ -79,11 +79,11 @@ ProBioPred generates output directory with several files and prints SVM score fo

|File|Description|
|:----|:------|
-|out.libsvm|svm-predict output (1/-1 referes to probiotic/non-probiotic class)|
+|out.libsvm|svm-predict output (1/-1 refers to probiotic/non-probiotic class)|
|pro_hits.pfasta|Probiotic genes (multi-FASTA file)|
|pro_outFiltered.blast|BLAST outfmt6 for probiotic genes|
|resulTab.csv|Scores for each feature (.csv format)|
|rgi_out.json|RGI output (json format)|
|rgi_out.txt|RGI output (tab-delimited format)|
-|vfdb_hits.pfasta|Virulent genes (mult-FASTA file)|
+|vfdb_hits.pfasta|Virulent genes (multi-FASTA file)|
|vfdb_outFiltered.blast|BLAST outfmt6 for virulent genes|
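The output table above can be consumed with a few lines of Python. The following is a minimal sketch, assuming the `.blast` files use the default 12-column layout of BLAST `-outfmt 6` and that `out.libsvm` holds one predicted label per line, optionally preceded by the `labels ...` header that `svm-predict` writes when probability estimates are enabled. Both layouts are assumptions about ProBioPred's output, and the function names are hypothetical.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6). Whether
# ProBioPred keeps this default layout is an assumption.
OUTFMT6_COLUMNS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)


def read_outfmt6(handle):
    """Parse tab-separated BLAST hits into a list of dicts."""
    reader = csv.DictReader(handle, fieldnames=OUTFMT6_COLUMNS, delimiter="\t")
    hits = []
    for row in reader:
        # Convert the numeric columns most often used for filtering.
        row["pident"] = float(row["pident"])
        row["evalue"] = float(row["evalue"])
        row["bitscore"] = float(row["bitscore"])
        hits.append(row)
    return hits


def read_predicted_labels(handle):
    """Read predicted class labels (1 = probiotic, -1 = non-probiotic)."""
    labels = []
    for line in handle:
        fields = line.split()
        # Skip blank lines and the optional "labels ..." header line.
        if not fields or fields[0] == "labels":
            continue
        labels.append(int(float(fields[0])))
    return labels


if __name__ == "__main__":
    # Made-up example rows, for illustration only.
    blast_text = "query1\tgeneA\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-50\t380\n"
    print(read_outfmt6(io.StringIO(blast_text))[0]["sseqid"])
    print(read_predicted_labels(io.StringIO("labels 1 -1\n1 0.9 0.1\n")))
```

In practice the `io.StringIO` objects would be replaced by files opened from the output directory, for example `open("pro_outFiltered.blast")`.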
4 changes: 2 additions & 2 deletions probiopred/probiopred.py
@@ -54,13 +54,13 @@ def count_no_of_lines(filename):
return nlines

def runRGI(genomeFile,outFile,threads):
-"""Execute RGI main on given genome file to find antibiotic resistence genes
+"""Execute RGI main on given genome file to find antibiotic resistance genes

Arguments:
genomeFile {str} -- [genome file name]

Returns:
-[boolean] -- [True if succeded or subprocess stderr object]
+[boolean] -- [True if succeeded or subprocess stderr object]
"""
#ardb categories -
ardbCat1 = "3002600 3002529 3002533 3002574 3003677 3003199 3003676 3002581 3002549 3002548 3002557 3002558 3002561 3002562 3002563 3002564 3002611 3002614 3002616 3002618 3002619 3002620 3002604 3002607 3002608 3001816 3000502 3001821 3001832 3001827 3001838 3001839 3001842 3001843 3001845 3001849 3001823 3001855 3003171 3001824 3001825 3001830 3001826 3003847 3003856 3003857 3003858 3003859 3003860 3003862 3003863 3003848 3003864 3003865 3003866 3003867 3003868 3003849 3003870 3003887 3000777 3002481 3000853 3002982 3002983 3000230 3002635 3004086 3002658 3000839 3000858 3004056 3002985 3002985 3002849 3002877 3002878 3003801 3003931 3002817 3002240 3002249 3002255 3003808 3002670 3002680 3002682 3002999 3003835 3001856 3003097 3000783 3002702 3002021 3002112 3002128 3002130 3002025 3002029 3002034 3002035 3002055 3002057 3002064 3002085 3002100 3003093 3001972 3001973 3001981 3001982 3003168 3001893 3001899 3001906 3001907 3001908 3002859 3002860 3003949 3000027 3000074 3000598 3004045 3003308 3002156 3002345 3002203 3002225 3002197 3000840 3004102 3002312 3002475 3003969 3003574 3002489 3002486 3004194 3002179 3003173 3002170 3003950 3004073 3003779 3003389 3004157 3003463 3003392 3002420 3001641 3001642 3001644 3001465 3001485 3001488 3001687 3001510 3001576 3001577 3001399 3003116 3001808 3001773 3001703 3001704 3001625 3001404 3001632 3001628 3001631 3001646 3002390 3002397 3002398 3002403 3002410 3002412 3002414 3002415 3002416 3000621 3002497 3002500 3002505 3004336 3004338 3004339 3004342 3002507 3004343 3004344 3004345 3004349 3004352 3003714 3004114 3004054 3004038 3003702 3003684 3003686 3003895 3003896 3003688 3003046 3003047 3003836 3000448 3002709 3002724 3002726 3002732 3002733 3002751 3002762 3002764 3002719 3002720 3002779 3002783 3002723 3002789 3003193 3000823 3000245 3003894 3000861 3003198 3002691 3004334 3003307 3001155 3001156 3001071 3001169 3001173 3001174 3001175 3001182 3001189 3001190 3001191 3001075 3001199 3001200 3001201 3001347 
3001080 3001062 3001096 3001123 3001066 3001124 3001125 3001126 3000510 3003311 3003041 3000985 3000997 3001002 3001004 3001012 3001015 3001023 3001025 3000887 3001026 3001028 3001029 3001030 3001032 3001033 3001034 3001035 3000888 3001037 3001041 3001042 3001375 3001374 3001046 3001054 3001055 3001376 3000874 3000891 3001378 3001382 3001383 3001385 3001394 3000894 3000898 3000899 3000875 3000900 3000903 3000904 3000876 3000910 3000911 3000912 3000914 3000916 3000917 3000918 3000921 3000922 3000924 3000926 3000928 3000878 3000929 3000931 3000934 3000935 3000879 3000936 3000937 3000938 3000939 3000941 3000942 3000943 3000944 3000946 3000947 3000948 3000949 3000951 3000952 3000953 3000954 3000955 3000956 3000957 3000958 3000959 3000961 3000962 3000963 3000476 3000196 3000481 3002871 3000565 3000566 3000556 3003196 3000165 3004032 3003980 3004035 3000180 3000166 3004033 3003981 3004036 3000195 3000167 3000168 3000173 3000175 3000177 3000179 3000186 3000190 3000194 3000205 3000182 3000183 3000851 3003202 3003203 3002827 3004105 3004106 3003059 3000237 3003679 3003680 3000844 3004059 3004060 3003305 3003309 3000010 3000013 3002907 3002908 3003723 3002914 3003727 3002910 3002913 3002921 3002923 3002924 3002927 3002929 3002933 3003726 3002940 3002941 3002961 3002962 3002963 3002840 3002841 3002842 3002844 3003744 3002845 3003987 3002831 3002833 3003990 3004118 3004117 3004289 3002283 3002288 3002289 3002272 3002298 3002299 3002273 3002300 3002301 3002302 3002303 3002304 3002305 3002306 3002307 3003178 3003179 3002276 3002277 3002278 3002279 3003558 3003063 3003064 3003952".split()
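The category above is built by calling `.split()` on one long whitespace-separated string of numeric IDs, which yields a list. The visible portion of the string contains at least one repeated ID (`3002985`), and membership tests on a list scan every element. A minimal sketch of an alternative, using a short excerpt of the IDs and a `frozenset`, is shown below. `ARDB_CAT1_SAMPLE` and `count_hits_in_category` are hypothetical names, and how the real code uses the list lies outside the visible hunk.

```python
# A short excerpt of the IDs above, kept as a whitespace-separated string.
ARDB_CAT1_SAMPLE = "3002600 3002529 3002533 3002985 3002985 3002849"

# str.split() with no arguments splits on any run of whitespace,
# including newlines, and keeps duplicates.
as_list = ARDB_CAT1_SAMPLE.split()

# A frozenset drops duplicates and gives constant-time membership tests.
as_set = frozenset(as_list)


def count_hits_in_category(hit_ids, category):
    """Count how many IDs in `hit_ids` belong to `category` (hypothetical helper)."""
    return sum(1 for hit_id in hit_ids if hit_id in category)


if __name__ == "__main__":
    print(len(as_list), len(as_set))
    print(count_hits_in_category(["3002985", "9999999"], as_set))
```

Whether deduplication matters depends on how the category is used downstream: it changes nothing for membership tests, but it would change any count taken over the list itself.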
40 changes: 40 additions & 0 deletions pyproject.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "ProBioPred"
version = "1.0.0"
description = "Probiotic candidate prediction from genome sequences using SVM models"
readme = "README.md"
license = {text = "GPL-3.0"}
requires-python = ">=3.8"
authors = [
{name = "Dattatray Mongad"}
]
keywords = ["probiotics", "bioinformatics", "SVM", "genome"]
classifiers = [
"Development Status :: 4 - Beta",
"Intended Audience :: Science/Research",
"License :: OSI Approved :: GNU General Public License v3 (GPLv3)",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Topic :: Scientific/Engineering :: Bio-Informatics",
]
dependencies = [
"biopython>=1.81",
"pandas>=2.0",
]

[project.urls]
Homepage = "https://github.com/microDM/ProBioPred"
Repository = "https://github.com/microDM/ProBioPred"

[tool.setuptools.packages.find]
where = ["."]
include = ["probiopred*"]

[tool.setuptools.package-data]
probiopred = ["data/**/*"]
2 changes: 1 addition & 1 deletion scripts/proBioPred.py
@@ -8,7 +8,7 @@
import pandas as pd

parser = argparse.ArgumentParser(description="Wrapper for running ProBioPred. Searches for "
-"probiotic, virulent and antibiotic resistence "
+"probiotic, virulent and antibiotic resistance "
"genes in query genome. Then predicts the probability "
"score of genome being probiotic or non-probiotic "
"based on SVM model.")
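The rest of the parser definition is collapsed in this diff. As a rough sketch, a parser consistent with the usage line shown in the README (`[-h] -i PATH -g GENUS [-o PATH] [-t THREADS]`) could look like the following. The `dest` names, help strings and default values are assumptions made for illustration, not the script's actual definitions.

```python
import argparse


def build_parser():
    """Build a parser matching the README usage line (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(
        prog="proBioPred.py",
        description="Wrapper for running ProBioPred. Searches for "
                    "probiotic, virulent and antibiotic resistance "
                    "genes in query genome. Then predicts the probability "
                    "score of genome being probiotic or non-probiotic "
                    "based on SVM model.",
    )
    # -i and -g appear without brackets in the usage line, so they are required.
    parser.add_argument("-i", dest="genome", metavar="PATH", required=True,
                        help="input genome in FASTA format")
    parser.add_argument("-g", dest="genus", metavar="GENUS", required=True,
                        help="genus of the query genome")
    # Defaults below are assumptions; the real defaults are not visible here.
    parser.add_argument("-o", dest="outdir", metavar="PATH", default="ProBioPred_out",
                        help="output directory")
    parser.add_argument("-t", dest="threads", metavar="THREADS", type=int, default=1,
                        help="number of threads")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-i", "genome.fasta", "-g", "Lactobacillus"])
    print(args.genome, args.genus, args.outdir, args.threads)
```

Marking `-i` and `-g` with `required=True` is what makes argparse print them without square brackets in the generated usage text.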