seq_tools

A Python package for manipulating and analyzing nucleic acid sequences (DNA and RNA) in pandas DataFrames.

Features

Batch operations: Work with sequences in pandas DataFrames for efficient processing
Sequence manipulation: Convert between DNA and RNA, take reverse complements, add 5'/3' flanking sequences, trim ends
Structure prediction: Fold RNA sequences using ViennaRNA
Analysis tools: Calculate molecular weights, extinction coefficients, edit distances
CLI: Command-line tools for quick sequence operations
Python API: Full programmatic access to all functionality

Installation

pip install rna_seq_tools

Quick Start

Command Line Interface

# Get help
seq-tools --help

# Convert RNA to DNA
seq-tools to-dna "AUCG"

# Fold RNA sequence
seq-tools fold "GGGGUUUUCCCC"

# Calculate molecular weight
seq-tools mw "ATCG"

Python API

import pandas as pd
from seq_tools import sequences_to_dataframe
from seq_tools.dataframe import to_rna, fold, get_molecular_weight

# Create a DataFrame from sequences
sequences = ["ATCG", "GCTA", "AAAA"]
df = sequences_to_dataframe(sequences)

# Convert to RNA
df = to_rna(df)

# Fold RNA sequences
df = fold(df)

# Calculate molecular weights
df = get_molecular_weight(df, "RNA", double_stranded=False)

print(df)

Single Sequence Functions

For single-sequence operations, import directly from seq_tools:

from seq_tools import to_dna, to_rna, get_reverse_complement, get_molecular_weight

# Convert sequences
rna_seq = to_rna("ATCG")  # Returns "AUCG"
dna_seq = to_dna("AUCG")  # Returns "ATCG"

# Reverse complement
rc = get_reverse_complement("ATCG", "DNA")  # Returns "CGAT"

# Molecular weight
mw = get_molecular_weight("ATCG", "DNA")  # Returns 1307.80

CLI Commands

`add`

Add a sequence to the 5' and/or 3' end of sequences.

seq-tools add -p5 "AAAA" "GGGGUUUUCCCC"
seq-tools add -p5 "AAAA" -p3 "CCCC" input.csv
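What the `add` command does to one sequence can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the package's code; `add_flanks` is a hypothetical name, and the `p5`/`p3` parameters simply mirror the `-p5`/`-p3` options shown above.

```python
def add_flanks(sequence: str, p5: str = "", p3: str = "") -> str:
    """Return `sequence` with `p5` on its 5' end and `p3` on its 3' end.

    Hypothetical helper for illustration; not part of seq_tools.
    """
    # Sequences are written 5' -> 3', so the 5' addition goes first.
    return f"{p5}{sequence}{p3}"


print(add_flanks("GGGGUUUUCCCC", p5="AAAA"))
print(add_flanks("GGGGUUUUCCCC", p5="AAAA", p3="CCCC"))
```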

`ec`

Calculate the extinction coefficient for each sequence.

seq-tools ec "GGGGUUUUCCCC"
seq-tools ec input.csv -nt RNA -ds  # RNA, double-stranded

`edit-distance`

Calculate the average edit distance of a sequence library.

seq-tools edit-distance input.csv
seq-tools edit-distance input.csv --parallel --workers 4
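To make "average edit distance" concrete, here is a minimal standard-library sketch that averages the Levenshtein distance over every unordered pair of sequences. It assumes that is the averaging scheme; the README does not say exactly how the command averages, and the package itself depends on the `editdistance` library rather than the hand-written function used here. Both function names are hypothetical.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance via the classic two-row dynamic programme."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def average_edit_distance(sequences: list[str]) -> float:
    """Mean distance over all unordered pairs (assumed definition)."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        return 0.0
    return sum(levenshtein(a, b) for a, b in pairs) / len(pairs)


library = ["AAAA", "AAAT", "AATT"]
print(average_edit_distance(library))
```

The number of pairs grows quadratically with library size, which is presumably why the command offers `--parallel` and `--workers`.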

`fold`

Fold RNA sequences using ViennaRNA.

seq-tools fold "GGGGUUUUCCCC"
seq-tools fold input.csv
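ViennaRNA reports secondary structures in dot-bracket notation, where matching parentheses mark base pairs and dots mark unpaired positions. The sketch below turns such a string into a list of paired positions. The helper name is hypothetical, and the example structure is an illustrative hairpin written by hand, not output computed by `fold`.

```python
def dot_bracket_pairs(structure: str) -> list[tuple[int, int]]:
    """Return 0-based (i, j) base pairs encoded by a dot-bracket string."""
    open_positions: list[int] = []
    pairs: list[tuple[int, int]] = []
    for index, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(index)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {index}")
            pairs.append((open_positions.pop(), index))
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs)


# Illustrative hairpin: 4 base pairs closing a 4-nucleotide loop.
print(dot_bracket_pairs("((((....))))"))
```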

`mw`

Calculate the molecular weight for each sequence.

seq-tools mw "ATCG"
seq-tools mw input.csv -nt DNA -ds  # DNA, double-stranded

`rc`

Calculate reverse complement for each sequence.

seq-tools rc "ATCG"
seq-tools rc input.csv -nt DNA
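For reference, a reverse complement can be sketched with a translation table and a reversed slice. This is a plain-Python illustration, not the package's implementation; the function name is hypothetical and the sketch handles only uppercase A, C, G, and T/U.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_sketch(sequence: str, nucleic_acid: str = "DNA") -> str:
    """Complement every base, then reverse so the result reads 5' -> 3'."""
    if nucleic_acid == "DNA":
        table = DNA_COMPLEMENT
    elif nucleic_acid == "RNA":
        table = RNA_COMPLEMENT
    else:
        raise ValueError(f"unknown nucleic acid type: {nucleic_acid!r}")
    return sequence.translate(table)[::-1]


print(reverse_complement_sketch("ATCG", "DNA"))
print(reverse_complement_sketch("AUCG", "RNA"))
```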

`to-dna`

Convert RNA sequences to DNA (replace U with T).

seq-tools to-dna "AUCG"
seq-tools to-dna input.csv -o output.csv

`to-dna-template`

Convert RNA sequences to a DNA template with a T7 promoter.

seq-tools to-dna-template "AUCG"
seq-tools to-dna-template input.csv

`to-rna`

Convert DNA sequences to RNA (replace T with U).

seq-tools to-rna "ATCG"
seq-tools to-rna input.csv

`transcribe`

Transcribe DNA template sequences to RNA (removes T7 promoter).

seq-tools transcribe input.csv
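The relationship between `to-dna-template` and `transcribe` can be sketched as a round trip. The README does not state which promoter string the package uses, so the constant below is an assumption: the commonly cited 17-nucleotide T7 consensus that sits upstream of the transcription start. The function names are hypothetical as well.

```python
# Assumption: 17-nt T7 promoter consensus; seq_tools may use a different string.
T7_PROMOTER = "TAATACGACTCACTATA"


def to_dna_template_sketch(rna: str) -> str:
    """Convert RNA to DNA (U -> T) and place the promoter at the 5' end."""
    return T7_PROMOTER + rna.replace("U", "T")


def transcribe_sketch(dna_template: str) -> str:
    """Drop the promoter and convert the remainder to RNA (T -> U)."""
    if not dna_template.startswith(T7_PROMOTER):
        raise ValueError("template does not start with the T7 promoter")
    return dna_template[len(T7_PROMOTER):].replace("T", "U")


template = to_dna_template_sketch("GGGGUUUUCCCC")
print(template)
print(transcribe_sketch(template))
```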

`trim`

Trim 5'/3' ends of sequences.

seq-tools trim input.csv --start 5 --end 3
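A minimal sketch of trimming, assuming `--start` and `--end` give the number of nucleotides to remove from the 5' and 3' ends respectively (the README does not spell out their meaning). `trim_sketch` is a hypothetical name, not the package's function.

```python
def trim_sketch(sequence: str, start: int = 0, end: int = 0) -> str:
    """Remove `start` nucleotides from the 5' end and `end` from the 3' end."""
    if start < 0 or end < 0:
        raise ValueError("start and end must be non-negative")
    # Clamp the stop index so over-trimming yields an empty string
    # instead of a negative index that would wrap around.
    stop = max(len(sequence) - end, start)
    return sequence[start:stop]


print(trim_sketch("AAAAAGGGGUUUUCCCCAAA", start=5, end=3))
```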

`to-fasta`

Generate FASTA file from CSV.

seq-tools to-fasta input.csv output.fasta
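The CSV-to-FASTA conversion can be sketched with the standard library alone. The `sequence` column matches the column name the DataFrame functions expect; the `name` column and the `seq_<n>` fallback header are assumptions made for this example, as is the function name.

```python
import csv
import io


def csv_to_fasta_sketch(
    csv_text: str,
    name_column: str = "name",
    sequence_column: str = "sequence",
) -> str:
    """Build FASTA text with one '>header' line and one sequence line per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for index, row in enumerate(reader):
        # Fall back to a generated header when the name column is absent or empty.
        name = row.get(name_column) or f"seq_{index}"
        lines.append(f">{name}\n{row[sequence_column]}\n")
    return "".join(lines)


example_csv = "name,sequence\nhp1,GGGGUUUUCCCC\nhp2,AUCG\n"
print(csv_to_fasta_sketch(example_csv), end="")
```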

`to-opool`

Generate oligo pool file (Excel) from CSV.

seq-tools to-opool input.csv "pool_name" output.xlsx

DataFrame Functions

DataFrame operations are available from seq_tools.dataframe, grouped by purpose:

Conversion: to_dna(), to_rna(), to_dna_template()
Analysis: get_molecular_weight(), get_extinction_coeff(), get_length()
Structure: fold() (predicts RNA secondary structures)
Manipulation: add(), trim(), get_reverse_complement()
Generation: generate_random_sequences(), generate_mutated_sequences()
Validation: has_t7_promoter(), has_5p_sequence(), has_3p_sequence()
File I/O: to_fasta(), to_opool()

from seq_tools.dataframe import to_rna, fold, get_molecular_weight
# Use with DataFrames containing a 'sequence' column

Note: For backward compatibility, the same functions are also exported from the main package with a _df suffix (e.g., from seq_tools import to_rna_df). Importing from seq_tools.dataframe is the recommended approach.

See the notebooks directory for detailed examples.

Requirements

Python 3.9+
pandas
numpy
ViennaRNA (for structure prediction)
editdistance
click
tabulate

Tutorial Notebooks

Interactive Jupyter notebooks are available in the notebooks/ directory:

01_introduction.ipynb: Package overview and quick start
02_sequence_operations.ipynb: Working with individual sequences
03_structure_analysis.ipynb: RNA folding and structure analysis
04_dataframe_operations.ipynb: Batch processing with DataFrames
05_advanced_features.ipynb: Advanced features and workflows

See the notebooks README for more details.

Development

Using Conda/Mamba (Recommended)

# Clone the repository
git clone https://github.com/jyesselm/seq_tools.git
cd seq_tools

# Create conda environment from environment.yml
conda env create -f environment.yml
# OR using mamba (faster)
mamba env create -f environment.yml

# Activate environment
conda activate seq_tools

# Install package in editable mode
pip install -e .

# Run tests
pytest test/ -v

Using pip/venv

# Clone the repository
git clone https://github.com/jyesselm/seq_tools.git
cd seq_tools

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies and package
pip install -e .

# Run tests
pytest test/ -v

Note: ViennaRNA is required for structure prediction. It's included in the conda environment, but for pip installations you may need to install it separately via conda or your system package manager.

License

This project is licensed under a Non-Commercial License. Commercial use is prohibited. See the LICENSE file for details.

For commercial licensing inquiries, please contact jyesselm@unl.edu.

Author

Joe Yesselman - jyesselm@unl.edu

Contributing

Contributions are welcome! Please read CONTRIBUTING.md for detailed coding standards and guidelines.

Quick Start for Contributors

# Install with dev dependencies
make install

# Run all quality checks
make check

# Individual checks
make format      # Format code with ruff
make lint        # Lint with ruff
make type-check  # Type check with mypy
make coverage    # Run tests with coverage (90% minimum)

Code Standards

Maximum 3 levels of indentation
Functions ≤ 30 lines (with few exceptions)
One responsibility per function
All functions must have type hints and docstrings
Minimum 90% test coverage required
Use ruff and mypy for code quality

See CONTRIBUTING.md for complete details.
