A scalable Nextflow implementation of PULSAR_MINER for distributed cluster computing.
NEXTO converts the Python-based PULSAR_MINER pipeline into modular Nextflow processes, enabling:
- Parallelization: Run hundreds of DM trials simultaneously across cluster nodes
- Scalability: Scale from laptops to HPC clusters to cloud platforms
- Reproducibility: Containerized execution with Docker/Singularity
- Resume capability: Automatic resume from failed/interrupted runs
- Resource optimization: Dynamic resource allocation per process
NEXTO runs the following stages:
- Optional Filterbank Processing (FILTOOL) - Time/frequency decimation and RFI filtering (PulsarX)
- RFI Detection (RFIFIND) - Identifies and masks radio frequency interference
- Dedispersion (PREPDATA) - Creates DM trial time series (highly parallelized)
- Birdie Detection & Zaplist Creation (ACCELSEARCH_ZMAX0, MAKE_ZAPLIST) - RFI line identification (runs once to create a frequency mask)
- Acceleration Search (ACCELSEARCH) - Periodic signal detection with acceleration (parallelized per DM, uses the zaplist)
- Candidate Sifting (ACCELSIFT) - Candidate filtering by sigma threshold and harmonic removal
- Folding (PREPFOLD_FROM_CANDFILE, PSRFOLD_PULSARX) - Creates phase-folded profiles for top candidates
- Single Pulse Search (SINGLE_PULSE_SEARCH) - Transient detection
All processes are defined in modules.nf with clear separation of concerns:
- Each PRESTO/PulsarX tool has its own process
- Input/output channels clearly defined
- Explicit resource requirements per process (no label-based allocation)
- Publishing strategies for intermediate results
- Support for both PRESTO and PulsarX folding backends
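For illustration, a process in modules.nf might look roughly like the following. This is a condensed sketch, not NEXTO's actual definition: the resource values come from the table below, but the exact channels and rfifind flags are simplified.

```groovy
// Sketch of one PRESTO process with explicit resources (values from the
// resource table below). Inputs/outputs are simplified for illustration.
process RFIFIND {
    cpus 1
    memory '8 GB'
    time '4h'
    maxForks 400
    publishDir "${params.outdir}/01_RFIFIND", mode: 'copy'

    input:
    path observation

    output:
    path "*_rfifind*"

    script:
    """
    rfifind -time ${params.rfifind_time} -freqsig ${params.rfifind_freqsig} \\
        -o ${observation.baseName} ${observation}
    """
}
```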
Software requirements:
- Nextflow ≥23.04.0
- PRESTO 3.0 or 4.0 (incompatible with PRESTO 5.0.0+)
- Python 3.8+ with NumPy
- Optional: NVIDIA CUDA ≤11.8 for GPU acceleration

Recommended hardware:
- CPU: Multi-core processor (≥8 cores recommended)
- Memory: ≥32 GB RAM
- Storage: SSD/NVMe for reduced I/O latency
- GPU: NVIDIA GPU for accelerated searches (optional)
Install Nextflow:

```bash
curl -s https://get.nextflow.io | bash
chmod +x nextflow
sudo mv nextflow /usr/local/bin/
```

Follow the PRESTO installation guide.
```bash
git clone <repository-url> nexto
cd nexto
```

Edit nextflow.config and set:

```groovy
params.presto_path = "/path/to/your/presto"
```

Run a basic search:

```bash
nextflow run nexto_search.nf --input observation.fil --outdir results
```

Run with custom search parameters:

```bash
nextflow run nexto_search.nf \
    --input observation.fil \
    --outdir results \
    --dm_low 0 \
    --dm_high 200 \
    --dm_step 0.5 \
    --zmax 100 \
    --numharm 16 \
    --sigma_threshold 7.0
```

NEXTO includes pre-configured profiles for three HPC clusters. See CLUSTER_CONFIGS.md for detailed documentation.
OzSTAR (Swinburne):

```bash
nextflow run nexto_search.nf \
    -profile ozstar \
    --input observation.fil \
    --outdir results
```

Hercules (MPIfR Bonn):

```bash
nextflow run nexto_search.nf \
    -profile hercules \
    --input observation.fil \
    --outdir results
```

Contra (MPIfR Bonn):

```bash
nextflow run nexto_search.nf \
    -profile contra \
    --input observation.fil \
    --outdir results
```

Enable GPU acceleration on supported clusters:

```bash
nextflow run nexto_search.nf \
    -profile hercules \
    --input observation.fil \
    --use_cuda true
```

Nextflow automatically resumes from the last successful step:
```bash
nextflow run nexto_search.nf --input observation.fil -resume
```

| Parameter | Default | Description |
|---|---|---|
| --input | required | Input observation file (.fil or .fits) |
| --outdir | results | Output directory |
| --dm_low | 0.0 | Minimum DM to search (pc cm⁻³) |
| --dm_high | 100.0 | Maximum DM to search (pc cm⁻³) |
| --dm_step | 0.5 | DM step size (pc cm⁻³) |
| --downsample | 1 | Downsampling factor |
| --zmax | 50 | Maximum acceleration |
| --wmax | 0 | Maximum jerk (0 = disabled) |
| --numharm | 8 | Number of harmonics to sum |
| --sigma_threshold | 6.0 | Minimum sigma for candidates |
| --rfifind_time | 2.0 | Time interval for rfifind (s) |
| --rfifind_freqsig | 4.0 | Frequency sigma for rfifind |
| --npart | 50 | Number of sub-integrations for folding |
| --use_cuda | false | Enable GPU acceleration |
| --gpu_id | 0 | GPU device ID |
| --enable_jerk | false | Enable jerk search |
| --enable_single_pulse | true | Enable single pulse search |
| --sp_threshold | 5.0 | Single pulse sigma threshold |
HPC Cluster Profiles:
- ozstar - OzSTAR (Swinburne) SLURM cluster
- hercules - Hercules (MPIfR Bonn) SLURM cluster
- contra - Contra (MPIfR Bonn) HTCondor cluster

Local Profiles:
- standard - Local execution (single CPU)
- local - Local execution (multi-core with Apptainer)
See CLUSTER_CONFIGS.md for detailed cluster configuration documentation.
```
results/
├── 01_RFIFIND/          # RFI detection masks and plots
├── 02_BIRDIES/          # RFI line lists (zapfiles)
├── 03_DEDISPERSION/     # Dedispersed time series and FFTs
├── 04_SIFTING/          # Sifted candidate lists
├── 05_FOLDING/          # Folded profiles and plots (.pfd, .ps, .bestprof)
├── 06_SINGLE_PULSES/    # Single pulse detections
├── pipeline_trace.txt   # Execution trace
├── timeline.html        # Execution timeline
├── report.html          # Execution report
└── pipeline_dag.svg     # Pipeline DAG visualization
```
NEXTO achieves massive parallelization through:
- DM trials: Each DM is processed independently (200 DMs = 200 parallel jobs)
- Acceleration search: Each DM's FFT searched independently
- Single pulse search: Each dedispersed timeseries searched in parallel
Example: For 200 DM trials with 100 available CPUs:
- Original PULSAR_MINER: ~10-20 hours (sequential)
- NEXTO: ~1-2 hours (parallel)
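As a rough illustration of the fan-out, the DM trial channel can be built so that every DM value becomes its own task. This is a minimal sketch; channel and process wiring here (observation_ch, dm_ch) are hypothetical names, not NEXTO's exact code:

```groovy
// Sketch: emit one channel element per DM trial so Nextflow schedules
// each dedispersion as its own job. Names are illustrative.
def dms = []
def dm = params.dm_low
while (dm <= params.dm_high) {
    dms << dm
    dm += params.dm_step
}
dm_ch = Channel.fromList(dms)

// One PREPDATA task per (observation, DM) pair
PREPDATA(observation_ch.combine(dm_ch))
```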
NEXTO uses explicit resource allocation for all processes. Resources are standardized across all HPC clusters:
| Process | CPUs | Memory | Time | maxForks |
|---|---|---|---|---|
| FILTOOL | 16 | 8GB | 4h | 1 |
| RFIFIND | 1 | 8GB | 4h | 400 |
| PREPDATA | 1 | 4GB | 4h | 400 |
| ACCELSEARCH | 16 (1 if GPU) | 8GB | 2d | 200 |
| ACCELSEARCH_ZMAX0 | 8 | 8GB | 4h | 200 |
| PREPFOLD | 1 | 8GB | 4h | 400 |
| PSRFOLD_PULSARX | 16 | 8GB | 4h | 400 |
| SINGLE_PULSE_SEARCH | 4 | 8GB | 4h | 500 |
Resources scale automatically on retries via check_max() with a task.attempt multiplier.
To customize resources for your cluster, edit the appropriate config file in conf/ directory. See CLUSTER_CONFIGS.md for details.
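For example, a per-process override might look like the following. This is a sketch: the process name and base values are taken from the table above, but the exact keys used in NEXTO's conf/ files may differ.

```groovy
// Sketch of a conf/ override: retry on failure and scale memory with
// each attempt, capped by the pipeline's check_max() helper.
process {
    withName: 'ACCELSEARCH' {
        cpus   = 16
        memory = { check_max( 8.GB * task.attempt, 'memory' ) }
        time   = '2d'
        errorStrategy = 'retry'
        maxRetries    = 2
    }
}
```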
Build a PRESTO Docker image:
```dockerfile
FROM ubuntu:22.04

RUN apt-get update && apt-get install -y \
    build-essential \
    git \
    python3 \
    python3-numpy \
    pgplot5 \
    libfftw3-dev \
    libglib2.0-dev

# PRESTO's build expects these environment variables to be set first
ENV PRESTO=/opt/presto
ENV PATH=$PRESTO/bin:$PATH
ENV LD_LIBRARY_PATH=$PRESTO/lib:$LD_LIBRARY_PATH
ENV PYTHONPATH=$PRESTO/lib/python:$PYTHONPATH

# Install PRESTO
RUN git clone https://github.com/scottransom/presto /opt/presto
WORKDIR /opt/presto/src
RUN make prep && make && make clean
```

Run with Docker:
```bash
nextflow run nexto_search.nf -profile docker --input observation.fil
```

Run with Singularity:

```bash
singularity build presto.sif docker://your-presto-image:latest
nextflow run nexto_search.nf -profile singularity --input observation.fil
```

Create a custom DM list file:
```python
# generate_dms.py
import numpy as np

# Denser DM sampling at low DM, coarser at high DM
dms = np.concatenate([
    np.arange(0, 50, 0.3),
    np.arange(50, 200, 1.0),
    np.arange(200, 500, 2.0),
])
np.savetxt('dm_list.txt', dms, fmt='%.2f')
```

Modify the workflow to read DMs from this file instead of generating a range.
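One way to wire this in (a sketch, assuming a dm_ch channel like the one the workflow builds for the generated range):

```groovy
// Sketch: emit one DM per line of dm_list.txt instead of generating a range.
dm_ch = Channel
    .fromPath('dm_list.txt')
    .splitText()
    .map { it.trim().toFloat() }
```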
Process multiple observations:
```bash
#!/bin/bash
# Launch one NEXTO run per filterbank file
for obs in observations/*.fil; do
    nextflow run nexto_search.nf \
        -profile slurm \
        --input "$obs" \
        --outdir "results/$(basename "$obs" .fil)" \
        -resume
done
```

| Feature | PULSAR_MINER | NEXTO |
|---|---|---|
| Language | Python | Nextflow DSL2 |
| Execution | Sequential with threading | Fully parallel |
| Cluster support | Manual scripting | Native (10+ schedulers) |
| Resume | Checkpoint-based | Automatic |
| Scalability | Limited by threads | Limited only by cluster size |
| Monitoring | Log files | HTML reports + timeline |
| Resource management | Manual | Dynamic per-process |
| Cloud support | Manual | Native (AWS, GCP, Azure) |
Issue: command not found: rfifind
Solution: Ensure PRESTO is in PATH. Check params.presto_path in config.
Issue: Out of memory errors
Solution: Increase memory allocation in nextflow.config for specific processes.
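For example (a sketch; the process selector is one of NEXTO's process names, the value is illustrative):

```groovy
// Sketch: raise the memory limit for one process in nextflow.config.
process {
    withName: 'PREPDATA' {
        memory = 8.GB
    }
}
```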
Issue: Too many jobs submitted
Solution: Adjust executor.queueSize and submitRateLimit in config.
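For example (a sketch with illustrative values):

```groovy
// Sketch: throttle job submission in nextflow.config.
executor {
    queueSize       = 100        // at most 100 jobs queued/running at once
    submitRateLimit = '50/2min'  // submit at most 50 jobs every 2 minutes
}
```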
Run with debug output:
```bash
nextflow run nexto_search.nf --input observation.fil -with-trace -with-dag dag.html
```

If you use NEXTO, please cite:
- Original PULSAR_MINER: alex88ridolfi/PULSAR_MINER
- PRESTO: Ransom, S. M. (2001), PhD thesis, Harvard University
- Nextflow: Di Tommaso et al. (2017), Nature Biotechnology
License: same as PULSAR_MINER.
Contributions welcome! Please open issues or pull requests.
For questions or issues, please open a GitHub issue.