
feat: Add GPU-accelerated operations via PyTorch #21

Closed

maragall wants to merge 18 commits into Cephla-Lab:main from maragall:feature/gpu-acceleration

Conversation

@maragall maragall commented Mar 7, 2026

Summary

  • Add optional GPU acceleration for core utility functions (phase_cross_correlation, shift_array, match_histograms, block_reduce, compute_ssim) using PyTorch CUDA, with automatic CPU fallback when no GPU is available
  • Preserve input dtypes across all accelerated operations for consistency
  • Replace magic numbers with named constants (_FFT_EPS, _SSIM_K1, _SSIM_K2, _PARABOLIC_EPS)
  • Add comprehensive test suite: unit tests for each GPU operation, CPU fallback tests, dtype preservation tests, and subpixel phase correlation tests
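The automatic CPU fallback described above can be sketched as follows. This is illustrative only: the dispatch pattern and the `_block_reduce_torch` helper name are assumptions, not the PR's actual symbols.

```python
# Sketch of GPU dispatch with automatic CPU fallback and dtype preservation.
# Names and structure are hypothetical; the PR's real code may differ.
import numpy as np

try:
    import torch
    _HAS_CUDA = torch.cuda.is_available()
except ImportError:
    torch = None
    _HAS_CUDA = False

def _block_reduce_torch(image: np.ndarray, block_size: int) -> np.ndarray:
    """GPU path: average pooling on CUDA (only reached when CUDA is available)."""
    import torch.nn.functional as F
    t = torch.as_tensor(image, dtype=torch.float32, device="cuda")
    out = F.avg_pool2d(t[None, None], kernel_size=block_size)[0, 0]
    return out.cpu().numpy().astype(image.dtype)  # preserve input dtype

def block_reduce(image: np.ndarray, block_size: int = 2) -> np.ndarray:
    """Downsample by averaging block_size x block_size tiles."""
    if _HAS_CUDA:
        return _block_reduce_torch(image, block_size)
    # CPU fallback: plain NumPy reshape-and-mean
    h, w = image.shape
    h, w = h - h % block_size, w - w % block_size
    out = image[:h, :w].reshape(h // block_size, block_size,
                                w // block_size, block_size).mean(axis=(1, 3))
    return out.astype(image.dtype)  # preserve input dtype
```

The same try/except guard lets the module import cleanly when PyTorch is absent, which is what the CPU-fallback tests exercise.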

Benchmark results (S5_reg1, 184 tiles)

| Phase         | CPU (s) | GPU (s) | Speedup |
|---------------|---------|---------|---------|
| Registration  | 4.94    | 3.16    | 1.57x   |
| Full pipeline | 59.54   | 57.27   | 1.04x   |

GPU speedup is most visible in the registration phase. The full pipeline is dominated by I/O-bound fusion, where GPU provides minimal benefit.

Test plan

  • pytest tests/test_block_reduce.py tests/test_fft.py tests/test_histogram_match.py tests/test_shift_array.py tests/test_ssim.py tests/test_cpu_fallback.py — all new GPU tests pass
  • Verify CPU-only fallback works when PyTorch/CUDA is unavailable
  • Run end-to-end registration on a tiled dataset with GPU enabled
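The compute_ssim path covered by tests/test_ssim.py uses conv2d for the local statistics, per the commit notes. A minimal CPU sketch of that approach is below; the window size is assumed, and `_SSIM_K1`/`_SSIM_K2` are set to the standard SSIM constants since the PR's exact values are not shown here.

```python
# Sketch of SSIM via uniform-window conv2d local statistics.
# Window size and constant values are assumptions, not the PR's code.
import numpy as np
import torch
import torch.nn.functional as F

_SSIM_K1, _SSIM_K2 = 0.01, 0.03  # standard SSIM constants (assumed values)

def compute_ssim_torch(a, b, win: int = 7, data_range: float = 1.0,
                       device: str = "cpu") -> float:
    """Mean SSIM using a uniform window; conv2d computes local means/variances."""
    x = torch.as_tensor(a, dtype=torch.float32, device=device)[None, None]
    y = torch.as_tensor(b, dtype=torch.float32, device=device)[None, None]
    kernel = torch.ones(1, 1, win, win, device=device) / (win * win)
    mu_x = F.conv2d(x, kernel)
    mu_y = F.conv2d(y, kernel)
    var_x = F.conv2d(x * x, kernel) - mu_x ** 2
    var_y = F.conv2d(y * y, kernel) - mu_y ** 2
    cov = F.conv2d(x * y, kernel) - mu_x * mu_y
    c1 = (_SSIM_K1 * data_range) ** 2
    c2 = (_SSIM_K2 * data_range) ** 2
    ssim_map = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return float(ssim_map.mean())
```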

🤖 Generated with Claude Code

hongquanli and others added 18 commits March 7, 2026 04:56
Consolidates PRs Cephla-Lab#4-8 into a single feature branch:

- phase_cross_correlation: GPU FFT via torch.fft (~46x speedup)
- shift_array: GPU grid_sample for subpixel shifts (~6.7x speedup)
- match_histograms: GPU sort/quantile mapping (~13.3x speedup)
- block_reduce: GPU avg_pool2d (~4x speedup)
- compute_ssim: GPU conv2d for local statistics (~6.4x speedup)

All functions include automatic CPU fallback when CUDA is unavailable.
Replaces cupy/cucim dependency with PyTorch for broader compatibility.
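The grid_sample-based subpixel shift can be sketched roughly as below. This runs on CPU by default so it works anywhere; the PR's actual implementation targets CUDA and also handles dtype preservation, which is omitted here.

```python
# Sketch of subpixel shift via bilinear grid_sample.
# Illustrative only; the PR's real function signature may differ.
import numpy as np
import torch
import torch.nn.functional as F

def shift_array_torch(image: np.ndarray, shift_vec, device: str = "cpu") -> np.ndarray:
    """Shift a 2D array by (dy, dx) pixels: output[y, x] = input[y - dy, x - dx]."""
    dy, dx = shift_vec
    t = torch.as_tensor(image, dtype=torch.float32, device=device)[None, None]
    _, _, H, W = t.shape
    ys = torch.arange(H, dtype=torch.float32, device=device)
    xs = torch.arange(W, dtype=torch.float32, device=device)
    # sample locations: each output pixel reads from the input at (y - dy, x - dx)
    gy, gx = torch.meshgrid(ys - dy, xs - dx, indexing="ij")
    # normalize to [-1, 1]; grid_sample expects the last dim ordered (x, y)
    gx = 2 * gx / (W - 1) - 1
    gy = 2 * gy / (H - 1) - 1
    grid = torch.stack((gx, gy), dim=-1)[None]
    out = F.grid_sample(t, grid, mode="bilinear", align_corners=True)
    return out[0, 0].cpu().numpy()
```

Out-of-bounds samples at the borders are zero-filled by grid_sample's default padding, analogous to constant-mode boundary handling.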

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add type hints to phase_cross_correlation, shift_array, match_histograms, block_reduce, compute_ssim
- Add return type hints to to_numpy and to_device
- Import Callable, Any, Union from typing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix incorrect pixel assignment in _match_histograms_torch
  The previous code used unnecessary indexing that permuted results incorrectly
- Simplify to_device return type from Union[Any, np.ndarray] to Any
- Remove unused Union import
- Add pixel-by-pixel test comparing GPU vs skimage results
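The sort-based mapping, including the scatter-by-rank assignment that this fix restores, can be illustrated in NumPy; the PR does the equivalent with torch sort/quantile ops on the GPU. All names here are illustrative.

```python
# Sketch of sort-based histogram matching: each source pixel is replaced by
# the reference value of equal rank. Illustrative, not the PR's code.
import numpy as np

def match_histograms_sort(source: np.ndarray, reference: np.ndarray) -> np.ndarray:
    s_flat = source.ravel()
    order = np.argsort(s_flat, kind="stable")   # ranks of the source pixels
    ref_sorted = np.sort(reference.ravel())
    # interpolate reference quantiles onto the source's rank positions
    ranks = np.linspace(0.0, 1.0, s_flat.size)
    ref_q = np.linspace(0.0, 1.0, ref_sorted.size)
    matched = np.empty_like(s_flat, dtype=ref_sorted.dtype)
    # scatter back by rank: the k-th smallest source pixel gets the k-th quantile
    matched[order] = np.interp(ranks, ref_q, ref_sorted)
    return matched.reshape(source.shape)
```

The scatter `matched[order] = ...` is the step that is easy to get wrong: gathering with `order` instead of scattering permutes the output, which matches the symptom described in the fix.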

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove redundant `torch is not None` check in to_numpy
- Add type hint to _shift_array_torch shift_vec parameter
- Fix shift_array CPU path to compute in float64 for API consistency
  (preserve_dtype=False now returns float on both GPU and CPU paths)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The GPU implementation returns 0.0 for error and phasediff values
since these are not computed. Added notes to docstring to clarify.
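A rough sketch of the torch.fft-based correlation with the 0.0 placeholder returns is below. The `_FFT_EPS` value is assumed, the subpixel (parabolic) refinement is omitted, and the function name is illustrative.

```python
# Sketch: integer-pixel phase cross-correlation via the normalized
# cross-power spectrum. Error/phasediff are 0.0 placeholders, as the
# commit message notes for the GPU path.
import numpy as np
import torch

_FFT_EPS = 1e-10  # stabilizer named in the PR summary; exact value assumed

def phase_cross_correlation_torch(ref, mov, device: str = "cpu"):
    a = torch.as_tensor(ref, dtype=torch.float32, device=device)
    b = torch.as_tensor(mov, dtype=torch.float32, device=device)
    cross = torch.fft.fft2(a) * torch.conj(torch.fft.fft2(b))
    cross = cross / (torch.abs(cross) + _FFT_EPS)  # keep phase, drop magnitude
    corr = torch.fft.ifft2(cross).real
    peak = torch.argmax(corr)  # flattened index of the correlation peak
    py, px = np.unravel_index(int(peak), corr.shape)
    # wrap indices past the midpoint to negative shifts
    dy = py - corr.shape[0] if py > corr.shape[0] // 2 else py
    dx = px - corr.shape[1] if px > corr.shape[1] // 2 else px
    return np.array([dy, dx], dtype=float), 0.0, 0.0
```

The returned vector follows the skimage convention: it is the shift to apply to the moving image to register it onto the reference.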

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

maragall commented Mar 7, 2026

Closing in favor of #13 which is the original PR for this work.

I've rebased the branch onto latest main (zero conflicts) and benchmarked it — see my comment on #13 with results and the rebased branch reference.

@maragall maragall closed this Mar 7, 2026
