ORCS stands for "Optimized Ray-tracing Core Simulation" for Fixed-Radius Nearest Neighbors (FRNN) interactions in 3D space. This repository contains the source code and data sets that we have made available to the community as part of our published research, "Advancing RT Core-Accelerated Fixed-Radius Nearest Neighbor Search".

arXiv article: "Advancing RT Core-Accelerated Fixed-Radius Nearest Neighbor Search", https://arxiv.org/abs/2601.15633

- CMake ≥ 3.20
- A C++17-capable host compiler and NVCC (CUDA Toolkit 11.x or newer recommended)
- NVIDIA GPU with RTX hardware support and a recent driver exposing `libnvidia-ml.so`
- NVIDIA OptiX 7 SDK (`OPTIX_HOME` should point to the SDK root)
- OpenMP-capable host toolchain (for CPU solver parallelism)

Optional but recommended for plotting data:

- Python 3 with `matplotlib`/`pandas` for plotting utilities in `plots/`
- `clang-format` for maintaining code style (`clang-format` configuration included)

Configure and build from the repository root:

    mkdir -p build
    cd build
    cmake .. -DCMAKE_BUILD_TYPE=<Release|Debug> -DOPTIX_HOME=/opt/optix-sdk
    make -j

Notes:

- `OPTIX_HOME` must reference the OptiX 7 SDK installation root so that headers and libraries can be located.
- The build system auto-detects the local GPU SM architecture via `nvidia-smi`; override with `-DCMAKE_CUDA_ARCHITECTURES=86` (or similar) for reproducible deployments.
- NVML is required at link time. Ensure your NVIDIA driver install provides `libnvidia-ml.so` (usually in `/usr/lib/x86_64-linux-gnu/`).

All options are set during configuration: boolean options with `-D<OPTION>=ON/OFF`, value options such as `DELTA_TIME` with `-D<OPTION>=<value>`.

| Option | Default | Description |
|---|---|---|
| `USE_DOUBLE_PRECISION` | `ON` | Switches real typedefs to `double`. Disable for faster single-precision runs. |
| `USE_PERIODIC_BOUNDARY` | `OFF` | Compiles with periodic boundary conditions as the default. Runtime flag `--border_type` can still override per run. |
| `LOGRTXTIME` | `OFF` | Prints raw iteration timings from the RTX solver instead of the formatted summary. |
| `MEASURE_POWER` | `OFF` | Enables NVML/CPU power monitors. Requires NVML support and sufficient privileges. |
| `LOG_INTERACTIONS` | `OFF` | Persists per-iteration interaction counts to `interaction_stats.csv`. |
| `DELTA_TIME` | `0.001` | Compile-time constant injected into solvers (useful for physics tuning). |
| `CPU_NATIVE` | `OFF` | Adds `-march=native` (or equivalent) to host compilation for maximum CPU performance. |
| `RTXNEIGHBORS_OPTIX_ARCH` | empty | Overrides the OptiX PTX `-arch` (e.g. `compute_90`). Useful when targeting different GPUs than the build host. |

Re-configure with new flags to rebuild, e.g.:

    cmake .. -DCMAKE_BUILD_TYPE=<Release|Debug> -DOPTIX_HOME=/opt/optix-sdk -DCPU_NATIVE=ON -DUSE_DOUBLE_PRECISION=OFF
    make -j

From the build directory:

    ./ORCS <method> <n_particles> <max_neighbors> [options]

`<method>` is one of the solver IDs below:

| ID | Solver | Description |
|---|---|---|
| `0` | CPU | Baseline CPU neighbor search with OpenMP parallelism. |
| `1` | GRID | Uniform grid acceleration structure on the GPU. |
| `2` | RTX | OptiX-based neighbor search. |
| `3` | RTX Physics | RTX solver with physics computed in OptiX shaders. |
| `4` | RTX Payload | Variant using OptiX payload for neighbor data transport. |
| `5` | NaiveGPU | Reference CUDA implementation (no spatial acceleration). |
| `6` | GridV2 | Experimental grid-based GPU solver. |
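
To make concrete the problem that all of these solvers address, the following is a minimal Python sketch of fixed-radius neighbor search. It shows a brute-force reference (every pair is tested, with no spatial acceleration) next to a uniform-grid variant (only adjacent cells are tested). This is not ORCS code: the function names, the single shared `radius`, and the way `max_neighbors` truncates each neighbor list are assumptions made for illustration.

```python
import itertools
import math
from collections import defaultdict


def brute_force_frnn(points, radius, max_neighbors):
    """Test every pair; keep at most max_neighbors indices per point."""
    r2 = radius * radius
    result = []
    for i, p in enumerate(points):
        found = []
        for j, q in enumerate(points):
            if i != j and sum((a - b) ** 2 for a, b in zip(p, q)) <= r2:
                found.append(j)
        result.append(found[:max_neighbors])
    return result


def grid_frnn(points, radius, max_neighbors):
    """Bin points into cubic cells of side `radius`; test only the 27 adjacent cells."""
    r2 = radius * radius
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        cells[tuple(math.floor(c / radius) for c in p)].append(idx)

    result = []
    for i, p in enumerate(points):
        home = tuple(math.floor(c / radius) for c in p)
        found = []
        for offset in itertools.product((-1, 0, 1), repeat=3):
            cell = tuple(h + o for h, o in zip(home, offset))
            for j in cells.get(cell, ()):
                if i != j and sum((a - b) ** 2 for a, b in zip(p, points[j])) <= r2:
                    found.append(j)
        # Sort so the truncation matches the brute-force ordering.
        result.append(sorted(found)[:max_neighbors])
    return result
```

Because the cell side equals the search radius, any two points within `radius` of each other lie in the same or in adjacent cells, so the grid variant returns the same neighbor lists while testing far fewer pairs on large inputs.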
| Flag | Default | Purpose |
|---|---|---|
| `-c, --cutoff_radius <factor>` | `2.5` | Multiplier applied to particle radii when searching neighbors. |
| `-p, --positions <g\|u\|n>` | `g` | Particle placement: grid, uniform random, or normal mixture. |
| `--pp1 <value>` | `1.0` | Extra parameter for position generator (stddev for normal, jitter scale otherwise). |
| `--pp2 <value>` | `1` | Additional shape parameter (e.g. number of normal "bells"). |
| `-r, --radius_distribution <u\|n\|l>` | `u` | Radius distribution (uniform, normal, lognormal). |
| `--rmin/--rmax <value>` | `100` | Minimum and maximum particle radii before applying the cutoff factor. |
| `--pr1/--pr2 <value>` | `1.0` | Distribution-specific radius parameters (mean/stddev). |
| `-i, --iterations <count>` | `10` | Number of benchmark iterations to execute. |
| `-s, --seed <int>` | `0` | RNG seed for reproducible particle sets. |
| `-o, --output_file <path>` | empty | Writes final positions/velocities to the specified file. |
| `-v, --verbose <0\|1\|2>` | `0` | Runtime logging level (requires DEBUG build for detailed prints). |
| `-m, --use_morton` | off | Enables Morton ordering before neighbor search. |
| `-x, --bvh_rebuild_scheme <0-4>` | `0` | Selects BVH rebuild strategy (see below). |
| `-w, --window <int>` | `10` | Sliding window size for rebuild heuristics. |
| `--border_type <0\|1>` | `0` | `0` reflective walls, `1` periodic domain. |
| `-t, --nthreads <int>` | `1` | OpenMP thread count for CPU solver. |
| `-d, --dev <id>` | `0` | CUDA device index to target. |
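
For readers unfamiliar with the Morton ordering enabled by `-m`, here is a small Python sketch of the general technique: quantize each position onto an integer lattice, interleave the bits of the three coordinates into a single key, and sort particles by that key so that spatially close particles tend to sit close in memory. How ORCS quantizes positions, how many bits it uses, and which axis occupies which bit are not stated in this README, so `morton3d`, `morton_order`, and the 10-bit default are illustrative assumptions.

```python
def morton3d(ix, iy, iz, bits=10):
    """Interleave the low `bits` bits of three integers (x lowest, z highest)."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (3 * b)
        code |= ((iy >> b) & 1) << (3 * b + 1)
        code |= ((iz >> b) & 1) << (3 * b + 2)
    return code


def morton_order(points, bits=10):
    """Return particle indices sorted by the Morton key of their quantized position."""
    lo = [min(p[a] for p in points) for a in range(3)]
    hi = [max(p[a] for p in points) for a in range(3)]
    top = (1 << bits) - 1  # largest lattice coordinate per axis

    def key(p):
        q = []
        for a in range(3):
            extent = hi[a] - lo[a]
            # Degenerate axes (all points share a coordinate) map to 0.
            t = (p[a] - lo[a]) / extent if extent > 0 else 0.0
            q.append(int(t * top))
        return morton3d(q[0], q[1], q[2], bits)

    return sorted(range(len(points)), key=lambda i: key(points[i]))
```

A caller would then permute the position, velocity, and radius arrays with the returned index list before running the neighbor search.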

BVH rebuild schemes (`-x`): `0`=FIXED, `1`=BASIC, `2`=TOTAL_AVG, `3`=LAST_AVG, `4`=DERIVATIVE (mapped to `rtxneighbors::OptimizerType`).

    ./ORCS 2 500000 128 -c 3.0 -p u --rmin 50 --rmax 120 -i 20 -m -x 2 --border_type 1 -d 0

This runs the RTX solver (method `2`) on 500k particles with uniformly random positions, Morton ordering, periodic boundaries, and the running-average (`TOTAL_AVG`) BVH rebuild heuristic.
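
When scripting benchmark sweeps, it can help to assemble invocations like the one above programmatically. The sketch below builds the argument list from the documented positional arguments and flags. The `Method` and `RebuildScheme` enums and the `orcs_command` helper are hypothetical names introduced here; they only mirror the numeric IDs listed in this README and are not part of ORCS.

```python
from enum import IntEnum


class Method(IntEnum):
    """Solver IDs as listed in the solver table (names are illustrative)."""
    CPU = 0
    GRID = 1
    RTX = 2
    RTX_PHYSICS = 3
    RTX_PAYLOAD = 4
    NAIVE_GPU = 5
    GRID_V2 = 6


class RebuildScheme(IntEnum):
    """Values accepted by -x / --bvh_rebuild_scheme."""
    FIXED = 0
    BASIC = 1
    TOTAL_AVG = 2
    LAST_AVG = 3
    DERIVATIVE = 4


def orcs_command(method, n_particles, max_neighbors, *, cutoff=2.5,
                 iterations=10, scheme=RebuildScheme.FIXED, border_type=0,
                 morton=False, device=0, binary="./ORCS"):
    """Build an argv list: ./ORCS <method> <n_particles> <max_neighbors> [options]."""
    cmd = [binary, str(int(method)), str(n_particles), str(max_neighbors),
           "-c", str(cutoff), "-i", str(iterations), "-x", str(int(scheme)),
           "--border_type", str(border_type), "-d", str(device)]
    if morton:
        cmd.append("-m")  # -m is a switch and takes no value
    return cmd


# One command per rebuild scheme, everything else held fixed.
sweep = [orcs_command(Method.RTX, 500000, 128, scheme=s) for s in RebuildScheme]
```

Each entry of `sweep` can be passed to `subprocess.run(cmd, check=True)` from the build directory; nothing is executed by the sketch itself.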

- `src/`: core application, solvers, CUDA/OptiX kernels, and benchmarking harness.
- `include/`: third-party single-header dependencies (e.g., `CLI11.hpp`).
- `CMake/`: custom find modules (`FindOptiX7.cmake`, OptiX IR utilities).
- `optixir/`: generated OptiX IR/PTX blobs (populated at build time).
- `plots/`, `results/`, `scripts/`: supporting analysis and automation.
- `build/`: CMake build tree (ignored by version control; shown here for reference).

ORCS is distributed under the terms of the repository's LICENSE file.