Design doc — GPU SVO ray-marching renderer + hybrid physics for Bevy 0.16 + Avian 0.3

(Deep, implementation-level plan focused on maximum runtime performance, editability, and mechanical constructs like airships/ropes/trains.)

I’ll give (A) a high-level architecture, (B) concrete data structures and GPU layouts, (C) algorithms and GPU/CPU pipelines (generation, ray traversal, edits, syncing), (D) physics integration and collider pipeline for Avian, (E) performance engineering and tuning recipes, and (F) open choices / tradeoffs at the end.

Where useful I cite primary sources and implementation-relevant references (Bevy/WebGPU, Avian, SVO & voxel-hashing literature, NVIDIA GVDB).

Executive summary (one paragraph)

Use a GPU-resident pointerless SVO + brick pool for rendering static terrain (built and maintained on the GPU by compute shaders), and render dynamic, non-grid-aligned objects by ray-marching their local voxel data in object space. Store a sparse, GPU-resident edit overlay for player changes, and keep a CPU chunked voxel grid only for physics/collider generation (Avian). Upload only modified bricks/metadata to the GPU. Ray-march per pixel on the GPU with stackless traversal of the main SVO, using empty-space skipping and coarse distance/mip levels; when a hit voxel has sub-voxel detail, spawn a localized secondary ray into a micro-brick (or procedural micro-SDF) for sub-voxel precision. Mesh generation for physics uses localized greedy meshing plus optional convex decomposition for dynamic objects. Bevy 0.16 provides the WebGPU/compute-shader plumbing and buffer primitives you need; Avian 0.3 provides constraints/colliders and will be your physics backend.

Goals & constraints (explicit)

  • GPU-first rendering: procedural generation, SVO construction, and ray marching in GPU compute/render passes. Limit CPU–GPU transfer to small, incremental uploads (edited bricks, node metadata).
  • Editability: players can add/remove voxels at 1m resolution (and some voxels support sub-voxel content). Edits must appear visually and be reflected in physics quickly.
  • Physics: robust, stable rigid-body + joints + ropes via Avian (CPU), so physics colliders must exist on CPU. This includes dynamic constructs that are not aligned with the world grid.
  • Performance: aim for realtime 60fps on modern desktop GPUs; gracefully degrade on weaker hardware with LOD/temporal tricks.
  • Implementation environment: Bevy 0.16 (WebGPU via wgpu) — compute shaders and dynamic storage buffers are available via bevy render graph / DynamicStorageBuffer / helper crates.

High-level architecture (components)

  1. Renderer (GPU)
    • Pointerless SVO node pool (buffer/texture) + brick pool (3D bricks for leaf data) for static world geometry.
    • Compute shader generator: creates/updates bricks and builds higher-level nodes (reduction).
    • Fullscreen ray-march pass performing a hybrid traversal for static world SVO and dynamic objects.
    • Secondary micro-ray pass for sub-voxel bricks.
    • GPU overlay hash / small-brick map for edits.
  2. Edit subsystem
    • Player edits go into a CPU edit queue (minimal), then are applied on CPU voxel chunk grid for physics and batched for GPU upload to the overlay/brick pool.
  3. Physics (CPU)
    • CPU chunk voxel grid covering an adjustable physics radius around relevant points (players, constructs).
    • Greedy mesher / dual-contouring (configurable) to generate collider meshes; optionally convex decomposition (VHACD or simple heuristic) for complex dynamic constructs. Avian runs on these colliders.
  4. Streaming / paging
    • GPU brick LRU manager and staging upload pipeline; CPU-side world chunk loader for saving/streaming edits.
  5. Persistence
    • Procedural baseline (seed + deterministic rules) + overlay diffs (sparse set of changed bricks/chunks). Save only diffs.

Diagram (text):

Player -> edit -> CPU edit queue
  |-> update CPU physics voxel chunks -> rebuild collider mesh -> Avian (provides world transforms)
  |-> batch -> GPU staging -> update GPU overlay bricks / node metadata / dynamic object transforms
GPU compute: procedural generator (on-demand) + overlay -> brick pool + node pool
GPU render: fullscreen hybrid raymarch -> final pixels
Streaming: disk <-> CPU chunk loader <-> GPU staging

Data structures & memory layout (concrete proposals)

Coordinate & addressing conventions

  • Use a 3D world coordinate system with integer 1m voxels at L0 (leaf unit).
  • For the SVO, use node depth d where each node covers a 2^d × 2^d × 2^d region of L0 voxels. Depth 0 = leaf (1 m). Choose the max depth to cover the world extents (e.g., a 64-bit Morton key gives 21 bits per axis, which is ample). Use signed 64-bit world keys if you want negative coordinates. Use Morton (Z-order) keys for compact locality when needed; a small encoding sketch follows below.
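
As a concrete illustration of the Morton addressing above, here is a minimal Rust sketch (function names like split_by_3 and morton3d are illustrative, not part of the design) of 21-bit-per-axis Z-order encoding; signed world coordinates would first be biased into the unsigned range.

// 3D Morton (Z-order) encoding: 21 bits per axis packed into a 64-bit key.
// Neighbouring voxels end up close together in the key space.

/// Spread the low 21 bits of `v` so each bit is followed by two zero bits.
fn split_by_3(v: u64) -> u64 {
    let mut x = v & 0x1f_ffff; // keep 21 bits
    x = (x | (x << 32)) & 0x001f_0000_0000_ffff;
    x = (x | (x << 16)) & 0x001f_0000_ff00_00ff;
    x = (x | (x << 8)) & 0x100f_00f0_0f00_f00f;
    x = (x | (x << 4)) & 0x10c3_0c30_c30c_30c3;
    x = (x | (x << 2)) & 0x1249_2492_4924_9249;
    x
}

/// Interleave x/y/z into one 63-bit Morton key.
fn morton3d(x: u32, y: u32, z: u32) -> u64 {
    split_by_3(x as u64) | (split_by_3(y as u64) << 1) | (split_by_3(z as u64) << 2)
}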

Brick pool (leaf storage)

  • Brick size: B = 8³ voxels recommended (trade memory vs traversal). Alternative: 16³ if you prefer fewer nodes and more memory per brick. I will use 8³ for examples (512 voxels).
  • Per-voxel data layout (compact): store material id (8 bit) + density/occupancy (8 bit) = 2 bytes/voxel. Add an optional 8-bit normal/flags if needed (3 bytes total). You can also store SDF (16-bit) for smoother normals; that inflates memory.
  • Brick size in bytes = voxels_per_brick × bytes_per_voxel. For B = 8³ and 2 bytes/voxel → 512 × 2 = 1024 bytes per brick (1 KB). Example pool sizing: 100k bricks ≈ 100 MB; 1M bricks ≈ 1 GB (exactly 1,024,000,000 bytes ≈ 0.95 GiB).
  • These are practical numbers for modern GPUs; tune brick size and pool size to your VRAM target (a small sizing sketch follows below).
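
A minimal sketch of these pool-size numbers, assuming the 8³ brick and 2-byte voxel layout above (constant names are illustrative):

const BRICK_DIM: usize = 8;                                        // 8³ voxels per brick
const VOXELS_PER_BRICK: usize = BRICK_DIM * BRICK_DIM * BRICK_DIM; // 512
const BYTES_PER_VOXEL: usize = 2;                                  // material id + density
const BRICK_BYTES: usize = VOXELS_PER_BRICK * BYTES_PER_VOXEL;     // 1024 bytes = 1 KB

/// VRAM needed for a pool of `brick_count` bricks, in bytes.
fn brick_pool_bytes(brick_count: usize) -> usize {
    brick_count * BRICK_BYTES
}

fn main() {
    // ~100 MB for 100k bricks, ~1 GB for 1M bricks, matching the estimates above.
    println!("100k bricks: {} MB", brick_pool_bytes(100_000) / 1_000_000);
    println!("1M bricks:   {} MB", brick_pool_bytes(1_000_000) / 1_000_000);
}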

Node pool (pointerless octree)

  • Each octree node stores:
    • child_mask (8 bits) — which of 8 children exist.
    • child_ptr (24 bits or 32 bits) — index into node pool or brick index for leaf child (align to 32 bits).
    • flags (8 bits) — e.g. has_subbrick, has_overlay, material hint.
    • morton_prefix or LOD info optionally (32 bits).
  • Pack each node into 64 bits (8 bytes), or 128 bits if you need extra metadata (node AABB LOD, skip pointer); 8–16 bytes per node is a good target. Example: 16M nodes × 8 bytes = 128 MB ≈ 122 MiB. (You can store the node pool as a StorageBuffer<u32/u64> on the GPU.) A bit-packing sketch follows below.
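
Because the node pool lives in a storage buffer of plain u32/u64 words, the CPU side needs explicit pack/unpack helpers. The sketch below shows one possible 64-bit layout; field order and widths are a choice, not a fixed format, and bits 24 to 31 are left reserved here.

#[derive(Clone, Copy, Debug, PartialEq)]
struct Node {
    child_mask: u8, // which of the 8 children exist
    flags: u8,      // has_subbrick, has_overlay, material hint, ...
    level: u8,      // node depth
    child_ptr: u32, // index of the first child node, or brick slot for a leaf
}

fn pack_node(n: Node) -> u64 {
    (n.child_mask as u64)
        | ((n.flags as u64) << 8)
        | ((n.level as u64) << 16)
        | ((n.child_ptr as u64) << 32)
}

fn unpack_node(bits: u64) -> Node {
    Node {
        child_mask: (bits & 0xff) as u8,
        flags: ((bits >> 8) & 0xff) as u8,
        level: ((bits >> 16) & 0xff) as u8,
        child_ptr: (bits >> 32) as u32,
    }
}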

Indirection & overlay

  • Indirection table / indirection 3D texture: pointerless octree nodes reference bricks via indices. For GPU coherence, store an indirection texture (3D or flat 1D) mapping brick index → GPU address (or slot in brick buffer). This makes binds simple for shaders.
  • Edit overlay: maintain a sparse GPU hash map from world brick coordinate → overlay brick slot. Implemented as:
    • CPU-managed sparse hash (fast) mirrored to GPU each batch, OR
    • GPU-resident linear-probing hash table using atomic ops (harder but fully GPU-bound). For simplicity and reliability, use the CPU-managed sparse overlay with incremental uploads of a changed-overlay list (see the edit pipeline; sketched below). For a pure-GPU editor you can implement a GPU hash with atomics; see voxel hashing by Nießner et al. for real-time hashed voxel access.
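
A minimal CPU-side sketch of that overlay, assuming 8³ bricks at 2 bytes/voxel: a map keyed by world brick coordinate plus a dirty set that is drained into the batched GPU upload each frame (types and names are illustrative):

use std::collections::{HashMap, HashSet};

type BrickCoord = (i32, i32, i32); // world-space brick coordinate

struct OverlayBrick {
    slot: u32,               // slot in the GPU overlay/brick pool
    voxels: Box<[u8; 1024]>, // 8³ voxels × 2 bytes (material + density)
}

#[derive(Default)]
struct EditOverlay {
    bricks: HashMap<BrickCoord, OverlayBrick>,
    dirty: HashSet<BrickCoord>, // changed since the last GPU flush
}

impl EditOverlay {
    /// Apply a single voxel edit; the overlay brick is created lazily on first edit.
    fn set_voxel(
        &mut self,
        coord: BrickCoord,
        voxel_index: usize,
        material: u8,
        density: u8,
        alloc_slot: impl FnOnce() -> u32, // pulls a slot from the GPU brick free list
    ) {
        let brick = self.bricks.entry(coord).or_insert_with(|| OverlayBrick {
            slot: alloc_slot(),
            voxels: Box::new([0u8; 1024]),
        });
        brick.voxels[voxel_index * 2] = material;
        brick.voxels[voxel_index * 2 + 1] = density;
        self.dirty.insert(coord);
    }

    /// Drain the dirty set into a compact list of (slot, bytes) for the staging upload.
    fn drain_dirty(&mut self) -> Vec<(u32, Vec<u8>)> {
        let coords: Vec<BrickCoord> = self.dirty.drain().collect();
        coords
            .into_iter()
            .filter_map(|c| self.bricks.get(&c).map(|b| (b.slot, b.voxels.to_vec())))
            .collect()
    }
}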

Micro-brick (sub-voxel detail)

  • If block flagged as has_microbrick, its node contains microbrick_index into a microbrick pool. Microbrick choices:
    • SDF sampled at 32³ or 16³ inside the 1m cube (stored as signed distance 16-bit → memory heavy but high quality).
    • Alternatively use 8³ microbricks with higher per-voxel precision/material mapping.
    • You can also use procedural micro-detail (noise + local SDF) to avoid storing micro-bricks except when player edits them.

GPU buffers/textures (summary)

  • NodeBuffer (StorageBuffer, array of nodes) — compact node entries for the static world SVO.
  • BrickBuffer (storage image / storage buffer) — contiguous bricks. Holds data for both static terrain and dynamic objects. For cache-friendly reads a 3D texture array (BrickArray) is convenient — it gives trilinear sampling on hardware and filtering for SDFs.
  • DynamicObjectBuffer (StorageBuffer) — array of structs containing world matrix, inverse world matrix, AABB, and pointer to voxel data for each dynamic object. Updated each frame from physics.
  • OverlayIndexBuffer (small) — mapping of changed brick coords → brick slot. Upload batched.
  • MicrobrickBuffer (if used) — microbrick data.
  • StagingBuffer for uploads (bevy dynamic storage buffer support). Use Bevy’s DynamicStorageBuffer to update small regions efficiently.

GPU pipelines & algorithms (concrete)

  1. Procedural generation & SVO build (GPU)

Goal: build leaf bricks deterministically on GPU for arbitrary tree nodes on demand, then build internal nodes with reduction.

Strategy (proven approach)

  • On-demand leaf generation: when a node at depth D (covering a region of world space) is requested by ray traversal or a streaming request, run a compute shader that fills one or more bricks at leaf resolution using your procedural generator (noise, erosion, cave rules). This is embarrassingly parallel: one workgroup per brick, one thread per voxel. A seeded RNG plus the coordinate transform produces deterministic content. Laine & Karras describe GPU SVO building and ray casting; bottom-up building is common.
  • Reduction for internal nodes: after leaf bricks are available for all 8 children of a parent, launch a small compute reduction that computes child_mask and optionally a coarse distance/occupancy field for the parent (like a 2×2×2 downsample). Build up to the root in log(depth) passes for any set of newly created leaves — this is a parallel prefix/reduction per parent. Avoid rebuilding the whole tree; only rebuild parent nodes affected by new leaf bricks.
  • Streaming/build scheduling: maintain a GPU work-queue (ring buffer) with brick generation requests; the renderer/compute will poll this queue and dispatch generators on-demand (ray march may request more bricks at traversal time, but prefer prefetch based on camera motion).

Implementation notes

  • Brick generator compute: input params (world coordinate, brick index, seed, generation options). Output to BrickBuffer slot allocated from free-list.
  • Node builder compute: takes 8 child masks → writes parent node entry (atomic compare-and-swap or single-writer scheduling).
  • Avoid fine-grained atomics on hot paths — prefer deterministic worker allocation from a single writer or per-frame batched updates.
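
To make the on-demand generation step concrete, here is a CPU-side Rust reference of what the brick-generator compute pass does per brick: one deterministic fill from world coordinates and a seed. The hash-based heightfield is a toy stand-in for the real rules (noise, erosion, caves), and all names are illustrative.

const BRICK_DIM: i32 = 8;

/// Small integer hash → [0, 1). Deterministic in (x, z, seed).
fn hash01(x: i32, z: i32, seed: u32) -> f32 {
    let mut h = (x as u32).wrapping_mul(0x85eb_ca6b)
        ^ (z as u32).wrapping_mul(0xc2b2_ae35)
        ^ seed;
    h ^= h >> 16;
    h = h.wrapping_mul(0x7feb_352d);
    h ^= h >> 15;
    (h as f32) / (u32::MAX as f32)
}

/// Toy terrain height (in voxels) at world column (x, z).
fn terrain_height(x: i32, z: i32, seed: u32) -> i32 {
    (hash01(x.div_euclid(16), z.div_euclid(16), seed) * 32.0) as i32
}

/// Fill one 8³ brick whose world-space origin is (bx, by, bz), in voxels.
/// Output layout: 2 bytes per voxel (material id, density), x-fastest order.
fn generate_brick(bx: i32, by: i32, bz: i32, seed: u32) -> Vec<u8> {
    let mut out = vec![0u8; (BRICK_DIM * BRICK_DIM * BRICK_DIM * 2) as usize];
    for z in 0..BRICK_DIM {
        for y in 0..BRICK_DIM {
            for x in 0..BRICK_DIM {
                let (wx, wy, wz) = (bx + x, by + y, bz + z);
                let solid = wy <= terrain_height(wx, wz, seed);
                let i = ((z * BRICK_DIM * BRICK_DIM + y * BRICK_DIM + x) * 2) as usize;
                out[i] = if solid { 1 } else { 0 };       // material id
                out[i + 1] = if solid { 255 } else { 0 }; // density / occupancy
            }
        }
    }
    out
}

The GPU version ports directly to a compute shader: dispatch one workgroup per requested brick and write the bytes into the allocated BrickBuffer slot.
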
  2. Ray-march traversal (primary)

Key: stackless, pointerless traversal for the world SVO, combined with object-space marching for dynamic, non-grid-aligned constructs.

Algorithm (hybrid world/object-space traversal)

  • For each ray (one shader thread per pixel or small-warp group):
    1. Compute ray origin & dir in world space. Initialize closest_hit to max distance.
    2. Dynamic objects: iterate the active dynamic objects (e.g., airships) in the DynamicObjectBuffer. For each object:
      a. Test the world-space ray against the object's world-space AABB (provided by physics).
      b. If it hits, transform the ray into the object's local space using its inverse world matrix.
      c. Perform a fast ray-march against the object's local voxel data (which can be a simple grid or a small SVO stored in the main BrickBuffer).
      d. If a hit is found in object space, transform the hit point and normal back to world space, and update closest_hit if this hit is nearer.
    3. Static world SVO: march the main world SVO, but only up to the distance of closest_hit.
      a. Start at the root node; use AABB intersection to find t_enter and t_exit.
      b. Descend the levels by computing which child the ray enters first. Instead of recursion, compute the child index and update the current node index = child_ptr + child_index if that child exists.
      c. If the node is absent or indicates empty space (child_mask == 0), skip the whole node by advancing t to that node's t_exit — empty-space skipping.
      d. If the node is a leaf/brick pointer, sample the brick. If an occupied voxel is found, produce a hit; if this hit is nearer than closest_hit, update closest_hit.
    4. The final hit data is in closest_hit. If a hit occurred, proceed to shading.
    5. If the final hit voxel has has_microdetail, spawn a secondary localized ray (see below).
    6. Traversal uses efficient DDA-style stepping between child cells within the SVO.
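
A simplified CPU reference of this hybrid loop is sketched below (Rust, using glam for the math). The fixed-step marches and the two occupancy callbacks are deliberate simplifications standing in for the per-object DDA and the stackless SVO traversal with empty-space skipping; only the two-phase structure (dynamic objects first, then the static world up to closest_hit) is the point.

use glam::{Mat4, Vec3};

struct DynamicObject {
    local_from_world: Mat4, // inverse world matrix, provided by physics
    aabb_min: Vec3,         // world-space AABB
    aabb_max: Vec3,
}

struct Hit { t: f32, position: Vec3 }

/// Ray vs AABB slab test; returns the entry distance on a hit.
fn ray_aabb(origin: Vec3, inv_dir: Vec3, min: Vec3, max: Vec3) -> Option<f32> {
    let t0 = (min - origin) * inv_dir;
    let t1 = (max - origin) * inv_dir;
    let t_enter = t0.min(t1).max_element().max(0.0);
    let t_exit = t0.max(t1).min_element();
    (t_enter <= t_exit).then_some(t_enter)
}

fn trace(
    origin: Vec3,
    dir: Vec3,
    objects: &[DynamicObject],
    object_occupied: impl Fn(usize, Vec3) -> bool, // stand-in for object-local voxel data
    world_occupied: impl Fn(Vec3) -> bool,         // stand-in for the SVO/brick sample
    max_t: f32,
) -> Option<Hit> {
    let inv_dir = dir.recip();
    let mut closest = max_t;
    let mut hit = None;

    // Phase 1: dynamic, non-grid-aligned objects, marched in object space.
    for (i, obj) in objects.iter().enumerate() {
        if ray_aabb(origin, inv_dir, obj.aabb_min, obj.aabb_max).is_none() {
            continue;
        }
        let lo = obj.local_from_world.transform_point3(origin);
        let ld = obj.local_from_world.transform_vector3(dir);
        let mut t = 0.0;
        while t < closest {
            if object_occupied(i, lo + ld * t) {
                closest = t;
                hit = Some(Hit { t, position: origin + dir * t });
                break;
            }
            t += 0.25; // fixed step; the real version uses a per-object DDA
        }
    }

    // Phase 2: static world SVO, marched only up to the nearest hit so far.
    let mut t = 0.0;
    while t < closest {
        let p = origin + dir * t;
        if world_occupied(p) {
            hit = Some(Hit { t, position: p });
            break;
        }
        t += 0.5; // fixed step; the real version skips whole empty nodes
    }
    hit
}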

Memory & cache optimizations

  • Store NodeBuffer as structured StorageBuffer (u32/u64 packs) for cache line efficiency.
  • BrickBuffer as 3D texture array — hardware cache handles brick texels; sampling SDFs benefits from trilinear filtering if used.
  • Use per-node coarse SDF / occupancy mip enabling wider skip distances (like GVDB techniques). GVDB conceptually uses higher-level nodes with distance/opacity to skip large empty volumes efficiently.

Secondary rays for micro-detail

  • When you detect a hit at the 1m voxel level and node flag has_microbrick is true:
    • Transform hit position into the local 0..1 microbrick coordinate system.
    • Launch a secondary short ray march within the microbrick (either in the same shader invocation or schedule a second pass). Because these micro-bricks are local and small, their ray marching is fast and stable.
    • For performance: only do micro-rays for near pixels (distance threshold), or if microbrick contributes to silhouette (sample normal variance test). Limit count per frame with a reservoir or screen-space temporal scheduling.
  3. Shading & GI
  • Do primary shading in-screen-space after ray-hit with:
    • direct light via shadow rays (fast: single-sample occlusion using SVO mip for shadow skip),
    • ambient occlusion via cone tracing / few-sample SVO queries (Crassin et al.), or precomputed voxel-based indirect (mip cone).
  • Use temporal accumulation and reprojection (TAA-like) to reduce per-frame ray work and denoise.
  4. Edit pipeline (player edits)

Edits must update both CPU physics and GPU renderer with minimal transfer.

Flow (per edit):

  1. Player action (add/remove block) recorded in a small CPU edit buffer keyed by voxel coordinate.
  2. Apply to the CPU physics chunk grid: mark the chunk dirty and schedule a greedy-mesh rebuild for that chunk (run on a background thread pool via Bevy tasks). This updates the Avian colliders (see the physics section).
  3. Batched GPU update: group edits into brick-level updates. For each affected brick:
    • If brick exists in GPU overlay: update its brick slot (upload new microbrick or brick contents) via staging buffer write (queue.write_buffer or mapped staging buffer).
    • If brick not present: allocate a brick slot from GPU brick pool free list and upload.
  4. Node updates: if edits change occupancy of bricks, update parent node masks via a small node-builder compute pass (or have CPU write a small list of node updates to the GPU node-builder). Keep the node updates batched to avoid atomics in hot loops.

Practical choice: CPU-managed overlay + incremental upload

  • Maintain the authoritative edit map on CPU (sparse map keyed by brick coordinate) — this is required anyway for physics colliders. Periodically flush a compacted list of changed bricks to GPU each frame or at fixed frequency. This avoids per-edit GPU atomics and is robust cross-platform. (If you insist everything must be GPU-only, build an atomic GPU hash table, but that complexity is high.)
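
A minimal sketch of that batched flush, assuming the brick pool is one storage buffer laid out as slot × BRICK_BYTES and that the changed-brick list is the output of the overlay's drain_dirty() sketched earlier; wgpu's Queue::write_buffer is used directly here (in Bevy this would run in a render-world system against the RenderQueue):

const BRICK_BYTES: u64 = 1024; // 8³ voxels × 2 bytes

/// Upload every changed brick for this frame in one batch of staged writes.
fn flush_changed_bricks(
    queue: &wgpu::Queue,
    brick_buffer: &wgpu::Buffer,
    changed: &[(u32, Vec<u8>)], // (brick slot, brick bytes)
) {
    for (slot, bytes) in changed {
        debug_assert_eq!(bytes.len() as u64, BRICK_BYTES);
        // write_buffer stages the copy; wgpu issues it before the next queue submit.
        queue.write_buffer(brick_buffer, *slot as u64 * BRICK_BYTES, bytes);
    }
}
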
  5. GPU streaming & memory management
  • Maintain a GPU brick LRU with a free list; when camera requests new bricks, schedule generation and/or load from disk overlay. Eviction strategy: least recently used bricks (by last-frame timestamp) and keep bricks inside high-priority region (around camera + physics radius). Generation priority: bricks required for ray traversal path first, then prefetch ahead based on movement vector.
  • Use asynchronous staging: allocate staging buffer on CPU, write modified brick bytes, submit copy to GPU via queue.write_buffer or mapped buffer and let GPU handle copy. Use Bevy’s DynamicStorageBuffer for dynamic offsets and small updates.
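
One way to realize the brick LRU on the CPU side: keep a free list plus a last-used frame stamp per resident slot, and evict the stalest slot that an eviction predicate allows (e.g., outside the camera/physics priority region). Names and policy here are illustrative.

use std::collections::HashMap;

struct BrickResidency {
    free_slots: Vec<u32>,
    last_used: HashMap<u32, u64>, // slot -> last frame the slot was touched
}

impl BrickResidency {
    fn new(pool_size: u32) -> Self {
        Self {
            free_slots: (0..pool_size).rev().collect(),
            last_used: HashMap::new(),
        }
    }

    /// Mark a slot as used this frame (called when traversal or prefetch touches it).
    fn touch(&mut self, slot: u32, frame: u64) {
        self.last_used.insert(slot, frame);
    }

    /// Allocate a slot, evicting the least recently used evictable slot if the pool is full.
    fn allocate(&mut self, frame: u64, evictable: impl Fn(u32) -> bool) -> Option<u32> {
        if let Some(slot) = self.free_slots.pop() {
            self.last_used.insert(slot, frame);
            return Some(slot);
        }
        let victim = self
            .last_used
            .iter()
            .filter(|(slot, _)| evictable(**slot))
            .min_by_key(|(_, used)| **used)
            .map(|(slot, _)| *slot)?;
        self.last_used.insert(victim, frame);
        Some(victim)
    }
}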

Physics & Avian integration (detailed)

Why CPU colliders are required

Avian (like most physics engines) expects CPU-side colliders/meshes and runs its solver on CPU. Avian supports constraints, joints and mesh colliders; use Avian for all rigid-body simulation and constraints (hinges, ropes via distance constraints/XPBD).

Maintain a CPU chunk grid (for physics)

  • Partition the world into physics chunks of size C = 16³ or 32³ voxels (tune). Each physics chunk contains an array of voxels (1 bit occupancy + material id). This is the authoritative data for collider generation.
  • Keep physics chunks in memory for a physics radius around players/active constructs (e.g., 2–4 chunks radius). Outside that radius, you can free chunks and rely on GPU SVO for visuals.
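
A possible shape for one authoritative physics chunk (16³ here), with occupancy packed one bit per voxel plus a parallel material array; the layout is a sketch, not a committed format:

const CHUNK_DIM: usize = 16;
const CHUNK_VOXELS: usize = CHUNK_DIM * CHUNK_DIM * CHUNK_DIM; // 4096 voxels

struct PhysicsChunk {
    occupancy: [u64; CHUNK_VOXELS / 64], // 1 bit per voxel
    material: [u8; CHUNK_VOXELS],        // material id per voxel
    dirty: bool,                         // set on edit; triggers a collider rebuild
}

impl PhysicsChunk {
    fn index(x: usize, y: usize, z: usize) -> usize {
        x + CHUNK_DIM * (y + CHUNK_DIM * z)
    }

    fn is_solid(&self, x: usize, y: usize, z: usize) -> bool {
        let i = Self::index(x, y, z);
        (self.occupancy[i / 64] >> (i % 64)) & 1 == 1
    }

    fn set(&mut self, x: usize, y: usize, z: usize, solid: bool, material: u8) {
        let i = Self::index(x, y, z);
        let (word, bit) = (i / 64, i % 64);
        if solid {
            self.occupancy[word] |= 1u64 << bit;
        } else {
            self.occupancy[word] &= !(1u64 << bit);
        }
        self.material[i] = material;
        self.dirty = true;
    }
}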

Collider mesh generation pipeline

  • For each dirty chunk (after edits):
    1. Run greedy meshing to create an optimized triangle mesh of the surface (fast for axis-aligned voxels). Greedy meshing reduces triangles drastically versus naive per-voxel faces.
    2. Optionally run mesh simplification or convex decomposition depending on intended use:
      • For static terrain colliders: leave as watertight mesh (static collider).
      • For dynamic large constructs (ships), produce a compound collider of convex parts: either simple primitive heuristics (box merging) or run VHACD offline/periodically for high quality convex decomposition.
    3. Feed the mesh into Avian as a collider (Avian supports colliders built from meshes/scene geometries); the Avian API makes collider creation from meshes ergonomic in Bevy.
  • Keep collision mesh LODs: for far-away constructs, use simplified colliders; for active constructs / objects in the physics island, keep full collision detail.
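
A hedged sketch of the hand-off at the end of this pipeline: a dirty chunk's surface mesh is built off the main thread with Bevy's AsyncComputeTaskPool and then turned into an Avian collider. Here greedy_mesh_chunk is a placeholder for the real greedy mesher, and Collider::trimesh is assumed from Avian's collider constructors; verify the exact constructor against the avian3d 0.3 docs.

use avian3d::prelude::Collider;
use bevy::prelude::*;
use bevy::tasks::{AsyncComputeTaskPool, Task};

#[derive(Component)]
struct PendingCollider(Task<Collider>);

/// Placeholder: turn a chunk's voxels into (vertices, triangle indices).
fn greedy_mesh_chunk(_chunk_coord: IVec3) -> (Vec<Vec3>, Vec<[u32; 3]>) {
    // Stand-in output (a single quad); the real mesher emits merged faces.
    let vertices = vec![Vec3::ZERO, Vec3::X, Vec3::X + Vec3::Z, Vec3::Z];
    let indices = vec![[0, 1, 2], [0, 2, 3]];
    (vertices, indices)
}

/// Kick off a background rebuild for a dirty chunk; a separate system polls the
/// PendingCollider task and swaps the finished Collider onto the chunk entity.
fn schedule_collider_rebuild(commands: &mut Commands, chunk_entity: Entity, chunk_coord: IVec3) {
    let task = AsyncComputeTaskPool::get().spawn(async move {
        let (vertices, indices) = greedy_mesh_chunk(chunk_coord);
        Collider::trimesh(vertices, indices) // assumed Avian constructor
    });
    commands.entity(chunk_entity).insert(PendingCollider(task));
}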

Dynamic constructs (airships, trains, spinners)

  • Model constructs as compound rigid bodies: group contiguous blocks that move as one rigid body if they are intended to be welded together. Use Avian constraints (hinge, revolute) for actuated joints.
  • For performance: reduce solver complexity by merging small attached blocks into fewer compound colliders (coarse convex hulls) whenever possible. Avoid thousands of tiny rigid bodies — use hierarchical grouping and breakage logic.

Ropes & chains

  • Simulate ropes via XPBD chains or distance constraints. Avian provides constraints and XPBD support — implement rope as chain of small rigid bodies with distance constraints and reduce resolution for long ropes (LOD for rope simulation). For long ropes, simulate anchored segments with simplified constraints, then visually render detailed rope via spline interpolation.
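
A hedged sketch of the rope-as-chain approach with Avian: spawn a line of small dynamic bodies and link successive pairs with a DistanceJoint. Segment count, spacing, and the sphere collider size are arbitrary, and joint tuning (rest length, compliance, damping) is omitted; check the avian3d 0.3 API for the exact builder methods.

use avian3d::prelude::*;
use bevy::prelude::*;

/// Spawn a rope hanging from `anchor`, made of `segments` small bodies joined
/// by distance constraints; long ropes would use fewer segments (LOD) and be
/// rendered with spline interpolation instead.
fn spawn_rope(commands: &mut Commands, anchor: Entity, start: Vec3, segments: usize) {
    let spacing = 0.25; // metres between segment centres
    let mut prev = anchor;
    for i in 0..segments {
        let pos = start - Vec3::Y * spacing * i as f32;
        let segment = commands
            .spawn((
                RigidBody::Dynamic,
                Collider::sphere(0.05),
                Transform::from_translation(pos),
            ))
            .id();
        // Constrain this segment to the previous one (or to the anchor).
        commands.spawn(DistanceJoint::new(prev, segment));
        prev = segment;
    }
}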

Collision queries from renderer

  • When the renderer needs to know whether a region should display micro-detail (e.g., player edits visible), use the CPU edit map or spatial queries against physics voxel grid. Avoid round-trip GPU→CPU reads.

Performance engineering — key knobs & concrete optimizations

  1. Minimize per-pixel ray steps
  • Aggressively use node-level empty skipping and coarse occupancy / distance fields at internal nodes. A well-built SVO lets you skip thousands of meters of empty space with one test. Laine/GVDB techniques are directly applicable.
  • Use shadow/occlusion caches (voxel mips) for cheap soft shadows and AO.
  2. Reduce divergence & improve memory locality
  • Dispatch ray marching in tiles (e.g., 16×16) and use ray sorting or binning by octree region to improve coherence. Process rays that traverse the same top-level node together.
  • Pack node & brick indices to improve cache lines; align buffers to 128B/256B for GPU cache.
  3. Temporal & spatial reuse
  • Use temporal reprojection (TAA-like) + reservoir sampling so most pixels only fully ray-march infrequently; fill gaps with reprojection and denoising.
  • Use screen-space upscaling (FSR/AMD CAS) or render at half-res then upsample with edge-aware filters for distant camera shots.
  4. Limit micro-ray usage
  • Only allow micro-brick refinement inside a short world distance (e.g., < 30m) or when normal variance indicates silhouette detail. Keep a per-frame budget of micro-ray invocations; do remainder via temporal accumulation.
  5. GPU/CPU sync minimization
  • Batch edits and uploads; avoid per-edit uploads. Send compact “changed brick list” each frame instead of many small writes. Use staging buffers for large uploads. Use Bevy systems to collect changes and submit single combined update. Use DynamicStorageBuffer for efficient dynamic offsets.
  6. Physics performance
  • Keep physics islands small and coarse: prefer compound colliders & convex shapes for moving constructs; only full triangle-mesh collision for static terrain.
  • Use sleeping and broadphase culling aggressively (Avian supports sleeping and spatial queries).
  7. Memory budgets (example)
  • Brick: 8³, 2 bytes/voxel = 1 KB per brick. 200k bricks ≈ 200 MB VRAM (good first target).
  • Node pool: 16M nodes @ 8 bytes ≈ 122 MiB. Balance these against GPU memory target (e.g., allocate 2.5–4 GB for SVO on high-end rigs; smaller pools for mid-range). Use streaming to scale.

Implementation plan & Bevy integration (practical steps / milestones)

Phase 0 — prototypes (short)

  1. Bevy compute plumbing prototype: run a small compute shader that generates a single brick and writes to a storage buffer / texture; render it by sampling the texture in a fullscreen shader. Use bevy_app_compute or the Bevy render graph to dispatch the compute stage.
  2. Simple ray-march over a procedural brick: implement a fullscreen raymarch that samples bricks deterministically (no SVO, single large volume). Confirm performance.

Phase 1 — SVO core

  3. Implement brick pool & NodeBuffer (GPU storage). Design node & brick packing.
  4. Implement on-demand brick generator compute and a minimal node builder (reduction) pass (GPU). Use CPU to enqueue leaf requests.
  5. Implement stackless ray traversal using NodeBuffer + BrickBuffer. Start with small world extents.

Phase 2 — edits & physics integration

  6. Implement CPU physics chunk grid and greedy mesher + Avian collider upload. Test collision with simple dynamic constructs.
  7. Implement edit pipeline: player edits update CPU chunk & enqueue brick uploads to GPU overlay. Implement staging buffer & batched upload to BrickBuffer. Ensure node updates follow.

Phase 3 — sub-voxel & streaming

  8. Add microbrick pool and secondary-ray logic for micro-detail; add LOD policies to limit micro-ray usage.
  9. Implement brick LRU & streaming prefetch based on camera motion.
  10. Add GI enhancements (voxel cone / SVO mips), temporal reprojection, and denoising.

Phase 4 — polish & tools

  11. Add tools: debug views (node visualization, brick occupancy heatmap, overlay map), streamer metrics, inspector for physics islands.
  12. Profiling & platform tuning.

Example pseudocode: Node & brick structs (Rust-like)

// Node (64 bits)
#[repr(C)]
struct Node {
    child_mask: u8,
    flags: u8,
    reserved: u8,
    level: u8,         // node depth (0..255)
    child_ptr: u32,    // index into node array or brick slot
    // total 8 bytes (with alignment)
}

// Brick storage: consecutive bytes for each brick slot.
// brick_slot = brick_index * BRICK_BYTES
// brick bytes interpreted on GPU as material+density per voxel

On GPU, NodeBuffer is storage_buffer, BrickBuffer is storage_texture_3d_array or storage_buffer with computed offsets.

Important citations (relevant references)

  • Bevy 0.16 release & WebGPU compute usage — Bevy 0.16 has new rendering pieces and compute shader capability; use Bevy render graph / DynamicStorageBuffer / helper crates to dispatch compute shaders & buffer updates.
  • Avian 0.3 (Bevy physics) — Avian provides rigid/kinematic bodies, constraints, raycasts and integrates naturally with Bevy; use Avian for joints/ropes.
  • Sparse Voxel Octrees — canonical SVO techniques and GPU build/traversal work (Laine & Karras). Use these algorithms as a foundation for traversal + node packing.
  • GVDB / GPU voxel DB — NVIDIA GVDB shows practical GPU data structures & skip-pointer ideas for efficient voxel traversal and raycasting on GPU. Good design inspiration.
  • Voxel hashing — Nießner et al. for sparse/hashed voxel storage & on-the-fly insertion (useful if you need GPU hash-based overlays).

Open choices & tradeoffs (what to decide later)

I’ll list the remaining decisions that aren’t strictly “one true best” — these are important to pick early because they influence data layout and performance.

  1. Brick size (8³ vs 16³ vs 32³):
    • 8³ → fine-grained updates, smaller wasted memory, more bricks (higher node counts).
    • 16³ → fewer bricks, better ray sample locality, but larger upload granularity on edits.
    • Pick based on expected edit density and GPU cache characteristics.
  2. Per-voxel representation (occupancy vs SDF vs density):
    • Occupancy+material (2 bytes) is smallest and best for blocky worlds.
    • SDF (16-bit) enables smooth normals and better secondary-ray hits but multiplies memory.
    • Hybrid: occupancy at leaf + optional SDF microbrick for flagged voxels.
  3. Edit overlay implementation (CPU-managed mirrored map vs GPU atomic hash):
    • CPU-managed overlay is simpler and easier to sync with physics but has small CPU→GPU transfer cost (batched).
    • Full GPU overlay (atomic hash) avoids CPU writes but is significantly more complex and harder to debug.
  4. Physics collider strategy (mesh vs convex decomposition vs primitives):
    • Static terrain: keep triangle meshes from greedy meshing.
    • Moving constructs: use compound convex hulls for best solver performance.
    • Decide whether to run VHACD online or to use heuristic box merging (runtime cost vs quality tradeoff).
  5. Micro-detail approach (stored microbricks vs procedural micro-SDFs):
    • Stored microbricks give editable, precise detail at the cost of memory.
    • Procedural micro-detail saves memory but makes editability harder.
  6. Raymarch style (full path/lighting vs tiled G-buffer + deferred shading):
    • Full ray-march gives best visual integration but is heaviest.
    • Hybrid: G-buffer at lower cost + screen-space tracing for GI is faster; but more complex to integrate.
  7. How aggressively to push GPU-only approach:
    • The safe pragmatic route is the hybrid: GPU for visuals, CPU for physics+edits (batched).
    • If you truly must avoid CPU involvement, prepare to implement a GPU physics layer — but this is a major engineering project.

Final prioritized checklist (what to build and test first)

  1. Bevy compute + texture/SSBO plumbing prototype (confirm you can dispatch generators and sample the results).
  2. Small brick generator + single-brick raymarch (visual correctness).
  3. NodeBuffer & stackless traversal prototype (single root, a handful of bricks). Use deterministic brick generator.
  4. CPU physics chunk grid + greedy mesher + Avian colliders (make sure physics play is robust).
  5. Edit pipeline: batched CPU edit -> staged upload -> GPU overlay (sync visual & physics).
  6. Performance work: node skipping, temporal reprojection, microbricks budget.
  7. Streaming & LRU.

Short list of immediate risks & mitigation

  • GPU memory blowout: mitigate by choosing a smaller brick size, limiting pool size, and aggressive streaming. Provide a fallback “simpler renderer” (lower quality) when memory is tight.
  • Physics desync / lag: avoid sending per-edit blocking updates to CPU; batch and only rebuild colliders within physics radius. Use placeholder colliders for fast response.
  • Shader divergence and slow raymarch: mitigate with tile-based processing + temporal reuse + per-pixel step budget.
