"RAM prices are soaring. Games are $70. Your GPU is a supercomputer waiting to be unleashed. Stop consuming pixels—start computing them."
Welcome to CUDA_0. This repository is a collection of high-performance, GPU-accelerated projects designed to push consumer hardware to its absolute mathematical limits.
We are living in an era where "beefy machines" are marketed solely for gaming, while their compute capability remains dormant 90% of the time. This repo is the antidote. We reject the notion of buying hardware just to play; we buy it to crunch.
From simulating relativistic astrophysics to training neural networks from scratch, CUDA_0 is about unlocking the raw TFLOPS sitting under your desk.
Current Status: Live & Optimized
A 4K-capable, CUDA-accelerated simulation of a Schwarzschild black hole. It leverages parallel kernels to compute relativistic physics, gravitational lensing, and Doppler beaming for over 40,000 particles in real-time.
- Tech Stack: Python, Numba, Pygame
- Performance: ~145 FPS @ 1920x1200 (RTX 50-Series)
- Key Feature: Moving from O(n) CPU loops to O(1) parallel GPU execution.
Current Status: Live & Vivid
A massive-scale biological morphogenesis simulation that solves Partial Differential Equations (PDEs) for over 2 million pixels simultaneously. This engine simulates the emergence of complex life-like patterns (mitosis, coral, stripes) from simple chemical rules.
- Tech Stack: Python, Numba, Pygame (Hardware Surface)
- Throughput: ~3.7 BILLION state calculations per second.
- Key Feature: Virtual GPU Camera & Multi-Channel Color Diffusion.
- Iterations:
- V1: Core Engine (Pure Math).
- V2: Vivid Edition (RGB Diffusion).
- V3: Alpha Edition (Variable Intensity & Thickness Control).
This repository will grow into a library of parallel computing experiments. Upcoming modules will focus on:
- Building custom BLAS (Basic Linear Algebra Subprograms) kernels.
- Implementing Matrix Multiplication algorithms (Tiled, Shared Memory) that rival cuBLAS.
- Simulating star clusters or galaxy collisions.
- Moving from
$O(N^2)$ complexity to optimized Barnes-Hut algorithms on the GPU.
- Implementing "Forward Pass" and "Backpropagation" purely in CUDA kernels.
- Building a perceptron from scratch without high-level frameworks like PyTorch.
- Real-time Lattice Boltzmann simulations.
- Visualizing smoke and water flow using GPU grid solvers.
- Writing a path tracer from scratch.
- Utilizing RT Cores for non-gaming applications.
To run most projects in this repo, you will need a standardized "Compute" environment.
The "Golden Standard" Conda Setup:
# 1. Create the environment
conda create -n cuda_0 python=3.10
conda activate cuda_0
# 2. Install the Core Engine (Toolkit & Numba)
conda install cudatoolkit numba numpy scipy
# 3. Install Visualization & System Tools
pip install pygame matplotlib
conda install pywin32 # Windows Hardware AccessGot a "beefy machine" gathering dust?
- Fork this repo.
- Clone your fork.
- Optimize a kernel or add a new simulation.
- Push and create a Pull Request.
Note: We prioritize speed,memory efficiency, and raw math. If it runs on the CPU, it doesn't belong here.
This project is open-source under the MIT License. Use the code. Fork the physics. Burn the GPU.