Multiverse 2.0 is a next-generation, GPU-accelerated simulator for large-scale AI/LLM training systems, built for high-performance, high-fidelity systems research. In this version, both the system and network layers are fully GPU-accelerated, while CPU-only execution remains supported for compatibility and testing. All components have been re-architected around Data-Oriented Design (DOD) principles to maximize simulation efficiency.
Note: In Multiverse 2.0, the system and network layers run within the same process, eliminating the need for shared memory communication between them.
- Full GPU Acceleration: Both the system simulator and the network simulator leverage GPUs for significant speedups. CPU-only execution remains available for compatibility and testing.
- Unified Data-Oriented Design: The entire simulator is redesigned around DOD principles to optimize data locality and throughput (see the sketch after this list).
- Simplified Architecture: The system and network layers now operate in the same process, removing the need for shared-memory communication.
- Flexible Input Pipeline: Simulation scenarios are defined using Chakra files, further optimized with AIOB for efficient parsing and preprocessing.
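To make the data-oriented point concrete, here is a minimal struct-of-arrays sketch. The types and field names are hypothetical illustrations, not Multiverse's actual data layout:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical struct-of-arrays layout (illustrative only; these are not
// Multiverse's actual types or field names). DOD keeps each field of the
// simulated entities in its own contiguous array, so a pass that updates
// one field streams through dense memory: it vectorizes on CPU and
// coalesces on GPU.
struct FlowStateSoA {
    std::vector<std::uint64_t> bytesRemaining; // one entry per active flow
    std::vector<double>        rateGbps;       // current sending rate
    std::vector<double>        nextEventTime;  // next scheduled event (s)
};

// Advancing every flow's clock touches a single dense array, instead of
// hopping between heap-allocated per-flow objects.
void advanceClocks(FlowStateSoA &flows, double dt) {
    for (std::size_t i = 0; i < flows.nextEventTime.size(); ++i) {
        flows.nextEventTime[i] += dt;
    }
}
```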
- System Simulator: Emulates large-scale LLM training systems, supporting parallelization strategies (TP, DP), collective communication algorithms (e.g., Ring AllReduce; a schedule sketch follows this list), and advanced congestion control (DCQCN). Runs efficiently on both CPU and GPU.
- Network Simulator: Models advanced network topologies (Fattree, BCube, HPN) and high-performance collective communication, with full GPU acceleration and a DOD data layout.
- Flexible Configuration: Input via Chakra files, with AIOB optimization for rapid scenario deployment and reproducibility.
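For orientation, the sketch below prints the communication schedule of the standard Ring AllReduce algorithm (the classic schedule, not Multiverse's code). The payload is split into N chunks; each rank sends 2(N−1) chunk-sized messages in total, which is why the algorithm is bandwidth-optimal for large payloads:

```cpp
#include <cstdio>

// Schedule of the textbook Ring AllReduce among N ranks (an illustration of
// the algorithm itself, not Multiverse's implementation). After N-1
// reduce-scatter steps each rank owns one fully reduced chunk, and N-1
// allgather steps circulate the results around the ring.
int main() {
    const int N = 4; // number of ranks in the ring
    for (int step = 0; step < N - 1; ++step) {            // reduce-scatter
        for (int rank = 0; rank < N; ++rank) {
            int chunk = ((rank - step) % N + N) % N;      // chunk sent this step
            std::printf("RS step %d: rank %d sends chunk %d to rank %d\n",
                        step, rank, chunk, (rank + 1) % N);
        }
    }
    for (int step = 0; step < N - 1; ++step) {            // allgather
        for (int rank = 0; rank < N; ++rank) {
            int chunk = ((rank + 1 - step) % N + N) % N;  // reduced chunk forwarded
            std::printf("AG step %d: rank %d sends chunk %d to rank %d\n",
                        step, rank, chunk, (rank + 1) % N);
        }
    }
    return 0;
}
```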
Ensure you have the dependencies listed in Dependencies.
Fetch the repository (don't forget the --recursive!):
git clone --recursive https://github.com/harnets/multiverse.git
cd multiverse
For Linux and macOS: build the simulator with CMake and Make:
mkdir build
cd build
cmake ..
make -j # Use the number of cores you prefer
cd ..
Set up the Python components of the repository with pip:
pip install -e .
After building, you can run a built-in example to verify your installation:
cd your_directory/multiverse/
bash run.sh
- Linux: Ubuntu 18.04 or newer
  - Other distributions with equivalent or newer kernel / GLIBC versions are also supported
- CMake 3.24 (or newer)
- Python 3.9 (or newer)
- Volta or newer NVIDIA GPU
- CUDA 12.5 (plus appropriate NVIDIA drivers)
  - Linux only: CUDA on Windows lacks certain unified memory features required by Madrona.
If these dependencies are not present, Madrona's GPU backend will be disabled, but you can still use the CPU backend.
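To confirm ahead of time that a machine satisfies the GPU requirement, a small standalone check like the one below works (illustrative, not part of the Multiverse build; compile with `nvcc gpu_check.cu -o gpu_check`):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Report each CUDA device and whether it meets the Volta-or-newer
// requirement (compute capability 7.0+, i.e. sm_70 and above).
int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("No CUDA device found; only the CPU backend is usable.\n");
        return 0;
    }
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        bool ok = prop.major >= 7; // Volta is sm_70
        std::printf("GPU %d: %s (sm_%d%d) -> %s\n", d, prop.name,
                    prop.major, prop.minor,
                    ok ? "meets the requirement" : "too old for the GPU backend");
    }
    return 0;
}
```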
| Category | | | | |
|---|---|---|---|---|
| Strategy/Algorithm | TP ✔️ | DP ✔️ | PP | EP ✔️ |
| Collective Communication | Ring AllReduce ✔️ | Halving-doubling AllReduce | Binary-tree AllReduce | / |
| Topology | Fattree ✔️ | HPN | Rail-optimized | BCube |
| Congestion Control | DCQCN ✔️ | HPCC | Timely | Poseidon |
| Scale-up Network DES | PCIe | NVLink | TTPoE | UALink |
✔️ indicates that the corresponding feature is ready in Multiverse 2.0. Others are planned for future releases.
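DCQCN is the congestion-control scheme marked ready above. The sketch below shows the core of the sender-side (reaction point) rate control it refers to, simplified from Zhu et al. (SIGCOMM 2015); the constants are illustrative defaults, not Multiverse's tuned parameters:

```cpp
// Simplified DCQCN reaction-point rate control (after Zhu et al., SIGCOMM
// 2015). Illustrative constants; not Multiverse's implementation.
struct DcqcnState {
    double rateC = 100.0;  // current sending rate (Gbps)
    double rateT = 100.0;  // target rate (Gbps)
    double alpha = 1.0;    // congestion estimate in [0, 1]
    int stage = 0;         // increase events since the last CNP
};

constexpr double kG   = 1.0 / 256.0; // EWMA gain for alpha
constexpr double kRai = 5.0;         // additive-increase step (Gbps)
constexpr int    kF   = 5;           // fast-recovery stages before additive increase

// Congestion Notification Packet received: cut the rate and remember it.
void onCnp(DcqcnState &s) {
    s.rateT = s.rateC;                   // recovery will aim back at this rate
    s.rateC *= 1.0 - s.alpha / 2.0;      // multiplicative decrease
    s.alpha = (1.0 - kG) * s.alpha + kG; // congestion estimate rises
    s.stage = 0;
}

// Periodic increase event (timer or byte counter) with no CNP seen.
void onIncrease(DcqcnState &s) {
    s.alpha *= 1.0 - kG;                 // congestion estimate decays
    if (s.stage >= kF)
        s.rateT += kRai;                 // additive increase after fast recovery
    s.rateC = (s.rateT + s.rateC) / 2.0; // move halfway toward the target
    ++s.stage;
}
```

The full protocol also decays alpha on its own timer and adds a hyper-increase stage after prolonged quiet periods; both are omitted here for brevity.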
Contributions are welcome! Feel free to report issues or submit pull requests. Please follow our contribution guidelines.
This project is licensed under the MIT License.
