[Feature] [Benchmark]: add ZMQ-based FMQ implementation and benchmark tools #5418
Co-authored-by: freeliuzc <lzc842650834@gmail.com>
* [RL] Support Rollout Routing Replay * add routing indices cache * fix config bug and moe forward bug * R3 Support GLM * support eb4.5 * fix merge bug * Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * add routing replay ci * support glm topk * support orther top_k * fix ci bug * pre-commit * only support chatcmpl --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Yuanle Liu <yuanlehome@163.com>
…5374) * fix multi-inputs * fix threshold * fix threshold * fix
* [RL] Support Rollout Routing Replay * add routing indices cache * fix config bug and moe forward bug * R3 Support GLM * support eb4.5 * fix merge bug * Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * add routing replay ci * support glm topk * support orther top_k * fix ci bug * pre-commit * only support chatcmpl * Revert "Revert "[RL] Support Rollout Routing Replay (#5321)" (#5402)" This reverts commit c45e064. * Fix XPU and NPU bug --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Yuanle Liu <yuanlehome@163.com>
Thanks for your contribution!
Pull request overview
This PR introduces a ZeroMQ-based Fast Message Queue (FMQ) implementation for FastDeploy, providing high-performance IPC/TCP messaging with shared-memory optimization for large payloads. It includes a factory pattern for standard queue communication between API server, engine, and worker components, along with comprehensive benchmarking tools.
Key Changes:
- Implemented FMQ core with PUSH/PULL and PUB/SUB patterns, shared-memory fast path for large messages (>1MB), and async message handling
- Added FMQFactory to standardize queue creation for inter-component communication (api↔engine↔worker)
- Created benchmark suite to measure latency, throughput, and bandwidth under various workloads
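The shared-memory fast path listed above can be illustrated with a minimal standard-library sketch. The helper names `pack` and `unpack`, the tuple-shaped frames, and the `SHM_THRESHOLD` constant are assumptions made for illustration; they are not the PR's API, and the ZeroMQ transport that would carry the frame is left out.

```python
from multiprocessing import shared_memory

SHM_THRESHOLD = 1 << 20  # assumed cut-off of 1 MiB, echoing the ">1MB" rule


def pack(payload: bytes, threshold: int = SHM_THRESHOLD):
    """Return ("inline", payload), or ("shm", name, size) for large payloads."""
    if len(payload) <= threshold:
        return ("inline", payload)
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    shm.buf[: len(payload)] = payload
    name = shm.name
    shm.close()  # the segment itself persists until it is unlinked
    return ("shm", name, len(payload))


def unpack(frame) -> bytes:
    """Recover the payload; the reader unlinks the segment after copying."""
    if frame[0] == "inline":
        return frame[1]
    _, name, size = frame
    shm = shared_memory.SharedMemory(name=name)
    try:
        return bytes(shm.buf[:size])
    finally:
        shm.close()
        shm.unlink()
```

Only the small descriptor tuple would travel over the socket for large payloads, which is the reason such a path can reduce copying for big messages.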
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 27 comments.
| File | Description |
|---|---|
| fastdeploy/inter_communicator/fmq.py | Core FMQ implementation with Queue/Topic classes, shared memory support, and ZeroMQ socket management |
| fastdeploy/inter_communicator/fmq_factory.py | Factory providing standardized queue endpoints for api-engine-worker communication flow |
| fastdeploy/envs.py | Added FMQ_CONFIG_JSON environment variable for runtime configuration |
| fastdeploy/utils.py | Added fmq_logger for FMQ-specific logging |
| benchmarks/benchmark_fmq.py | Multi-process benchmark tool measuring latency, throughput, and bandwidth with progress tracking |
| tests/inter_communicator/test_fmq.py | Unit tests for queue operations, shared memory transfers, and pub/sub functionality |
| tests/inter_communicator/test_fmq_factory.py | Unit tests for factory methods, message roundtrip, and context sharing |

    if msg.descriptor:
        msg.payload = msg.descriptor.read_and_unlink()

    self._msg_id += 1

Copilot AI, Dec 8, 2025
The _msg_id is incremented in the consumer's get() method, but this field represents messages sent by the producer. Incrementing it in the consumer is incorrect and serves no purpose. This line should be removed.
Suggested change:

    self._msg_id += 1


    msg = await q.get()
    recv_ts = time.perf_counter()
    if msg is None:
        pbar.write("recv None")

Copilot AI, Dec 8, 2025
Commented-out debug code should be removed before merging to keep the codebase clean.
Suggested change:

    pbar.write("recv None")


    class FMQ:
        _instance = None
        _context = None

        def __new__(cls, config_path="fmq_config.json"):
            if cls._instance is None:
                cls._instance = super().__new__(cls)
                EndpointManager.load_config()

                # Determine IO threads based on global defaults
                io_threads = 1
                if EndpointManager.config.endpoints:
                    # Use max io_threads among all endpoints
                    io_threads = max(ep.io_threads for ep in EndpointManager.config.endpoints.values())

                cls._context = zmq.asyncio.Context(io_threads=io_threads)
            return cls._instance

Copilot AI, Dec 8, 2025
Using a singleton pattern with __new__ is not thread-safe. In a multi-threaded environment, multiple threads could create multiple instances. Consider using a thread-safe singleton pattern with a lock, or better yet, document that FMQ should be instantiated once and reused.
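One way to follow the first suggestion is double-checked locking around instance creation. The sketch below uses a hypothetical `Registry` class in place of `FMQ` and omits the ZeroMQ context entirely; it only illustrates the locking pattern and is not the PR's implementation.

```python
import threading


class Registry:
    """Hypothetical stand-in for FMQ; only the singleton logic is shown."""

    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        # Fast path: skip the lock once the instance exists.
        if cls._instance is None:
            with cls._lock:
                # Re-check inside the lock: another thread may have won the race.
                if cls._instance is None:
                    cls._instance = super().__new__(cls)
        return cls._instance


def create_concurrently(n_threads: int = 16):
    """Construct Registry from many threads and return every instance seen."""
    seen = []
    threads = [
        threading.Thread(target=lambda: seen.append(Registry()))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

Any shared state that callers rely on, such as a messaging context, would need to be fully initialized inside the locked block before the instance is published to `_instance`.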

    async def get(self, timeout: int = None) -> Optional[Message]:
        # Receive data from queue
        if self.role != Role.CONSUMER:
            raise PermissionError("Only consumers can get messages.")

        try:
            if timeout:
                raw = await asyncio.wait_for(self.socket.recv(), timeout / 1000)
            else:
                raw = await self.socket.recv(copy=self.copy)
        except asyncio.TimeoutError:
            fmq_logger.error(f"Timeout receiving message on {self.name}")
            return None

        msg = Message.deserialize(raw)
        if msg.descriptor:
            msg.payload = msg.descriptor.read_and_unlink()

        self._msg_id += 1
        return msg

Copilot AI, Dec 8, 2025
The get() method lacks proper error handling for deserialization failures. If Message.deserialize(raw) fails due to corrupted data, it will raise an uncaught exception. Consider wrapping this in a try-except block and logging the error:

    try:
        msg = Message.deserialize(raw)
    except Exception as e:
        fmq_logger.error(f"Failed to deserialize message on {self.name}: {e}")
        return None

The next reviewed snippet:

    def read_and_unlink(self) -> bytes:
        # Read and cleanup shared memory
        try:
            shm = shared_memory.SharedMemory(name=self.shm_name)
            data = bytes(shm.buf[: self.size])
            shm.close()
            shm.unlink()
            return data
        except FileNotFoundError:
            return b""

Copilot AI, Dec 8, 2025
The error handling path in read_and_unlink() that catches FileNotFoundError lacks test coverage. Add a test case that simulates shared memory cleanup failure to ensure this error path is properly tested.
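A test for this path could look like the sketch below. `Descriptor` is a hypothetical stand-in that follows the quoted `read_and_unlink()` logic so the example runs without FastDeploy; a real test would import the PR's descriptor class instead. The missing segment is simulated by unlinking it before the read.

```python
from dataclasses import dataclass
from multiprocessing import shared_memory


@dataclass
class Descriptor:
    """Hypothetical stand-in following the quoted read_and_unlink() logic."""

    shm_name: str
    size: int

    def read_and_unlink(self) -> bytes:
        try:
            shm = shared_memory.SharedMemory(name=self.shm_name)
        except FileNotFoundError:
            # The segment is already gone: fall back to an empty payload.
            return b""
        data = bytes(shm.buf[: self.size])
        shm.close()
        shm.unlink()
        return data


def test_read_and_unlink_missing_segment():
    # Create a segment, then remove it so the descriptor points at nothing.
    shm = shared_memory.SharedMemory(create=True, size=16)
    desc = Descriptor(shm_name=shm.name, size=16)
    shm.close()
    shm.unlink()
    assert desc.read_and_unlink() == b""


def test_read_and_unlink_existing_segment():
    # Happy path for contrast: the data is returned and the segment removed.
    shm = shared_memory.SharedMemory(create=True, size=4)
    shm.buf[:4] = b"data"
    desc = Descriptor(shm_name=shm.name, size=4)
    shm.close()
    assert desc.read_and_unlink() == b"data"
```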

    send_ts = time.perf_counter()
    await q.put(data={"pid": proc_id, "i": i, "send_ts": send_ts, "payload": payload}, shm_threshold=shm_threshold)
    pbar.update(1)
    # pbar.write(f"send {i}")

Copilot AI, Dec 8, 2025
Commented-out debug code should be removed before merging to keep the codebase clean:

    pbar.update(1)

Suggested change:

    # pbar.write(f"send {i}")


    except asyncio.TimeoutError:
        fmq_logger.error(f"Timeout receiving message on {self.name}")
        return None

Copilot AI, Dec 8, 2025
The timeout error handling path lacks test coverage. Add a test case that triggers a timeout to ensure the error logging and None return work as expected.
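The timeout path can be exercised without a live socket. The sketch below substitutes an `asyncio.Queue` for the ZeroMQ socket and a hypothetical `get_with_timeout` helper for `get()`; it keeps the millisecond-to-second conversion from the quoted code but omits logging and deserialization.

```python
import asyncio
from typing import Optional


async def get_with_timeout(queue: asyncio.Queue, timeout_ms: Optional[int] = None) -> Optional[bytes]:
    """Hypothetical helper: return the next item, or None if the wait times out."""
    try:
        if timeout_ms:
            # wait_for expects seconds, so convert from milliseconds.
            return await asyncio.wait_for(queue.get(), timeout_ms / 1000)
        return await queue.get()
    except asyncio.TimeoutError:
        return None


async def demo():
    q = asyncio.Queue()
    empty = await get_with_timeout(q, timeout_ms=50)  # nothing queued: times out
    await q.put(b"hello")
    ready = await get_with_timeout(q, timeout_ms=50)  # item available: returned
    return empty, ready


def run_demo():
    return asyncio.run(demo())
```

A test against the real queue would follow the same shape: call `get()` with a short timeout on an endpoint that has no producer and assert that the result is `None`.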
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Motivation
This PR introduces a new FMQ (Fast Message Queue) module built on top of ZeroMQ, providing an efficient IPC/TCP messaging layer for FastDeploy. It also includes benchmark utilities for evaluating latency, throughput, and shared-memory performance under different workloads.
Key motivations:
Modifications
1. ZeroMQ-based FMQ Implementation
- `ForkingPickler` for robust IPC payload handling.

2. Unified Configuration System with Dynamic Endpoint Abstraction

- Introduced the `FMQ_CONFIG_JSON` environment variable as the runtime configuration source.
- Removed reliance on static configuration files.
- Added a unified `Endpoint` class that encapsulates:
  - `ipc`/`tcp`
  - `path`/`host:port`
  - `io_threads` override
- Updated `EndpointManager` to construct connection addresses using this unified abstraction.
- Supports per-endpoint configuration to allow flexible and dynamic runtime setups.

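As a rough illustration of how an endpoint abstraction like this can turn configuration into a connection address, here is a minimal sketch. The field names `protocol` and `address`, the `url()` method, and the `load_endpoints` helper are all assumptions made for this example; they are not the PR's `Endpoint` class or the actual `FMQ_CONFIG_JSON` schema.

```python
import json
from dataclasses import dataclass
from typing import Dict


@dataclass
class Endpoint:
    """Hypothetical endpoint record; the field names are assumptions."""

    protocol: str  # "ipc" or "tcp"
    address: str  # filesystem path for ipc, "host:port" for tcp
    io_threads: int = 1

    def url(self) -> str:
        # ZeroMQ addresses take the form "<transport>://<address>".
        if self.protocol not in ("ipc", "tcp"):
            raise ValueError(f"unsupported protocol: {self.protocol}")
        return f"{self.protocol}://{self.address}"


def load_endpoints(raw: str) -> Dict[str, Endpoint]:
    """Parse a JSON object that maps endpoint names to Endpoint fields."""
    return {name: Endpoint(**fields) for name, fields in json.loads(raw).items()}
```

With this shape, the per-endpoint `io_threads` values could be reduced with `max(...)`, in the way the reviewed `FMQ.__new__` snippet does when it sizes the ZeroMQ context.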
FMQ_CONFIG_JSON example:
3. FMQ Benchmark Suite
Added micro-benchmarks covering:
Enables users to profile and validate FMQ performance across scenarios.
Usage
Run FMQ Benchmarks
Benchmark Output Example
Below is an example output from the new FMQ benchmark tool:
Example: Queue Usage
Configure FMQ via Environment Variable
Accuracy
This PR does not modify model logic or inference kernels.
No accuracy changes are introduced.
Checklist
`pre-commit` executed