Add TopoSense-Bench: A Semantic-Spatial Sensor Scheduling Benchmark #38
Conversation
```diff
@@ -0,0 +1,23 @@
#!/bin/bash
```
Can you follow the template and add an install.sh, which is needed for our integration? Thanks.
```python
os.makedirs(output_dir)

df = pd.DataFrame(results)
df.to_json(
```
Can you refer to https://github.com/sys-intelligence/system-intelligence-benchmark/tree/main/benchmarks/course_exam_bench#output-files to produce result files at different levels? A summary.json is needed.
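To make the requested layout concrete, here is a hedged sketch of writing both per-sample records and an aggregate summary.json. The function name, file names, and field names (`correct`, `accuracy`) are assumptions for illustration, not the benchmark's actual schema:

```python
import json
import os

import pandas as pd


def write_results(results, output_dir):
    # Hypothetical helper: emit two levels of result files, mirroring the
    # layout requested in the review. Names/fields are assumptions.
    os.makedirs(output_dir, exist_ok=True)
    df = pd.DataFrame(results)

    # Detailed, per-sample records.
    df.to_json(os.path.join(output_dir, "details.json"),
               orient="records", indent=2)

    # Aggregate summary.json with top-level metrics.
    summary = {
        "total": len(df),
        "correct": int(df["correct"].sum()),
        "accuracy": float(df["correct"].mean()),
    }
    with open(os.path.join(output_dir, "summary.json"), "w") as f:
        json.dump(summary, f, indent=2)
    return summary
```

The key point is that summary.json stays small and machine-readable while the per-sample file carries the full detail.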
**xuafeng** left a comment:
Thanks for the contributions. I left some comments and also looped our team members for feedback.
`benchmarks/toposense_bench/README.md` (Outdated)
> ## 📊 Overview
>
> - **Source**: Hosted on [Hugging Face](https://huggingface.co/datasets/IoT-Brain-Project/TopoSense-Bench) (seamlessly integrated via the `datasets` library).
It returns a "404". Is that because the dataset is not public yet?
```json
{
  "answer": "sensor_name_here",
  "explanation": "Brief reasoning based on map tags"
}
```
The closing ``` fence is missing here.
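Relatedly, the evaluator can be made tolerant of a missing closing fence when extracting the answer JSON from a model reply. A hedged sketch (`extract_answer` is a hypothetical helper, not the benchmark's actual parser):

```python
import json
import re


def extract_answer(text):
    # Hypothetical helper: pull the answer JSON object out of a model
    # reply, accepting a fenced block with or without its closing ```
    # fence, or a bare {...} span. Not the benchmark's actual parser.
    fence = re.search(r"```(?:json)?\s*(\{.*?\})\s*(?:```|$)",
                      text, re.DOTALL)
    candidate = fence.group(1) if fence else None
    if candidate is None:
        # Fall back to the first brace-delimited span in the reply.
        brace = re.search(r"\{.*\}", text, re.DOTALL)
        candidate = brace.group(0) if brace else None
    if candidate is None:
        return None
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None
```

This keeps scoring robust to minor formatting slips without rewarding malformed answers (a reply with no parseable JSON still yields `None`).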
```python
try:
    # Load the 'topology' configuration.
    # Hugging Face defaults uploaded JSONL files to the 'train' split.
    ds = load_dataset("IoT-Brain/TopoSense-Bench", "topology", split="train")
```
`"IoT-Brain/TopoSense-Bench"` is not aligned with the README's Hugging Face link (`IoT-Brain-Project/TopoSense-Bench`).
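One low-tech guard against this kind of drift is to keep the repo id in a single constant and check it against the documented URL. A hedged sketch (the constant follows the README link; the helper name is mine, and the canonical id should be confirmed before merging):

```python
# Assumed canonical id, taken from the README's Hugging Face URL.
HF_REPO_ID = "IoT-Brain-Project/TopoSense-Bench"


def repo_id_from_url(url):
    """Extract the 'org/name' id from a huggingface.co/datasets/... URL."""
    marker = "huggingface.co/datasets/"
    idx = url.find(marker)
    if idx == -1:
        return None
    return url[idx + len(marker):].strip("/")


# The loader would then use the same constant everywhere:
#     ds = load_dataset(HF_REPO_ID, "topology", split="train")
readme_url = "https://huggingface.co/datasets/IoT-Brain-Project/TopoSense-Bench"
assert repo_id_from_url(readme_url) == HF_REPO_ID
```

With one constant, the README link and `load_dataset` call cannot silently disagree.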
**tareknaser** left a comment:
Thank you for the contribution! I left some comments.
Could you add an entry for the benchmark to the root project README?
This test isn't running in CI right now. Please add it to `.github/workflows/test.yml`.
There’s also ongoing work in another PR to add a Why.md file to each benchmark directory. See the discussion: #21 (comment)
|
Thank you @xuafeng, @Qian-Cheng-nju, and @tareknaser for the constructive feedback! I have pushed the latest changes, which address all the points raised in the review. Here is a summary of the updates:

1. Output Format & Engineering (@xuafeng)
2. Documentation & CI (@tareknaser)
3. Code Fixes (@Qian-Cheng-nju)

I verified the changes locally with a smoke-test script, confirming that the data loading, RAG retrieval, and file-generation logic work as expected. Ready for the next round of review!
The new version looks great to me — thank you very much!
Add TopoSense-Bench: A Semantic-Spatial Sensor Scheduling Benchmark
Introduction
This PR integrates TopoSense-Bench, a rigorous benchmark designed to evaluate Large Language Models (LLMs) on the Semantic-Spatial Sensor Scheduling (S³) problem.
It originates from the ACM MobiCom '26 paper: "IoT-Brain: Grounding LLMs for Semantic-Spatial Sensor Scheduling".
Unlike standard QA tasks, this benchmark requires the LLM to act as an agent that translates high-level user intents (e.g., "Find my backpack lost between the library and the gym") into precise physical sensor node IDs within a large-scale digital twin.
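To make the task contract concrete, here is a toy, fully invented example: the intent comes from the paragraph above, but the node id and the matching helper are illustrative only, and the answer schema follows the JSON format shown in the README excerpt below.

```python
# Toy, invented example of the S^3 task's input/output contract.
# The sensor node id below is hypothetical, not real benchmark data.
intent = "Find my backpack lost between the library and the gym"

expected = {
    "answer": "library_corridor_camera_02",   # hypothetical node id
    "explanation": "Corridor camera covers the library-to-gym route",
}


def is_correct(predicted_id, gold_id):
    # Minimal stand-in for evaluator-style matching: case-insensitive
    # exact match on the node id after trimming whitespace.
    return predicted_id.strip().lower() == gold_id.strip().lower()
```

The benchmark's scoring is stricter and richer than this stand-in, but the shape of the problem (free-text intent in, one concrete node id out) is the same.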
Key Features
- Uses the `datasets` library to load data directly from Hugging Face.
- A `TopologyManager` that simulates a retrieval system. It dynamically fetches relevant building/floor topological data based on the user query, testing the model's ability to reason over long contexts and spatial constraints.
- A `TopoSenseEvaluator` to robustly parse and match sensor node IDs (e.g., `teaching_building_1_camera_03`) against ground truth.

📊 Dataset Statistics
Implementation Details
- Located under `benchmarks/toposense_bench/`.
- Uses `sdk.executor.SimpleExecutor` for LLM calls and `sdk.utils` for configuration management.
- API keys are configured via `env.toml` (template provided).

How to Run
1. Navigate to the benchmark directory: `cd benchmarks/toposense_bench`
2. Install dependencies.
3. Configure `env.toml` with your API key (e.g., `OPENAI_API_KEY`).
4. Run the evaluation script.