Background
To establish GraphGen as an essential tool for training and evaluation data synthesis, its development roadmap focuses on two core pillars: implementing a robust, multi-dimensional data quality assessment and filtering system to ensure the reliability of generated knowledge graphs, and expanding its architecture to support multi-modal and multi-omics data inputs.
If you'd like to work on one of these tasks, please comment below to claim it and create an issue for the feature you'll be implementing.
Features
1 GraphGen Framework
Pipeline: Reader → KG_Builder → Partitioner → Generator. Data flow: raw corpus → Reader → Splitter → KG_Builder → Partitioner → Generator → training/evaluation data. Related: refactor: Partitioner & Generator #59, Kg builder #58, Refactor KG builder #52
🔍 Data provenance: ensure every record in the final training/evaluation set can be traced back to its original raw corpus through the full pipeline.
2 Multi-Modal & Multi-Omics
🧬 Define node types for non-text data, such as ImageNode, AudioNode, and ProteinNode.
👁️🗨️ Vision–language fusion extraction: use open VLMs to generate "image–caption–entity" triples and write them into the graph. Related: feat: add vqa pipeline #69
🧪 Multi-omics extraction: process genomics, transcriptomics, and proteomics data with automatic alignment of node properties.
3 Data Quality & Curation
📊 Multi-dimensional quality metrics with a unified scoring API
⚖️ Automatic data ratio optimization: dynamically adjust the mixing ratio of different data sources based on quality scores and training feedback to optimize model performance
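The data quality and curation items in this roadmap cover a unified scoring API, quality-based filtering, and data ratio optimization. They could fit together roughly as in the following minimal Python sketch. It is hypothetical and is not GraphGen's actual API. `Record`, `QualityScorer`, `length_score`, `has_question`, `filter_records`, and `mixing_ratios` are invented names, and both metrics are toys. The mixing step is one assumed approach: a softmax over each source's mean score. It uses quality scores only and does not model the training-feedback loop the roadmap mentions.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Record:
    """Hypothetical synthesized sample plus the data source it came from."""
    question: str
    answer: str
    source: str


# A metric maps one record to a score in [0, 1].
Metric = Callable[[Record], float]


def length_score(rec: Record) -> float:
    """Toy metric: longer answers score higher, saturating at 20 words."""
    return min(len(rec.answer.split()) / 20.0, 1.0)


def has_question(rec: Record) -> float:
    """Toy metric: 1.0 if the question is non-blank, else 0.0."""
    return 1.0 if rec.question.strip() else 0.0


@dataclass
class QualityScorer:
    """Hypothetical unified scoring entry point: metrics are registered by
    name and combined into a weighted-average 'overall' score."""
    metrics: Dict[str, Metric] = field(default_factory=dict)
    weights: Dict[str, float] = field(default_factory=dict)

    def register(self, name: str, fn: Metric, weight: float = 1.0) -> None:
        self.metrics[name] = fn
        self.weights[name] = weight

    def score(self, rec: Record) -> Dict[str, float]:
        if not self.metrics:
            raise ValueError("no metrics registered")
        scores = {name: fn(rec) for name, fn in self.metrics.items()}
        total_weight = sum(self.weights.values())
        overall = sum(self.weights[n] * s for n, s in scores.items()) / total_weight
        scores["overall"] = overall
        return scores


def filter_records(records: List[Record], scorer: QualityScorer,
                   threshold: float) -> List[Record]:
    """Keep only records whose overall score reaches the threshold."""
    return [r for r in records if scorer.score(r)["overall"] >= threshold]


def mixing_ratios(records: List[Record], scorer: QualityScorer,
                  temperature: float = 1.0) -> Dict[str, float]:
    """Softmax over each source's mean overall score gives a sampling ratio.
    Assumption: only quality scores are used; training feedback is not
    modeled. Lower temperature favors high-scoring sources more strongly."""
    by_source: Dict[str, List[float]] = {}
    for rec in records:
        by_source.setdefault(rec.source, []).append(scorer.score(rec)["overall"])
    means = {s: sum(v) / len(v) for s, v in by_source.items()}
    exps = {s: math.exp(m / temperature) for s, m in means.items()}
    total = sum(exps.values())
    return {s: e / total for s, e in exps.items()}


if __name__ == "__main__":
    scorer = QualityScorer()
    scorer.register("length", length_score, weight=2.0)
    scorer.register("question", has_question)
    data = [
        Record("What does X bind?", "X binds Y when Z is present in the cell.", "bio"),
        Record("", "ok", "web"),
    ]
    print(filter_records(data, scorer, threshold=0.5))
    print(mixing_ratios(data, scorer))
```

Keeping every metric behind one `score` call means new dimensions (for example factuality or diversity checks) can be added without changing the filtering or mixing code. A softmax is only one plausible way to turn scores into ratios.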
4 Graph Construction
5 Engineering
6 Community Detection & Data Synthesis
7 UX, Docs & Community
8 Others
Further feature ideas are welcome. Feel free to suggest them and join the plan!