This repository contains the raw scraped data for the SpecsQA dataset used in the DualGraph project.
The dataset consists of scraped HTML product pages from the Samsung UK store, packaged as split tar.xz files in the scraped_data/ directory.
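
The part names are not listed here, so the following is only a minimal Python sketch of reassembling and unpacking the archive, under stated assumptions. It assumes the parts are raw byte slices of one `.tar.xz` file (for example, produced with GNU `split`) and that their file names sort lexicographically into the original order. The glob pattern and output paths are hypothetical placeholders, so adjust them to match the actual files in `scraped_data/`.

```python
import glob
import shutil
import tarfile
from pathlib import Path


def join_parts(pattern: str, joined_path: str) -> list:
    """Concatenate split archive chunks into one file, in sorted-name order.

    Assumption: the chunks are raw byte slices of a single .tar.xz whose
    names sort into the original order (e.g. GNU split suffixes .aa, .ab, ...).
    joined_path must not itself match `pattern`.
    """
    parts = sorted(glob.glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no archive parts match {pattern!r}")
    with open(joined_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream each chunk so large archives are never held in memory.
                shutil.copyfileobj(src, out)
    return parts


def extract_archive(joined_path: str, dest: str) -> None:
    """Decompress (xz) and unpack the joined tar archive into `dest`."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    # Use the safer "data" extraction filter where this Python version supports it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(joined_path, mode="r:xz") as tar:
        tar.extractall(dest, **kwargs)
```

For example, with hypothetical names: `join_parts("scraped_data/*.tar.xz.*", "specsqa_pages.tar.xz")`, then `extract_archive("specsqa_pages.tar.xz", "specsqa_pages")`. Writing the joined archive to disk, rather than buffering it in memory, keeps memory use flat for large scrapes. If the parts turn out to be independent archives instead of byte slices, each one can be opened directly with `tarfile.open(part, "r:xz")`.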
Related repositories:
- DualGraph_scraping: code used to scrape this dataset (https://github.com/SamsungLabs/DualGraph_scraping)
- DualGraph: main project, including raw-data preprocessing and evaluation code (https://github.com/SamsungLabs/DualGraph)