The schema should be expanded to allow the addition of dataset metadata. At minimum, this should include:
- timestamp for dataset production
- version of the BERtron schema that it is compatible with
- data source URL
- version of that data source (if available/applicable)
- contact info for the dataset (email address)
Suggested structure:
meta:
  timestamp: 2025-09-17T18:29:01Z
  bertron_schema_version: 0.12.0
  data_source: https://data-source.com
  version: 1.2.3 # optional
  contact: curators@data-source.com
data:
  # data goes here
Alternatively, the metadata could be supplied in a separate file, but it's probably best to keep everything together.
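As a rough illustration of how a consumer might check such a block, here is a minimal Python sketch. It assumes the `meta` mapping has already been loaded into a plain dict (for example by a YAML parser) and that the schema-version key is spelled `bertron_schema_version` to match the schema name. `DatasetMeta` and `parse_meta` are hypothetical names and not part of BERtron, and the checks are deliberately loose.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Mapping, Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class DatasetMeta:
    """Hypothetical container for the proposed `meta` block."""

    timestamp: datetime
    bertron_schema_version: str  # assumed key spelling
    data_source: str
    contact: str
    version: Optional[str] = None  # data source version, optional


def parse_meta(raw: Mapping[str, Any]) -> DatasetMeta:
    """Validate a loaded `meta` mapping and return a DatasetMeta."""
    required = ("timestamp", "bertron_schema_version", "data_source", "contact")
    missing = [key for key in required if key not in raw]
    if missing:
        raise ValueError(f"missing metadata fields: {', '.join(missing)}")

    ts = raw["timestamp"]
    if not isinstance(ts, datetime):
        # datetime.fromisoformat rejects a trailing "Z" before Python 3.11,
        # so rewrite it as an explicit UTC offset first.
        ts = datetime.fromisoformat(str(ts).replace("Z", "+00:00"))

    source = str(raw["data_source"])
    parsed = urlparse(source)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"data_source is not an http(s) URL: {source!r}")

    contact = str(raw["contact"])
    if "@" not in contact:
        # Loose sanity check only, not full email address validation.
        raise ValueError(f"contact does not look like an email: {contact!r}")

    version = raw.get("version")
    return DatasetMeta(
        timestamp=ts,
        bertron_schema_version=str(raw["bertron_schema_version"]),
        data_source=source,
        contact=contact,
        version=None if version is None else str(version),
    )


# Usage with the values from the suggested structure.
example = {
    "timestamp": "2025-09-17T18:29:01Z",
    "bertron_schema_version": "0.12.0",
    "data_source": "https://data-source.com",
    "version": "1.2.3",
    "contact": "curators@data-source.com",
}
meta = parse_meta(example)
```

Leaving `version` optional mirrors the "if available/applicable" wording in the field list. A real implementation would more likely express these constraints in the schema itself than in hand-written checks.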