
@ihsaan-ullah ihsaan-ullah commented Oct 25, 2025

Description

This PR implements the updates and new features listed under Release 2 of the datasets issue:

  • Croissant metadata is now embedded in a JSON-LD script on the dataset detail page so that Google Dataset Search can index public datasets.
  • Added a license field to the serializer, fixing the issue where the license was None when a new dataset was created.
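
As a rough illustration of the first bullet, the metadata could be assembled along these lines. This is a minimal sketch, not the code in this PR: the function names `build_croissant_metadata` and `render_jsonld_script` and their parameters are hypothetical, and only the JSON keys are taken from the example script in the hand-testing checklist.

```python
import json
from datetime import datetime, timezone


def build_croissant_metadata(name, description, url, creator_name,
                             published, license_name):
    """Return a Croissant-style JSON-LD dict (hypothetical helper).

    `published` is assumed to be a timezone-aware datetime.
    """
    published_utc = published.astimezone(timezone.utc)
    return {
        "@context": {
            "@language": "en",
            "@vocab": "https://schema.org/",
            "croissant": "https://mlcommons.org/croissant/1.0",
        },
        "conformsTo": "http://mlcommons.org/croissant/1.0",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "creator": {"@type": "Person", "name": creator_name},
        # Render UTC with a trailing "Z", as in the example script.
        "datePublished": published_utc.isoformat().replace("+00:00", "Z"),
        "license": {"@type": "CreativeWork", "name": license_name},
        # Placeholders: the dataset model has no citation/version fields yet.
        "citation": "-",
        "version": 1.0,
    }


def render_jsonld_script(metadata):
    """Wrap the metadata in a <script type="application/ld+json"> tag."""
    payload = json.dumps(metadata, indent=4)
    # Escape "</" so a value containing "</script>" cannot close the tag early.
    # "\/" is a valid JSON escape, so the payload still parses unchanged.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```

Escaping `</` matters because name and description are user-supplied; a template filter that does the same job would be an equally reasonable choice.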

🚨 NOTE

Dataset version and citation are dummy values for now because the dataset model does not yet have these fields. We need to discuss dataset versioning and add the missing fields. I have added this point to Release 3 of the datasets issue.

Issues this PR resolves

A checklist for hand testing

  • Open a dataset detail page from the public datasets page, view the page source, and check that a script similar to the one below is present:
<script type="application/ld+json">
    {
        "@context": {
            "@language": "en",
            "@vocab": "https://schema.org/",
            "croissant": "https://mlcommons.org/croissant/1.0"
        },
        "conformsTo": "http://mlcommons.org/croissant/1.0",
        "@type": "Dataset",
        "name": "Test Dataset",
        "description": "This is a test dataset for testing croissant dataset meta data",
        "url": "http://localhost/datasets/35/",
        "creator": {
            "@type": "Person",
            "name": "ihsan"
        },
        "datePublished": "2025-10-25T17:33:31.239088Z",
        "license": {
            "@type": "CreativeWork",
            "name": "MIT"
        },
        "citation": "-",
        "version": 1.0
    }
</script>
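
The manual check above could also be scripted. The following is a sketch using only the Python standard library; the names `JsonLdExtractor`, `REQUIRED_KEYS`, and `missing_croissant_keys` are hypothetical, and the list of required keys is simply copied from the example script rather than from the Croissant specification.

```python
import json
from html.parser import HTMLParser

# Keys present in the example script above (an assumption, not a spec list).
REQUIRED_KEYS = {
    "@context", "conformsTo", "@type", "name", "description", "url",
    "creator", "datePublished", "license", "citation", "version",
}


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every application/ld+json script."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._chunks)))
            self._in_jsonld = False


def missing_croissant_keys(page_html):
    """Return the sorted REQUIRED_KEYS absent from the page's Dataset block."""
    parser = JsonLdExtractor()
    parser.feed(page_html)
    parser.close()
    datasets = [b for b in parser.blocks
                if isinstance(b, dict) and b.get("@type") == "Dataset"]
    if not datasets:
        raise ValueError("no JSON-LD block with @type 'Dataset' found")
    return sorted(REQUIRED_KEYS - datasets[0].keys())
```

Passing the saved page source of a dataset detail page to `missing_croissant_keys` should return an empty list when the script is rendered as expected.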

Checklist

  • Code review by me
  • Hand tested by me
  • I'm proud of my work
  • Code review by reviewer
  • Hand tested by reviewer
  • CircleCI tests are passing
  • Ready to merge

…pt added to dataset detail page to make it searchable by google dataset search
@ihsaan-ullah ihsaan-ullah mentioned this pull request Oct 25, 2025
15 tasks
@ihsaan-ullah ihsaan-ullah changed the title from "Datasets PR#1 Croissant JSON-LD script added" to "Datasets PR#1 Croissant metadata added" Oct 25, 2025
@ihsaan-ullah ihsaan-ullah changed the title from "Datasets PR#1 Croissant metadata added" to "Datasets PR#2 Croissant metadata added" Oct 25, 2025
ObadaS commented Nov 13, 2025

I tested this on a local instance, and it seems to be working correctly.
This is the content of the script I got:

    {
        "@context": {
            "@language": "en",
            "@vocab": "https://schema.org/",
            "croissant": "https://mlcommons.org/croissant/1.0"
        },
        "conformsTo": "http://mlcommons.org/croissant/1.0",
        "@type": "Dataset",
        "name": "Test Dataset",
        "description": "Test Description",
        "url": "http://localhost/datasets/1974/",
        "creator": {
            "@type": "Person",
            "name": "codabench"
        },
        "datePublished": "2025-11-13T08:27:43.549956Z",
        "license": {
            "@type": "CreativeWork",
            "name": "MIT"
        },
        "citation": "-",
        "version": 1.0
    }
    

@ObadaS ObadaS merged commit 54550ad into develop Nov 13, 2025
1 check passed
@ObadaS ObadaS deleted the datasets_updates branch November 13, 2025 10:25

3 participants