
Support dataset-cover-image.png upload for datasets metadata --update #969

Merged
jmasukawa merged 2 commits into main from datasets-image-asbolute-filepath
Apr 15, 2026


@jmasukawa (Contributor) commented Apr 13, 2026

Follow-up to #959

Requiring the dataset cover/thumbnail image to be in the same directory as dataset-metadata.json means that if you create a dataset with kaggle datasets create --dir-mode=(zip|tar), the cover image will be included in your dataset files, which the author may not want. I set it up that way because it adheres to the Data Package spec, but it's kind of annoying.

This PR adds support for a specially named file, dataset-cover-image.(png|jpg|jpeg|webp), placed in the same directory as your dataset-metadata.json. It is uploaded as the cover image without being included in your dataset files.

/some/path/dataset-metadata.json (not uploaded)
/some/path/my-files/* (uploaded)
/some/path/dataset-cover-image.png (not uploaded)
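As a rough illustration of the lookup described above, the sketch below checks the metadata directory for a co-located cover image. The function name `find_cover_image` and the order in which extensions are tried are assumptions for illustration, not the CLI's actual implementation.

```python
import os
from typing import Optional

# Extensions named in this PR; the lookup order here is an assumption.
COVER_IMAGE_EXTENSIONS = ("png", "jpg", "jpeg", "webp")


def find_cover_image(metadata_dir: str) -> Optional[str]:
    """Hypothetical helper: return the path of a co-located
    dataset-cover-image.* file, or None if there isn't one.

    Checks next to dataset-metadata.json for each supported extension
    and returns the first file that exists.
    """
    for ext in COVER_IMAGE_EXTENSIONS:
        candidate = os.path.join(metadata_dir, f"dataset-cover-image.{ext}")
        if os.path.isfile(candidate):
            return candidate
    return None
```

For the layout above, `find_cover_image("/some/path")` would return `/some/path/dataset-cover-image.png`.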

We still support a relative path via the "image" property, to adhere to the Data Package spec. This was the first form of the API we shipped for Adaption Labs (and we don't want to break them), and it also covers an edge case where you want to reuse an image from your dataset as the cover image.

Local testing:

Note: this PR also includes some unrelated formatting changes from running ./docker-hatch run lint:fmt.

http://b/500108129

@jmasukawa jmasukawa requested review from goeffthomas and rosbo April 13, 2026 23:31
Comment thread src/kaggle/cli.py
Contributor Author


Changes to this file are from running ./docker-hatch run lint:fmt.

Comment thread src/kaggle/api/kaggle_api_extended.py Outdated
Comment on lines +1965 to +1968
if os.path.isabs(relative_or_absolute_image_file_path):
image_full_path = relative_or_absolute_image_file_path
else:
image_full_path = os.path.join(metadata_file_path, relative_or_absolute_image_file_path)
Contributor Author


This is the main code change in this PR.

We check whether the image file path is absolute instead of assuming it is always relative.
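A standalone version of the check in the snippet above might look like the following. The function name is hypothetical, and it assumes `metadata_dir` is the directory containing dataset-metadata.json; the snippet's `metadata_file_path` is not shown in full, so that is an assumption.

```python
import os


def resolve_image_path(metadata_dir: str, image_path: str) -> str:
    """Hypothetical helper: resolve the "image" property to a full path.

    Absolute paths are used as-is; relative paths (the Data Package
    convention) are interpreted relative to the metadata directory.
    """
    if os.path.isabs(image_path):
        return image_path
    return os.path.join(metadata_dir, image_path)
```

On POSIX, `os.path.join` already discards earlier components when a later one is absolute, so the explicit `isabs` branch mainly makes the intent obvious to readers.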

@rosbo rosbo requested review from stevemessick and removed request for rosbo April 14, 2026 16:32
@stevemessick (Contributor) left a comment


Nice!

@goeffthomas

Requiring the dataset cover / thumbnail image to be in the same dir as dataset-metadata.json means that if you create a dataset with kaggle datasets create --dir-mode=(zip|tar), the metadata image will be included in your dataset files, which the author may not want. I had set it up that way because it adhered to Data Package spec, but it's kind of annoying.

Interesting. Does that mean that the metadata JSON is also baked into the dataset in these cases?

@goeffthomas
Actually, I found the answer. It looks like we exclude it from the upload:

self.DATASET_METADATA_FILE,
So maybe that's a different alternative altogether. We could let users specify a dataset-cover-image.(png|jpg...) that's co-located with the metadata JSON, and then, if it's present, we pull that in to work our magic. WDYT? Then users don't even have to put it in their metadata JSON, right?
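The suggestion amounts to extending the existing upload exclusion so that cover images are skipped the same way as the metadata file. A minimal sketch, assuming a flat listing of the upload folder; the constant and function names are hypothetical, and this is not the CLI's actual upload code.

```python
import os
from typing import List

# Hypothetical names; only the file basenames come from this discussion.
METADATA_FILE = "dataset-metadata.json"
COVER_IMAGE_NAMES = {
    f"dataset-cover-image.{ext}" for ext in ("png", "jpg", "jpeg", "webp")
}


def files_to_upload(folder: str) -> List[str]:
    """List top-level entries in folder that should become dataset files.

    Skips dataset-metadata.json and any dataset-cover-image.* file so
    that neither ends up in the uploaded dataset.
    """
    excluded = {METADATA_FILE} | COVER_IMAGE_NAMES
    return sorted(name for name in os.listdir(folder) if name not in excluded)
```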

@rosbo (Contributor) commented Apr 14, 2026

FYI: When Steve and I discussed this FR before our OOO, we were leaning toward using a co-located file with a specific name for the header image, as Goeff suggested above.

@jmasukawa (Contributor Author) commented Apr 14, 2026

So maybe that's even a different alternative. We could let users specify a dataset-cover-image.(png|jpg...) that's co-located with the metadata JSON, and then if present, we pull that to work our magic. WDYT? Then users don't even have to put it in their metadata JSON, right?

FYI: When Steve and I discussed this FR before our OOO, we were leaning towards using a co-located file with a specific name for the header image as Goeff suggested above.

Ah, that's a smart idea; I wish I'd thought of that before. 😬 🤦

Given that we already supported a relative path for Adaption Labs, I'll update the PR to make dataset-cover-image.png the preferred approach and exclude it from dataset creation, but I'll keep support for the relative path (because we had already shipped a version that supported it).

I'll update the docs to describe the dataset-cover-image.png approach.

@jmasukawa jmasukawa force-pushed the datasets-image-asbolute-filepath branch from b814c9c to ed20d6c Compare April 14, 2026 23:49
@jmasukawa jmasukawa requested review from rosbo and removed request for goeffthomas April 14, 2026 23:50
@jmasukawa (Contributor Author) commented Apr 14, 2026

@rosbo @stevemessick I updated to use this approach, PTAL. Thanks again for the suggestion.

@jmasukawa jmasukawa changed the title from "Support abs filepath for image upload re: datasets metadata --update" to "Support dataset-cover-image.png upload for datasets metadata --update" Apr 14, 2026
@jmasukawa jmasukawa requested a review from stevemessick April 15, 2026 07:18
@jmasukawa jmasukawa merged commit 9c8588b into main Apr 15, 2026
5 checks passed
@jmasukawa jmasukawa deleted the datasets-image-asbolute-filepath branch April 15, 2026 21:20

4 participants