feat: support blob api in pytorch loader #3217

Merged
eddyxu merged 2 commits into main from lei/pytorch_blob_api on Dec 8, 2024

@eddyxu eddyxu commented Dec 7, 2024

Support handling Blob data in the PyTorch loader.

@eddyxu eddyxu requested a review from chebbyChefNEQ December 7, 2024 14:33
@github-actions github-actions Bot added enhancement New feature or request python labels Dec 7, 2024
@eddyxu eddyxu force-pushed the lei/pytorch_blob_api branch from d6188fa to 8e4082a Compare December 7, 2024 23:49

self._blob_columns = self._blob_columns()
if self._blob_columns:
    self.with_row_id = True
Contributor

Why is this needed?

Member Author

@eddyxu eddyxu Dec 8, 2024

We need the row id to call dataset.take_blobs().
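To illustrate the dependency, here is a minimal sketch built on a hypothetical in-memory stand-in (`FakeBlobDataset` is not the real Lance dataset). It assumes only what the reply says: blobs are fetched by row id through `take_blobs()`, so a scan that omits row ids leaves the loader with nothing to pass. The `take_blobs(row_ids, blob_column)` argument order, the `_rowid` key, and the `load_blobs` helper are assumptions made for illustration.

```python
from typing import Dict, List


class FakeBlobDataset:
    """Hypothetical stand-in for a dataset with a blob column (not the Lance API)."""

    def __init__(self, blobs: Dict[int, bytes]):
        self._blobs = blobs  # row id -> blob payload

    def scan(self, with_row_id: bool) -> List[dict]:
        # A scan returns lightweight rows; blob payloads are not materialized.
        rows = []
        for row_id in sorted(self._blobs):
            row = {"label": row_id % 2}
            if with_row_id:
                row["_rowid"] = row_id
            rows.append(row)
        return rows

    def take_blobs(self, row_ids: List[int], blob_column: str) -> List[bytes]:
        # Blobs are looked up by row id, mirroring the reply above.
        return [self._blobs[i] for i in row_ids]


def load_blobs(ds: FakeBlobDataset, with_row_id: bool) -> List[bytes]:
    """Scan a batch, then fetch its blobs by row id."""
    batch = ds.scan(with_row_id=with_row_id)
    if not all("_rowid" in row for row in batch):
        raise ValueError("row ids are required to fetch blobs")
    return ds.take_blobs([row["_rowid"] for row in batch], "image")
```

With `with_row_id=False` the helper has no ids to hand to `take_blobs()`, which is why the loader turns the flag on whenever blob columns are present.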

Comment thread python/python/lance/torch/data.py Outdated
for col in cols:
    arr: pa.Array = batch[col]

    if isinstance(arr, list) and arr and isinstance(arr[0], lance.BlobFile):
Contributor

Is there a way to check earlier? Like when constructing the loader?

Member Author

Yes, we probably can. One way to do it is to pass more parameters, but that makes a user-specified to_tensor_fn more complicated.
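One possible shape for the earlier check, as a sketch only and not what this PR implements: resolve the blob columns once from the schema when the loader is constructed, so `to_tensor_fn` keeps its existing signature. `Field`, `find_blob_columns`, and `SketchLoader` are hypothetical names, and the `lance-encoding:blob` metadata key is an assumption about how blob fields are marked.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Field:
    """Hypothetical schema field: a name plus string metadata."""

    name: str
    metadata: Dict[str, str] = field(default_factory=dict)


# Assumed marker for blob-encoded fields; verify against the Lance docs.
BLOB_META_KEY = "lance-encoding:blob"


def find_blob_columns(
    schema: List[Field], columns: Optional[List[str]] = None
) -> List[str]:
    """Return the names of selected fields that are marked as blobs."""
    selected = None if columns is None else set(columns)
    return [
        f.name
        for f in schema
        if (selected is None or f.name in selected)
        and f.metadata.get(BLOB_META_KEY) == "true"
    ]


class SketchLoader:
    """Hypothetical loader that decides about blobs at construction time."""

    def __init__(
        self,
        schema: List[Field],
        columns: Optional[List[str]] = None,
        with_row_id: bool = False,
    ):
        self.blob_columns = find_blob_columns(schema, columns)
        # Blob columns force row ids on, as in the diff above.
        self.with_row_id = with_row_id or bool(self.blob_columns)
```

The per-batch `isinstance` check could then be replaced by a lookup in `blob_columns`, at the cost of the loader depending on schema metadata.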

@eddyxu eddyxu merged commit f1c6c3e into main Dec 8, 2024
@eddyxu eddyxu deleted the lei/pytorch_blob_api branch December 8, 2024 15:54