add width/height for dataset item #409
jean-lucas
commented
Nov 9, 2023
- Allow passing width/height to dataset item
- Useful for privacy mode, where we cannot calculate width/height on our side
| def check_items_have_dimensions(dataset_items: Sequence[DatasetItem]): | ||
| for item in dataset_items: | ||
| has_width = getattr(item, "width") | ||
| has_height = getattr(item, "height") | ||
| if not (has_width and has_height): | ||
| raise Exception( | ||
| f"When using privacy mode, all items require a width and height. Missing for item: '{item.reference_id}'" | ||
| ) |
FYI @gatli, your script for privacy data upload will break after this change.
But I believe you can already get the image dimensions from the embedding service anyway?
Thanks for the heads up 🙂
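For readers following this thread, here is a minimal, self-contained sketch of the kind of check under discussion. `ItemSketch` is a hypothetical stand-in for `DatasetItem` (only the `reference_id`, `width`, and `height` attributes seen in the diff are assumed), and collecting every offending item before raising is a variation for illustration, not what the PR itself does.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class ItemSketch:
    """Hypothetical stand-in for DatasetItem; not the real nucleus class."""

    reference_id: str
    width: Optional[int] = None
    height: Optional[int] = None


def find_items_missing_dimensions(items: Sequence[ItemSketch]) -> List[str]:
    """Return the reference IDs of items that lack a width or a height."""
    return [
        item.reference_id
        for item in items
        if item.width is None or item.height is None
    ]


def check_items_have_dimensions_sketch(items: Sequence[ItemSketch]) -> None:
    """Raise a single error naming every offending item, not just the first."""
    missing = find_items_missing_dimensions(items)
    if missing:
        raise ValueError(
            "When using privacy mode, all items require a width and height. "
            f"Missing for items: {missing}"
        )
```

An upload script that already knows the image dimensions (for example, from another service) could run a check like this locally before appending, so that it fails early with the full list of items to fix.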
gatli
left a comment
Can we lru_cache this or ideally just use the information passed for the constructor?
Otherwise LGTM 👍
nucleus/constants.py
Outdated
| DATASET_PRIVACY_MODE_KEY = "use_privacy_mode" | ||
| DATASET_SLICES_KEY = "slice_ids" | ||
| DATASET_USE_PRIVACY_MODE = "use_privacy_mode" |
I'm seeing double 🧑‍🦲 🧑‍🦲. Four "use_privacy_mode" constants!
oops good catch
Where's my laugh? I make good joke! 😄
Here you go!
| @property | ||
| def use_privacy_mode(self) -> bool: | ||
| """Whether or not the dataset was created for privacy mode.""" | ||
| if self._use_privacy_mode is not None: | ||
| return self._use_privacy_mode | ||
| response = self._client.make_request( | ||
| {}, f"dataset/{self.id}/use_privacy_mode", requests.get | ||
| )[DATASET_USE_PRIVACY_MODE] | ||
| self._use_privacy_mode = response | ||
| return self._use_privacy_mode # type: ignore | ||
|
|
I'm a little wary about using round-trip checks for every append. I think this will introduce more flakiness in our pipelines given how often we're seeing 503s etc.
Ideally we'd return this for the dataset constructor -> then we always have the data locally.
We should at the very least cache this locally since this won't ever change.
Good point. I changed client.create_dataset so that the properties name, is_scene, and use_privacy_mode are saved and don't require round-trip checks.
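The approach discussed in this thread (use the value known at construction time, otherwise fetch once and keep it locally) can be sketched roughly as follows. `DatasetSketch` and the injected `fetch_privacy_mode` callable are hypothetical names used for illustration only; the real client wiring in nucleus may differ.

```python
from typing import Callable, Optional


class DatasetSketch:
    """Hypothetical stand-in for the Dataset class; not the real nucleus API."""

    def __init__(
        self,
        dataset_id: str,
        fetch_privacy_mode: Callable[[str], bool],
        use_privacy_mode: Optional[bool] = None,
    ):
        self.id = dataset_id
        # Assumed injected callable that performs the network request.
        self._fetch_privacy_mode = fetch_privacy_mode
        # Known up front when the caller (e.g. a create call) already has it.
        self._use_privacy_mode = use_privacy_mode

    @property
    def use_privacy_mode(self) -> bool:
        # Only hit the network when the value was not supplied at construction,
        # and remember the answer so later appends stay local.
        if self._use_privacy_mode is None:
            self._use_privacy_mode = self._fetch_privacy_mode(self.id)
        return self._use_privacy_mode
```

Because the flag is assumed never to change for a dataset, caching it on the instance avoids one request per append and removes a point of failure when the backend is returning intermittent errors.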
pyproject.toml
Outdated
| [tool.poetry] | ||
| name = "scale-nucleus" | ||
| version = "0.16.7" | ||
| version = "0.16.9" |
Nitpick: as far as I can see, version 0.16.8 has not been released, so maybe it would make sense to fold these changes into that version instead of creating a new one?
Opted to using the info passed during |