Skip to content

Conversation

@jean-lucas
Copy link
Contributor

  • Allow passing width/height to dataset item
  • Useful for privacy mode, where we cannot calculate width/height on our side

Comment on lines +252 to +259
def check_items_have_dimensions(dataset_items: Sequence[DatasetItem]):
for item in dataset_items:
has_width = getattr(item, "width")
has_height = getattr(item, "height")
if not (has_width and has_height):
raise Exception(
f"When using privacy mode, all items require a width and height. Missing for item: '{item.reference_id}'"
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi @gatli your script for privacy data upload will break after this change.

But i believe you can already get the image dimensions from the embedding service anyways?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the heads up 🙂

@jean-lucas jean-lucas requested review from gatli and ntamas92 November 14, 2023 12:38
@jean-lucas jean-lucas changed the title add dims for dataset item add width/height for dataset item Nov 16, 2023
Copy link
Contributor

@gatli gatli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we lru_cache this or ideally just use the information passed for the constructor?

Otherwise LGTM 👍

Comment on lines 46 to 48
DATASET_PRIVACY_MODE_KEY = "use_privacy_mode"
DATASET_SLICES_KEY = "slice_ids"
DATASET_USE_PRIVACY_MODE = "use_privacy_mode"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm seeing double 🧑‍🦲 🧑‍🦲 . Four "use_privacy_mode" constants!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops good catch

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where's my laugh? I make good joke! 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you go!

Comment on lines 190 to 200
@property
def use_privacy_mode(self) -> bool:
"""Whether or not the dataset was created for privacy mode."""
if self._use_privacy_mode is not None:
return self._use_privacy_mode
response = self._client.make_request(
{}, f"dataset/{self.id}/use_privacy_mode", requests.get
)[DATASET_USE_PRIVACY_MODE]
self._use_privacy_mode = response
return self._use_privacy_mode # type: ignore

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little vary about using round-trip checks for every append. I think this will introduce more flakiness in our pipelines given how often we're seeing 503's etc.

Ideally we'd return this for the dataset constructor -> then we always have the data locally.
We should the very least cache this locally since this won't ever change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I changed client.create_dataset such that the properties name, is_scene, and use_privacy_mode are saved and dont require round-trip checks

Comment on lines +252 to +259
def check_items_have_dimensions(dataset_items: Sequence[DatasetItem]):
for item in dataset_items:
has_width = getattr(item, "width")
has_height = getattr(item, "height")
if not (has_width and has_height):
raise Exception(
f"When using privacy mode, all items require a width and height. Missing for item: '{item.reference_id}'"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the heads up 🙂

pyproject.toml Outdated
[tool.poetry]
name = "scale-nucleus"
version = "0.16.7"
version = "0.16.9"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick, as I see, version 0.16.8 has not been released so maybe it would make sense to combine these changes into that version instead of creating a new one?

@jean-lucas
Copy link
Contributor Author

Can we lru_cache this or ideally just use the information passed for the constructor?

Opted to using the info passed during client.create_dataset.
Round trips are still necessary when using get_dataset, but we can fix that at a later point.

@jean-lucas jean-lucas merged commit 4aadb82 into master Nov 16, 2023
@jean-lucas jean-lucas deleted the jean-lucas-compute-dims-on-privacy branch November 16, 2023 12:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants