Actually, `for_wsi_classification` should also accept a list of pre-transforms, so we can just add `pre_transforms` in the configs.
That should be generalized the same way the loss factory is
Besides the available test, I've also confirmed that I can extract features with this model and then use them in a classification pipeline as expected.
jonasteuwen left a comment:
Some comments to begin with. I think we could use two functions: `read_region_at_scale` and `read_region_at_level`. The latter can be used, for instance, for features, and the former for images/masks. The former can use the SlideImage backend.
```python
elif data_format == DataFormat.FEATURE:
    size = (num_samples, 1)
    mpp = 1.0
    tile_size = (1, 1)
    tile_overlap = (0, 0)
```
I would make the mpp equal to the product of the tile size and the mpp.
I do not fully understand what you mean here, but these settings are mainly here so that the reader gets them correctly. I can change the mpp as long as the reader also reads at that specific mpp; 1.0 was chosen for ease.
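One way to read the reviewer's suggestion: since each feature vector summarizes a whole tile, the effective spacing of one feature "pixel" is the extraction mpp times the tile size. A minimal sketch of that computation (the function name is illustrative, not from the PR):

```python
def effective_feature_mpp(extraction_mpp: float, tile_size_px: int) -> float:
    """Each feature vector covers one tile, so one feature 'pixel' spans
    tile_size_px image pixels at extraction_mpp microns per pixel."""
    return extraction_mpp * tile_size_px
```

For example, tiles of 256 px extracted at 0.5 mpp would give an effective feature mpp of 128.0, instead of the placeholder 1.0.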
```python
data_format=DataFormat.COMPRESSED_IMAGE if compression != "none" else DataFormat.IMAGE,
color_profile=color_profile,
```
Doesn't 'none' need to be an Enum?
Fixed. I've looked at it, and here the data format should always be IMAGE, because the compression here does not make the image a compressed image.
```python
if self._stitching_mode != StitchingMode.CROP:
    raise NotImplementedError("Stitching mode other than CROP is not supported for features.")
```
You also need to check that the overlap is 0, or at least need to think about what this means for the features, and explain it somewhere.
Added a check. At the moment I don't see how features with overlap should work, so we do not allow it.
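The combined check could look like the following sketch (the enum and parameter names are assumptions based on the quoted diff, not the actual implementation):

```python
from enum import Enum


class StitchingMode(Enum):
    CROP = "crop"
    AVERAGE = "average"


def validate_feature_settings(stitching_mode: StitchingMode, tile_overlap: tuple[int, int]) -> None:
    """Reject reader settings that are undefined for stacked feature vectors."""
    if stitching_mode != StitchingMode.CROP:
        raise NotImplementedError("Stitching mode other than CROP is not supported for features.")
    if tile_overlap != (0, 0):
        # Overlapping feature tiles would require blending embeddings, which has no clear semantics.
        raise NotImplementedError("Tile overlap is not supported for features.")
```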
```python
def read_region(self, location: tuple[int, int], level: int, size: tuple[int, int]) -> pyvips.Image:
    """
    Reads a region in the stored h5 file. This function stitches the regions as saved in the cache file. Doing this
    it takes into account:
```
Mention somewhere the difference between this function and the feature function (e.g. interpolation, etc.).
Added comments to the function
```python
@classmethod
def for_wsi_classification(
    cls, data_description: DataDescription, requires_target: bool = True
) -> PreTransformTaskFactory:
    transforms: list[PreTransformCallable] = []
    # ...
    transforms.append(SampleNFeatures(n=1000))
    # ...
    if not requires_target:
        return cls(transforms)
    return cls(transforms, data_description, requires_target)
    # ...
    index_map = data_description.index_map
    if index_map is None:
        raise ConfigurationError("`index_map` is required for classification models when the target is required.")
    # ...
    label_keys = data_description.label_keys
```
Rewrite this in a more factory-style design; check how this is done with the losses and metrics.
I've improved the function and added a docstring, but I am not sure how I would improve on the factory design further.
```python
raise ValueError(f"Expected features to have a width dimension of 1, got {h}.")
# ...
n_random_indices = (
    np.random.choice(w, self.n, replace=False) if w > self.n else np.random.choice(w, self.n, replace=True)
)
# ...
# Extract the selected columns (indices) from the image
# Create a new image from the selected indices
# todo: this can probably be done without a for-loop quicker
selected_columns = [features.crop(idx, 0, 1, h) for idx in n_random_indices]
```
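Regarding the todo about the for-loop: if the features are first materialized as a numpy array (here assumed to have shape `(embedding_dim, num_tiles)`, mirroring the column-per-tile layout of the pyvips code), the selection collapses to a single fancy-indexing step. A sketch, not the PR's implementation:

```python
import numpy as np


def select_columns(features: np.ndarray, n: int, rng: np.random.Generator) -> np.ndarray:
    """Pick n feature columns in one vectorized step instead of cropping them one by one."""
    _, w = features.shape
    # Fall back to sampling with replacement when there are fewer columns than requested.
    indices = rng.choice(w, size=n, replace=w < n)
    return features[:, indices]
```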
You need to seed on something to make sure that each run gives reproducible values. Maybe the path is a good seed?
I think the `seed_everything` function in the entrypoints should handle this, right?
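Note that `seed_everything` gives run-level determinism, but the sample drawn for a given slide then still depends on iteration order. If per-file determinism turns out to be needed, seeding from the path as the reviewer suggests could look like this sketch (using `hashlib` because Python's builtin `hash` is not stable across processes; the function name is illustrative):

```python
import hashlib

import numpy as np


def indices_for_path(path: str, num_features: int, n: int) -> np.ndarray:
    """Derive a stable per-file seed from the path, then sample n feature indices."""
    seed = int.from_bytes(hashlib.sha256(path.encode("utf-8")).digest()[:4], "little")
    rng = np.random.default_rng(seed)
    return rng.choice(num_features, size=n, replace=num_features < n)
```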
```python
class SelectSpecificLabels:
    def __init__(self, keys: list[str] | str):
        if isinstance(keys, str):
            keys = [keys]
        self._keys = keys
```
```python
using_features = False
# ...
if tile.bands > 4:
    # assuming that more than four bands/channels means that we are handling features
    using_features = True
    tile_ = tile
```
I don't think we should use pyvips when we have features. In that case `torch.Tensor` or numpy arrays should be what we are using.
A dlup dataset will always return a pyvips image, and it does work as currently implemented, but indeed we could think about whether we want to change that.
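If the pyvips image were converted to an array at the dataset boundary, the band-count heuristic from the diff could route the two cases. A numpy-only sketch (the ">4 bands means features" rule comes from the quoted code; everything else is illustrative):

```python
import numpy as np

FEATURE_BAND_THRESHOLD = 4  # from the diff: more than four bands/channels means features, not RGB(A)


def route_tile(tile: np.ndarray) -> np.ndarray:
    """Return image tiles as (H, W, C); flatten feature tiles to (num_embeddings, embedding_dim).

    `tile` mimics a decoded pyvips region with shape (height, width, bands); for features,
    each spatial position holds one embedding vector.
    """
    bands = tile.shape[2]
    if bands > FEATURE_BAND_THRESHOLD:
        return tile.reshape(-1, bands)
    return tile
```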
Can you add a full example of how you run the extraction? Adding it here is fine; afterwards we can see how to add it to the documentation.
The changes in this PR make it possible to load features (h5 or Zarr) from a database.
By changing the backend of the `TiledWsiDataset`, the features can be loaded in a similar way to images. At the moment I'm assuming the classification task only, but it should be extendable to segmentation tasks as well. (In segmentation tasks you might want multiple "tiles" of features per WSI; in classification you would only want one set of features per WSI.)
Let me know what you think.