Actually, `for_wsi_classification` should also accept a list of pre-transforms, so we can just add `pre_transforms` in the configs.
That should be generalized the same way the loss factory is
Besides the available test, I've also confirmed that I can extract features with this model and then use them in a classification pipeline as expected.
jonasteuwen left a comment:
Some comments to begin with. I think we could use two functions: `read_region_at_scale` and `read_region_at_level`. The latter can be used, for instance, for features, and the former for images/masks. The former can use the SlideImage backend.
```python
elif data_format == DataFormat.FEATURE:
    size = (num_samples, 1)
    mpp = 1.0
    tile_size = (1, 1)
    tile_overlap = (0, 0)
```
I would make the mpp equal to the product of the tile size and the mpp.
I do not fully understand what you mean here, but these settings are mainly here so that the reader gets them correctly. I can change the mpp as long as the reader also reads at that specific mpp; 1.0 was chosen for ease.
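One way to read the reviewer's suggestion: since each feature vector summarizes a whole tile, the effective spacing of one feature "pixel" is the extraction mpp times the tile size. A minimal sketch of that computation (the function name is illustrative, not from the PR):

```python
def effective_feature_mpp(extraction_mpp: float, tile_size_px: int) -> float:
    """Each feature vector covers one tile, so one feature 'pixel' spans
    tile_size_px image pixels at extraction_mpp microns per pixel."""
    return extraction_mpp * tile_size_px
```

For example, tiles of 256 px extracted at 0.5 mpp would give an effective feature mpp of 128.0, instead of the placeholder 1.0.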
```python
data_format=DataFormat.COMPRESSED_IMAGE if compression != "none" else DataFormat.IMAGE,
color_profile=color_profile,
```
Doesn't 'none' need to be an Enum?
Fixed. I've looked at it, and here the data format should always be IMAGE, because the compression here does not make the image a compressed image.
```python
if self._stitching_mode != StitchingMode.CROP:
    raise NotImplementedError("Stitching mode other than CROP is not supported for features.")
```
You also need to check that the overlap is 0, or at least need to think about what this means for the features, and explain it somewhere.
Added a check. At the moment I don't see how features with overlap should work, so we do not allow it.
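The combined check could look like the following sketch (the enum and parameter names are assumptions based on the quoted diff, not the actual implementation):

```python
from enum import Enum


class StitchingMode(Enum):
    CROP = "crop"
    AVERAGE = "average"


def validate_feature_settings(stitching_mode: StitchingMode, tile_overlap: tuple[int, int]) -> None:
    """Reject reader settings that are undefined for stacked feature vectors."""
    if stitching_mode != StitchingMode.CROP:
        raise NotImplementedError("Stitching mode other than CROP is not supported for features.")
    if tile_overlap != (0, 0):
        # Overlapping feature tiles would require blending embeddings, which has no clear semantics.
        raise NotImplementedError("Tile overlap is not supported for features.")
```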
```python
def read_region(self, location: tuple[int, int], level: int, size: tuple[int, int]) -> pyvips.Image:
    """
    Reads a region in the stored h5 file. This function stitches the regions as saved in the cache file. Doing this
    it takes into account:
```
Mention somewhere the difference between this function and the feature function (e.g. interpolation, etc.).
Added comments to the function
```python
@classmethod
def for_wsi_classification(
    cls, data_description: DataDescription, requires_target: bool = True
) -> PreTransformTaskFactory:
    transforms: list[PreTransformCallable] = []
    # ...
    transforms.append(SampleNFeatures(n=1000))
    # ...
    if not requires_target:
        return cls(transforms)
    return cls(transforms, data_description, requires_target)
    # ...
    index_map = data_description.index_map
    if index_map is None:
        raise ConfigurationError("`index_map` is required for classification models when the target is required.")
    # ...
    label_keys = data_description.label_keys
```
Rewrite this in a more factory-style design; check how this is done with the losses and metrics.
I've improved the function and added a docstring, but I am not sure how I would improve on the factory design further.
```python
raise ValueError(f"Expected features to have a width dimension of 1, got {h}.")
# ...
n_random_indices = (
    np.random.choice(w, self.n, replace=False) if w > self.n else np.random.choice(w, self.n, replace=True)
)
# ...
# Extract the selected columns (indices) from the image
# Create a new image from the selected indices
# todo: this can probably be done without a for-loop quicker
selected_columns = [features.crop(idx, 0, 1, h) for idx in n_random_indices]
```
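Regarding the todo about the for-loop: if the features are first materialized as a numpy array (here assumed to have shape `(embedding_dim, num_tiles)`, mirroring the column-per-tile layout of the pyvips code), the selection collapses to a single fancy-indexing step. A sketch, not the PR's implementation:

```python
import numpy as np


def select_columns(features: np.ndarray, n: int, rng: np.random.Generator) -> np.ndarray:
    """Pick n feature columns in one vectorized step instead of cropping them one by one."""
    _, w = features.shape
    # Fall back to sampling with replacement when there are fewer columns than requested.
    indices = rng.choice(w, size=n, replace=w < n)
    return features[:, indices]
```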
You need to seed on something to make sure that each run gives reproducible values. Maybe the path is a good seed?
I think the `seed_everything` function in the entrypoints should handle this, right?
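Note that `seed_everything` gives run-level determinism, but the sample drawn for a given slide then still depends on iteration order. If per-file determinism turns out to be needed, seeding from the path as the reviewer suggests could look like this sketch (using `hashlib` because Python's builtin `hash` is not stable across processes; the function name is illustrative):

```python
import hashlib

import numpy as np


def indices_for_path(path: str, num_features: int, n: int) -> np.ndarray:
    """Derive a stable per-file seed from the path, then sample n feature indices."""
    seed = int.from_bytes(hashlib.sha256(path.encode("utf-8")).digest()[:4], "little")
    rng = np.random.default_rng(seed)
    return rng.choice(num_features, size=n, replace=num_features < n)
```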
```python
class SelectSpecificLabels:
    def __init__(self, keys: list[str] | str):
        if isinstance(keys, str):
            keys = [keys]
        self._keys = keys
```
```python
using_features = False
# ...
if tile.bands > 4:
    # assuming that more than four bands/channels means that we are handling features
    using_features = True
    tile_ = tile
```
I don't think we should use pyvips when we have features. In that case `torch.Tensor` or numpy arrays should be what we are using.
A dlup dataset will always return a pyvips image, and it does work as currently implemented, but indeed we could think about whether we want to change that.
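If the pyvips image were converted to an array at the dataset boundary, the band-count heuristic from the diff could route the two cases. A numpy-only sketch (the ">4 bands means features" rule comes from the quoted code; everything else is illustrative):

```python
import numpy as np

FEATURE_BAND_THRESHOLD = 4  # from the diff: more than four bands/channels means features, not RGB(A)


def route_tile(tile: np.ndarray) -> np.ndarray:
    """Return image tiles as (H, W, C); flatten feature tiles to (num_embeddings, embedding_dim).

    `tile` mimics a decoded pyvips region with shape (height, width, bands); for features,
    each spatial position holds one embedding vector.
    """
    bands = tile.shape[2]
    if bands > FEATURE_BAND_THRESHOLD:
        return tile.reshape(-1, bands)
    return tile
```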
Can you add a full example of how you run the extraction? Adding it here is fine; afterwards we can see how to add it to the documentation.
The changes in this PR make it possible to load features (h5 or Zarr) from a database.
By changing the backend of the `TiledWsiDataset`, the features can be loaded in a similar way to images. At the moment I'm assuming the classification task only, but it should be extendable to segmentation tasks as well. (In segmentation tasks you might want multiple "tiles" of features per WSI; in classification you would only want one set of features per WSI.)
Let me know what you think.