
Added Multi Trajectory context window generation #9

Merged
pietronvll merged 8 commits into Machine-Learning-Dynamical-Systems:dev from Danfoa:multi_trajectory
Apr 11, 2024

Conversation

Contributor

@Danfoa Danfoa commented Mar 27, 2024

I modified TrajectoryContextDataset to accept multiple trajectory data.

As the data stream remains agnostic to the dimensions of the features, I had to add a "multi_traj" boolean argument to the class's initializer to indicate whether the first dimension of the input array is the number of trajectories.

I suggest enforcing an expected input shape of (n_trajs, n_frames, *features_dims), which would avoid this new, unnecessary argument, and relying on the auxiliary methods traj_to_contexts and multi_traj_to_context to handle the more flexible shapes.

Comments: There seems to be an unnecessarily deep hierarchy of classes (TrajectoryContextDataset -> TensorContextDataset -> ContextWindowDataset -> ContextWindow -> Sequence) for handling the time-series data slicing. Several of these classes repeat the parsing of data into the appropriate backends and shapes, which makes the data pipeline quite cumbersome to modify for someone unfamiliar with the code. I propose reducing this hierarchy to one or two classes at most.

Danfoa added 2 commits March 27, 2024 21:15
- Modified `TrajectoryContextDataset` to accept multiple trajectory data.
@Danfoa Danfoa changed the title from "Multi trajectory" to "Added Multi Trajectory context window generation" Mar 28, 2024
Contributor Author

Danfoa commented Mar 28, 2024

It's worth noting that the HuggingFace Dataset class accepts array-like data with the shape (n_samples, context_len, *features), consisting of either np.ndarray or torch.Tensor entries, e.g. via the Dataset.from_generator method:

from datasets import Dataset
import numpy as np

def my_gen():
    for i in range(1, 4):
        # each example is a dict; values may be numpy arrays or torch tensors
        yield {"data": np.zeros((2, 3))}

dataset = Dataset.from_generator(my_gen)

This class seamlessly converts the data to the appropriate backend (numpy, torch, jax, TensorFlow) via the dataset.with_format(...) method:

from datasets import Dataset

data = [[1, 2], [3, 4]]
ds = Dataset.from_dict({"data": data})
ds = ds.with_format("tf")  # rows are now returned as TensorFlow tensors
ds[0]
ds[:2]

Therefore, Kooplearn could focus primarily on trajectory/context slicing, leaving backend management to the HuggingFace datasets library, which is already integrated within the framework.
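To illustrate the slicing that would remain Kooplearn's responsibility, here is a minimal numpy sketch. The function name traj_to_contexts mirrors the helper mentioned above, but this implementation is an assumption for illustration, not the library's actual code:

```python
import numpy as np

def traj_to_contexts(traj: np.ndarray, context_len: int) -> np.ndarray:
    """Slice one trajectory of shape (n_frames, *features) into overlapping
    context windows of shape (n_windows, context_len, *features)."""
    windows = np.lib.stride_tricks.sliding_window_view(traj, context_len, axis=0)
    # sliding_window_view appends the window axis last; move it after the batch axis
    return np.moveaxis(windows, -1, 1)
```

With a trajectory of 10 frames and context_len=4, this yields 7 windows, each a contiguous slice of 4 frames.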

raise ImportError(
    "You selected the 'torch' backend, but kooplearn wasn't able to import it."
)
if isinstance(data_per_traj[0], np.ndarray):
Contributor

By concatenating everything together, in the case of multi-trajectories it becomes hard to answer questions like "What was the n-th frame of the k-th trajectory?"

Of course, if every trajectory has the same length, this can be quickly recovered, but this might not always be the case (see my other comment).

The primary purpose of the idx_map, although the API does not exploit it yet, is to help the user evaluate multi-step forecasting errors. For a single trajectory, this can be done e.g. as

for t in times:
    ref_idxs = test_contexts.idx_map.data[:, 0, 0] + t
    Y_pred = model.predict(test_contexts, t=t)
    Y_true = test_contexts.data[ref_idxs]

Contributor Author

@Danfoa Danfoa commented Apr 2, 2024

It was unclear to me from the documentation and docstring what the role of idx_map was.

I now follow what you mean. Will update.

In the case of multi-trajectory then, idx_map will have an index per trajectory.
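In that direction, here is a minimal sketch of what a per-trajectory idx_map could record. The function and its layout are assumptions for illustration, not the actual kooplearn implementation:

```python
import numpy as np

def build_idx_map(traj_lengths: list[int], context_len: int) -> np.ndarray:
    """For each context window, record the (trajectory index, start frame)
    it was sliced from, so 'the n-th frame of the k-th trajectory' stays
    recoverable even when trajectories have different lengths."""
    entries = [
        (traj_idx, start)
        for traj_idx, n_frames in enumerate(traj_lengths)
        for start in range(n_frames - context_len + 1)
    ]
    return np.asarray(entries)  # shape: (n_windows, 2)
```

For two trajectories of lengths 5 and 3 with context_len=3, this yields windows (0, 0), (0, 1), (0, 2), and (1, 0), making frame provenance explicit even after concatenation.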

- After initializing TensorContextDataset, the `self.backend` attribute stores the desired backend, either torch or numpy.
- `__getitems__` is a method of Dataset classes that enables fast collection of samples already in batched form. As our datasets are assumed to be in memory, we can enable this fast indexing.
elif isinstance(idx, slice):
    return TensorContextDataset(self.data[idx])

def __getitems__(self, indices: list[int]) -> 'TensorContextDataset':
Contributor
Nice, I didn't know of this trick. As far as I can tell, however, it only works with the torch backend, right? I propose changing __getitem__ as follows:

def __getitem__(self, idx) -> 'TensorContextDataset':
    if np.issubdtype(type(idx), np.integer):
        # TODO: The default collate behaviour is to return a list of [type] objects.
        # This additional dimension seems to be introduced only for a very specific
        # custom collate_fn.
        return TensorContextDataset(self.data[idx][None, ...])
    elif isinstance(idx, slice):
        return TensorContextDataset(self.data[idx])
    # CHANGE STARTS HERE
    else:
        if self.backend == 'torch':
            return self.__getitems__(idx)
        else:
            return TensorContextDataset(self.data[idx])

Contributor Author

Yes, this is torch-specific. The __getitems__ method is called by the default data-fetching logic used by PyTorch DataLoaders.
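As a self-contained sketch of that mechanism (the class and data here are made up for illustration): since PyTorch 2.x, the default map-style fetcher checks for a __getitems__ method and, if present, fetches each whole batch in a single call instead of looping over indices one __getitem__ at a time.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class InMemoryDataset(Dataset):
    """Toy in-memory dataset exposing __getitems__ for batched fetching."""

    def __init__(self, data: torch.Tensor):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]

    def __getitems__(self, indices):
        # one fancy-indexing call per batch instead of one __getitem__ per index
        return self.data[indices]

ds = InMemoryDataset(torch.arange(12).reshape(6, 2))
# identity collate_fn, since __getitems__ already returns a batched tensor
loader = DataLoader(ds, batch_size=3, collate_fn=lambda batch: batch)
```

Note the identity collate_fn: because __getitems__ already returns a batched tensor, the default collate (which expects a list of per-sample items) would mis-handle it.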

@pietronvll
Contributor

@Danfoa any updates on this pull request?

@pietronvll pietronvll merged commit 88b8a6a into Machine-Learning-Dynamical-Systems:dev Apr 11, 2024