Skip to content

Conversation

@shunping
Copy link
Collaborator

@shunping shunping commented Mar 13, 2025

Depends on #34285

@shunping
Copy link
Collaborator Author

r: @damccorm

@shunping shunping self-assigned this Mar 14, 2025
@shunping shunping added this to the 2.64.0 Release milestone Mar 14, 2025
@github-actions
Copy link
Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment assign set of reviewers

@shunping shunping force-pushed the anomaly-detection-5-3 branch 2 times, most recently from 7b3218f to 670b948 Compare March 14, 2025 03:39
- torch.Tensor(n) gives a nx1-tensor with zeros: i.e. Tensor(0, 0, ... 0).
- torch.tensor(n) gives a 1x1-tensor with value n: i.e. Tensor(n).
@shunping shunping force-pushed the anomaly-detection-5-3 branch from 670b948 to 863e293 Compare March 14, 2025 12:49
Copy link
Contributor

@damccorm damccorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't do a deep dive on the PR because I'm a bit skeptical about the high level of what this looks like. I think we can simplify the implementation and end up with a more coherent user experience by limiting the types of model handlers we support.


def _to_numpy_array(row: beam.Row):
"""Converts an Apache Beam Row to a NumPy array."""
return numpy.array(list(row))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this adapter really makes sense. For example, lets say you have a row like:

Row(a=1, b=2, c=3)

These are all different features, and it is unlikely that you actually want to actually treat them as a single numpy array. Even if you do, there's not really a guarantee that this will happen in the right order.

Worse, this row could actually be:

Row(a=1, b=2, c='foo')

where we only want to run anomaly detection against a (we solve this elsewhere with

x = beam.Row(**{f: getattr(data, f) for f in self._underlying._features})
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've expressed this less strongly before, but seeing this in practice I really think that we're better off just supporting ModelHandler[beam.Row, float] and making users handle the conversion from row to input/output types.

I think a bunch of these adapters either don't really make sense or are overly opinionated in unpredictable ways which will be hard for users to reason about, and there is a much easier path for them to define the exact behavior they want (with_preproces_fn/with_postprocess_fn)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is similarly true with the postprocessing. There is no single way that models will output an anomaly prediction, and it often may require some light postprocessing which can be pretty custom.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you that it will be much simpler to only support ModelHandler[beam.Row, float], but I am also hesitating to put all the adapter burden to users, which could be a friction of adapting the new transform.

With that said, I think instead of putting those functions in the SDK, maybe we can show them in examples later. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, examples seem reasonable. I think taking the burden away from users would be great, but in practice I don't think their model preprocessing steps will be predictable enough to do that. Users will also be accustomed to having some simple preprocessing steps, and this fits in neatly with that.



@specifiable
class CustomDetector(AnomalyDetector):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're using RunInference, this should probably be called OfflineDetector or something similar. CustomDetector could also include online detectors the user defines themselves

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. That makes sense.

@shunping
Copy link
Collaborator Author

I didn't do a deep dive on the PR because I'm a bit skeptical about the high level of what this looks like. I think we can simplify the implementation and end up with a more coherent user experience by limiting the types of model handlers we support.

This PR has been superseded by #34310 and #34311.

@shunping shunping closed this Mar 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants