[AnomalyDetection] Call RunInference for custom detectors #34286
d4d8953 to cb3bb70
r: @damccorm
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment |
7b3218f to 670b948
- torch.Tensor(n) gives a 1-D tensor with n elements (uninitialized, often zeros): i.e. Tensor(0, 0, ... 0).
- torch.tensor(n) gives a 0-dimensional (scalar) tensor with value n: i.e. tensor(n).
670b948 to 863e293
damccorm
left a comment
I didn't do a deep dive on the PR because I'm a bit skeptical about the high level of what this looks like. I think we can simplify the implementation and end up with a more coherent user experience by limiting the types of model handlers we support.
def _to_numpy_array(row: beam.Row):
  """Converts an Apache Beam Row to a NumPy array."""
  return numpy.array(list(row))
I don't think this adapter really makes sense. For example, let's say you have a row like:
Row(a=1, b=2, c=3)
These are all different features, and it is unlikely that you actually want to treat them as a single numpy array. Even if you do, there's no real guarantee that the values will end up in the right order.
Worse, this row could actually be:
Row(a=1, b=2, c='foo')
where we only want to run anomaly detection against a (we solve this elsewhere with the line below):
x = beam.Row(**{f: getattr(data, f) for f in self._underlying._features})
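To make the concern concrete, here is a minimal sketch of the difference between flattening every field of a row and selecting named features explicitly. It uses types.SimpleNamespace as a stand-in for beam.Row so that it runs without Apache Beam installed; flatten_all and select_features are hypothetical helper names, not part of the SDK.

```python
from types import SimpleNamespace

# Stand-in for beam.Row (assumption): any object with named attributes works here.
row = SimpleNamespace(a=1, b=2, c="foo")


def flatten_all(row):
    """Naive adapter: take every field value, whatever its type."""
    return list(vars(row).values())


def select_features(row, features):
    """Explicit adapter: take only the named fields, in the order requested."""
    return [getattr(row, name) for name in features]


print(flatten_all(row))                  # [1, 2, 'foo'], the string field leaks in
print(select_features(row, ["a"]))       # [1]
print(select_features(row, ["b", "a"]))  # [2, 1], order is chosen by the caller
```

With explicit selection, both the set of features and their order are decided by the caller rather than by whatever the row happens to contain.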
I've expressed this less strongly before, but seeing this in practice I really think that we're better off just supporting ModelHandler[beam.Row, float] and making users handle the conversion from row to input/output types.
I think a bunch of these adapters either don't really make sense or are overly opinionated in unpredictable ways which will be hard for users to reason about, and there is a much easier path for them to define the exact behavior they want (with_preprocess_fn/with_postprocess_fn).
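As a rough illustration of that path, the sketch below wraps a model that takes a list of floats so that, end to end, it maps a row to a single float score. Everything here is a plain-Python stand-in: SimpleNamespace plays the role of beam.Row, toy_model is a made-up model, and with_pre_and_post is a hypothetical helper that only mirrors what chaining with_preprocess_fn and with_postprocess_fn on a ModelHandler is meant to achieve.

```python
from types import SimpleNamespace


def toy_model(features):
    """Made-up model: returns a dict holding the distance from the origin."""
    return {"distance": sum(x * x for x in features) ** 0.5}


def row_to_features(row):
    """User-defined preprocessing: pick the fields the model was trained on."""
    return [float(row.a), float(row.b)]


def output_to_score(output):
    """User-defined postprocessing: reduce the model output to one float."""
    return float(output["distance"])


def with_pre_and_post(model, pre, post):
    """Hypothetical helper: compose pre -> model -> post into row -> float."""
    return lambda row: post(model(pre(row)))


score_row = with_pre_and_post(toy_model, row_to_features, output_to_score)
print(score_row(SimpleNamespace(a=3, b=4, c="foo")))  # 5.0
```

The non-numeric field c never reaches the model because the user's own preprocessing function decides what is passed through.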
This is similarly true with the postprocessing. There is no single way that models will output an anomaly prediction, and it often may require some light postprocessing which can be pretty custom.
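For instance, two models can report anomalies in incompatible ways. The sketch below assumes one hypothetical model that emits a label (-1 for outlier, 1 for inlier) and another that emits a reconstruction error to be compared against a threshold; the function names and the threshold value are illustrative only.

```python
def label_to_score(label):
    """Map a -1/1 outlier label to a 1.0/0.0 anomaly score."""
    return 1.0 if label == -1 else 0.0


def error_to_score(reconstruction_error, threshold=0.5):
    """Scale an error by an assumed threshold and cap the result at 1.0."""
    return min(reconstruction_error / threshold, 1.0)


print(label_to_score(-1))    # 1.0
print(label_to_score(1))     # 0.0
print(error_to_score(0.25))  # 0.5
print(error_to_score(2.0))   # 1.0
```

Neither mapping is more correct than the other, which is why a single built-in postprocessing step is hard to get right for every model.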
I agree with you that it will be much simpler to only support ModelHandler[beam.Row, float], but I am also hesitant to put the whole adapter burden on users, which could add friction to adopting the new transform.
With that said, I think instead of putting those functions in the SDK, maybe we can show them in examples later. WDYT?
Yeah, examples seem reasonable. I think taking the burden away from users would be great, but in practice I don't think their model preprocessing steps will be predictable enough to do that. Users will also be accustomed to having some simple preprocessing steps, and this fits in neatly with that.
@specifiable
class CustomDetector(AnomalyDetector):
If we're using RunInference, this should probably be called OfflineDetector or something similar. CustomDetector could also include online detectors that users define themselves.
Sure. That makes sense.
Depends on #34285