Conversation

Contributor
@yeandy commented Feb 22, 2022

Adding a common interface to the RunInference transform. Arguments will be wrapper classes around PyTorch and Scikit-learn models, along with other common parameters.
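A minimal, self-contained sketch of the intended shape (the class names mirror the code under review, but the constructor parameters here are illustrative assumptions, not the final API):

import abc


class RunInferenceModel(abc.ABC):
  """Common interface implemented by framework-specific wrappers."""

  @abc.abstractmethod
  def _validate_model(self):
    """Checks that the wrapped model is usable for inference."""
    raise NotImplementedError("Please implement _validate_model")


class PytorchModel(RunInferenceModel):
  """Wraps a PyTorch model plus common parameters (illustrative only)."""

  def __init__(self, state_dict_path=None, device='CPU'):
    # state_dict_path and device are assumed parameters for illustration.
    self._state_dict_path = state_dict_path
    self._device = device

  def _validate_model(self):
    if self._state_dict_path is None:
      raise ValueError('state_dict_path must be provided')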


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make the review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

codecov bot commented Feb 22, 2022

Codecov Report

Merging #16917 (71e2f8c) into master (b2f2128) will increase coverage by 36.74%.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #16917       +/-   ##
===========================================
+ Coverage   46.89%   83.63%   +36.74%     
===========================================
  Files         204      453      +249     
  Lines       20122    62528    +42406     
===========================================
+ Hits         9436    52295    +42859     
- Misses       9686    10233      +547     
+ Partials     1000        0     -1000     
Flag Coverage Δ
python 83.63% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
sdks/go/pkg/beam/runners/direct/impulse.go
sdks/go/pkg/beam/io/databaseio/writer.go
...ks/go/pkg/beam/runners/dataflow/dataflowlib/job.go
sdks/go/pkg/beam/core/runtime/exec/fn_arity.go
sdks/go/pkg/beam/beam.shims.go
sdks/go/pkg/beam/core/runtime/symbols.go
sdks/go/pkg/beam/util/shimx/generate.go
sdks/go/pkg/beam/runners/direct/buffer.go
sdks/go/pkg/beam/core/graph/coder/registry.go
sdks/go/pkg/beam/core/runtime/exec/cogbk.go
... and 647 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b2f2128...71e2f8c. Read the comment docs.

raise NotImplementedError("Please implement _validate_model")


class PytorchModel(RunInferenceModel):
Contributor
It feels like PytorchModel and SklearnModel should be in their own file, only imported if one or the other is needed, rather than having a single module that holds all model types.
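For context, a hedged sketch of the split being suggested; the file and module names below are illustrative assumptions, not the final layout:

# Layout sketch (illustrative file names):
#   base.py              - RunInferenceModel base class, RunInference transform
#                          (no torch/sklearn imports at module level)
#   pytorch_inference.py - PytorchModel; `import torch` lives only here
#   sklearn_inference.py - SklearnModel; `import sklearn` lives only here
#
# A pipeline that only needs scikit-learn then never imports torch:
# from sklearn_inference import SklearnModel   # hypothetical module path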

Contributor Author
Agreed. I'll refactor.

Contributor
Let's keep comments like this open until they're done for easier tracking.

Contributor Author
These implementations will be done in separate PRs (pytorch, sklearn) after this PR and the base-class PR #16970 are merged.

Comment on lines 46 to 60
_K = TypeVar('_K')
_INPUT_TYPE = TypeVar('_INPUT_TYPE')
_OUTPUT_TYPE = TypeVar('_OUTPUT_TYPE')


@dataclass
class PredictionResult:
  key: _K
  example: _INPUT_TYPE
  inference: _OUTPUT_TYPE


@beam.ptransform_fn
@beam.typehints.with_input_types(Union[_INPUT_TYPE, Tuple[_K, _INPUT_TYPE]])
@beam.typehints.with_output_types(PredictionResult)
Contributor Author
How should we deal with typing? TFX explicitly defines output as Union[_OUTPUT_TYPE, Tuple[_K, _OUTPUT_TYPE]]

Contributor
Yeah, I think TFX is doing it right: the output is either the output type alone or a key paired with the output type.

I'm not a typing expert; IMO it's OK to have a typing-specific PR that concentrates on just that, if this gets messy.

We need the typing to support TFX's proto use case as well as our new frameworks use case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've modified the output type and pushed the change. I feel like we should try to get typing as accurate as possible early on, especially if we want to ensure that we can accommodate TFX's implementation.

Could we have, instead of a bare PredictionResult type, a parent class from which PredictionResult and prediction_log_pb2.PredictionLog inherit? Or use a Union? (but this may get messy).

This definitely requires input from someone with more experience with typing.
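For concreteness, a hedged sketch of the keyed/unkeyed output typing under discussion, following the TFX-style Union; the decorator arrangement mirrors the snippet above, and the transform body is elided since only the type hints matter here:

from dataclasses import dataclass
from typing import Tuple, TypeVar, Union

import apache_beam as beam

_K = TypeVar('_K')
_INPUT_TYPE = TypeVar('_INPUT_TYPE')
_OUTPUT_TYPE = TypeVar('_OUTPUT_TYPE')


@dataclass
class PredictionResult:
  key: _K
  example: _INPUT_TYPE
  inference: _OUTPUT_TYPE


@beam.ptransform_fn
@beam.typehints.with_input_types(Union[_INPUT_TYPE, Tuple[_K, _INPUT_TYPE]])
@beam.typehints.with_output_types(
    Union[PredictionResult, Tuple[_K, PredictionResult]])
def RunInference(examples, model_loader):  # model_loader arg is an assumption
  # Body elided; only the keyed/unkeyed type-hint shape matters for this sketch.
  raise NotImplementedError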

Contributor
@ryanthompson591 left a comment
Looks good. Add valentyn next as reviewer IMO.

Contributor Author
@yeandy commented Mar 2, 2022

CC: @TheNeuralBit and @tvalentyn

Contributor Author
@yeandy commented Mar 2, 2022

CC: @kevingg

@yeandy yeandy requested a review from robertwb March 3, 2022 22:23
Contributor Author
@yeandy commented Mar 3, 2022

R: @robertwb @tvalentyn
CC: @kerrydc

Contributor
@ryanthompson591 left a comment

Looks great!

"""
@abc.abstractmethod
def get_model_loader(self):
"Returns ModelLoader object"
Contributor
I think this is good for now.

Just a note, but I don't think you need to do anything: going forward, it seems to me that there will be an implementation that uses a service (instead of loading a model) and maybe returns None for get_model_loader. I haven't decided if that should return None by default or if a NotImplementedError is fine.
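A hedged sketch of the two options (RemoteModel and ModelLoader below are placeholders for illustration, not code in this PR):

import abc
from typing import Optional


class ModelLoader:
  """Placeholder for whatever loader type local implementations return."""


class RunInferenceModel(abc.ABC):

  @abc.abstractmethod
  def get_model_loader(self) -> Optional[ModelLoader]:
    """Returns a ModelLoader, or None when inference is served remotely."""


class RemoteModel(RunInferenceModel):
  """Calls an external inference service, so there is nothing to load."""

  def get_model_loader(self) -> Optional[ModelLoader]:
    # Option 1: return None so callers can branch on "no local model".
    # Option 2 (alternative): raise NotImplementedError instead, forcing
    # callers to handle remote models explicitly.
    return None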

Contributor Author
@yeandy commented Mar 9, 2022

Updated the interface per our discussion, PTAL. @robertwb

Member
@aaltay commented Mar 18, 2022

@robertwb - could you please review this PR?

Other folks on the PR, feel free to ping reviewers here or in other places if a review is open for more than a few business days. And where it makes sense you could also ask other committers to review PRs and spread the review load across committers.

Contributor Author
@yeandy commented Mar 18, 2022

@tvalentyn could you please review/merge?

@tvalentyn tvalentyn merged commit 9a548b2 into apache:master Mar 18, 2022
Contributor
@robertwb
I'm actively involved in the review within the TFX codebase. I'm assuming, however, that the two will be reconciled. (And, generally, +1 to @aaltay's advice about pinging reviews that are lagging.)

Contributor Author
@yeandy commented Mar 18, 2022

Thanks. They will be reconciled once the change to TFX is finished.

Contributor
@tvalentyn
Sorry, I misunderstood @yeandy in an offline conversation; I thought you had already reviewed this change, @robertwb.

Contributor
@robertwb commented Mar 18, 2022 via email
