[BEAM-13982] A base class for run inference #16970
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master   #16970      +/-   ##
==========================================
+ Coverage   73.94%   73.96%   +0.02%
==========================================
  Files         684      685       +1
  Lines       89519    89653     +134
==========================================
+ Hits        66197    66315     +118
- Misses      22162    22178      +16
  Partials     1160     1160
R: yeandy
CC: @TheNeuralBit and @tvalentyn |
yeandy left a comment
First pass looks good
class InferenceRunner:
  """Implements running inferences for a framework."""
  def run_inference(self, batch: Any, model: Any) -> List[PredictionResult]:
Do we want Iterable[PredictionResult] instead of List[PredictionResult]?
Yeah, I like Iterable; it is more generic. Done.
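For reference, with Iterable the snippet above reads roughly as follows; PredictionResult is stubbed here only so the sketch stands alone (the PR defines its own):

from typing import Any, Iterable, NamedTuple


class PredictionResult(NamedTuple):
  # Placeholder for illustration only; the PR defines its own PredictionResult.
  example: Any
  inference: Any


class InferenceRunner:
  """Implements running inferences for a framework."""
  def run_inference(self, batch: Any, model: Any) -> Iterable[PredictionResult]:
    raise NotImplementedError(type(self))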
class ModelLoader:
  """Has the ability to load an ML model."""
  def load_model(self):
Suggested change:
-  def load_model(self):
+  def load_model(self) -> InferenceRunner:

    raise NotImplementedError(type(self))
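For context, a concrete framework-specific loader might look something like the sketch below. SklearnModelLoader, the pickle-based loading, and the model_path parameter are all hypothetical; the return annotation is left loose since the exact return type is what this thread is discussing.

import pickle
from typing import Any


class SklearnModelLoader(ModelLoader):  # ModelLoader as quoted above
  """Hypothetical loader that unpickles a scikit-learn model from a local path."""
  def __init__(self, model_path: str):
    self._model_path = model_path

  def load_model(self) -> Any:
    # Returns the loaded model object; swap in framework-specific loading as needed.
    with open(self._model_path, 'rb') as f:
      return pickle.load(f)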
class InferenceRunner:
Robert had something like

ExampleBatchType = TypeVar('ExampleBatchType')
InferenceBatchType = TypeVar('InferenceBatchType')


class InferenceRunner(Generic[ExampleBatchType, InferenceBatchType]):
  def to_impl_batch(self, batch: Iterable[Any]) -> ExampleBatchType:
    raise NotImplementedError(type(self))

  def from_impl_batch(self, inferences: InferenceBatchType) -> Iterable[Any]:
    raise NotImplementedError(type(self))

  def run_inference(self, batch: ExampleBatchType) -> InferenceBatchType:
    raise NotImplementedError(type(self))
I'm guessing it's not necessary to have the type variables if you're not converting to and from batches? But how do we structure this if we want to leverage Batched DoFns in the future? Though it's probably not necessary to do the batching explicitly if we can leverage something like @DoFn.yields_batches? @TheNeuralBit what are your thoughts?
How about doing that change as part of https://issues.apache.org/jira/browse/BEAM-14044? I suppose this sort of feature would depend a lot on how we implement batching for each specific framework. Let me add Robert as a reviewer at this point and see what he thinks we should do for batching.
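To make the TypeVar idea slightly more concrete, a framework-specific runner could specialize the two batch types, for example to numpy arrays. Everything below is a hypothetical sketch, not code from this PR:

from typing import Any, Iterable

import numpy as np


class NumpyInferenceRunner:
  # Would subclass InferenceRunner[np.ndarray, np.ndarray] from Robert's sketch
  # above; kept standalone here. All names are hypothetical.
  def to_impl_batch(self, batch: Iterable[Any]) -> np.ndarray:
    return np.stack([np.asarray(example) for example in batch])

  def from_impl_batch(self, inferences: np.ndarray) -> Iterable[Any]:
    return list(inferences)

  def run_inference(self, batch: np.ndarray) -> np.ndarray:
    # A real framework would call its model here; identity keeps the sketch runnable.
    return batch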
  return platform.system().startswith('CYGWIN_NT')


class _Clock(object):
Do we want a test for Clock?
Since the clock is very environment-specific, it's a little harder to unit test. However, I do think I should change this up (it's based on the TFX version) and make this mockable so we can write better unit tests for the metrics.
Agreed, I'd like to be able to do basic validation for Clock.
Is it necessary to define our own clock abstraction? The solution in BatchElements is just to provide a kwarg allowing tests to provide a mock to override time.time:
beam/sdks/python/apache_beam/transforms/util.py, lines 626 to 633 in 06e7c20:

  def __init__(
      self,
      min_batch_size=1,
      max_batch_size=10000,
      target_batch_overhead=.05,
      target_batch_duration_secs=1,
      variance=0.25,
      clock=time.time):
If we do need this (maybe for the fine-grained clock?) it would be nice to make it a general Beam solution as this is a common need.
I'm all for clock having the same API as time.time (i.e. a callable that returns the number of seconds as a floating-point value). We can have various implementations as needed for more precision, but we wouldn't need to add new concepts/APIs (and it would be consistent with what we do for BatchElements).
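A minimal sketch of that pattern, using only the standard library; the class names are invented, and BatchElements does essentially the same thing with its clock kwarg:

import time


class InferenceTimer:
  """Hypothetical helper that times a call via an injectable time.time-style clock."""
  def __init__(self, clock=time.time):
    self._clock = clock

  def timed_call(self, fn, *args):
    start = self._clock()
    result = fn(*args)
    return result, self._clock() - start


class FakeClock:
  """Deterministic clock for tests: each call advances by half a second."""
  def __init__(self):
    self._now = 0.0

  def __call__(self):
    self._now += 0.5
    return self._now


result, elapsed = InferenceTimer(clock=FakeClock()).timed_call(sum, [1, 2, 3])
assert result == 6 and abs(elapsed - 0.5) < 1e-9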
I want to punt on this change for now, talk to the TFX team about why they did this, and convince them to use the cleaner time.time solution if they have no reason not to.
https://issues.apache.org/jira/browse/BEAM-14255
I spent some time experimenting with this. AFAICT it's possible we lose 1 microsecond of precision in a measurement 50% of the time if we use the floating-point time instead of the nanosecond interface.
The cleanest would be to have no abstraction at all and just use time directly.
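For a rough, illustrative sense of the numbers behind that claim (assuming CPython's time and math modules):

import math
import time

# Around the current epoch (~1.65e9 seconds in 2022), adjacent doubles are about
# 0.24 microseconds apart, so a float timestamp truncated to integer microseconds
# can land one microsecond off; time.time_ns() sidesteps this entirely.
print(math.ulp(1_650_000_000.0))   # ~2.4e-07 seconds (math.ulp requires Python 3.9+)
print(time.time_ns() // 1_000)     # exact integer microseconds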
OK, please reference this jira in a TODO in the code.
Done.
Is it any more complex than them just wanting nanosecond precision?
Nope, it is not more complex than that; also, this immediately gets converted to microsecond precision.
I'm going to talk with them about simplifying this. My intuition is that converting time.time() to microseconds is adequate for their needs.
  def expand(self, pcoll: beam.PCollection) -> beam.PCollection:
    return (
        pcoll
        # TODO: Hook into the batching DoFn APIs.
Let's file a JIRA for this
+1, please assign this to me
https://issues.apache.org/jira/browse/BEAM-14044
Go ahead and take this when we are ready.
I made one yesterday but forgot to update this comment 😆 I don't think I have delete permissions to get rid of this though
Don't delete it, just mark it as a duplicate.
CC: @kevingg
R: @tvalentyn @robertwb
…right with a generator
    inference_generator = self._inference_runner.run_inference(
        examples, self._model)
The time saved is probably negligible, but should we put start_time and inference_latency around just inference_generator, instead of also around the manipulations w/ the keys and wrapping of PredictionResult? Technically, the operations w/ keys and PredictionResult are not related to the pure inference call.
I'm mostly indifferent; I can see the advantages of both. I can move it down for now, though, and we can move it later if we are unhappy.
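If we do narrow the span later, it could be factored along these lines. This is a hypothetical helper, not code from this PR; note the generator has to be consumed inside the timed region for the measurement to mean anything:

import time


def _timed_inference(inference_runner, examples, model, clock=time.time):
  """Hypothetical helper: measure only the pure inference call, leaving key
  handling and PredictionResult wrapping outside the timed span."""
  start = clock()
  predictions = list(inference_runner.run_inference(examples, model))
  latency_secs = clock() - start
  return predictions, latency_secs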
    self._model_loader = model_loader
    self._inference_runner = inference_runner
    self._shared_model_handle = shared.Shared()
    # TODO: Compute a good metrics namespace
Will this be a future ticket, or is it just a reminder for this PR? Maybe we could use run_inference_metrics?
I think we can put this in one of the framework-defined classes. TFX has a namespace where they define a lot of things about the implementation details; currently their namespace reads bulkinference. The namespace they define is very model specific.
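Whichever namespace we settle on would just be threaded into the metric constructors, roughly like this; the strings here are placeholders, not the names this PR uses:

from apache_beam.metrics.metric import Metrics

# 'RunInference' is a placeholder namespace; a framework-defined class could
# supply something more specific, the way TFX's bulk-inference namespace does.
_METRICS_NAMESPACE = 'RunInference'

num_inferences = Metrics.counter(_METRICS_NAMESPACE, 'num_inferences')
inference_latency = Metrics.distribution(
    _METRICS_NAMESPACE, 'inference_batch_latency_micro_secs')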
    has_keys = isinstance(batch[0], tuple)
    start_time = self._clock.get_current_time_in_microseconds()
    if has_keys:
      examples = [example for _, example in batch]
      keys = [key for key, _ in batch]
    else:
      examples = batch
      keys = None
Just for a little more understanding of the logic, can we add a comment above, like "Separating examples from keys"?
Yeah, I added a few comments. The TFX one's logic is a little tricky to follow. Hopefully how keys are separated and recombined makes sense now with a few more comments.
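For readers following the thread, the separate-then-recombine logic boils down to something like this standalone sketch; the helper names are invented for illustration and are not the names used in the PR:

from typing import Any, Iterable, List, Optional, Tuple


def _split_keys(batch: List[Any]) -> Tuple[Optional[List[Any]], List[Any]]:
  """Peel keys off a keyed batch, mirroring the snippet above."""
  if batch and isinstance(batch[0], tuple):
    return [key for key, _ in batch], [example for _, example in batch]
  return None, list(batch)


def _rejoin_keys(keys: Optional[List[Any]], results: Iterable[Any]) -> Iterable[Any]:
  """Re-attach keys to results in their original order, if the batch was keyed."""
  return zip(keys, results) if keys else results


keys, examples = _split_keys([('a', 1), ('b', 2)])
assert list(_rejoin_keys(keys, [x + 1 for x in examples])) == [('a', 2), ('b', 3)]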
…modleLoader class
_SECOND_TO_MICROSECOND = 1000000


def _unbatch(maybe_keyed_batches: Any):
s/Any/Tuple[Any, Any] ?
Done
from apache_beam.testing.test_pipeline import TestPipeline


class MockModel:
nit: sounds like a FakeModel rather than MockModel, ditto below.
Done.
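That is, a deterministic fake along these lines (the predict signature is illustrative):

class FakeModel:
  """Deterministic stand-in for a real model, for use in unit tests."""
  def predict(self, example: int) -> int:
    return example + 1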
    return self._shared_model_handle.acquire(load)

  def setup(self):
    super().setup()
nit: it's a no-op.
removed
typo fix
valentyn's suggestion
Co-authored-by: tvalentyn <tvalentyn@users.noreply.github.com>
I don't think the failing tests are related to this change, though I am not familiar with codecov/patch.

@ryanthompson591 it looks like you picked up a bunch of commits (from master?) and now GitHub is indicating there are merge conflicts. Can you pull out just your commits and update this branch?

@TheNeuralBit I agree that there are still improvements that can be made to both the public and private APIs. I would declare even the public API here to be non-final, and we should iterate on it (pull requests welcome) before declaring it as such. It will be easier to parallelize the work here once the initial PR is in.

@TheNeuralBit How about now? I tried to merge with master. I only see two new files added. What merge conflict message is given?

Run Python_PVR_Flink PreCommit

No, it's not showing any merge conflicts now. The only blocker is Python PreCommit. That seems to be failing at HEAD, sadly.

Run Python PreCommit

Filed BEAM-14288 for the PreCommit issue

Run Python PreCommit

Run Python_PVR_Flink PreCommit

Run PythonDocker PreCommit
Make a base class for a transform that runs inferences.
It will measure metrics, load models (using beam.shared), and batch and run inferences.
See design doc: https://s.apache.org/inference-sklearn-pytorch
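To sketch what the description means in practice, the load-once-per-process idiom via beam.shared looks roughly like this. The names below are hypothetical and this is not the transform added by the PR, just the underlying pattern it wraps:

import apache_beam as beam
from apache_beam.utils import shared


class _FakeModel:
  """Illustrative stand-in for a framework model."""
  def predict(self, example):
    return example + 1


class _SketchRunInferenceDoFn(beam.DoFn):
  """Rough sketch of the pattern: the Shared handle is created in the
  constructor and the model is acquired once per process in setup()."""
  def __init__(self):
    self._shared_handle = shared.Shared()

  def setup(self):
    self._model = self._shared_handle.acquire(lambda: _FakeModel())

  def process(self, example):
    yield self._model.predict(example)


with beam.Pipeline() as pipeline:
  _ = (
      pipeline
      | beam.Create([1, 2, 3])
      | beam.ParDo(_SketchRunInferenceDoFn())
      | beam.Map(print))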