
Conversation

@EDjur (Contributor) commented Feb 4, 2020:

This PR relates to https://issues.apache.org/jira/browse/BEAM-9146 and integrates GCP Video Intelligence functionality as a PTransform that accepts a PCollection of GCS URIs or raw video bytes.
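
For illustration, here is a minimal pipeline sketch of how such a transform could be used. The module path (apache_beam.ml.gcp.videointelligenceml), the transform name (AnnotateVideo), and its arguments are assumptions based on the naming discussion later in this thread, not a confirmed API:

  import apache_beam as beam
  from google.cloud import videointelligence

  # Assumed location and name of the new transform; see the naming discussion below.
  from apache_beam.ml.gcp.videointelligenceml import AnnotateVideo

  features = [videointelligence.enums.Feature.LABEL_DETECTION]

  with beam.Pipeline() as p:
    _ = (
        p
        # Placeholder GCS URI; any PCollection of video URIs (or raw bytes) works.
        | 'CreateUris' >> beam.Create(['gs://my-bucket/videos/sample.mp4'])
        | 'Annotate' >> AnnotateVideo(features)
        | beam.Map(print))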


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make the review process smoother.

Post-Commit Tests Status (on master branch)

[Table of post-commit test status badges for the Go, Java, Python and XLang SDKs across the Apex, Dataflow, Flink, Gearpump, Samza and Spark runners]

Pre-Commit Tests Status (on master branch)

[Table of pre-commit test status badges for Java, Python, Go and the Website, in non-portable and portable variants]

See .test-infra/jenkins/README for the trigger phrases, statuses and links of all Jenkins jobs.

@EDjur changed the title from "BEAM-9146/gcp video intelligence" to "[BEAM-9146] Integrate GCP Video Intelligence functionality for Python SDK" on Feb 4, 2020
@EDjur (Contributor, Author) commented Feb 4, 2020:

R: @aaltay @kamilwu

@aaltay (Member) commented Feb 4, 2020:

Thank you @EDjur.

I was hoping that we could implement most of the functionality in tfx_bsl, and add cloud AI services by defining new endpoints. We had a discussion on the dev list [1] related to this.

[1] https://lists.apache.org/thread.html/r3d9b0e3270668fda989f363eab1f79e4ef0e7fa3e0fcbe9c26344f14%40%3Cdev.beam.apache.org%3E

@EDjur (Contributor, Author) commented Feb 4, 2020:

I see. I signed up for the dev mailing list just last week, so I must've missed that discussion.

What does that mean with regard to this ticket?

I agree that it would be nice to avoid duplicated behaviour across libraries.

@aaltay (Member) commented Feb 5, 2020:

@EDjur - Sorry for the miscommunication. These types of decisions should have been reflected on JIRA as well. I feel bad that you spent time on this and we are changing direction after your PR is out.

I think what it means for this ticket is that:

  • If tfx_bsl has the ability to call into services (which we need to add, and the TFX team agreed to support us with reviews),
  • We can add thin transforms for different things (e.g. AnnotateVideo), and each of these transforms will share quite a bit of code and behavior (a rough sketch of this idea follows below).
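
Purely as a hypothetical illustration of that split (none of these helper names exist in Beam or tfx_bsl; this is only a sketch of thin service-specific transforms on top of shared plumbing):

  import apache_beam as beam

  class _RemoteServiceDoFn(beam.DoFn):
    """Hypothetical shared plumbing: one client per worker, one request per element."""

    def __init__(self, make_client, make_request):
      self._make_client = make_client
      self._make_request = make_request
      self._client = None

    def process(self, element):
      # Create the service client lazily so it is not pickled with the DoFn.
      if self._client is None:
        self._client = self._make_client()
      yield self._make_request(self._client, element)

  class AnnotateVideo(beam.PTransform):
    """Hypothetical thin wrapper; only the service-specific bits live here."""

    def __init__(self, features):
      self._features = features

    def expand(self, pcoll):
      from google.cloud import videointelligence
      features = self._features
      return pcoll | beam.ParDo(
          _RemoteServiceDoFn(
              make_client=videointelligence.VideoIntelligenceServiceClient,
              make_request=lambda client, uri: client.annotate_video(
                  input_uri=uri, features=features).result()))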

@kamilwu may have other thoughts, since he was also working on this. I would like to hear his opinion.

For this PR, let's try to re-use as much as possible.

@EDjur (Contributor, Author) commented Feb 5, 2020:

@aaltay No worries - I've learnt a lot by creating this PR anyway! Curious to hear @kamilwu's thoughts on it too.

I'm open to modifying this, or working on another PR related to tfx_bsl, if needed.

@kamilwu (Contributor) commented Feb 5, 2020:

Thanks @EDjur.

tfx_bsl seems to be useful only in some cases. One of these cases is AI Platform Prediction, which, by the way, we would also like to support in Beam in the near future.

The Video Intelligence API is quite different. We should treat it as a fully managed machine learning service: there is no machine learning model to be trained and deployed. Instead, videos are annotated using a pre-trained model provided by Google. All a user has to do is enable the API and make a request (using the REST API or a client library). The only input is a GCS path to a video file. As a result, there is little behavior that could be shared.
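
As a rough illustration of how little the direct client-library path involves (the bucket path below is just a placeholder, and this assumes the 1.x google-cloud-videointelligence surface):

  from google.cloud import videointelligence

  client = videointelligence.VideoIntelligenceServiceClient()
  # Start the long-running annotation request and block until it completes.
  operation = client.annotate_video(
      input_uri='gs://my-bucket/videos/sample.mp4',
      features=[videointelligence.enums.Feature.LABEL_DETECTION])
  response = operation.result(timeout=300)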

If you have any questions regarding tfx_bsl and how it connects with Beam, just let me know, as I'm currently working on providing support for remote inference (the ability to call into services) in tfx_bsl.

@kamilwu (Contributor) commented Feb 5, 2020:

@EDjur Because you've added a new dependency, we have to specify it in the setup.py file (sdks/python/setup.py).
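
For example, something along these lines in sdks/python/setup.py (the list name and the exact version bounds shown here are assumptions; the actual bounds are discussed later in this thread):

  # Hypothetical excerpt of the GCP extra requirements in sdks/python/setup.py.
  GCP_REQUIREMENTS = [
      'google-cloud-videointelligence>=1.8.0,<1.13.0',
  ]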

@aaltay (Member) commented Feb 5, 2020:

Thank you @kamilwu for the comment. That makes sense to me.

Related question: what is a good location for this family of transforms? io/... does not sound right in this case. How about a top-level ml/ folder, similar to io? These transforms could be put under ml/gcp.

@kamilwu (Contributor) commented Feb 6, 2020:

Related question: what is a good location for this family of transforms? io/... does not sound right in this case. How about a top-level ml/ folder, similar to io? These transforms could be put under ml/gcp.

I agree io/... can be somewhat misleading. ml/gcp sounds good.

@EDjur (Contributor, Author) commented Feb 6, 2020:

Thanks for the feedback! I've adjusted the code to conform to Python 2/3 compatibility standards and added the extra arguments to annotate_video.

I also added the google-cloud-videointelligence dependency, which I've tested from version 1.8.0 up to 1.12.1.
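
For instance, a sketch of what passing one of those extra arguments might look like, assuming a video_context keyword on the transform and the 1.x client types (the parameter names here are assumptions, not taken from the diff):

  import apache_beam as beam
  from google.cloud import videointelligence

  # Assumed import path/name, as in the sketch near the top of this thread.
  from apache_beam.ml.gcp.videointelligenceml import AnnotateVideo

  # A VideoContext that tweaks label detection, built from the 1.x client types.
  context = videointelligence.types.VideoContext(
      label_detection_config=videointelligence.types.LabelDetectionConfig(
          label_detection_mode=(
              videointelligence.enums.LabelDetectionMode.SHOT_AND_FRAME_MODE)))

  with beam.Pipeline() as p:
    _ = (
        p
        | beam.Create(['gs://my-bucket/videos/sample.mp4'])
        | AnnotateVideo(
            features=[videointelligence.enums.Feature.LABEL_DETECTION],
            video_context=context)
        | beam.Map(print))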

@EDjur (Contributor, Author) commented Feb 6, 2020:

Got one question regarding the naming. Is video_intelligence a good name for the module? I'm slightly worried it might cause confusion with the google.cloud.videointelligence package.

@aaltay (Member) commented Feb 6, 2020:

Got one question regarding the naming. Is video_intelligence a good name for the module? I'm slightly worried it might cause confusion with the google.cloud.videointelligence package.

I think it is a good name that reflects the underlying service as it is. Not very different from other GCP IOs with names similar to the GCP services.

@aaltay (Member) left a comment:

LGTM. I will wait for @kamilwu to complete his review.

@kamilwu (Contributor) commented Feb 7, 2020:

LGTM.

@EDjur (Contributor, Author) commented Feb 7, 2020:

Got one question regarding the naming. Is video_intelligence a good name for the module? I'm slightly worried it might cause confusion with the google.cloud.videointelligence package.

I think it is a good name that reflects the underlying service as it is. Not very different from other GCP IOs with names similar to the GCP services.

I started work on https://issues.apache.org/jira/browse/BEAM-9247 to integrate the vision API in a similar PTransform. However, here the naming gets more complex. Consider e.g.

  from google.cloud import vision
  from apache_beam.ml.gcp import vision

One solution for the vision transforms is to rename the module vision_api instead. But I think we should be consistent across the different GCP ML APIs, so perhaps video_intelligence will need renaming too.

What's your take? @aaltay

@aaltay (Member) commented Feb 7, 2020:

Got one question regarding the naming. Is video_intelligence a good name for the module? I'm slightly worried it might cause confusion with the google.cloud.videointelligence package.

I think it is a good name that reflects the underlying service as it is. Not very different from other GCP IOs with names similar to the GCP services.

I started work on https://issues.apache.org/jira/browse/BEAM-9247 to integrate the vision API in a similar PTransform. However, here the naming gets more complex. Consider e.g.

  from google.cloud import vision
  from apache_beam.ml.gcp import vision

One solution for the vision transforms is to rename the module vision_api instead. But I think we should be consistent across the different GCP ML APIs, so perhaps video_intelligence will need renaming too.

What's your take? @aaltay

I do not have a good idea. Users could solve this with something like the following:

from google.cloud import vision as gcp_vision_api
from apache_beam.ml.gcp import vision

So, maybe it is not a big issue. I agree, it would be good if we could avoid this situation.

Relatedly, in the io folder we have modules named gcsio, bigtableio, etc., but we also have modules named bigquery and pubsub. So it is not very consistent.

We can take the io example and change the names by appending ml; your example would look like:

from google.cloud import vision
from apache_beam.ml.gcp import visionml
from apache_beam.ml.gcp import videointelligenceml

(Notice I also dropped the _ between the words.)

What do you think?

@mwalenia (Member) commented:

retest this please

@mwalenia (Member) commented:

retest this please

@mwalenia (Member) commented:

@EDjur unfortunately your trigger phrases won't trigger the tests - there are Jenkins restrictions that won't let you do it

@EDjur (Contributor, Author) commented Feb 13, 2020:

@EDjur unfortunately your trigger phrases won't trigger the tests - there are Jenkins restrictions that won't let you do it

Gotcha. Thanks for triggering them!

@mwalenia (Member) commented:

retest this please

@EDjur (Contributor, Author) commented Feb 13, 2020:

I'm having a small discussion regarding the video_context with @kamilwu here: https://issues.apache.org/jira/browse/BEAM-9247

Perhaps this is something we should look at integrating into this PR as well, either before merging or after (since the use case seems small).

@mwalenia (Member) commented:

retest this please

@EDjur (Contributor, Author) commented Feb 14, 2020:

Thanks for retesting! Test failures look unrelated to this PR.

11:04:57 self = <apache_beam.runners.interactive.caching.streaming_cache_test.InMemoryReader object at 0x7fdcc6b742d0>
11:04:57 watermark = 0, processing_time = 1
11:04:57 
11:04:57     def advance_watermark(self, watermark, processing_time):
11:04:57       record = TestStreamFileRecord(
11:04:57           watermark=Timestamp.of(watermark).to_proto(),
11:04:57 >         processing_time=Timestamp.of(processing_time).to_proto())
11:04:57 E     ValueError: Protocol message TestStreamFileRecord has no "processing_time" field.

@aaltay (Member) commented Feb 14, 2020:

Test error seems to be related to: https://github.com/apache/beam/pull/10856/files

@aaltay (Member) commented Feb 18, 2020:

retest this please

@aaltay (Member) commented Feb 18, 2020:

retest this please

@aaltay (Member) commented Feb 18, 2020:

retest this please

@aaltay (Member) commented Feb 18, 2020:

retest this please

@aaltay (Member) commented Feb 18, 2020:

The :sdks:python:test-suites:tox:pycommon:docs task is failing -- there might be pydoc issues in the change.

@EDjur (Contributor, Author) commented Feb 19, 2020:

Seems yapf doesn't format docstrings 🤷‍♂ Should be fixed now.

@kkucharc (Contributor) commented:

retest this please

@aaltay merged commit efe193e into apache:master on Feb 19, 2020
@aaltay (Member) commented Feb 19, 2020:

@EDjur -- Thank you, merged this.

Could we update 'google-cloud-videointelligence>=1.8.0<=1.12.1', to use the recently released 1.13.0 version as well? Do we need anything special?
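
If so, the change would presumably just be relaxing the upper bound of that specifier in sdks/python/setup.py, e.g. (the exact bound below is an assumption):

  'google-cloud-videointelligence>=1.8.0,<1.14.0',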

@EDjur (Contributor, Author) commented Feb 20, 2020:

Looks fine to update! Version 1.13.0 essentially added two new enums to types.Feature, but since these are specified by the user, all should be well.

I also manually reran all the tests on version 1.13.0 without mocking the videointelligence client and they ran as expected.

Cheers!
