[BEAM-4003] Futurize runners subpackage #5373

Fematich · 2018-05-15T21:15:18Z

This pull request prepares the runners subpackage for Python 3 support. This PR is part of a series in which all subpackages will be updated using the same approach.
The approach is documented here, and the first pull request in the series (Futurize coders subpackage), which demonstrates it, is #5053.

R: @aaltay @tvalentyn @RobbeSneyders

Fematich · 2018-05-16T14:57:58Z

Run Python Dataflow ValidatesRunner

tvalentyn

Thank you, @Fematich. A few comments below. We also need to rebase this change onto the current head.

tvalentyn · 2018-06-14T18:25:25Z

sdks/python/apache_beam/runners/direct/direct_runner.py

            args, kwargs = transform.raw_side_inputs
            args_to_check = itertools.chain(args,
-                                            kwargs.values())
+                                            list(kwargs.values()))


While the list() conversion technically gives the same behavior in Py2 and Py3, I think we can safely skip it here in favor of cleaner code.
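
As an illustration of this point, here is a minimal sketch showing that itertools.chain accepts a dict view directly, so no list() copy is needed for a single pass. The side-input values are made up for the example and are not taken from the PR.

```python
import itertools

# Hypothetical stand-ins for the contents of transform.raw_side_inputs.
args = ('side_a', 'side_b')
kwargs = {'x': 'side_c', 'y': 'side_d'}

# chain() iterates each argument exactly once, so a dict view works as-is
# on Python 3 (and a plain list works the same way on Python 2).
args_to_check = itertools.chain(args, kwargs.values())
checked = list(args_to_check)
```

The copy would only matter if the dict were mutated while being iterated, which is not the case in the hunk above.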

tvalentyn · 2018-06-14T18:26:37Z

sdks/python/apache_beam/runners/direct/evaluation_context.py

-    views_string = (', '.join(str(elm) for elm in self._views.values())
-                    if self._views.values() else '[]')
+    views_string = (', '.join(str(elm) for elm in list(self._views.values()))
+                    if list(self._views.values()) else '[]')


While the list() conversion technically gives the same behavior in Py2 and Py3, I think we can safely skip it here in favor of cleaner code.
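
A small sketch of why the conversion can be dropped here: on Python 3 a dict view supports len(), so an empty view is falsy just like an empty list. The function name format_views is hypothetical and only mirrors the expression in the hunk above.

```python
def format_views(views):
    """Join the string form of all views, or return '[]' when there are none."""
    values = views.values()
    # An empty dict view is falsy, so no list() conversion is needed
    # for the emptiness check or for the single iteration below.
    return ', '.join(str(elm) for elm in values) if values else '[]'


empty_result = format_views({})
filled_result = format_views({'a': 1, 'b': 2})
```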

tvalentyn · 2018-06-14T19:52:02Z

sdks/python/apache_beam/runners/direct/transform_evaluator.py

    bundles = []
    bundle = None
-    for encoded_k, vs in self.gbk_items.iteritems():
+    for encoded_k, vs in self.gbk_items.items():


As we learned in #5053, Cython can translate iteration through items() without an intermediate object when given a directive to interpret the code using Python 3 semantics. However, when the code is not cythonized, I would prefer to use future.utils.iteritems() to avoid an impact on efficiency in Python 2. It's probably OK to use items() in tests, or when it's clear that the collection is small.
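
To make the efficiency argument concrete, here is a minimal sketch of what a compatibility helper of this kind does. The iteritems function below is a hypothetical stand-in written for illustration so the example runs without the future package; it is not the library's code.

```python
def iteritems(mapping):
    """Hypothetical stand-in for a Py2/Py3-compatible iteritems helper."""
    # Python 2 dicts expose iteritems(), which yields pairs lazily
    # instead of building the full list that Python 2's items() returns.
    method = getattr(mapping, 'iteritems', None)
    if method is None:
        # Python 3 dicts only have items(), which is already a lazy view.
        method = mapping.items
    return iter(method())


# Made-up data shaped like the gbk_items mapping in the hunk above.
gbk_items = {'k1': [1, 2], 'k2': [3]}
total = sum(len(vs) for _, vs in iteritems(gbk_items))
```

Either way the loop body is unchanged; only the cost of producing the pairs differs on Python 2.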

tvalentyn · 2018-06-14T21:59:41Z

sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/message_matchers.py

      return False
    if self.context != IGNORED:
-      for key, name in self.context.iteritems():
+      for key, name in self.context.items():


Let's use future.utils.iteritems() here.

tvalentyn · 2018-06-14T22:00:51Z

sdks/python/apache_beam/runners/dataflow/dataflow_metrics.py

    # Now we create the MetricResult elements.
    result = []
-    for metric_key, metric in metrics_by_name.iteritems():
+    for metric_key, metric in metrics_by_name.items():


Let's use future.utils.iteritems() here.

tvalentyn · 2018-06-15T02:51:05Z

sdks/python/apache_beam/runners/worker/logger.py


 """Python worker logging."""

+from __future__ import absolute_import


Let's add # cython: language_level=3, since this file will be cythonized.
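
For reference, a sketch of where the directive would sit. Cython reads it from a comment that appears before any code in the file, and plain Python ignores it. The average function is hypothetical and only shows one visible effect of Python 3 semantics.

```python
# cython: language_level=3


def average(total, count):
    # With Python 3 semantics, / is true division even for two ints.
    return total / count


result = average(3, 2)
```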

tvalentyn · 2018-06-15T02:52:37Z

sdks/python/apache_beam/runners/common.py

+from builtins import object
+from builtins import zip

 import six


I think we should avoid importing six, for consistency with the approach followed elsewhere.
What do you think, @RobbeSneyders ?
Looks like we are using six.reraise in a few places and six.text_type in apiclient.py.

tvalentyn · 2018-06-15T02:58:54Z

sdks/python/apache_beam/runners/worker/opcounters.py

 """Counters collect the progress of the Worker for reporting to the service."""

 from __future__ import absolute_import
+from __future__ import division


Let's add # cython: language_level=3, since this file will be cythonized.

tvalentyn · 2018-06-15T03:00:24Z

sdks/python/apache_beam/runners/common.py

@@ -22,8 +22,13 @@
 For internal use only; no backwards-compatibility guarantees.


Since this file will be cythonized, let's tell cython to use Python3 semantics and add:

# cython: language_level=3

See also:

beam/sdks/python/setup.py

Line 168 in 7c3fba0

'apache_beam/**/*.pyx',

This has some interesting consequences for typechecks on strings.

if not isinstance(tag, six.string_types):

cannot be replaced by

if not isinstance(tag, (str, unicode)):

because string objects defined in other modules (in Python 2) are seen as bytes in Cython code with Python 3 semantics.

Therefore I replaced it by

try:
  basestring
except NameError:
  basestring = str

and

if not isinstance(tag, basestring):
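
The fallback described above can be sketched as a small runnable example. The is_valid_tag helper is hypothetical and stands in for the real type check in common.py.

```python
try:
    basestring  # Exists on Python 2 and covers both str and unicode.
except NameError:
    basestring = str  # Python 3 has no basestring; str is the text type.


def is_valid_tag(tag):
    """Return True when tag is a string on either Python version."""
    return isinstance(tag, basestring)


checks = (is_valid_tag('main'), is_valid_tag(3))
```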

tvalentyn · 2018-06-15T03:31:07Z

sdks/python/apache_beam/runners/worker/opcounters_test.py

                          i, buckets[i],
                          10 * total_runs / i,
-                          buckets[i] / (10.0 * total_runs / i)))
+                          buckets[i] // (10.0 * total_runs / i)))


tvalentyn · 2018-06-15T22:48:18Z

R: @charlesccychen

Fematich · 2018-06-19T20:26:32Z

@tvalentyn and @charlesccychen : all the comments have been addressed and the PR has been rebased now

cclauss · 2018-07-03T00:44:36Z

sdks/python/apache_beam/runners/worker/worker_id_interceptor.py

-  _worker_id = os.environ['WORKER_ID'] if os.environ.has_key(
-      'WORKER_ID') else str(uuid.uuid4())
+  _worker_id = os.environ['WORKER_ID'] if 'WORKER_ID' in os.environ else \
+      str(uuid.uuid4())


OUCH... A backslash is a bad idea (see PEP 8). One space character to the right of the backslash and the script breaks on a change that is not visible to the reader. What about:

_worker_id = os.environ.get('WORKER_ID', str(uuid.uuid4())) instead?
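
A minimal sketch of the suggested form, wrapped in a hypothetical resolve_worker_id function so it can be exercised with a plain dict instead of the real environment. Note that the default argument is evaluated eagerly, so a UUID is generated even when WORKER_ID is set. That is harmless here but worth knowing.

```python
import os
import uuid


def resolve_worker_id(environ=os.environ):
    """Return the configured worker id, or a fresh random one."""
    # dict.get avoids both has_key() (removed in Python 3) and the
    # backslash continuation in the conditional expression.
    return environ.get('WORKER_ID', str(uuid.uuid4()))


configured = resolve_worker_id({'WORKER_ID': 'worker-1'})
generated = resolve_worker_id({})
```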

cclauss

Are all the absolute_imports really essential?

charlesccychen

Thanks!

charlesccychen · 2018-07-03T00:56:04Z

sdks/python/apache_beam/runners/worker/worker_id_interceptor.py

-  _worker_id = os.environ['WORKER_ID'] if os.environ.has_key(
-      'WORKER_ID') else str(uuid.uuid4())
+  _worker_id = os.environ['WORKER_ID'] if 'WORKER_ID' in os.environ else \
+      str(uuid.uuid4())


charlesccychen · 2018-07-03T00:57:17Z

sdks/python/apache_beam/runners/worker/sdk_worker_main.py

 from apache_beam.runners.worker.log_handler import FnApiLogRecordHandler
 from apache_beam.runners.worker.sdk_worker import SdkHarness

+standard_library.install_aliases()


Do we want to call this before we do the import of "http.server" above? It looks like this only works because of an "accident" in that this module is imported after someone else already called install_aliases().

Order should be OK, see: #5373 (comment)

charlesccychen · 2018-07-03T00:57:42Z

sdks/python/apache_beam/runners/worker/sdk_worker.py

 from apache_beam.runners.worker import data_plane
 from apache_beam.runners.worker.worker_id_interceptor import WorkerIdInterceptor

+standard_library.install_aliases()


Do we want to call this before we do the import of "queue" above? It looks like this only works because of an "accident" in that this module is imported after someone else already called install_aliases().

Order should be OK, see: #5373 (comment)

charlesccychen · 2018-07-03T00:58:22Z

sdks/python/apache_beam/runners/worker/logger_test.py


 from apache_beam.runners.worker import logger

+standard_library.install_aliases()


Why is this needed in this module?

Removed now!

charlesccychen · 2018-07-03T00:58:34Z

sdks/python/apache_beam/runners/worker/log_handler.py

 from apache_beam.portability.api import beam_fn_api_pb2_grpc
 from apache_beam.runners.worker.worker_id_interceptor import WorkerIdInterceptor

+standard_library.install_aliases()


Do we want to call this before we do the import of "queue" above? It looks like this only works because of an "accident" in that this module is imported after someone else already called install_aliases().

Order should be OK, see: #5373 (comment)

charlesccychen · 2018-07-03T01:01:59Z

sdks/python/apache_beam/runners/direct/executor.py

 from apache_beam.transforms import sideinputs
 from apache_beam.utils import counters

+standard_library.install_aliases()


Do we want to call this before we do the import of "queue" above? It looks like this only works because of an "accident" in that this module is imported after someone else already called install_aliases().

pylint reports errors when install_aliases() is placed above the imports, which is the reason for this location.
I tested this placement, and this order works as well.
Test script* (Python 2):
Without install_aliases():

>>> import sys
>>> sys.intern
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'intern'

With install_aliases() after import sys:

>>> import sys
>>> from future import standard_library
>>> standard_library.install_aliases()
>>> sys.intern
<built-in function intern>

*Based on documented effect of aliased imports

How about

from future.moves.queue import Queue
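
For comparison, here is a sketch of an explicit import that does not depend on install_aliases() ordering. It uses a plain standard-library fallback rather than future.moves, purely so the example runs without the future package; this is a swapped-in technique, not what the PR adopted.

```python
try:
    import queue  # Python 3 module name.
except ImportError:
    import Queue as queue  # Python 2 module name.

# The import above resolves the right module by itself, so nothing
# needs to be patched before or after it.
work_queue = queue.Queue()
work_queue.put('bundle-1')
item = work_queue.get_nowait()
```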

charlesccychen · 2018-07-03T01:02:49Z

sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/message_matchers.py

      return False
    if self.context != IGNORED:
-      for key, name in self.context.iteritems():
+      for key, name in iteritems(self.context):


Why iteritems here vs .items()?

To avoid efficiency losses in Python2, see: #5373 (comment)

charlesccychen · 2018-07-03T01:03:59Z

sdks/python/apache_beam/runners/dataflow/internal/apiclient.py

 import time
 from datetime import datetime
-from StringIO import StringIO
+from io import BytesIO


In some of your other pending changes you use io.BytesIO. Can we choose one style and be consistent?

Are there any guidelines on the style? If not, I'll make it consistent with io.BytesIO.

I think people usually use it in a qualified way, so +1 for io.BytesIO.
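
A short sketch of the qualified style agreed on above. The payload is made up for the example.

```python
import io

# Qualified use keeps the origin of BytesIO visible at the call site.
buf = io.BytesIO()
buf.write(b'payload')
data = buf.getvalue()
```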

charlesccychen · 2018-07-03T01:04:41Z

sdks/python/apache_beam/runners/dataflow/dataflow_runner.py

 from apache_beam.utils import proto_utils
 from apache_beam.utils.plugin import BeamPlugin

+standard_library.install_aliases()


Why do we need this in this file?

Introduced by the futurize step because of import urllib (documentation). However, since we import urllib from future, this is indeed no longer required.

charlesccychen · 2018-07-03T01:04:57Z

sdks/python/apache_beam/runners/dataflow/internal/apiclient.py

 from apitools.base.py import exceptions
-import six
+
+from future import standard_library


Do we want to call this before we do the import of "io" above? It looks like this only works because of an "accident" in that this module is imported after someone else already called install_aliases().

I do not believe that you can just write from future import standard_library. I believe that you must write from future.standard_library import install_aliases ; install_aliases() and (despite what the linters will tell you) the install_aliases() call must happen before you import affected modules. If I am wrong on this, then please enlighten me.

As discussed in the other PR, io was added in Python 2.6, so we don't need install_aliases to import it.

OK, I'll remove the install_aliases caused by io. However, regarding the order, it seems that install_aliases doesn't need to be called before the imports (copied from here):

I tested functionality of this placement and this order works as well:
test script* (Python2):
Without install_aliases():

>>> sys.intern
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'intern'

With install_aliases() after import sys:

>>> from future import standard_library
>>> standard_library.install_aliases()
>>> sys.intern
<built-in function intern>

*Based on documented effect of aliased imports

charlesccychen · 2018-07-06T15:48:27Z

Thank you! LGTM. (ccy-benchmark-ok)

tvalentyn requested changes Jun 15, 2018

Fematich force-pushed the runners branch from 07d3d14 to d00c32b Compare June 19, 2018 17:25

Fematich added 4 commits June 21, 2018 17:32

Futurize direct runner

7548587

Futurize dataflow,experimental,job,portability and test subpackages o…

6f10dd3

…f runners

Futurize complete runners subpackage

0bf7a31

Cleanup futurize runners

b908795

Fematich force-pushed the runners branch from d00c32b to b908795 Compare June 21, 2018 15:35

charlesccychen mentioned this pull request Jul 2, 2018

[BEAM-1251] Modernize Python 2 code to get ready for Python 3 #5842

Merged

2 tasks

cclauss reviewed Jul 3, 2018

cclauss approved these changes Jul 3, 2018

charlesccychen reviewed Jul 3, 2018

removed unnecessary install_aliases, import io and unicode imports

0956fdf

Fematich mentioned this pull request Jul 6, 2018

[BEAM-4000] Futurize io subpackage #5715

Merged

charlesccychen merged commit 303a427 into apache:master Jul 6, 2018

charlesccychen mentioned this pull request Jul 8, 2018

[BEAM-4003] Fix missing iteritems import #5900

Merged