[BEAM-3731] Enable tests to run in Python 3 #4730

luke-zhu · 2018-02-22T16:28:39Z

This PR allows us to install the package and run the test suite in Python 3 by removing the version constraints in setup.py and __init__.py.

Currently the only way to run tests requires commenting out lines in the root __init__.py file. In addition, the number of tests that can be run is limited. Being able to run all of the tests in Python 3 will speed up the compatibility process considerably. Currently about 100 tests fail and another 400 cause errors.

python-modernize was used to eliminate many of the errors. I removed uses of six if they didn't seem necessary. Finally, the files were updated until the test suite would run all tests.

Note: I replaced many .iteritems() calls with .items() instead of six.iteritems(). I may have changed a few where this affects performance.

luke-zhu · 2018-02-22T17:33:31Z

R: @robertwb

holdenk

This is awesome! I'm going to go ahead and close https://github.com/apache/beam/pull/4078/files and focus on this :)

I've got some questions, but I'm not a committer so don't feel obliged to answer them, just questions from another contributor working on Python :)

Another thing: the description of the PR says "This PR allows us to install the package and run the test suite in Python 3 by removing the version constraints in setup.py and __init__.py", but those changes don't appear to be included in this PR. Did you perhaps forget to stage/add them? On the subject, to prevent downstream users from trying to install Beam in Python 3 if we were to make a release before we finished adding support, it might make sense to keep the checks for now, but skip them if an environment variable is present. What do you think?

Also, as a heads up to @cclauss, this seems to subsume a lot of #4697 (although since #4697 is smaller, maybe it makes sense to do it more piecewise, idk).
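The "keep the checks, but skip them if an environment variable is present" idea could look roughly like the sketch below. This is only an illustration of the suggestion, not code from the PR: the variable name BEAM_EXPERIMENTAL_PY3 and the function check_python_version are hypothetical, and the injectable arguments exist only to make the guard easy to exercise.

```python
import os
import sys

# Hypothetical variable name, chosen only for this sketch.
OVERRIDE_ENV_VAR = "BEAM_EXPERIMENTAL_PY3"


def check_python_version(version_info=None, environ=None):
    """Raise on Python 3 unless the override variable is set."""
    version_info = sys.version_info if version_info is None else version_info
    environ = os.environ if environ is None else environ
    if version_info[0] == 2:
        return  # Supported interpreter, nothing to do.
    if environ.get(OVERRIDE_ENV_VAR):
        return  # Developer explicitly opted in to Python 3.
    raise RuntimeError(
        "Python 3 support is experimental; set %s=1 to bypass this check."
        % OVERRIDE_ENV_VAR)
```

With a guard like this, a released package would still refuse to import on Python 3 by default, while contributors could export the variable before running the test suite.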

holdenk · 2018-02-23T03:18:52Z

sdks/python/apache_beam/io/filebasedsink.py

        format.
    """
-    if not isinstance(file_path_prefix, (basestring, ValueProvider)):
+    if not isinstance(file_path_prefix, (six.string_types, ValueProvider)):


Another option is to just use from past.builtins import basestring to reduce the number of lines that need to change but this is fine as well.

I think this would be good. Are we ok with adding future as a dependency to master? Or should we get a separate Python3 branch first?

I think future is a reasonable dependency to take in master. Although the idea of a Python 3 branch could be good if we continue to take a long time to review/merge Python 3 fixes into master (although then we'd need people to be able to cooperate on a Python 3 branch explicitly).
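For readers unfamiliar with the two options discussed in this thread: both six.string_types and past.builtins.basestring give a name that works in an isinstance check on either interpreter. The sketch below reproduces the same idea with only the standard library so it can run without six or future installed. FakeValueProvider and is_valid_prefix are hypothetical stand-ins, not Beam code.

```python
try:
    string_types = (basestring,)  # Python 2: covers both str and unicode
except NameError:
    string_types = (str,)  # Python 3: basestring no longer exists


class FakeValueProvider(object):
    """Hypothetical stand-in for Beam's ValueProvider."""


def is_valid_prefix(file_path_prefix):
    # Same shape as the isinstance check in the diff above.
    return isinstance(file_path_prefix, string_types + (FakeValueProvider,))
```

On Python 3, six.string_types is the tuple (str,), and on Python 2 it is (basestring,), which is what the try/except above mimics.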

holdenk · 2018-02-23T03:25:49Z

sdks/python/apache_beam/coders/coder_impl.py

@@ -280,7 +280,7 @@ def encode_to_stream(self, value, stream, nested):
    elif t is str:


It may make sense to add an explicit bytes test in coders_test.
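A small sketch of why a branch like "elif t is str" deserves an explicit bytes test: in Python 2, bytes is just an alias for str, so one branch catches both; in Python 3 they are distinct types. The describe function is hypothetical and only illustrates the type dispatch, it is not the coder implementation.

```python
def describe(value):
    """Classify a value by exact type, the way a type-dispatching encoder might."""
    t = type(value)
    if t is bytes:
        return "bytes"
    elif t is str:
        # Never reached on Python 2, where bytes is str.
        return "str"
    return "other"
```

On Python 3, describe(b"abc") and describe("abc") take different branches, so a test that only passes text strings never exercises the bytes path.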

holdenk · 2018-02-23T03:27:22Z

sdks/python/apache_beam/coders/coders_test_common.py

@@ -143,9 +143,9 @@ def test_bytes_coder(self):



Now that we are in Python 3, for the bytes coder it may make sense to test with bytes rather than strings.

I haven't done any work for making the tests pass in Python 3. There are many errors in coders due to string compatibility issues though. If we can get a version which builds on Python 3 up to a remote branch, it would be easy to communicate progress. I saw that you created a tox env in a PR.

holdenk · 2018-02-23T03:47:53Z

sdks/python/apache_beam/io/filebasedsource.py

 :class:`~apache_beam.io._AvroSource`.
 """

-from six import integer_types


Would it maybe make sense to import integer_types and string_types with the from directive here (or I guess what is the reason for the change)?

Either way works. I just chose one way to keep things consistent after applying the auto-converter. Is there anything I'm missing?

holdenk · 2018-02-23T04:01:00Z

sdks/python/apache_beam/coders/standard_coders_test.py

          lambda x: IntervalWindow(
              start=Timestamp(micros=(x['end'] - x['span']) * 1000),
              end=Timestamp(micros=x['end'] * 1000)),
-      'urn:beam:coders:stream:0.1': lambda x, parser: map(parser, x),


So this test seems to indicate it's for testing a stream, but by putting list around it this isn't really a stream anymore? Or am I misreading the code here? (These tests are a little less than self-documenting; maybe @vikkyrk, who added the iterablecoder tests, can chime in.)
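For context on the list() question: in Python 2, map returns a list, while in Python 3 it returns a one-shot iterator. A minimal sketch of the difference, using a hypothetical parser function rather than the one in the test:

```python
def parser(x):
    # Hypothetical element parser, used only for illustration.
    return x * 2


values = [1, 2, 3]

lazy = map(parser, values)          # Python 3: iterator; Python 2: already a list
eager = list(map(parser, values))   # a list on both interpreters

first_pass = list(lazy)
second_pass = list(lazy)  # Python 3: empty, because the iterator is exhausted
```

Whether the eager form is appropriate depends on whether the test compares or re-reads the result, which is the question raised above.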

holdenk · 2018-02-23T04:08:00Z

sdks/python/apache_beam/io/filebasedsource_test.py

  def test_read(self):
-    sources = [TestConcatSource.DummySource(range(start, start + 10)) for start
-               in [0, 10, 20]]
+    sources = [TestConcatSource.DummySource(list(range(start, start + 10)))


Do these really need to be changed? From looking at DummySource it seems like the operations it does should work fine on ranges.
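As a quick reference for this point, a Python 3 range is a lazy sequence that already supports length, indexing, slicing, membership tests, and repeated iteration. Which of these matter here depends on what DummySource actually does with its argument, so treat this as a sketch of the general behavior only:

```python
r = range(10, 20)

length = len(r)      # ranges know their length
third = r[2]         # indexing works
tail = r[5:]         # slicing returns another range in Python 3
contains = 15 in r   # membership test
as_list = list(r)    # iteration can be repeated, unlike a map iterator
```

What a range does not support is concatenation with + or item assignment, so code relying on those would still need list(range(...)).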

holdenk · 2018-02-23T04:11:30Z

sdks/python/apache_beam/io/gcp/gcsfilesystem.py

      file_sizes = gcsio.GcsIO().size_of_files_in_glob(pattern, limit)
      metadata_list = [FileMetadata(path, size)
-                       for path, size in file_sizes.iteritems()]
+                       for path, size in file_sizes.items()]


In most places you seem to use six.iteritems, so it might be better to be consistent about that. It's also possible that there could be a large number of input files, in which case building the full list with .items() would be less than great for Python 2.
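The performance concern comes from Python 2, where dict.items() builds a full list of pairs while dict.iteritems() yields them lazily; in Python 3, items() returns a lightweight view. The helper below is a standard-library sketch of roughly what six.iteritems does, with made-up file sizes as data:

```python
def iteritems(d):
    """Iterate over (key, value) pairs without building a list on Python 2."""
    if hasattr(d, "iteritems"):
        return d.iteritems()  # Python 2 dict
    return iter(d.items())    # Python 3 dict view


# Hypothetical data standing in for the glob result.
file_sizes = {"gs://bucket/a": 10, "gs://bucket/b": 20}
pairs = sorted(iteritems(file_sizes))
```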

holdenk · 2018-02-23T04:12:29Z

sdks/python/apache_beam/runners/dataflow/dataflow_metrics.py

    # Now we create the MetricResult elements.
    result = []
-    for metric_key, metric in metrics_by_name.iteritems():
+    for metric_key, metric in metrics_by_name.items():


In some places you seem to use six.iteritems, so it might be better to be consistent about that. There probably aren't that many metrics though, so this isn't a big deal.

holdenk · 2018-02-23T04:27:51Z

sdks/python/apache_beam/coders/coders_test_common.py


  def test_varint_coder(self):
    # Small ints.
-    self.check_coder(coders.VarIntCoder(), *range(-10, 10))


I don't think this change is actually necessary; the argument-unpacking operator (*) works just fine on ranges in Python 3.
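A minimal check of that claim, using a hypothetical check function in place of check_coder:

```python
def check(*values):
    # Collects positional arguments the same way a *args test helper would.
    return values


unpacked = check(*range(-10, 10))
```

The * operator accepts any iterable, so the range is expanded into 20 positional arguments without wrapping it in list().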

cclauss · 2018-02-23T05:21:41Z

@luke-zhu I was trying to make progress with futurize (which seems to rely more on six) instead of modernize but I am happy to see progress made either way. Please resolve conflicts.

holdenk · 2018-02-24T01:34:30Z

So one quick note on the merge commits, the beam folks seem to prefer rebases to merges (although I personally find merges easier for keeping my PRs up to date so I understand).

luke-zhu · 2018-02-24T16:57:11Z

Thanks for reviewing! Not sure if I want to squash yet.

holdenk · 2018-02-28T00:58:33Z

Legit, we can save the squash for later.

icoxfog417 · 2018-04-03T00:14:41Z

Is this pull request merged? I'm waiting for Python3 support.

aaltay · 2018-04-03T00:20:42Z

@icoxfog417 please use https://issues.apache.org/jira/browse/BEAM-1251 for tracking Python 3 support in Beam. It is tracking the overall support.

icoxfog417 · 2018-04-03T03:20:21Z

@aaltay Thank you for letting me know. I am already watching BEAM-1251, and I understand this issue (BEAM-3731) is a subtask of BEAM-1251.

But this issue was closed without a merge, and BEAM-3731 has not been solved yet (neither has BEAM-2713).

aaltay · 2018-04-03T04:27:19Z

Yes they are still both open. If you are interested in helping you can start working on any subtasks under BEAM-1251.

icoxfog417 · 2018-04-03T13:57:41Z

Yes, I want to do it! But I feel this pull request has been reviewed enough and the commit log is tidy, so I want to know whether it is waiting to be merged or whether something is still missing.

cclauss · 2018-04-03T14:10:52Z

You might look at the MetricName and MetricKey issues discussed at #4798 (comment) and see if you can determine where they are defined.

luke-zhu · 2018-04-03T15:51:04Z

Hi @icoxfog417. Thanks for bringing this up. There was a decision in early March to make the Python3 porting process more incremental. A lot of the changes here are duplicates and I haven't been contributing recently so I closed the PR. I've marked the sub-issue as a duplicate.

holdenk mentioned this pull request Feb 23, 2018

[BEAM-3141][WIP] Support the coders in Python 3 #4078

Closed

holdenk reviewed Feb 23, 2018

luke-zhu added 8 commits February 24, 2018 11:52

Python-modernize on coders/

54b59ee

Python-modernize on io/

f497023

Python-modernize on transforms

34eb5c6

Python-modernize on runners/

cc38e6a

Python-modernize on other files

4475bc7

Hand-removed remaining issues preventing "python3 setup.py test" from running all tests.

7a06b88

Removed unnecessary list() calls.

39d4337

Use iteritems in gcsfilesystem

8b1795d

luke-zhu force-pushed the python3-v2 branch from 614b956 to 8b1795d Compare February 24, 2018 16:55

luke-zhu closed this Mar 22, 2018

5 participants
