Skip to content

Conversation

@luke-zhu
Copy link
Contributor

DESCRIPTION HERE


Follow this checklist to help us incorporate your contribution quickly and easily:

  • Make sure there is a JIRA issue filed for the change (usually before you start working on it). Trivial changes like typos do not require a JIRA issue. Your pull request should address just this issue, without pulling in other changes.
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue.
  • Write a pull request description that is detailed enough to understand:
    • What the pull request does
    • Why it does it
    • How it does it
    • Why this approach
  • Each commit in the pull request should have a meaningful subject line and body.
  • Run mvn clean verify to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

This PR allows us to install the package and run the test suite in Python 3 by removing the version constraints in setup.py and init.py

Currently the only way to run tests requires commenting out lines in the root init.py file. In addition, the number of tests that can be run is limited. Being able to run all of the tests in Python 3 will speed up speed up the compatibility process by a lot. Currently there are about 100 tests fail and another 400 cause errors.

python-modernize was used to eliminate many of the errors. I removed uses of six if they didn't seem necessary. Finally, the files were updated until the test suite would run all tests.

Note: I replaced a many .iteritems() calls with .items() instead of six.iteritems(). I may have changed a few which affect performance.

@luke-zhu
Copy link
Contributor Author

R: @robertwb

Copy link
Contributor

@holdenk holdenk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome! I'm going to go ahead and close https://github.com/apache/beam/pull/4078/files and focus on this :)

I've got some questions, but I'm not a committer so don't feel obliged to answer them, just questions from another contributor working on Python :)

Another thing the description of the PR says "This PR allows us to install the package and run the test suite in Python 3 by removing the version constraints in setup.py and init.py" but those changes don't appear to be included in this PR, did you perhaps forget to stage/add them? On the subject, to prevent downstream users from trying to install Beam in Python 3 if we were to make a release before we finished adding support, it might make sense to keep the checks for now, but skip them if an environment variable is present. What do you think?

Also as a heads up to @cclauss this seems to subsume a lot of #4697 (although since 4697 is smaller maybe it makes sense to do it more piece wise, idk).

format.
"""
if not isinstance(file_path_prefix, (basestring, ValueProvider)):
if not isinstance(file_path_prefix, (six.string_types, ValueProvider)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option is to just use from past.builtins import basestring to reduce the number of lines that need to change but this is fine as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be good. Are we ok with adding future as a dependency to master? Or should we get a separate Python3 branch first.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think future is a reasonable dependency to take in master. Although the idea of a Python 3 branch could be good if we continue to take a long time to review/merge Python 3 fixes into master (although then we'd need people to be able to cooperate on a Python 3 branch explicitly).

@@ -280,7 +280,7 @@ def encode_to_stream(self, value, stream, nested):
elif t is str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It maybe makes sense to add an explicit byte test in coders_test

@@ -143,9 +143,9 @@ def test_bytes_coder(self):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we are in Python 3 for the bytes coder it maybe makes sense to test with bytes rather than strings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't done any work for making the tests pass in Python 3. There are many errors in coders due to string compatibility issues though. If we can get a version which builds on Python 3 up to a remote branch, it would be easy to communicate progress. I saw that you created a tox env in a PR.

:class:`~apache_beam.io._AvroSource`.
"""

from six import integer_types
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it maybe make sense to import integer_types and string_types with the from directive here (or I guess what is the reason for the change)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way works. I just chose one way to keep things consistent after applying the auto-converter. Is there anything I'm missing?

lambda x: IntervalWindow(
start=Timestamp(micros=(x['end'] - x['span']) * 1000),
end=Timestamp(micros=x['end'] * 1000)),
'urn:beam:coders:stream:0.1': lambda x, parser: map(parser, x),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this test is seems to indicate its for testing a stream but by putting list around it this isn't really a stream anymore? Or am I missreading the code here? (these tests are a little less than self-documenting maybe @vikkyrk who added the iterablecoder tests can chime in).

def test_read(self):
sources = [TestConcatSource.DummySource(range(start, start + 10)) for start
in [0, 10, 20]]
sources = [TestConcatSource.DummySource(list(range(start, start + 10)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these really need to be changed? From looking at DummySource it seems like the operations it does should work fine on ranges.

file_sizes = gcsio.GcsIO().size_of_files_in_glob(pattern, limit)
metadata_list = [FileMetadata(path, size)
for path, size in file_sizes.iteritems()]
for path, size in file_sizes.items()]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in most places you seem to use six.iteritems it might be better to be consistent about that? It's possible that there could be a large number of input files as well in which case this would be less than great for Python 2.

# Now we create the MetricResult elements.
result = []
for metric_key, metric in metrics_by_name.iteritems():
for metric_key, metric in metrics_by_name.items():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in some places you seem to use six.iteritems it might be better to be consistent about that? There probably aren't that many metrics though so this isn't a big deal.


def test_varint_coder(self):
# Small ints.
self.check_coder(coders.VarIntCoder(), *range(-10, 10))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I don't think this change is actually necessary, the unpacking argument lists operator works just fine on ranges in Python 3.

@cclauss
Copy link

cclauss commented Feb 23, 2018

@luke-zhu I was trying to make progress with futurize (which seems to rely more on six) instead of modernize but I am happy to see progress made either way. Please resolve conflicts.

@holdenk
Copy link
Contributor

holdenk commented Feb 24, 2018

So one quick note on the merge commits, the beam folks seem to prefer rebases to merges (although I personally find merges easier for keeping my PRs up to date so I understand).

@luke-zhu
Copy link
Contributor Author

Thanks for reviewing! Not sure if I want to squash yet.

@holdenk
Copy link
Contributor

holdenk commented Feb 28, 2018

Legit, we can save the squash for later.

@luke-zhu luke-zhu closed this Mar 22, 2018
@icoxfog417
Copy link

icoxfog417 commented Apr 3, 2018

Is this pull request merged? I'm waiting for Python3 support.

@aaltay
Copy link
Member

aaltay commented Apr 3, 2018

@icoxfog417 please use https://issues.apache.org/jira/browse/BEAM-1251 for tracking Python 3 support in Beam. It is tracking the overall support.

@icoxfog417
Copy link

icoxfog417 commented Apr 3, 2018

@aaltay Thank you for letting me know. I watched BEAM-1251 already and I understand this issue is subtask for BEAM-1251 (BEAM-3731).

But this issue is closed without merge and BEAM-3731 have not solved yet (BEAM-2713 is too).

@aaltay
Copy link
Member

aaltay commented Apr 3, 2018

Yes they are still both open. If you are interested in helping you can start working on any subtasks under BEAM-1251.

@icoxfog417
Copy link

Yes, I want to do it! But I feel this pull request is reviewed enough and commit log is arranged. So I want to know whether this pull request is waiting for merge or something is lacking to do it.

@cclauss
Copy link

cclauss commented Apr 3, 2018

You might look at the MetricName and MetricKey issues discussed at #4798 (comment) and see if you can determine where they are defined.

@luke-zhu
Copy link
Contributor Author

luke-zhu commented Apr 3, 2018

Hi @icoxfog417. Thanks for bringing this up. There was a decision in early March to make the Python3 porting process more incremental. A lot of the changes here are duplicates and I haven't been contributing recently so I closed the PR. I've marked the sub-issue as a duplicate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants