-
Notifications
You must be signed in to change notification settings - Fork 4.5k
[BEAM-3731] Enable tests to run in Python 3 #4730
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
R: @robertwb |
holdenk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome! I'm going to go ahead and close https://github.com/apache/beam/pull/4078/files and focus on this :)
I've got some questions, but I'm not a committer so don't feel obliged to answer them, just questions from another contributor working on Python :)
Another thing the description of the PR says "This PR allows us to install the package and run the test suite in Python 3 by removing the version constraints in setup.py and init.py" but those changes don't appear to be included in this PR, did you perhaps forget to stage/add them? On the subject, to prevent downstream users from trying to install Beam in Python 3 if we were to make a release before we finished adding support, it might make sense to keep the checks for now, but skip them if an environment variable is present. What do you think?
Also as a heads up to @cclauss this seems to subsume a lot of #4697 (although since 4697 is smaller maybe it makes sense to do it more piece wise, idk).
| format. | ||
| """ | ||
| if not isinstance(file_path_prefix, (basestring, ValueProvider)): | ||
| if not isinstance(file_path_prefix, (six.string_types, ValueProvider)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option is to just use from past.builtins import basestring to reduce the number of lines that need to change but this is fine as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be good. Are we ok with adding future as a dependency to master? Or should we get a separate Python3 branch first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think future is a reasonable dependency to take in master. Although the idea of a Python 3 branch could be good if we continue to take a long time to review/merge Python 3 fixes into master (although then we'd need people to be able to cooperate on a Python 3 branch explicitly).
| @@ -280,7 +280,7 @@ def encode_to_stream(self, value, stream, nested): | |||
| elif t is str: | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It maybe makes sense to add an explicit byte test in coders_test
| @@ -143,9 +143,9 @@ def test_bytes_coder(self): | |||
|
|
|||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we are in Python 3 for the bytes coder it maybe makes sense to test with bytes rather than strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't done any work for making the tests pass in Python 3. There are many errors in coders due to string compatibility issues though. If we can get a version which builds on Python 3 up to a remote branch, it would be easy to communicate progress. I saw that you created a tox env in a PR.
| :class:`~apache_beam.io._AvroSource`. | ||
| """ | ||
|
|
||
| from six import integer_types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it maybe make sense to import integer_types and string_types with the from directive here (or I guess what is the reason for the change)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either way works. I just chose one way to keep things consistent after applying the auto-converter. Is there anything I'm missing?
| lambda x: IntervalWindow( | ||
| start=Timestamp(micros=(x['end'] - x['span']) * 1000), | ||
| end=Timestamp(micros=x['end'] * 1000)), | ||
| 'urn:beam:coders:stream:0.1': lambda x, parser: map(parser, x), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this test is seems to indicate its for testing a stream but by putting list around it this isn't really a stream anymore? Or am I missreading the code here? (these tests are a little less than self-documenting maybe @vikkyrk who added the iterablecoder tests can chime in).
| def test_read(self): | ||
| sources = [TestConcatSource.DummySource(range(start, start + 10)) for start | ||
| in [0, 10, 20]] | ||
| sources = [TestConcatSource.DummySource(list(range(start, start + 10))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these really need to be changed? From looking at DummySource it seems like the operations it does should work fine on ranges.
| file_sizes = gcsio.GcsIO().size_of_files_in_glob(pattern, limit) | ||
| metadata_list = [FileMetadata(path, size) | ||
| for path, size in file_sizes.iteritems()] | ||
| for path, size in file_sizes.items()] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in most places you seem to use six.iteritems it might be better to be consistent about that? It's possible that there could be a large number of input files as well in which case this would be less than great for Python 2.
| # Now we create the MetricResult elements. | ||
| result = [] | ||
| for metric_key, metric in metrics_by_name.iteritems(): | ||
| for metric_key, metric in metrics_by_name.items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in some places you seem to use six.iteritems it might be better to be consistent about that? There probably aren't that many metrics though so this isn't a big deal.
|
|
||
| def test_varint_coder(self): | ||
| # Small ints. | ||
| self.check_coder(coders.VarIntCoder(), *range(-10, 10)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I don't think this change is actually necessary, the unpacking argument lists operator works just fine on ranges in Python 3.
|
@luke-zhu I was trying to make progress with futurize (which seems to rely more on six) instead of modernize but I am happy to see progress made either way. Please resolve conflicts. |
|
So one quick note on the merge commits, the beam folks seem to prefer rebases to merges (although I personally find merges easier for keeping my PRs up to date so I understand). |
from running all tests.
|
Thanks for reviewing! Not sure if I want to squash yet. |
|
Legit, we can save the squash for later. |
|
Is this pull request merged? I'm waiting for Python3 support. |
|
@icoxfog417 please use https://issues.apache.org/jira/browse/BEAM-1251 for tracking Python 3 support in Beam. It is tracking the overall support. |
|
Yes they are still both open. If you are interested in helping you can start working on any subtasks under BEAM-1251. |
|
Yes, I want to do it! But I feel this pull request is reviewed enough and commit log is arranged. So I want to know whether this pull request is waiting for merge or something is lacking to do it. |
|
You might look at the MetricName and MetricKey issues discussed at #4798 (comment) and see if you can determine where they are defined. |
|
Hi @icoxfog417. Thanks for bringing this up. There was a decision in early March to make the Python3 porting process more incremental. A lot of the changes here are duplicates and I haven't been contributing recently so I closed the PR. I've marked the sub-issue as a duplicate. |
DESCRIPTION HERE
Follow this checklist to help us incorporate your contribution quickly and easily:
[BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replaceBEAM-XXXwith the appropriate JIRA issue.mvn clean verifyto make sure basic checks pass. A more thorough check will be performed on your pull request automatically.This PR allows us to install the package and run the test suite in Python 3 by removing the version constraints in setup.py and init.py
Currently the only way to run tests requires commenting out lines in the root init.py file. In addition, the number of tests that can be run is limited. Being able to run all of the tests in Python 3 will speed up speed up the compatibility process by a lot. Currently there are about 100 tests fail and another 400 cause errors.
python-modernize was used to eliminate many of the errors. I removed uses of six if they didn't seem necessary. Finally, the files were updated until the test suite would run all tests.
Note: I replaced a many .iteritems() calls with .items() instead of six.iteritems(). I may have changed a few which affect performance.