-
Notifications
You must be signed in to change notification settings - Fork 4.5k
[BEAM-3444] Fix python3 flake8 errors e999 #4376
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-3444] Fix python3 flake8 errors e999 #4376
Conversation
d55ded3 to
7f5e1bc
Compare
| @@ -151,10 +151,14 @@ def compute_term_frequency(uri_count_and_total): | |||
| # receives the value we listed after the lambda in Map(). Additional side | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment block is referring to the removed lambda.
aaltay
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @holdenk!
| # receives the value we listed after the lambda in Map(). Additional side | ||
| # inputs (and ordinary Python values, too) can be provided to MapFns and | ||
| # DoFns in this way. | ||
| def div_word_count_by_total(word_counts, total): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename word_counts to word_count?
|
|
||
| except: | ||
| print res._metrics_by_stage | ||
| print(res._metrics_by_stage) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's use logging. logging.info perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure since this is done in a test, I think printing might actually be the right behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, printing is fine. It's just so that if this test fails, there is some context.
| @@ -50,7 +51,7 @@ def call_fn(): | |||
| thread.join(timeout_secs) | |||
| if exc_info: | |||
| t, v, tb = exc_info # pylint: disable=unbalanced-tuple-unpacking | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need disable=unbalanced-tuple-unpacking, we do not have it in data_plane.py:186
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure, either way its probably unrelated to the flake8 changes. I think doing a follow up sweep of the linter disable statements could be a good starter task for someone later?
| odd_one_out = [sorted_data[-1]] if len(sorted_data) % 2 == 1 else [] | ||
| # Sort the pairs by how different they are. | ||
|
|
||
| def div_keys(kv1_kv_2): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename kv1_kv_2 to kv1_kv2.
| error_msg = ('Runtime type violation detected within ParDo(%s): ' | ||
| '%s' % (self.full_label, e)) | ||
| raise TypeCheckError, error_msg, sys.exc_info()[2] | ||
| raise_with_traceback(TypeCheckError(error_msg), sys.exc_info()[2]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use from future.utils import raise_ similar to other changes? (Like raise_(TypeCheckError, error_msg, sys.exc_info()[2])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean we can, but since were constructing a new one this is maybe nicer? raise_ is a little ugly imho. But I'm happy to use it more places if we need to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to raise_with_traceback.
| except StopIteration: | ||
| # Re-raise the original exception since we finished the retries. | ||
| raise exn, None, exn_traceback # pylint: disable=raising-bad-type | ||
| raise_with_traceback(exn, exn_traceback) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use raise_?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we don't need to be explicit about the msg I think this is good.
| fi | ||
|
|
||
| echo "Running flake8 for module $MODULE:" | ||
| flake8 $MODULE --count --select=E999 --show-source --statistics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need a separate file? I see the we are running the same thing in run_pylint.sh already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So many of the current linters don't pass when run in a Py3 env. We can copy them over 1 and a time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
Optional: We don't need mini in the name.
| "avroio_test.py" | ||
| "datastore_wordcount.py" | ||
| "datastoreio_test.py" | ||
| "hadoopfilesystem.py" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need exceptions for a growing list of files?
- @ehudm specifically for this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just saw this (my username is udim). Was wondering why this file was added to this list?
| whitelist_externals=time | ||
| commands = | ||
| time pip install -e .[test] | ||
| time {toxinidir}/run_mini_py3lint.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need two lint environments? We run this command in the single lint environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a new one for Python3 linting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear flake8 only catches the E999 issues when run in Python3
sdks/python/tox.ini
Outdated
| time {toxinidir}/run_pylint.sh | ||
| passenv = TRAVIS* | ||
|
|
||
| [testenv:lint2] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps rename this to lint3 lint_py3 or something similar. 2 might be confusing.
57eb4a8 to
0df7e5a
Compare
|
Gentle re-ping :) |
|
Ping @aaltay ? |
|
Ok so looks like y'all merged some conflicting changes I'm open to updating this again but I'd like to know when to expect a review so I don't chase my tail (especially since y'all request rebasing). |
0df7e5a to
2522a55
Compare
|
@holdenk I can make a pass over this next week. I do not think you need to rebased till we agree on the comments, that way you can only do one rebase after the last round. If you prefer, it is fine to rebase at any point as well. |
|
Ok, I know in some other projects PRs which aren't mergeable tend to get lower review priority but if that is not the case here I'll focus my time on actual changes which are more fun/useful. |
|
Ping? Are there other reviewers interested in Py3 migration I should reach out to? |
robertwb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only minor comments. The loss of tuple unpacking for function arguments is particularly sad for our use, but so be it.
| # inputs (and ordinary Python values, too) can be provided to MapFns and | ||
| # DoFns in this way. | ||
| def div_word_count_by_total(word_count, total): | ||
| (word, count) = word_count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: You don't need parentheses in the unpacking or the packing into tuples.
|
|
||
| except: | ||
| print res._metrics_by_stage | ||
| print(res._metrics_by_stage) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, printing is fine. It's just so that if this test fails, there is some context.
| if self._exc_info: | ||
| raise self.exc_info[0], self.exc_info[1], self.exc_info[2] | ||
| t, v, tb = self._exc_info | ||
| raise_(t, v, tb) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or raise_(*self._exc_info)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm flexible, I like the explicit unpack but sure.
| error_msg = ('Runtime type violation detected within ParDo(%s): ' | ||
| '%s' % (self.full_label, e)) | ||
| raise TypeCheckError, error_msg, sys.exc_info()[2] | ||
| raise_with_traceback(TypeCheckError(error_msg), sys.exc_info()[2]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to raise_with_traceback.
|
So on Feb 15nth @robertwb merged a fix for some of these except using a different library and without tests for regressions. I'm going to rebase this to simply act as a test for the fixes (although it would have been nice if we had merged this before) in that PR. |
2522a55 to
996eba1
Compare
|
Just to clarify I don't mean necessarily that one or the other should be merged its just with the long review cycle to dev cycle of the small patches we end up with a lot of conflicts (which is ok if less than most-fun). |
aaltay
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
One general comment, how does py3 environment works on Jenkins? Those machines do not have python3 installed (pending: https://issues.apache.org/jira/browse/BEAM-3671). #4610 is also blocked on that issue.
| fi | ||
|
|
||
| echo "Running flake8 for module $MODULE:" | ||
| flake8 $MODULE --count --select=E999 --show-source --statistics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
Optional: We don't need mini in the name.
|
@aaltay so I was thinking of keeping mini to indicate its not full until we have the full linting working in Python 3. Does that sound reasonable? Open to changing if not. |
|
And the answer is -- for Jenkins it does not (although I can remove the env from the list if we want to merge it based on local runs while we wait for Jenkins to update). |
|
The name is fine either way (with or without mini), it is up to you. We can always change it later. Let's remove it from the env list, until the JIRA I mentioned is fixed. Because otherwise running tox without specifying environment will fail on Jenkins. Although this is sad and it will not prevent not merging more breaking changes. (You can add a comment about this to the JIRA and we can add it back to the list once python3 is installed on the workers.) |
|
Added https://issues.apache.org/jira/browse/BEAM-3738 with a link to 3671 |
|
Sounds good. I can merge this once tests pass. I would suggest squashing your commits. |
|
Sure I can squash this down to one.
|
3b048b8 to
b70af6d
Compare
…d by flake8 Mini style fixes Quick unpacking fixes Fix type tests to match the changed func Fix band to bandc oops Passe the message text through Switch mostly to raise_ even though raise_with_traceback should be closer. Make mini_py3lint executable Install future as well Fix tfidf div_word_count_by_total function (oops) Take @aaltay's comments into account Update comment, explicitly unpack tuple in test. Standardize on six to match @luke-zhu's work Add the envlists Remove lint_py3 for now
b70af6d to
9850d3f
Compare
|
There is a lint error Running pylint for module apache_beam: |
…efully we can remove isort after the py3 migration is complete and just depend on pylint.
|
ok there we go :) |
|
West coast working time ping @aaltay :) |
|
Thank you. Merged. |
| python setup.py test | ||
| passenv = TRAVIS* | ||
|
|
||
| [testenv:lint] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was the name changed?
Please revert or update https://beam.apache.org/contribute/contribution-guide/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll ping you on the PR for that.
| + step_annotation) | ||
| new_exn._tagged_with_step = True | ||
| raise new_exn, None, original_traceback | ||
| six.raise_from(new_exn, original_traceback) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This use of raise_from instead of reraise led to a bug with dropped stacktraces: https://issues.apache.org/jira/projects/BEAM/issues/BEAM-3956
Are the other uses of raise_from instead of reraise in this PR appropriate?
Fix the E999 errors from Python3 flake 8 & attempt to add flake8 Python3 to our linting since we've had some new things pop up which would have been caught by this.