Skip to content

Conversation

@holdenk
Copy link
Contributor

@holdenk holdenk commented Jan 10, 2018

Fix the E999 errors from Python3 flake 8 & attempt to add flake8 Python3 to our linting since we've had some new things pop up which would have been caught by this.

@holdenk holdenk changed the title [BEAM-3444] Fix python3 flake8 errors e999 [WIP][BEAM-3444] Fix python3 flake8 errors e999 Jan 11, 2018
@holdenk holdenk force-pushed the BEAM-3444-fix-flake8-errors-e999 branch 2 times, most recently from d55ded3 to 7f5e1bc Compare January 18, 2018 03:05
@holdenk holdenk changed the title [WIP][BEAM-3444] Fix python3 flake8 errors e999 [BEAM-3444] Fix python3 flake8 errors e999 Jan 19, 2018
@holdenk
Copy link
Contributor Author

holdenk commented Jan 19, 2018

Ping @aaltay @ehudm @robertwb

@@ -151,10 +151,14 @@ def compute_term_frequency(uri_count_and_total):
# receives the value we listed after the lambda in Map(). Additional side
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment block is referring to the removed lambda.

Copy link
Member

@aaltay aaltay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @holdenk!

# receives the value we listed after the lambda in Map(). Additional side
# inputs (and ordinary Python values, too) can be provided to MapFns and
# DoFns in this way.
def div_word_count_by_total(word_counts, total):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename word_counts to word_count?


except:
print res._metrics_by_stage
print(res._metrics_by_stage)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's use logging. logging.info perhaps?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure since this is done in a test, I think printing might actually be the right behaviour.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, printing is fine. It's just so that if this test fails, there is some context.

@@ -50,7 +51,7 @@ def call_fn():
thread.join(timeout_secs)
if exc_info:
t, v, tb = exc_info # pylint: disable=unbalanced-tuple-unpacking
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need disable=unbalanced-tuple-unpacking, we do not have it in data_plane.py:186

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, either way its probably unrelated to the flake8 changes. I think doing a follow up sweep of the linter disable statements could be a good starter task for someone later?

odd_one_out = [sorted_data[-1]] if len(sorted_data) % 2 == 1 else []
# Sort the pairs by how different they are.

def div_keys(kv1_kv_2):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename kv1_kv_2 to kv1_kv2.

error_msg = ('Runtime type violation detected within ParDo(%s): '
'%s' % (self.full_label, e))
raise TypeCheckError, error_msg, sys.exc_info()[2]
raise_with_traceback(TypeCheckError(error_msg), sys.exc_info()[2])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use from future.utils import raise_ similar to other changes? (Like raise_(TypeCheckError, error_msg, sys.exc_info()[2])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean we can, but since were constructing a new one this is maybe nicer? raise_ is a little ugly imho. But I'm happy to use it more places if we need to

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to raise_with_traceback.

except StopIteration:
# Re-raise the original exception since we finished the retries.
raise exn, None, exn_traceback # pylint: disable=raising-bad-type
raise_with_traceback(exn, exn_traceback)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use raise_?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we don't need to be explicit about the msg I think this is good.

fi

echo "Running flake8 for module $MODULE:"
flake8 $MODULE --count --select=E999 --show-source --statistics
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a separate file? I see the we are running the same thing in run_pylint.sh already.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So many of the current linters don't pass when run in a Py3 env. We can copy them over 1 and a time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

Optional: We don't need mini in the name.

"avroio_test.py"
"datastore_wordcount.py"
"datastoreio_test.py"
"hadoopfilesystem.py"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need exceptions for a growing list of files?

  • @ehudm specifically for this file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just saw this (my username is udim). Was wondering why this file was added to this list?

whitelist_externals=time
commands =
time pip install -e .[test]
time {toxinidir}/run_mini_py3lint.sh
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need two lint environments? We run this command in the single lint environment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a new one for Python3 linting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear flake8 only catches the E999 issues when run in Python3

time {toxinidir}/run_pylint.sh
passenv = TRAVIS*

[testenv:lint2]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps rename this to lint3 lint_py3 or something similar. 2 might be confusing.

@holdenk holdenk force-pushed the BEAM-3444-fix-flake8-errors-e999 branch from 57eb4a8 to 0df7e5a Compare February 12, 2018 07:22
@holdenk
Copy link
Contributor Author

holdenk commented Feb 12, 2018

Gentle re-ping :)

@holdenk
Copy link
Contributor Author

holdenk commented Feb 14, 2018

Ping @aaltay ?

@holdenk
Copy link
Contributor Author

holdenk commented Feb 15, 2018

Ping @aaltay @robertwb ? Would be nice to get in so people stop committing syntactically invalid py3 code.

@holdenk
Copy link
Contributor Author

holdenk commented Feb 15, 2018

Ok so looks like y'all merged some conflicting changes I'm open to updating this again but I'd like to know when to expect a review so I don't chase my tail (especially since y'all request rebasing).

@holdenk holdenk force-pushed the BEAM-3444-fix-flake8-errors-e999 branch from 0df7e5a to 2522a55 Compare February 15, 2018 13:33
@holdenk
Copy link
Contributor Author

holdenk commented Feb 17, 2018

Let’s try a ping during west coast working hours. CC @robertwb @aaltay

@aaltay
Copy link
Member

aaltay commented Feb 17, 2018

@holdenk I can make a pass over this next week. I do not think you need to rebased till we agree on the comments, that way you can only do one rebase after the last round. If you prefer, it is fine to rebase at any point as well.

@holdenk
Copy link
Contributor Author

holdenk commented Feb 17, 2018

Ok, I know in some other projects PRs which aren't mergeable tend to get lower review priority but if that is not the case here I'll focus my time on actual changes which are more fun/useful.

@holdenk
Copy link
Contributor Author

holdenk commented Feb 21, 2018

Ping? Are there other reviewers interested in Py3 migration I should reach out to?

@holdenk
Copy link
Contributor Author

holdenk commented Feb 23, 2018

@robertwb ?

Copy link
Contributor

@robertwb robertwb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only minor comments. The loss of tuple unpacking for function arguments is particularly sad for our use, but so be it.

# inputs (and ordinary Python values, too) can be provided to MapFns and
# DoFns in this way.
def div_word_count_by_total(word_count, total):
(word, count) = word_count
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: You don't need parentheses in the unpacking or the packing into tuples.


except:
print res._metrics_by_stage
print(res._metrics_by_stage)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, printing is fine. It's just so that if this test fails, there is some context.

if self._exc_info:
raise self.exc_info[0], self.exc_info[1], self.exc_info[2]
t, v, tb = self._exc_info
raise_(t, v, tb)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or raise_(*self._exc_info)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And elsewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm flexible, I like the explicit unpack but sure.

error_msg = ('Runtime type violation detected within ParDo(%s): '
'%s' % (self.full_label, e))
raise TypeCheckError, error_msg, sys.exc_info()[2]
raise_with_traceback(TypeCheckError(error_msg), sys.exc_info()[2])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to raise_with_traceback.

@holdenk
Copy link
Contributor Author

holdenk commented Feb 23, 2018

So on Feb 15nth @robertwb merged a fix for some of these except using a different library and without tests for regressions. I'm going to rebase this to simply act as a test for the fixes (although it would have been nice if we had merged this before) in that PR.

@holdenk holdenk force-pushed the BEAM-3444-fix-flake8-errors-e999 branch from 2522a55 to 996eba1 Compare February 23, 2018 03:07
@holdenk
Copy link
Contributor Author

holdenk commented Feb 23, 2018

Just to clarify I don't mean necessarily that one or the other should be merged its just with the long review cycle to dev cycle of the small patches we end up with a lot of conflicts (which is ok if less than most-fun).

Copy link
Member

@aaltay aaltay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

One general comment, how does py3 environment works on Jenkins? Those machines do not have python3 installed (pending: https://issues.apache.org/jira/browse/BEAM-3671). #4610 is also blocked on that issue.

fi

echo "Running flake8 for module $MODULE:"
flake8 $MODULE --count --select=E999 --show-source --statistics
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

Optional: We don't need mini in the name.

@holdenk
Copy link
Contributor Author

holdenk commented Feb 23, 2018

@aaltay so I was thinking of keeping mini to indicate its not full until we have the full linting working in Python 3. Does that sound reasonable? Open to changing if not.

@holdenk
Copy link
Contributor Author

holdenk commented Feb 23, 2018

And the answer is -- for Jenkins it does not (although I can remove the env from the list if we want to merge it based on local runs while we wait for Jenkins to update).

@aaltay
Copy link
Member

aaltay commented Feb 23, 2018

The name is fine either way (with or without mini), it is up to you. We can always change it later.

Let's remove it from the env list, until the JIRA I mentioned is fixed. Because otherwise running tox without specifying environment will fail on Jenkins. Although this is sad and it will not prevent not merging more breaking changes. (You can add a comment about this to the JIRA and we can add it back to the list once python3 is installed on the workers.)

@holdenk
Copy link
Contributor Author

holdenk commented Feb 23, 2018

Added https://issues.apache.org/jira/browse/BEAM-3738 with a link to 3671

@aaltay
Copy link
Member

aaltay commented Feb 23, 2018

Sounds good. I can merge this once tests pass. I would suggest squashing your commits.

@holdenk
Copy link
Contributor Author

holdenk commented Feb 23, 2018 via email

@holdenk holdenk force-pushed the BEAM-3444-fix-flake8-errors-e999 branch from 3b048b8 to b70af6d Compare February 23, 2018 07:14
…d by flake8

Mini style fixes

Quick unpacking fixes

Fix type tests to match the changed func

Fix band to bandc oops

Passe the message text through

Switch mostly to raise_ even though raise_with_traceback should be closer.

Make mini_py3lint executable

Install future as well

Fix tfidf div_word_count_by_total function (oops)

Take @aaltay's comments into account

Update comment, explicitly unpack tuple in test.

Standardize on six to match @luke-zhu's work

Add the envlists

Remove lint_py3 for now
@holdenk holdenk force-pushed the BEAM-3444-fix-flake8-errors-e999 branch from b70af6d to 9850d3f Compare February 23, 2018 09:06
@aaltay
Copy link
Member

aaltay commented Feb 23, 2018

There is a lint error

Running pylint for module apache_beam:
************* Module apache_beam.runners.worker.data_plane
E:187,12: Too many positional arguments for function call (too-many-function-args)
************* Module apache_beam.runners.worker.sdk_worker
E:293, 8: Too many positional arguments for function call (too-many-function-args)
************* Module apache_beam.runners.worker.data_plane_test
E: 54, 8: Too many positional arguments for function call (too-many-function-args)

…efully we can remove isort after the py3 migration is complete and just depend on pylint.
@holdenk
Copy link
Contributor Author

holdenk commented Feb 24, 2018

ok there we go :)

@holdenk
Copy link
Contributor Author

holdenk commented Feb 26, 2018

West coast working time ping @aaltay :)

@aaltay aaltay merged commit d29f7b8 into apache:master Feb 27, 2018
@aaltay
Copy link
Member

aaltay commented Feb 27, 2018

Thank you. Merged.

python setup.py test
passenv = TRAVIS*

[testenv:lint]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was the name changed?
Please revert or update https://beam.apache.org/contribute/contribution-guide/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll ping you on the PR for that.

+ step_annotation)
new_exn._tagged_with_step = True
raise new_exn, None, original_traceback
six.raise_from(new_exn, original_traceback)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This use of raise_from instead of reraise led to a bug with dropped stacktraces: https://issues.apache.org/jira/projects/BEAM/issues/BEAM-3956

Are the other uses of raise_from instead of reraise in this PR appropriate?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants