-
Notifications
You must be signed in to change notification settings - Fork 4.5k
[BEAM-12000] Update deprecated collections to use collections.abc in python sdk #15415
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-12000] Update deprecated collections to use collections.abc in python sdk #15415
Conversation
Codecov Report
@@ Coverage Diff @@
## master #15415 +/- ##
==========================================
+ Coverage 83.83% 84.54% +0.70%
==========================================
Files 440 443 +3
Lines 59886 59128 -758
==========================================
- Hits 50207 49990 -217
+ Misses 9679 9138 -541
Continue to review full report at Codecov.
|
tvalentyn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for this fix.
Should we import collecitons.abc as well? I am seeing tests are passing, so not sure if all the affected codepaths were triggered, but seeing this locally:
:~$ python3
Python 3.9.2 (default, Feb 28 2021, 17:03:44)
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import collections
>>> isinstance(object, collections.abc.Iterable)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.9/collections/__init__.py", line 68, in __getattr__
raise AttributeError(f'module {__name__!r} has no attribute {name!r}')
AttributeError: module 'collections' has no attribute 'abc'
>>> import collections.abc
>>> isinstance(object, collections.abc.Iterable)
False
|
what's the status for this PR? |
|
I've updated the import to cover collections.abc. I'm unsure why the tests are fine as some tests definitely do pass by those lines based on adding breakpoints, but better to be safe. Most likely reason why tests are fine is that some other import indirectly adds collections.abc. It's a common library to import so most of the time you will get somewhere. PR status was mainly I needed to get dev environment set up. Back when I made the pr I made the edits without setting up an environment. Now that I've got beam locally set up, I added a simple unit test that should hopefully fix coverage failure. Coverage failure was only thing blocking the pr earlier. For coverage I noticed overload lines were counting as untyped when they should be skipped so added overload to exclude_lines. RAT precommit hook error confuses me, pylint error I'll fix in a bit. edit: On coverage check I'm a little confused. Why is there both .coveragerc and setup.cfg with coverage options? Which one is used by CI? Can there only be one? |
| def pack_Any(msg): | ||
| # type: (message.Message) -> any_pb2.Any | ||
| pass | ||
| ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pass vs ... are similar but not same. pass has an inferred return type of None for some type checkers. Pyright will give you an error for using pass here. ... is typical way to mark overloads as stubs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are test files even type checked? I tried running mypy on this file and got an error. Also discovered mypy.ini for the repo has an invalid value giving this error message,
mypy.ini: [mypy]: follow_imports: invalid choice 'true' (choose from 'normal', 'silent', 'skip', 'error')
The mypy error that prevents it from type checking this file is,
apache_beam/portability/api/metrics_pb2.pyi:117: error: invalid syntax [syntax] Looks like root cause of that error is a comment using type: notation for doc string and confusing it with standard type comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, I think generated _pb2.py files should be excluded from the type checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't know about this usage of Ellipsis. Found: https://mypy.readthedocs.io/en/stable/stubs.html and python/typing#109 where this is disussed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pb2.py is not type checked, but pb2.pyi is needed for type inference. Looks like proto file has a comment that confuses mypy. I can make a tweak to the comment to make it fine for mypy.
| args: ["--rcfile=sdks/python/.pylintrc"] | ||
| files: ^sdks/python/apache_beam/ | ||
| exclude: *exclude | ||
| - repo: https://github.com/pycqa/isort |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added as CI runs isort but pre-commit doesn't. Should have same behavior as ci. The setup.cfg tweaks I did were to match run_pylint.sh in CI.
|
I'm confused by test failure on ubuntu-latest 3.7, but 3.7 works on other platforms + other ubuntus work. None of the changes I made look platform related. Can that check be re-run/potentially flaky? isort failure looks like I need to find one more config option to make pre-commit match ci fully. |
It's a flake, it's being fixed in BEAM-12794.
I don't know, you could ask on dev@ or @udim might know. |
| orig_msg = unpack_Any(packed_msg, timestamp_pb2.Timestamp) | ||
| none_msg = unpack_Any(packed_msg, None) | ||
| assert proto_timestamp == orig_msg | ||
| assert none_msg is None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this a left over given that you have a separate test_none_pack?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test_None_pack is to check packing None. unpack_Any does not allow None for the msg, but allows None for the msg type. So this is intended. I wanted to test both None msg pack and None class unpack.
| output = target/site/cobertura/coverage.xml | ||
|
|
||
| [isort] | ||
| line_length = 120 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where does this number come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/apache/beam/blob/master/sdks/python/scripts/run_pylint.sh#L105 it's chosen to match isort usage in CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Came across https://issues.apache.org/jira/browse/BEAM-3745 with touches on isort & lint. perhaps we can close it already or after your changes.
|
|
||
| # Don't complain about missing debug-only code: | ||
| def __repr__ | ||
| if self\.debug |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about: if self\.debug_logging_enabled to avoid pattern-matching with if self.debug_options which occurs in the codebase?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. The reason I did if self.debug was to make .coveragerc consistent with setup.cfg coverage option. For the infra pr I can make them both self.debug_logging_enabled
| @@ -0,0 +1,25 @@ | |||
| import unittest | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RAT check wants you to add a license here, see other files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you will fix.
|
Thanks, I left some comments. It would be better to split infra changes in this PR into self-contained changes, either as separate PRs or separate commits appropriately squashed. If you choose to have multiple commits in the same PR, please don't squash the commits before review on this PR finalizes. |
|
I'll split the pr into two prs. One for infra changes and one for unit test + abc.collections change. Unit test is only needed with the abc.collections change for coverage check to be satisfied. |
|
hi there! any updates on this PR? : ) |
|
Closing in favor of #15850 @hmc-cs-mdrissi feel free to rebase this PR to keep the infrastructure changes that you made and reopen it. Thank you! |
Update all usages of collections.Sequence/collections.Iterable/etc to use collections.abc as old name is deprecated and will be removed in 3.10. collections.abc was added in 3.3 so all python versions beam supports should be compatible with this change.
@tvalentyn This fix was motivated by seeing the deprecation warning and this ticket, https://issues.apache.org/jira/browse/BEAM-12000?jql=text%20~%20%223.9%22. Doesn't solve the main work for the ticket, but covers one easy change.