ARROW-13208: [Python][CI] Create a build for validating python docstrings #7732
We have many undocumented arguments in the pyarrow API. It would be nice to introduce the numpydoc checks, but at some point we need to enable at least a single rule that causes the build to fail.
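One way to picture "enabling a single rule" is to collect every finding but let only the enabled rule codes decide the exit status of the CI job. This is a minimal sketch: the `Finding` shape, the helper names, and the example symbols and messages are hypothetical and are not the archery or numpydoc API.

```python
from typing import Iterable, List, NamedTuple, Set


class Finding(NamedTuple):
    """One docstring problem reported for a symbol (hypothetical shape)."""
    symbol: str
    code: str     # numpydoc-style rule code, e.g. "PR01"
    message: str


def enforced(findings: Iterable[Finding], enabled: Set[str]) -> List[Finding]:
    """Keep only the findings whose rule code is enabled."""
    return [f for f in findings if f.code in enabled]


def exit_status(findings: Iterable[Finding], enabled: Set[str]) -> int:
    """Print enabled-rule violations; a non-zero result should fail CI."""
    violations = enforced(findings, enabled)
    for f in violations:
        print(f"{f.symbol}: {f.code} {f.message}")
    return 1 if violations else 0


# Illustrative findings: with only PR01 enabled, the GL08 one does not fail CI.
findings = [
    Finding("pkg.func", "PR01", "Parameters {'x'} not documented"),
    Finding("pkg.Other", "GL08", "The object does not have a docstring"),
]
print(exit_status(findings, {"PR01"}))
```

More rules can then be switched on one at a time by adding their codes to the enabled set.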
@kszucs Why not. Is there a way to selectively disable the check on some functions (with a Python comment, perhaps)?
@kszucs Can you paste the current error log somewhere?
Truncated log with a single rule (PR01) to check enabled:
It retrieves data from the Python symbols rather than parsing the source files, so a comment in the source cannot be used to skip a check. Instead, we can maintain a list of symbols (modules, for example) to ignore, as well as a list of symbols to traverse in the first place.
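A minimal sketch of the "list to traverse plus list to ignore" idea, using only the standard library: import each module to traverse, walk its public members with `inspect.getmembers`, and skip any symbol whose qualified name, or a dotted prefix of it (such as a whole module), is on the ignore list. The function names and the list format are hypothetical, not the archery implementation.

```python
import importlib
import inspect
from typing import Iterable, Iterator, Set


def is_ignored(qualname: str, ignore: Set[str]) -> bool:
    """True if qualname, or any dotted prefix of it, is in the ignore list."""
    parts = qualname.split(".")
    return any(".".join(parts[:i]) in ignore for i in range(1, len(parts) + 1))


def iter_symbols(module_names: Iterable[str], ignore: Set[str]) -> Iterator[str]:
    """Yield qualified names of public routines and classes to validate."""
    for module_name in module_names:
        if is_ignored(module_name, ignore):
            continue  # the whole module is on the ignore list
        module = importlib.import_module(module_name)
        for name, obj in inspect.getmembers(module):
            # Skip private names and anything that is not a routine or class.
            if name.startswith("_"):
                continue
            if not (inspect.isroutine(obj) or inspect.isclass(obj)):
                continue
            qualname = f"{module_name}.{name}"
            if not is_ignored(qualname, ignore):
                yield qualname


# Example against a stdlib module: validate json, but skip json.dump.
print(list(iter_symbols(["json"], {"json.dump"})))
```

Matching on dotted prefixes (rather than plain string prefixes) keeps an entry like `json.dump` from also hiding `json.dumps`.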
Note that several of the "errors" in the output come from not using a space before the colon (
I think your archery command allows specifying which modules to test? Then we could start by testing only the modules that already pass.
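If the checker can be pointed at individual modules, the initial list to enforce could be derived by running the check per module and keeping only the clean ones, then growing the list as modules are fixed. A minimal sketch; `modules_that_pass`, the `check` callable, and the module names and results below are hypothetical placeholders, not real pyarrow output.

```python
from typing import Callable, Dict, Iterable, List, Sequence


def modules_that_pass(module_names: Iterable[str],
                      check: Callable[[str], Sequence[str]]) -> List[str]:
    """Return the modules whose check reports no errors, in input order.

    ``check`` stands in for running the docstring validation on one module
    and returning its list of error messages.
    """
    return [name for name in module_names if not check(name)]


# Hypothetical per-module results standing in for a real checker run.
fake_results: Dict[str, List[str]] = {
    "pkg.io": [],
    "pkg.compute": ["pkg.compute.add: PR01 Parameters {'y'} not documented"],
    "pkg.types": [],
}

enforced_modules = modules_that_pass(fake_results, lambda m: fake_results[m])
print(enforced_modules)  # ['pkg.io', 'pkg.types']
```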
A good first step might be to ensure no new ones can be added. In other contexts I used to … A similar strategy might be applied here: while we fix the existing ones, we can put in place a check that prevents new ones from occurring by simply counting the number of existing errors. (If you want to go further, you can compare the list of functions with errors, but in my experience just comparing the count does 90% of the job.)
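The count-based "ratchet" and its stricter list-based variant can be sketched as below. This is an illustrative helper, assuming each error is identified by a string such as `"module.func:PR01"` and that the baseline count (and optionally the baseline set) is stored somewhere in the repository; none of these names come from archery.

```python
from typing import List, Optional, Set


def check_no_new_errors(current: List[str], baseline_count: int,
                        baseline: Optional[Set[str]] = None) -> List[str]:
    """Return the problems found; an empty list means the check passes.

    current:        one identifier per error found in this run.
    baseline_count: number of errors recorded when the ratchet was set up.
    baseline:       optional set of known errors for the stricter comparison.
    """
    problems = []
    # Count-only ratchet: fail as soon as the total grows.
    if len(current) > baseline_count:
        problems.append(
            f"error count rose from {baseline_count} to {len(current)}")
    # Stricter variant: any error not in the known set is new.
    if baseline is not None:
        new = sorted(set(current) - baseline)
        problems.extend(f"new error: {e}" for e in new)
    return problems


known = {"m.f:PR01", "m.g:PR01"}
run = ["m.f:PR01", "m.h:PR01"]  # m.g was fixed, m.h was introduced
print(check_no_new_errors(run, 2))         # []
print(check_no_new_errors(run, 2, known))  # ['new error: m.h:PR01']
```

The example shows the remaining 10%: fixing one error while introducing another keeps the count unchanged, so only the set comparison notices. Whenever the count drops, the stored baseline would be lowered so the gain is locked in.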
@kszucs Can you rebase and fix conflicts?
  'pyarrow.dataset',
  'pyarrow.feather',
- 'pyarrow.flight',
+ # 'pyarrow.flight',
Temporarily disabled; created a follow-up: https://issues.apache.org/jira/browse/ARROW-14995
    return Projector.create(result, pool)


cpdef make_filter(Schema schema, Condition condition):
These functions are only used from the unit tests, so we should not expose them in the public API.
Created a follow-up: https://issues.apache.org/jira/browse/ARROW-14996
cc @amol-
Created a follow-up to iteratively fix and enable more numpydoc checks: https://issues.apache.org/jira/browse/ARROW-15006
      'setuptools_scm'],
  'crossbow-upload': ['github3.py', jinja_req, 'ruamel.yaml',
                      'setuptools_scm'],
+ 'numpydoc': ['numpydoc==1.1.0']
Is there a particular reason for pinning to this exact version?
This is the first version that provides the class and function we use from numpydoc, which seem somewhat internal at the moment.
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Benchmark runs are scheduled for baseline = b31dd51 and contender = 2bffb82. 2bffb82 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
This uses the existing numpydoc checker in archery. I've enabled a single rule that checks for undocumented arguments (there are currently 189 of them).
Perhaps we should fix these in this PR and then enable the check globally to keep the docstrings in good shape?
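To make the rule concrete: PR01 is the numpydoc check for parameters missing from a docstring. Below is a simplified, standard-library stand-in for that check, not numpydoc's implementation. It compares the names from `inspect.signature` with the entries of a numpydoc-style `Parameters` section and only handles the common layout (entries at column 0, indented descriptions). The helper names are hypothetical.

```python
import inspect
import re
from typing import Callable, List, Set

# A numpydoc section header is a title line followed by a line of dashes.
_UNDERLINE = re.compile(r"^-{3,}\s*$")


def documented_params(doc: str) -> Set[str]:
    """Collect parameter names from the 'Parameters' section of a docstring."""
    lines = doc.splitlines()
    names: Set[str] = set()
    in_params = False
    for i, line in enumerate(lines):
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if _UNDERLINE.match(nxt):
            # A new section starts here; track whether it is Parameters.
            in_params = line.strip() == "Parameters"
            continue
        if not in_params or _UNDERLINE.match(line):
            continue
        # Entries start at column 0; their descriptions are indented.
        if line and not line[0].isspace():
            # "x, y : int" documents both x and y; "**kwargs" -> "kwargs".
            for name in line.split(":")[0].split(","):
                names.add(name.strip().lstrip("*"))
    return names


def undocumented_params(func: Callable) -> List[str]:
    """Return signature parameters missing from the docstring (PR01-like)."""
    doc = inspect.getdoc(func) or ""  # getdoc also removes indentation
    params = [p for p in inspect.signature(func).parameters
              if p not in ("self", "cls")]
    documented = documented_params(doc)
    return [p for p in params if p not in documented]
```

Running `undocumented_params` over every symbol yielded by the traversal, and counting the results, would give a per-project total comparable in spirit to the count quoted above, though the exact number depends on numpydoc's own parsing rules.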