ARROW-13637: [Python] Fix docstrings #11245
jorisvandenbossche
left a comment
Thanks for your effort here! Added a few comments
| {0} : optional | ||
| Parameter for {1} constructor. Either `options` | ||
| or `{0}` can be passed, but not both at the same time. | ||
| """.format(p.name, option_class.__name__)) |
Nice! (now we should still have a way to automatically add actual explanation of the keyword, but that's for another JIRA :))
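A minimal sketch of how per-keyword docstring entries like the ones in the snippet above could be generated from an options class. `ExampleOptions` and `option_param_docs` are hypothetical names used only for illustration, not the actual pyarrow code; only the template text and the `.format(p.name, option_class.__name__)` call come from the diff.

```python
import inspect


class ExampleOptions:
    """Hypothetical options class standing in for a real one."""

    def __init__(self, skip_nulls=True, min_count=1):
        self.skip_nulls = skip_nulls
        self.min_count = min_count


def option_param_docs(option_class):
    """Build one numpydoc-style entry per constructor keyword."""
    entries = []
    # inspect.signature on a class reports the __init__ parameters without `self`.
    for p in inspect.signature(option_class).parameters.values():
        entries.append(
            "{0} : optional\n"
            "    Parameter for {1} constructor. Either `options`\n"
            "    or `{0}` can be passed, but not both at the same time.\n"
            .format(p.name, option_class.__name__))
    return "".join(entries)


docs = option_param_docs(ExampleOptions)
print(docs)
```

Each entry only points back at the options class; it does not describe what the keyword does, which is the gap mentioned in the comment above.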
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
jorisvandenbossche
left a comment
Looked at the additional commits
| num_record_batches : number of record batches. | ||
| num_dictionary_batches : number of dictionary batches. | ||
| num_dictionary_deltas : delta of dictionaries. | ||
| num_replaced_dictionaries : number of replaced dictionaries. |
In practice those are never created by the user, so not sure how useful such a docstring is (except for passing the check ..)
Although those can of course still be accessed by the user (so strictly speaking it might actually be more logical to list those as "Attributes" instead of "Parameters", but OK :))
| the size of individual record batches or table chunks. | ||
| Minimum valid value for block size is 1 | ||
| skip_rows: int, optional (default 0) | ||
| skip_rows : int, optional (default 0) |
Just for the record, did you have to do these by hand or is there a mechanical fixer?
Did those by hand, I wanted to check what issues the docstrings had and thus had to look at them anyway. At that point fixing them was a minor deal.
If it doesn't already exist I guess that an automation to fix them wouldn't be too hard to build, but given that @kszucs has a PR to prevent broken/missing docstrings from happening again I hope this is the only time we have to do this work.
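For reference, a rough sketch of what such an automation might look like for the `name: type` spacing issue shown above. This is an assumption-laden illustration, not an existing tool: `fix_param_spacing` is a hypothetical helper, and it only handles the simple case of entries directly under a `Parameters` header.

```python
import re

# "name: type" with no space before the colon (numpydoc expects "name : type").
_PARAM_LINE = re.compile(r"^(\s*)(\w+):[ \t]+(\S.*)$")


def fix_param_spacing(docstring):
    """Rewrite 'name: type' as 'name : type' inside the Parameters section."""
    lines = docstring.split("\n")
    out = []
    in_params = False
    base_indent = None
    for i, line in enumerate(lines):
        stripped = line.strip()
        nxt = lines[i + 1].strip() if i + 1 < len(lines) else ""
        if nxt and set(nxt) == {"-"}:
            # A section header is a title line underlined with dashes.
            in_params = stripped == "Parameters"
            base_indent = len(line) - len(line.lstrip())
        elif in_params and stripped and set(stripped) != {"-"}:
            indent = len(line) - len(line.lstrip())
            # Only touch entry lines, not the more deeply indented descriptions.
            if indent == base_indent:
                line = _PARAM_LINE.sub(r"\1\2 : \3", line)
        out.append(line)
    return "\n".join(out)


src = """Read options.

    Parameters
    ----------
    block_size : int, optional
        Note: sizes are in bytes.
    skip_rows: int, optional (default 0)
        Rows to skip.
"""
fixed = fix_param_spacing(src)
print(fixed)
```

Restricting the rewrite to lines at the section's own indentation keeps prose such as `Note: ...` in a description untouched.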
python/pyarrow/tensor.pxi
Outdated
| Range of the rows, | ||
| The i-th row spans from `indptr[i]` to `indptr[i+1]` in the data. | ||
| indices : numpy.ndarray | ||
| Column indices of the corresponding non-zero values. |
It seems you have copy-pasted all this from the previous from_numpy methods, but it doesn't really correspond to what's expected here. Below you can see that these are lists of ndarrays (CSF is an n-dimensional generalization of the idea behind CSR and CSC; there's an explanation here: https://www.boristhebrave.com/2021/01/01/compressed-sparse-fibers-explained/).
We already have some text here https://github.com/apache/arrow/blob/master/format/SparseTensor.fbs#L162-L200
You are right, I copy/pasted from CSR / CSC and didn't notice that in this case it was a multidimensional tensor.
I would have expected the arguments for a CSF tensor to be multiple arrays, but they are still named indptr and indices, which confused me. I'll edit the docstring accordingly.
I tweaked the docstrings to match what we have in https://github.com/apache/arrow/blob/master/format/SparseTensor.fbs
Hopefully this should do
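To make the "lists of arrays" point concrete, here is a small pure-Python sketch of a CSF-style layout for a 3-dimensional tensor. It is an illustration under assumptions, not the pyarrow implementation: `sparse_to_csf` is a hypothetical helper, plain lists stand in for ndarrays, and the layout (one `indices` list per dimension, one `indptr` list per dimension except the last) follows my reading of the SparseTensor.fbs description linked above.

```python
def sparse_to_csf(entries, ndim):
    """Convert {coordinate tuple: nonzero value} into CSF-style lists."""
    indices = [[] for _ in range(ndim)]
    indptr = [[] for _ in range(ndim - 1)]
    values = []
    prev = None
    for coord in sorted(entries):
        # First dimension where this coordinate leaves the previous path.
        if prev is None:
            first_new = 0
        else:
            first_new = next(k for k in range(ndim) if coord[k] != prev[k])
        for k in range(first_new, ndim):
            if k < ndim - 1:
                # New node at level k: its children start here in indices[k + 1].
                indptr[k].append(len(indices[k + 1]))
            indices[k].append(coord[k])
        values.append(entries[coord])
        prev = coord
    # Close the last child range of every level.
    for k in range(ndim - 1):
        indptr[k].append(len(indices[k + 1]))
    return indptr, indices, values


# Four nonzero values in a 3-dimensional tensor.
entries = {(0, 0, 1): 1, (0, 1, 0): 2, (0, 1, 2): 3, (1, 0, 0): 4}
indptr, indices, values = sparse_to_csf(entries, ndim=3)
print(indptr)   # [[0, 2, 3], [0, 1, 3, 4]]
print(indices)  # [[0, 1], [0, 1, 0], [1, 0, 2, 0]]
print(values)   # [1, 2, 3, 4]
```

As in CSR, node `i` at level `k` owns the slice `indices[k + 1][indptr[k][i]:indptr[k][i + 1]]`; the difference is that there is one such pair of arrays per level instead of a single pair.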
python/pyarrow/tensor.pxi
Outdated
| shape : tuple | ||
| Shape of the matrix. | ||
| axis_order : list, optional | ||
| The order of the axis. |
I would say for example "The dimensions corresponding to each array in indptr and indices"
Changed docstring to match what we have in https://github.com/apache/arrow/blob/master/format/SparseTensor.fbs
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Address all docstrings to make sure they pass `archery numpydoc --allow-rule PR01`

Closes apache#11245 from amol-/ARROW-13637

Authored-by: Alessandro Molina <amol@turbogears.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
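PR01 is the numpydoc validation code for parameters that are not documented. The following is a simplified sketch of that kind of check, not the numpydoc or archery implementation; `undocumented_params` and `read_rows` are hypothetical names, and the matching is deliberately naive (any `name : ...` line counts as documented).

```python
import inspect
import re


def undocumented_params(func):
    """Return signature parameters with no 'name : type' entry in the docstring."""
    doc = inspect.getdoc(func) or ""
    # inspect.getdoc dedents the docstring, so entries start at column 0.
    documented = set(re.findall(r"^(\w+) :", doc, flags=re.MULTILINE))
    params = [n for n in inspect.signature(func).parameters if n != "self"]
    return [n for n in params if n not in documented]


def read_rows(path, skip_rows=0):
    """
    Read rows from a file.

    Parameters
    ----------
    path : str
        File to read.
    skip_rows: int, optional (default 0)
        Number of rows to skip.
    """


missing = undocumented_params(read_rows)
print(missing)
```

In this sketch the `skip_rows: int` entry is not recognized because the space before the colon is missing, so the parameter is reported as undocumented.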