Skip to content

Conversation

@amol-
Copy link
Member

@amol- amol- commented Sep 27, 2021

Address all docstrings to make sure they pass archery numpydoc --allow-rule PR01

@github-actions
Copy link

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your effort here! Added a few comments

{0} : optional
Parameter for {1} constructor. Either `options`
or `{0}` can be passed, but not both at the same time.
""".format(p.name, option_class.__name__))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! (now we should still have a way to automatically add actual explanation of the keyword, but that's for another JIRA :))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amol- amol- marked this pull request as ready for review September 30, 2021 07:57
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked at the additional commits

num_record_batches : number of record batches.
num_dictionary_batches : number of dictionary batches.
num_dictionary_deltas : delta of dictionaries.
num_replaced_dictionaries : number of replaced dictionaries.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice those are never created by the user, so not sure how useful such a docstring is (except for passing the check ..)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although those can of course still be accessed by the user (so strictly speaking it might actually be more logical to list those as "Attributes" instead of "Parameters", but OK :))

the size of individual record batches or table chunks.
Minimum valid value for block size is 1
skip_rows: int, optional (default 0)
skip_rows : int, optional (default 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for the record, did you have to do these by hand or is there a mechanical fixer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did those by hand, I wanted to check what issues the docstrings had and thus had to look at them anyway. At that point fixing them was a minor deal.

If it doesn't already exist I guess that an automation to fix them wouldn't be too hard to build, but given that @kszucs has a PR to prevent broken/missing docstrings from happening again I hope this is the only time we have to do this work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amol- yes, #7732 should be ready depending on which rules do we want to enable at first.

Range of the rows,
The i-th row spans from `indptr[i]` to `indptr[i+1]` in the data.
indices : numpy.ndarray
Column indices of the corresponding non-zero values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems you have copy-pasted all this from the previous from_numpy methods, but it doesn't really correspond to what's expected. Below you can see that these are lists of ndarrays (CSF is a n-dimensional generalization of the idea behind CSR and CSC, there's an explanation here: https://www.boristhebrave.com/2021/01/01/compressed-sparse-fibers-explained/).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I copy/pasted from CSR / CSC and didn't notice in this case was a multidimensional Tensor.
I would have expected that for CSF Tensor the arguments were multiple arrays but they still are indptr+indices which confused me, I'll edit the docstring accordingly.

Copy link
Member Author

@amol- amol- Oct 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tweaked the docstrings to match what we have in https://github.com/apache/arrow/blob/master/format/SparseTensor.fbs

Hopefully this should do

shape : tuple
Shape of the matrix.
axis_order : list, optional
The order of the axis.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say for example "The dimensions corresponding to each array in indptr and indices"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pitrou pitrou closed this in 0dfe592 Oct 4, 2021
ViniciusSouzaRoque pushed a commit to s1mbi0se/arrow that referenced this pull request Oct 20, 2021
Address all docstrings to make sure they pass `archery numpydoc --allow-rule PR01`

Closes apache#11245 from amol-/ARROW-13637

Authored-by: Alessandro Molina <amol@turbogears.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants