Skip to content

Conversation

@jorisvandenbossche
Copy link
Member

No description provided.

@jorisvandenbossche
Copy link
Member Author

cc @amol-

After #11745, those functions are not available anymore in the pyarrow.compute namespace (so we can't use autosummary like this to list them). I converted it into a plain list to still keep an overview of the available functions (since those can be specified in group_by().aggregate(..)).

@amol-
Copy link
Member

amol- commented Nov 30, 2021

Thanks for catching this, I actually wanted to rework the compute documentation in https://issues.apache.org/jira/browse/ARROW-13832 but didn't yet have a chance to get there.

I wonder if this is something the CI should be able to catch.

@jorisvandenbossche
Copy link
Member Author

I wonder if this is something the CI should be able to catch.

We can make the doc build fail on warnings (we eg do that in pandas' CI) to catch things like this (sphinx has an option for that). The main problem is that:

  • We first need to get the sphinx doc build warning free in general
  • We currently don't run the DOC build in the CI on a PR (only nightly)
  • The doc build also regularly fails for other reasons (eg Java download failures), so to be able to run it on PRs we should either make it more stable, or split it up so we can only run the relevant part of the doc build

@jorisvandenbossche
Copy link
Member Author

Although #11567 is already addressing some of those issues I think

* ``hash_stddev``
* ``hash_sum``
* ``hash_tdigest``
* ``hash_variance``
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we will have to also add the docstring content somehow. Otherwise we will be losing the column where we explain what the function does in the output.

sshot

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(IMO then we could also simply not hide the functions, and keep them available for introspection like that)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, I think we can now remove them from the API Reference (in the end if you can't invoke them it's not true that they are an available api) and I ended up adding the list of them in the compute documentation improvement I was working on: https://github.com/apache/arrow/pull/11830/files

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can remove this list in #11830 (or replace it with your custom directive), instead of merging this PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me, feel free to close this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants