ARROW-14798: [C++][Python][R] Add container window to PrettyPrintOptions #12091
I understand that For example, for printing a ChunkedArray, you could then use That doesn't give great control, I think, but I suppose there is not much to do about that with the current generic interface. |
@jorisvandenbossche Do you like the |
|
So in the latest commits, you changed from What's the exact consequence of that change on the repr in your example in the top post? (I am having a bit of trouble wrapping my head around it .. ;)) |
|
Sidenote: for a short-term fix for 7.0.0, an option could also be to simply truncate the output of |
Yeah a container right now is just a Default
That output is up-to-date with the latest changes. The place where this change makes the most difference is if there are two-levels of container, such as a ChunkedArray of a ListArray. I would think about it in terms of the window at each level: |
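To make the "window at each level" idea concrete, here is a minimal pure-Python sketch. It is not Arrow's implementation: the `render` function and its truncation rule are hypothetical, and nested Python lists stand in for a ChunkedArray (a list of chunks) or a ListArray (a list of lists). The assumption is that any list whose items are themselves lists counts as a container and uses `container_window`, while a list of plain values uses `window`.

```python
def render(value, window=2, container_window=1):
    """Render nested lists, eliding the middle of each level.

    Hypothetical model: lists of lists are "containers" and keep
    `container_window` items at each end; lists of plain values keep
    `window` items at each end.
    """
    if not isinstance(value, list):
        return repr(value)

    # A level is a container if any of its items is itself a list.
    is_container = any(isinstance(item, list) for item in value)
    limit = container_window if is_container else window

    if len(value) > 2 * limit:
        # Keep `limit` items at the start and the end, elide the rest.
        head = value[:limit]
        tail = value[len(value) - limit:]
        parts = [render(v, window, container_window) for v in head]
        parts.append("...")
        parts.extend(render(v, window, container_window) for v in tail)
    else:
        parts = [render(v, window, container_window) for v in value]

    return "[" + ", ".join(parts) + "]"


# Four "chunks" of plain values: the outer level uses container_window,
# each chunk uses window.
chunks = [[1, 2, 3, 4, 5, 6], [7, 8], [9, 10, 11, 12, 13], [14]]
print(render(chunks, window=2, container_window=1))
```

With a third level of nesting (chunks of lists of values), the two outer levels would both be limited by `container_window` in this model, and only the innermost values by `window`.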
I would also be in favor of having some character cap available. |
|
@jorisvandenbossche I created a Jira issue for the character limit idea: https://issues.apache.org/jira/browse/ARROW-15329 I'll work on a quick PR for that. |
|
@wjones127 there are some C++ linting errors |
Co-authored-by: Antoine Pitrou <antoine@python.org>
I was testing this locally, and I think we might eventually want both this and some character cap as in #12148 (but maybe at the "scalar level"). The data I was testing with is a large table with geometry data. For the normal columns (some ints and strings), this PR is a nice improvement. But the geometry column is basically a binary column with big blobs (for this specific dataset, individual scalar values are up to 8000 in length), so even printing only 10 values still floods the console. So long term (not necessarily for 7.0 though :)), we might want (in addition to this PR) to limit the max size of the repr for individual scalars (when printed in a table, not when printed as an individual scalar), instead of limiting the max size of the column as you do in #12148. Limiting it per scalar might also be easier, because then you don't get the complexities around truncating the column repr nicely between scalars, as discussed in #12148 (comment). |
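As a rough illustration of the per-scalar cap suggested above, the following pure-Python sketch truncates each value's repr on its own, so no truncation ever happens between scalars. The names `truncate_repr`, `render_column`, and `max_chars` are hypothetical and are not part of Arrow or of #12148; printing binary values as uppercase hex is just a convention chosen for this example.

```python
def truncate_repr(value, max_chars=20):
    """Return a repr of `value` capped at `max_chars` characters.

    Binary values are shown as uppercase hex (an assumption made for
    this sketch); anything longer than the cap ends in "...".
    """
    if isinstance(value, (bytes, bytearray)):
        text = bytes(value).hex().upper()
    else:
        text = repr(value)
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + "..."


def render_column(values, max_chars=20):
    """Render one value per line, truncating each scalar independently."""
    return "\n".join(truncate_repr(v, max_chars) for v in values)


# A column mixing one large binary blob with small values.
column = [b"\xff" * 8000, 1, "ab"]
print(render_column(column, max_chars=4))
```

Because the cap applies per value, the total output grows with the number of printed values times the cap, which combines naturally with a window that limits how many values are printed.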
Yes, I think this works in combination with the character cap measures. |
pitrou
left a comment
+1. This is a very nice improvement. Thank you @wjones127 !
|
@jorisvandenbossche Do you want to make a final review here? |
|
Benchmark runs are scheduled for baseline = ff92930 and contender = ae1ce19. ae1ce19 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Summary
This PR makes a few changes to PrettyPrinting to make output shorter, particularly for ChunkedArray and ListArray types.
- container_window argument to PrettyPrintOptions, which controls the window for ChunkedArray and ListArray separately from other types.
- PrettyPrinter to pass down ChildOptions() to recursive calls. The main effect of this is that skip_new_lines is now passed down to children of StructArrays. It also makes sure that window and container window are passed down to children.
- ChunkedArray printer to always put new lines between sub-arrays of StructArray.
- ChunkedArray print output after ellipsis.
- MapArray printer to only indent if being printed on multiple lines.

These changes affect the C++, Python, and R implementations.
Example
Here's a little test snippet:
Output Before
Output after