@wjones127
Member

@wjones127 wjones127 commented Jan 6, 2022

Summary

This PR makes a few changes to PrettyPrinting to make output shorter, particularly for ChunkedArray and ListArray types.

  • Introduces a container_window argument to PrettyPrintOptions, which controls the window for ChunkedArray and ListArray separately from other types.
  • Modifies PrettyPrinter to pass ChildOptions() down to recursive calls. The main effect is that skip_new_lines is now passed down to children of StructArrays. It also ensures that window and container_window are passed down to children.
  • Modifies the ChunkedArray printer to always put new lines between sub-arrays of a StructArray.
  • Adds the missing comma after the ellipsis in ChunkedArray print output.
  • Changes the MapArray printer to indent only when printing on multiple lines.

These changes affect the C++, Python, and R implementations.

Example

Here's a little test snippet:

from random import sample, choice
import pyarrow as pa

arr_int = pa.array(range(50))
tree_parts = ["roots", "trunk", "crown", "seeds"]
arr_list = pa.array([sample(tree_parts, k=choice(range(len(tree_parts)))) for _ in range(50)])
arr_struct = pa.StructArray.from_arrays([arr_int, arr_list], names=['int_nested', 'list_nested'])
arr_map = pa.array(
    [
        [(part, choice(range(10))) for part in sample(tree_parts, k=choice(range(len(tree_parts))))]
        for _ in range(50)
    ],
    type=pa.map_(pa.utf8(), pa.int64())
)

table = pa.table({
    'int': pa.chunked_array([arr_int] * 10),
    'list': pa.chunked_array([arr_list] * 10),
    'struct': pa.chunked_array([arr_struct] * 10),
    'map': pa.chunked_array([arr_map] * 10),
})
print(table)
Output Before
pyarrow.Table
int: int64
list: list<item: string>
  child 0, item: string
struct: struct<int_nested: int64, list_nested: list<item: string>>
  child 0, int_nested: int64
  child 1, list_nested: list<item: string>
      child 0, item: string
map: map<string, int64>
  child 0, entries: struct<key: string not null, value: int64> not null
      child 0, key: string not null
      child 1, value: int64
----
int: [[0,1,2,3,4,5,6,7,8,9,...,40,41,42,43,44,45,46,47,48,49],[0,1,2,3,4,5,6,7,8,9,...,40,41,42,43,44,45,46,47,48,49],[0,1,2,3,4,5,6,7,8,9,...,40,41,42,43,44,45,46,47,48,49],[0,1,2,3,4,5,6,7,8,9,...,40,41,42,43,44,45,46,47,48,49],[0,1,2,3,4,5,6,7,8,9,...,40,41,42,43,44,45,46,47,48,49],[0,1,2,3,4,5,6,7,8,9,...,40,41,42,43,44,45,46,47,48,49],[0,1,2,3,4,5,6,7,8,9,...,40,41,42,43,44,45,46,47,48,49],[0,1,2,3,4,5,6,7,8,9,...,40,41,42,43,44,45,46,47,48,49],[0,1,2,3,4,5,6,7,8,9,...,40,41,42,43,44,45,46,47,48,49],[0,1,2,3,4,5,6,7,8,9,...,40,41,42,43,44,45,46,47,48,49]]
list: [[["roots","trunk"],["trunk","crown","roots"],["crown","seeds"],["trunk"],[],["crown"],["seeds","crown"],["seeds","roots","trunk"],["roots"],["crown"],...,["trunk","seeds","crown"],["roots","crown","trunk"],["roots"],["crown","trunk","roots"],["crown"],["crown"],["trunk"],["seeds","crown","roots"],[],["trunk","roots"]],[["roots","trunk"],["trunk","crown","roots"],["crown","seeds"],["trunk"],[],["crown"],["seeds","crown"],["seeds","roots","trunk"],["roots"],["crown"],...,["trunk","seeds","crown"],["roots","crown","trunk"],["roots"],["crown","trunk","roots"],["crown"],["crown"],["trunk"],["seeds","crown","roots"],[],["trunk","roots"]],[["roots","trunk"],["trunk","crown","roots"],["crown","seeds"],["trunk"],[],["crown"],["seeds","crown"],["seeds","roots","trunk"],["roots"],["crown"],...,["trunk","seeds","crown"],["roots","crown","trunk"],["roots"],["crown","trunk","roots"],["crown"],["crown"],["trunk"],["seeds","crown","roots"],[],["trunk","roots"]],[["roots","trunk"],["trunk","crown","roots"],["crown","seeds"],["trunk"],[],["crown"],["seeds","crown"],["seeds","roots","trunk"],["roots"],["crown"],...,["trunk","seeds","crown"],["roots","crown","trunk"],["roots"],["crown","trunk","roots"],["crown"],["crown"],["trunk"],["seeds","crown","roots"],[],["trunk","roots"]],[["roots","trunk"],["trunk","crown","roots"],["crown","seeds"],["trunk"],[],["crown"],["seeds","crown"],["seeds","roots","trunk"],["roots"],["crown"],...,["trunk","seeds","crown"],["roots","crown","trunk"],["roots"],["crown","trunk","roots"],["crown"],["crown"],["trunk"],["seeds","crown","roots"],[],["trunk","roots"]],[["roots","trunk"],["trunk","crown","roots"],["crown","seeds"],["trunk"],[],["crown"],["seeds","crown"],["seeds","roots","trunk"],["roots"],["crown"],...,["trunk","seeds","crown"],["roots","crown","trunk"],["roots"],["crown","trunk","roots"],["crown"],["crown"],["trunk"],["seeds","crown","roots"],[],["trunk","roots"]],[["roots","trunk"],["trunk","crown","roots"],["crown","seeds"],["trunk"],
[],["crown"],["seeds","crown"],["seeds","roots","trunk"],["roots"],["crown"],...,["trunk","seeds","crown"],["roots","crown","trunk"],["roots"],["crown","trunk","roots"],["crown"],["crown"],["trunk"],["seeds","crown","roots"],[],["trunk","roots"]],[["roots","trunk"],["trunk","crown","roots"],["crown","seeds"],["trunk"],[],["crown"],["seeds","crown"],["seeds","roots","trunk"],["roots"],["crown"],...,["trunk","seeds","crown"],["roots","crown","trunk"],["roots"],["crown","trunk","roots"],["crown"],["crown"],["trunk"],["seeds","crown","roots"],[],["trunk","roots"]],[["roots","trunk"],["trunk","crown","roots"],["crown","seeds"],["trunk"],[],["crown"],["seeds","crown"],["seeds","roots","trunk"],["roots"],["crown"],...,["trunk","seeds","crown"],["roots","crown","trunk"],["roots"],["crown","trunk","roots"],["crown"],["crown"],["trunk"],["seeds","crown","roots"],[],["trunk","roots"]],[["roots","trunk"],["trunk","crown","roots"],["crown","seeds"],["trunk"],[],["crown"],["seeds","crown"],["seeds","roots","trunk"],["roots"],["crown"],...,["trunk","seeds","crown"],["roots","crown","trunk"],["roots"],["crown","trunk","roots"],["crown"],["crown"],["trunk"],["seeds","crown","roots"],[],["trunk","roots"]]]
struct: [  -- is_valid: all not null  -- child 0 type: int64
    [
      0,
      1,
      2,
      3,
      4,
      5,
      6,
      7,
      8,
      9,
      ...
      40,
      41,
      42,
      43,
      44,
      45,
      46,
      47,
      48,
      49
    ]  -- child 1 type: list<item: string>
    [
      [
        "roots",
        "trunk"
      ],
      [
        "trunk",
        "crown",
        "roots"
      ],
      [
        "crown",
        "seeds"
      ],
      [
        "trunk"
      ],
      [],
      [
        "crown"
      ],
      [
        "seeds",
        "crown"
      ],
      [
        "seeds",
        "roots",
        "trunk"
      ],
      [
        "roots"
      ],
      [
        "crown"
      ],
      ...
      [
        "trunk",
        "seeds",
        "crown"
      ],
      [
        "roots",
        "crown",
        "trunk"
      ],
      [
        "roots"
      ],
      [
        "crown",
        "trunk",
        "roots"
      ],
      [
        "crown"
      ],
      [
        "crown"
      ],
      [
        "trunk"
      ],
      [
        "seeds",
        "crown",
        "roots"
      ],
      [],
      [
        "trunk",
        "roots"
      ]
    ],  -- is_valid: all not null  -- child 0 type: int64
    [
      0,
      1,
      2,
      3,
      4,
      5,
      6,
      7,
      8,
      9,
      ...
      40,
      41,
      42,
      43,
      44,
      45,
      46,
      47,
      48,
      49
    ]  -- child 1 type: list<item: string>
    [
      [
        "roots",
        "trunk"
      ],
      [
        "trunk",
        "crown",
        "roots"
      ],
      [
        "crown",
        "seeds"
      ],
      [
        "trunk"
      ],
      [],
      [
        "crown"
      ],
      [
        "seeds",
        "crown"
      ],
      [
        "seeds",
        "roots",
        "trunk"
      ],
      [
        "roots"
      ],
      [
        "crown"
      ],
      ...
      [
        "trunk",
        "seeds",
        "crown"
      ],
      [
        "roots",
        "crown",
        "trunk"
      ],
      [
        "roots"
      ],
      [
        "crown",
        "trunk",
        "roots"
      ],
      [
        "crown"
      ],
      [
        "crown"
      ],
      [
        "trunk"
      ],
      [
        "seeds",
        "crown",
        "roots"
      ],
      [],
      [
        "trunk",
        "roots"
      ]
    ],  -- is_valid: all not null  -- child 0 type: int64
    [
      0,
      1,
      2,
      3,
      4,
      5,
      6,
      7,
      8,
      9,
      ...
      40,
      41,
      42,
      43,
      44,
      45,
      46,
      47,
      48,
      49
    ]  -- child 1 type: list<item: string>
    [
      [
        "roots",
        "trunk"
      ],
      [
        "trunk",
        "crown",
        "roots"
      ],
      [
        "crown",
        "seeds"
      ],
      [
        "trunk"
      ],
      [],
      [
        "crown"
      ],
      [
        "seeds",
        "crown"
      ],
      [
        "seeds",
        "roots",
        "trunk"
      ],
      [
        "roots"
      ],
      [
        "crown"
      ],
      ...
      [
        "trunk",
        "seeds",
        "crown"
      ],
      [
        "roots",
        "crown",
        "trunk"
      ],
      [
        "roots"
      ],
      [
        "crown",
        "trunk",
        "roots"
      ],
      [
        "crown"
      ],
      [
        "crown"
      ],
      [
        "trunk"
      ],
      [
        "seeds",
        "crown",
        "roots"
      ],
      [],
      [
        "trunk",
        "roots"
      ]
    ],  -- is_valid: all not null  -- child 0 type: int64
    [
      0,
      1,
      2,
      3,
      4,
      5,
      6,
      7,
      8,
      9,
      ...
      40,
      41,
      42,
      43,
      44,
      45,
      46,
      47,
      48,
      49
    ]  -- child 1 type: list<item: string>
    [
      [
        "roots",
        "trunk"
      ],
      [
        "trunk",
        "crown",
        "roots"
      ],
      [
        "crown",
        "seeds"
      ],
      [
        "trunk"
      ],
      [],
      [
        "crown"
      ],
      [
        "seeds",
        "crown"
      ],
      [
        "seeds",
        "roots",
        "trunk"
      ],
      [
        "roots"
      ],
      [
        "crown"
      ],
      ...
      [
        "trunk",
        "seeds",
        "crown"
      ],
      [
        "roots",
        "crown",
        "trunk"
      ],
      [
        "roots"
      ],
      [
        "crown",
        "trunk",
        "roots"
      ],
      [
        "crown"
      ],
      [
        "crown"
      ],
      [
        "trunk"
      ],
      [
        "seeds",
        "crown",
        "roots"
      ],
      [],
      [
        "trunk",
        "roots"
      ]
    ],  -- is_valid: all not null  -- child 0 type: int64
    [
      0,
      1,
      2,
      3,
      4,
      5,
      6,
      7,
      8,
      9,
      ...
      40,
      41,
      42,
      43,
      44,
      45,
      46,
      47,
      48,
      49
    ]  -- child 1 type: list<item: string>
    [
      [
        "roots",
        "trunk"
      ],
      [
        "trunk",
        "crown",
        "roots"
      ],
      [
        "crown",
        "seeds"
      ],
      [
        "trunk"
      ],
      [],
      [
        "crown"
      ],
      [
        "seeds",
        "crown"
      ],
      [
        "seeds",
        "roots",
        "trunk"
      ],
      [
        "roots"
      ],
      [
        "crown"
      ],
      ...
      [
        "trunk",
        "seeds",
        "crown"
      ],
      [
        "roots",
        "crown",
        "trunk"
      ],
      [
        "roots"
      ],
      [
        "crown",
        "trunk",
        "roots"
      ],
      [
        "crown"
      ],
      [
        "crown"
      ],
      [
        "trunk"
      ],
      [
        "seeds",
        "crown",
        "roots"
      ],
      [],
      [
        "trunk",
        "roots"
      ]
    ],  -- is_valid: all not null  -- child 0 type: int64
    [
      0,
      1,
      2,
      3,
      4,
      5,
      6,
      7,
      8,
      9,
      ...
      40,
      41,
      42,
      43,
      44,
      45,
      46,
      47,
      48,
      49
    ]  -- child 1 type: list<item: string>
    [
      [
        "roots",
        "trunk"
      ],
      [
        "trunk",
        "crown",
        "roots"
      ],
      [
        "crown",
        "seeds"
      ],
      [
        "trunk"
      ],
      [],
      [
        "crown"
      ],
      [
        "seeds",
        "crown"
      ],
      [
        "seeds",
        "roots",
        "trunk"
      ],
      [
        "roots"
      ],
      [
        "crown"
      ],
      ...
      [
        "trunk",
        "seeds",
        "crown"
      ],
      [
        "roots",
        "crown",
        "trunk"
      ],
      [
        "roots"
      ],
      [
        "crown",
        "trunk",
        "roots"
      ],
      [
        "crown"
      ],
      [
        "crown"
      ],
      [
        "trunk"
      ],
      [
        "seeds",
        "crown",
        "roots"
      ],
      [],
      [
        "trunk",
        "roots"
      ]
    ],  -- is_valid: all not null  -- child 0 type: int64
    [
      0,
      1,
      2,
      3,
      4,
      5,
      6,
      7,
      8,
      9,
      ...
      40,
      41,
      42,
      43,
      44,
      45,
      46,
      47,
      48,
      49
    ]  -- child 1 type: list<item: string>
    [
      [
        "roots",
        "trunk"
      ],
      [
        "trunk",
        "crown",
        "roots"
      ],
      [
        "crown",
        "seeds"
      ],
      [
        "trunk"
      ],
      [],
      [
        "crown"
      ],
      [
        "seeds",
        "crown"
      ],
      [
        "seeds",
        "roots",
        "trunk"
      ],
      [
        "roots"
      ],
      [
        "crown"
      ],
      ...
      [
        "trunk",
        "seeds",
        "crown"
      ],
      [
        "roots",
        "crown",
        "trunk"
      ],
      [
        "roots"
      ],
      [
        "crown",
        "trunk",
        "roots"
      ],
      [
        "crown"
      ],
      [
        "crown"
      ],
      [
        "trunk"
      ],
      [
        "seeds",
        "crown",
        "roots"
      ],
      [],
      [
        "trunk",
        "roots"
      ]
    ],  -- is_valid: all not null  -- child 0 type: int64
    [
      0,
      1,
      2,
      3,
      4,
      5,
      6,
      7,
      8,
      9,
      ...
      40,
      41,
      42,
      43,
      44,
      45,
      46,
      47,
      48,
      49
    ]  -- child 1 type: list<item: string>
    [
      [
        "roots",
        "trunk"
      ],
      [
        "trunk",
        "crown",
        "roots"
      ],
      [
        "crown",
        "seeds"
      ],
      [
        "trunk"
      ],
      [],
      [
        "crown"
      ],
      [
        "seeds",
        "crown"
      ],
      [
        "seeds",
        "roots",
        "trunk"
      ],
      [
        "roots"
      ],
      [
        "crown"
      ],
      ...
      [
        "trunk",
        "seeds",
        "crown"
      ],
      [
        "roots",
        "crown",
        "trunk"
      ],
      [
        "roots"
      ],
      [
        "crown",
        "trunk",
        "roots"
      ],
      [
        "crown"
      ],
      [
        "crown"
      ],
      [
        "trunk"
      ],
      [
        "seeds",
        "crown",
        "roots"
      ],
      [],
      [
        "trunk",
        "roots"
      ]
    ],  -- is_valid: all not null  -- child 0 type: int64
    [
      0,
      1,
      2,
      3,
      4,
      5,
      6,
      7,
      8,
      9,
      ...
      40,
      41,
      42,
      43,
      44,
      45,
      46,
      47,
      48,
      49
    ]  -- child 1 type: list<item: string>
    [
      [
        "roots",
        "trunk"
      ],
      [
        "trunk",
        "crown",
        "roots"
      ],
      [
        "crown",
        "seeds"
      ],
      [
        "trunk"
      ],
      [],
      [
        "crown"
      ],
      [
        "seeds",
        "crown"
      ],
      [
        "seeds",
        "roots",
        "trunk"
      ],
      [
        "roots"
      ],
      [
        "crown"
      ],
      ...
      [
        "trunk",
        "seeds",
        "crown"
      ],
      [
        "roots",
        "crown",
        "trunk"
      ],
      [
        "roots"
      ],
      [
        "crown",
        "trunk",
        "roots"
      ],
      [
        "crown"
      ],
      [
        "crown"
      ],
      [
        "trunk"
      ],
      [
        "seeds",
        "crown",
        "roots"
      ],
      [],
      [
        "trunk",
        "roots"
      ]
    ],  -- is_valid: all not null  -- child 0 type: int64
    [
      0,
      1,
      2,
      3,
      4,
      5,
      6,
      7,
      8,
      9,
      ...
      40,
      41,
      42,
      43,
      44,
      45,
      46,
      47,
      48,
      49
    ]  -- child 1 type: list<item: string>
    [
      [
        "roots",
        "trunk"
      ],
      [
        "trunk",
        "crown",
        "roots"
      ],
      [
        "crown",
        "seeds"
      ],
      [
        "trunk"
      ],
      [],
      [
        "crown"
      ],
      [
        "seeds",
        "crown"
      ],
      [
        "seeds",
        "roots",
        "trunk"
      ],
      [
        "roots"
      ],
      [
        "crown"
      ],
      ...
      [
        "trunk",
        "seeds",
        "crown"
      ],
      [
        "roots",
        "crown",
        "trunk"
      ],
      [
        "roots"
      ],
      [
        "crown",
        "trunk",
        "roots"
      ],
      [
        "crown"
      ],
      [
        "crown"
      ],
      [
        "trunk"
      ],
      [
        "seeds",
        "crown",
        "roots"
      ],
      [],
      [
        "trunk",
        "roots"
      ]
    ]]
map: [[    keys:["crown"]values:[4],    keys:["seeds"]values:[7],    keys:["trunk"]values:[7],    keys:["roots","trunk","crown"]values:[4,8,0],    keys:["crown","trunk","roots"]values:[3,6,8],    keys:["crown","trunk","seeds"]values:[9,3,2],    keys:["crown","seeds","roots"]values:[1,3,8],    keys:["trunk","seeds"]values:[3,1],    keys:[]values:[],    keys:["roots","seeds","trunk"]values:[0,8,2],...,    keys:[]values:[],    keys:["trunk","crown","roots"]values:[7,2,8],    keys:["seeds","trunk"]values:[9,5],    keys:["trunk"]values:[7],    keys:["roots"]values:[1],    keys:["crown"]values:[5],    keys:["crown","seeds","roots"]values:[2,7,2],    keys:[]values:[],    keys:[]values:[],    keys:["roots","crown","trunk"]values:[2,1,5]],[    keys:["crown"]values:[4],    keys:["seeds"]values:[7],    keys:["trunk"]values:[7],    keys:["roots","trunk","crown"]values:[4,8,0],    keys:["crown","trunk","roots"]values:[3,6,8],    keys:["crown","trunk","seeds"]values:[9,3,2],    keys:["crown","seeds","roots"]values:[1,3,8],    keys:["trunk","seeds"]values:[3,1],    keys:[]values:[],    keys:["roots","seeds","trunk"]values:[0,8,2],...,    keys:[]values:[],    keys:["trunk","crown","roots"]values:[7,2,8],    keys:["seeds","trunk"]values:[9,5],    keys:["trunk"]values:[7],    keys:["roots"]values:[1],    keys:["crown"]values:[5],    keys:["crown","seeds","roots"]values:[2,7,2],    keys:[]values:[],    keys:[]values:[],    keys:["roots","crown","trunk"]values:[2,1,5]],[    keys:["crown"]values:[4],    keys:["seeds"]values:[7],    keys:["trunk"]values:[7],    keys:["roots","trunk","crown"]values:[4,8,0],    keys:["crown","trunk","roots"]values:[3,6,8],    keys:["crown","trunk","seeds"]values:[9,3,2],    keys:["crown","seeds","roots"]values:[1,3,8],    keys:["trunk","seeds"]values:[3,1],    keys:[]values:[],    keys:["roots","seeds","trunk"]values:[0,8,2],...,    keys:[]values:[],    keys:["trunk","crown","roots"]values:[7,2,8],    keys:["seeds","trunk"]values:[9,5],    
keys:["trunk"]values:[7],    keys:["roots"]values:[1],    keys:["crown"]values:[5],    keys:["crown","seeds","roots"]values:[2,7,2],    keys:[]values:[],    keys:[]values:[],    keys:["roots","crown","trunk"]values:[2,1,5]],[    keys:["crown"]values:[4],    keys:["seeds"]values:[7],    keys:["trunk"]values:[7],    keys:["roots","trunk","crown"]values:[4,8,0],    keys:["crown","trunk","roots"]values:[3,6,8],    keys:["crown","trunk","seeds"]values:[9,3,2],    keys:["crown","seeds","roots"]values:[1,3,8],    keys:["trunk","seeds"]values:[3,1],    keys:[]values:[],    keys:["roots","seeds","trunk"]values:[0,8,2],...,    keys:[]values:[],    keys:["trunk","crown","roots"]values:[7,2,8],    keys:["seeds","trunk"]values:[9,5],    keys:["trunk"]values:[7],    keys:["roots"]values:[1],    keys:["crown"]values:[5],    keys:["crown","seeds","roots"]values:[2,7,2],    keys:[]values:[],    keys:[]values:[],    keys:["roots","crown","trunk"]values:[2,1,5]],[    keys:["crown"]values:[4],    keys:["seeds"]values:[7],    keys:["trunk"]values:[7],    keys:["roots","trunk","crown"]values:[4,8,0],    keys:["crown","trunk","roots"]values:[3,6,8],    keys:["crown","trunk","seeds"]values:[9,3,2],    keys:["crown","seeds","roots"]values:[1,3,8],    keys:["trunk","seeds"]values:[3,1],    keys:[]values:[],    keys:["roots","seeds","trunk"]values:[0,8,2],...,    keys:[]values:[],    keys:["trunk","crown","roots"]values:[7,2,8],    keys:["seeds","trunk"]values:[9,5],    keys:["trunk"]values:[7],    keys:["roots"]values:[1],    keys:["crown"]values:[5],    keys:["crown","seeds","roots"]values:[2,7,2],    keys:[]values:[],    keys:[]values:[],    keys:["roots","crown","trunk"]values:[2,1,5]],[    keys:["crown"]values:[4],    keys:["seeds"]values:[7],    keys:["trunk"]values:[7],    keys:["roots","trunk","crown"]values:[4,8,0],    keys:["crown","trunk","roots"]values:[3,6,8],    keys:["crown","trunk","seeds"]values:[9,3,2],    keys:["crown","seeds","roots"]values:[1,3,8],    
keys:["trunk","seeds"]values:[3,1],    keys:[]values:[],    keys:["roots","seeds","trunk"]values:[0,8,2],...,    keys:[]values:[],    keys:["trunk","crown","roots"]values:[7,2,8],    keys:["seeds","trunk"]values:[9,5],    keys:["trunk"]values:[7],    keys:["roots"]values:[1],    keys:["crown"]values:[5],    keys:["crown","seeds","roots"]values:[2,7,2],    keys:[]values:[],    keys:[]values:[],    keys:["roots","crown","trunk"]values:[2,1,5]],[    keys:["crown"]values:[4],    keys:["seeds"]values:[7],    keys:["trunk"]values:[7],    keys:["roots","trunk","crown"]values:[4,8,0],    keys:["crown","trunk","roots"]values:[3,6,8],    keys:["crown","trunk","seeds"]values:[9,3,2],    keys:["crown","seeds","roots"]values:[1,3,8],    keys:["trunk","seeds"]values:[3,1],    keys:[]values:[],    keys:["roots","seeds","trunk"]values:[0,8,2],...,    keys:[]values:[],    keys:["trunk","crown","roots"]values:[7,2,8],    keys:["seeds","trunk"]values:[9,5],    keys:["trunk"]values:[7],    keys:["roots"]values:[1],    keys:["crown"]values:[5],    keys:["crown","seeds","roots"]values:[2,7,2],    keys:[]values:[],    keys:[]values:[],    keys:["roots","crown","trunk"]values:[2,1,5]],[    keys:["crown"]values:[4],    keys:["seeds"]values:[7],    keys:["trunk"]values:[7],    keys:["roots","trunk","crown"]values:[4,8,0],    keys:["crown","trunk","roots"]values:[3,6,8],    keys:["crown","trunk","seeds"]values:[9,3,2],    keys:["crown","seeds","roots"]values:[1,3,8],    keys:["trunk","seeds"]values:[3,1],    keys:[]values:[],    keys:["roots","seeds","trunk"]values:[0,8,2],...,    keys:[]values:[],    keys:["trunk","crown","roots"]values:[7,2,8],    keys:["seeds","trunk"]values:[9,5],    keys:["trunk"]values:[7],    keys:["roots"]values:[1],    keys:["crown"]values:[5],    keys:["crown","seeds","roots"]values:[2,7,2],    keys:[]values:[],    keys:[]values:[],    keys:["roots","crown","trunk"]values:[2,1,5]],[    keys:["crown"]values:[4],    keys:["seeds"]values:[7],    
keys:["trunk"]values:[7],    keys:["roots","trunk","crown"]values:[4,8,0],    keys:["crown","trunk","roots"]values:[3,6,8],    keys:["crown","trunk","seeds"]values:[9,3,2],    keys:["crown","seeds","roots"]values:[1,3,8],    keys:["trunk","seeds"]values:[3,1],    keys:[]values:[],    keys:["roots","seeds","trunk"]values:[0,8,2],...,    keys:[]values:[],    keys:["trunk","crown","roots"]values:[7,2,8],    keys:["seeds","trunk"]values:[9,5],    keys:["trunk"]values:[7],    keys:["roots"]values:[1],    keys:["crown"]values:[5],    keys:["crown","seeds","roots"]values:[2,7,2],    keys:[]values:[],    keys:[]values:[],    keys:["roots","crown","trunk"]values:[2,1,5]],[    keys:["crown"]values:[4],    keys:["seeds"]values:[7],    keys:["trunk"]values:[7],    keys:["roots","trunk","crown"]values:[4,8,0],    keys:["crown","trunk","roots"]values:[3,6,8],    keys:["crown","trunk","seeds"]values:[9,3,2],    keys:["crown","seeds","roots"]values:[1,3,8],    keys:["trunk","seeds"]values:[3,1],    keys:[]values:[],    keys:["roots","seeds","trunk"]values:[0,8,2],...,    keys:[]values:[],    keys:["trunk","crown","roots"]values:[7,2,8],    keys:["seeds","trunk"]values:[9,5],    keys:["trunk"]values:[7],    keys:["roots"]values:[1],    keys:["crown"]values:[5],    keys:["crown","seeds","roots"]values:[2,7,2],    keys:[]values:[],    keys:[]values:[],    keys:["roots","crown","trunk"]values:[2,1,5]]]
Output After
pyarrow.Table
int: int64
list: list<item: string>
  child 0, item: string
struct: struct<int_nested: int64, list_nested: list<item: string>>
  child 0, int_nested: int64
  child 1, list_nested: list<item: string>
      child 0, item: string
map: map<string, int64>
  child 0, entries: struct<key: string not null, value: int64> not null
      child 0, key: string not null
      child 1, value: int64
----
int: [[0,1,2,3,4,...,45,46,47,48,49],[0,1,2,3,4,...,45,46,47,48,49],...,[0,1,2,3,4,...,45,46,47,48,49],[0,1,2,3,4,...,45,46,47,48,49]]
list: [[["crown","trunk","roots"],["roots","seeds"],...,[],["crown"]],[["crown","trunk","roots"],["roots","seeds"],...,[],["crown"]],...,[["crown","trunk","roots"],["roots","seeds"],...,[],["crown"]],[["crown","trunk","roots"],["roots","seeds"],...,[],["crown"]]]
struct: [
  -- is_valid: all not null
  -- child 0 type: int64
  [0,1,2,3,4,...,45,46,47,48,49]
  -- child 1 type: list<item: string>
  [["crown","trunk","roots"],["roots","seeds"],...,[],["crown"]],
  -- is_valid: all not null
  -- child 0 type: int64
  [0,1,2,3,4,...,45,46,47,48,49]
  -- child 1 type: list<item: string>
  [["crown","trunk","roots"],["roots","seeds"],...,[],["crown"]],
...,
  -- is_valid: all not null
  -- child 0 type: int64
  [0,1,2,3,4,...,45,46,47,48,49]
  -- child 1 type: list<item: string>
  [["crown","trunk","roots"],["roots","seeds"],...,[],["crown"]],
  -- is_valid: all not null
  -- child 0 type: int64
  [0,1,2,3,4,...,45,46,47,48,49]
  -- child 1 type: list<item: string>
  [["crown","trunk","roots"],["roots","seeds"],...,[],["crown"]]]
map: [[keys:["trunk"]values:[2],keys:["seeds","roots"]values:[2,4],keys:["trunk","crown"]values:[2,7],keys:["trunk","crown","roots"]values:[8,8,0],keys:[]values:[],...,keys:["trunk","roots"]values:[2,8],keys:["trunk","crown"]values:[6,9],keys:[]values:[],keys:["seeds","trunk"]values:[9,6],keys:["crown","roots","trunk"]values:[0,3,9]],[keys:["trunk"]values:[2],keys:["seeds","roots"]values:[2,4],keys:["trunk","crown"]values:[2,7],keys:["trunk","crown","roots"]values:[8,8,0],keys:[]values:[],...,keys:["trunk","roots"]values:[2,8],keys:["trunk","crown"]values:[6,9],keys:[]values:[],keys:["seeds","trunk"]values:[9,6],keys:["crown","roots","trunk"]values:[0,3,9]],...,[keys:["trunk"]values:[2],keys:["seeds","roots"]values:[2,4],keys:["trunk","crown"]values:[2,7],keys:["trunk","crown","roots"]values:[8,8,0],keys:[]values:[],...,keys:["trunk","roots"]values:[2,8],keys:["trunk","crown"]values:[6,9],keys:[]values:[],keys:["seeds","trunk"]values:[9,6],keys:["crown","roots","trunk"]values:[0,3,9]],[keys:["trunk"]values:[2],keys:["seeds","roots"]values:[2,4],keys:["trunk","crown"]values:[2,7],keys:["trunk","crown","roots"]values:[8,8,0],keys:[]values:[],...,keys:["trunk","roots"]values:[2,8],keys:["trunk","crown"]values:[6,9],keys:[]values:[],keys:["seeds","trunk"]values:[9,6],keys:["crown","roots","trunk"]values:[0,3,9]]]

@github-actions

github-actions bot commented Jan 6, 2022

@wjones127 wjones127 changed the title from "ARROW-14798: [C++][Python] Add child window to PrettyPrintOptions" to "[WIP] ARROW-14798: [C++][Python] Add child window to PrettyPrintOptions" Jan 6, 2022
@wjones127 wjones127 force-pushed the ARROW-14798-repr-child-limit branch from da6aefd to bb1fe2d January 6, 2022 22:37
@wjones127 wjones127 marked this pull request as ready for review January 6, 2022 22:37
@wjones127 wjones127 marked this pull request as draft January 7, 2022 16:06
@jorisvandenbossche
Member

I understand that PrettyPrintOptions is generic, but that also means the interpretation of "child" depends on what you are printing.

For example, when printing a ChunkedArray, you could use window to determine how many values to print from the start and end, and child_window to limit the number of values shown for nested data (lists, structs). When printing a table, however, window determines how many chunked arrays to show at the start and end, and child_window controls both how many values to show from the start and end of each chunked array and how many elements to show for a nested data type.

That doesn't give great control, I think, but I suppose there is not much to do about that with the current generic interface.
(and the current PR is certainly already an improvement for the table repr!)

@wjones127
Member Author

That doesn't give great control, I think, but I suppose there is not much to do about that with the current generic interface.
(and the current PR is certainly already an improvement for the table repr!)

@jorisvandenbossche Do you like the child_window formulation better than the container_window idea?

https://issues.apache.org/jira/browse/ARROW-14798?focusedCommentId=17470902&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17470902

@wjones127 wjones127 changed the title from "ARROW-14798: [C++][Python] Add child window to PrettyPrintOptions" to "ARROW-14798: [C++][Python] Add container window to PrettyPrintOptions" Jan 10, 2022
@wjones127 wjones127 changed the title from "ARROW-14798: [C++][Python] Add container window to PrettyPrintOptions" to "ARROW-14798: [C++][Python][R] Add container window to PrettyPrintOptions" Jan 11, 2022
@wjones127 wjones127 marked this pull request as ready for review January 12, 2022 01:28
@jorisvandenbossche
Member

So in the latest commits, you changed from child_window to container_window? (And a ChunkedArray is considered a container?)

What's the exact consequence of that change on the repr in your example in the top post? (I am having a bit of trouble wrapping my head around it ;))

@jorisvandenbossche
Member

Sidenote: as a short-term fix for 7.0.0, another option could be to simply truncate the output of self.column(i).to_string(indent=0, skip_new_lines=True) (i.e. the string repr of each column), e.g. at 100 characters per column.
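
To illustrate the truncation idea above, here is a minimal sketch in plain Python. It is not the code from this PR or from pyarrow: truncate_repr, max_chars, and marker are hypothetical names, and the input string is a stand-in for whatever self.column(i).to_string(indent=0, skip_new_lines=True) returns.

```python
def truncate_repr(text, max_chars=100, marker="..."):
    """Cap a column's string repr at max_chars, ending with a marker if cut.

    Hypothetical helper for illustration; not part of pyarrow.
    """
    if max_chars < len(marker):
        raise ValueError("max_chars must be at least len(marker)")
    if len(text) <= max_chars:
        return text
    # Reserve room for the marker so the result is exactly max_chars long.
    return text[: max_chars - len(marker)] + marker


# Stand-in for the repr of one column.
column_repr = "[[" + ",".join(str(i) for i in range(200)) + "]]"
print(truncate_repr(column_repr))
```

A cap like this bounds the width of each column line regardless of nesting, at the cost of possibly cutting in the middle of a value.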

@wjones127
Member Author

wjones127 commented Jan 13, 2022

So in the latest commits, you changed from child_window to container_window? (And a ChunkedArray is considered a container?)

Yeah, a container right now is just a ChunkedArray or a ListArray. It is intended for any array-like type whose elements are also arrays, where a single window parameter would otherwise be applied recursively. I found this does a better job of controlling the size of the printed output, while still allowing the user to view a meaningful number of primitive elements in the array.

The default window is now 10 and the default container_window is 2, though in the Python interface the ChunkedArray to_string sets window to 5.

What's the exact consequence of that change on the repr in your example in the top post? (I am having a bit of trouble wrapping my head around it ;))

That output is up to date with the latest changes. This change makes the most difference when there are two levels of container, such as a ChunkedArray of a ListArray. I would think about it in terms of the window at each level:

window: 10, container_window: 2
ChunkedArray<ListArray<StringArray>> (window 2)
  ListArray<StringArray> (window 2)
    StringArray (window 10)

window: 2, child_window: 10
ChunkedArray<ListArray<StringArray>> (window 2)
  ListArray<StringArray> (window 10)
    StringArray (window 10)
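The per-level windows above can be modelled with plain Python lists. This is only a toy sketch, not Arrow's PrettyPrint code: `format_window` and `format_nested` are hypothetical names, nested lists stand in for ChunkedArray and ListArray, and the elision rule (show the first and last `window` items) is an assumption made for illustration.

```python
def format_window(items, window, render=repr):
    """Render at most `window` items from each end, eliding the middle.

    Assumes window >= 1.
    """
    if len(items) <= 2 * window:
        shown = [render(item) for item in items]
    else:
        head = [render(item) for item in items[:window]]
        tail = [render(item) for item in items[-window:]]
        shown = head + ["..."] + tail
    return "[" + ", ".join(shown) + "]"


def format_nested(value, window=10, container_window=2):
    """Use container_window for lists of lists and window for leaf lists."""
    is_container = bool(value) and isinstance(value[0], list)
    if is_container:
        # Recurse with the same options, as a child printer would.
        def render_child(child):
            return format_nested(child, window, container_window)

        return format_window(value, container_window, render_child)
    return format_window(value, window)


# Five "chunks", each holding five lists of three strings.
chunks = [[["roots", "trunk", "crown"]] * 5] * 5
print(format_nested(chunks, window=10, container_window=2))
```

With window 10 and container_window 2, both outer levels are cut to two items at each end while the short leaf lists print in full, which mirrors the first layout above.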

@wjones127
Member Author

Sidenote: as a short-term fix for 7.0.0, another option would be to simply truncate the output of self.column(i).to_string(indent=0, skip_new_lines=True) (i.e. the string repr of each column), e.g. cut it off at 100 chars for each individual column.

I would also be in favor of having some character cap available.

@wjones127
Member Author

@jorisvandenbossche I created a Jira issue for the character limit idea: https://issues.apache.org/jira/browse/ARROW-15329

I'll work on a quick PR for that.

@jorisvandenbossche
Member

@wjones127 there are some C++ linting errors

wjones127 and others added 3 commits January 18, 2022 13:30
Co-authored-by: Antoine Pitrou <antoine@python.org>
Co-authored-by: Antoine Pitrou <antoine@python.org>
@wjones127 wjones127 force-pushed the ARROW-14798-repr-child-limit branch from 04d8aaf to 496a04b Compare January 18, 2022 21:32
@jorisvandenbossche
Member

I was testing this locally, I think we might actually eventually want both this and some character cap as in #12148 (but maybe at the "scalar level").

The data I was testing with is a large table with geometry data. For the normal columns (some ints and strings) this PR is a nice improvement. But the geometry column is basically a binary column with big blobs (for this specific dataset, individual scalar values have a length of up to 8000). So even when printing only 10 values of it, the output still floods the console.

So long term (not necessarily for 7.0 though :)), we might want (in addition to this PR) to limit the max size of the repr for individual scalars (when printed in a table, not when printed as an individual scalar), instead of limiting the max size of the column as you do in #12148. Limiting it per scalar might also be easier, because then you don't get the complexities around truncating the column repr nicely between scalars, as discussed in #12148 (comment).
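A rough sketch of what a per-scalar limit could look like, using plain Python values rather than Arrow scalars. The names `shorten_scalar` and `format_column` and the 40-character default are made up for illustration; nothing here is an existing Arrow option.

```python
def shorten_scalar(value, max_chars=40, marker="..."):
    """Return repr(value), cut to at most max_chars characters."""
    text = repr(value)
    if len(text) <= max_chars:
        return text
    return text[: max_chars - len(marker)] + marker


def format_column(values, max_chars=40):
    """Join the shortened reprs; separators between values are never cut."""
    shortened = [shorten_scalar(value, max_chars) for value in values]
    return "[" + ", ".join(shortened) + "]"


# One large binary blob next to a small value, as in a geometry column.
blobs = [b"\x00" * 8000, b"ab"]
print(format_column(blobs, max_chars=20))
```

Because each value is shortened before joining, the column repr always ends on a complete element, which sidesteps the question of where to cut between scalars.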

@wjones127
Member Author

I was testing this locally, I think we might actually eventually want both this and some character cap as in #12148 (but maybe at the "scalar level").

Yes, I think this works in combination with the character cap measures.

@wjones127 wjones127 requested a review from pitrou January 21, 2022 18:43
Member

@pitrou pitrou left a comment


+1. This is a very nice improvement. Thank you @wjones127 !

@pitrou
Member

pitrou commented Jan 27, 2022

@jorisvandenbossche Do you want to make a final review here?

@pitrou pitrou closed this in ae1ce19 Feb 24, 2022
@wjones127 wjones127 deleted the ARROW-14798-repr-child-limit branch February 24, 2022 17:25
@ursabot

ursabot commented Feb 24, 2022

Benchmark runs are scheduled for baseline = ff92930 and contender = ae1ce19. ae1ce19 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.17% ⬆️0.08%] test-mac-arm
[Finished ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.34% ⬆️0.04%] ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java


4 participants