Skip to content

Conversation

@goldmedal
Copy link
Contributor

Which issue does this PR close?

  • No corresponding issue

Rationale for this change

I implemented the datatype_parse for the struct type. I follow the DuckDB SQL style to parse a struct type.
The format is

Struct(<field_name> <data_type>, ..)

I also implemented the Display for the struct type. The display result is reversible.

What changes are included in this PR?

  • datatype_parse for StructType
  • Display for StructType
  • The required tests.

Are there any user-facing changes?

Now, we can parse a text as a struct type.

@github-actions github-actions bot added the arrow Changes to the arrow crate label May 5, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @goldmedal -- this is really nice and very well tested 👍

("Decimal128(3, 500)", "Error converting 500 into i8 for Decimal128: out of range integral type conversion attempted"),
("Decimal256(3, 500)", "Error converting 500 into i8 for Decimal256: out of range integral type conversion attempted"),

("Struct(f1, Int64)", "Error finding next type, got unexpected ','"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@alamb alamb changed the title Support to parse and display pretty for StructType Support parsing and display pretty for StructType May 9, 2025
@alamb alamb merged commit 84646ac into apache:main May 9, 2025
27 checks passed
@alamb
Copy link
Contributor

alamb commented May 9, 2025

Thank you @goldmedal

@goldmedal goldmedal deleted the feat/struct-fmt-parse branch May 10, 2025 08:08
@goldmedal
Copy link
Contributor Author

Thanks @alamb

OussamaSaoudi added a commit to delta-io/delta-kernel-rs that referenced this pull request Aug 4, 2025
## What changes are proposed in this pull request?
This PR fixes a test that fails on an error message to work across all
arrow versions. the error message depends on the Debug print of arrow
type. Due to this
[commit](apache/arrow-rs#7469), the debug print
result differs between arrow versions.

 
## How was this change tested?
existing unit tests.

Indentation was tested by breaking a test and checking that the error
message is easy to read.
OussamaSaoudi added a commit to OussamaSaoudi/delta-kernel-rs that referenced this pull request Aug 8, 2025
…#1135)

## What changes are proposed in this pull request?
This PR fixes a test that fails on an error message to work across all
arrow versions. the error message depends on the Debug print of arrow
type. Due to this
[commit](apache/arrow-rs#7469), the debug print
result differs between arrow versions.

 
## How was this change tested?
existing unit tests.

Indentation was tested by breaking a test and checking that the error
message is easy to read.
mbrobbel pushed a commit that referenced this pull request Sep 23, 2025
This is part of an attempt to improve the error reporting of `arrow-rs`,
`datafusion`, and any other 3rd party crates.

I believe that error messages should be as readable as possible. Aim for
`rustc` more than `gcc`.

Here's an example of how this PR improves some existing error messages:

Before:
> Casting from Map(Field { name: "entries", data_type: Struct([Field {
name: "key", data_type: Utf8, nullable: false, dict_id: 0,
dict_is_ordered: false, metadata: {} }, Field { name: "value",
data_type: Interval(DayTime), nullable: true, dict_id: 0,
dict_is_ordered: false, metadata: {} }]), nullable: false, dict_id: 0,
dict_is_ordered: false, metadata: {} }, false) to Map(Field { name:
"entries", data_type: Struct([Field { name: "key", data_type: Utf8,
nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} },
Field { name: "value", data_type: Duration(Second), nullable: false,
dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: false,
dict_id: 0, dict_is_ordered: false, metadata: {} }, true) not supported

After:
> Casting from Map(Field { "entries": Struct(key Utf8, value nullable
Interval(DayTime)) }, false) to Map(Field { "entries": Struct(key Utf8,
value Duration(Second)) }, true) not supported

# Which issue does this PR close?
- Closes #7048
- Continues and closes #7051
- Continues #7469
- More improvements coming in
#8291
- Sibling PR: apache/datafusion#17565
- Part of #8351

# Rationale for this change
DataType:s are often shown in error messages. Making these error
messages readable is _very important_.

# What changes are included in this PR?
## ~Unify `Debug` and `Display`~
~The `Display` and `Debug` of `DataType` are now the SAME.~

~Why? Both are frequently used in error messages (both in arrow, and
`datafusion`), and both benefit from being readable yet reversible.~

Reverted based on PR feedback. I will try to improve the `Debug`
formatting in a future PR, with clever use of
https://doc.rust-lang.org/std/fmt/struct.Formatter.html#method.debug_struct

## Improve `Display` of lists
Improve the `Display` formatting of
* `DataType::List`
* `DataType::LargeList`
* `DataType::FixedSizeList`

**Before**: `List(Field { name: \"item\", data_type: Int32, nullable:
true, dict_id: 0, dict_is_ordered: false, metadata: {} })`
**After**: `List(nullable Int32)`

**Before**: `FixedSizeList(Field { name: \"item\", data_type: Int32,
nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, 5)`
**After**: `FixedSizeList(5 x Int32)`

### Better formatting of `DataType::Struct`
The formatting of `Struct` is now reversible, including nullability and
metadata.

- Continues #7469

### ~Improve `Debug` format of `Field`~
~Best understood with this diff for an existing test:~

<img width="1140" height="499" alt="Screenshot 2025-09-07 at 18 30 44"
src="https://github.com/user-attachments/assets/794b4de9-8459-4ee7-82d2-8f5ae248614c"
/>

**EDIT**: reverted

# Are these changes tested?
Yes - new tests cover them

# Are there any user-facing changes?
`Display/to_string` has changed, and so this is a **BREAKING CHANGE**.

Care has been taken that the formatting contains all necessary
information (i.e. is reversible), though the actual `FromStr`
implementation is still not written (it is missing on `main`, and
missing in this PR - so no change).


----

Let me know if I went to far… or not far enough 😆

---------

Co-authored-by: irenjj <renj.jiang@gmail.com>
mbrobbel pushed a commit that referenced this pull request Sep 25, 2025
# Which issue does this PR close?
* Follows #8290 (merge that
first, and the diff of this PR will drop)
* #7469
* Part of #8351

# Rationale for this change
We would previously format structs like so:

`Struct(name1 type1, name2 nullable type2)`

This will break badly whenever the field name is anything but a simple
identifier. In other words: it allows [string
injection](https://xkcd.com/327/) if the field name contains an
end-paranthesis.

Except for that, it is also difficult to debug mistakingly bad field
names like " " or "\n".

# What changes are included in this PR?
We change the `Display` and `Debug` formatting of `Struct`

**Before**: `Struct(name1 type1, name2 nullable type2)`
**After**: `Struct("name1": type1, "name2": nullable type2)`

# Are these changes tested?
Yes - I've updated the existing tests.

# Are there any user-facing changes?
Yes, changing the `Display` formatting is a **breaking change**
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants