@felipecrv
Contributor

@felipecrv felipecrv commented Apr 26, 2023

Rationale for this change

Mailing list discussion: https://lists.apache.org/thread/r28rw5n39jwtvn08oljl09d4q2c1ysvb

What changes are included in this PR?

Initial implementation of the new format in C++.

Are these changes tested?

Unit tests are being written with every commit that adds new functionality. More functionality needs to be implemented before the (required) integration tests can be written.

Are there any user-facing changes?

A new array format. It should have no impact for users that don't use it.
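
For readers unfamiliar with the format, here is a minimal sketch of the idea behind list-views: each list is an (offset, size) pair into a shared child array, so lists may be stored out of order or overlap. `ListViewSketch`, `GetList` and `MakeExample` are hypothetical names used only for illustration and are not the Arrow C++ API; validity bitmaps are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a list-view array of int32 values.
// This is NOT the Arrow C++ API; it only mirrors the offsets/sizes idea.
struct ListViewSketch {
  std::vector<int32_t> offsets;  // start of each list inside `values`
  std::vector<int32_t> sizes;    // number of elements in each list
  std::vector<int32_t> values;   // child values shared by all lists
};

// Returns a copy of the i-th list: values[offsets[i], offsets[i] + sizes[i]).
std::vector<int32_t> GetList(const ListViewSketch& a, std::size_t i) {
  assert(a.offsets.size() == a.sizes.size());
  assert(i < a.offsets.size());
  const int32_t offset = a.offsets[i];
  const int32_t size = a.sizes[i];
  // The view must lie inside the child array.
  assert(offset >= 0 && size >= 0);
  assert(static_cast<std::size_t>(offset) + static_cast<std::size_t>(size) <=
         a.values.size());
  return std::vector<int32_t>(a.values.begin() + offset,
                              a.values.begin() + offset + size);
}

// Three lists whose offsets are not in increasing order:
// list 0 views values[3, 5), list 1 views values[0, 3), list 2 is empty.
ListViewSketch MakeExample() {
  return ListViewSketch{{3, 0, 0}, {2, 3, 0}, {10, 11, 12, 13, 14}};
}
```

With this data, `GetList(MakeExample(), 0)` yields `{13, 14}` and `GetList(MakeExample(), 1)` yields `{10, 11, 12}`, even though the second list is stored before the first in the child array.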

@github-actions

⚠️ GitHub issue #35344 has been automatically assigned in GitHub to PR creator.

@felipecrv
Contributor Author

@bkietz

@felipecrv felipecrv force-pushed the list_view branch 2 times, most recently from 3204c80 to 5b0944c Compare April 28, 2023 23:59
@felipecrv felipecrv force-pushed the list_view branch 3 times, most recently from 90ce26e to f3a325a Compare May 12, 2023 14:59
@felipecrv felipecrv force-pushed the list_view branch 4 times, most recently from 06ca3f2 to 2c21e52 Compare July 20, 2023 03:34
@felipecrv felipecrv force-pushed the list_view branch 11 times, most recently from b4c6992 to 5e3a24b Compare August 8, 2023 23:03
return rag_.ArrayOf(std::move(type), size, null_probability);
}

// TODO(GH-38656): Use the random array generators from testing/random.h here
Contributor Author

@pitrou I isolated all the random-generation code in this class and removed the complicated List[View]ConcatenationChecker templates.

@felipecrv
Contributor Author

@pitrou

Member

@pitrou pitrou left a comment

Sorry for finding two more nits. Feel free to ping when done!

Comment on lines 273 to 285
if (sizes[position] > 0) {
  // NOTE: Concatenate can be called during IPC reads to append delta
  // dictionaries. Avoid UB on non-validated input by doing the addition in the
  // unsigned domain. (the result can later be validated using
  // Array::ValidateFull)
  const auto displaced_offset = SafeSignedAdd(offsets[position], displacement);
  // displaced_offset>=0 is guaranteed by RangeOfValuesUsed returning the
  // smallest offset of valid and non-empty list-views.
  DCHECK_GE(displaced_offset, 0);
  dst[position] = displaced_offset;
} else {
  // Do nothing to leave dst[position] as 0.
}
Member

I might be misreading, but is it just the same as visit_not_null(i)?

Contributor Author

You're right. I extracted the function from below when I noticed the dup, but forgot to do the reverse-inlining above.

Pushing soon.
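
For context, the offset displacement discussed in this thread can be sketched in isolation. This is a rough illustration under stated assumptions, not the code from the PR: `WrappingAdd` is a hypothetical stand-in for the `SafeSignedAdd` helper in the snippet, and `DisplaceOffsets` is a made-up wrapper that takes the displacement as a plain parameter instead of computing it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for SafeSignedAdd: perform the addition on unsigned
// integers so that overflow wraps around instead of being undefined behavior.
// The conversion back to int32_t is modular since C++20 and
// implementation-defined (but not undefined) before that.
inline int32_t WrappingAdd(int32_t a, int32_t b) {
  return static_cast<int32_t>(static_cast<uint32_t>(a) +
                              static_cast<uint32_t>(b));
}

// Shifts the offset of every non-empty list-view by `displacement`.
// Empty list-views keep an offset of 0, as in the reviewed snippet.
std::vector<int32_t> DisplaceOffsets(const std::vector<int32_t>& offsets,
                                     const std::vector<int32_t>& sizes,
                                     int32_t displacement) {
  assert(offsets.size() == sizes.size());
  std::vector<int32_t> dst(offsets.size(), 0);
  for (std::size_t position = 0; position < offsets.size(); ++position) {
    if (sizes[position] > 0) {
      dst[position] = WrappingAdd(offsets[position], displacement);
    }
  }
  return dst;
}
```

For example, `DisplaceOffsets({3, 0, 7}, {2, 0, 1}, 5)` gives `{8, 0, 12}`: the middle list-view is empty, so its offset stays 0. A wrapped result on corrupt input is not meaningful, but it can be rejected later by full validation rather than triggering undefined behavior here.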

@wgtmac wgtmac removed their request for review November 22, 2023 01:40
@pitrou
Member

pitrou commented Nov 22, 2023

@felipecrv We'll want to update https://github.com/apache/arrow/blob/main/docs/source/status.rst in a followup PR.

@felipecrv
Contributor Author

> @felipecrv We'll want to update https://github.com/apache/arrow/blob/main/docs/source/status.rst in a followup PR.

I will be extremely glad to send that PR.

@pitrou pitrou merged commit 8cc71ab into apache:main Nov 22, 2023
@pitrou pitrou removed the awaiting change review Awaiting change review label Nov 22, 2023
@mapleFU
Member

mapleFU commented Nov 22, 2023

bravo 🍺!

@mapleFU
Member

mapleFU commented Nov 22, 2023

I've created an issue about Parquet: #38849

@conbench-apache-arrow

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 8cc71ab.

There were 5 benchmark results indicating a performance regression:

The full Conbench report has more details. It also includes information about 14 possible false positives for unstable benchmarks that are known to sometimes produce them.

dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…pache#37877)

### Rationale for this change

More details in the draft implementations of this spec:

 - C++: apache#35345
 - Go: apache#37468

### What changes are included in this PR?

 - Some unrelated fixes to the spec text (I can extract these to another PR if necessary)
 - Changes to the spec text
 - Additions to the Flatbuffers specifications of the Arrow format

### Are these changes tested?

N/A.

### Are there any user-facing changes?

Changes in documentation and backwards compatible additions to the format spec.

* Closes: apache#37876

Lead-authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Co-authored-by: David Li <li.davidm96@gmail.com>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…E_LIST_VIEW array formats (apache#37468)

### Rationale for this change

Go implementation of apache#35345.

### What changes are included in this PR?

- [x] Add `LIST_VIEW` and `LARGE_LIST_VIEW` to datatype.go
- [x] Add `ListView` and `LargeListView` to list.go
- [x] Add `ListViewType` and `LargeListViewType` to datatype_nested.go
- [x] Add list-view builders
- [x] Implement list-view comparison in compare.go
- [x] String conversion in both directions
- [x] Validation of list-view arrays
- [x] Generation of random list-view arrays
- [x] Concatenation of list-view arrays in concat.go
- [x] JSON serialization/deserialization
- [x] Add data used for tests in `arrdata.go`
- [x] Add Flatbuffer changes
- [x] Add IPC support

### Are these changes tested?

Yes. Existing tests are being changed to also cover list-view variations, and new tests focused solely on the list-view format are being added.

### Are there any user-facing changes?

New structs and functions introduced.
* Closes: apache#35344

Authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…GE_LIST_VIEW array formats (apache#35345)

### Rationale for this change

Mailing list discussion: https://lists.apache.org/thread/r28rw5n39jwtvn08oljl09d4q2c1ysvb

### What changes are included in this PR?

Initial implementation of the new format in C++.

### Are these changes tested?

Unit tests are being written with every commit that adds new functionality. More functionality needs to be implemented before the (required) integration tests can be written.

### Are there any user-facing changes?

A new array format. It should have no impact for users that don't use it.
* Closes: apache#35344

Authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>

Linked issue: [C++][Format] Draft an implementation of the LIST_VIEW array format