@jonkeane (Member) commented Mar 27, 2025

This gets rid of `OBJECT`; `DATAPTR` has been replaced with `INTEGER()`, `REAL()`, etc., though strings are more complicated. I will fully admit that this C++ is stretching my comfort zone, so it might include obviously wrong things!

CI is currently failing, but I'm not totally sure yet if that means the code changes here are wrong or if maybe these allow us to have slightly different assumptions about materialization (see #45951 (comment))

I've also requested reviews broadly from folks I know have been around this code before; I appreciate any effort that y'all can spare 🙏

r/src/altrep.cpp Outdated
Comment on lines 1321 to 1325
Member Author

Something about this isn't quite right, because the test here (and other assertions that strings are materialized) fail:

expect_true(test_arrow_altrep_is_materialized(altrep))

But I haven't yet figured out if this is a real problem with this code change, or maybe it's an assumption in the tests that no longer holds?

@nealrichardson @paleolimbot y'all might have thoughts?

Member Author

As a follow-on for myself: line 1319 is duplicative and should be removed (but I don't want to outdate ^^^ just yet).

Member

`DATAPTR` and `STRING_ELT` work a bit differently for ALTREP, so it seems reasonable to me to assume we were implicitly materializing before and no longer are with `STRING_ELT`. It seems like this is an assumption of the test that no longer holds, and the expectation could be removed.
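
To make the distinction concrete, here is a minimal, self-contained C++ sketch of the behavior described above. It is a toy model, not R's ALTREP API: `LazyStrings`, `elt()`, `dataptr()`, and `is_materialized()` are all hypothetical names, standing in loosely for element access (as with `STRING_ELT`) versus whole-buffer access (as with `DATAPTR`).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy model only (hypothetical names): a lazily materialized string vector.
class LazyStrings {
 public:
  explicit LazyStrings(std::vector<std::string> source)
      : source_(std::move(source)) {}

  // Element access: answered directly from the backing source,
  // so no full copy is made and nothing is materialized.
  const std::string& elt(std::size_t i) const { return source_[i]; }

  // Whole-buffer access: the caller wants a contiguous, writable buffer,
  // so the first call copies everything (materializes).
  std::string* dataptr() {
    if (!materialized_) {
      cache_ = source_;
      materialized_ = true;
    }
    return cache_.data();
  }

  bool is_materialized() const { return materialized_; }

 private:
  std::vector<std::string> source_;  // stands in for the Arrow-backed data
  std::vector<std::string> cache_;   // stands in for the materialized R vector
  bool materialized_ = false;
};
```

Under this model, a test that reads elements and then asserts materialization would pass only if the read went through the whole-buffer path, which matches the suspicion that the old expectation depended on `DATAPTR` being called.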

@github-actions bot added the `awaiting changes` and `awaiting change review` labels and removed the `awaiting committer review`, `awaiting changes`, and `awaiting change review` labels Mar 29, 2025
Comment on lines 194 to 203
Member Author

I believe this chunk is necessary to return the array of strings here. But oddly(??) when I tried this block as simply `return STRING_ELT(vec, 0);` (which would return just the first element, IIUC), all tests passed. So maybe I misunderstand what's happening in the `MutableBuffer` there and we actually only need an object of the right type? Or we don't have test coverage that ensures that the full vector is there?

Member

Just doing `return STRING_ELT(vec, 0)` actually makes sense to me, since it looks like we just need a pointer to the same address as what we'd get with `DATAPTR`. Seems like that's what `STRING_ELT(vec, 0)` should accomplish. It does seem like a strange way to do it, but it also seems like what we're doing here is already breaking the rules CRAN wants us to play by. If it works, I'm +1.

Member Author

Ok, I've also commented this to be explicit for the next person

@github-actions bot added the `awaiting changes` label and removed the `awaiting change review` label Mar 29, 2025
@nealrichardson (Member)

I don't understand this code well enough to have much to say. I did look at the part of WRE that the CRAN check points to, and it suggests using DATAPTR_RO instead of DATAPTR: https://cran.r-project.org/doc/manuals/r-devel/R-exts.html#Moving-into-C-API-compliance

Is that not an option for us?

@jonkeane (Member Author)

> suggests using DATAPTR_RO instead of DATAPTR: https://cran.r-project.org/doc/manuals/r-devel/R-exts.html#Moving-into-C-API-compliance
>
> Is that not an option for us?

It is an option in one place, and I did use it there, but not everywhere: I got segfaults/illegal accesses when I tried it in the other places. If I'm reading the code correctly, those are places where we are actually mutating in place.
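
A rough illustration of why a read-only pointer only fits the non-mutating call sites. This is a toy sketch in plain C++ rather than the R API: `ToyVector`, `data_ro()`, and `data_rw()` are hypothetical stand-ins for a vector, a `DATAPTR_RO`-style accessor, and a typed writable accessor such as `INTEGER()`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an R integer vector.
struct ToyVector {
  std::vector<int> values;
};

// Read-only accessor (DATAPTR_RO-like): returns a pointer to const data.
inline const int* data_ro(const ToyVector& v) { return v.values.data(); }

// Writable typed accessor (INTEGER()-like): returns a mutable pointer.
inline int* data_rw(ToyVector& v) { return v.values.data(); }

// A reader only needs the read-only pointer.
inline long sum(const ToyVector& v) {
  const int* p = data_ro(v);
  long total = 0;
  for (std::size_t i = 0; i < v.values.size(); ++i) total += p[i];
  return total;
}

// A call site that mutates in place needs the writable pointer.
inline void fill(ToyVector& v, int value) {
  int* p = data_rw(v);
  // Writing through data_ro(v) instead would not compile: it points to const.
  for (std::size_t i = 0; i < v.values.size(); ++i) p[i] = value;
}
```

In this toy the compiler rejects writes through the read-only pointer; whether the segfaults seen in the PR had exactly this cause is not established here.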

@amoeba (Member) left a comment

Thanks for working on this. I took a look through and left a couple of comments. Once the altrep test failures get figured out I'll be a +1 on this.

@github-actions bot added the `awaiting change review` label and removed the `awaiting changes` label Mar 31, 2025
Member Author

This comment change and the one below are not behavior changes with this PR, but I think the comments were simply wrong (either old or copy-pasted). I've tried to correct them to be accurate descriptions of what's going on (but note that "does not materialize" followed two lines later by `expect_true(test_arrow_altrep_is_materialized(altrep))` are at odds with each other).

Member

You're correct here that my earlier comment was simply wrong 🙂

@github-actions bot added the `awaiting changes` label and removed the `awaiting change review` label Mar 31, 2025
@jonkeane requested a review from amoeba March 31, 2025 13:42
@jonkeane force-pushed the 45949_nonapi_again branch from 310d9c2 to a4b457d April 5, 2025 14:49
@jonkeane (Member Author) commented Apr 5, 2025

@github-actions crossbow submit -g r

@jonkeane requested a review from paleolimbot April 5, 2025 14:49
@github-actions bot commented Apr 5, 2025

Revision: a4b457d

Submitted crossbow builds: ursacomputing/crossbow @ actions-97a8d02cea

| Task | Status |
|------|--------|
| r-binary-packages | GitHub Actions |
| r-recheck-most | GitHub Actions |
| test-r-arrow-backwards-compatibility | GitHub Actions |
| test-r-clang-sanitizer | GitHub Actions |
| test-r-depsource-bundled | Azure |
| test-r-depsource-system | GitHub Actions |
| test-r-dev-duckdb | GitHub Actions |
| test-r-devdocs | GitHub Actions |
| test-r-extra-packages | GitHub Actions |
| test-r-gcc-11 | GitHub Actions |
| test-r-gcc-12 | GitHub Actions |
| test-r-install-local | GitHub Actions |
| test-r-install-local-minsizerel | GitHub Actions |
| test-r-linux-as-cran | GitHub Actions |
| test-r-linux-rchk | GitHub Actions |
| test-r-linux-sanitizer | GitHub Actions |
| test-r-linux-valgrind | GitHub Actions |
| test-r-macos-as-cran | GitHub Actions |
| test-r-minimal-build | Azure |
| test-r-offline-maximal | GitHub Actions |
| test-r-offline-minimal | Azure |
| test-r-rhub-debian-gcc-devel-lto-latest | Azure |
| test-r-rhub-debian-gcc-release-custom-ccache | Azure |
| test-r-rhub-ubuntu-release-latest | Azure |
| test-r-rocker-r-ver-latest | Azure |
| test-r-rstudio-r-base-4.1-opensuse155 | Azure |
| test-r-rstudio-r-base-4.2-focal | Azure |
| test-r-ubuntu-22.04 | GitHub Actions |
| test-r-versions | GitHub Actions |

@jonkeane (Member Author) commented Apr 5, 2025

Failures: test-r-rstudio-r-base-4.1-opensuse155 and test-r-offline-maximal are unrelated / being fixed elsewhere.

@amoeba (Member) left a comment

+1

@paleolimbot (Member) left a comment

Thanks!

r/src/altrep.cpp Outdated

// copy the data from the array, through Get_region
- Get_region(alt, 0, size, reinterpret_cast<int*>(DATAPTR(copy)));
+ Get_region(alt, 0, size, reinterpret_cast<int*>(INTEGER(copy)));
Member

Suggested change
- Get_region(alt, 0, size, reinterpret_cast<int*>(INTEGER(copy)));
+ Get_region(alt, 0, size, INTEGER(copy));
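
For context on why the cast can go: in R's C API, `INTEGER()` already returns `int*`, whereas `DATAPTR()` returns `void*`, so only the latter needed a cast. The sketch below mirrors that with toy functions. Everything prefixed `toy_` and the `ToyIntVector` type are hypothetical stand-ins, not the real R or Arrow functions.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for an R integer vector.
struct ToyIntVector {
  std::vector<int> values;
};

// Untyped accessor (DATAPTR-like): callers must cast before use.
inline void* toy_dataptr(ToyIntVector& v) { return v.values.data(); }

// Typed accessor (INTEGER()-like): the result is already int*.
inline int* toy_integer(ToyIntVector& v) { return v.values.data(); }

// Hypothetical stand-in for Get_region: copies n ints from src,
// starting at index start, into buf.
inline void toy_get_region(const ToyIntVector& src, std::size_t start,
                           std::size_t n, int* buf) {
  for (std::size_t i = 0; i < n; ++i) buf[i] = src.values[start + i];
}

// The typed accessor needs no reinterpret_cast at the call site.
static_assert(
    std::is_same<decltype(toy_integer(std::declval<ToyIntVector&>())),
                 int*>::value,
    "typed accessor already yields int*");
```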

Member Author

Ah yes, thanks for that reminder, I found a few others too

r/src/altrep.cpp Outdated
Comment on lines 1311 to 1312
Member

Thanks for checking that! I don't recall any tests that checked whether something was materialized more than once, but it was quite a long time ago 🙂

Member

Thanks!

@github-actions bot added the `awaiting merge`, `awaiting review`, and `awaiting changes` labels and removed the `awaiting change review`, `awaiting review`, and `awaiting merge` labels Apr 5, 2025
@github-actions bot added the `awaiting change review` label and removed the `awaiting changes` label Apr 5, 2025
@jonkeane merged commit 34a984c into apache:main Apr 5, 2025
10 checks passed
@jonkeane removed the `awaiting change review` label Apr 5, 2025
@jonkeane (Member Author) commented Apr 5, 2025

@assignUser would it be possible to pull this into 20? We will need to patch it in our CRAN release regardless, but it would be nice to have it on the actual release.

@assignUser (Member)

Of course!

@conbench-apache-arrow

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 34a984c.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them.

amoeba pushed a commit that referenced this pull request Apr 6, 2025
* GitHub Issue: #45949

Lead-authored-by: Jonathan Keane <jkeane@gmail.com>
Co-authored-by: Dewey Dunnington <dewey@dunnington.ca>
Signed-off-by: Jonathan Keane <jkeane@gmail.com>
assignUser pushed a commit that referenced this pull request Apr 7, 2025
zanmato1984 pushed a commit to zanmato1984/arrow that referenced this pull request Apr 15, 2025
…pache#45951)

@h-vetinari (Contributor)

I believe this PR is the likely reason for the following failure I'm seeing in conda-forge/r-arrow-feedstock#101:

In file included from r_to_arrow.cpp:18:
.\./arrow_types.h:193:14: error: called object type 'arrow::r::RVectorType' is not a function or function pointer
  193 |       return COMPLEX(vec);
      |              ^~~~~~~
.\./arrow_types.h:176:50: note: in instantiation of member function 'arrow::r::RBuffer<cpp11::r_vector<int>>::getDataPointer' requested here
  176 |       : MutableBuffer(reinterpret_cast<uint8_t*>(getDataPointer(vec)),
      |                                                  ^
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.43.34808\include\xutility:506:58: note: in instantiation of member function 'arrow::r::RBuffer<cpp11::r_vector<int>>::RBuffer' requested here
  506 |         ::new (static_cast<void*>(_STD addressof(_Obj))) _Ty(_STD forward<_Types>(_Args)...);
      |                                                          ^
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.43.34808\include\memory:2092:18: note: in instantiation of function template specialization 'std::_Construct_in_place<arrow::r::RBuffer<cpp11::r_vector<int>>, cpp11::r_vector<int> &>' requested here
 2092 |             _STD _Construct_in_place(_Storage._Value, _STD forward<_Types>(_Args)...);
      |                  ^
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.43.34808\include\memory:2903:26: note: in instantiation of function template specialization 'std::_Ref_count_obj2<arrow::r::RBuffer<cpp11::r_vector<int>>>::_Ref_count_obj2<cpp11::r_vector<int> &>' requested here
 2903 |     const auto _Rx = new _Ref_count_obj2<_Ty>(_STD forward<_Types>(_Args)...);
      |                          ^
r_to_arrow.cpp:1240:53: note: in instantiation of function template specialization 'std::make_shared<arrow::r::RBuffer<cpp11::r_vector<int>>, cpp11::r_vector<int> &>' requested here
 1240 |                                                std::make_shared<RBuffer<RVector>>(vec)};
      |                                                     ^
r_to_arrow.cpp:1283:12: note: in instantiation of function template specialization 'arrow::r::MakeSimpleArray<13, cpp11::r_vector<int>, arrow::Int32Type>' requested here
 1283 |     return MakeSimpleArray<INTSXP, cpp11::integers, Int32Type>(x);
      |            ^

This smells like macro leakage from the UCRT headers (e.g. if there's something called COMPLEX there already), which is always an issue on Windows... Cf. NOMINMAX, WIN32_LEAN_AND_MEAN, etc.

We're using a clang on Windows (adapted for the needs of R packages), together with the standard Windows runtimes. CC @xhochy
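
As a possible aid to triage: the diagnostic above reports the type of `COMPLEX` as `arrow::r::RVectorType`, which is the shape of error clang emits when an unqualified name resolves to an enumerator instead of a function. The sketch below reproduces that pattern in isolation. It does not establish the root cause of the failure above, and every name in it (`demo`, `VectorKind`, the toy `COMPLEX` function) is hypothetical.

```cpp
#include <cassert>

// Hypothetical stand-in for a global C accessor function named COMPLEX.
inline int COMPLEX(int x) { return x + 1; }

namespace demo {

// Hypothetical unscoped enum whose enumerator shares the accessor's name.
// Unscoped enumerators are visible directly in the enclosing namespace.
enum VectorKind { INT32, COMPLEX };

inline int call_accessor(int x) {
  // return COMPLEX(x);
  //   ^ would not compile: unqualified lookup stops at demo::COMPLEX,
  //     an enumerator, which is not callable.
  return ::COMPLEX(x);  // qualified lookup reaches the global function
}

}  // namespace demo
```

If the real header has the same shape, qualifying the call with `::` (or renaming the enumerator) would be one thing to try; if `COMPLEX` is instead a macro in that build, a different fix would be needed.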

QuietCraftsmanship pushed a commit to QuietCraftsmanship/arrow that referenced this pull request Jul 7, 2025
6 participants