Skip to content

Conversation

@jduo
Copy link
Member

@jduo jduo commented Jul 6, 2022

No description provided.

@github-actions
Copy link

github-actions bot commented Jul 6, 2022

@jduo jduo force-pushed the 32_bit_windows_fix branch 3 times, most recently from 16c622c to aadbd21 Compare July 6, 2022 22:36
@lidavidm
Copy link
Member

lidavidm commented Jul 7, 2022

Looks like it still doesn't build on 32-bit Windows and there are tons of compiler warnings about casts (one of which is an error). We might want a general way to deal with int64_t -> size_t conversions.

@jduo
Copy link
Member Author

jduo commented Jul 7, 2022

Looks like it still doesn't build on 32-bit Windows and there are tons of compiler warnings about casts (one of which is an error). We might want a general way to deal with int64_t -> size_t conversions.

I'm thinking to add a templated ToSizeT function to a utils header for converting integer types to size_t, that adds pragmas to suppress warnings on 32-bit platforms (or when going from a signed integer).

@lidavidm
Copy link
Member

lidavidm commented Jul 7, 2022

FWIW, an explicit static_cast is enough to suppress the warning. I'm more thinking: on 32-bit platforms, do we also want to check for potential overflow?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, what is this for? Can I find some documentation for this somewhere? A Google search yields nothing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are build variables in the GitHub workflow.
Platform is used to specify the CPU configuration to build using MSBuild.
openssl-n-bits is used to expand the directory name that OpenSSL got installed to using the chocalatey package manager.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand about platform, but I can't find any reference to openssl-n-bits anywhere. Did I miss something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

openssl-n-bits is newly created. Previously the build looked for OpenSSL in the hardcoded directory C:\Program Files\OpenSSL-Win64 but it needs to be C:\Program Files (x86)\OpenSSL-Win32 when compiling the 32-bit version.
This is from where choclatey installs OpenSSL:
I'm not sure if the install location is documented.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but ${{ matrix.openssl-n-bits }} is never used below, so how does it work?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, got mixed up. Yes this got folded into cmake-args now. I'll remove it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha :-)

@pitrou
Copy link
Member

pitrou commented Jul 11, 2022

I would be fine with static_cast<size_t>. A 32-bit build cannot represent any in-memory containers larger than 2**32 elements (except perhaps a BooleanArray...).

@lidavidm
Copy link
Member

lidavidm commented Jul 11, 2022

Fair, though ChunkedArray and IPC may throw that off? I suppose we only need to check in certain places for that though

@pitrou
Copy link
Member

pitrou commented Jul 11, 2022

ChunkedArray is still in-memory. For other places (such as file access) we might want to check.

@jduo jduo force-pushed the 32_bit_windows_fix branch 17 times, most recently from 75012d8 to 2955383 Compare July 18, 2022 11:01
@jduo jduo force-pushed the 32_bit_windows_fix branch 11 times, most recently from 5820943 to 2622b35 Compare August 2, 2022 18:02
@jduo
Copy link
Member Author

jduo commented Aug 2, 2022

@lidavidm @pitrou , the warnings are now all fixed for the 32-bit MSVC build.

It is failing to build linking some parquet tests that depend on boost::filesystem (trying to link to the 64-bit version incorrectly). I did set ARROW_PARQUET to off in the github workflow though, so I'm not sure why it's even trying to build this test or the parquet code.

This build is also failing archery linting. Mostly around line wrapping. Is there an automated way of fixing the archery formatting warnings?

@lidavidm
Copy link
Member

lidavidm commented Aug 2, 2022

@jduo
Copy link
Member Author

jduo commented Aug 2, 2022

@github-actions autotune

@lidavidm
Copy link
Member

lidavidm commented Aug 2, 2022

@jduo can you rebase here? The autotune, weirdly enough, does not actually kick off CI.

@jduo jduo force-pushed the 32_bit_windows_fix branch from 83d6b86 to 4600a4d Compare August 2, 2022 21:38
@jduo jduo force-pushed the 32_bit_windows_fix branch from 4600a4d to 5c70344 Compare August 3, 2022 10:06
@lidavidm lidavidm self-requested a review August 3, 2022 16:27
@lidavidm
Copy link
Member

lidavidm commented Aug 4, 2022

@jduo it turns out ARROW_SUBSTRAIT implies ARROW_PARQUET so we should turn that off, too

@lidavidm
Copy link
Member

lidavidm commented Aug 4, 2022

Hmm, I suppose you could still have a NA array with > (edit: size_t) rows on 32-bit, since there's no actual storage…

@lidavidm
Copy link
Member

lidavidm commented Aug 4, 2022

Or actually. A boolean array could also technically exceed size_t rows, since there's 8 rows per byte?

Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I scanned through things and things generally look good, though are we worried about NA/boolean arrays? If so, some kernels and I/O may need to do some overflow checks.

Also, have we tested what happens if we try to read a very large IPC file on 32-bit Windows?

auto data_size = chunk->size();
auto copy_size = std::min(required_size, data_size);
memcpy(static_cast<uint8_t*>(out) + offset, data, copy_size);
memcpy(static_cast<uint8_t*>(out) + offset, data, static_cast<size_t>(copy_size));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like somewhere in this file, there should be an overflow check when we read/parse a message. Or maybe that's handled by Flatbuffers?

Future<> AllComplete(const std::vector<Future<>>& futures) {
struct State {
explicit State(int64_t n_futures) : mutex(), n_remaining(n_futures) {}
explicit State(int64_t n_futures)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: just change the type of the argument instead?

@pitrou
Copy link
Member

pitrou commented Aug 8, 2022

While it is technically ok, I think sprinkling static_cast<size_t> everywhere is both cumbersome (on the code writing and reading sides) and will turn out fragile maintenance-wise.

I also don't think we're very interested in ensuring highest quality support for 32-bit Windows, so I would favor an approach that's much lighter on maintenance and code writing demands.
(for example by selectively disabling some warnings on 32-bit MSVC)

@jduo
Copy link
Member Author

jduo commented Aug 8, 2022

While it is technically ok, I think sprinkling static_cast<size_t> everywhere is both cumbersome (on the code writing and reading sides) and will turn out fragile maintenance-wise.

I also don't think we're very interested in ensuring highest quality support for 32-bit Windows, so I would favor an approach that's much lighter on maintenance and code writing demands. (for example by selectively disabling some warnings on 32-bit MSVC)

The argument against disabling compilation warnings is that if the warnings come from a public header, it the library user inherits the warning and can't really work around it. They would need to disable the warning on their side if they report warnings as failures in their build system, which may be undesirable for example.

@pitrou
Copy link
Member

pitrou commented Aug 8, 2022

Well, we could selectively disable warnings in Arrow headers, but I fear that a lot of headers would probably be affected :-(

Perhaps this needs to be further discussed on the mailing-list. My opinion is that 32-bit support should not come with undue maintenance load.

@amol-
Copy link
Member

amol- commented Mar 30, 2023

Closing because it has been untouched for a while, in case it's still relevant feel free to reopen and move it forward 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants