Contributor

@javierluraschi javierluraschi commented Oct 24, 2018

Fix for https://issues.apache.org/jira/browse/ARROW-3604

Enables collecting int64s as R integers by replacing overflows and underflows with NAs and triggering a warning.

In sparklyr, this enables:

> sdf_len(sc, 10) %>% dplyr::transmute(new = cast("123456789123451234" %as% BIGINT))
# Source: spark<?> [?? x 1]
     new
 * <int>
 1    NA
 2    NA
 3    NA
 4    NA
 5    NA
 6    NA
 7    NA
 8    NA
 9    NA
10    NA
# ... with more rows
Warning message:
In RecordBatch__to_dataframe(x) :
  Integer overflow, 10 values replaced with NAs. Consider using 'options(arrow.int64 = "bit64")'.
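
For illustration, the narrowing described above could be sketched in C++ roughly as follows. This is a minimal sketch under stated assumptions, not the code in this PR: `NarrowInt64ToRInt`, `NarrowResult`, and the `k...` constants are hypothetical names. It relies on R representing `NA_integer_` as the smallest 32-bit integer, so that value is itself treated as out of range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// R stores NA_integer_ as the smallest 32-bit integer.
constexpr std::int32_t kRNaInt = std::numeric_limits<std::int32_t>::min();
// Largest value an R integer can hold.
constexpr std::int64_t kMaxRInt = std::numeric_limits<std::int32_t>::max();
// Smallest usable R integer: one above the NA sentinel.
constexpr std::int64_t kMinRInt = static_cast<std::int64_t>(kRNaInt) + 1;

// Hypothetical result type: the narrowed values plus the number of
// entries replaced with NA, which a caller could report in a warning.
struct NarrowResult {
  std::vector<std::int32_t> values;
  std::size_t replaced = 0;
};

// Hypothetical helper: narrows int64 values to R integers and substitutes
// NA for anything outside the representable range.
inline NarrowResult NarrowInt64ToRInt(const std::vector<std::int64_t>& input) {
  NarrowResult result;
  result.values.reserve(input.size());
  for (std::int64_t v : input) {
    if (v < kMinRInt || v > kMaxRInt) {
      result.values.push_back(kRNaInt);
      ++result.replaced;
    } else {
      result.values.push_back(static_cast<std::int32_t>(v));
    }
  }
  assert(result.values.size() == input.size());
  return result;
}
```

A caller would check `replaced` after the loop and emit a single warning when it is non-zero, rather than warning once per value.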

CC: @romainfrancois

@javierluraschi changed the title from "ARROW-3604: [R] Support to collect int64 as ints" to "ARROW-3604: [R] Support to collect int64s as R integers" on Oct 24, 2018
Member

@wesm wesm left a comment

This one probably needs some tests

// the integer64 sentinel
static const int64_t NA_INT64 = std::numeric_limits<int64_t>::min();
static const int64_t MAX_INT32 = std::numeric_limits<int32_t>::max();
static const int64_t MIN_INT32 = std::numeric_limits<int32_t>::min();
Member

change these to constexpr?

Contributor

Sure. Does that mean they have to be functions?

Otherwise, since `::min` and `::max` are themselves `constexpr`, perhaps we don't even need the constants here.

Also, not sure what MIN_INT32 is used for, but we need to be careful because it's also the NA sentinel for int in R.

Member

yes they are constexpr, e.g.

static constexpr hash_slot_t kHashSlotEmpty = std::numeric_limits<int32_t>::max();

@romainfrancois
Contributor

This and #2819 should be discussed more broadly. I'm not a fan of relying on a global option here.

I'd rather have those lossy conversions be explicit.

@wesm
Member

wesm commented Oct 25, 2018

Yeah I would suggest passing an options list to the conversion entry point
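
As a rough illustration of that suggestion, an options object passed to the conversion call could look something like the sketch below. Everything here is hypothetical (`ConversionOptions`, `Int64Policy`, `ConvertInt64`); none of it is an existing Arrow API. The sketch assumes the default should refuse lossy conversions, so that narrowing to NA only happens when the caller asks for it.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical policy for int64 values that do not fit in an R integer.
enum class Int64Policy {
  kError,         // refuse the conversion (assumed default)
  kNarrowWithNa   // replace out-of-range values with NA
};

// Hypothetical options object passed explicitly to the conversion call,
// instead of being read from a global option.
struct ConversionOptions {
  Int64Policy int64_policy = Int64Policy::kError;
};

// Hypothetical per-value conversion. Returns R's NA sentinel (the smallest
// 32-bit integer) when the value is out of range and the policy allows it.
inline std::int32_t ConvertInt64(std::int64_t v, const ConversionOptions& options) {
  constexpr std::int32_t kNa = std::numeric_limits<std::int32_t>::min();
  constexpr std::int64_t kMax = std::numeric_limits<std::int32_t>::max();
  constexpr std::int64_t kMin = static_cast<std::int64_t>(kNa) + 1;
  if (v >= kMin && v <= kMax) {
    return static_cast<std::int32_t>(v);
  }
  if (options.int64_policy == Int64Policy::kNarrowWithNa) {
    return kNa;
  }
  throw std::overflow_error("int64 value does not fit in an R integer");
}
```

With this shape, the lossy behavior is visible at the call site: a caller has to construct options with `Int64Policy::kNarrowWithNa` to get NAs instead of an error.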

@javierluraschi
Contributor Author

From the decimals PR, it looks like we would prefer users to cast explicitly, as opposed to configuring the desired behavior with options. I'll close this one then.
