From 44b5567eb97a98faa3b42834c892f4649f3659b4 Mon Sep 17 00:00:00 2001 From: Michael Chirico Date: Wed, 1 Aug 2018 19:10:51 +0800 Subject: [PATCH 1/5] Improve helpfulness of warning message during on-assignment type coercion --- NEWS.md | 1 + src/assign.c | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/NEWS.md b/NEWS.md index ad95f1cb01..444e4a22c0 100644 --- a/NEWS.md +++ b/NEWS.md @@ -9,6 +9,7 @@ #### NOTES +1. Softened the tone of column-type-mismatch-on-assignment warning message to be friendlier. Thanks to @sarahbeeysian on Twitter for voicing this. ### Changes in v1.11.4 (on CRAN 27 May 2018) diff --git a/src/assign.c b/src/assign.c index 1f5d9161a7..6c88a445b7 100644 --- a/src/assign.c +++ b/src/assign.c @@ -607,7 +607,7 @@ SEXP assign(SEXP dt, SEXP rows, SEXP cols, SEXP newcolnames, SEXP values, SEXP v s1 = (char *)type2char(TYPEOF(targetcol)); s2 = (char *)type2char(TYPEOF(thisvalue)); if (isReal(thisvalue)) s3="; may have truncated precision"; else s3=""; - warning("Coerced '%s' RHS to '%s' to match the column's type%s. Either change the target column ['%s'] to '%s' first (by creating a new '%s' vector length %d (nrows of entire table) and assign that; i.e. 'replace' column), or coerce RHS to '%s' (e.g. 1L, NA_[real|integer]_, as.*, etc) to make your intent clear and for speed. Or, set the column type correctly up front when you create the table and stick to it, please.", s2, s1, s3, CHAR(STRING_ELT(names, coln)), s2, s2, LENGTH(VECTOR_ELT(dt,0)), s1); + warning("Coerced '%s' RHS to '%s' to match the column's type%s. Either change the target column ['%s'] to '%s' first (by creating a new '%s' vector length %d (nrows of entire table) and assign that; i.e. 'replace' column), or coerce RHS to '%s' (e.g. 1L, NA_[real|integer]_, as.*, etc.) to make your intent clear and for speed. A common source of this error is mismatch of return types when assigning by group, especially in edge cases. Another common pitfall comes from wrong assumptions about your table's column types; check print(x, class = TRUE) or sapply(x, class) to inspect your initial LHS types and be sure they're as expected.", s2, s1, s3, CHAR(STRING_ELT(names, coln)), s2, s2, LENGTH(VECTOR_ELT(dt,0)), s1); } } } From 6822c29b95c4c0fac66366ebed2a38bc14215c98 Mon Sep 17 00:00:00 2001 From: Michael Chirico Date: Wed, 1 Aug 2018 20:06:15 +0800 Subject: [PATCH 2/5] fix tests to match message --- inst/tests/tests.Rraw | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/inst/tests/tests.Rraw b/inst/tests/tests.Rraw index 4384ab81b4..28ee48002b 100644 --- a/inst/tests/tests.Rraw +++ b/inst/tests/tests.Rraw @@ -4685,7 +4685,7 @@ test(1294.2, dt[, a := 1.5]$a, rep(1L, 3L), warning="Coerced 'double' RHS to 'i test(1294.3, dt[, a := NA]$a, rep(NA_integer_, 3L)) test(1294.4, dt[, a := "a"]$a, rep(NA_integer_, 3L), warning=c("NAs introduced by coercion", - "Coerced 'character' RHS to 'integer' to match the column's type.*please")) + "Coerced 'character' RHS to 'integer' to match the column's type.*common pitfall")) test(1294.5, dt[, a := list(list(1))]$a, rep(1L, 3L), warning="Coerced 'list' RHS to 'integer' to match the column's type") test(1294.6, dt[, a := list(1L)]$a, rep(1L, 3L)) test(1294.7, dt[, a := list(1)]$a, rep(1L, 3L)) @@ -4694,7 +4694,7 @@ test(1294.9, dt[, b := 1L]$b, rep(1,3)) test(1294.10, dt[, b := NA]$b, rep(NA_real_,3)) test(1294.11, dt[, b := "bla"]$b, rep(NA_real_, 3), warning=c("NAs introduced by coercion", - "Coerced 'character' RHS to 'double' to match the column's type.*please")) + "Coerced 'character' RHS to 'double' to match the column's type.*common pitfall")) test(1294.12, dt[, b := list(list(1))]$b, rep(1,3), warning="Coerced 'list' RHS to 'double' to match the column's type") test(1294.13, dt[, b := TRUE]$b, rep(1,3), warning="Coerced 'logical' RHS to 'double' to match the column's type") test(1294.14, dt[, b := list(1)]$b, rep(1,3)) From 6c658fe62891abaf9c65c6208d631f67c62d38c8 Mon Sep 17 00:00:00 2001 From: mattdowle Date: Thu, 9 Aug 2018 18:51:59 -0700 Subject: [PATCH 3/5] Further refinement of message. Added more context info. Now checks for truncation and informs user the first truncated item. --- NEWS.md | 23 ++++++++++++- inst/tests/tests.Rraw | 78 +++++++++++++++++++++---------------------- src/assign.c | 38 +++++++++++++-------- src/data.table.h | 1 + src/forder.c | 4 +-- 5 files changed, 89 insertions(+), 55 deletions(-) diff --git a/NEWS.md b/NEWS.md index 444e4a22c0..3176d36857 100644 --- a/NEWS.md +++ b/NEWS.md @@ -9,7 +9,28 @@ #### NOTES -1. Softened the tone of column-type-mismatch-on-assignment warning message to be friendlier. Thanks to @sarahbeeysian on Twitter for voicing this. +1. The type coercion warning message has been improved, [#2989](https://github.com/Rdatatable/data.table/pull/2989). Thanks to @sarahbeeysian on [Twitter](https://twitter.com/sarahbeeysian/status/1021359529789775872) for highlighting. For example, given the follow statements: +``` + DT = data.table(id=1:3) + DT[2, id:="foo"] +``` +the warning message has changed from : +``` + Coerced character RHS to integer to match the column's type. Either change the target column ['id'] to character first (by creating a new character vector length 3 (nrows of entire table) and assign that; i.e. 'replace' column), or coerce RHS to integer (e.g. 1L, NA_[real|integer]_, as.*, etc) to make your intent clear and for speed. Or, set the column type correctly up front when you create the table and stick to it, please. +``` +to : +``` + Coerced character RHS to integer to match the type of the target column (column 1 named 'id'). If the target column's type integer is correct, it's best for efficiency to avoid the coercion and create the RHS as type integer. To achieve that consider R's type postfix: typeof(0L) vs typeof(0), and typeof(NA) vs typeof(NA_integer_) vs typeof(NA_real_). You can wrap the RHS with as.integer() to avoid this warning, but that will still perform the coercion. If the target column's type is not correct, it's best to revisit where the DT was created and fix the column type there; e.g., by using colClasses= in fread(). Otherwise, you can change the column type now by plonking a new column (of the desired type) over the top of it; e.g. DT[, `id`:=as.character(`id`)]. If the RHS of := has nrow(DT) elements, then the assignment is called a column plonk and is the way to change a column's type. Column types can be observed with print(x,class=TRUE) and sapply(x,class). +``` + +Further, if a coercion from double to integer is performed, fractional data such as 3.14 is now detected and the truncation to 3 is warned about if and only if truncation has occurred. +``` + DT = data.table(v=1:3) + DT[2, v:=3.14] + Warning message: + Coerced double RHS to integer to match the type of the target column (column 1 named 'v'). One or more RHS values contain fractions which have been lost; e.g. item 1 with value 3.140000 has been truncated to 3. +``` + ### Changes in v1.11.4 (on CRAN 27 May 2018) diff --git a/inst/tests/tests.Rraw b/inst/tests/tests.Rraw index 28ee48002b..d9629a274f 100644 --- a/inst/tests/tests.Rraw +++ b/inst/tests/tests.Rraw @@ -857,16 +857,16 @@ test(299.03, truelength(DT)>length(DT)) # the := over-allocated, by 100 by def # FR #2551 - old 299.3 and 299.5 are changed to include length(RHS) > 1 to issue the warning DT[,c:=rep(42L,.N)] # plonk test(299.04, DT, data.table(a=1:3, c=42L)) -test(299.05, DT[2:3,c:=c(42, 42)], data.table(a=1:3,c=42L), warning="Coerced 'double' RHS to 'integer' to match the column's type.*length 3 [(]nrows of entire table[)]") +test(299.05, DT[2:3,c:=c(42, 42)], data.table(a=1:3,c=42L), warning="Coerced double RHS to integer.*column 2 named 'c'.*RHS.*no fractions.*more efficiently.*integer.*Consider.*L") # FR #2551 - length(RHS) = 1 - no warning for type conversion test(299.06, DT[2,c:=42], data.table(a=1:3,c=42L)) # also see tests 302 and 303. (Ok, new test file for fast assign would be tidier). test(299.07, DT[,c:=rep(FALSE,nrow(DT))], data.table(a=1:3,c=FALSE)) # replace c column with logical -test(299.08, DT[2:3,c:=c(42,0)], data.table(a=1:3,c=c(FALSE,TRUE,FALSE)), warning="Coerced 'double' RHS to 'logical' to match the column's type.*length 3 [(]nrows of entire table[)]") +test(299.08, DT[2:3,c:=c(42,0)], data.table(a=1:3,c=c(FALSE,TRUE,FALSE)), warning="Coerced double RHS to logical.*column 2 named 'c'.*If the target column's type logical is correct") # FR #2551 is now changed to fit in / fix bug #5442. Stricter warnings are in place now. Check tests 1294.1-34 below. -test(299.09, DT[2,c:=42], data.table(a=1:3,c=c(FALSE,TRUE,FALSE)), warning="Coerced 'double' RHS to 'logical' to match") -test(299.11, DT[2,c:=42L], data.table(a=1:3,c=c(FALSE,TRUE,FALSE)), warning="Coerced 'integer' RHS to 'logical' to match") -test(299.12, DT[2:3,c:=c(0L, 0L)], data.table(a=1:3,c=FALSE), warning="Coerced 'integer' RHS to 'logical' to match the column's type.*length 3 [(]nrows of entire table[)]") +test(299.09, DT[2,c:=42], data.table(a=1:3,c=c(FALSE,TRUE,FALSE)), warning="Coerced double RHS to logical to match") +test(299.11, DT[2,c:=42L], data.table(a=1:3,c=c(FALSE,TRUE,FALSE)), warning="Coerced integer RHS to logical to match") +test(299.12, DT[2:3,c:=c(0L, 0L)], data.table(a=1:3,c=FALSE), warning="Coerced integer RHS to logical to match the type of the target column.*If the target column's type logical is correct") # Test bug fix #1468, combining i and by. @@ -958,7 +958,7 @@ test(335, DT[,2:1]<-NULL, error="Attempt to assign to column") DT = data.table(a=1:2, b=1:6) test(336, DT[,z:=a/b], data.table(a=1:2,b=1:6,z=(1:2)/(1:6))) -test(337, DT[3:4,z:=a*b], data.table(a=1:2,b=1:6,z=c(1,1,3,8,1/5,2/6)), warning="Coerced 'integer' RHS to 'double' to match the colum") +test(337, DT[3:4,z:=a*b], data.table(a=1:2,b=1:6,z=c(1,1,3,8,1/5,2/6)), warning="Coerced integer RHS to double to match") # test eval of LHS of := (using with=FALSE gives a warning here from v1.9.3) @@ -1015,7 +1015,7 @@ test(355, DT[11:2010,f:=newlevels], data.table(f=factor(c(rep("000",10),newlevel DT = data.table(f=c("a","b"),x=1:4) # Test coercing factor to character column test(355.5, DT[3,f:=factor("foo")], data.table(f=c("a","b","foo","b"),x=1:4)) -test(355.6, DT[4,f:=factor("bar"),verbose=TRUE], data.table(f=c("a","b","foo","bar"),x=1:4), output="Coerced factor to character to match the column") +test(355.6, DT[4,f:=factor("bar"),verbose=TRUE], data.table(f=c("a","b","foo","bar"),x=1:4), output="Coerced factor RHS to character to match the column") # See datatable-help post and NEWS item for 1.6.7 @@ -2235,7 +2235,7 @@ test(836, DT[,a:=as.integer(a)], data.table(a=INT(-1,0,1))) test(837, DT[,a:=cbind(1,2)], data.table(a=c(1L,2L,1L)), warning=c("2 column matrix RHS of := will be treated as one vector", "Supplied 2 items to be assigned to 3 items.*recycled", - "Coerced 'double' RHS to 'integer' to match the column's type")) + "Coerced double RHS to integer to match.*column 1 named 'a'.*values contain no fractions")) DT = data.table(a=1:3,b=1:6) test(838, DT[,c:=scale(b), by=a][,c:=as.integer(1000*c)], data.table(a=1:3,b=1:6,c=rep(as.integer(1000*scale(1:2)), each=3))) @@ -4681,38 +4681,38 @@ test(1293, ans1, ans2) dt <- data.table(a=1:3, b=c(7,8,9), c=c(TRUE, NA, FALSE), d=as.list(4:6), e=c("a", "b", "c")) test(1294.1, dt[, a := 1]$a, rep(1L, 3L)) -test(1294.2, dt[, a := 1.5]$a, rep(1L, 3L), warning="Coerced 'double' RHS to 'integer' to match the column's type") +test(1294.2, dt[, a := 1.5]$a, rep(1L, 3L), warning="Coerced double RHS to integer.*column 1 named 'a'.*fractions which have been lost; e.g. item 1 with value 1.5.* truncated to 1.") test(1294.3, dt[, a := NA]$a, rep(NA_integer_, 3L)) test(1294.4, dt[, a := "a"]$a, rep(NA_integer_, 3L), warning=c("NAs introduced by coercion", - "Coerced 'character' RHS to 'integer' to match the column's type.*common pitfall")) -test(1294.5, dt[, a := list(list(1))]$a, rep(1L, 3L), warning="Coerced 'list' RHS to 'integer' to match the column's type") + "Coerced character RHS to integer.*create the RHS as type integer.*with as.integer.. to avoid this warning.*DT., `a`:=as.character.`a`.*")) +test(1294.5, dt[, a := list(list(1))]$a, rep(1L, 3L), warning="Coerced list RHS to integer to match.*column 1 named 'a'") test(1294.6, dt[, a := list(1L)]$a, rep(1L, 3L)) test(1294.7, dt[, a := list(1)]$a, rep(1L, 3L)) -test(1294.8, dt[, a := TRUE]$a, rep(1L, 3L), warning="Coerced 'logical' RHS to 'integer' to match the column's type") +test(1294.8, dt[, a := TRUE]$a, rep(1L, 3L), warning="Coerced logical RHS to integer") test(1294.9, dt[, b := 1L]$b, rep(1,3)) test(1294.10, dt[, b := NA]$b, rep(NA_real_,3)) test(1294.11, dt[, b := "bla"]$b, rep(NA_real_, 3), warning=c("NAs introduced by coercion", - "Coerced 'character' RHS to 'double' to match the column's type.*common pitfall")) -test(1294.12, dt[, b := list(list(1))]$b, rep(1,3), warning="Coerced 'list' RHS to 'double' to match the column's type") -test(1294.13, dt[, b := TRUE]$b, rep(1,3), warning="Coerced 'logical' RHS to 'double' to match the column's type") + "Coerced character RHS to double to match.*column 2 named 'b'.*DT[, `b`:=as.character(`b`)]")) +test(1294.12, dt[, b := list(list(1))]$b, rep(1,3), warning="Coerced list RHS to double") +test(1294.13, dt[, b := TRUE]$b, rep(1,3), warning="Coerced logical RHS to double") test(1294.14, dt[, b := list(1)]$b, rep(1,3)) -test(1294.15, dt[, c := 1]$c, rep(TRUE, 3), warning="Coerced 'double' RHS to 'logical' to match the column's type") -test(1294.16, dt[, c := 1L]$c, rep(TRUE, 3), warning="Coerced 'integer' RHS to 'logical' to match the column's type") +test(1294.15, dt[, c := 1]$c, rep(TRUE, 3), warning="Coerced double RHS to logical") +test(1294.16, dt[, c := 1L]$c, rep(TRUE, 3), warning="Coerced integer RHS to logical") test(1294.17, dt[, c := NA]$c, rep(NA, 3)) -test(1294.18, dt[, c := list(1)]$c, rep(TRUE, 3), warning="Coerced 'double' RHS to 'logical' to match the column's type") -test(1294.19, dt[, c := list(list(1))]$c, rep(TRUE, 3), warning="Coerced 'list' RHS to 'logical' to match the column's type") -test(1294.20, dt[, c := "bla"]$c, rep(NA, 3), warning="Coerced 'character' RHS to 'logical' to match the column's type") -test(1294.21, dt[, d := 1]$d, rep(list(1), 3), warning="Coerced 'double' RHS to 'list' to match the column's type") -test(1294.22, dt[, d := 1L]$d, rep(list(1L), 3), warning="Coerced 'integer' RHS to 'list' to match the column's type") -test(1294.23, dt[, d := TRUE]$d, rep(list(TRUE), 3), warning="Coerced 'logical' RHS to 'list' to match the column's type") -test(1294.24, dt[, d := "bla"]$d, rep(list("bla"), 3), warning="Coerced 'character' RHS to 'list' to match the column's type") +test(1294.18, dt[, c := list(1)]$c, rep(TRUE, 3), warning="Coerced double RHS to logical") +test(1294.19, dt[, c := list(list(1))]$c, rep(TRUE, 3), warning="Coerced list RHS to logical") +test(1294.20, dt[, c := "bla"]$c, rep(NA, 3), warning="Coerced character RHS to logical") +test(1294.21, dt[, d := 1]$d, rep(list(1), 3), warning="Coerced double RHS to list") +test(1294.22, dt[, d := 1L]$d, rep(list(1L), 3), warning="Coerced integer RHS to list") +test(1294.23, dt[, d := TRUE]$d, rep(list(TRUE), 3), warning="Coerced logical RHS to list") +test(1294.24, dt[, d := "bla"]$d, rep(list("bla"), 3), warning="Coerced character RHS to list") test(1294.25, dt[, d := list(list(1))]$d, rep(list(1), 3)) -test(1294.26, dt[, e := 1]$e, rep("1", 3), warning="Coerced 'double' RHS to 'character' to match the column's type") -test(1294.27, dt[, e := 1L]$e, rep("1", 3), warning="Coerced 'integer' RHS to 'character' to match the column's type") -test(1294.28, dt[, e := TRUE]$e, rep("TRUE", 3), warning="Coerced 'logical' RHS to 'character' to match the column's type") -test(1294.29, dt[, e := list(list(1))]$e, rep("1", 3), warning="Coerced 'list' RHS to 'character' to match the column's type") +test(1294.26, dt[, e := 1]$e, rep("1", 3), warning="Coerced double RHS to character") +test(1294.27, dt[, e := 1L]$e, rep("1", 3), warning="Coerced integer RHS to character") +test(1294.28, dt[, e := TRUE]$e, rep("TRUE", 3), warning="Coerced logical RHS to character") +test(1294.29, dt[, e := list(list(1))]$e, rep("1", 3), warning="Coerced list RHS to character") test(1294.30, dt[, e := "bla"]$e, rep("bla", 3)) test(1294.31, dt[, e := list("bla2")]$e, rep("bla2", 3)) @@ -6718,19 +6718,19 @@ test(1513, setkey(as.data.table(X), a), setDT(X, key="a")) # Adding tests for `isReallyReal` x = as.numeric(sample(10)) -test(1514.1, isReallyReal(x), FALSE) +test(1514.1, isReallyReal(x), 0L) x = as.numeric(sample(c(1:5, NA))) -test(1514.2, isReallyReal(x), FALSE) # NAs are handled properly -x = as.numeric(sample(c(1:2, NaN, NA))) -test(1514.3, isReallyReal(x), TRUE) -x = as.numeric(sample(c(1:2, Inf, NA))) -test(1514.4, isReallyReal(x), TRUE) -x = as.numeric(sample(c(1:2, -Inf, NA))) -test(1514.5, isReallyReal(x), TRUE) -x = as.numeric(runif(2)) -test(1514.6, isReallyReal(x), TRUE) +test(1514.2, isReallyReal(x), 0L) # NAs in numeric can be coerced to integer NA without loss +x = c(1:2, NaN, NA) +test(1514.3, isReallyReal(x), 3L) +x = c(1:2, Inf, NA) +test(1514.4, isReallyReal(x), 3L) +x = c(1:2, -Inf, NA) +test(1514.5, isReallyReal(x), 3L) +x = runif(2) +test(1514.6, isReallyReal(x), 1L) x = numeric() -test(1514.7, isReallyReal(x), FALSE) +test(1514.7, isReallyReal(x), 0L) test(1514.8, isReallyReal(9L), error="x must be of type double") # #1091 diff --git a/src/assign.c b/src/assign.c index 6c88a445b7..c463b20768 100644 --- a/src/assign.c +++ b/src/assign.c @@ -281,7 +281,6 @@ SEXP assign(SEXP dt, SEXP rows, SEXP cols, SEXP newcolnames, SEXP values, SEXP v SEXP targetcol, RHS, names, nullint, thisvalue, thisv, targetlevels, newcol, s, colnam, class, tmp, colorder, key, index, a, assignedNames, indexNames; SEXP bindingIsLocked = getAttrib(dt, install(".data.table.locked")); Rboolean verbose = LOGICAL(verb)[0], anytodelete=FALSE, isDataTable=FALSE; - char *s1, *s2, *s3, *s4, *s5; const char *c1, *tc1, *tc2; int *buf, k=0, newKeyLength, indexNo; size_t size; // must be size_t otherwise overflow later in memcpy @@ -593,21 +592,34 @@ SEXP assign(SEXP dt, SEXP rows, SEXP cols, SEXP newcolnames, SEXP values, SEXP v if (isString(targetcol) && isFactor(thisvalue)) { PROTECT(RHS = asCharacterFactor(thisvalue)); protecti++; - if (verbose) Rprintf("Coerced factor to character to match the column's type (coercion is inefficient)\n"); // TO DO: datatable.pedantic would turn this into warning + if (verbose) Rprintf("Coerced factor RHS to character to match the column's type. Avoid this coercion if possible, for efficiency, by creating RHS as type character.\n"); + // TO DO: datatable.pedantic could turn this into warning } else { PROTECT(RHS = coerceVector(thisvalue,TYPEOF(targetcol))); protecti++; + char *s1 = (char *)type2char(TYPEOF(targetcol)); + char *s2 = (char *)type2char(TYPEOF(thisvalue)); // FR #2551, added test for equality between RHS and thisvalue to not provide the warning when length(thisvalue) == 1 - if ( length(thisvalue) == 1 && TYPEOF(RHS) != VECSXP && TYPEOF(thisvalue) != VECSXP && ( - (isReal(thisvalue) && isInteger(targetcol) && REAL(thisvalue)[0] == INTEGER(RHS)[0]) || - (isLogical(thisvalue) && LOGICAL(thisvalue)[0] == NA_LOGICAL) || - (isReal(RHS) && isInteger(thisvalue)) )) { - ; + if ( length(thisvalue)==1 && TYPEOF(RHS)!=VECSXP && TYPEOF(thisvalue)!=VECSXP && ( + ( isReal(thisvalue) && isInteger(targetcol) && REAL(thisvalue)[0]==INTEGER(RHS)[0] ) || // DT[,intCol:=4] rather than DT[,intCol:=4L] + ( isLogical(thisvalue) && LOGICAL(thisvalue)[0] == NA_LOGICAL ) || // DT[,intCol:=NA] + ( isReal(targetcol) && isInteger(thisvalue) ) )) { + if (verbose) Rprintf("Coerced length-1 RHS from %s to %s to match column's type.%s If this assign is happening a lot inside a loop, in particular via set(), then it may be worth avoiding this coercion by using R's type postfix on the value being assigned; e.g. typeof(0) vs typeof(0L), and typeof(NA) vs typeof(NA_integer_) vs typeof(NA_real_).\n", s2, s1, + isInteger(targetcol) && isReal(thisvalue) ? "No precision was lost. " : ""); + // TO DO: datatable.pedantic could turn this into warning + } else { + if (isReal(thisvalue) && isInteger(targetcol)) { + int w = INTEGER(isReallyReal(thisvalue))[0]; // first fraction present (1-based), 0 if none + if (w>0) { + warning("Coerced double RHS to integer to match the type of the target column (column %d named '%s'). One or more RHS values contain fractions which have been lost; e.g. item %d with value %f has been truncated to %d.", + coln+1, CHAR(STRING_ELT(names, coln)), w, REAL(thisvalue)[w-1], INTEGER(RHS)[w-1]); + } else { + warning("Coerced double RHS to integer to match the type of the target column (column %d named '%s'). The RHS values contain no fractions so would be more efficiently created as integer. Consider using R's 'L' postfix (typeof(0L) vs typeof(0)) to create constants as integer and avoid this warning. Wrapping the RHS with as.integer() will avoid this warning too but it's better if possible to create the RHS as integer in the first place so that the cost of the coercion can be avoided.", coln+1, CHAR(STRING_ELT(names, coln))); + } } else { - s1 = (char *)type2char(TYPEOF(targetcol)); - s2 = (char *)type2char(TYPEOF(thisvalue)); - if (isReal(thisvalue)) s3="; may have truncated precision"; else s3=""; - warning("Coerced '%s' RHS to '%s' to match the column's type%s. Either change the target column ['%s'] to '%s' first (by creating a new '%s' vector length %d (nrows of entire table) and assign that; i.e. 'replace' column), or coerce RHS to '%s' (e.g. 1L, NA_[real|integer]_, as.*, etc.) to make your intent clear and for speed. A common source of this error is mismatch of return types when assigning by group, especially in edge cases. Another common pitfall comes from wrong assumptions about your table's column types; check print(x, class = TRUE) or sapply(x, class) to inspect your initial LHS types and be sure they're as expected.", s2, s1, s3, CHAR(STRING_ELT(names, coln)), s2, s2, LENGTH(VECTOR_ELT(dt,0)), s1); + warning("Coerced %s RHS to %s to match the type of the target column (column %d named '%s'). If the target column's type %s is correct, it's best for efficiency to avoid the coercion and create the RHS as type %s. To achieve that consider R's type postfix: typeof(0L) vs typeof(0), and typeof(NA) vs typeof(NA_integer_) vs typeof(NA_real_). You can wrap the RHS with as.%s() to avoid this warning, but that will still perform the coercion. If the target column's type is not correct, it's best to revisit where the DT was created and fix the column type there; e.g., by using colClasses= in fread(). Otherwise, you can change the column type now by plonking a new column (of the desired type) over the top of it; e.g. DT[, `%s`:=as.%s(`%s`)]. If the RHS of := has nrow(DT) elements, then the assignment is called a column plonk and is the way to change a column's type. Column types can be observed with print(x,class=TRUE) and sapply(x,class).", + s2, s1, coln+1, CHAR(STRING_ELT(names, coln)), s1, s1, s1, CHAR(STRING_ELT(names, coln)), s2, CHAR(STRING_ELT(names, coln))); + } } } } @@ -674,7 +686,7 @@ SEXP assign(SEXP dt, SEXP rows, SEXP cols, SEXP newcolnames, SEXP values, SEXP v if (!*tc1) error("Internal error: index name ends with trailing __"); // check the position of the first appearance of an assigned column in the index. // the new index will be truncated to this position. - s4 = (char*) malloc(strlen(c1) + 3); + char *s4 = (char*) malloc(strlen(c1) + 3); if(s4 == NULL){ error("Internal error: Couldn't allocate memory for s4."); } @@ -684,7 +696,7 @@ SEXP assign(SEXP dt, SEXP rows, SEXP cols, SEXP newcolnames, SEXP values, SEXP v newKeyLength = strlen(c1); for(int i = 0; i < xlength(assignedNames); i++){ tc2 = CHAR(STRING_ELT(assignedNames, i)); - s5 = (char*) malloc(strlen(tc2) + 5); //4 * '_' + \0 + char *s5 = (char*) malloc(strlen(tc2) + 5); //4 * '_' + \0 if(s5 == NULL){ free(s4); error("Internal error: Couldn't allocate memory for s5."); diff --git a/src/data.table.h b/src/data.table.h index 1f1fd46a62..955120d690 100644 --- a/src/data.table.h +++ b/src/data.table.h @@ -95,6 +95,7 @@ unsigned long long i64twiddle(void *p, int i, int order); unsigned long long (*twiddle)(void *, int, int); SEXP forder(SEXP DT, SEXP by, SEXP retGrp, SEXP sortStrArg, SEXP orderArg, SEXP naArg); bool need2utf8(SEXP x, int n); +SEXP isReallyReal(SEXP); // reorder.c SEXP reorder(SEXP x, SEXP order); diff --git a/src/forder.c b/src/forder.c index 1dbf762564..de856b5543 100644 --- a/src/forder.c +++ b/src/forder.c @@ -1450,13 +1450,13 @@ SEXP isReallyReal(SEXP x) { if (!isReal(x)) error("x must be of type double."); n = length(x); - ans = PROTECT(allocVector(LGLSXP, 1)); + ans = PROTECT(allocVector(INTSXP, 1)); while (i Date: Fri, 10 Aug 2018 11:59:04 -0700 Subject: [PATCH 4/5] class=>typeof thanks to Frank's catch --- src/assign.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/assign.c b/src/assign.c index c463b20768..ccbf0166e7 100644 --- a/src/assign.c +++ b/src/assign.c @@ -617,7 +617,7 @@ SEXP assign(SEXP dt, SEXP rows, SEXP cols, SEXP newcolnames, SEXP values, SEXP v warning("Coerced double RHS to integer to match the type of the target column (column %d named '%s'). The RHS values contain no fractions so would be more efficiently created as integer. Consider using R's 'L' postfix (typeof(0L) vs typeof(0)) to create constants as integer and avoid this warning. Wrapping the RHS with as.integer() will avoid this warning too but it's better if possible to create the RHS as integer in the first place so that the cost of the coercion can be avoided.", coln+1, CHAR(STRING_ELT(names, coln))); } } else { - warning("Coerced %s RHS to %s to match the type of the target column (column %d named '%s'). If the target column's type %s is correct, it's best for efficiency to avoid the coercion and create the RHS as type %s. To achieve that consider R's type postfix: typeof(0L) vs typeof(0), and typeof(NA) vs typeof(NA_integer_) vs typeof(NA_real_). You can wrap the RHS with as.%s() to avoid this warning, but that will still perform the coercion. If the target column's type is not correct, it's best to revisit where the DT was created and fix the column type there; e.g., by using colClasses= in fread(). Otherwise, you can change the column type now by plonking a new column (of the desired type) over the top of it; e.g. DT[, `%s`:=as.%s(`%s`)]. If the RHS of := has nrow(DT) elements, then the assignment is called a column plonk and is the way to change a column's type. Column types can be observed with print(x,class=TRUE) and sapply(x,class).", + warning("Coerced %s RHS to %s to match the type of the target column (column %d named '%s'). If the target column's type %s is correct, it's best for efficiency to avoid the coercion and create the RHS as type %s. To achieve that consider R's type postfix: typeof(0L) vs typeof(0), and typeof(NA) vs typeof(NA_integer_) vs typeof(NA_real_). You can wrap the RHS with as.%s() to avoid this warning, but that will still perform the coercion. If the target column's type is not correct, it's best to revisit where the DT was created and fix the column type there; e.g., by using colClasses= in fread(). Otherwise, you can change the column type now by plonking a new column (of the desired type) over the top of it; e.g. DT[, `%s`:=as.%s(`%s`)]. If the RHS of := has nrow(DT) elements then the assignment is called a column plonk and is the way to change a column's type. Column types can be observed with sapply(DT,typeof).", s2, s1, coln+1, CHAR(STRING_ELT(names, coln)), s1, s1, s1, CHAR(STRING_ELT(names, coln)), s2, CHAR(STRING_ELT(names, coln))); } } From a221558247e107c682075eb2e5d84df7e8ab969d Mon Sep 17 00:00:00 2001 From: mattdowle Date: Fri, 10 Aug 2018 13:07:24 -0700 Subject: [PATCH 5/5] NEWS item too --- NEWS.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/NEWS.md b/NEWS.md index 3176d36857..31d9e344dd 100644 --- a/NEWS.md +++ b/NEWS.md @@ -20,7 +20,7 @@ the warning message has changed from : ``` to : ``` - Coerced character RHS to integer to match the type of the target column (column 1 named 'id'). If the target column's type integer is correct, it's best for efficiency to avoid the coercion and create the RHS as type integer. To achieve that consider R's type postfix: typeof(0L) vs typeof(0), and typeof(NA) vs typeof(NA_integer_) vs typeof(NA_real_). You can wrap the RHS with as.integer() to avoid this warning, but that will still perform the coercion. If the target column's type is not correct, it's best to revisit where the DT was created and fix the column type there; e.g., by using colClasses= in fread(). Otherwise, you can change the column type now by plonking a new column (of the desired type) over the top of it; e.g. DT[, `id`:=as.character(`id`)]. If the RHS of := has nrow(DT) elements, then the assignment is called a column plonk and is the way to change a column's type. Column types can be observed with print(x,class=TRUE) and sapply(x,class). + Coerced character RHS to integer to match the type of the target column (column 1 named 'id'). If the target column's type integer is correct, it's best for efficiency to avoid the coercion and create the RHS as type integer. To achieve that consider R's type postfix: typeof(0L) vs typeof(0), and typeof(NA) vs typeof(NA_integer_) vs typeof(NA_real_). You can wrap the RHS with as.integer() to avoid this warning, but that will still perform the coercion. If the target column's type is not correct, it's best to revisit where the DT was created and fix the column type there; e.g., by using colClasses= in fread(). Otherwise, you can change the column type now by plonking a new column (of the desired type) over the top of it; e.g. DT[, `id`:=as.character(`id`)]. If the RHS of := has nrow(DT) elements, then the assignment is called a column plonk and is the way to change a column's type. Column types can be observed with sapply(x,typeof). ``` Further, if a coercion from double to integer is performed, fractional data such as 3.14 is now detected and the truncation to 3 is warned about if and only if truncation has occurred.