NEWS.md: 8 changes (4 additions, 4 deletions)
@@ -51,7 +51,7 @@ These options are meant for temporary use to aid your migration, [#2652](https:/
* Detecting whether a very long input string is a file name or data is now much faster, [#2531](https://github.com/Rdatatable/data.table/issues/2531). Many thanks to @javrucebo for the detailed report, benchmarks and suggestions.
* A column of `TRUE/FALSE`s is ok, as well as `True/False`s and `true/false`s, but mixing styles (e.g. `TRUE/false`) is not and will be read as type `character`.
* New argument `index` to parallel the existing `key` argument for applying secondary orderings out of the box for convenience, [#2633](https://github.com/Rdatatable/data.table/issues/2633).
* Many thanks to @yaakovfeldman, Guillermo Ponce, Arun Srinivasan, Hugh Parsonage, Mark Klik, Pasha Stetsenko, Mahyar K, Tom Crockett, @cnoelke, @qinjs, @etienne-s, Mark Danese, Avraham Adler, @franknarf1, @MichaelChirico, @tdhock, Luke Tierney for testing dev and reporting these regressions before release to CRAN: #2070, #2073, #2087, #2091, #2107, #2118, #2092, #1888, #2123, #2167, #2194, #2238, #2228, #1464, #2201, #2287, #2299, #2285, #2251, #2347, #2222, #2352, #2246, #2370, #2371, #2404, #2196, #2322, #2453, #2446, #2464, #2457, #1895, #2481, #2499, #2516, #2520, #2512, #2523, #2542, #2526, #2518, #2515, #1671, #2267, #2561, #2625, #2265, #2548, #2535
* Many thanks to @yaakovfeldman, Guillermo Ponce, Arun Srinivasan, Hugh Parsonage, Mark Klik, Pasha Stetsenko, Mahyar K, Tom Crockett, @cnoelke, @qinjs, @etienne-s, Mark Danese, Avraham Adler, @franknarf1, @MichaelChirico, @tdhock, Luke Tierney for testing dev and reporting these regressions before release to CRAN: #2070, #2073, #2087, #2091, #2107, #2118, #2092, #1888, #2123, #2167, #2194, #2238, #2228, #1464, #2201, #2287, #2299, #2285, #2251, #2347, #2222, #2352, #2246, #2370, #2371, #2404, #2196, #2322, #2453, #2446, #2464, #2457, #1895, #2481, #2499, #2516, #2520, #2512, #2523, #2542, #2526, #2518, #2515, #1671, #2267, #2561, #2625, #2265, #2548, #2535, #2744

2. `fwrite()`:
* empty strings are now always quoted (`,"",`) to distinguish them from `NA` which by default is still empty (`,,`) but can be changed using `na=` as before. If `na=` is provided and `quote=` is the default `'auto'` then `quote=` is set to `TRUE` so that if the `na=` value occurs in the data, it can be distinguished from `NA`. Thanks to Ethan Welty for the request [#2214](https://github.com/Rdatatable/data.table/issues/2214) and Pasha for the code change and tests, [#2215](https://github.com/Rdatatable/data.table/issues/2215).
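The quoting rules in the `fwrite()` item can be illustrated with a small C sketch. It is not `fwrite`'s implementation: `write_field`, `write_row` and `auto_quote_all` are hypothetical names, NA is modelled as a `NULL` string, and a non-empty `na_str` stands in for "`na=` was provided". Embedded quotes are doubled, as with fwrite's default `qmethod="double"`. In `'auto'` mode fwrite also quotes fields that need it (for example ones containing the separator); this sketch leaves that out and only shows the empty-string and `na=` rules.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: appends one field to buf (assumed large enough) and
   returns a pointer just past the bytes written. NA is modelled as s == NULL. */
static char *write_field(char *buf, const char *s, const char *na_str, bool quote_all)
{
    assert(buf != NULL && na_str != NULL);
    if (s == NULL) {                     /* NA: emit the na= string, unquoted */
        size_t n = strlen(na_str);
        memcpy(buf, na_str, n);
        return buf + n;
    }
    if (quote_all || s[0] == '\0') {     /* empty strings are always quoted */
        *buf++ = '"';
        for (const char *p = s; *p; p++) {
            if (*p == '"') *buf++ = '"'; /* double embedded quotes */
            *buf++ = *p;
        }
        *buf++ = '"';
        return buf;
    }
    size_t n = strlen(s);                /* plain field, written as-is */
    memcpy(buf, s, n);
    return buf + n;
}

/* Assumption: quote='auto' plus a non-empty na= string switches to quoting
   every string, so a data value equal to na= stays distinguishable from NA. */
static bool auto_quote_all(const char *na_str)
{
    return na_str[0] != '\0';
}

/* Writes one comma-separated row of string fields into out and NUL-terminates it. */
static void write_row(char *out, const char *const *fields, int n, const char *na_str)
{
    bool quote_all = auto_quote_all(na_str);
    char *p = out;
    for (int i = 0; i < n; i++) {
        if (i > 0) *p++ = ',';
        p = write_field(p, fields[i], na_str, quote_all);
    }
    *p = '\0';
}
```

With `fields = {"a", "", NULL, "NA"}`, the default `na_str = ""` produces `a,"",,NA`, while `na_str = "NA"` produces `"a","",NA,"NA"`, so the literal string `NA` and a missing value remain distinct in both cases.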
@@ -108,7 +108,7 @@ Thanks to @MichaelChirico for reporting and to @MarkusBonsch for the implementat
> When j is a symbol prefixed with `..` it will be looked up in calling scope and its value taken to be column names or numbers.
> When you see the `..` prefix, think one level up, just as the directory `..` means the parent directory in all operating systems.
> In future, the `..` prefix could be made to work on all symbols appearing anywhere inside `DT[...]`.

The response has been positive ([this tweet](https://twitter.com/MattDowle/status/967290562725359617) and [FR#2655](https://github.com/Rdatatable/data.table/issues/2655)), so this prefix has now been extended to all symbols appearing in `j=` as a first step, e.g.:
```R
cols = "colB"
@@ -192,7 +192,7 @@ Thanks to @MichaelChirico for reporting and to @MarkusBonsch for the implementat

34. Fixed cases where the result of `merge.data.table()` would contain duplicate column names if `by.x` was also in `names(y)`.
`merge.data.table()` gains the `no.dups` argument (default TRUE) to match the corresponding patched behaviour in `base:::merge.data.frame()`. Now, when `by.x` is also in `names(y)`, the column name from `y` has the corresponding `suffixes` added to it. `by.x` remains unchanged for backwards compatibility reasons.
In addition, where duplicate column names arise anyway (i.e. `suffixes = c("", "")`) `merge.data.table()` will now throw a warning to match the behaviour of `base:::merge.data.frame()`.
Thanks to @sritchie73 for reporting and fixing [PR#2631](https://github.com/Rdatatable/data.table/pull/2631) and [PR#2653](https://github.com/Rdatatable/data.table/pull/2653).

35. `CJ()` now fails with a proper error message when the result would exceed the maximum integer, [#2636](https://github.com/Rdatatable/data.table/issues/2636).
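The `CJ()` item concerns integer overflow: the number of result rows is the product of the input lengths, which can exceed the largest `int` long before any single input is large. One common guard, sketched here in C with a hypothetical `cj_nrow` helper (not `data.table`'s code), is to test each multiplication against `INT_MAX` before performing it, so the overflow is detected rather than triggered.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: computes the number of rows in the cross join of n
   vectors with the given lengths. Returns false, leaving *nrow untouched,
   when the product would exceed INT_MAX, instead of letting int overflow. */
static bool cj_nrow(const int *lengths, size_t n, int *nrow)
{
    assert(nrow != NULL && (lengths != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        assert(lengths[i] >= 0);
        if (lengths[i] == 0) {      /* any empty input gives an empty result */
            *nrow = 0;
            return true;
        }
    }
    int total = 1;
    for (size_t i = 0; i < n; i++) {
        /* for positive ints, total * len > INT_MAX exactly when total > INT_MAX / len */
        if (total > INT_MAX / lengths[i]) return false;
        total *= lengths[i];
    }
    *nrow = total;
    return true;
}
```

For lengths `{65536, 65536}` the product is 2^32, so `cj_nrow` returns `false`; that is the point where a caller would raise an error message instead of attempting the allocation.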
@@ -203,7 +203,7 @@ Thanks to @sritchie73 for reporting and fixing [PR#2631](https://github.com/Rdat

38. Fixed a bug on Windows where `data.table` could break if garbage collection was triggered while sorting a large number of non-ASCII characters. Thanks to @shrektan for reporting and fixing [PR#2678](https://github.com/Rdatatable/data.table/pull/2678), [#2674](https://github.com/Rdatatable/data.table/issues/2674).

39. Internal aliasing of `.` to `list` was over-aggressive in applying `list` even when `.` was intended within `bquote`, [#1912](https://github.com/Rdatatable/data.table/issues/1912). Thanks @MichaelChirico for reporting/filing and @ecoRoland for suggesting and testing a fix.

#### NOTES

inst/tests/tests.Rraw: 7 changes (7 additions, 0 deletions)
@@ -11631,6 +11631,13 @@ DT = data.table(x = 1:5, y = 6:10)
test(1901.1, DT[, bquote(z==.(sum(x)))], bquote(z==.(DT[, sum(x)])))
test(1901.2, DT[, .(.(bquote(z==.(sd(x-y)))))], data.table(V1=list(bquote(z==.(DT[, sd(x-y)])))))

# check quote rule detection logic, #2744
src = '"C\\\\D"\nAB\\x20CD\\n\n"\\"one\\", \\\'two\\\', three"\n"\\r\\t\\v\\a\\b\\071\\uABCD"\n'
test(1902, fread(src, verbose=TRUE),
data.table("C\\\\D"=c("AB\\x20CD\\n", "\\\"one\\\", \\'two\\', three", "\\r\\t\\v\\a\\b\\071\\uABCD")),
output="Quote rule picked = 1")


###################################
# Add new tests above this line #
###################################
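The new test feeds `fread` a single-column input whose fields contain quotes escaped with a backslash (for example `"\"one\", \'two\', three"`) and expects quote rule 1 to be picked. As an illustration only, and assuming that quote rule 0 means an embedded quote is doubled (`""`) while quote rule 1 means it is backslash-escaped (`\"`), the C sketch below finds the closing quote of a quoted field under each rule. `field_end` is a hypothetical helper, not fread's parser.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: ch points at the opening quote of a field. Returns a
   pointer to the matching closing quote, or NULL if the field is unterminated.
   Assumed rules: 0 = an embedded quote is written twice (""),
                  1 = an embedded quote is preceded by a backslash (\"). */
static const char *field_end(const char *ch, int quote_rule)
{
    assert(ch != NULL && *ch == '"');
    ch++;                                    /* skip the opening quote */
    while (*ch) {
        if (quote_rule == 1 && *ch == '\\' && ch[1] != '\0') {
            ch += 2;                         /* skip the escaped character */
            continue;
        }
        if (*ch == '"') {
            if (quote_rule == 0 && ch[1] == '"') {
                ch += 2;                     /* doubled quote stays inside the field */
                continue;
            }
            return ch;                       /* closing quote */
        }
        ch++;
    }
    return NULL;
}
```

On `"\"one\", two"`, rule 1 reaches the final quote, while rule 0 stops at the quote right after the first backslash and leaves `one\", two"` unparsed. A mismatch of that kind is what allows a detector to rule out the doubled-quote interpretation for this input.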
src/fread.c: 7 changes (5 additions, 2 deletions)
@@ -1440,7 +1440,7 @@ int freadMain(freadMainArgs _args) {
int topNumLines=0; // the largest number of lines with the same number of fields so far
int topNumFields=1; // how many fields that was, to resolve ties
char topSep=127; // which sep that was, by default 127 (ascii del) means no sep i.e. single-column input (1 field)
int topQuoteRule=0; // which quote rule that was
int topQuoteRule=-1; // which quote rule that was
int topNmax=1; // for that sep and quote rule, what was the max number of columns (just for fill=true)
// (when fill=true, the max is usually the header row and is the longest but there are more
// lines of fewer)
@@ -1479,6 +1479,8 @@ int freadMain(freadMainArgs _args) {
}
if (numFields[0]==-1) continue;
if (firstJumpEnd==NULL) firstJumpEnd=ch; // if this wins (doesn't get updated), it'll be single column input
// Even if numFields[i]==1 for all sep/QR combos, we still want to know which quote rule was able to parse the input correctly
if (topQuoteRule<0) topQuoteRule = quoteRule;
bool updated=false;
int nmax=0;

@@ -1511,6 +1513,7 @@ int freadMain(freadMainArgs _args) {
}
}
if (!firstJumpEnd) STOP("Internal error: no sep won");
if (topQuoteRule < 0) STOP("Quote rule never updated");
quoteRule = topQuoteRule;
sep = topSep;
whiteChar = (sep==' ' ? '\t' : (sep=='\t' ? ' ' : 0));
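These fread.c hunks replace the initial `topQuoteRule=0` with a `-1` sentinel, record the first quote rule that parses the sample at all, and stop if no rule ever did. The C sketch below isolates that idea. It is a simplification, not fread's detection code: the `Candidate` struct and `pick_quote_rule` are hypothetical, and "better" is reduced to "more fields", whereas `freadMain` also weighs how many lines agree on the field count (`topNumLines`) and tracks the winning separator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-candidate result: the number of fields found when the
   sample is parsed with this quote rule, or -1 if the rule failed to parse it. */
typedef struct {
    int quote_rule;
    int num_fields;
} Candidate;

/* Returns the chosen quote rule, or -1 if no candidate parsed the sample
   (the real code stops with an error in that case). */
static int pick_quote_rule(const Candidate *c, int n)
{
    assert(c != NULL || n == 0);
    int top_quote_rule = -1;   /* sentinel: nothing chosen yet (0 before the fix) */
    int top_num_fields = 1;    /* single-column input unless something beats it */
    for (int i = 0; i < n; i++) {
        if (c[i].num_fields == -1) continue;          /* this rule failed */
        if (top_quote_rule < 0)                       /* first rule that parsed */
            top_quote_rule = c[i].quote_rule;
        if (c[i].num_fields > top_num_fields) {       /* strictly better candidate */
            top_num_fields = c[i].num_fields;
            top_quote_rule = c[i].quote_rule;
        }
    }
    return top_quote_rule;
}
```

When every rule that succeeds yields a single field and rule 0 fails, an initial value of 0 would never be overwritten, so the failing rule would be kept; with the sentinel, rule 1 is chosen, consistent with the `Quote rule picked = 1` output the new test expects.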
@@ -2203,7 +2206,7 @@ int freadMain(freadMainArgs _args) {
myNrow = 0; // discard my buffer
}
else if (headPos!=thisJumpStart) {
snprintf(internalErr, internalErrSize, "Internal error: invalid head position. jump=%d, headPos=%p, thisJumpStart=%p, sof=%p", jump, headPos, thisJumpStart, sof);
snprintf(internalErr, internalErrSize, "Internal error: invalid head position. jump=%d, headPos=%p, thisJumpStart=%p, sof=%p", jump, (void*)headPos, (void*)thisJumpStart, (void*)sof);
stopTeam = true;
}
else {
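The last hunk adds `(void*)` casts to the pointer arguments of the internal-error `snprintf`. The C standard specifies that the `%p` conversion takes a pointer to `void`, so passing character pointers without a cast relies on behaviour the standard does not guarantee, and compilers with pedantic format checking may warn about it. A minimal standalone illustration follows; `format_head_error` is a hypothetical wrapper, not a function in fread.c.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the internal-error message: %p is specified to
   take a void * argument, so each character pointer is cast explicitly. */
static int format_head_error(char *buf, size_t size, int jump,
                             const char *headPos, const char *thisJumpStart,
                             const char *sof)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size,
                    "Internal error: invalid head position. jump=%d, headPos=%p, thisJumpStart=%p, sof=%p",
                    jump, (void *)headPos, (void *)thisJumpStart, (void *)sof);
}
```

Because `snprintf` writes at most `size` bytes including the terminating NUL and returns the length the full message would have had, comparing the return value with the buffer size is enough to detect truncation.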