-
Notifications
You must be signed in to change notification settings - Fork 1k
Speedup of unique.data.table #2474
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
4331dd9
5735c09
96dc495
4f2035f
aacf2b9
f9b9619
84c44d8
0178d4b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -134,10 +134,9 @@ setreordervec <- function(x, order) .Call(Creorder, x, order) | |
| # Maybe just a grep through *.R for use of these function internally would be better (TO DO). | ||
|
|
||
| # Don't use base::is.unsorted internally, because : | ||
| # 1) it returns NA if any(is.na(.)) where NAs are detected at R level, inefficiently | ||
| # 2) it uses locale whereas in data.table we control locale sorting independently (C locale currently, but | ||
| # 1) it uses locale whereas in data.table we control locale sorting independently (C locale currently, but | ||
| # "sorted" attribute will need an extra attribute "locale" so we can check if key's locale is the current locale) | ||
| # 3) wrapper needed, used to be : | ||
| # 2) wrapper needed, used to be : | ||
| # identical(FALSE,is.unsorted(x)) && !(length(x)==1 && is.na(x)) | ||
| # where the && was needed to maintain backwards compatibility after r-devel's change of is.unsorted(NA) to FALSE (was NA) [May 2013]. | ||
| # The others (order, sort.int etc) are turned off to protect ourselves from using them internally, for speed and for | ||
|
|
@@ -177,7 +176,14 @@ forderv <- function(x, by=seq_along(x), retGrp=FALSE, sort=TRUE, order=1L, na.la | |
| } else { | ||
| if (!length(x)) return(integer(0)) # to be consistent with base::order. this'll make sure forderv(NULL) will result in error | ||
| # (as base does) but forderv(data.table(NULL)) and forderv(list()) will return integer(0)) | ||
| if (is.character(by)) by=chmatch(by, names(x)) | ||
| if (is.character(by)) { | ||
| w = chmatch(by, names(x)) | ||
| if (anyNA(w)) stop("'by' contains '",by[is.na(w)][1],"' which is not a column name") | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. heads up that
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Here changing to
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Comment for completeness ... yep good point, we're now >= R 3.1 for other reasons, so can use |
||
| by = w | ||
| } | ||
| else if (typeof(by)=="double" && isReallyReal(by)) { | ||
| stop("'by' is type 'double' but one or more items in it are not whole integers") | ||
| } | ||
| by = as.integer(by) | ||
| if ( (length(order) != 1L && length(order) != length(by)) || any(!order %in% c(1L, -1L)) ) | ||
| stop("x is a list, length(order) must be either =1 or =length(by) and each value should be 1 or -1 for each column in 'by', corresponding to ascending or descending order, respectively. If length(order) == 1, it will be recycled to length(by).") | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is removing index now? Won't we end up with corrupted index here?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jangorecki hmm can't say I know... @mattdowle I think this came from your commit: aacf2b9
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK it will make index corrupted. Once #1762 will be solved we don't need to care about that anymore. Could you ensure there is a unit test for this index corruption now? @MichaelChirico update: probably solved already by Matt, details in mentioned issue.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jangorecki Yes, as you noted in #1762 the index is removed inside CsubsetDT. It's better to remove it in a central place at C level as close to where the update happens, rather than having to remember to clear the index each time we call CsubsetDT. It contains the comment "// clear any index that was copied over by copyMostAttrib() above, e.g. #1760 and #1734 (test 1678)"