topn for efficiently doing sorted head/tail #3804

Inspired by Matt's observation here: https://github.com/Rdatatable/data.table/pull/3604/commits/e1ac66373f7f3d20dfa0299f2accddc1b40a0e5d

`DT[topn(score, 5L)]` also looks nicer than `DT[order(score)[1:5]]` or `DT[order(score)][1:5]`.
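
For reference, here is a small sketch of the two existing idioms mentioned above, showing that they return the same rows. The table `DT` and its columns `id` and `score` are made up for illustration, and the scores are distinct so ties do not matter.

```r
library(data.table)

set.seed(1)
# Hypothetical example data: 20 rows with distinct scores
DT <- data.table(id = 1:20, score = sample(100L, 20L))

# Idiom 1: compute the full ordering, keep the first 5 row indices, subset once
a <- DT[order(score)[1:5]]

# Idiom 2: build the fully sorted table, then take its first 5 rows
b <- DT[order(score)][1:5]

# Both give the 5 lowest-scoring rows in ascending order of score
print(a)
```

Both forms order all of `score` even though only 5 rows are needed, which is the work a dedicated `topn` would aim to avoid.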

A quick search suggests two possible implementations, each of which may be the better choice in some situations:

https://stackoverflow.com/questions/4956593/optimal-algorithm-for-returning-top-k-values-from-an-array-of-length-n

I'll look into whether this can be implemented cleanly.
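
To make the idea concrete, here is a rough base-R sketch of one selection-based approach: find the n-th smallest value with a partial sort, then fully order only the candidates at or below that cutoff. `topn_sketch` is a hypothetical name, not a data.table function, and the sketch assumes a numeric vector with no `NA`s. It is not necessarily how a real implementation would work.

```r
# Hypothetical helper: indices of the n smallest (or largest) values of x,
# returned in sorted order. Assumes x is numeric with no NAs.
topn_sketch <- function(x, n, decreasing = FALSE) {
  n <- min(n, length(x))
  if (n <= 0L) return(integer(0))
  key <- if (decreasing) -x else x
  # Partial sort: only position n is guaranteed to hold its final value
  cutoff <- sort.int(key, partial = n)[n]
  # Candidates at or below the cutoff (more than n only if there are ties)
  cand <- which(key <= cutoff)
  # Fully order just the candidates, then keep the first n
  cand <- cand[order(key[cand])]
  cand[seq_len(n)]
}

x <- c(5, 3, 9, 1, 3, 7)
topn_sketch(x, 3L)                     # 4 2 5, same as order(x)[1:3]
topn_sketch(x, 2L, decreasing = TRUE)  # 3 6, same as order(-x)[1:2]
```

The full `order()` call then only touches the candidates rather than the whole vector, which is where the saving would come from when n is much smaller than `length(x)`.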

I just checked, and `dplyr` also has `top_n`, but it appears to be implemented inefficiently, since it ranks the entire vector:

```
dplyr:::top_n_rank
function (n, wt) 
{
    if (n > 0) {
        min_rank(desc(wt)) <= n
    }
    else {
        min_rank(wt) <= abs(n)
    }
}
```
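
As a rough base-R illustration of what the function above does, assuming `min_rank(desc(wt))` behaves like `rank(-wt, ties.method = "min")` for a numeric, `NA`-free vector: every element gets ranked, and the result is a logical mask rather than sorted indices. The vector `wt` here is made up for the example.

```r
# Hypothetical weights with a tie at the top
wt <- c(10, 40, 20, 40, 30)

# Rank every element (largest first), as the rank-based approach does
r <- rank(-wt, ties.method = "min")

# Keep elements whose rank is within the top n
mask_top2 <- r <= 2
mask_top1 <- r <= 1

which(mask_top2)  # positions 2 and 4
which(mask_top1)  # also positions 2 and 4: ties can return more than n
```

Besides ranking the whole vector, this keeps rows in their original order, so a sorted head/tail would still need a separate ordering step.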
