[BUG] dcast.data.table leads to inconsistent keying with NA

Hello all, 

this is my very first post on Github and I want to thank the creators of data.table for their amazing job!

I currently use data.table to deal with huge datasets (50 datasets of 15 millions rows / dozens of columns each) and I noticed a strange behavior of dcast.data.table  which should likely be considered as a bug (although it could just be a misunderstanding of data.table’s internals).

Many thanks to those who will take the time to understand and address my problem, 

Galileo

P.S : I looked for a similar issue on SO and Github, but did not encountered it, so I decided to open a new issue. please let me know if a similar bug has already been reported.

P.S.2 : The data.table version I use is the  "data.table_1.10.4" which seems to be [the latest on CRAN](https://github.com/Rdatatable/data.table/wiki/Installation).

-------------------------------------------------------------------------------
### Bug description
dcast data.table marks as « keyed » the cast data.table, while it handles NA differently than setkey. This leads to inconsistent results when further merging. 

To my understanding, and related to [this SO answer]( https://stackoverflow.com/questions/30384431/how-does-data-table-sort-na-values-on-key-columns), setkey considers NA as large negative integer and sorts them consistently with `base:: sort(x,na.last=FALSE)`. This seems to be the desired behaviour of data.table
dcast.data.table seems to not sort the NA.

### Minimal Reproducible Example
First, I create the dataset :
```
# first create a toy dataset
toy_data = structure(list(id = structure(c(1L, 1L, 1L, 1L, NA, NA),
                                         .Label = "123456", class = "factor"),
                          factor_var = structure(c(1L, 1L, 1L, 1L, 1L, 1L),
                                                 .Label = "U", class = "factor"),
                          num_var1 = c(0,300, 600, 500, 0, 800),
                          num_var2 = c(0,15, 50, 30, 0, 50)),
                     .Names = c("id", "factor_var", "num_var1", "num_var2"),
                     row.names = c(NA, -6L),
                     class = c("data.table","data.frame"))

#      id factor_var num_var1 num_var2
# 1: 123456          U        0        0
# 2: 123456          U      300       15
# 3: 123456          U      600       50
# 4: 123456          U      500       30
# 5:    NA          U        0        0
# 6:    NA          U      800       50

#REMARK1 : The dataset is NOT KEYED!
key(toy_data_agg )
#REMARK2 : The dataset the factor has only one level here, but not in my real data so It is very unlikely that it is the cause of the error
```
I want to aggregate some values by id, since it is the only relevant information :
```
# Aggregate by id
toy_data_agg = toy_data[,.(num_var1 = sum(num_var1),
                           num_var2 = sum(num_var2)),
                        by=.(id,factor_var)]   
#      id factor_var num_var1 num_var2
# 1: 123456          U     1400       95
# 2:    NA          U      800       50

#REMARK1 : The dataset is NOT KEYED!
key(toy_data_agg )
# REMARK2 : NA appears in last !
```
I want to cast the data.table to turn rows as columns (in the real dataset, I have much more variables and levels in factor_var) :
```
# cast by factor variable
toy_data_cast = dcast.data.table(data = toy_data_agg,
                                 formula = id ~ factor_var,
                                 value.var = c("num_var1","num_var2"))
#      id num_var1_U num_var2_U
# 1: 123456       1400         95
# 2:    NA        800         50
#REMARK1 : The dataset is now KEYED, without any warning!
key(toy_data_cast)
# REMARK2 : NA appears in LAST! THIS SEEMS INCONSISTENT WITH KEYING NA? SEE BELOW.
```
This is where the bug lies : the dataset is marked as « keyed » while this is not coherent with setkey.
```
setkey(toy_data_cast,id) 
# Warning message:
# In setkeyv(x, cols, verbose = verbose, physical = physical) :
# Already keyed by this key but had invalid row order, key rebuilt. If you didn't go under the hood 
# please let datatable-help know so the root cause can be fixed.

toy_data_cast
#      id num_var1_U num_var2_U
# 1:    NA        800         50
# 2: 123456       1400         95
# REMARK : NA appears in FIRST!
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] dcast.data.table leads to inconsistent keying with NA #2202

Bug description

Minimal Reproducible Example

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[BUG] dcast.data.table leads to inconsistent keying with NA #2202

Description

Bug description

Minimal Reproducible Example

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions