Allow 'type.convert' argument in tstrsplit function to support a function/list of functions/named list.#5099

Merged
mattdowle merged 12 commits into Rdatatable:master from Kamgang-B:master
Aug 13, 2021

@Kamgang-B
Contributor

@Kamgang-B Kamgang-B commented Aug 13, 2021

Closes #5094.
In addition to the current behavior of tstrsplit, I suggest extending it by allowing its 'type.convert' argument
to accept a function, a list of functions, or a named list whose names are function names and whose values are the indices of the elements to convert.
In a named list, the last element may be left unnamed if it is a function.

New features suggested (each item below is a possible value of the type.convert argument of tstrsplit):
1. type.convert=fun: the function fun is applied to each element of the transposed list.
2. type.convert=list(fun): same behavior as 1.
3. type.convert=list(fun_1, fun_2, ..., fun_n), where the transposed list has length n: fun_1 is applied to the first element, fun_2 to the second element, and so on.
4. type.convert=list(fun_1=c(1L, 3L)): fun_1 is applied to the first and third elements; the remaining elements are kept as is, that is, as character vectors.
5. type.convert=list(fun_1=c(1L, 3L), fun_2=4:5): fun_1 is applied to the first and third elements and fun_2 to the fourth and fifth elements; the remaining elements are kept as is.
6. type.convert=list(fun_1=c(1L, 3L), fun_2=4:5, fun_3): same as 5, except that fun_3 is applied to the remaining elements, so they are not kept as is.
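The six cases above can be read as a mapping from the specification to one converter per output column. Below is a minimal base-R sketch of that mapping. The helper name `resolve_converters` is hypothetical and is not the code in this PR; it omits the validation and warnings shown in the examples further down.

```r
# Hypothetical helper (an illustration only, not data.table internals):
# turn a type.convert specification into a list of n converter functions.
resolve_converters <- function(spec, n) {
  if (is.function(spec)) spec <- list(spec)      # case 1 reduces to case 2
  nms <- names(spec)
  if (is.null(nms)) {                            # cases 2 and 3: unnamed list
    if (length(spec) == 1L) return(rep(spec, n))
    if (length(spec) != n) stop("need one function per output column")
    return(spec)
  }
  out  <- rep(list(identity), n)                 # default: keep element as is
  done <- logical(n)
  for (i in which(nms != "")) {                  # cases 4 and 5: name = function, value = indices
    idx <- spec[[i]]
    out[idx]  <- list(match.fun(nms[i]))
    done[idx] <- TRUE
  }
  unnamed <- which(nms == "")
  if (length(unnamed)) {                         # case 6: unnamed function for the rest
    out[!done] <- list(spec[[unnamed[1L]]])
  }
  out
}

# Usage: an already transposed list of character vectors
parts <- list(c("Yes", "No"), c("1", "5"), c("2", "3.5"))
fns   <- resolve_converters(list(as.factor = 1L, as.numeric), 3L)
res   <- Map(function(f, x) f(x), fns, parts)
vapply(res, function(v) class(v)[1L], character(1))
# "factor" "numeric" "numeric"
```

Function names are looked up with `match.fun`, which is why the names of the list must be names of existing functions in this sketch.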

Any behavior other than those listed above is probably unintended.

Any criticism is welcome.

I can also extend or reduce the features above if needed.

To show how these new features work in practice, here are some illustrative examples:

library(data.table)
options(datatable.print.class=TRUE)

DT = data.table(
  w = c("Yes/F", "No/M"),
  x = c("Yes 2000-03-01 A/T", "No 2000-04-01 E/R"),
  y = c("1/1/2", "2/5/2.5"),
  z = c("Yes/1/2", "No/5/3.5"),
  v = c("Yes 10 30.5 2000-03-01 A/T", "No 20 10.2 2000-04-01 E/R"))

#         w                    x           y          z                            v
#    <char>               <char>      <char>     <char>                       <char>
# 1:  Yes/F   Yes 2000-03-01 A/T       1/1/2    Yes/1/2   Yes 10 30.5 2000-03-01 A/T
# 2:   No/M    No 2000-04-01 E/R     2/5/2.5   No/5/3.5    No 20 10.2 2000-04-01 E/R

# convert each element in the transposed list to type numeric
DT[, tstrsplit(y, "/", type.convert=as.numeric)] 
#       V1    V2    V3
#    <num> <num> <num>
# 1:     1     1   2.0
# 2:     2     5   2.5

# convert each element in the transposed list to type factor
DT[, tstrsplit(w, "/", type.convert=as.factor)]
#        V1     V2
#    <fctr> <fctr>
# 1:    Yes      F
# 2:     No      M
DT[, tstrsplit(w, "/", type.convert=list(as.factor))]  # same

# can 'also' convert all character vectors to factors
DT[, tstrsplit(z, "/", type.convert=function(x) type.convert(x, as.is=FALSE))]
#        V1    V2    V3
#    <fctr> <int> <num>
# 1:    Yes     1   2.0
# 2:     No     5   3.5

# convert some elements of the transposed list and leave the rest (if any) unchanged
DT[, tstrsplit(z, "/", type.convert=list(as.numeric=2:3))]
#        V1    V2    V3
#    <char> <num> <num>
# 1:    Yes     1   2.0
# 2:     No     5   3.5

# convert each element with the function at the matching position (by position, then by named index)
DT[, tstrsplit(z, "/", type.convert=list(as.factor, as.integer, as.numeric))]          
#        V1    V2    V3
#    <fctr> <int> <num>
# 1:    Yes     1   2.0
# 2:     No     5   3.5

DT[, tstrsplit(z, "/", type.convert=list(as.factor=1L, as.integer=2L, as.numeric=3L))]
#        V1    V2    V3
#    <fctr> <int> <num>
# 1:    Yes     1   2.0
# 2:     No     5   3.5

# convert some elements to the corresponding specified types and the remaining elements to a given type
DT[, tstrsplit(z, "/", type.convert=list(as.factor=1L, as.numeric))]
#        V1    V2    V3
#    <fctr> <num> <num>
# 1:    Yes     1   2.0
# 2:     No     5   3.5

# convert given elements to specific types and convert the remaining elements using 'type.convert' function.
DT[, tstrsplit(v, " ", type.convert=list(as.factor=1L, as.IDate=4L, function(x) type.convert(x, as.is=TRUE)))]
#        V1    V2    V3         V4     V5
#    <fctr> <int> <num>     <IDat> <char>
# 1:    Yes    10  30.5 2000-03-01    A/T
# 2:     No    20  10.2 2000-04-01    E/R

DT[, tstrsplit(w, "/", type.convert=list(as.factor=1:2, as.numeric))]
#        V1     V2
#    <fctr> <fctr>
# 1:    Yes      F
# 2:     No      M
# Warning message:
# In the argument 'type.convert', 'as.numeric' was ignored because all elements in the transpose list or elements corrisponding to indices specified in the 'keep' argument have already been converted. 

# errors:
DT[, tstrsplit(z, "/", type.convert=list(as.factor, as.numeric))]      # type.convert has length 2 but the output has length 3
DT[, tstrsplit(z, "/", type.convert=list(as.integer=2L), keep=5L)]     # index given in type.convert (2L) is not among the indices in keep
DT[, tstrsplit(w, "/", keep=integer())]                                # keep is empty (current tstrsplit behavior: returns an empty list with warnings)
DT[, tstrsplit(w, "/", type.convert=list())]                           # empty list is not supported
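
For comparison, the `list(as.factor=1L, as.numeric)` example above can be reproduced by hand without the new argument. This is a base-R sketch only: it assumes every string splits into the same number of pieces, uses `strsplit` rather than `tstrsplit`, and does not reflect how the PR implements the conversion.

```r
# Manual equivalent of type.convert=list(as.factor=1L, as.numeric) on column z.
z <- c("Yes/1/2", "No/5/3.5")

# Split, then transpose: one character vector per position
pieces <- strsplit(z, "/", fixed = TRUE)
parts  <- lapply(seq_len(3L), function(j) vapply(pieces, `[`, character(1), j))

# Convert the first element to factor and the remaining ones to numeric
parts[[1]] <- as.factor(parts[[1]])
parts[-1]  <- lapply(parts[-1], as.numeric)
names(parts) <- paste0("V", seq_along(parts))

str(parts)
```

The named-list form expresses the same two conversion steps in a single argument.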

@mattdowle mattdowle added this to the 1.14.1 milestone Aug 13, 2021
@mattdowle
Member

mattdowle commented Aug 13, 2021

LGTM. Great PR. Thanks.
Have invited you to be a project member so, among other things, you can now create branches in the main project rather than in a fork. The invite should be a button you need to click to accept in your GitHub profile or projects page.
If you'd like your surname to be in DESCRIPTION (displayed on CRAN page here: https://cran.r-project.org/web/packages/data.table/index.html) instead of just "B" please let me know what it is.

@mattdowle mattdowle merged commit 94a1247 into Rdatatable:master Aug 13, 2021
@Kamgang-B
Contributor Author

Hi @mattdowle
Yes, I prefer my full name (if possible).
name: Boniface Christian
Family name: Kamgang
person("Boniface Christian", "Kamgang")

mattdowle added a commit that referenced this pull request Aug 14, 2021
mattdowle added a commit that referenced this pull request Aug 14, 2021
@jangorecki jangorecki modified the milestones: 1.14.9, 1.15.0 Oct 29, 2023
Development

Successfully merging this pull request may close these issues.

Allow the argument type.convert of tstrsplit to accept a named list.

3 participants