Adding plike by KyleHaynes · Pull Request #4129 · Rdatatable/data.table

KyleHaynes · 2019-12-18T22:01:40Z

Added %plike%.
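For readers unfamiliar with the idea, the following is a minimal base-R sketch of an infix operator that maps to `perl = TRUE`. The name `%plike_sketch%` is hypothetical and this is not the PR's actual implementation, which goes through data.table's `like()`; the sketch calls `grepl()` directly so it runs without data.table installed.

```r
# Hypothetical infix, for illustration only (not data.table's %plike%).
# It forwards to base grepl() with perl = TRUE so PCRE syntax is available.
`%plike_sketch%` <- function(vector, pattern) {
  grepl(pattern, as.character(vector), perl = TRUE)
}

x <- c("foo_bar", "foobar", "bar_foo")

# A lookahead is PCRE-only syntax: match "foo" only when followed by "_".
hits <- x %plike_sketch% "foo(?=_)"
print(x[hits])
```

The result is a logical vector, so it can be combined with other filter conditions in the same way as `%like%`.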

codecov · 2019-12-18T22:34:45Z

Codecov Report

Merging #4129 (2099862) into master (2791043) will increase coverage by 0.13%.
The diff coverage is 100.00%.

❗ Current head 2099862 differs from pull request most recent head c476ec7. Consider uploading reports for the commit c476ec7 to get more accurate results

@@            Coverage Diff             @@
##           master    #4129      +/-   ##
==========================================
+ Coverage   99.47%   99.60%   +0.13%     
==========================================
  Files          75       72       -3     
  Lines       14808    13918     -890     
==========================================
- Hits        14730    13863     -867     
+ Misses         78       55      -23

| Impacted Files | Coverage Δ | |
|---|---|---|
| R/like.R | `100.00% <100.00%> (ø)` | |
| src/fmelt.c | `99.00% <0.00%> (-1.00%)` | ⬇️ |
| src/ijoin.c | `95.29% <0.00%> (-0.18%)` | ⬇️ |
| src/fsort.c | `95.83% <0.00%> (-0.10%)` | ⬇️ |
| src/cj.c | `100.00% <0.00%> (ø)` | |
| R/fcast.R | `100.00% <0.00%> (ø)` | |
| R/fmelt.R | `100.00% <0.00%> (ø)` | |
| R/frank.R | `100.00% <0.00%> (ø)` | |
| R/fread.R | `100.00% <0.00%> (ø)` | |

... and 48 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 60a4553...c476ec7.

MichaelChirico · 2019-12-19T00:48:08Z


 5. `nafill` and `setnafill` gain `nan` argument to say whether `NaN` should be considered the same as `NA` for filling purposes, [#4020](https://github.com/Rdatatable/data.table/issues/4020). Prior versions had an implicit value of `nan=NaN`; the default is now `nan=NA`, i.e., `NaN` is treated as if it's missing. Thanks @AnonymousBoba for the suggestion. Also, while `nafill` still respects `getOption('datatable.verbose')`, the `verbose` argument has been removed.

+6. New convenience function `%plike%` which map the existing `like()` argument `perl`, [#3702](https://github.com/Rdatatable/data.table/issues/3702). `%plike%` uses Perl-compatible regular expression (PCRE) which extends on TRE and is more efficient. Thanks @KyleHaynes for the suggestion and PR.


If we're going to say "more efficient", it might help to either point to someone else's benchmark or add our own. I'm not sure it's widely known that perl=TRUE is faster (I certainly didn't).

PS I prefer to disambiguate -- memory efficient, or computationally efficient?
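One way to check the efficiency claim locally is a rough timing comparison like the sketch below. The test data and the pattern are assumptions chosen only for illustration. Timings vary with the pattern, the input, and the R build, so the sketch prints whatever it measures and makes no claim about which engine wins.

```r
# Illustrative timing only; no expected numbers are implied.
set.seed(1)

# Hypothetical test data: 10,000 random 10-letter strings.
x <- vapply(
  seq_len(1e4),
  function(i) paste(sample(letters, 10, replace = TRUE), collapse = ""),
  character(1)
)
pattern <- "ab+c"

# Default engine (TRE) versus PCRE via perl = TRUE.
t_tre  <- system.time(r_tre  <- grepl(pattern, x))
t_pcre <- system.time(r_pcre <- grepl(pattern, x, perl = TRUE))

# Both engines should agree on the matches for this simple pattern.
stopifnot(identical(r_tre, r_pcre))

# Elapsed seconds for each engine on this machine.
print(c(TRE = t_tre[["elapsed"]], PCRE = t_pcre[["elapsed"]]))
```

A single `system.time()` call is noisy. For a claim that ends up in NEWS, repeated runs over several patterns and input sizes would be more convincing.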

MichaelChirico · 2019-12-19T00:51:48Z

 # Don't use * or % like SQL's like.  Uses regexpr syntax - more powerful.
 # returns 'logical' so can be combined with other where clauses.
-like = function(vector, pattern, ignore.case = FALSE, fixed = FALSE) {
+like = function(vector, pattern, ignore.case = FALSE, fixed = FALSE, perl = FALSE) {


I don't think it's common enough to need another infix for it, but we might as well add useBytes as an argument too? Comes up occasionally when working with messy strings.

Or maybe even just change the signature to function(vector, pattern, ...) and pass it on, although grepl is more limited than grep...
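The pass-through idea could look roughly like the sketch below. The function name `like_dots` is hypothetical and the body is a simplification, not data.table's actual `like()`. It only shows how `...` would forward `ignore.case`, `fixed`, `perl`, and `useBytes` to `grepl()` without listing them in the signature.

```r
# Hypothetical name; a simplified stand-in for like(), not the real function.
like_dots <- function(vector, pattern, ...) {
  # Convert to character so factors are matched on their labels,
  # then forward any extra arguments to grepl() unchanged.
  grepl(pattern, as.character(vector), ...)
}

x <- c("apple", "Banana", "cherry")

r_plain <- like_dots(x, "an")                        # plain regex
r_icase <- like_dots(x, "^b", ignore.case = TRUE)    # forwarded to grepl()
r_perl  <- like_dots(x, "(?<=ch)erry", perl = TRUE)  # lookbehind needs PCRE
r_fixed <- like_dots(x, "a.p", fixed = TRUE)         # literal "a.p", no regex

print(rbind(r_plain, r_icase, r_perl, r_fixed))
```

The trade-off is that the accepted arguments are no longer visible in the signature, and the documented usage has to list `...` as well, otherwise `R CMD check` reports a codoc mismatch.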

MichaelChirico · 2020-02-02T08:11:24Z

Travis error is due to man/ issues:

Codoc mismatches from documentation object 'like':
like
  Code: function(vector, pattern, ...)
  Docs: function(vector, pattern, ignore.case = FALSE, fixed = FALSE,
                 perl = FALSE)
  Argument names in code not in docs:
    ...
  Argument names in docs not in code:
    ignore.case fixed perl
  Mismatches in argument names:
    Position: 3 Code: ... Docs: ignore.case

jangorecki · 2020-05-18T07:30:37Z

@KyleHaynes are you planning to finish this PR?

KyleHaynes · 2020-05-19T02:04:43Z

@jangorecki, sorry for the delay. I've updated it to a point where I'm happy with it.

I note that the codecov/project check isn't passing (I'm not overly familiar with Codecov), and I'm not sure why; it seems to reference some code committed by Matt quite some time ago.

MichaelChirico · 2020-05-19T02:24:16Z

@KyleHaynes yea don't worry about that Codecov bit. A line got uncovered when moving to R 4.1... you'll see all the other PRs failing for the same reason

MichaelChirico · 2020-05-19T02:27:39Z


 14. Added support for `round()` and `trunc()` to extend functionality of `ITime`. `round()` and `trunc()` can be used with argument units: "hours" or "minutes". Thanks to @JensPederM for the suggestion and PR.

+15. New convenience function `%plike%` which map the existing `like()` argument `perl`, [#3702](https://github.com/Rdatatable/data.table/issues/3702). `%plike%` uses Perl-compatible regular expression (PCRE) which extends on TRE and is computationally more efficient. Thanks @KyleHaynes for the suggestion and PR.


This makes it seem like it's unequivocally better, but I've definitely found cases where perl=TRUE is slower, e.g.:

#4447

mattdowle · 2021-08-04T22:17:36Z

Sorry for the very long delay here.
I've added you to DESCRIPTION and invited you to be a project member which amongst other things allows you to create branches in the main project which makes it easier for others to push to the branch. The invite should be a button in your GitHub profile or projects page that you need to click to accept.
Many thanks.

KyleHaynes · 2021-08-04T22:34:59Z

Thanks @mattdowle (and @MichaelChirico); a small contribution, but thanks for accepting it!

Keep up the good work on such an invaluable package!

Kyle Haynes and others added 4 commits December 19, 2019 07:51

added %plike%

b982c42

fixed typo

e1077ed

fixed case

28f6b1e

Fixed tests.Rraw

044dd0e

MichaelChirico reviewed Dec 19, 2019


changes made based on feedback from @MichaelChirico

d626d2f

KyleHaynes and others added 3 commits May 19, 2020 08:17

Merge branch 'master' into adding_plike

16ddf62

Updated code error in like

66029fb

reverted back to explicit arguments for like.

2099862

MichaelChirico reviewed May 19, 2020


Merge branch 'master' into adding_plike

f7b5a56

MichaelChirico added this to the 1.14.1 milestone May 10, 2021

mattdowle added 3 commits August 4, 2021 15:44

Merge branch 'master' into adding_plike

1e2bf2e

more guarded efficiency wording

c50a6ef

Add Kyle to contributor list in DESCRIPTION

67a9cb9

restore x for the test

c476ec7

mattdowle merged commit d610642 into Rdatatable:master Aug 4, 2021

jangorecki modified the milestones: 1.14.9, 1.15.0 Oct 29, 2023

MichaelChirico mentioned this pull request Sep 23, 2024

Inconsistent code formatting / minor fixes to vignettes #6521

Merged

mattdowle merged 13 commits into Rdatatable:master from KyleHaynes:adding_plike


4 participants

