-
Notifications
You must be signed in to change notification settings - Fork 1k
replace substring globally with substr #4447
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
7 commits
Select commit
Hold shift + click to select a range
ab7522f
replace substring globally with substr [efficiency]
826d04d
fix mistakes
467a591
one more
683349b
add some tests
6aef8ff
Merge branch 'master' into substring-substr
mattdowle 001a2ba
tweaks
mattdowle e8ac4ee
inherits=FALSE in the test too
mattdowle File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Writing out loud ... I suspect that passing
stop=1000000Lhere instead ofnchar(..name)would work, and avoid the allocation for thenchar()result.Passing
.Machine$integer.maxmight be more robust. Butsubstring()'s default forlast=is a hard coded1000000L. Maybe R's internal max string length is indeed1000000Lcharacters, but if they ever change that they'll have to remember to updatesubstring'slast=argument. If R's maximum string length is greater than1000000Lthensubstring's argument is already not correct and a bug report could be created.So, let's see ...
So that's a bug in R it seems. I looked at
?substringand although the default of 1000000 is there in the definition, I don't see any text indicating this is intended behaviour. And I see no reason that the default could not be 2^31-1 which is what the max string length is in R, iiuc. Do you feel like reporting it @MichaelChirico? I feel like you get some traction on BugZilla and enjoy interacting there. So I'd appreciate it if you could handle that :-)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will take a look. My own quick investigation:
So I agree it seems like an odd default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Posted here: https://stat.ethz.ch/pipermail/r-devel/2021-June/080841.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great. Interested to see what their response is. I think my short example would have been better to post as-is, as the post to me on first reading dwells too much on
.Machine$integer.maxvs.Machine$integer.max-1. A phrase like "makes no sense" can be seen as critical and therefore risks raising hairs and inducing defensive responses. Better just to stick to code, use less English, and be constructive. But who am I to give advice, I didn't even want to engage on R-devel or Bugzilla myself because these difficulties are magnified 10x on those forums. You are braver than I.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the replies now and saw that Brodie concentrated on
.Machine$integer.maxvs.Machine$integer.max-1. That to me is inefficient communication, as I feared. I see also time spent on defending why the default was 1000000, as I predicted. But it got there in the end which is good. It could have been more efficient by just pointing to the problem (my short example) and leaving them to come up with the best solution. But again: forum threads like this is why I don't engage, so you're braver than me.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can guarantee you that some people (perhaps most, perhaps a few, certainly not all, but definitely some) reading that thread now think that the problem is to do with strings that are 2GB long. They therefore think that that it's a silly edge case that hardly ever comes up and folk that do have strings that are 2GB long shouldn't be doing that anyway and instead be restructuring their code.
Where in fact, the problem occurs at 1e6 characters, which is 976K. At under 1MB, let alone GB, a string that large is relatively much more reasonable, commonplace and has relatively little to do with large servers or esoteric edge cases.
That view being formed could have been avoided by using the 1e6 example that I showed above, which is why I created it that way with that in mind. So I should have written "please post this example" and been explicit in that way. I wrote these comments for future reference, for next time.