Skip to content

add test against regression on #2041#3843

Merged
mattdowle merged 3 commits intomasterfrom
subset_utf_test
Sep 11, 2019
Merged

add test against regression on #2041#3843
mattdowle merged 3 commits intomasterfrom
subset_utf_test

Conversation

@MichaelChirico
Copy link
Copy Markdown
Member

Closes #2041

(also hoping Appveyor will let us know if the issue persists on Windows)

@shrektan
Copy link
Copy Markdown
Member

shrektan commented Sep 9, 2019

I think the test is not needed because it should have been addressed by previous PRs a long time ago, see #2566, #2678, and #3451 ... In fact, except for the recent two issues, I'm pretty sure everything works fine because I'm using data.table to handle large numbers of non-ASCII related problems on Windows every day.

@jangorecki
Copy link
Copy Markdown
Member

It is not needed as of now but we are adding it to be future proof

@shrektan
Copy link
Copy Markdown
Member

I mean the tests are already there.

data.table/inst/tests/tests.Rraw

Lines 11948 to 11964 in 3bb130d

# Fix the bug when keys contain non UTF8 strings #2566 #2462 #1826
utf8_strings = c("\u00e7ile", "fa\u00e7ile", "El. pa\u00c5\u00a1tas", "\u00a1tas", "\u00de")
latin1_strings = iconv(utf8_strings, from = "UTF-8", to = "latin1")
mixed_strings = c(utf8_strings, latin1_strings)
DT1 = data.table(x = mixed_strings, y = c(latin1_strings, utf8_strings), z = 1:10)
DT2 = copy(DT1)
setkey(DT1, x)
setkey(DT2, y)
# the ans is generated by `sort(c(utf8_strings, utf8_strings), method = "radix")`
# but we should not use radix sort in the test because it's introduced after R3.3.0
ans = c("El. pa\u00c5\u00a1tas", "El. pa\u00c5\u00a1tas", "fa\u00e7ile", "fa\u00e7ile",
"\u00a1tas", "\u00a1tas", "\u00de", "\u00de", "\u00e7ile", "\u00e7ile")
test(1864.1, DT1$x, ans)
test(1864.2, DT2$y, ans)
ans = c(1L, 6L, 2L, 7L, 3L, 8L, 4L, 9L, 5L, 10L)
test(1864.3, DT1[c(utf8_strings, latin1_strings), z], c(ans, ans))
test(1864.4, DT2[c(utf8_strings, latin1_strings), z], c(ans, ans))

@MichaelChirico
Copy link
Copy Markdown
Member Author

@shrektan thanks for tracking that down. I think that helps pin down why the test fails to compile on Windows.

To your point, this test is a little different though, right? That test is about keys on non-UTF8, this test has no keys & is just about subsetting.

@codecov
Copy link
Copy Markdown

codecov bot commented Sep 10, 2019

Codecov Report

Merging #3843 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #3843   +/-   ##
=======================================
  Coverage   99.42%   99.42%           
=======================================
  Files          71       71           
  Lines       13423    13423           
=======================================
  Hits        13346    13346           
  Misses         77       77

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a804b0f...84cad3a. Read the comment docs.

@codecov
Copy link
Copy Markdown

codecov bot commented Sep 10, 2019

Codecov Report

Merging #3843 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #3843   +/-   ##
=======================================
  Coverage   99.42%   99.42%           
=======================================
  Files          71       71           
  Lines       13408    13408           
=======================================
  Hits        13331    13331           
  Misses         77       77

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4eb90a1...cf8f020. Read the comment docs.

@mattdowle mattdowle added this to the 1.12.4 milestone Sep 11, 2019
@mattdowle mattdowle merged commit 010e624 into master Sep 11, 2019
@mattdowle mattdowle deleted the subset_utf_test branch September 11, 2019 23:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

join or select row by col with Chinese in

4 participants