Skip to content

csvy reading capabilities#2656

Merged
mattdowle merged 4 commits intomasterfrom
csvy_support
May 2, 2019
Merged

csvy reading capabilities#2656
mattdowle merged 4 commits intomasterfrom
csvy_support

Conversation

@MichaelChirico
Copy link
Copy Markdown
Member

@MichaelChirico MichaelChirico commented Mar 4, 2018

Closes #1701

adding @HughParsonage to review in anticipation of interactions with his new paradigm for handling colClasses.

  • it might be nice to have yaml = grepl('\\.csvy$', input) but I'm wary of using grepl since fread spends too much time in is_url/is_secureurl/is_file for long in memory input #2531. there should be a way to set has_yaml = 'auto' and deal with it like that but I haven't put too much thought into it.

  • could also consider letting yaml potential give the location of a separate file which contains the metadata

  • should spend some time ironing this into the standard laid out here to the extent reasonable, though some things should be ignored (e.g., standard calls for default column type to be string, but fread is capable of being smarter than that)

suggestions welcome.

Comment thread R/fread.R Outdated
@HughParsonage
Copy link
Copy Markdown
Member

Thanks. Happy to review, but I'll defer till PR #2545 is resolved.

@MichaelChirico MichaelChirico force-pushed the csvy_support branch 3 times, most recently from d3d6341 to 0da8cf4 Compare March 4, 2018 10:11
@MichaelChirico MichaelChirico changed the title (WIP) csvy reading capabilities csvy reading capabilities, #1705 Mar 4, 2018
@MichaelChirico MichaelChirico force-pushed the csvy_support branch 6 times, most recently from adf29d0 to fc63f9c Compare March 4, 2018 15:23
@MichaelChirico
Copy link
Copy Markdown
Member Author

hmm, seems appveyor didn't figure out how to add yaml newly added to Suggests (though Travis did), any suggestion?

@mattdowle
Copy link
Copy Markdown
Member

mattdowle commented Mar 5, 2018

@MichaelChirico At the top of tests.Rraw, yaml needs to be added to line 4 there too. I can't quite remember the reason the test suite has this extra step, but it's probably to do with allowing the compatibility tests (i.e. tests with Suggests packages) to be optional in dev where typically I skip all those to speed up the dev cycle. I realize that R CMD check can be told to ignore not-installed packages, but I need to skip those tests when those packages are installed, too. Which currently I do just by not require()'ing them in my dev R session. Perhaps it's better to split them off into a different file and have a logical argument to test.data.table() to run them or not.

@codecov-io
Copy link
Copy Markdown

codecov-io commented Mar 5, 2018

Codecov Report

Merging #2656 into master will increase coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2656      +/-   ##
==========================================
+ Coverage   97.03%   97.05%   +0.02%     
==========================================
  Files          66       66              
  Lines       12558    12645      +87     
==========================================
+ Hits        12186    12273      +87     
  Misses        372      372
Impacted Files Coverage Δ
R/fread.R 99.6% <100%> (+0.2%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f057a66...9b1a462. Read the comment docs.

@mattdowle
Copy link
Copy Markdown
Member

mattdowle commented Mar 5, 2018

@MichaelChirico Ok, and now the new yaml tests need wrapping in a branch if ("yaml" %in% search()) ... Have a look at the other tests of any of the Suggests packages for the exact condition. On AppVeyor we don't run Suggests tests just to speed it up. It saves installing the (sometimes large) Suggests packages and all their dependencies in the VM image. It's just on Travis that all Suggests tests run too. We figure that in dev it's enough to test Suggests on one platform. It's nice for AppVeyor to return results quickly as it's Windows we're most interesting in after push, since we've dev'd and tested on Linux. In pre-release-to-CRAN checks, Suggests tests then run on win-builder because those packages are pre-installed there.

@MichaelChirico
Copy link
Copy Markdown
Member Author

OK, will do.

I'm on a holding pattern for this for now, as there was a recent change to the csvy standard and now some back-and-forth on perhaps updating it again:

leeper/csvy#13

@MichaelChirico MichaelChirico changed the title csvy reading capabilities, #1705 [holding] csvy reading capabilities, #1705 Mar 6, 2018
@jangorecki
Copy link
Copy Markdown
Member

jangorecki commented Mar 6, 2018

You can find CI job which runs all Suggests on Linux. Just push branch here and read only gitlab mirror will run Ci for it: https://gitlab.com/Rdatatable/data.table/pipelines
Recently tested commit in this branch raise doc mismatch warning https://gitlab.com/Rdatatable/data.table/-/jobs/55570937

@jangorecki jangorecki added this to the 1.12.0 milestone Jun 26, 2018
@jangorecki jangorecki modified the milestones: 1.12.0, 1.12.2 Jan 5, 2019
@mattdowle mattdowle removed this from the 1.12.2 milestone Jan 11, 2019
@MichaelChirico
Copy link
Copy Markdown
Member Author

FWIW, yaml is a dependency of knitr, which is already in our Suggests, so I don't think adding yaml to Suggests increases our total dependency burden

@MichaelChirico MichaelChirico changed the title [holding] csvy reading capabilities, #1705 csvy reading capabilities, #1705 Feb 10, 2019
@jangorecki
Copy link
Copy Markdown
Member

jangorecki commented Feb 10, 2019

Does it support to provide factor levels in yaml? If so please add simple tests for that. If not could we get that in scope or as new follow-up issue?
filled #3391

@MichaelChirico
Copy link
Copy Markdown
Member Author

MichaelChirico commented Feb 10, 2019

@jangorecki I don't see anything in the official spec re:factors, so I guess we'll be the market leaders there.

@mattdowle With that last commit, fread side is done; we can merge whenever (not sure if you want to wait for 1.12.3). Also thought of maybe doing a refactor to port some of that YAML-processing code outside of the fread body. Will do if you prefer.

I think I know how I'll do the fwrite side, but probably best to save for another PR.

@MichaelChirico
Copy link
Copy Markdown
Member Author

@mattdowle tried with using missing() instead of match.call(), but by the time we reach the yaml section some of the arguments have already been populated (and are no longer missing()), whereas the actual call returned by match.call() is more static.

Could add something at the head of fread along the lines of if (yaml) { missing_sep = missing(sep); missing_na.strings = missing(na.strings); ...} but I like the match.call() approach better... reverting

Comment thread R/fread.R Outdated
Comment thread R/fread.R
Comment thread R/fread.R
@MichaelChirico
Copy link
Copy Markdown
Member Author

So much for trying git pull origin master --rebase....

@jangorecki
Copy link
Copy Markdown
Member

@MichaelChirico another good option instead of rebase is git reset --soft <last common sha to master> then git stash, git pull upstream master, git stash pop, resolve conflicts

remove accidentally committed file
@MichaelChirico
Copy link
Copy Markdown
Member Author

@jangorecki thanks, seems I still couldn't get around a --force though. oh well

Comment thread appveyor.yml
@mattdowle mattdowle changed the title csvy reading capabilities, #1705 csvy reading capabilities May 2, 2019
@mattdowle mattdowle added this to the 1.12.4 milestone May 2, 2019
@mattdowle mattdowle merged commit 2e5336d into master May 2, 2019
@mattdowle mattdowle deleted the csvy_support branch May 2, 2019 21:05
Comment thread DESCRIPTION Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it mean that zlib has to be present to install data.table, even if one is not going to use compression? I hope not

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zlib necessary to compile. Seems like it's standard though. Windows and Mac users installing binaries from CRAN won't need to install anything special, afaik.
If there's a way to make zlib optional, would be ideal of course. But there's plenty of other higher priority things to do first. I'd wait to see if anyone reports difficulties due to missing zlib first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add CSVY support for fread()

6 participants