Skip to content

added fread(file=URL) support#5097

Merged
mattdowle merged 8 commits intomasterfrom
fread_file_url
Aug 9, 2021
Merged

added fread(file=URL) support#5097
mattdowle merged 8 commits intomasterfrom
fread_file_url

Conversation

@ben-schwen
Copy link
Copy Markdown
Member

Closes #4952.

Refactors file download to separate function for better reuse/maintenance.

Copy link
Copy Markdown
Member

@mattdowle mattdowle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Just these questions on first glance.

Comment thread R/fread.R Outdated
download.file(input, tmpFile, method=method, mode="wb", quiet=!showProgress)
# In text mode on Windows-only, R doubles up \r to make \r\r\n line endings. mode="wb" avoids that. See ?connections:"CRLF"
}
assign("file", tmpFile, envir=parent.env(environment()))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just return tmpFile?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe my view was locked in thinking about the different environments. A return is definitely the more straightforward way.

Comment thread R/fread.R Outdated
do.call(
on.exit, list(substitute(unlink(tmpFile)), add = TRUE),
envir = parent.frame()
)
Copy link
Copy Markdown
Member

@mattdowle mattdowle Aug 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the do.call here; i.e. can't it just be on.exit(unlink(tmpFile), add=TRUE) as before?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need the do.call here, because we want to unlink the file only after the exiting the parent.frame() (which is the fread function). Using on.exit(unlink(tmpFile), add=TRUE) as before would lead to unlinking the file after leaving the newly added download_file function.

Comment thread R/fread.R Outdated
nThread=as.integer(nThread)
stopifnot(nThread>=1L)

is_url = function(x) any(startsWith(x, c("http://", "https://", "ftp://", "ftps://", "file://")))
Copy link
Copy Markdown
Member

@mattdowle mattdowle Aug 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I glanced at our backport of startsWith to support R prior to 3.3.0 (see utils.R) and it doesn't look like it supports a set like that. If so, our backport needs a tweak to support this usage then. Currently the PR would fail on GLCI after merge when it tests using R 3.1.0.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the usage here, before fiddling with the backport. This also seems okayish since supported set of URLs will probably not change.

@mattdowle mattdowle added this to the 1.14.1 milestone Aug 7, 2021
@mattdowle mattdowle merged commit 78da3bd into master Aug 9, 2021
@mattdowle mattdowle deleted the fread_file_url branch August 9, 2021 23:09
@jangorecki jangorecki modified the milestones: 1.14.9, 1.15.0 Oct 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fread only allows URLs on input, not file

4 participants