Add a new with_bom parameter to create a UTF-8 file with a BOM. By default, with_bom is FALSE and fwrite creates a UTF-8 file without a BOM. When with_bom is TRUE, the BOM sequence (EF BB BF) is added at the beginning of the file, but only when col.names is TRUE (the default).
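As a rough illustration of what the description means at the byte level, here is a minimal base R sketch. It does not call fwrite at all; the file contents and variable names are made up for the example, and it only shows that a file starting with EF BB BF carries a UTF-8 BOM.

```r
# Illustrative only: write a tiny CSV by hand with a leading UTF-8 BOM.
# This uses base R connections, not fwrite; the data is invented for the example.
bom <- as.raw(c(0xEF, 0xBB, 0xBF))

path <- tempfile(fileext = ".csv")
con <- file(path, open = "wb")           # binary mode so bytes are written untouched
writeBin(bom, con)                       # BOM first
writeBin(charToRaw("a,b\n1,2\n"), con)   # then the header and one data row
close(con)

# Read the first three bytes back to confirm the BOM is at the start of the file.
first3 <- readBin(path, what = "raw", n = 3L)
has_bom <- identical(first3, bom)
```

Tools that sniff the encoding from the first bytes can use this signature to treat the rest of the file as UTF-8.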
|
Awesome PR! Tackling For argument name, maybe |
file <- path.expand(file) # "~/foo/bar"
if (append && missing(col.names) && (file=="" || file.exists(file)))
col.names = FALSE # test 1658.16 checks this
if (with_bom && !col.names) stop("with_bom can be TRUE only if col.names is TRUE")
Maybe just ignore with_bom here (possibly with warning)?
Shouldn't with_bom && append be a warning/error/ignore case as well?
Why couldn't the BOM be used without col names? I recall using some Excel data without col names.
Because of how fwrite is organized: row writing is threaded and every thread has its own buffer, so it is difficult to write the BOM in that section of fwrite. The header is written with its own buffer; with_bom uses this buffer, so col.names must be TRUE.
> Shouldn't with_bom && append be a warning/error/ignore case as well?
You're right. I propose that when appending, bom is silently set to FALSE, like col.names.
See f877558. So stop is raised only when creating a file with col.names = FALSE.
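The argument handling discussed in this thread can be sketched as a small standalone function. This is only a sketch of the rule as described at this point in the review, not the code in f877558: `resolve_bom_flags` and its parameter names are hypothetical, and `col_names_missing` and `target_exists` stand in for `missing(col.names)` and `file=="" || file.exists(file)`.

```r
# Hypothetical helper (not part of data.table) mirroring the rule described above:
# when appending to an existing target with col.names unspecified, both col.names
# and with_bom are silently switched off; otherwise a BOM without a header is an error.
resolve_bom_flags <- function(with_bom, append, col_names_missing, col_names, target_exists) {
  if (append && col_names_missing && target_exists) {
    col_names <- FALSE
    with_bom <- FALSE   # silently ignored when appending, like col.names
  }
  if (with_bom && !col_names) {
    stop("with_bom can be TRUE only if col.names is TRUE")
  }
  list(with_bom = with_bom, col.names = col_names)
}

# Appending to an existing file: the BOM request is dropped without complaint.
appending <- resolve_bom_flags(TRUE, append = TRUE, col_names_missing = TRUE,
                               col_names = TRUE, target_exists = TRUE)

# Creating a file with col.names = FALSE: the call stops with an error.
creating <- tryCatch(
  resolve_bom_flags(TRUE, append = FALSE, col_names_missing = FALSE,
                    col_names = FALSE, target_exists = FALSE),
  error = function(e) conditionMessage(e)
)
```

Keeping the silent reset ahead of the check means an append never trips the error, which matches the intent stated in the comment.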
bom[1] <- as.raw(0xEF)
bom[2] <- as.raw(0xBB)
bom[3] <- as.raw(0xBF)
writeBin(bom, file)
Does this interact properly with append? I guess with_bom && append should be another error or ignore case.
I see, this is nested within if(yaml), so the append case is already handled.
I would add an internal stopifnot(!(bom && append)) # nocov, just in case the code is moved later on, to ensure what Michael mentioned.
|
My vote for arg name would be just |
|
BOM is often written in uppercase. What do you think of |
so is YAML... |
|
Let's go for |
In some cases of appending, `append && missing(col.names) && (file=="" || file.exists(file))` col.names is set to FALSE. Now bom is silently set to FALSE too.
|
Could you clarify the relation to #1770? |
|
#1770 is not very clear and its title is fwrite UTF8, so it seems to be related. My understanding is that, in the Windows world (I use Linux), UTF-8 files often need a BOM to be recognized as UTF-8. Without one, they may be treated as "Latin-1", which causes encoding problems. I'm French, so I often meet "Latin-1" files. I'm happy that |
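To make the encoding problem concrete, here is a small base R sketch of what happens when UTF-8 bytes are interpreted as Latin-1. The character chosen is just an example; nothing here is taken from #1770.

```r
# "é" encoded as UTF-8 takes two bytes: c3 a9.
utf8_bytes <- charToRaw(enc2utf8("\u00e9"))

# A reader that assumes Latin-1 treats each byte as its own character,
# so the single "é" turns into two characters ("Ã" followed by "©").
misread <- iconv(rawToChar(utf8_bytes), from = "latin1", to = "UTF-8")
misread_bytes <- charToRaw(misread)
```

A BOM at the start of the file gives such a reader a hint that the content is UTF-8, so it can avoid this misinterpretation.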
|
OK, that's what I was thinking as well. We can leave it open and post in that thread after merging to see if it solves people's problems. |
|
Hi @philippechataignon. Looks great. I'll try and get it passing now and merge. |
|
Good - passing now. I needed to change yaml tests 2033.06 and 2033.07 to pass. I just replaced them with the new values without understanding why, or whether those changes are correct: @philippechataignon @MichaelChirico please check. |
|
I had a glance at the logs yesterday and saw that, seems familiar to an error I was getting on the original PR that was overcome by explicitly setting eol across platforms. Dunno why this PR affected it. maybe strip.white is needed? will check |
|
Investigating, not sure I'll have time to finish. I notice that with the interaction with I see: vs normal behavior |
|
@philippechataignon i believe the problem is that
|
|
@mattdowle @philippechataignon made some changes -- as noted, This solved the 2033.06/2033.07 problem (the indices in the tests were referring to the parts of the file written for earlier tests), and Travis is now passing... but the Windows problem is stubborn & I'm out of ideas for now. From the fail log: Perhaps the |
|
Latest commit tried using |
|
What do you think of removing compatibility between |
|
the I'm not sure 1.12.4 release is imminent so no need to write it off yet. that said, I'd be fine merging this and filing a follow-up to fix the windows bug. |
…d yaml and bom when no column names too
Done. Great idea. It also now writes the bom when no column names too. |
|
That's odd. I don't see why the two errors happen on Windows : |
Codecov Report
@@ Coverage Diff @@
## master #3580 +/- ##
==========================================
+ Coverage 97.62% 97.63% +<.01%
==========================================
Files 66 66
Lines 12696 12721 +25
==========================================
+ Hits 12395 12420 +25
Misses 301 301
|
|
nice Matt! do I understand right that your debugging for Windows here was to put a debugging tracer line then push & wait for appveyor & repeat? |
|
Yep. |
|
I'm expecting you and Philippe will likely need to revise. I just wanted to merge now to see if it passed the stronger tests in GL pipelines, and get it into master too so further work (follow up PRs) can be in a branch. |
|
gcc-8 -Wall -pedantic |
|
Thanks @jangorecki. Now fixed in c21ac65. |
|
@mattdowle we have |
Closes #3488
Related to #1770 too. It adds a parameter 'with_bom' in fwrite to create a UTF-8 file with BOM. By default, with_bom is FALSE and fwrite creates a UTF-8 file without BOM.
When with_bom is TRUE, BOM sequence (EF BB BF) is added at the beginning of the file.
with_bom is compatible with options yaml = T and compress="gzip"