
fread error: Sampling jump point is before the last jump ended #2173

@markdanese

Attached file: RxTerms201704.txt

I am reading a pipe-delimited file in a fresh R session and get the error in the title (log below). This happens on the development version, 1.10.5. I tested 1.10.4 and it read the same file in fine. The problematic file is attached. I am running macOS Sierra 10.12.4 with RStudio 1.0.143.

As always, thanks for all your work on data.table.

R version 3.3.3 (2017-03-06) -- "Another Canoe"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin13.4.0 (64-bit)

> library(data.table)
data.table 1.10.5 IN DEVELOPMENT built 2017-05-15 22:57:07 UTC; travis
  The fastest way to learn (by data.table authors): https://www.datacamp.com/courses/data-analysis-the-data-table-way
  Documentation: ?data.table, example(data.table) and browseVignettes("data.table")
  Release notes, videos and slides: http://r-datatable.com
> rxterms <- fread("./data/base/RxTerms201704/RxTerms201704.txt", sep = "|", verbose = TRUE)
Input contains no \n. Taking this to be a filename to open
NAstrings = [<<NA>>]
None of the NAstrings are numeric (such as '-9999').
`filename` argument given, attempting to open a file with such name
File opened, size 0.005821 GB.
Memory mapping ... ok
Detected eol as \r\n (CRLF) in that order, the Windows standard.
Positioned on line 1 starting: <<RXCUI|GENERIC_RXCUI|TTY|FULL_N>>
Using supplied sep '|'
  sep=='|'(ascii 124)  with 100 lines of 18 fields using quote rule 0
Detected 18 columns on line 1. This line is either column names or first data row (first 30 chars): <<RXCUI|GENERIC_RXCUI|TTY|FULL_N>>
All the fields on line 1 are character fields. Treating as the column names.
Number of sampling jump points = 101 because 6250296 bytes from row 1 to eof / (2 * 30729 jump0size) == 101
Type codes (jump 000)    : 226666666666613666  Quote rule 0
Error in fread("./data/base/RxTerms201704/RxTerms201704.txt", sep = "|",  : 
  Internal error: Sampling jump point 79 is before the last jump ended
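For reference, the number of sampling jump points in the verbose log follows from the figures it prints. This assumes the count is truncated to an integer, as the "== 101" in the log suggests:

$$
\left\lfloor \frac{6250296}{2 \times 30729} \right\rfloor
= \left\lfloor \frac{6250296}{61458} \right\rfloor
= \lfloor 101.70\ldots \rfloor
= 101
$$

Jump point 79 in the error message is therefore one of these sampling points, not a position past the end of the file.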
