fread for huge fixed width files with smart handling of trailing spaces #1345

@mikedolanfliss

Hi there, big fan of DT and fread.

I've got large (>4 GB) files dumped out as flat, undelimited, fixed-ish* width files from SQL Server. I say "-ish*" because, and I guess this is standard, the fields have regular start positions but the lines have no true fixed width. That is, the final field in a record may or may not be padded with trailing spaces, so I know where every field starts, but not where every record ends. Of course, this is dumb, but SQL Server is unlikely to change its ways, and I don't have control of the output (to get it as CSV, etc.).

Could fread be extended to handle both true fixed-width and fixed-width-ISH files like this? Existing fwf readers (LaF, read.fwf) can't handle this problem yet either. Stack Exchange pros have suggested a pretty complicated awk workaround... but it'd be nice to have it push-button in fread.
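To make the requested behaviour concrete, here is a rough sketch in Python (not R, and not anything fread does today). It slices each line at known start positions and lets the last field run to the end of the line, so trailing padding may or may not be present. The names `split_fixedish` and `read_fixedish`, the 0-based offsets, and the sample rows are all made up for illustration.

```python
from typing import Iterable, Iterator, List, Sequence


def split_fixedish(line: str, starts: Sequence[int]) -> List[str]:
    """Slice one record at 0-based start offsets (hypothetical helper).

    The last field has no fixed end: it runs to the end of the line,
    so it works whether or not the exporter padded it with spaces.
    """
    line = line.rstrip("\r\n")
    # Each field ends where the next one starts; the last one is open-ended.
    ends = list(starts[1:]) + [None]
    return [line[s:e].rstrip(" ") for s, e in zip(starts, ends)]


def read_fixedish(lines: Iterable[str], starts: Sequence[int]) -> Iterator[List[str]]:
    """Yield parsed records lazily, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield split_fixedish(line, starts)


# Invented sample: the first record is padded at the end, the second is not.
sample = ["AB   12345hello   \n", "CD   67890hi\n"]
rows = list(read_fixedish(sample, [0, 5, 10]))
```

Because `read_fixedish` is a generator, it can be fed an open file handle and will go through a multi-gigabyte file one line at a time instead of loading it all into memory.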

Thanks for considering!
mike

Sidenote: SQL Server also dumps a format file, described at the link below, that I can use to easily pull out the start positions for the fields:
https://msdn.microsoft.com/en-us/library/ms191516.aspx
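As a sketch of how the start positions could be pulled out of such a format file, the snippet below assumes the non-XML layout: a version line, a field-count line, then one whitespace-separated row per field, with the data length in the fourth column and the column name in the seventh. It also assumes a prefix length of 0, so widths simply accumulate. The function name `field_starts` and the example format text are invented for illustration, not taken from a real export.

```python
from typing import List, Tuple


def field_starts(format_text: str) -> List[Tuple[str, int, int]]:
    """Return (column name, 0-based start, width) per field (hypothetical helper).

    Assumes a non-XML format file: line 1 is the version, line 2 the
    number of fields, and each following row is whitespace-separated
    with the data length in column 4 and the column name in column 7.
    """
    lines = [ln for ln in format_text.splitlines() if ln.strip()]
    n_fields = int(lines[1])
    result = []
    offset = 0
    for ln in lines[2:2 + n_fields]:
        parts = ln.split()
        width = int(parts[3])   # host file data length
        name = parts[6]         # server column name
        result.append((name, offset, width))
        offset += width         # next field starts where this one ends
    return result


# Invented example format file with three character fields.
example = r'''9.0
3
1  SQLCHAR  0  5   ""      1  code  ""
2  SQLCHAR  0  5   ""      2  id    ""
3  SQLCHAR  0  20  "\r\n"  3  note  ""
'''
layout = field_starts(example)
```

The start offsets in `layout` are what a fixed-width reader would need, and the width of the final field only matters as an upper bound if the trailing padding is optional.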
