Description
Hi.
I need to inflate a .csv.gz file that should yield a 4 GB CSV with 25 million rows.
When I use an app or the gzip command line, I get the full file without issue.
When I use Zlib::GzipReader, only the first row is returned.
> Zlib::GzipReader.open("adresses-france.csv.gz") { |gz| print gz.read }
id;id_fantoir;numero;rep;nom_voie;code_postal;code_insee;nom_commune;code_insee_ancienne_commune;nom_ancienne_commune;x;y;lon;lat;type_position;alias;nom_ld;libelle_acheminement;nom_afnor;source_position;source_nom_voie;certification_commune;cad_parcelles
=> nil
The file is provided by the French government:
- the directory: https://adresse.data.gouv.fr/data/ban/adresses/latest/csv
- the file: https://adresse.data.gouv.fr/data/ban/adresses/latest/csv/adresses-france.csv.gz
There are many other files in the directory (for each region) but I cannot reproduce the issue with other files.
The service also provides a similar file in Addok format (https://adresse.data.gouv.fr/data/ban/adresses/latest/addok/adresses-addok-france.ndjson.gz), which should decompress to a 3 GB file with 2 million rows, but Zlib::GzipReader returns only the first 25k rows.
Is there any limit to what Zlib can support (size, number of rows, ...)?
Could the problem come from the compressed file itself?
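
One way to test whether the compressed file is the cause: the gzip format allows several compressed members to be concatenated in one file, and a single Zlib::GzipReader stops at the end of the first member. Whether adresses-france.csv.gz is actually built that way is an assumption here, not something verified. The sketch below reproduces the symptom on a small synthetic archive and reads all members by reopening a reader on the leftover bytes. `read_all_members` is a hypothetical helper, not part of the zlib API, and the sample data is made up.

```ruby
require "zlib"
require "stringio"

# Hypothetical helper: reads every gzip member in io and returns the
# concatenated decompressed output. It assumes io supports eof?, pos and pos=.
def read_all_members(io)
  out = +""
  until io.eof?
    gz = Zlib::GzipReader.new(io)
    out << gz.read
    unused = gz.unused # bytes buffered past the end of this member, or nil
    gz.finish          # closes the reader but leaves io open
    # Rewind io so the next reader starts at the next member's header.
    io.pos -= unused.bytesize if unused
  end
  out
end

# Synthetic two-member archive (made-up data, not the real file).
data = Zlib.gzip("id;numero\n") + Zlib.gzip("1;10\n2;12\n")

# A single reader stops after the first member.
first_only = Zlib::GzipReader.new(StringIO.new(data)).read

# The loop continues until the underlying io is exhausted.
everything = read_all_members(StringIO.new(data))
```

Here `first_only` contains just the header line, while `everything` contains all three lines. For a 4 GB file, the same loop would need to use `gz.each_line` on a `File` opened in binary mode rather than accumulating everything in one string. Ruby 2.7 and later also provide `Zlib::GzipReader.zcat`, which is documented to handle multiple gzip streams.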