CSVs are a ubiquitous data-transfer format, and they are commonly riddled with issues. Most CSV libraries abort with an unhelpful error; CSV GP instead lets you pinpoint these common issues in a CSV file and export just the parsable lines from it.
CSV GP can be used in three ways.
- Install Rust
- Clone the repo and navigate into it
- Run cargo install --path csv_gp
- The csv-gp command will now be available to run; see csv-gp --help for usage
Add the following to your Cargo.toml:
csv-gp = { git = "https://github.com/xelixdev/csv-gp", rev = "<optional git tag>" }
The library is available on PyPI, at https://pypi.org/project/csv-gp/ so you can just run:
pip install csv-gp
- Install Rust
- Install maturin (pip install maturin)
- Clone the repo
- Run:
  make all
  cd csv_gp_python && maturin develop
After installing the binary, the default usage is running csv-gp $FILE. This will print a diagnosis of the file. The command provides options to change the delimiter and the encoding of the file. See csv-gp -h for details.
Another option provided is --correct-rows-path which will export only the correct rows to the provided path.
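As a rough sketch of driving the command from a script, the snippet below builds the argument list for the two invocations described above and passes it to subprocess.run. The helper names build_command and run_csv_gp are hypothetical and not part of CSV GP. Only the --correct-rows-path option is used, since the delimiter and encoding flags are documented in csv-gp -h rather than here. It also assumes that the diagnosis is written to standard output and that options may follow the file argument.

```python
import subprocess
from typing import List, Optional


def build_command(csv_path: str, correct_rows_path: Optional[str] = None) -> List[str]:
    """Build the csv-gp argument list (hypothetical helper, not part of CSV GP)."""
    command = ["csv-gp", csv_path]
    if correct_rows_path is not None:
        # Ask csv-gp to export only the correct rows to this path.
        command += ["--correct-rows-path", correct_rows_path]
    return command


def run_csv_gp(csv_path: str, correct_rows_path: Optional[str] = None) -> str:
    """Run csv-gp and return what it printed (assumes csv-gp is on PATH)."""
    result = subprocess.run(
        build_command(csv_path, correct_rows_path),
        capture_output=True,
        text=True,
        check=False,  # exit codes are not documented above, so do not raise on them
    )
    # Assumption: the diagnosis is printed to standard output.
    return result.stdout
```

For example, run_csv_gp("data.csv", "valid_rows.csv") would run csv-gp data.csv --correct-rows-path valid_rows.csv, where both file names are placeholders.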
The Python library exposes two main functions, check_file and get_rows.
The check_file function takes the path to a file, the delimiter and the encoding (see https://github.com/xelixdev/csv-gp/blob/0f77c62841509c134a3bbe06ec178426e9c5aa10/csv_gp_python/csv_gp.pyi) and returns an instance of the CSVDetails class, which provides details about the file. The same stub file lists all the available attributes with their names and types.
If the valid_rows_output_path argument is provided to the function, only the correct rows will be exported to that path.
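A minimal sketch of calling check_file, assuming the module is importable as csv_gp (the name of the stub file) and that the path, delimiter and encoding are accepted positionally in that order. The wrapper name diagnose and the default values "," and "utf-8" are illustrative assumptions, not defaults defined by the library.

```python
from typing import Any, Optional


def diagnose(
    path: str,
    delimiter: str = ",",      # assumed default, chosen for illustration
    encoding: str = "utf-8",   # assumed encoding label, chosen for illustration
    valid_rows_output_path: Optional[str] = None,
) -> Any:
    """Return the CSVDetails instance for a file (hypothetical wrapper)."""
    # Imported lazily so this module still loads where csv-gp is not installed.
    import csv_gp

    if valid_rows_output_path is None:
        return csv_gp.check_file(path, delimiter, encoding)
    # With this argument, the correct rows are also exported to that path.
    return csv_gp.check_file(
        path, delimiter, encoding, valid_rows_output_path=valid_rows_output_path
    )
```

The attributes on the returned object are not spelled out here, so consult the stub file before reading any of them.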
The get_rows function also takes the path to a file, the delimiter and the encoding, plus a list of row numbers. It returns the parsed cells for the given rows. See the stub file linked above for the exact types of the parameters and return value.
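A similar sketch for get_rows, again assuming the module is importable as csv_gp and that the four arguments are accepted positionally in the order described. The wrapper name fetch_rows and the default delimiter and encoding are hypothetical. The row numbers are passed through unchanged because the exact collection type is defined in the stub file.

```python
from typing import Any


def fetch_rows(
    path: str,
    row_numbers: Any,          # the text describes a list; see the stub for the exact type
    delimiter: str = ",",      # assumed default, chosen for illustration
    encoding: str = "utf-8",   # assumed encoding label, chosen for illustration
) -> Any:
    """Return the parsed cells for the requested rows (hypothetical wrapper)."""
    # Imported lazily so this module still loads where csv-gp is not installed.
    import csv_gp

    return csv_gp.get_rows(path, delimiter, encoding, row_numbers)
```

Whether row numbers start at 0 or 1 is not stated here, so check that against a small file before relying on it.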
- Update version numbers in csv_gp_python/Cargo.toml and csv_gp/Cargo.toml
- Run cargo check to update the lock files with new versions
- Merge this change into main
- Create a new release on GitHub, creating a tag in the form vX.Y.Z
- The 'Publish' pipeline should begin running, and the new version will be published
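Before creating the tag, it may help to confirm that it has the vX.Y.Z form and matches the version in both manifests. The sketch below is a hypothetical helper, not part of the repository. It takes the manifest contents as text and uses a simple heuristic: the first line starting with version = is treated as the package version.

```python
import re
from typing import Optional

# A release tag in the form vX.Y.Z, e.g. v1.2.3.
TAG_PATTERN = re.compile(r"v\d+\.\d+\.\d+")
# Heuristic: a line that starts with `version = "..."`.
VERSION_LINE = re.compile(r'^version\s*=\s*"([^"]+)"', re.MULTILINE)


def package_version(cargo_toml_text: str) -> Optional[str]:
    """Return the value of the first line-initial version key, or None."""
    match = VERSION_LINE.search(cargo_toml_text)
    return match.group(1) if match else None


def tag_matches(tag: str, *cargo_toml_texts: str) -> bool:
    """True if the tag is vX.Y.Z and every manifest declares version X.Y.Z."""
    if not TAG_PATTERN.fullmatch(tag):
        return False
    expected = tag[1:]  # drop the leading "v"
    return all(package_version(text) == expected for text in cargo_toml_texts)
```

A caller would read csv_gp/Cargo.toml and csv_gp_python/Cargo.toml and pass their contents together with the intended tag to tag_matches.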
Run cargo test.
Follow the instructions on compiling from source. Then you can run pytest.