Search excel-, csv- and tsv-files through arbitrary columns for one or multiple search terms.
Following file-formats are supported: xls, xlsx, xlsm, xlsb, odf, ods, odt, csv & tsv
For the use of py_grep in your terminal there are two options.
- clone this git-repository to your computer using the command
git clone https://github.com/DataSpott/SGT-Analysis.git- if not already installed use the following command to install pip for python3 in your terminal
sudo apt install python3-pip- use the following command to set up the necessary python-modules
pip3 install pandas- to start py_grep use the following command in the repository directory
$PWD/py_grep.py --help- make sure docker is installed at your system as described under https://docs.docker.com/get-docker/
- locate your data to the git repository
- use following command in the repository directory to pull the docker-image to your system, mount all data in the current directory to the container-directory "/input"* and execute it
docker run --rm -it -v $PWD:/input dataspott/sgt_analyser:v0.9.1- inside the docker a python3 environment is pre-installed and py_grep can be executed as follows
/input/py_grep.py --helppy_grep offers support for xls-, tsv- and csv-files. Conversion is done automatically.
Following flags can be used to control the program:
Necessary flags
| Flag | Description |
|---|---|
| [-i] [--input] | Path to the input file to read the data from |
| [-s] [--search] | One or multiple terms to search for |
Optional flags
| Flag | Description |
|---|---|
| [-c] [--columns] | One or multiple columns of the table to search through. By default all columns will be searched [default: all]. |
| [-d] [--deact_search_str] | Deactivates the output of the information string (which terms were searched in which columns) in the first row of the result-file. |
| [-n] [--sheet] | Number or name of the excel-sheet, which is taken as input. You can also specify a range of sheets without whitespaces, e.g. '1-4'. By default all sheets of the file are taken [default: None]. |
| [-o] [--output] | Output-directory where py_grep creates its result-directory. By default the output-directory is the current working directory [default: $PWD]. |
| [-r] [--reverse] | Reverses the search, so that py_grep searches for everything except the search-terms given via the [--search]-flag. |
| [--md] | Activates output of the result-file in .md-format. |
| [--csv] | Activates output of the result-file in .csv-format. |
| [--tsv] | Deactivates output of the result-file in .tsv-format. |
| [--isnull] | Searches for NaN (Overwrites [-s]). |
| [--notnull] | Searches for notNaN (Overwrites [-s]). |
| [-h] [--help] | Shows the help-message. |