textParser is a tool to extract useful information from academic papers.
textParser needs the library pyparsing:
http://pyparsing.wikispaces.com/
Please download the package and put it under the main folder.
Alternatively, you can install pyparsing as a regular Python package; in that case, please modify "./scripts/chromatin.py" accordingly.
In most cases, academic papers are in PDF format.
Please use pdfbox (https://pdfbox.apache.org/download.cgi) to convert the pdf file into a txt file, and put the txt file in the folder "./txt".
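The conversion step can be scripted. The sketch below is not part of textParser; it assumes the PDFBox 2.x command-line app, whose text extraction subcommand is ExtractText (PDFBox 3.x renamed it to export:text with -i and -o options). The function names, the jar file name, and the choice to name the output after the pdf file are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def build_extract_command(jar_path, pdf_path, txt_dir="./txt"):
    """Build the PDFBox 2.x ExtractText command for one pdf file.

    The output name (<pdf stem>.txt inside txt_dir) is an assumption;
    adjust it to whatever file name textParser expects.
    """
    pdf = Path(pdf_path)
    out = Path(txt_dir) / (pdf.stem + ".txt")
    return ["java", "-jar", str(jar_path), "ExtractText", str(pdf), str(out)]


def pdf_to_txt(jar_path, pdf_path, txt_dir="./txt"):
    """Run the conversion and return the path of the txt file written."""
    Path(txt_dir).mkdir(parents=True, exist_ok=True)
    cmd = build_extract_command(jar_path, pdf_path, txt_dir)
    # check=True raises CalledProcessError if java or PDFBox fails.
    subprocess.run(cmd, check=True)
    return cmd[-1]
```

For example, pdf_to_txt("pdfbox-app-2.0.30.jar", "papers/12345678.pdf") would write the text to txt/12345678.txt, provided Java and the jar (a placeholder name here) are available.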
This parser has two parsing modes:
mode 0: histone grammar as X + Y + Z
mode "mapName": parsing protein/drugs/disease information.
For mode "mapName", you need to build your own map to parse the information you want. The maps are in .dat format. Please put your maps in ./scripts/maps.
For usage, see the file "call.py".
Input should be the pmid of the article you want to parse.
Remember to put the txt file and the xml file of the article in the folders ./txt and ./xml, respectively.
The outputs are in the folder ./csv. If you use mode 0 for parsing, the outputs are in ./csv/histone. If you use mode "mapName", the outputs are in ./csv/yourMapName.
The output format is .csv
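
The folder layout described above can be checked before a run with a small helper. This is a sketch only and not part of textParser: the function names are hypothetical, and it assumes the input files are named after the pmid (<pmid>.txt and <pmid>.xml), which this README does not state.

```python
from pathlib import Path


def expected_paths(pmid, mode=0, root="."):
    """Return the input files and output folder for one article.

    mode 0 maps to ./csv/histone; any other value is treated as a
    map name and maps to ./csv/<mapName>.
    """
    root = Path(root)
    out_name = "histone" if mode == 0 else str(mode)
    return {
        "txt": root / "txt" / f"{pmid}.txt",  # assumed file name
        "xml": root / "xml" / f"{pmid}.xml",  # assumed file name
        "csv_dir": root / "csv" / out_name,
    }


def missing_inputs(pmid, root="."):
    """List the input files that are not yet in place for this pmid."""
    paths = expected_paths(pmid, root=root)
    return [str(paths[key]) for key in ("txt", "xml") if not paths[key].is_file()]
```

Calling missing_inputs("12345678") from the main folder returns an empty list once both input files are in place.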