textParser

textParser is a tool for extracting structured information, such as histone modifications and protein, drug, or disease mentions, from academic papers.

Prerequisites

textParser requires the pyparsing library:

http://pyparsing.wikispaces.com/

Download the package and place it in the project's main folder.

Alternatively, you can install pyparsing as a regular Python package; in that case, modify "./scripts/chromatin.py" accordingly.

Before parsing

In most cases, academic papers are distributed as PDF files.

Use PDFBox (https://pdfbox.apache.org/download.cgi) to convert each PDF file into a plain-text (.txt) file, and put the resulting file in the folder "./txt".
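If you convert many papers, the step can be scripted. The following is a minimal sketch, assuming the PDFBox 2.x standalone jar (pdfbox-app-2.x.y.jar), whose ExtractText command takes an input PDF and an output text file. The jar name, the helper names, and the convention of naming the text file after the article's PMID are assumptions, not something textParser prescribes.

```python
import subprocess
from pathlib import Path


def pdfbox_command(jar, pdf_path, pmid, txt_dir="./txt"):
    """Build a PDFBox 2.x ExtractText command line.

    Assumption: the output file is named <pmid>.txt inside txt_dir.
    """
    out_path = Path(txt_dir) / f"{pmid}.txt"
    return ["java", "-jar", str(jar), "ExtractText", str(pdf_path), str(out_path)]


def convert_pdf(jar, pdf_path, pmid, txt_dir="./txt"):
    """Run the conversion; requires Java and the PDFBox app jar on disk."""
    Path(txt_dir).mkdir(parents=True, exist_ok=True)
    cmd = pdfbox_command(jar, pdf_path, pmid, txt_dir)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if PDFBox fails
    return Path(cmd[-1])
```

For example, convert_pdf("pdfbox-app-2.0.30.jar", "paper.pdf", "12345") would write ./txt/12345.txt; replace the jar name with the one you downloaded. PDFBox 3.x changed its command-line syntax, so check the documentation for your version.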

How to use

This parser has two parsing modes:

mode 0: parses histone information using the grammar X + Y + Z

mode "mapName": parses protein, drug, or disease information using the map named mapName
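To illustrate what an X + Y + Z grammar can look like, here is a hypothetical sketch that reads a histone mark such as H3K4me3 as histone (X) + modified residue (Y) + modification (Z). textParser builds its own grammar with pyparsing in its scripts; this sketch swaps in Python's standard re module so it runs without extra dependencies, and the sets of histones, residues, and modifications below are assumptions.

```python
import re

# Hypothetical reading of the "X + Y + Z" histone grammar.
HISTONE_MARK = re.compile(
    r"\b(?P<histone>H2A|H2B|H3|H4|H1)"        # X: histone, e.g. H3
    r"(?P<residue>[KRSTY]\d+)"                # Y: amino-acid letter + position, e.g. K4
    r"(?P<modification>me[123]|ac|ph|ub)\b"   # Z: modification, e.g. me3, ac
)


def find_histone_marks(text):
    """Return (histone, residue, modification) triples found in text."""
    return [
        match.group("histone", "residue", "modification")
        for match in HISTONE_MARK.finditer(text)
    ]
```

For instance, find_histone_marks("H3K4me3 and H3K27ac mark promoters") returns [("H3", "K4", "me3"), ("H3", "K27", "ac")].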

How to build the map

To parse the information you are interested in, you need to build your own map. Maps are stored as .dat files; put your maps in ./scripts/maps.
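The layout of a .dat map is not described here, so the following is only a hypothetical sketch: it assumes one term per line, with blank lines and lines starting with # ignored, and shows how such a map (for example ./scripts/maps/drugs.dat) could be loaded and matched against an article's text. The function names and the example file name are illustrative, not part of textParser.

```python
import re
from pathlib import Path


def parse_map(lines):
    """Return the terms of a map, assuming one term per line (hypothetical format)."""
    terms = []
    for line in lines:
        term = line.strip()
        if term and not term.startswith("#"):  # skip blank lines and comments
            terms.append(term)
    return terms


def load_map(map_name, maps_dir="./scripts/maps"):
    """Load <maps_dir>/<map_name>.dat and return its terms."""
    path = Path(maps_dir) / f"{map_name}.dat"
    with open(path, encoding="utf-8") as handle:
        return parse_map(handle)


def find_terms(text, terms):
    """Return the map terms that occur in text as whole words, ignoring case."""
    found = []
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, re.IGNORECASE):
            found.append(term)
    return found
```

Matching on whole words keeps a term such as EGFR from matching inside a longer token like EGFRvIII.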

To call the function

See the file "call.py" for an example.

The input is the PMID (PubMed ID) of the article you want to parse.

Remember to put the article's .txt file in ./txt and its .xml file in ./xml.
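Before calling the parser, it can help to check that both input files are in place. This is a minimal sketch that assumes the files are named <pmid>.txt and <pmid>.xml; the helper names are hypothetical, and the sketch does not call anything in call.py.

```python
from pathlib import Path


def input_paths(pmid, root="."):
    """Return the expected (txt, xml) paths, assuming <pmid>.txt and <pmid>.xml."""
    base = Path(root)
    return base / "txt" / f"{pmid}.txt", base / "xml" / f"{pmid}.xml"


def missing_inputs(pmid, root="."):
    """List the expected input files that do not exist yet."""
    return [path for path in input_paths(pmid, root) if not path.is_file()]
```

An empty list from missing_inputs("12345") means both files for that PMID are where the parser expects them, under this naming assumption.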

Where you can find your output

The outputs are written to the folder ./csv. If you use mode 0, the outputs are in ./csv/histone; if you use mode "mapName", they are in ./csv/yourMapName.

All outputs are .csv files.
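The mapping from parsing mode to output folder described above can be expressed as a small helper. This is a sketch with hypothetical function names; it assumes only the folder layout stated in this README and makes no assumption about the file names or columns inside each folder.

```python
from pathlib import Path


def output_dir(mode, csv_root="./csv"):
    """Return the output folder for a parsing mode.

    Mode 0 writes to <csv_root>/histone; a map name writes to <csv_root>/<mapName>.
    """
    if mode == 0:
        return Path(csv_root) / "histone"
    return Path(csv_root) / str(mode)


def list_outputs(mode, csv_root="./csv"):
    """List the .csv files produced so far for a mode (empty if the folder is missing)."""
    folder = output_dir(mode, csv_root)
    return sorted(folder.glob("*.csv")) if folder.is_dir() else []
```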

shilingwang/textParser

Folders and files

Latest commit

History

Repository files navigation

textParser

Prerequisites

Before parsing

How to use

How to build the map

To call the function

Where you can find your output

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages