Chapter 03 #24
sunagparasu
started this conversation in
General
What is semanticClimate?
semanticClimate is a project that aims to convert climate-related documents published by the IPCC into a semantic form that both humans and computers can understand.
Why are such conversions required?
The goal is to extract useful information from the IPCC documents so that it is accessible for everyone to read and understand. This is achieved by creating different types of dictionaries for different purposes.
What is chapter 03 about?
Summary of the Chapter
Environment and Origin
Linux OS: Debian
Python version 3.8.0
docanalysis version 0.2.0
py4ami version 0.0.45
IDE used: Anaconda v.3.0
Tools used: py4ami and docanalysis
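To check that an environment matches the versions listed above, a small script along these lines may help. It is a sketch that relies only on the standard library's `importlib.metadata` (available from Python 3.8); the helper name `installed_version` is hypothetical, and the package names are assumed to be the same as the tool names given in this post.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Versions reported in this post; adjust them if your setup differs.
    expected = {"py4ami": "0.0.45", "docanalysis": "0.2.0"}
    for name, wanted in expected.items():
        found = installed_version(name)
        status = "ok" if found == wanted else "differs"
        print(f"{name}: expected {wanted}, found {found} ({status})")
```

A mismatch does not necessarily mean the steps will fail; it only indicates that the results may differ from those described here.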
How we set up the software:
    ./anaconda-navigator
    pip install py4ami
    pip install docanalysis
How to create dictionaries?
We create three dictionaries: an abbreviation dictionary, a manual dictionary, and a keyword dictionary.
Create a .html file from the specific chapter PDF using the py4ami program. This can be done with the following command:
    python -m py4ami.ami_pdf --inpath /home/anything/Documents/semanticClimate/ipcc/ar6/wg3/Chapter03/fulltext.pdf --outdir {insert output directory here} --maxpage 110
The abbreviation dictionary is created by docanalysis using the spaCy method, following the steps below:
    mkdir wiki_hackathon
    cd wiki_hackathon
    mkdir Chapter03
    cd Chapter03
    mkdir sections
    cd sections
    mkdir 0_main_body
Place the .html file of your chapter inside the 0_main_body directory and then enter the following command:
    docanalysis --project_name wiki_hackathon --output dict_search_5.csv --make_json dict_search_5.json --make_ami_dict entities --extract_abb ip_3_3_urban_abb
where,
--project_name: the name of the project (here, wiki_hackathon)
--output: a CSV file for the dictionary search (not used here, but it must be created)
--make_json: include this option; it is not currently used, but it is required
--make_ami_dict: uses the entities created in the above command
--extract_abb: the abbreviation dictionary that is the output
The keywords dictionary is created using the keyword extraction program, which uses the gensim method.
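The directory layout used for the abbreviation dictionary can also be prepared from Python instead of typing each `mkdir` and `cd` by hand. The following is a minimal sketch using only the standard library; the function name `prepare_project` and its parameters are hypothetical, and it only creates the folders and copies the chapter file. It does not run docanalysis.

```python
import shutil
from pathlib import Path


def prepare_project(root: Path, chapter_html: Path) -> Path:
    """Create wiki_hackathon/Chapter03/sections/0_main_body under root
    and copy the chapter HTML file into it.

    Returns the path of the copied file.
    """
    target_dir = root / "wiki_hackathon" / "Chapter03" / "sections" / "0_main_body"
    # parents=True builds the whole chain; exist_ok=True makes reruns harmless.
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy(chapter_html, target_dir))
```

For example, `prepare_project(Path.cwd(), Path("fulltext.html"))` would build the folders in the current directory, assuming the chapter file is called `fulltext.html` (a placeholder name).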
The manual dictionary is created by hand by the chapter champions, who read the chapter and pick out words or bigrams that are rarely used or are difficult to understand in the context of the report.
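The manual dictionary is built by reading, not by a program, but a quick frequency count can give chapter champions a shortlist of rarely used words and bigrams to review. The sketch below is only an aid under that assumption; the function name `rare_terms`, the simple tokenisation rule, and the `max_count` threshold are all hypothetical choices, not part of py4ami or docanalysis.

```python
import re
from collections import Counter
from typing import List, Tuple


def rare_terms(text: str, max_count: int = 1) -> Tuple[List[str], List[str]]:
    """Return (words, bigrams) that occur at most max_count times in text."""
    # Lower-case alphabetic tokens of two or more characters; hyphens are kept.
    tokens = re.findall(r"[a-z][a-z\-]+", text.lower())
    word_counts = Counter(tokens)
    # Bigrams are pairs of adjacent tokens.
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    words = sorted(w for w, c in word_counts.items() if c <= max_count)
    bigrams = sorted(" ".join(b) for b, c in bigram_counts.items() if c <= max_count)
    return words, bigrams
```

The output is only a list of candidates; deciding which terms are actually difficult in the context of the report still needs a human reader.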
Chapter Annotation:
The complete chapter .html file is annotated with the abbreviations from the abbreviations dictionary to make it easier to read. This is done using the following command:
    py4ami HTML --annotate --dict <dict_path> --inpath <html_path> --outpath <outdir_path> --color <color>
where,
dict_path: dictionary used for annotation.
html_path: HTML file to be annotated.
outdir_path: output directory for the annotated HTML file.
color: color used to highlight annotations (a symbolic name, e.g. YELLOW, or an RGB value, e.g. '#ff7700').
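To give a feel for what the annotation step produces, here is a minimal sketch that highlights dictionary terms in an HTML string. It is not how py4ami implements annotation: the function name `highlight_terms`, the sample terms, and the use of a `<span>` with a background colour are all assumptions made for illustration. It matches on the raw string, so it assumes the terms do not occur inside tag names or attributes.

```python
import re
from typing import Iterable


def highlight_terms(html: str, terms: Iterable[str], color: str = "yellow") -> str:
    """Wrap whole-word occurrences of each term in a coloured <span>."""
    # Longest terms first so that a longer term wins over a shorter one it contains.
    ordered = sorted(set(terms), key=len, reverse=True)
    if not ordered:
        return html
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b")
    return pattern.sub(
        lambda m: f'<span style="background-color: {color}">{m.group(1)}</span>',
        html,
    )


if __name__ == "__main__":
    # Made-up sample text and terms, used only to demonstrate the idea.
    sample = "<p>GHG emissions and CDR are discussed in this chapter.</p>"
    print(highlight_terms(sample, ["GHG", "CDR"], color="#ff7700"))
```

A real annotator would parse the HTML and only touch text nodes; the string-based approach here is kept short on purpose.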