SIMPATICOProject/simpa

SimPA corpus

This is the first release of the SimPA corpus. It contains:

  • 1,100 original sentences
  • 3,300 lexical simplifications (3 for each original sentence)
  • 1,100 syntactic simplifications (1 for each original sentence)

The simplifications were produced in two steps. First, volunteers lexically simplified the original sentences. Then, a set of the lexically simplified sentences was syntactically simplified, also by volunteers.

The corpus is divided into five files:

  • ls.original: original sentences before lexical simplification (this file contains repetitions: each original sentence appears three times)
  • ls.simplified: lexical simplifications for each entry of ls.original
  • ss.original: original sentences before lexical and syntactic simplification
  • ss.ls-simplified: lexically simplified sentences used as input for the syntactic simplification task
  • ss.simplified: syntactic simplifications for each entry of ss.ls-simplified
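
The file layout above can be read with a few lines of Python. This is only a minimal sketch: it assumes that each file is UTF-8 plain text with one sentence per line and that related files are aligned line by line, neither of which is stated explicitly here. The helper names (`read_lines`, `load_parallel`, `group_lexical`) are hypothetical and not part of the corpus.

```python
from collections import defaultdict
from pathlib import Path


def read_lines(path):
    """Return the lines of a UTF-8 text file without trailing newlines."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def load_parallel(corpus_dir, *names):
    """Read several corpus files and zip them line by line.

    Assumption: the named files are line-aligned, i.e. line i of one
    file corresponds to line i of the others.
    """
    columns = [read_lines(Path(corpus_dir) / name) for name in names]
    lengths = [len(column) for column in columns]
    if len(set(lengths)) > 1:
        # Fail loudly instead of silently truncating to the shortest file.
        raise ValueError(f"files are not line-aligned: {dict(zip(names, lengths))}")
    return list(zip(*columns))


def group_lexical(pairs):
    """Map each original sentence to the list of its lexical simplifications."""
    grouped = defaultdict(list)
    for original, simplified in pairs:
        grouped[original].append(simplified)
    return dict(grouped)
```

For example, `load_parallel("simpa", "ls.original", "ls.simplified")` would return (original, simplification) pairs, where `"simpa"` stands for wherever the files were downloaded. Passing the result to `group_lexical` collapses the repeated originals into one entry each. `load_parallel("simpa", "ss.original", "ss.ls-simplified", "ss.simplified")` would return triples covering both simplification steps.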

Citing SimPA

Carolina Scarton, Gustavo Henrique Paetzold and Lucia Specia (2018): SimPA: A Sentence-Level Simplification Corpus for the Public Administration Domain. To appear in Proceedings of LREC 2018, Miyazaki, Japan. [PDF] [BIBTEX]
