jpate/ShakesEM
Author: John K Pate
Release date: Jan 25 2010
E-mail: j.k.pate@sms.ed.ac.uk
This program is free software: you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the Free
Software Foundation, either version 3 of the License, or (at your option)
any later version.
This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
more details.
You should have received a copy of the GNU General Public License along with
this program. If not, see <http://www.gnu.org/licenses/>.
This is the first release of ShakesEM, a library for running
Expectation-Maximization on Probabilistic Context-Free Grammars. To compile the
library, run:
$ scalac ShakesEM.scala
The name of the library, ShakesEM, is a reference to William Shakespeare due to
the library's use of Scala Actors for distributed processing.
The ``example'' directory shows a basic use of the library. It contains an
example grammar file, an example lexicon, a corpus of 10 (mostly nonsense)
sentences, and a directory that stores resulting grammars. The rest of the files
were generated with:
$ scala shakesEMExample toyGrammar.txt toyLexicon.txt testSentences.txt 2 \
0.001 exampleOutput/exampleRun &> exampleRun.log
The number following ``testSentences.txt'' in the command above (here, 2) is the
number of parsers to start. You can start any number of parsers up to and
including the number of sentences in your corpus. If you start fewer parsers
than you have processor cores, only as many cores as there are parsers will be
used. If you start more parsers than you have processor cores, all cores will be
used and the parsers will share computing resources transparently.
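As a rough sketch of the rule above, the helper below picks a parser count that
exceeds neither the number of sentences nor the number of cores. The function
name choose_parsers is hypothetical and not part of ShakesEM. It also assumes
the corpus holds one sentence per non-empty line, which this README does not
state; adjust the count if your corpus is laid out differently.

```shell
# Hypothetical helper, not part of ShakesEM.
# Usage: choose_parsers CORPUS_FILE NUM_CORES
# Prints the smaller of NUM_CORES and the number of non-empty lines
# in CORPUS_FILE.
# Assumption: the corpus has one sentence per non-empty line.
choose_parsers() {
    corpus=$1
    cores=$2
    # grep -c . counts lines that contain at least one character
    sentences=$(grep -c . "$corpus")
    if [ "$sentences" -lt "$cores" ]; then
        echo "$sentences"
    else
        echo "$cores"
    fi
}
```

On a system with GNU coreutils, for example, the output of
choose_parsers testSentences.txt "$(nproc)" could be used in place of the 2 in
the example command. Starting more parsers than cores is also allowed; capping
at the core count is only one reasonable choice.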
Note that both scalac and scala accept the '-d' flag: scalac uses it to decide
where to place JVM bytecode, and scala uses it to decide where to search for it.
The ``scaladoc'' directory contains documentation generated by scaladoc (similar
to javadoc).