Python package to read MMIF files to create transcripts in AAPB Transcript JSON, WebVTT, and other formats, along with associated transcript metadata.
These routines require existing MMIF files containing annotations of audio, as produced by a CLAMS ASR app, like the CLAMS whisper-wrapper app.
The package is designed to be used by other Python modules by calling the mmif_to_all function. It is also used by clams-kitchen. A basic version of this package can be used from the CLI.
Clone the repository. Change to the repository directory and do a pip install . to install the package and its dependencies.
(For developers, do pip install -e . to install in editable mode.)
If you have an existing MMIF file, you can create a transcript in AAPB JSON and associated TPME, via the CLI, by running
aatc PATH/TO/YOURFILE.mmif
To see additional options, run
aatc -h
This package is intended to be used in other Python projects, primarily through the mmif_to_all function. That function takes a string of MMIF and returns a dictionary of strings containing transcripts and transcript metadata in various formats.
Sample code:
import transcript_converter as tc
print("transcript_converter version:", tc.__version__)
mmif_dirpath = "PATH/TO/YOUR/MMIF/DIR"
mmif_filename = "YOUR_ITEM.mmif"
mmif_path = mmif_dirpath + "/" + mmif_filename
with open(mmif_path, "r") as f:
    mmif_str = f.read()
d = tc.mmif_to_all( mmif_str, item_id="YOUR_ITEM_ID", mmif_filename=mmif_filename )
print("Keys in dictionary from `mmif_to_all`:")
for k in d:
    print(k)
print("TPME data from transcript in AAPB JSON format:")
print(d["tpme_aajson"])
For full usage details of the mmif_to_all function, see its docstring in convert.py, or run
import transcript_converter as tc
help(tc.mmif_to_all)
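Because mmif_to_all returns a dictionary of strings, one way to keep the results is to write each entry to its own file. The following is a minimal sketch, not part of the package: the helper name write_outputs and the "<item_id>.<key>.txt" file-naming scheme are assumptions made for illustration, and non-string values are skipped as a precaution.

```python
from pathlib import Path


def write_outputs(outputs, out_dir, item_id):
    """Write each string in `outputs` to its own file and return the paths.

    Hypothetical helper: the naming scheme "<item_id>.<key>.txt" is an
    assumption for this sketch, not a convention of transcript_converter.
    """
    out_path = Path(out_dir)
    # Create the output directory if it does not exist yet.
    out_path.mkdir(parents=True, exist_ok=True)

    written = []
    for key, text in outputs.items():
        # Skip anything that is not a string, in case the dictionary
        # contains other kinds of values.
        if not isinstance(text, str):
            continue
        target = out_path / f"{item_id}.{key}.txt"
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written
```

With the dictionary d from the sample code above, this could be called as write_outputs(d, "PATH/TO/OUTPUT/DIR", "YOUR_ITEM_ID"). A more specific file extension per key may be preferable once the keys are known.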