WGBH-MLA/transcript_converter

AAPB Transcript Converter

Python package that reads MMIF files and creates transcripts in AAPB Transcript JSON, WebVTT, and other formats, along with associated transcript metadata.

These routines require existing MMIF files that contain audio annotations, as produced by a CLAMS ASR app such as the CLAMS whisper-wrapper app.

The package is designed to be used by other Python modules by calling the mmif_to_all function. It is also used by clams-kitchen. A basic version of this package can be used from the CLI.

Installation

Clone the repository. Change to the repository directory and run pip install . to install the package and its dependencies.

(For developers, run pip install -e . to install in editable mode.)

Usage

CLI

If you have an existing MMIF file, you can create an AAPB JSON transcript and its associated TPME from the CLI by running:

aatc PATH/TO/YOURFILE.mmif

To see additional options, run

aatc -h 
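
To run the CLI over a whole directory of MMIF files, a small Python wrapper is one option. The following is a minimal sketch, not part of the package: the helper names `build_aatc_commands` and `run_all` are hypothetical, and it assumes that `aatc` is on your PATH after installation and that your files use the `.mmif` extension. It relies only on the `aatc PATH/TO/YOURFILE.mmif` invocation shown above.

```python
import subprocess
from pathlib import Path


def build_aatc_commands(mmif_dir):
    """Return one `aatc <path>` argument list per .mmif file in `mmif_dir`.

    Hypothetical helper: it only builds the commands, so the list can be
    inspected before anything is executed.
    """
    files = sorted(Path(mmif_dir).glob("*.mmif"))
    return [["aatc", str(path)] for path in files]


def run_all(mmif_dir):
    """Run each command in turn, raising on the first non-zero exit code."""
    for cmd in build_aatc_commands(mmif_dir):
        # Assumes the `aatc` executable is available on PATH.
        subprocess.run(cmd, check=True)
```

Calling `run_all("PATH/TO/YOUR/MMIF/DIR")` would then invoke `aatc` once per file. Run `aatc -h` first to check which options you may want to add to each command.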

Importing into other Python projects

This package is intended to be used in other Python projects, primarily via a single function called mmif_to_all. That function takes a string of MMIF and returns a dictionary of strings containing transcripts and transcript metadata in various formats.

Sample code:

import transcript_converter as tc

print("transcript_converter version:", tc.__version__)

mmif_dirpath = "PATH/TO/YOUR/MMIF/DIR"
mmif_filename = "YOUR_ITEM.mmif"
mmif_path = mmif_dirpath + "/" + mmif_filename
with open(mmif_path, "r") as f:
    mmif_str = f.read()

d = tc.mmif_to_all(mmif_str, item_id="YOUR_ITEM_ID", mmif_filename=mmif_filename)

print("Keys in dictionary from `mmif_to_all`:")
for k in d:
    print(k)

print("TPME data from transcript in AAPB JSON format:")
print(d["tpme_aajson"])
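
Because mmif_to_all returns a dictionary of strings, saving every output to disk can be done with a short loop. The sketch below is an illustration, not part of the package: `save_outputs` is a hypothetical helper, and the `<item_id>.<key>.txt` naming scheme is a placeholder, since this README does not specify file names or extensions for each key.

```python
from pathlib import Path


def save_outputs(outputs, out_dir, item_id):
    """Write each string in `outputs` to its own file and return the paths.

    `outputs` is assumed to be a dict that maps keys to strings, such as the
    dictionary returned by `mmif_to_all`. The file naming is a placeholder.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    written = []
    for key, text in outputs.items():
        if not isinstance(text, str):
            # Skip any value that is not a plain string.
            continue
        target = out_path / f"{item_id}.{key}.txt"
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written
```

With the sample code above, `save_outputs(d, "PATH/TO/OUTPUT/DIR", "YOUR_ITEM_ID")` would write one file per key in `d`.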

For full usage details of the mmif_to_all function, see its docstring in convert.py, or run

import transcript_converter as tc
help(tc.mmif_to_all)

About

Routines to convert MMIF transcripts into other formats and create transcript metadata
