Install Dependencies
Linux/macOS
apt-get/yum/brew install libreoffice
Windows
Install LibreOffice
Append "install_dir\LibreOffice\program" to the PATH environment variable
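Before installing Magic-Doc, it can help to confirm that LibreOffice is actually reachable from PATH. The following is a minimal sketch, not part of Magic-Doc: the helper name `find_libreoffice` is hypothetical, and it assumes the launcher is exposed as `soffice` or `libreoffice`, which is the usual case but may differ on some installations.

```python
import shutil


def find_libreoffice(candidates=("soffice", "libreoffice")):
    """Return the full path of the first LibreOffice launcher found on PATH, or None.

    The candidate names are an assumption; adjust them if your installation
    exposes LibreOffice under a different executable name.
    """
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    location = find_libreoffice()
    if location:
        print(f"LibreOffice found at {location}")
    else:
        print("LibreOffice not found on PATH; revisit the dependency step")
```

If the check reports nothing on Windows, the `program` directory has most likely not been added to PATH yet, or the terminal was opened before the change.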
Install Magic-Doc
git clone https://github.com/magicpdf/Magic-Doc (#TODO)
cd Magic-Doc
pip install -r requirements.txt
python setup.py install
Magic-Doc is a lightweight open-source tool that converts multiple file types (PPT/PPTX/DOC/DOCX/PDF) to Markdown. It supports both local files and files stored on S3.
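When feeding many files to the converter, it may be useful to filter out unsupported types first. The sketch below is an illustration only and does not use any Magic-Doc API: the helper names are hypothetical, the extension set is taken from the list above, and treating inputs that start with `s3://` as S3 files is an assumption about how S3 paths are written.

```python
from pathlib import PurePosixPath

# File types listed in this README as supported inputs.
SUPPORTED_EXTENSIONS = {".ppt", ".pptx", ".doc", ".docx", ".pdf"}


def is_s3_path(path: str) -> bool:
    """Assumption: S3 inputs are written as s3:// URIs."""
    return path.lower().startswith("s3://")


def is_supported(path: str) -> bool:
    """Check the file extension (case-insensitive) against the supported set."""
    return PurePosixPath(path).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for candidate in ["some_doc.pptx", "s3://bucket/report.PDF", "notes.txt"]:
        kind = "S3" if is_s3_path(candidate) else "local"
        status = "supported" if is_supported(candidate) else "unsupported"
        print(f"{candidate}: {kind}, {status}")
```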
from magic_doc.docconv import DocConverter, S3Config
s3_config = S3Config(ak='${ak}', sk='${sk}', endpoint='${endpoint}')
converter = DocConverter(s3_config=s3_config)
markdown_content, time_cost = converter("some_doc.pptx", "/tmp/convert_progress.txt", conv_timeout=300)

| File Type | Speed (pages/s) |
|---|---|
| PDF (digital) | 347 |
| PDF (OCR) | 2.7 |
| PPT | 20 |
| PPTX | 149 |
| DOC | 600 |
| DOCX | 1482 |
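The figures in the table can be turned into a rough time estimate for a given document. This is a back-of-the-envelope sketch: the function and dictionary names are hypothetical, and it assumes the listed rates hold on your hardware, which the table does not guarantee.

```python
# Throughput in pages per second, copied from the table above.
PAGES_PER_SECOND = {
    "pdf_digital": 347,
    "pdf_ocr": 2.7,
    "ppt": 20,
    "pptx": 149,
    "doc": 600,
    "docx": 1482,
}


def estimate_seconds(file_type: str, pages: int) -> float:
    """Estimate conversion time as pages divided by the listed throughput."""
    return pages / PAGES_PER_SECOND[file_type]


if __name__ == "__main__":
    for file_type in PAGES_PER_SECOND:
        seconds = estimate_seconds(file_type, 1000)
        print(f"{file_type}: about {seconds:.1f} s for 1000 pages")
```

At these rates, OCR is by far the slowest path: a 1,000-page scanned PDF works out to roughly 370 seconds, so a generous timeout is worth considering for large OCR jobs.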
This project is released under the Apache 2.0 license.