forked from magicpdf/Magic-Doc

conversion doc(pdf/html/doc/docx/ppt/pptx)to markdown


DTwz/Magic-Doc

 
 


Install

Install Dependencies

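Linux/macOS

Install LibreOffice with your system's package manager (apt-get, yum, or brew), for example:

apt-get install libreoffice
yum install libreoffice
brew install libreoffice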

Windows

Install LibreOffice, then append "install_dir\LibreOffice\program" to the PATH environment variable, where install_dir is the directory LibreOffice was installed into.
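One way to confirm that LibreOffice is reachable from PATH is to look up its executable. The sketch below is a hypothetical helper, not part of Magic-Doc. It assumes the executable is named soffice (LibreOffice's standard binary name) or libreoffice (a wrapper that some Linux packages add). It only checks PATH and says nothing about how Magic-Doc itself locates LibreOffice.

```python
import shutil
from typing import Optional

# Assumed executable names: "soffice" is LibreOffice's main binary,
# and some Linux packages also install a "libreoffice" wrapper.
CANDIDATES = ("soffice", "libreoffice")


def find_libreoffice() -> Optional[str]:
    """Return the full path of the first candidate executable found on PATH, or None."""
    for name in CANDIDATES:
        # shutil.which searches PATH (and honours PATHEXT on Windows, so soffice.exe matches)
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    found = find_libreoffice()
    if found:
        print("LibreOffice found at:", found)
    else:
        print("LibreOffice not found on PATH; check the installation and the PATH setting.")
```

If the helper prints "not found" on Windows, open a new terminal after editing PATH. Already-running shells keep the old value.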

Install Magic-Doc

git clone https://github.com/magicpdf/Magic-Doc  # TODO
cd Magic-Doc
pip install -r requirements.txt
python setup.py install
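To check that the installation succeeded, Python's import system can report whether the magic_doc package (the package imported in the Example section) is importable. This is a minimal sketch with a hypothetical helper name, not a Magic-Doc command.

```python
import importlib.util


def magic_doc_installed() -> bool:
    """Return True if the top-level magic_doc package can be found on the import path."""
    # find_spec locates a top-level module without executing it; None means "not found"
    return importlib.util.find_spec("magic_doc") is not None


if __name__ == "__main__":
    print("magic_doc installed:", magic_doc_installed())
```

Run it with the same Python interpreter used for pip install. Otherwise it may look in a different environment.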

Introduction

Magic-Doc is a lightweight open-source tool that converts multiple file types (PPT/PPTX/DOC/DOCX/PDF) to Markdown. It supports both local files and files stored in S3.

Example

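from magic_doc.docconv import DocConverter, S3Config

# S3 access key, secret key and endpoint: replace the ${...} placeholders with your own values
s3_config = S3Config(ak='${ak}', sk='${sk}', endpoint='${endpoint}')
converter = DocConverter(s3_config=s3_config)

# Convert a document to markdown; returns the markdown text and the time spent converting.
# Second argument: path of a conversion progress file; conv_timeout: conversion timeout.
markdown_content, time_cost = converter("some_doc.pptx", "/tmp/convert_progress.txt", conv_timeout=300)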

Performance

File Type        Speed (pages/s)
PDF (digital)    347
PDF (OCR)        2.7
PPT              20
PPTX             149
DOC              600
DOCX             1482
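To put these throughput figures in perspective, the sketch below turns a page count into a rough time estimate by dividing pages by pages per second. The helper and its dictionary are illustrative only and are not part of Magic-Doc. The numbers are copied from the table, whose measurement conditions (hardware, documents) are not stated, so real timings may differ.

```python
# Throughput figures copied from the Performance table, in pages per second.
SPEED_PAGES_PER_S = {
    "PDF (digital)": 347,
    "PDF (OCR)": 2.7,
    "PPT": 20,
    "PPTX": 149,
    "DOC": 600,
    "DOCX": 1482,
}


def estimate_seconds(file_type: str, pages: int) -> float:
    """Estimate conversion time in seconds as pages / throughput for the given file type."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    # Raises KeyError for file types not listed in the table
    return pages / SPEED_PAGES_PER_S[file_type]


if __name__ == "__main__":
    for kind in ("PDF (digital)", "PDF (OCR)"):
        print(f"{kind}: 100 pages ~ {estimate_seconds(kind, 100):.1f} s")
```

For instance, at 2.7 pages/s a 100-page scanned PDF works out to about 37 seconds. The same number of digital PDF pages takes under a third of a second.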

All Thanks To Our Contributors:

License

This project is released under the Apache 2.0 license.


Languages

  • Python 59.7%
  • XSLT 40.3%