IstvanOri/HTML2MD

HTML2MD

Configurable HTML to Markdown converter

There are plenty of HTML-to-Markdown converters, but there are also many custom HTML tags and Markdown dialects. This project therefore aims to provide a configurable converter whose conversion rules can be altered.

I also use this project to learn Python :)

Mechanism

  1. Read input
  2. Transform input by simple transformations (e.g. string replace)
  3. Process HTML and translate to MD
  4. Transform output by simple transformations
  5. Write result
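The five steps can be sketched as a plain function pipeline. This is a hypothetical outline, not the project's main.py: the names `apply_all`, `run_pipeline` and `convert_file` are illustrative, and `html_to_md` stands in for the rulebook-driven conversion of step 3.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable

# A transformation is any function that maps text to text (hypothetical type)
Transform = Callable[[str], str]


def apply_all(text: str, transforms: list[Transform]) -> str:
    """Apply simple transformations (e.g. string replacements) in order."""
    for transform in transforms:
        text = transform(text)
    return text


def run_pipeline(html: str, pre: list[Transform], html_to_md: Transform,
                 post: list[Transform]) -> str:
    html = apply_all(html, pre)   # step 2: pre-transform the HTML
    md = html_to_md(html)         # step 3: rule-based HTML -> MD
    return apply_all(md, post)    # step 4: post-transform the MD


def convert_file(src: Path, dst: Path, pre: list[Transform],
                 html_to_md: Transform, post: list[Transform]) -> None:
    html = src.read_text(encoding="utf-8")          # step 1: read input
    md = run_pipeline(html, pre, html_to_md, post)
    dst.write_text(md, encoding="utf-8")            # step 5: write result
```

Keeping steps 2 and 4 as ordered lists of text-to-text functions is what lets them be driven from a plain configuration file.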

Transformations

Transformations are configured in pretransform.txt (applied before conversion, step 2) and postransform.txt (applied after conversion, step 4).

Currently available transformation methods:

  • LinkFixer()
  • Replace(target, replacement)
  • RemoveWhiteSpace()
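Conceptually, each transformation is a text-to-text function applied before or after the conversion. Below is a minimal Python sketch of two of them. The class shapes are assumptions, and the behavior of `RemoveWhiteSpace` in particular is guessed (here it drops whitespace runs between adjacent tags). `LinkFixer()` is left out because its behavior is not described here.

```python
import re


class Replace:
    """Plain string replacement: every occurrence of target becomes replacement."""

    def __init__(self, target: str, replacement: str) -> None:
        self.target = target
        self.replacement = replacement

    def __call__(self, text: str) -> str:
        return text.replace(self.target, self.replacement)


class RemoveWhiteSpace:
    """ASSUMPTION: removes whitespace-only runs between adjacent tags.
    The project's RemoveWhiteSpace() may behave differently."""

    _between_tags = re.compile(r">\s+<")

    def __call__(self, text: str) -> str:
        return self._between_tags.sub("><", text)


# Transformations run in the order they are listed
pre_transforms = [RemoveWhiteSpace(), Replace("&nbsp;", " ")]
html = "<p>a&nbsp;b</p>\n   <p>c</p>"
for transform in pre_transforms:
    html = transform(html)
print(html)  # <p>a b</p><p>c</p>
```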

Rules

Rules are configured in rulebook.txt

Currently available conversion commands:

  • Config(config_name, value)
  • Ignore()
  • Indent(indentation_prefix, is_firstline_indents)
  • IndentIn(indentation_prefix, is_firstline_indents, tag_list)
  • Strip()
  • Table(prefix, suffix)
  • Wrap(prefix, suffix[, allow_empty, line_by_line])
  • WrapIn(prefix, suffix[, allow_empty, line_by_line], tag_list)
  • WrapOut(prefix, suffix[, allow_empty, line_by_line], tag_list)
  • WrapWithAttribute(prefix, suffix, attr_name, attr_prefix, attr_suffix)
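To illustrate how per-tag rules can drive the conversion, here is a small sketch built on the standard library's `html.parser`. The classes `Wrap`, `Ignore` and `WrapWithAttribute` are hypothetical Python stand-ins named after the rulebook commands, not the project's own implementation; the output order of `WrapWithAttribute` (prefix, content, suffix, then the attribute value between attr_prefix and attr_suffix) and the `allow_empty=False` default are assumptions.

```python
from __future__ import annotations

from html.parser import HTMLParser


class Wrap:
    """Hypothetical stand-in for Wrap(prefix, suffix[, allow_empty])."""

    def __init__(self, prefix: str, suffix: str, allow_empty: bool = False) -> None:
        self.prefix, self.suffix, self.allow_empty = prefix, suffix, allow_empty

    def apply(self, content: str, attrs: dict) -> str:
        # Skip empty elements unless allow_empty is set
        if not content and not self.allow_empty:
            return ""
        return f"{self.prefix}{content}{self.suffix}"


class Ignore:
    """Drop the element together with everything inside it."""

    def apply(self, content: str, attrs: dict) -> str:
        return ""


class WrapWithAttribute:
    """ASSUMED output order: prefix content suffix attr_prefix value attr_suffix."""

    def __init__(self, prefix, suffix, attr_name, attr_prefix, attr_suffix):
        self.prefix, self.suffix = prefix, suffix
        self.attr_name = attr_name
        self.attr_prefix, self.attr_suffix = attr_prefix, attr_suffix

    def apply(self, content: str, attrs: dict) -> str:
        value = attrs.get(self.attr_name) or ""
        return (f"{self.prefix}{content}{self.suffix}"
                f"{self.attr_prefix}{value}{self.attr_suffix}")


class RuleConverter(HTMLParser):
    """Converts bottom-up: an element's rule receives its already-converted content."""

    def __init__(self, rulebook: dict) -> None:
        super().__init__(convert_charrefs=True)
        self.rulebook = rulebook
        # Stack of open elements: (tag, attrs, converted child parts)
        self.stack = [("", {}, [])]

    def handle_starttag(self, tag, attrs):
        self.stack.append((tag, dict(attrs), []))

    def handle_endtag(self, tag):
        # Naive: expects well-formed markup; void tags like <br> need <br/>
        if len(self.stack) > 1 and self.stack[-1][0] == tag:
            _, attrs, parts = self.stack.pop()
            content = "".join(parts)
            rule = self.rulebook.get(tag)  # tags without a rule pass through
            self.stack[-1][2].append(rule.apply(content, attrs) if rule else content)

    def handle_data(self, data):
        self.stack[-1][2].append(data)


def convert(html: str, rulebook: dict) -> str:
    parser = RuleConverter(rulebook)
    parser.feed(html)
    parser.close()
    return "".join(parser.stack[0][2])


rulebook = {
    "b": Wrap("**", "**"),
    "a": WrapWithAttribute("[", "]", "href", "(", ")"),
    "script": Ignore(),
}
print(convert('<p>Hi <b>bold</b> <a href="https://example.org">link</a>'
              '<script>track()</script></p>', rulebook))
# Hi **bold** [link](https://example.org)
```

Converting bottom-up means nested rules compose naturally: a `<b>` inside an `<a>` is already wrapped by the time the link rule runs.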

For details, see the rulebook.txt file.

How to Run

main.py [input_dir] [output_dir]

Where

  • input_dir is the directory containing the HTML files
  • output_dir is the directory the .md files are written to, mirroring the folder hierarchy of input_dir
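Mirroring the input hierarchy amounts to taking each HTML file's path relative to input_dir and re-rooting it under output_dir with a .md suffix. A sketch using `pathlib`, assuming input files end in `.html` (not `.htm`); `convert_tree` is a hypothetical name and `html_to_md` is whatever converter you plug in.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


def convert_tree(input_dir: Path, output_dir: Path,
                 html_to_md: Callable[[str], str]) -> list[Path]:
    """Convert every .html file under input_dir into a .md file under
    output_dir at the same relative location; return the written paths."""
    written = []
    for src in sorted(input_dir.rglob("*.html")):
        # e.g. input_dir/docs/page.html -> output_dir/docs/page.md
        dst = (output_dir / src.relative_to(input_dir)).with_suffix(".md")
        dst.parent.mkdir(parents=True, exist_ok=True)  # recreate subfolders
        md = html_to_md(src.read_text(encoding="utf-8"))
        dst.write_text(md, encoding="utf-8")
        written.append(dst)
    return written
```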
