Configurable HTML to MarkDown converter
There are plenty of converters from HTML to MD, but there are also many custom HTML tags and MarkDown dialects. This project therefore aims to provide a configurable converter whose conversion rules can be altered.
I also use this project to learn Python :)
- Read input
- Transform input by simple transformations (e.g. string replace)
- Process HTML and translate to MD
- Transform output by simple transformations
- Write result
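The steps above can be sketched as a small pipeline. This is only an illustration of the flow, not the project's actual code: the names apply_transforms, convert and toy_translate are hypothetical, and the translation step is a stand-in that handles a single tag.

```python
from typing import Callable, Iterable

# A transformation is any function that maps text to text.
Transform = Callable[[str], str]


def apply_transforms(text: str, transforms: Iterable[Transform]) -> str:
    """Run each simple transformation in order."""
    for transform in transforms:
        text = transform(text)
    return text


def convert(html: str,
            pre: Iterable[Transform],
            translate: Transform,
            post: Iterable[Transform]) -> str:
    """Pre-transform the input, translate HTML to MD, post-transform the output."""
    prepared = apply_transforms(html, pre)
    markdown = translate(prepared)
    return apply_transforms(markdown, post)


def toy_translate(html: str) -> str:
    # Hypothetical stand-in for the rule-driven HTML to MD step; only handles <b>.
    return html.replace("<b>", "**").replace("</b>", "**")


result = convert(
    "<b>bold</b>&nbsp;text\n\n\n",
    pre=[lambda s: s.replace("&nbsp;", " ")],  # simple string replace on the input
    translate=toy_translate,
    post=[str.strip],                          # simple cleanup on the output
)
print(result)
```

Reading the input and writing the result are left out here so that the sketch stays free of file handling.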
Transformations are configured in pretransform.txt and postransform.txt
Currently available transformation methods:
- LinkFixer()
- Replace(target, replacement)
- RemoveWhiteSpace()
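As a rough idea of how such transformation methods could be modelled in Python, the sketch below builds each one as a function that returns a text-to-text callable. The function names are hypothetical, and the behaviour of remove_white_space (trimming trailing whitespace on every line) is an assumption made for illustration; the project's RemoveWhiteSpace() may do something different. LinkFixer() is not sketched because its behaviour is not described here.

```python
from typing import Callable, List

Transform = Callable[[str], str]


def replace(target: str, replacement: str) -> Transform:
    """Plain string replacement, mirroring Replace(target, replacement)."""
    def run(text: str) -> str:
        return text.replace(target, replacement)
    return run


def remove_white_space() -> Transform:
    """Assumed behaviour: strip trailing whitespace from every line."""
    def run(text: str) -> str:
        return "\n".join(line.rstrip() for line in text.split("\n"))
    return run


# Transformations run in the order in which they are listed.
steps: List[Transform] = [replace("<br/>", "\n"), remove_white_space()]

text = "first  <br/>second   "
for step in steps:
    text = step(text)
print(repr(text))
```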
Rules are configured in rulebook.txt
Currently available conversion commands:
- Config(config_name, value)
- Ignore()
- Indent(indentation_prefix, is_firstline_indents)
- IndentIn(indentation_prefix, is_firstline_indents, tag_list)
- Strip()
- Table(prefix, suffix)
- Wrap(prefix, suffix [, allow_empty, line_by_line])
- WrapIn(prefix, suffix [, allow_empty, line_by_line], tag_list)
- WrapOut(prefix, suffix [, allow_empty, line_by_line], tag_list)
- WrapWithAttribute(prefix, suffix, attr_name, attr_prefix, attr_suffix)
For details please check the rulebook.txt file
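To give a feel for what commands such as Wrap and Indent might do to the text content of a tag, here is a minimal sketch. The semantics are inferred from the parameter names only, and the default values are assumptions; rulebook.txt remains the authoritative description. The functions wrap and indent are hypothetical helpers, not the project's implementation.

```python
def wrap(content: str, prefix: str, suffix: str,
         allow_empty: bool = False, line_by_line: bool = False) -> str:
    """Surround content with prefix and suffix.

    Assumptions: empty content is left alone unless allow_empty is set,
    and line_by_line wraps every line separately.
    """
    if line_by_line:
        return "\n".join(
            wrap(line, prefix, suffix, allow_empty)
            for line in content.split("\n")
        )
    if not content.strip() and not allow_empty:
        return content
    return f"{prefix}{content}{suffix}"


def indent(content: str, indentation_prefix: str,
           is_firstline_indents: bool = True) -> str:
    """Prefix every line, optionally leaving the first line untouched (assumed)."""
    lines = content.split("\n")
    return "\n".join(
        indentation_prefix + line if (index > 0 or is_firstline_indents) else line
        for index, line in enumerate(lines)
    )


print(wrap("bold", "**", "**"))                 # e.g. a rule for <b>
print(wrap("a\nb", "> ", "", line_by_line=True))  # e.g. a rule for <blockquote>
print(indent("item\ncontinued", "  ", is_firstline_indents=False))
```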
main.py [input_dir] [output_dir]
Where
- input_dir is the directory in which the html files are
- output_dir is the directory to write .md files to, in the same hierarchy as in the input_dir
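Keeping the same hierarchy means that each output path is the input path relative to input_dir, re-rooted under output_dir, with the extension changed to .md. A minimal sketch with pathlib follows; the helper names output_path_for and iter_conversions are hypothetical, and matching only files that end in .html is an assumption.

```python
from pathlib import Path
from typing import Iterator, Tuple


def output_path_for(src: Path, input_dir: Path, output_dir: Path) -> Path:
    """Map input_dir/sub/page.html to output_dir/sub/page.md."""
    relative = src.relative_to(input_dir)
    return output_dir / relative.with_suffix(".md")


def iter_conversions(input_dir: Path, output_dir: Path) -> Iterator[Tuple[Path, Path]]:
    """Yield (source, destination) pairs for every .html file below input_dir."""
    for src in sorted(input_dir.rglob("*.html")):
        yield src, output_path_for(src, input_dir, output_dir)


# Pure path arithmetic; nothing is read from or written to disk here.
print(output_path_for(Path("site/docs/a.html"), Path("site"), Path("out")))
```

Before writing a result, a caller would typically create the destination folder with dst.parent.mkdir(parents=True, exist_ok=True).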