toc is an HTML table of contents generator. It parses HTML, generates a table of contents, and inserts anchors into the original HTML.
toc_html, body = table_of_contents(html)
toc_html, body = table_of_contents(html, url='http://somedomain.com/somepath')
toc_html, body = table_of_contents(html, anchor_type='following-marker')
- anchor_type
- following-marker : Add anchor tag to the end of heading tags. Anchor text is
- stacked-number : Add anchor tag to the beginning of heading tags. Anchor text is like 1.2.3.
- toc_html: table of contents
- body: modified html
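To make the "1.2.3." style of anchor text concrete, here is a minimal sketch of how stacked numbers could be derived from a sequence of heading levels. It is not taken from toc's source; the function name `stacked_numbers` and its input format (a list of integers, 1 for h1 through 6 for h6) are assumptions made for illustration only.

```python
def stacked_numbers(levels):
    """Return '1.2.3.'-style labels for a sequence of heading levels.

    Hypothetical helper, not part of toc's API.
    """
    counters = []  # counters[i] is the running count at depth i + 1
    labels = []
    for level in levels:
        if level > len(counters):
            # Deeper heading: open new depths, each starting at 0.
            counters.extend([0] * (level - len(counters)))
        else:
            # Same or shallower heading: drop counters of deeper levels.
            del counters[level:]
        counters[level - 1] += 1
        labels.append("".join(f"{n}." for n in counters))
    return labels


if __name__ == "__main__":
    # h1, h2, h2, h3, h1, h2
    print(stacked_numbers([1, 2, 2, 3, 1, 2]))
```

If a document skips a level (for example h1 directly followed by h3), this sketch leaves a 0 in the skipped position; a real generator may handle that case differently.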
pip install toc
- toc uses html5lib as its HTML parser. It's much slower than lxml, the popular XML library for Python, but it parses more precisely, especially for HTML5.
- I don't think ElementTree is more Pythonic than DOM, so I used minidom as the treebuilder and py-dom-xpath for XPath.
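As a rough illustration of the DOM style mentioned above, the sketch below walks a minidom tree, tags each heading with an `id`, and collects entries for a table of contents. It is an assumption-laden example rather than toc's actual implementation: it uses minidom's own parser on well-formed XHTML (instead of html5lib) so that it runs with the standard library alone, and the names `add_heading_ids` and the `toc-N` id scheme are hypothetical.

```python
from xml.dom import minidom

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def add_heading_ids(xhtml):
    """Return (entries, modified_markup) for a well-formed XHTML string.

    Hypothetical helper, not part of toc's API. Each entry is a tuple
    (level, anchor_id, text).
    """
    doc = minidom.parseString(xhtml)
    entries = []
    # getElementsByTagName("*") yields every element in document order.
    for node in doc.getElementsByTagName("*"):
        if node.tagName in HEADINGS:
            anchor_id = f"toc-{len(entries) + 1}"
            node.setAttribute("id", anchor_id)
            # Only direct text children are collected; nested markup is ignored.
            text = "".join(
                child.data
                for child in node.childNodes
                if child.nodeType == child.TEXT_NODE
            )
            entries.append((int(node.tagName[1]), anchor_id, text))
    return entries, doc.documentElement.toxml()


if __name__ == "__main__":
    entries, body = add_heading_ids("<body><h1>Intro</h1><h2>Setup</h2></body>")
    print(entries)
    print(body)
```

With html5lib installed, a DOM tree for real-world HTML could instead be obtained through its `dom` treebuilder, and the same traversal would apply.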