The current line-based buffering for HTML tables adds significant complexity to the main process_stream function with additional state variables (html_buf, html_depth, in_html).
Suggested improvement:
Extract HTML-to-Markdown conversion into a separate preprocessing step that:
- Runs before the main table reflow logic
- Simplifies the main processing loop
- Improves separation of concerns
- Makes the code more maintainable and testable
This would create a cleaner architecture where HTML tables are converted to Markdown tables first, then processed through the existing Markdown table reflow logic.
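
A rough sketch of what such a preprocessing pass could look like is below. It is written in Python purely for illustration, since the project's language is not stated here. The names `convert_html_tables` and `html_table_to_markdown` are hypothetical and not taken from the codebase. The sketch treats the first row as the header, and it does not handle nested tables, `colspan`, or `rowspan`.

```python
import re
from html.parser import HTMLParser


class _TableParser(HTMLParser):
    """Collects the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being parsed, or None
        self._cell = None  # text fragments of the cell being parsed, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse internal whitespace and escape pipes in cell text.
            text = " ".join("".join(self._cell).split())
            self._row.append(text.replace("|", "\\|"))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_markdown(html):
    """Return Markdown table lines for one <table>...</table> fragment."""
    parser = _TableParser()
    parser.feed(html)
    parser.close()
    rows = [r for r in parser.rows if r]
    if not rows:
        return []
    width = max(len(r) for r in rows)
    rows = [r + [""] * (width - len(r)) for r in rows]  # pad short rows
    lines = ["| " + " | ".join(rows[0]) + " |"]
    lines.append("| " + " | ".join(["---"] * width) + " |")
    lines.extend("| " + " | ".join(r) + " |" for r in rows[1:])
    return lines


# Non-greedy match: assumes tables are not nested (a simplification).
_TABLE_RE = re.compile(r"<table\b.*?</table>", re.IGNORECASE | re.DOTALL)


def convert_html_tables(text):
    """Replace every HTML table in `text` with an equivalent Markdown table."""
    return _TABLE_RE.sub(
        lambda m: "\n".join(html_table_to_markdown(m.group(0))), text
    )
```

With a pass like this run first, the main loop would only ever see Markdown tables, so state such as html_buf, html_depth, and in_html could move out of process_stream, and the conversion could be unit-tested on its own.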
Related: