add support for plain table conversion#10

Open
rmendocna wants to merge 1 commit into dlon:master from rmendocna:master

Conversation

@rmendocna

Hi,
This is my first pull request in quite a long time, so I'm not sure how this works.
Basically, I'm suggesting a non-recursive way of markdownifying an HTML table, with plain output in each cell.
For my own purposes, I aimed at preserving column widths across rows and honoring any style="text-align: xxx" attribute given on a cell.
Cell contents are rendered as plain tag.text, but I believe you can easily improve this to deliver markdownified contents.
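
To make the goal above concrete, here is a minimal sketch of padded, alignment-aware table output. It is not the code from this patch: `render_table` and its parameters are hypothetical names, and it assumes the cell text and the per-column `text-align` values have already been extracted from the HTML.

```python
def render_table(header, rows, aligns=None):
    """Render plain-text cells as a pipe table with padded columns.

    header: list of str; rows: list of lists of str (same length as header);
    aligns: one of 'left', 'right', 'center' or None per column.
    Hypothetical helper for illustration, not part of html2markdown.
    """
    ncols = len(header)
    aligns = list(aligns or [None] * ncols)
    # Column width is the widest cell in the column, at least 3 so the
    # delimiter row always has room for dashes and colons.
    widths = [max(3, len(header[i]), *(len(r[i]) for r in rows))
              for i in range(ncols)]

    def pad(text, i):
        if aligns[i] == 'right':
            return text.rjust(widths[i])
        if aligns[i] == 'center':
            return text.center(widths[i])
        return text.ljust(widths[i])

    def line(cells):
        return '| ' + ' | '.join(pad(c, i) for i, c in enumerate(cells)) + ' |'

    def delimiter(i):
        dashes = '-' * widths[i]
        if aligns[i] == 'right':
            return dashes[:-1] + ':'
        if aligns[i] == 'center':
            return ':' + dashes[1:-1] + ':'
        if aligns[i] == 'left':
            return ':' + dashes[1:]
        return dashes

    out = [line(header),
           '| ' + ' | '.join(delimiter(i) for i in range(ncols)) + ' |']
    out.extend(line(r) for r in rows)
    return '\n'.join(out)
```

Calling `render_table(['One', 'Two'], [['1', '2'], ['3', '4']], ['left', 'right'])` yields a header row, a delimiter row carrying the alignment colons, and one padded line per body row.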

@dlon
Owner

dlon commented Aug 8, 2019 via email

@dlon dlon left a comment

Thank you for the patch! Leaving some notes (for myself, if you don't want to update it).

It fails with Python 3 at the moment, though it's easy to fix.

It will have to be stricter before being merged into master. For example, tbody and thead are ignored at the moment, and if <th> isn't present, an invalid table is still generated.

Finally, since there is no standard table syntax in Markdown, the conversion of tables should perhaps be optional.

Comment thread html2markdown.py
widths = []
aligns = []
output = []
if tag.children.next().name in ['thead', 'tbody']:
Owner


tag.children.next().name fails with Python 3.7. children[0].name may suffice.
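
A small sketch of a version-independent way to read the first child's name. The underlying issue is that Python 3 iterators have no `.next()` method, while the built-in `next()` works on both Python 2.6+ and 3. `FakeTag` and `first_child_name` are hypothetical stand-ins so the example runs without any parser installed. If `tag` is a Beautiful Soup 4 `Tag` (an assumption here), note that `.children` is an iterator and cannot be indexed, whereas `.contents` is a list, so `tag.contents[0].name` would be the indexed form.

```python
class FakeTag(object):
    """Hypothetical stand-in for a parsed element; not part of html2markdown."""

    def __init__(self, name, contents=()):
        self.name = name
        self.contents = list(contents)

    @property
    def children(self):
        # Hands out an iterator, like the object the patch calls .next() on.
        return iter(self.contents)


def first_child_name(tag):
    """Return the name of the first child, or None if there are no children."""
    # The built-in next() accepts a default, so an empty table does not raise.
    first = next(tag.children, None)
    return getattr(first, 'name', None)
```

With a real parse tree the first child may be a whitespace text node rather than an element, so a check like this would probably need to skip such nodes first.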

Comment thread html2markdown.py
widths = []
aligns = []
output = []
if tag.children.next().name in ['thead', 'tbody']:
Owner


GFM tables consist of a header of one row, with the remaining rows lying within the body, so tables should probably only be allowed if they conform to this layout:

<table>

<thead>

<tr>
<th>One</th>
<th>Two</th>
<!-- <th> more columns... </th> -->
</tr>

</thead>

<tbody>

<tr>
<td>1</td>
<td>2</td>
</tr>

<tr>
<td>3</td>
<td>4</td>
</tr>

<!-- <tr> ... </tr> -->

</tbody>

</table>
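
The layout above could be checked before conversion along these lines. This is only a sketch: `is_gfm_compatible` is a hypothetical name, and it parses with the standard library's `xml.etree.ElementTree`, which assumes the table markup is well-formed XML. Real-world HTML often is not, so an actual implementation would run the same checks on the tree produced by the project's HTML parser.

```python
import xml.etree.ElementTree as ET


def is_gfm_compatible(table_html):
    """Return True for <thead> with one <tr> of <th>, then <tbody> with <tr>s of <td>."""
    table = ET.fromstring(table_html)
    sections = list(table)
    if table.tag != 'table' or [s.tag for s in sections] != ['thead', 'tbody']:
        return False
    thead, tbody = sections

    # Exactly one header row, made of <th> cells only.
    header_rows = list(thead)
    if len(header_rows) != 1 or header_rows[0].tag != 'tr':
        return False
    header_cells = list(header_rows[0])
    if not header_cells or any(cell.tag != 'th' for cell in header_cells):
        return False

    # Every body row must be a <tr> whose children are all <td>.
    for row in tbody:
        if row.tag != 'tr' or any(cell.tag != 'td' for cell in row):
            return False
    return True
```

A table that fails the check could then be left as raw HTML instead of being turned into an invalid Markdown table.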

Comment thread html2markdown.py
tag.insert_before('`` ')
tag.insert_after(' ``')
tag.unwrap()
elif tag.name == 'table':
Owner

@dlon dlon Aug 16, 2019


Some tags within tables should be processed, e.g. <strong>. (As you noted.)

Caveat: A cell (in Markdown) may only contain one line of text.
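
One way to respect the single-line caveat, sketched under the assumption that each cell's contents have already been converted to a Markdown string: collapse all whitespace, including line breaks, into single spaces, and escape pipe characters so they are not read as column separators. `to_single_line` is a hypothetical helper name.

```python
import re


def to_single_line(cell_markdown):
    """Collapse a cell's Markdown onto one line and escape pipe characters."""
    # Any run of whitespace (newlines included) becomes a single space.
    one_line = re.sub(r'\s+', ' ', cell_markdown).strip()
    # A literal pipe inside a cell has to be backslash-escaped in GFM tables.
    return one_line.replace('|', '\\|')
```

This simple version does not try to detect pipes that are already escaped, and block-level content such as lists or code blocks inside a cell would still need separate handling.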

Comment thread html2markdown.py
def _iterate_cols(row, widths, aligns):
    _widths = copy(widths)
    _aligns = copy(aligns)
    for i, cell in enumerate(row.children):
Owner


It must be verified that these are <td> or <th> elements only.

Comment thread html2markdown.py
if tag.children.next().name in ['thead', 'tbody']:
for block in tag.children:
block.unwrap()
for row in tag.children:
Owner


It must be verified that these are <tr> elements (only).

Comment thread html2markdown.py


def _iterate_cols(row, widths, aligns):
    _widths = copy(widths)
Owner


Minor point: I think the copies are unnecessary.
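
One reading of this remark, as a sketch: since the running column widths are meant to accumulate across rows anyway, the list can be updated in place rather than copied on each call. `update_widths` is a hypothetical helper, not the patch's `_iterate_cols`, and whether dropping the copies is safe depends on whether any caller still needs the old values, which the excerpt does not show.

```python
def update_widths(cell_texts, widths):
    """Grow `widths` in place so each entry covers the matching cell's length."""
    for i, text in enumerate(cell_texts):
        if i < len(widths):
            widths[i] = max(widths[i], len(text))
        else:
            # First time this column is seen: start from the cell's length.
            widths.append(len(text))
    return widths
```

Calling it once per row with the same list leaves that list holding the maximum width seen for every column.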
