Thanks for this! I'll be able to check it out in a week or so.
On Thu, 25 July 2019 at 15:09, rmendocna <notifications@github.com> wrote:
Hi,
This is my first pull request in quite a long time, so I'm not sure how this works.
Basically, I'm suggesting a non-recursive way of markdownifying an HTML table
with plain output in each cell.
For my own purposes I aimed at preserving column widths across rows and
observing any given style="text-align: xxx" cell attribute.
Cell contents are rendered as plain tag.text, but I believe you can easily
improve this to deliver markdownified contents.
------------------------------
You can view, comment on, or merge this pull request online at:
#10
Commit Summary
- add support for plain table conversion
File Changes
- *M* html2markdown.py
<https://github.com/dlon/html2markdown/pull/10/files#diff-0> (79)
- *M* tests.py
<https://github.com/dlon/html2markdown/pull/10/files#diff-1> (11)
Patch Links:
- https://github.com/dlon/html2markdown/pull/10.patch
- https://github.com/dlon/html2markdown/pull/10.diff
dlon
left a comment
Thank you for the patch! Leaving some notes (for myself, if you don't want to update it).
It fails with Python 3 at the moment, though it's easy to fix.
It will have to be stricter before being merged into master. For example, tbody and thead are ignored at the moment, and if <th> isn't present, an invalid table is still generated.
Finally, since there is no standard table syntax in Markdown, the conversion of tables should perhaps be optional.
    widths = []
    aligns = []
    output = []
    if tag.children.next().name in ['thead', 'tbody']:
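tag.children.next().name fails with Python 3.7, because iterators no longer have a .next() method there. Since children is an iterator rather than a list, next(tag.children).name or tag.contents[0].name may suffice.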
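A minimal sketch of the Python 3 issue, using only the standard library. `FakeTag` is a hypothetical stand-in for a parsed tag and is not part of html2markdown; it assumes that `children` is exposed as an iterator rather than a list. Under that assumption, the built-in `next()` works on both Python 2.6+ and Python 3, whereas the `.next()` method only exists on Python 2 iterators.

```python
class FakeTag(object):
    """Hypothetical stand-in for a parsed tag; not part of html2markdown."""

    def __init__(self, name, contents=()):
        self.name = name
        self.contents = list(contents)

    @property
    def children(self):
        # Assumption: children is exposed as an iterator, not a list.
        return iter(self.contents)


def first_child_name(tag):
    """Return the name of the first child, or None if there are no children."""
    first = next(tag.children, None)  # built-in next() works on Python 2.6+ and 3
    return getattr(first, 'name', None)


table = FakeTag('table', [FakeTag('thead'), FakeTag('tbody')])
print(first_child_name(table))             # thead
print(first_child_name(FakeTag('table')))  # None
```

Passing a default to `next()` also avoids a `StopIteration` when the table has no children at all.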
    widths = []
    aligns = []
    output = []
    if tag.children.next().name in ['thead', 'tbody']:
GFM tables consist of a header of one row, with the remaining rows lying within the body, so tables should probably only be allowed if they conform to this layout:
<table>
  <thead>
    <tr>
      <th>One</th>
      <th>Two</th>
      <!-- <th> more columns... </th> -->
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td>2</td>
    </tr>
    <tr>
      <td>3</td>
      <td>4</td>
    </tr>
    <!-- <tr> ... </tr> -->
  </tbody>
</table>

        tag.insert_before('`` ')
        tag.insert_after(' ``')
        tag.unwrap()
    elif tag.name == 'table':
Some tags within tables should be processed, eg <strong>. (As you noted.)
Caveat: A cell (in Markdown) may only contain one line of text.
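One way to process inline tags while respecting the one-line caveat might look like the sketch below. It is not the patch's code: it uses `xml.etree.ElementTree` from the standard library as a stand-in for the real parse tree, and `INLINE_MARKERS`, `render_inline`, and `render_cell` are hypothetical names chosen for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical mapping from inline HTML tags to Markdown delimiters.
INLINE_MARKERS = {'strong': '**', 'b': '**', 'em': '*', 'i': '*', 'code': '`'}


def render_inline(element):
    """Recursively render an element's text and inline children as Markdown."""
    parts = [element.text or '']
    for child in element:
        marker = INLINE_MARKERS.get(child.tag, '')  # unknown tags are unwrapped
        parts.append(marker + render_inline(child) + marker)
        parts.append(child.tail or '')
    return ''.join(parts)


def render_cell(cell):
    """Render a <td>/<th> element as one line of table-safe Markdown."""
    text = render_inline(cell)
    text = re.sub(r'\s+', ' ', text).strip()  # a cell may only span one line
    return text.replace('|', '\\|')           # keep literal pipes from splitting the cell


cell = ET.fromstring('<td>Total:\n <strong>42</strong> | <em>approx.</em></td>')
print(render_cell(cell))  # Total: **42** \| *approx.*
```

Collapsing all whitespace runs to a single space is a simplification; block-level content such as lists or paragraphs inside a cell would need a separate decision.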
    def _iterate_cols(row, widths, aligns):
        _widths = copy(widths)
        _aligns = copy(aligns)
        for i, cell in enumerate(row.children):
It must be verified that these are <td> or <th> elements only.
    if tag.children.next().name in ['thead', 'tbody']:
        for block in tag.children:
            block.unwrap()
    for row in tag.children:
It must be verified that these are <tr> elements (only).
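The structural checks requested in these comments could be combined into a single validator, roughly as sketched here. This is an illustration under assumptions, not the project's code: `is_simple_table` is a hypothetical helper, and `xml.etree.ElementTree` stands in for the real parse tree so the example runs with the standard library alone.

```python
import xml.etree.ElementTree as ET


def is_simple_table(table):
    """Return True for a <thead> with one <tr> of <th>, then a <tbody> of <tr>/<td>."""
    sections = list(table)
    if [s.tag for s in sections] != ['thead', 'tbody']:
        return False
    thead, tbody = sections

    # The header must be exactly one row made up of <th> cells only.
    header_rows = list(thead)
    if len(header_rows) != 1 or header_rows[0].tag != 'tr':
        return False
    header_cells = list(header_rows[0])
    if not header_cells or any(c.tag != 'th' for c in header_cells):
        return False

    # Every body child must be a <tr> containing <td> cells only.
    for row in tbody:
        if row.tag != 'tr':
            return False
        if any(c.tag != 'td' for c in row):
            return False
    return True


good = ET.fromstring(
    '<table><thead><tr><th>One</th><th>Two</th></tr></thead>'
    '<tbody><tr><td>1</td><td>2</td></tr></tbody></table>')
bad = ET.fromstring('<table><tr><td>1</td><td>2</td></tr></table>')
print(is_simple_table(good))  # True
print(is_simple_table(bad))   # False
```

A table that fails the check could then be left as raw HTML instead of being converted into an invalid Markdown table.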
    def _iterate_cols(row, widths, aligns):
        _widths = copy(widths)
Minor point: I think the copies are unnecessary.
Hi,
This is my first pull request in quite a long time, so I'm not sure how this works.
Basically, I'm suggesting a non-recursive way of markdownifying an HTML table with plain output in each cell.
For my own purposes I aimed at preserving column widths across rows and observing any given style="text-align: xxx" cell attribute.
Cell contents are rendered as plain tag.text, but I believe you can easily improve this to deliver markdownified contents.
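The approach described above, with padded columns and alignment read from the style attribute, might be sketched as follows. This is not the code from the patch: `parse_align` and `render_table` are hypothetical names, the input is assumed to be already extracted as plain strings, and the output follows the GFM pipe-table convention.

```python
import re


def parse_align(style):
    """Extract the text-align value from a style attribute string, if any."""
    match = re.search(r'text-align:\s*(left|right|center)', style or '')
    return match.group(1) if match else None


def render_table(header, rows, aligns):
    """Render a pipe table whose columns keep one width across all rows."""
    table = [header] + rows
    # Column width is the widest cell, and at least 3 for the delimiter row.
    widths = [max(3, *(len(r[i]) for r in table)) for i in range(len(header))]

    def pad(text, width, align):
        if align == 'right':
            return text.rjust(width)
        if align == 'center':
            return text.center(width)
        return text.ljust(width)

    def delimiter(width, align):
        if align == 'center':
            return ':' + '-' * (width - 2) + ':'
        if align == 'right':
            return '-' * (width - 1) + ':'
        if align == 'left':
            return ':' + '-' * (width - 1)
        return '-' * width

    def line(cells):
        return '| ' + ' | '.join(cells) + ' |'

    out = [line(pad(c, w, a) for c, w, a in zip(header, widths, aligns))]
    out.append(line(delimiter(w, a) for w, a in zip(widths, aligns)))
    for row in rows:
        out.append(line(pad(c, w, a) for c, w, a in zip(row, widths, aligns)))
    return '\n'.join(out)


aligns = [parse_align('text-align: left'), parse_align('color: red; text-align:right')]
print(render_table(['Item', 'Qty'], [['Apple', '3'], ['Kiwi', '12']], aligns))
```

The sketch assumes every row has as many cells as the header; ragged rows would need padding or truncation first.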