extension for extract hyperlinks by badbye · Pull Request #10 · ankushshah89/python-docx2txt

badbye · 2017-03-02T09:36:22Z

Hi, this pull request adds support for extracting hyperlinks from docx files.
Related issue: #9

I have to say that I made a lot of changes. However, the existing behavior still works as before. I would really appreciate it if you merged it.

Usage

I created a DOCReader class that stores all the information from a docx file. After the process method runs, the data attribute holds the text of the header, footer, links, and document.

obj = DOCReader(docx_file)
text = obj.process()
print(obj.data['links'])  # a list of (text, hyperlink) tuples

How it works

The hyperlinks are stored in the word/_rels/document.xml.rels file. Every link has an Id attribute, for example:

<Relationship Id="rId14" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink" Target="https://www.google.com" TargetMode="External"/>

The corresponding text is stored in the other documents, whose elements carry the same Id attribute.
While parsing those documents (in the xml2text method), I added code that finds the text for each Id.
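
The lookup described above can be sketched with the standard library alone. This is a minimal illustration, not the code in this pull request: the function name extract_links is hypothetical, it reads only the main document part (not headers or footers), and it assumes the archive contains word/_rels/document.xml.rels.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by WordprocessingML and by package relationship files.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
HYPERLINK_TYPE = R_NS + "/hyperlink"


def extract_links(docx):
    """Return (text, target) tuples for hyperlinks in word/document.xml.

    `docx` may be a file path or a binary file-like object.
    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(docx) as archive:
        rels_root = ET.fromstring(archive.read("word/_rels/document.xml.rels"))
        doc_root = ET.fromstring(archive.read("word/document.xml"))

    # Step 1: map each hyperlink relationship Id to its Target URL.
    targets = {
        rel.get("Id"): rel.get("Target")
        for rel in rels_root.iter("{%s}Relationship" % REL_NS)
        if rel.get("Type") == HYPERLINK_TYPE
    }

    # Step 2: find <w:hyperlink r:id="..."> elements and join their text runs.
    links = []
    for node in doc_root.iter("{%s}hyperlink" % W_NS):
        rel_id = node.get("{%s}id" % R_NS)
        if rel_id in targets:
            text = "".join(t.text or "" for t in node.iter("{%s}t" % W_NS))
            links.append((text, targets[rel_id]))
    return links
```

Calling extract_links(docx_file) on a document that contains the relationship shown earlier would yield a list of (text, hyperlink) tuples, the same shape as obj.data['links'].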


badbye and others added 12 commits March 2, 2017 17:09

extension for links

7d6dd2e

dict->tuple

7ddc742

fix encoding peoblem

793a4c4

pydocx

9facf37

docxpy

7bf8694

travis

78935b6

build status

3c671ca

...

9e4e9a6

py 2.7 or py3.3

ea4da1b

~

2391aed

Fix a typo in README.rst

df393f3

Merge pull request #1 from seiteta/patch-1

1006301

Fix a typo in README.rst

badbye closed this Mar 7, 2017

badbye wants to merge 12 commits into ankushshah89:master from badbye:master
