- Remove Python 3.8 support.
- Refactor File.path inference to support rare files with rels in a word/glossary directory.
- Test Python 3.13 support.
- edit and save rels files. You can now access the rels_element attribute of File instances to update hyperlink urls and other values. These will be saved on DocxReader.save(). This is an advanced feature and will not change text extraction.
- skip elements with invalid tags. Issue a warning. These are usually the result of faulty conversion software.
- add an elem attribute to Par instances, returning the xml element from which the paragraph was generated
- The html and duplicate_merged_cells arguments to docx2python are now keyword only.
- Inserts empty cells and whitespace into exported tables.
- Removed IndexedItem class which was probably only used internally, but it was a part of the public interface.
- Function get_text was a public function. It mirrored the identical flatten_text from the docx_text module.
- This change breaks the way paragraph styles (internally pStyle) were handled. The input argument do_pStyle will now raise an error.
- This doesn't change the interface and doesn't break any of my tests, but it took a lot of refactoring to make this change and it may break some unofficial patches I've made for clients.
- improve type hints for DocxContent properties
- insert blank cells to match gridSpan
- add list_position attribute for Par instances
- explicate return types in iterators
- use input file namespace
- eliminate double html tags for paragraph styles
- make boolean args keyword only
- use pathlib in lieu of os.path
- remove Any types from DocxContent close method
- convert HtmlFormatter lambdas to defs
- specialize join_leaves into join_runs
- insert html when extracting text
- make queuing text outside paragraphs explicit
- make _open_pars private
- stop accepting extract_image bool argument
- default duplicate_merged_cells to True
- remove unused helper functions
- use pathlib in conftest
- expose numPr, ilvl, and number in BulletGenerator
- remove redundant functions
- remove do_pStyle argument from flatten_text
- remove function get_text from iterators module
- store content table as nested list of Par instances
- move xml2html_format attrib from TagRunner to DepthCollector
- factor out DepthCollector.item_depth param
- make set_caret recursive
- remove unused styled param from insert_text_as_new_run
- remove relative imports in src modules
- move paragraphs to main dependencies
- support checkbox "true"/"false" values
- extract hyperlinks in comments
- remove open_par limit in DepthCollector
- return empty list when comment extraction fails
- comb full-text and line-text formatting
- refactor element text extractors into methods
- extract comments from docx files
- capture comment ranges
- expose DepthCollector instance for File object
- expose DepthCollector instance when get_text
- capture hyperlink anchors
- sync commitizen and poetry version numbers
- update poetry lock file
- update and pass pre-commit hooks
- preserve newlines in replace_docx_text
- add py.typed for typecheckers
- add argument duplicate_merged_cells for docx tables
- add context manager protocol
- allow type IOBytes for filename arguments
- add and mostly pass pre-commit hooks
- remove Python 3.7 support
- move pre-commit to dev requirement
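
The table-related entries above (blank cells inserted to match gridSpan, duplicate_merged_cells defaulting to True, boolean arguments made keyword only) can be pictured with a small stand-alone sketch. This is not docx2python's implementation: `expand_row`, the `(text, span)` tuples, and the `duplicate` flag are hypothetical names chosen for illustration, and the behavior shown for the non-duplicating mode is an assumption.

```python
from typing import List, Tuple

# Hypothetical stand-in for a parsed table row: (cell text, gridSpan) pairs.
Cell = Tuple[str, int]


def expand_row(cells: List[Cell], *, duplicate: bool = True) -> List[str]:
    """Expand spanned cells so the row has one entry per grid column.

    `duplicate` is keyword-only (note the bare `*`), mirroring the
    keyword-only boolean arguments mentioned in the changelog.
    Illustrative helper only, not a docx2python function.
    """
    row: List[str] = []
    for text, span in cells:
        row.append(text)
        # Fill the remaining columns covered by the span, either with a
        # copy of the cell text or with an empty placeholder.
        filler = text if duplicate else ""
        row.extend([filler] * (span - 1))
    return row


header = [("Name", 1), ("Address", 2)]
print(expand_row(header))                   # ['Name', 'Address', 'Address']
print(expand_row(header, duplicate=False))  # ['Name', 'Address', '']
```

Either way every row ends up with the same number of columns, which is what makes a table safe to index as a rectangular nested list. Passing the flag positionally, as in `expand_row(header, False)`, raises a `TypeError`.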