pdf2docx.page.Page module
Page object parsed from the PDF raw dict.
In addition to the base structure described in RawPage,
it includes some new features, e.g. sections and table blocks.
Page elements structure:
{
    "id": 0, # page index
    "width" : w,
    "height": h,
    "margin": [left, right, top, bottom],
    "sections": [{
        ... # section properties
    }, ...],
    "floats": [{
        ... # floating picture
    }, ...]
}
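As a rough illustration of the structure above, the following sketch builds a page dict by hand and reads the margins in the documented [left, right, top, bottom] order. The numeric values are made up (an A4 page in points), the sections and floats lists are left empty, and content_box is a hypothetical helper that is not part of pdf2docx.

```python
# Illustrative page dict following the structure above.
# All numeric values are made up (A4 size in points); they are not library output.
page = {
    "id": 0,                              # page index
    "width": 595.0,
    "height": 842.0,
    "margin": [72.0, 54.0, 36.0, 48.0],   # [left, right, top, bottom]
    "sections": [],                       # section properties omitted
    "floats": [],                         # floating pictures omitted
}


def content_box(page: dict) -> tuple:
    """Hypothetical helper: (width, height) of the area inside the margins."""
    left, right, top, bottom = page["margin"]
    return (page["width"] - left - right, page["height"] - top - bottom)


print(content_box(page))
```

Using different values for each margin makes it easy to see that left and right reduce the width while top and bottom reduce the height.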
- class pdf2docx.page.Page.Page(id: int = -1, skip_parsing: bool = True, width: float = 0.0, height: float = 0.0, header: str = None, footer: str = None, margin: tuple = None, sections: Sections = None, float_images: BaseCollection = None)
Bases: BasePage
Object representing the whole page, e.g. margins, sections.
- extract_tables(**settings)
Extract content from tables (top layout only).
Note
Before running this method, the page layout must be either parsed from source page or restored from parsed data.
- property finalized
- make_docx(doc)
Set page size, margin, and create page.
Note
Before running this method, the page layout must be either parsed from source page or restored from parsed data.
- Args:
doc (Document): python-docx document object
- parse(**kwargs)
- restore(data: dict)
Restore Layout from parsed results.
- store()
Store parsed layout in dict format.
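One way store() and restore(data) might be combined is to persist a parsed layout to disk and load it back later. The sketch below assumes the dict returned by store() is JSON-serializable, which this page does not state. save_layout and load_layout are hypothetical helpers, and StubPage is a stand-in used only so the example runs without a PDF; it is not the pdf2docx Page class.

```python
import json
import os
import tempfile


def save_layout(page, path):
    """Write the dict returned by page.store() to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(page.store(), f)


def load_layout(page, path):
    """Read a JSON file and hand the resulting dict to page.restore()."""
    with open(path, encoding="utf-8") as f:
        page.restore(json.load(f))


class StubPage:
    """Stand-in for demonstration only; NOT the pdf2docx Page class."""

    def __init__(self, id=-1, width=0.0, height=0.0):
        self.id, self.width, self.height = id, width, height

    def store(self):
        return {"id": self.id, "width": self.width, "height": self.height}

    def restore(self, data):
        self.id = data["id"]
        self.width = data["width"]
        self.height = data["height"]


source = StubPage(id=0, width=595.0, height=842.0)
target = StubPage()

# Round trip through a temporary JSON file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "page0.json")
    save_layout(source, path)
    load_layout(target, path)

print(target.store())
```

The helpers rely only on the two method names documented above, so a restored object would satisfy the precondition that the layout is "restored from parsed data" before make_docx or extract_tables is called.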