pdf2docx.page.Page module

Page object parsed from the PDF raw dict.

In addition to the base structure described in RawPage, a Page includes some higher-level features, e.g. sections and table blocks. The structure of the page elements is:

{
    "id": 0, # page index
    "width" : w,
    "height": h,
    "margin": [left, right, top, bottom],
    "sections": [{
        ... # section properties
    }, ...],
    "floats": [{
        ... # floating picture
    }, ...]
}
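
As an illustration of how the fields above fit together, here is a minimal sketch that computes the content area inside the margins from such a dict. The numeric values and the `content_box` helper are hypothetical and are not part of pdf2docx; only the keys and the `[left, right, top, bottom]` margin order are taken from the structure above.

```python
# Hypothetical page dict: the keys follow the structure above,
# the values are made up for illustration.
page = {
    "id": 0,
    "width": 595.0,
    "height": 842.0,
    "margin": [72.0, 72.0, 54.0, 54.0],  # left, right, top, bottom
    "sections": [],
    "floats": [],
}


def content_box(page: dict) -> tuple:
    """Return (x0, y0, x1, y1) of the area left inside the margins."""
    left, right, top, bottom = page["margin"]
    return (left, top, page["width"] - right, page["height"] - bottom)


box = content_box(page)
print(box)
```

Note that the margin order is left, right, top, bottom, which differs from the top/right/bottom/left order common elsewhere, so unpacking by name as above helps avoid mixing them up.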

class pdf2docx.page.Page.Page(id: int = -1, skip_parsing: bool = True, width: float = 0.0, height: float = 0.0, header: str = None, footer: str = None, margin: tuple = None, sections: Sections = None, float_images: BaseCollection = None)

Bases: BasePage

Object representing the whole page, e.g. margins, sections.

extract_tables(**settings)

Extract content from tables (top layout only).

Note

Before running this method, the page layout must be either parsed from source page or restored from parsed data.
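
One way to satisfy this precondition is to go through the `Converter` class rather than calling the page method directly. The sketch below assumes the `Converter.extract_tables(start, end)` usage shown in the pdf2docx README, which parses the requested pages before collecting their tables. The `tables_from_pdf` helper and its `pdf_path` argument are hypothetical names introduced here.

```python
def tables_from_pdf(pdf_path: str, start: int = 0, end: int = 1) -> list:
    """Collect tables from the page range [start, end) of a PDF file."""
    # Imported lazily so the helper can be defined even where
    # pdf2docx is not installed.
    from pdf2docx import Converter

    cv = Converter(pdf_path)
    try:
        # Assumption: the converter parses the selected pages first,
        # so each page layout is ready when its tables are extracted.
        return cv.extract_tables(start=start, end=end)
    finally:
        # Release the underlying PDF file handle in every case.
        cv.close()
```

A call such as `tables_from_pdf("sample.pdf")` (with a path of your own) would be expected to return a list of tables for the first page.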

property finalized

make_docx(doc)

Set the page size and margins, then create the page.

Note

Before running this method, the page layout must be either parsed from source page or restored from parsed data.

Args:

doc (Document): python-docx document object

parse(**kwargs)

restore(data: dict)

Restore the layout from parsed results.

store()

Store parsed layout in dict format.
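
Because `store()` produces a dict and `restore()` consumes one, the two can be bridged by any serialization format. The following is a minimal sketch of a JSON round trip, assuming the stored dict holds only JSON-serializable values. The `stored` dict is a hand-written stand-in with made-up values, not actual `store()` output, and `round_trip` is a hypothetical helper.

```python
import json

# Stand-in for the dict returned by store(); the values are made up.
stored = {
    "id": 0,
    "width": 595.0,
    "height": 842.0,
    "margin": [72.0, 72.0, 54.0, 54.0],  # left, right, top, bottom
    "sections": [],
    "floats": [],
}


def round_trip(data: dict) -> dict:
    """Serialize a layout dict to JSON text and load it back."""
    text = json.dumps(data)
    return json.loads(text)


reloaded = round_trip(stored)
print(reloaded == stored)
```

The reloaded dict is the kind of value that would then be passed as `data` to `restore()`. Lists are used for `margin` here because JSON has no tuple type, so a tuple would come back as a list.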