pdf2docx.main module

Entry point for the pdf2docx command line.

class pdf2docx.main.PDF2DOCX

Bases: object

Command line interface for pdf2docx.

static convert(pdf_file: str, docx_file: str = None, password: str = None, start: int = 0, end: int = None, pages: list = None, **kwargs)

Convert pdf file to docx file.

Args:

pdf_file (str): PDF filename to read from.
docx_file (str, optional): docx filename to write to. Defaults to None.
password (str, optional): Password for an encrypted PDF. Defaults to None if the file is not encrypted.
start (int, optional): First page to process. Defaults to 0.
end (int, optional): Last page to process. Defaults to None.
pages (list, optional): Pages to process, e.g. --pages=1,3,5. Defaults to None.
kwargs (dict): Configuration parameters.

Note

Refer to Converter.convert() for a detailed description of the above arguments.
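As a rough illustration of how these arguments map onto a command line, the sketch below assembles an argument list for the `pdf2docx` executable. It assumes that the subcommand is named `convert` after the method and that the flags `--start`, `--end` and `--pages` mirror the parameter names; `build_convert_command` is a hypothetical helper, not part of pdf2docx.

```python
import shlex


def build_convert_command(pdf_file, docx_file=None, start=0, end=None, pages=None):
    """Build an argv list for `pdf2docx convert` (subcommand and flag names are assumed)."""
    argv = ["pdf2docx", "convert", pdf_file]
    if docx_file is not None:
        argv.append(docx_file)
    if pages:
        # An explicit page list is written as comma-separated indices.
        argv.append("--pages=" + ",".join(str(p) for p in pages))
    else:
        # Otherwise fall back to a start/end range, omitting default values.
        if start:
            argv.append(f"--start={start}")
        if end is not None:
            argv.append(f"--end={end}")
    return argv


command = build_convert_command("input.pdf", "output.docx", pages=[1, 3, 5])
print(shlex.join(command))
```

The helper passes either `pages` or a `start`/`end` range, never both, so it does not depend on how the library resolves a conflict between them. The resulting list could be handed to `subprocess.run(command, check=True)` once the executable is installed.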

static debug(pdf_file: str, password: str = None, page: int = 0, docx_file: str = None, debug_pdf: str = None, layout_file: str = 'layout.json', **kwargs)

Convert one PDF page and plot layout information for debugging.

Args:

pdf_file (str): PDF filename to read from.
password (str, optional): Password for an encrypted PDF. Defaults to None if the file is not encrypted.
page (int, optional): Page index to convert. Defaults to 0.
docx_file (str, optional): docx filename to write to. Defaults to None.
debug_pdf (str, optional): Filename for the new PDF storing layout information. Defaults to the same name as the PDF file.
layout_file (str, optional): Filename for the new JSON file storing parsed layout data. Defaults to layout.json.
kwargs (dict): Configuration parameters.
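A minimal sketch of a debugging call with every output filename given explicitly, so that nothing depends on the defaults. The naming scheme and the helpers `debug_output_paths` and `debug_page` are illustrative assumptions; only `PDF2DOCX.debug` and its parameters come from the signature above.

```python
from pathlib import Path


def debug_output_paths(pdf_file, page=0):
    """Derive output filenames for one debugged page (the naming scheme is illustrative)."""
    source = Path(pdf_file)
    stem = f"{source.stem}_page{page}"
    return {
        "docx_file": str(source.with_name(stem + ".docx")),
        "debug_pdf": str(source.with_name(stem + "_debug.pdf")),
        "layout_file": str(source.with_name(stem + "_layout.json")),
    }


def debug_page(pdf_file, page=0, password=None):
    """Convert one page and write the layout plot and JSON next to the source file."""
    # Imported lazily so the path helper can be used without pdf2docx installed.
    from pdf2docx.main import PDF2DOCX

    PDF2DOCX.debug(pdf_file, password=password, page=page,
                   **debug_output_paths(pdf_file, page))
```

Calling `debug_page("report.pdf", page=2)` would then write `report_page2.docx`, `report_page2_debug.pdf` and `report_page2_layout.json` alongside the input.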

static gui()

Simple user interface.

static table(pdf_file, password: str = None, start: int = 0, end: int = None, pages: list = None, **kwargs)

Extract table content from pdf pages.

Args:

pdf_file (str): PDF filename to read from.
password (str, optional): Password for an encrypted PDF. Defaults to None if the file is not encrypted.
start (int, optional): First page to process. Defaults to 0.
end (int, optional): Last page to process. Defaults to None.
pages (list, optional): Pages to process, e.g. --pages=1,3,5. Defaults to None.
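The return value is not described in this section. The sketch below assumes that `PDF2DOCX.table` returns a list of tables, each a list of rows whose cells are strings or None, and serializes each one as CSV text. `table_to_csv` and `extract_tables_as_csv` are hypothetical helpers.

```python
import csv
import io


def table_to_csv(rows):
    """Serialize one table (a list of rows) to CSV text, writing None cells as empty fields."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow(["" if cell is None else str(cell) for cell in row])
    return buffer.getvalue()


def extract_tables_as_csv(pdf_file, start=0, end=None, password=None):
    """Extract tables from a page range and return one CSV string per table."""
    # Imported lazily so the CSV helper can be used without pdf2docx installed.
    from pdf2docx.main import PDF2DOCX

    # Assumption: the call returns a list of tables, each a list of rows.
    tables = PDF2DOCX.table(pdf_file, password=password, start=start, end=end)
    return [table_to_csv(rows) for rows in tables]
```

If the actual return shape differs, only `extract_tables_as_csv` needs adjusting; `table_to_csv` works on any list of rows.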

pdf2docx.main.main()

Command-line entry point.

pdf2docx.main.parse(pdf_file: str, docx_file: str = None, password: str = None, start: int = 0, end: int = None, pages: list = None, **kwargs)

Convert pdf file to docx file.

Args:

pdf_file (str): PDF filename to read from.
docx_file (str, optional): docx filename to write to. Defaults to None.
password (str, optional): Password for an encrypted PDF. Defaults to None if the file is not encrypted.
start (int, optional): First page to process. Defaults to 0.
end (int, optional): Last page to process. Defaults to None.
pages (list, optional): Pages to process, e.g. --pages=1,3,5. Defaults to None.
kwargs (dict): Configuration parameters.

Note

Refer to Converter.convert() for a detailed description of the above arguments.
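From Python, `parse` can be called directly instead of going through the command line. The following sketch converts several files in a loop, pairing each PDF with an explicit `.docx` output name. `plan_conversions` and `convert_all` are hypothetical helpers introduced for illustration.

```python
from pathlib import Path


def plan_conversions(filenames):
    """Pair each PDF filename with a .docx output name; non-PDF files are skipped."""
    plan = []
    for name in filenames:
        path = Path(name)
        if path.suffix.lower() == ".pdf":
            plan.append((str(path), str(path.with_suffix(".docx"))))
    return plan


def convert_all(filenames, password=None):
    """Convert every PDF in `filenames`, writing each result next to its source."""
    # Imported lazily so the planning helper can be used without pdf2docx installed.
    from pdf2docx.main import parse

    for pdf_file, docx_file in plan_conversions(filenames):
        parse(pdf_file, docx_file, password=password)
```

Naming the output file explicitly keeps the loop independent of whatever `parse` does when `docx_file` is left as None.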