pdf2docx.main module
Entry for pdf2docx command line.
- class pdf2docx.main.PDF2DOCX
Bases: object
Command line interface for pdf2docx.
- static convert(pdf_file: str, docx_file: str = None, password: str = None, start: int = 0, end: int = None, pages: list = None, **kwargs)
Convert pdf file to docx file.
- Args:
pdf_file (str): PDF filename to read from.
docx_file (str, optional): docx filename to write to. Defaults to None.
password (str): Password for encrypted pdf. Defaults to None if not encrypted.
start (int, optional): First page to process. Defaults to 0.
end (int, optional): Last page to process. Defaults to None.
pages (list, optional): Range of pages, e.g. --pages=1,3,5. Defaults to None.
kwargs (dict): Configuration parameters.
Note
Refer to convert() for a detailed description of the above arguments.
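A minimal sketch of calling `convert` from Python instead of the command line, based only on the signature above. The helper names `convert_kwargs` and `convert_pdf` are hypothetical and not part of pdf2docx. The import is deferred so the snippet loads even where pdf2docx is not installed.

```python
def convert_kwargs(start=0, end=None, pages=None, password=None):
    """Collect the page-selection arguments, dropping those left at None.

    Hypothetical helper for this sketch; not part of pdf2docx.
    """
    candidates = {"start": start, "end": end, "pages": pages, "password": password}
    return {name: value for name, value in candidates.items() if value is not None}


def convert_pdf(pdf_file, docx_file=None, **selection):
    """Call PDF2DOCX.convert using the signature documented above."""
    # Deferred import: only needed when a conversion is actually run.
    from pdf2docx.main import PDF2DOCX

    PDF2DOCX.convert(pdf_file, docx_file, **convert_kwargs(**selection))
```

With placeholder file names, `convert_pdf("input.pdf", "output.docx", pages=[1, 3, 5])` would request only the listed pages.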
- static debug(pdf_file: str, password: str = None, page: int = 0, docx_file: str = None, debug_pdf: str = None, layout_file: str = 'layout.json', **kwargs)
Convert one PDF page and plot layout information for debugging.
- Args:
pdf_file (str): PDF filename to read from.
password (str): Password for encrypted pdf. Defaults to None if not encrypted.
page (int, optional): Page index to convert.
docx_file (str, optional): docx filename to write to.
debug_pdf (str, optional): Filename for the new pdf storing layout information. Defaults to the same name as the pdf file.
layout_file (str, optional): Filename for the new json file storing parsed layout data. Defaults to layout.json.
kwargs (dict): Configuration parameters.
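As an illustration, the three output files of a debug run can be passed explicitly so that they all land in one directory. This is a sketch under assumptions: the helpers `debug_outputs` and `debug_page` are hypothetical, and the naming scheme they use is chosen for this example, not a pdf2docx default.

```python
from pathlib import Path


def debug_outputs(pdf_file, out_dir="debug"):
    """Derive explicit output paths for one debug run.

    The file naming here is an assumption made for this sketch.
    """
    stem = Path(pdf_file).stem
    out = Path(out_dir)
    return {
        "docx_file": str(out / f"{stem}.docx"),
        "debug_pdf": str(out / f"{stem}_layout.pdf"),
        "layout_file": str(out / f"{stem}_layout.json"),
    }


def debug_page(pdf_file, page=0, out_dir="debug"):
    """Convert one page and write the layout files documented above."""
    # Deferred import: only needed when the debug run is actually executed.
    from pdf2docx.main import PDF2DOCX

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    PDF2DOCX.debug(pdf_file, page=page, **debug_outputs(pdf_file, out_dir))
```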
- static gui()
Simple user interface.
- static table(pdf_file, password: str = None, start: int = 0, end: int = None, pages: list = None, **kwargs)
Extract table content from pdf pages.
- Args:
pdf_file (str): PDF filename to read from.
password (str): Password for encrypted pdf. Defaults to None if not encrypted.
start (int, optional): First page to process. Defaults to 0.
end (int, optional): Last page to process. Defaults to None.
pages (list, optional): Range of pages, e.g. --pages=1,3,5. Defaults to None.
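The entry above does not state what `table` returns. The sketch below assumes it returns a list of tables, each a list of rows of cell values (strings or None), and serializes each table to CSV text. Both helper names are hypothetical, and the return shape should be checked against the installed version.

```python
import csv
import io


def table_to_csv(rows):
    """Serialize one table, given as rows of cell values, to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        # Empty cells are assumed to arrive as None; write them as "".
        writer.writerow(["" if cell is None else cell for cell in row])
    return buffer.getvalue()


def extract_tables_as_csv(pdf_file, **selection):
    """Call PDF2DOCX.table and convert every returned table to CSV text."""
    # Deferred import: only needed when extraction is actually run.
    from pdf2docx.main import PDF2DOCX

    # Assumption: the call returns a list of tables (or None on failure).
    tables = PDF2DOCX.table(pdf_file, **selection)
    return [table_to_csv(rows) for rows in tables or []]
```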
- pdf2docx.main.main()
Command line entry.
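A sketch of driving the command line from a script. It assumes the package installs a `pdf2docx` executable that exposes each static method above as a subcommand and accepts options in the `--name=value` form quoted in the argument lists. `build_argv` and `run_cli` are hypothetical helpers.

```python
import subprocess


def build_argv(command, *positional, **options):
    """Build an argument list such as: pdf2docx convert in.pdf --pages=1,3,5."""
    # Assumption: the executable is named "pdf2docx" and is on PATH.
    argv = ["pdf2docx", command, *positional]
    for name, value in options.items():
        if value is None:
            continue  # leave the option at its default
        if isinstance(value, (list, tuple)):
            value = ",".join(str(item) for item in value)
        argv.append(f"--{name}={value}")
    return argv


def run_cli(command, *positional, **options):
    """Run the command and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(build_argv(command, *positional, **options), check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.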
- pdf2docx.main.parse(pdf_file: str, docx_file: str = None, password: str = None, start: int = 0, end: int = None, pages: list = None, **kwargs)
Convert pdf file to docx file.
- Args:
pdf_file (str): PDF filename to read from.
docx_file (str, optional): docx filename to write to. Defaults to None.
password (str): Password for encrypted pdf. Defaults to None if not encrypted.
start (int, optional): First page to process. Defaults to 0.
end (int, optional): Last page to process. Defaults to None.
pages (list, optional): Range of pages, e.g. --pages=1,3,5. Defaults to None.
kwargs (dict): Configuration parameters.
Note
Refer to convert() for a detailed description of the above arguments.