pdf2docx.common.share module

Common methods.

class pdf2docx.common.share.BlockType(*values)

Bases: Enum

Block types.

FLOAT_IMAGE = 4
IMAGE = 1
LATTICE_TABLE = 2
STREAM_TABLE = 3
TEXT = 0
UNDEFINED = -1
class pdf2docx.common.share.IText

Bases: object

Text related interface considering text direction.

property is_horizontal_text

Check whether text direction is from left to right.

property is_mix_text

Check whether text direction is a mixture of left-to-right and bottom-to-top (TextDirection.MIX).

property is_vertical_text

Check whether text direction is from bottom to top.

property text_direction

Text direction is from left to right by default.

class pdf2docx.common.share.RectType(*values)

Bases: Enum

Shape type in context.

BORDER = 16
HIGHLIGHT = 1
SHADING = 32
STRIKE = 4
UNDERLINE = 2
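
The member values above are distinct powers of two, which suggests they can be combined as bit flags. That usage is an assumption here, not something this reference states. A minimal sketch, using a local mirror of the enum so that it runs without pdf2docx installed; the helpers combine and contains are hypothetical names introduced for illustration:

```python
from enum import Enum


class RectType(Enum):
    """Local mirror of the values listed above, for illustration only."""
    HIGHLIGHT = 1
    UNDERLINE = 2
    STRIKE = 4
    BORDER = 16
    SHADING = 32


def combine(*types: RectType) -> int:
    """Hypothetical helper: OR the member values into one integer mask."""
    mask = 0
    for t in types:
        mask |= t.value
    return mask


def contains(mask: int, t: RectType) -> bool:
    """Hypothetical helper: test whether a member's bit is set in the mask."""
    return bool(mask & t.value)


mask = combine(RectType.UNDERLINE, RectType.STRIKE)
print(mask)                               # 6
print(contains(mask, RectType.STRIKE))    # True
print(contains(mask, RectType.BORDER))    # False
```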
class pdf2docx.common.share.TextAlignment(*values)

Bases: Enum

Text alignment.

Note

The difference between NONE and UNKNOWN:

  • NONE: none of left/right/center align -> need TAB stop

  • UNKNOWN: can’t decide, e.g. single line only

CENTER = 2
JUSTIFY = 4
LEFT = 1
NONE = -1
RIGHT = 3
UNKNOWN = 0
class pdf2docx.common.share.TextDirection(*values)

Bases: Enum

Text direction.

  • LEFT_RIGHT: from left to right within a line, and lines go from top to bottom

  • BOTTOM_TOP: from bottom to top within a line, and lines go from left to right

  • MIX: a mixture of LEFT_RIGHT and BOTTOM_TOP

  • IGNORE: neither LEFT_RIGHT nor BOTTOM_TOP

BOTTOM_TOP = 1
IGNORE = -1
LEFT_RIGHT = 0
MIX = 2
pdf2docx.common.share.cmyk_to_rgb(c: float, m: float, y: float, k: float, cmyk_scale: float = 100)

CMYK components to RGB value.
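
One common way to perform this conversion is sketched below. It is not taken from the library source: the function name cmyk_to_rgb_sketch, the packed 0xRRGGBB integer return value, and the rounding rule are all assumptions made for illustration.

```python
def cmyk_to_rgb_sketch(c, m, y, k, cmyk_scale=100.0):
    """Convert CMYK components to a packed 0xRRGGBB integer (illustrative)."""
    # Normalise every component to the range [0, 1].
    c, m, y, k = (v / float(cmyk_scale) for v in (c, m, y, k))

    # Naive conversion: each channel is attenuated by its own ink and by black.
    r = (1.0 - c) * (1.0 - k)
    g = (1.0 - m) * (1.0 - k)
    b = (1.0 - y) * (1.0 - k)

    # Scale to 0..255 and pack into one integer (assumed output format).
    def to_byte(v):
        return int(round(v * 255))

    return (to_byte(r) << 16) | (to_byte(g) << 8) | to_byte(b)


print(cmyk_to_rgb_sketch(0, 100, 100, 0))  # 16711680, i.e. pure red
```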

pdf2docx.common.share.debug_plot(title: str, show=True)

Plot the objects returned by the inner function.

Args:

title (str): Page title.

show (bool, optional): Don’t plot if show==False. Defaults to True.

Note

Prerequisite of the inner function:
  • the first argument is a BasePage instance.

  • the last argument is configuration parameters in dict type.

pdf2docx.common.share.decode(s: str)

Try to decode a unicode string.

pdf2docx.common.share.flatten(items, klass)

Yield items from any nested iterable.
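
A recursive generator is one way to get this behaviour. The sketch below assumes that klass names the container type (or tuple of types) to descend into, which this reference does not spell out; flatten_sketch is a hypothetical name.

```python
def flatten_sketch(items, klass):
    """Yield leaf items, descending into any element that is an instance of klass."""
    for item in items:
        if isinstance(item, klass):
            # Nested container of the given type: recurse into it.
            yield from flatten_sketch(item, klass)
        else:
            # Anything else is treated as a leaf and yielded unchanged.
            yield item


print(list(flatten_sketch([1, [2, [3, 4]], 5], list)))  # [1, 2, 3, 4, 5]
```

Under this assumption, containers of other types are left intact: with klass=list, a nested tuple is yielded as a single item.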

pdf2docx.common.share.is_list_item(text, bullets=True, numbers=True)

Returns text if bullets is true and text is a bullet character, or numbers is true and text is not empty and consists entirely of digits 0-9. Otherwise returns None.

If bullets is True, an internal list of bullet characters is used; otherwise bullets should be a list of integer Unicode code points.
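
The rules above can be sketched as follows. The library's internal bullet list is not shown in this reference, so DEFAULT_BULLETS is a hypothetical stand-in, and treating a bullet as exactly one character is also an assumption.

```python
# Hypothetical stand-in for the internal bullet list (Unicode code points).
DEFAULT_BULLETS = (0x2022, 0x25CF, 0x25CB, 0x25A0)


def is_list_item_sketch(text, bullets=True, numbers=True):
    """Return text if it looks like a bullet or a plain number, else None."""
    if bullets and len(text) == 1:
        # bullets=True selects the default set; otherwise use the caller's list.
        codes = DEFAULT_BULLETS if bullets is True else bullets
        if ord(text) in codes:
            return text
    # Non-empty and made only of the ASCII digits 0-9.
    if numbers and text and all("0" <= ch <= "9" for ch in text):
        return text
    return None


print(is_list_item_sketch("12"))    # 12
print(is_list_item_sketch("12a"))   # None
```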

pdf2docx.common.share.is_number(str_number)

Check whether str_number can be converted to a float.
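
A check like this is usually written by attempting the conversion and catching the failure. A minimal sketch under that assumption (is_number_sketch is a hypothetical name, and catching TypeError for non-string input is a choice of this sketch):

```python
def is_number_sketch(str_number):
    """Return True if float() accepts the value, False otherwise."""
    try:
        float(str_number)
    except (TypeError, ValueError):
        return False
    return True


print(is_number_sketch("3.14"))  # True
print(is_number_sketch("abc"))   # False
```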

class pdf2docx.common.share.lazyproperty(func)

Bases: object

Calculate only once and cache property value.
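
A compute-once property is commonly built as a non-data descriptor that stores its result in the instance's __dict__. The sketch below shows that general technique; how the library caches the value is not stated in this reference, and lazyproperty_sketch and Circle are hypothetical names.

```python
class lazyproperty_sketch:
    """Non-data descriptor that computes a value once per instance."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class itself: return the descriptor.
            return self
        value = self.func(instance)
        # Store under the same name so later lookups find the instance
        # attribute first and never reach this descriptor again.
        instance.__dict__[self.name] = value
        return value


class Circle:
    def __init__(self, radius):
        self.radius = radius
        self.calls = 0

    @lazyproperty_sketch
    def area(self):
        self.calls += 1
        return 3.14159 * self.radius ** 2


c = Circle(2.0)
c.area
c.area
print(c.calls)  # 1
```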

pdf2docx.common.share.lower_round(number: float, ndigits: int = 0)

Round number down to the specified number of decimal digits, e.g. lower_round(1.26, 1) = 1.2.
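
Rounding down to a fixed number of digits can be sketched by scaling, flooring, and scaling back. Using math.floor (which rounds negative numbers toward minus infinity) is an assumption of this sketch, and lower_round_sketch is a hypothetical name.

```python
import math


def lower_round_sketch(number, ndigits=0):
    """Round number down, keeping ndigits decimal digits."""
    factor = 10 ** ndigits
    # Shift the wanted digits left of the decimal point, drop the rest, shift back.
    return math.floor(number * factor) / factor


print(lower_round_sketch(1.26, 1))  # 1.2
```

Because binary floating point cannot represent every decimal fraction exactly, inputs that sit right on a digit boundary may round one step lower than expected.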

pdf2docx.common.share.new_page(doc, width: float, height: float, title: str)

Insert a new page with given title.

Args:

doc (fitz.Document): pdf document object.

width (float): Page width.

height (float): Page height.

title (str): Page title shown in page.

pdf2docx.common.share.rgb_component(srgb: int)

srgb value to R,G,B components, e.g. 16711680 -> (255, 0, 0).

Equal to PyMuPDF built-in method:

[int(255*c) for c in fitz.sRGB_to_pdf(srgb)]
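
The same split can be done with plain bit operations, without PyMuPDF. A minimal sketch, assuming the value is packed as 0xRRGGBB, which the example 16711680 -> (255, 0, 0) implies; rgb_component_sketch is a hypothetical name.

```python
def rgb_component_sketch(srgb):
    """Split a packed 0xRRGGBB integer into (R, G, B), each in 0..255."""
    r = (srgb >> 16) & 0xFF  # top byte
    g = (srgb >> 8) & 0xFF   # middle byte
    b = srgb & 0xFF          # low byte
    return r, g, b


print(rgb_component_sketch(16711680))  # (255, 0, 0)
```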
pdf2docx.common.share.rgb_component_from_name(name: str = '')

Get a named RGB color (or random color) from fitz predefined colors, e.g. ‘red’ -> (1.0,0.0,0.0).

pdf2docx.common.share.rgb_to_value(rgb: list)

RGB components to decimal value, e.g. (1,0,0) -> 16711680.
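
The packing implied by the example can be sketched as below. It assumes each component is a float in the range [0, 1], as (1,0,0) suggests, and rounds to the nearest byte; both points are assumptions, and rgb_to_value_sketch is a hypothetical name.

```python
def rgb_to_value_sketch(rgb):
    """Pack (r, g, b) components in [0, 1] into one 0xRRGGBB integer."""
    r, g, b = (int(round(float(c) * 255)) for c in rgb)
    return (r << 16) | (g << 8) | b


print(rgb_to_value_sketch((1, 0, 0)))  # 16711680
```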

pdf2docx.common.share.rgb_value(components: list)

Gray/RGB/CMYK mode components to color value.
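
One plausible reading is that the colour mode is inferred from the number of components: one for gray, three for RGB, four for CMYK. The sketch below follows that reading. The [0, 1] component range, the packed 0xRRGGBB result, and raising ValueError for other lengths are all assumptions of this sketch, and pack and rgb_value_sketch are hypothetical names.

```python
def pack(r, g, b):
    """Pack three components in [0, 1] into one 0xRRGGBB integer."""
    def to_byte(v):
        return int(round(float(v) * 255))
    return (to_byte(r) << 16) | (to_byte(g) << 8) | to_byte(b)


def rgb_value_sketch(components):
    """Map gray (1), RGB (3) or CMYK (4) components to a colour value."""
    n = len(components)
    if n == 1:
        # Gray: the same level on every channel.
        gray = components[0]
        return pack(gray, gray, gray)
    if n == 3:
        return pack(*components)
    if n == 4:
        # CMYK: naive conversion, each channel attenuated by black.
        c, m, y, k = (float(v) for v in components)
        return pack((1 - c) * (1 - k), (1 - m) * (1 - k), (1 - y) * (1 - k))
    raise ValueError("expected 1, 3 or 4 components, got %d" % n)


print(rgb_value_sketch([1, 0, 0]))     # 16711680
print(rgb_value_sketch([0, 1, 1, 0]))  # 16711680
```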