pdf2docx.shape.Shapes module

A group of Shape instances.

class pdf2docx.shape.Shapes.Shapes(instances: list = None, parent=None)

Bases: ElementCollection

A collection of Shape instances: Stroke or Fill.

assign_to_tables(tables: list)

Add shapes to the associated cells of the given tables.

Args:

tables (list): A list of TableBlock instances.

clean_up(max_border_width: float, shape_min_dimension: float)

Clean rectangles.

  • Delete shapes outside the page.

  • Delete small shapes (either width or height).

  • Merge shapes with same filling color.

  • Detect semantic type.

Args:

max_border_width (float): The max border width.

shape_min_dimension (float): Ignore shape if both width and height are lower than this value.
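
The first two clean-up steps can be pictured with a small standalone sketch. It does not use pdf2docx itself: Box and filter_shapes are hypothetical stand-ins, and the size rule follows the argument description above (a shape is dropped only when both width and height fall below the threshold), which is an assumption about the actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical stand-in for a shape's bounding box (not a pdf2docx type)."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def width(self) -> float:
        return self.x1 - self.x0

    @property
    def height(self) -> float:
        return self.y1 - self.y0

    def intersects(self, other: "Box") -> bool:
        # True when the two rectangles share a non-empty area.
        return (self.x0 < other.x1 and other.x0 < self.x1
                and self.y0 < other.y1 and other.y0 < self.y1)


def filter_shapes(boxes, page, shape_min_dimension):
    """Keep boxes that overlap the page and are not small in both dimensions."""
    kept = []
    for box in boxes:
        if not box.intersects(page):
            continue  # entirely outside the page
        if box.width < shape_min_dimension and box.height < shape_min_dimension:
            continue  # both width and height are below the threshold
        kept.append(box)
    return kept


page = Box(0, 0, 600, 800)
boxes = [
    Box(10, 10, 200, 11),    # thin but long: kept, could be a border
    Box(50, 50, 51, 51),     # tiny in both directions: dropped
    Box(700, 10, 750, 60),   # outside the page: dropped
]
kept = filter_shapes(boxes, page, shape_min_dimension=2.0)
```

Keeping thin but long rectangles matters here because such shapes are the usual candidates for table borders and underlines.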

property fillings

Fill Shapes, including cell shading and highlight.

property hyperlinks

Hyperlink Shapes.

plot(page)

Plot shapes for debugging purposes. Different colors are used to display the shapes by detected semantic type, e.g. yellow for text-based shapes (stroke, underline and highlight). Because the Stroke and Fill related groups overlap, some shapes are plotted twice.

Args:

page (fitz.Page): PDF page.

restore(raws: list)

Clear current instances and restore them from source dicts.
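
As a rough illustration of the restore pattern (empty the collection, then rebuild it from plain dicts), here is a standalone sketch. The classes and the "type" key are hypothetical; the dict layout that pdf2docx actually expects is not documented in this section.

```python
class Stroke:
    """Hypothetical stand-in for a stroke shape."""
    def __init__(self, raw: dict):
        self.raw = raw


class Fill:
    """Hypothetical stand-in for a fill shape."""
    def __init__(self, raw: dict):
        self.raw = raw


class ShapeGroup:
    """Minimal collection that can be rebuilt from source dicts."""
    def __init__(self):
        self._instances = []

    def restore(self, raws: list):
        # Drop whatever is currently stored, then rebuild from the dicts.
        self._instances.clear()
        for raw in raws:
            # Assumption: a "type" key selects the class. This key is
            # invented for the sketch and is not part of pdf2docx.
            cls = Stroke if raw.get("type") == "stroke" else Fill
            self._instances.append(cls(raw))
        return self

    def __len__(self):
        return len(self._instances)


group = ShapeGroup()
group.restore([{"type": "stroke"}, {"type": "fill"}, {"type": "fill"}])
group.restore([{"type": "stroke"}])  # a second call replaces, not appends
restored_kinds = [type(s).__name__ for s in group._instances]
```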

property strokes

Stroke Shapes, including table border, text underline and strike-through.
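
The fillings and strokes properties each return the subset of shapes of one kind. A minimal sketch of that idea, using hypothetical stand-in classes rather than the real pdf2docx types, might filter by class as follows; whether the library applies further conditions is not stated in this section.

```python
class Shape:
    """Hypothetical base class for the sketch."""


class Stroke(Shape):
    """Line-like shape, e.g. a border or an underline."""


class Fill(Shape):
    """Area-like shape, e.g. cell shading or a highlight."""


class ShapeGroup:
    """Collection exposing type-based views of its shapes."""
    def __init__(self, instances=None):
        self._instances = list(instances or [])

    @property
    def strokes(self):
        # New group holding only the Stroke instances.
        return ShapeGroup([s for s in self._instances if isinstance(s, Stroke)])

    @property
    def fillings(self):
        # New group holding only the Fill instances.
        return ShapeGroup([s for s in self._instances if isinstance(s, Fill)])

    def __len__(self):
        return len(self._instances)


group = ShapeGroup([Stroke(), Fill(), Stroke()])
n_strokes = len(group.strokes)
n_fills = len(group.fillings)
```

Returning a new collection from each property, instead of a bare list, keeps the filtered views chainable with the other collection methods.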

property table_fillings

Potential table shadings.

property table_strokes

Potential table borders.

property text_style_shapes

Potential text style based shapes, e.g. underline, strike-through, highlight and hyperlink.