pdf2docx.shape.Path module
Objects representing PDF paths (strokes and fills) extracted from PDF drawings and annotations.
Data structure based on the results of page.get_drawings():
{
'color': (x,x,x) or None, # stroke color
'fill' : (x,x,x) or None, # fill color
'width': float, # line width
'closePath': bool, # whether to connect last and first point
'rect' : rect, # page area covered by this path
'items': [ # list of draw commands: lines, rectangles, curves or quads.
("l", p1, p2), # a line from p1 to p2
("c", p1, p2, p3, p4), # cubic Bézier curve from p1 to p4, p2 and p3
# are the control points
("re", rect), # a rect represented with two diagonal points
("qu", quad) # a quad represented with four corner points
],
...
}
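The structure above can be walked with ordinary dict and tuple access. The following minimal sketch counts the draw commands in one path dict. The dict sample_path is a hypothetical hand-built example, and its points and rects are plain tuples, whereas PyMuPDF returns Point, Rect and Quad objects; summarize_items is likewise a hypothetical helper.

```python
from collections import Counter

# Hypothetical sample shaped like one entry of page.get_drawings().
# Points and rects are plain tuples here instead of PyMuPDF objects.
sample_path = {
    "color": (0.0, 0.0, 0.0),  # stroke color
    "fill": None,              # no fill color
    "width": 1.0,              # line width
    "closePath": False,
    "rect": (10, 10, 130, 60),
    "items": [
        ("l", (10, 10), (110, 10)),                         # line
        ("c", (110, 10), (130, 20), (130, 50), (110, 60)),  # cubic Bezier
        ("re", (10, 10, 110, 60)),                          # rectangle
    ],
}


def summarize_items(path: dict) -> Counter:
    """Count how often each draw command ("l", "c", "re", "qu") occurs."""
    # The first element of every item tuple is the command name.
    return Counter(item[0] for item in path["items"])


print(summarize_items(sample_path))
```

Dispatching on item[0] in this way is also how a converter would decide which segment type to build for each item.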
Note
The coordinates extracted by page.get_drawings() are based on the real page coordinate system (CS),
i.e. with page rotation taken into account. This differs from page.get_text('rawdict').
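As a rough illustration of what "rotation taken into account" means, the sketch below maps a point from the unrotated page CS to the page as displayed with a given /Rotate value. It is plain geometry, not PyMuPDF or pdf2docx code. It assumes a top-left origin with y pointing down (as in PyMuPDF) and the PDF rule that /Rotate turns the page clockwise for display; the function name rotate_point is hypothetical.

```python
def rotate_point(x: float, y: float, width: float, height: float, rotation: int) -> tuple:
    """Map (x, y) on the unrotated page to the page displayed with /Rotate.

    width, height: size of the unrotated page.
    rotation: clockwise page rotation in degrees (0, 90, 180 or 270).
    Coordinates use a top-left origin with y pointing down.
    """
    if rotation == 0:
        return (x, y)
    if rotation == 90:
        # The left edge of the unrotated page becomes the top edge.
        return (height - y, x)
    if rotation == 180:
        return (width - x, height - y)
    if rotation == 270:
        # The left edge of the unrotated page becomes the bottom edge.
        return (y, width - x)
    raise ValueError("rotation must be 0, 90, 180 or 270")


print(rotate_point(0, 0, 600, 800, 90))
```

For a 600 x 800 page with /Rotate 90, the unrotated top-left corner (0, 0) lands at (800, 0), the top-right corner of the displayed page, which is why coordinates from the two extraction methods cannot be mixed without a transform.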
- class pdf2docx.shape.Path.C(item)
Bases: Segment
Bézier curve path with source ("c", p1, p2, p3, p4).
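Following the item layout ("c", p1, p2, p3, p4), with p1 and p4 as end points and p2, p3 as control points, a point on the curve is given by the standard cubic Bernstein form B(t) = (1-t)^3 p1 + 3(1-t)^2 t p2 + 3(1-t) t^2 p3 + t^3 p4 for t in [0, 1]. A minimal sketch follows; cubic_bezier_point and sample_curve are hypothetical helpers and are not claimed to be how class C is implemented.

```python
def cubic_bezier_point(p1, p2, p3, p4, t: float) -> tuple:
    """Evaluate the cubic Bezier curve at parameter t in [0, 1].

    p1 and p4 are the end points, p2 and p3 the control points,
    matching the ("c", p1, p2, p3, p4) item layout.
    """
    s = 1.0 - t
    # Bernstein weights of the four points.
    b1, b2, b3, b4 = s ** 3, 3 * s ** 2 * t, 3 * s * t ** 2, t ** 3
    x = b1 * p1[0] + b2 * p2[0] + b3 * p3[0] + b4 * p4[0]
    y = b1 * p1[1] + b2 * p2[1] + b3 * p3[1] + b4 * p4[1]
    return (x, y)


def sample_curve(p1, p2, p3, p4, n: int = 8) -> list:
    """Approximate the curve by n + 1 evenly spaced parameter samples."""
    return [cubic_bezier_point(p1, p2, p3, p4, i / n) for i in range(n + 1)]


print(sample_curve((0, 0), (0, 1), (1, 1), (1, 0), n=4))
```

The curve always starts at p1 and ends at p4 but generally does not pass through p2 or p3, which is why using the control points as a boundary is only a simplification.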
- class pdf2docx.shape.Path.L(item)
Bases: Segment
Line path with source ("l", p1, p2).
- property length
Length of the line.
- to_strokes(width: float, color: list)
Convert to stroke dict.
- Args:
width (float): Specify width for the stroke.
color (list): Specify color for the stroke.
- Returns:
list: A list of Stroke dicts.
Note
A line corresponds to one stroke, but for consistency the returned stroke dict is wrapped in a list, so the list always has length 1.
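For a line item ("l", p1, p2), the length is the Euclidean distance between the end points, and the conversion wraps a single stroke dict in a list. The sketch below assumes points are (x, y) tuples; the helper names and the stroke dict keys start, end, width and color are a hypothetical layout, not necessarily the keys pdf2docx's Stroke uses.

```python
import math


def line_length(p1: tuple, p2: tuple) -> float:
    """Euclidean distance between the two end points of a line."""
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1])


def line_to_strokes(p1: tuple, p2: tuple, width: float, color: list) -> list:
    """Wrap one stroke dict in a list, so the result always has length 1.

    The keys below are hypothetical; the real Stroke dict may differ.
    """
    return [{"start": tuple(p1), "end": tuple(p2), "width": width, "color": color}]


strokes = line_to_strokes((0, 0), (3, 4), width=1.5, color=[0, 0, 0])
print(line_length((0, 0), (3, 4)), len(strokes))
```

Returning a list even for a single stroke lets a caller extend one result list with the output of any segment type without special-casing lines.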
- class pdf2docx.shape.Path.Path(raw: dict)
Bases: object
Path extracted from PDF, consisting of one or more Segments.
- property is_fill
- property is_iso_oriented
It is iso-oriented when all contained segments are iso-oriented.
- property is_stroke
- plot(canvas)
Plot path for debug purpose.
- Args:
canvas: PyMuPDF drawing canvas created by page.new_shape().
- to_shapes()
Convert path to Shape raw dicts.
- Returns:
list: A list of Shape dicts.
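A rough sketch of how a path-level conversion could dispatch on stroke and fill follows. It works directly on a raw dict shaped like page.get_drawings() output and assumes, as a simplification, that a path is stroked when its 'color' is not None and filled when its 'fill' is not None. To stay short it only turns "l" items into strokes and "re" items into rectangular fills. All function names and dict keys are hypothetical and do not reproduce pdf2docx's Shape dicts.

```python
def raw_is_stroke(raw: dict) -> bool:
    # Assumption: a stroke color means the outline is drawn.
    return raw.get("color") is not None


def raw_is_fill(raw: dict) -> bool:
    # Assumption: a fill color means the enclosed area is painted.
    return raw.get("fill") is not None


def path_to_shapes(raw: dict) -> list:
    """Hypothetical sketch: collect stroke dicts and fill dicts for one path."""
    shapes = []
    if raw_is_stroke(raw):
        width = raw.get("width", 0.0)
        for item in raw["items"]:
            if item[0] == "l":  # only line items are stroked in this sketch
                shapes.append({"start": item[1], "end": item[2],
                               "width": width, "color": raw["color"]})
    if raw_is_fill(raw):
        for item in raw["items"]:
            if item[0] == "re":  # rect given as (x0, y0, x1, y1)
                shapes.append({"bbox": tuple(item[1]), "color": raw["fill"]})
    return shapes


raw = {"color": (0, 0, 0), "fill": (1, 0, 0), "width": 2.0,
       "items": [("l", (0, 0), (10, 0)), ("re", (0, 0, 10, 5))]}
print(path_to_shapes(raw))
```

A path that is both stroked and filled contributes shapes from both branches, which matches to_shapes returning a single flat list.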
- class pdf2docx.shape.Path.R(item)
Bases: Segment
Rect path with source ("re", rect).
- to_strokes(width: float, color: list)
Convert each edge to a stroke dict.
- Args:
width (float): Specify width for the stroke.
color (list): Specify color for the stroke.
- Returns:
list: A list of Stroke dicts.
Note
One Rect path is converted to a list of 4 stroke dicts.
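Splitting a rect into its four edges only needs the two diagonal points. The sketch below assumes the rect is given as (x0, y0, x1, y1) in a y-down coordinate system and uses a hypothetical stroke dict layout (start, end, width, color); rect_to_strokes is a hypothetical name, and the sketch illustrates the one-rect-to-four-strokes rule rather than pdf2docx's exact output.

```python
def rect_to_strokes(rect: tuple, width: float, color: list) -> list:
    """Return four stroke dicts: top, right, bottom and left edges (y pointing down)."""
    x0, y0, x1, y1 = rect
    # Corners in drawing order, starting at the top-left.
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    strokes = []
    for i, start in enumerate(corners):
        end = corners[(i + 1) % 4]  # wrap around to close the outline
        strokes.append({"start": start, "end": end, "width": width, "color": color})
    return strokes


for s in rect_to_strokes((0, 0, 10, 5), width=1.0, color=[0, 0, 0]):
    print(s["start"], "->", s["end"])
```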
- class pdf2docx.shape.Path.Segment(item)
Bases: object
A segment of a path, e.g. a line, a rectangle or a curve.
- to_strokes(width: float, color: list)
- class pdf2docx.shape.Path.Segments(items: list, close_path=False)
Bases: object
A sub-path composed of one or more segments.
- property area
Calculate the area enclosed by the segments with Green's formula. Note that the boundary of a Bézier curve is simplified to its control points.
- property bbox
Calculate the bbox of the segments.
- property is_iso_oriented
Iso-oriented criterion: the ratio of the real area to the bbox area exceeds 0.9.
- property points
Connected points of segments.
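The area, bbox and iso-oriented properties fit together as follows. In this minimal sketch the sub-path is given as its connected points in order (for a Bézier curve, its control points stand in for the boundary, as the area note says). The area uses the shoelace formula, which is Green's theorem applied to straight edges, and the 0.9 threshold comes from the iso-oriented criterion. The function names are hypothetical, and treating a zero-area bbox as iso-oriented is this sketch's own choice: it means all points lie on one horizontal or vertical line.

```python
def polygon_area(points: list) -> float:
    """Area of the closed polygon through points, via the shoelace formula.

    The last point is joined back to the first.
    """
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def bbox(points: list) -> tuple:
    """Axis-aligned bounding box (x0, y0, x1, y1) of the points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def is_iso_oriented(points: list, threshold: float = 0.9) -> bool:
    """True when the real area covers more than threshold of the bbox area."""
    x0, y0, x1, y1 = bbox(points)
    bbox_area = (x1 - x0) * (y1 - y0)
    if bbox_area == 0:
        # All points lie on one horizontal or vertical line: axis-aligned.
        return True
    return polygon_area(points) / bbox_area > threshold


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
diamond = [(5, 0), (10, 5), (5, 10), (0, 5)]
print(polygon_area(square), is_iso_oriented(square))
print(polygon_area(diamond), is_iso_oriented(diamond))
```

The axis-aligned square fills its bbox completely (ratio 1.0), while the same square rotated by 45 degrees fills only half of it (ratio 0.5), so only the first counts as iso-oriented.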
- to_fill(color: list)
Convert the closed area of the segments to a Fill dict.
- Args:
color (list): Specify fill color.
- Returns:
dict: Fill dict.
- to_strokes(width: float, color: list)
Convert each segment to a Stroke dict.
- Args:
width (float): Specify stroke width.
color (list): Specify stroke color.
- Returns:
list: A list of Stroke dicts.
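Both Segments conversions can be sketched for a sub-path modelled as a polyline through its connected points. This sketch assumes, as a simplification, that a Fill dict can be represented by the bbox of the closed area plus a color (sensible only for iso-oriented sub-paths). It also assumes that each straight piece between consecutive points becomes one stroke, with close_path adding the edge back to the start. Function names and dict keys are hypothetical.

```python
def segments_to_fill(points: list, color: list) -> dict:
    """Hypothetical Fill dict: bbox of the closed area and its fill color."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return {"bbox": (min(xs), min(ys), max(xs), max(ys)), "color": color}


def segments_to_strokes(points: list, width: float, color: list,
                        close_path: bool = False) -> list:
    """One hypothetical stroke dict per straight piece of the polyline."""
    pairs = list(zip(points, points[1:]))
    if close_path and len(points) > 2:
        # Join the last point back to the first, like closePath in the raw dict.
        pairs.append((points[-1], points[0]))
    return [{"start": a, "end": b, "width": width, "color": color} for a, b in pairs]


outline = [(0, 0), (10, 0), (10, 5)]
print(segments_to_fill(outline, color=[1, 0, 0]))
print(len(segments_to_strokes(outline, 1.0, [0, 0, 0], close_path=True)))
```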