pdf2docx.common.Element module
Object with a bounding box, e.g. Block, Line, Span.
PyMuPDF generally reports coordinates (e.g. the bbox from page.get_text('rawdict')) relative
to the un-rotated page, whereas the pdf2docx library works in the real page coordinate system
(CS), i.e. with rotation considered. For this reason, a rotation matrix is applied automatically
to every instance created by this class.
Therefore, the bbox parameter used to create an Element instance MUST be relative to the un-rotated
CS. If the coordinates are already final, create an empty object first and then update its bbox:
Element().update_bbox(final_bbox)
Note
An exception is page.get_drawings(): its coordinates are already converted to the real page CS.
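To make the coordinate conversion described above concrete, here is a minimal pure-Python sketch of applying an affine matrix to a bbox. It does not use pdf2docx or PyMuPDF; the helper name `transform_rect`, the page height of 800, and the 90-degree matrix values are assumptions chosen for illustration only. The point mapping follows the usual PDF convention for a matrix (a, b, c, d, e, f).

```python
def transform_rect(rect, matrix):
    """Apply an affine matrix (a, b, c, d, e, f) to a bbox (x0, y0, x1, y1).

    Each corner (x, y) maps to (a*x + c*y + e, b*x + d*y + f); the result is
    the axis-aligned box enclosing the four transformed corners.
    """
    x0, y0, x1, y1 = rect
    a, b, c, d, e, f = matrix
    corners = [(x0, y0), (x1, y0), (x0, y1), (x1, y1)]
    moved = [(a * x + c * y + e, b * x + d * y + f) for x, y in corners]
    xs = [p[0] for p in moved]
    ys = [p[1] for p in moved]
    return (min(xs), min(ys), max(xs), max(ys))


# Identity matrix: the page is not rotated, so the bbox is unchanged.
IDENTITY = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)

# Hypothetical matrix for a page rotated by 90 degrees (illustrative values,
# assuming an un-rotated page height of 800).
ROTATE_90 = (0.0, 1.0, -1.0, 0.0, 800.0, 0.0)

raw_bbox = (100.0, 50.0, 300.0, 80.0)  # relative to the un-rotated page
final_bbox = transform_rect(raw_bbox, ROTATE_90)  # in the rotated page CS
```

In this sketch a wide, short box on the un-rotated page becomes a narrow, tall box after rotation, which is why it matters which coordinate system a bbox is expressed in before it is handed to Element.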
- class pdf2docx.common.Element.Element(raw: dict = None, parent=None)
Bases: IText
Bounding box, stored as an attribute of type fitz.Rect.
- ROTATION_MATRIX = Matrix(1.0, 0.0, -0.0, 1.0, 0.0, 0.0)
- contains(e: Element, threshold: float = 1.0)
Whether the given element is contained in this instance, with margin considered.
- Args:
e (Element): Target element.
threshold (float, optional): Intersection rate. Defaults to 1.0. The larger, the stricter.
- Returns:
bool: True if e is contained in this instance, otherwise False.
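The role of the threshold can be sketched in plain Python. This is a simplified illustration, not the library's implementation: it assumes the intersection rate is the share of the target element's area that lies inside this instance, and it ignores the margin handling mentioned above. The helper names `area` and `intersect` are hypothetical.

```python
def area(rect):
    """Area of an (x0, y0, x1, y1) box; zero for empty or inverted boxes."""
    x0, y0, x1, y1 = rect
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def intersect(a, b):
    """Intersection box of a and b (may be empty or inverted)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def contains(outer, inner, threshold=1.0):
    """True if the share of `inner` lying inside `outer` reaches `threshold`.

    Assumption: rate = area(outer & inner) / area(inner).
    """
    inner_area = area(inner)
    if inner_area == 0:
        return False
    return area(intersect(outer, inner)) / inner_area >= threshold


outer = (0.0, 0.0, 100.0, 100.0)
fully_inside = (10.0, 10.0, 50.0, 50.0)
half_inside = (90.0, 10.0, 110.0, 50.0)  # half of its width sticks out
```

With the default threshold of 1.0 only `fully_inside` counts as contained; lowering the threshold to 0.5 also accepts `half_inside`, which matches the "the larger, the stricter" remark.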
- copy()
Make a deep copy.
- get_expand_bbox(dt: float)
Get the bbox expanded by a margin in both the x- and y-directions.
- Args:
dt (float): Expanding margin.
- Returns:
fitz.Rect: Expanded bbox.
Note
This method creates a new bbox rather than modifying the instance's own bbox.
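The expansion can be sketched on plain tuples as follows. This is an illustrative stand-in that works on (x0, y0, x1, y1) tuples rather than fitz.Rect objects, and it assumes the margin is applied equally on all four sides.

```python
def get_expand_bbox(bbox, dt):
    """Return a new bbox grown by `dt` on every side; the input is untouched."""
    x0, y0, x1, y1 = bbox
    return (x0 - dt, y0 - dt, x1 + dt, y1 + dt)


original = (10.0, 20.0, 30.0, 40.0)
expanded = get_expand_bbox(original, 2.0)
```

A negative `dt` would shrink the box in this sketch; whether the library is meant to be used that way is not stated in the documentation.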
- get_main_bbox(e, threshold: float = 0.95)
If the intersection with e exceeds the threshold, return the union of these two elements; else return None.
- Args:
e (Element): Target element.
threshold (float, optional): Intersection rate. Defaults to 0.95.
- Returns:
fitz.Rect: Union bbox or None.
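The "union if the overlap is large enough" rule can be sketched on plain tuples. How the intersection rate is normalised is not stated above; this sketch assumes it is the intersection area divided by the area of the smaller box, and that assumption is marked in the code.

```python
def get_main_bbox(a, b, threshold=0.95):
    """Return the union of boxes a and b if they overlap enough, else None."""
    # Intersection box; empty if the boxes do not overlap.
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    if ix1 <= ix0 or iy1 <= iy0:
        return None

    inter_area = (ix1 - ix0) * (iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])

    # Assumption: the rate is measured against the smaller of the two boxes.
    rate = inter_area / min(area_a, area_b)
    if rate < threshold:
        return None
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


big = (0.0, 0.0, 100.0, 100.0)
nested = (10.0, 10.0, 20.0, 20.0)      # entirely inside `big`
corner = (90.0, 90.0, 200.0, 200.0)    # touches only a corner region of `big`
```

Under this assumption a small box nested in a large one yields the large box as the "main" bbox, while two boxes that only share a corner region yield None at the default threshold.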
- horizontally_align_with(e, factor: float = 0.0, text_direction: bool = True)
Check whether two Element instances have enough intersection in horizontal direction, i.e. along the reading direction.
- Args:
e (Element): Element to check with.
factor (float, optional): Threshold of overlap ratio; the larger it is, the higher the
probability that the two bboxes are aligned.
text_direction (bool, optional): Consider text direction or not. True by default.
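Examples:
    +--------------+
    |              |  L1  +--------------------+
    +--------------+      |                    |  L2
                          +--------------------+
Sufficient intersection is defined relative to the smaller of the two box extents L1 and L2:
    L1 + L2 - L > factor * min(L1, L2)
where L is the total extent spanned by both boxes, so L1 + L2 - L is the length of their overlap.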
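The inequality above is a one-dimensional overlap test, which can be sketched as follows. The function name `overlaps_enough` is hypothetical, and the intervals stand for the two boxes' extents along whichever axis is being compared; L is taken to be the total span covered by both intervals.

```python
def overlaps_enough(a0, a1, b0, b1, factor=0.0):
    """Check L1 + L2 - L > factor * min(L1, L2) for intervals [a0, a1], [b0, b1]."""
    L1 = a1 - a0                      # extent of the first box
    L2 = b1 - b0                      # extent of the second box
    L = max(a1, b1) - min(a0, b0)     # total span covered by both boxes
    overlap = L1 + L2 - L             # negative when the intervals are apart
    return overlap > factor * min(L1, L2)


# Two intervals sharing 5 units: [0, 10] and [5, 20].
loose = overlaps_enough(0.0, 10.0, 5.0, 20.0)               # any overlap passes
strict = overlaps_enough(0.0, 10.0, 5.0, 20.0, factor=0.6)  # needs more than 6 units
```

Because the overlap is compared with the smaller extent, a short box fully covered by a long one passes even for factors close to 1.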
- in_same_row(e)
Check whether this instance is in the same row/line as the specified Element instance, with text direction considered.
Taking horizontal text as an example:
the two are in the same row if the bottom edge of each box is lower than the centerline of the other one;
otherwise, they are not in the same row.
- Args:
e (Element): Target object.
Note
The difference from method horizontally_align_with: two elements may be aligned horizontally
and still not be in the same line.
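For horizontal text, the centerline rule can be sketched on plain tuples. This sketch assumes page coordinates in which y grows downward, so "lower" means a larger y value; it ignores text direction and is not the library's code.

```python
def in_same_row(a, b):
    """Same-row test for two (x0, y0, x1, y1) boxes, horizontal text only.

    Assumes y grows downward: a box's bottom edge is its y1 value.
    """
    center_a = (a[1] + a[3]) / 2.0
    center_b = (b[1] + b[3]) / 2.0
    # Each bottom edge must lie below the other box's centerline.
    return a[3] > center_b and b[3] > center_a


word = (0.0, 100.0, 50.0, 120.0)
neighbour = (60.0, 105.0, 120.0, 125.0)   # slightly shifted, same line
next_line = (0.0, 130.0, 50.0, 150.0)     # one line further down
tall_block = (60.0, 100.0, 120.0, 200.0)  # spans several lines
```

Here `word` and `tall_block` share part of their vertical extent, so an overlap test would treat them as aligned, yet the centerline rule rejects them as a row. That is the distinction drawn in the note above.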
- property parent
- plot(page, stroke: tuple = (0, 0, 0), width: float = 0.5, fill: tuple = None, dashes: str = None)
Plot the bbox on a PDF page for debugging purposes.
- classmethod pure_rotation_matrix()
Pure rotation matrix used for calculating text direction after rotation.
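One way to read "pure rotation" is a matrix with the translation part removed, so that it can be applied to direction vectors rather than points. The sketch below works on plain (a, b, c, d, e, f) tuples under that assumption; the helper names and the example matrix values are hypothetical.

```python
def pure_rotation(matrix):
    """Drop the translation terms (e, f), keeping only the linear part."""
    a, b, c, d, _e, _f = matrix
    return (a, b, c, d, 0.0, 0.0)


def transform_vector(vec, matrix):
    """Map (x, y) to (a*x + c*y + e, b*x + d*y + f)."""
    x, y = vec
    a, b, c, d, e, f = matrix
    return (a * x + c * y + e, b * x + d * y + f)


# Hypothetical matrix for a page rotated by 90 degrees, including a shift.
ROTATION = (0.0, 1.0, -1.0, 0.0, 800.0, 0.0)

# Direction of horizontal text on the un-rotated page.
direction = (1.0, 0.0)
rotated_direction = transform_vector(direction, pure_rotation(ROTATION))
```

Applying the full matrix to the direction vector would add the page shift to it, which is meaningless for a direction; dropping the translation avoids that.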
- classmethod set_rotation_matrix(rotation_matrix)
Set global rotation matrix.
- Args:
rotation_matrix (fitz.Matrix): Target matrix.
- store()
Store properties in raw dict.
- union_bbox(e)
Update current bbox to the union with specified Element.
- Args:
e (Element): The element to form the union with.
- Returns:
Element: self
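Returning self allows calls to be chained. The sketch below uses a hypothetical stand-in class `Box` holding a plain tuple instead of a fitz.Rect; it only illustrates the in-place update and the return value, not the library's class.

```python
class Box:
    """Hypothetical stand-in for an Element with a tuple bbox."""

    def __init__(self, bbox):
        self.bbox = tuple(bbox)

    def union_bbox(self, other):
        """Grow this box in place to cover `other`, then return self."""
        a, b = self.bbox, other.bbox
        self.bbox = (min(a[0], b[0]), min(a[1], b[1]),
                     max(a[2], b[2]), max(a[3], b[3]))
        return self


line = Box((0.0, 0.0, 10.0, 10.0))
result = line.union_bbox(Box((5.0, 5.0, 20.0, 8.0))) \
             .union_bbox(Box((-3.0, 2.0, 4.0, 12.0)))
```

After the chained calls, `result` is the same object as `line`, and its bbox covers all three boxes.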
- update_bbox(rect)
Update current bbox to specified rect.
- Args:
rect (fitz.Rect or list): bbox-like (x0, y0, x1, y1), in real page CS (with rotation considered).
- vertically_align_with(e, factor: float = 0.0, text_direction: bool = True)
Check whether two Element instances have enough intersection in vertical direction, i.e. perpendicular to reading direction.
- Args:
e (Element): Object to check with.
factor (float, optional): Threshold of overlap ratio; the larger it is, the higher the
probability that the two bboxes are aligned.
text_direction (bool, optional): Consider text direction or not. True by default.
- Returns:
bool: True if the two instances have enough intersection in the vertical direction, otherwise False.
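Examples:
    +--------------+
    |              |
    +--------------+
           L1
           +-------------------+
           |                   |
           +-------------------+
                    L2
Sufficient intersection is defined relative to the minimum width of the two boxes:
    L1 + L2 - L > factor * min(L1, L2)
where L is the total width spanned by both boxes, so L1 + L2 - L is the width of their overlap.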
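The effect of the text direction on this check can be sketched on plain tuples. The sketch assumes that for horizontal text the widths (x-extents) of the stacked boxes are compared, as in the diagram, and that for vertical text the roles of the axes swap. The `vertical_text` flag is a hypothetical simplification of the text_direction handling, not the library's signature.

```python
def vertically_align_with(a, b, factor=0.0, vertical_text=False):
    """Overlap test for two (x0, y0, x1, y1) boxes stacked in a column."""
    # Assumption: compare x-extents for horizontal text, y-extents otherwise.
    idx = 1 if vertical_text else 0
    L1 = a[idx + 2] - a[idx]
    L2 = b[idx + 2] - b[idx]
    L = max(a[idx + 2], b[idx + 2]) - min(a[idx], b[idx])
    return L1 + L2 - L > factor * min(L1, L2)


upper = (0.0, 0.0, 100.0, 20.0)
lower = (50.0, 30.0, 200.0, 50.0)  # below `upper`, shifted right by 50
```

For horizontal text these two boxes share 50 units of width and count as aligned at the default factor; read as vertical text, the compared extents do not overlap at all.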