pdf2docx.text.Char module
Char object based on PDF raw dict extracted with PyMuPDF.
For the data structure, refer to this link:
{
    'bbox'  : (x0, y0, x1, y1),
    'c'     : str,
    'origin': (x, y)
}
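PyMuPDF returns these per-character dicts from `Page.get_text("rawdict")`, nested as blocks, lines, spans and chars. The sketch below walks that nesting and yields each char dict. The helper `iter_raw_chars` is hypothetical (not part of pdf2docx), and the sample dict is hand-written to mimic the shape of the output rather than taken from a real PDF.

```python
from typing import Iterator


def iter_raw_chars(rawdict: dict) -> Iterator[dict]:
    """Yield every char dict from a PyMuPDF-style "rawdict" page structure.

    Hypothetical helper: walks blocks -> lines -> spans -> chars.
    Image blocks (type 1) carry no "lines" key, so .get() skips them.
    """
    for block in rawdict.get("blocks", []):
        for line in block.get("lines", []):
            for span in line.get("spans", []):
                yield from span.get("chars", [])


# Hand-written sample mimicking the shape of get_text("rawdict") output.
sample = {
    "blocks": [
        {"type": 1, "bbox": (0, 0, 10, 10)},  # image block: no lines
        {
            "type": 0,
            "lines": [
                {
                    "spans": [
                        {
                            "chars": [
                                {"bbox": (10.0, 20.0, 16.0, 32.0), "c": "H", "origin": (10.0, 30.0)},
                                {"bbox": (16.0, 20.0, 21.0, 32.0), "c": "i", "origin": (16.0, 30.0)},
                            ]
                        }
                    ]
                }
            ],
        },
    ]
}

text = "".join(ch["c"] for ch in iter_raw_chars(sample))
print(text)  # Hi
```

Each yielded dict has exactly the `bbox`, `c` and `origin` keys shown above, so it can be passed as the `raw` argument when building a char object.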
- class pdf2docx.text.Char.Char(raw: dict = None)
Bases: Element
Object representing a character.
- contained_in_rect(rect: Shape, horizontal: bool = True)
Detect whether the char is located in a rect.
- Args:
rect (Shape): Target rect to check.
horizontal (bool, optional): Text direction is horizontal if True. Defaults to True.
- Returns:
bool: Whether the char is located in the target rect.
Note
A char is considered contained in the target rect if the intersection is larger than half of the char bbox.
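The note leaves open how "larger than half" is measured. The sketch below assumes the overlap is compared along the text direction: width for horizontal text, height for vertical text, which is what the `horizontal` flag suggests. It uses plain `(x0, y0, x1, y1)` tuples instead of `Shape`, and `char_in_rect` is a hypothetical name; the actual pdf2docx implementation may differ.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def char_in_rect(char_bbox: BBox, rect_bbox: BBox, horizontal: bool = True) -> bool:
    """Sketch of the 'more than half of the char' containment rule.

    Assumption: the overlap is measured along the text direction,
    i.e. width for horizontal text and height for vertical text.
    """
    cx0, cy0, cx1, cy1 = char_bbox
    rx0, ry0, rx1, ry1 = rect_bbox
    # Extent of the intersection on each axis (0 if the boxes do not overlap).
    overlap_w = max(0.0, min(cx1, rx1) - max(cx0, rx0))
    overlap_h = max(0.0, min(cy1, ry1) - max(cy0, ry0))
    if overlap_w == 0.0 or overlap_h == 0.0:
        return False  # no intersection at all
    if horizontal:
        return overlap_w > 0.5 * (cx1 - cx0)
    return overlap_h > 0.5 * (cy1 - cy0)


char = (10.0, 20.0, 16.0, 32.0)  # 6 wide, 12 tall
print(char_in_rect(char, (0, 0, 14, 40)))                      # True: 4 of 6 units wide
print(char_in_rect(char, (0, 0, 12, 40)))                      # False: only 2 of 6
print(char_in_rect(char, (0, 0, 100, 25), horizontal=False))   # False: 5 of 12 tall
```

The last call shows why the direction matters: the same rect covers the full width of the char but less than half of its height.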
- store()
Store properties in a raw dict.
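Together, the `raw` constructor argument and `store()` describe a round trip between a raw dict and an object. The sketch below illustrates that pattern with a hypothetical `MiniChar` class that mirrors only the documented keys `bbox`, `c` and `origin`. It is not the pdf2docx implementation, and the defaults used for missing keys are assumptions.

```python
from typing import Optional, Tuple


class MiniChar:
    """Hypothetical stand-in for a char object built from a raw dict.

    Not the pdf2docx class; it only mirrors the documented keys
    'bbox', 'c' and 'origin'.
    """

    def __init__(self, raw: Optional[dict] = None):
        raw = raw or {}
        # Missing keys fall back to assumed empty defaults.
        self.bbox: Tuple[float, ...] = tuple(raw.get("bbox", (0.0, 0.0, 0.0, 0.0)))
        self.c: str = raw.get("c", "")
        self.origin: Tuple[float, ...] = tuple(raw.get("origin", (0.0, 0.0)))

    def store(self) -> dict:
        """Serialize the properties back into a raw dict."""
        return {"bbox": self.bbox, "c": self.c, "origin": self.origin}


raw = {"bbox": (10.0, 20.0, 16.0, 32.0), "c": "H", "origin": (10.0, 30.0)}
restored = MiniChar(MiniChar(raw).store())
print(restored.store() == raw)  # True
```

Storing as plain tuples and strings keeps the output JSON-friendly apart from tuple-to-list conversion, which is one reason a raw-dict `store()` is convenient for caching parsed layouts.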