pdf2docx.text.Char module

Char object based on PDF raw dict extracted with PyMuPDF.

The data structure follows the per-character entry of the PyMuPDF raw dict:

{
    'bbox'  : (x0, y0, x1, y1), 
    'c'     : str, 
    'origin': (x,y)
}
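As an illustration of the structure above, the following minimal sketch builds one such dict by hand and derives the character's width and height from its bbox. The coordinate values are made up for the example; in practice the dict would come from PyMuPDF's raw dict output rather than being written out manually.

```python
# Illustrative values only; a real entry comes from PyMuPDF's raw dict output.
raw_char = {
    'bbox': (72.0, 700.0, 78.5, 712.0),  # (x0, y0, x1, y1)
    'c': 'A',                            # the character itself
    'origin': (72.0, 709.5),             # (x, y) origin point of the character
}

# Unpack the bounding box and derive its size.
x0, y0, x1, y1 = raw_char['bbox']
width = x1 - x0
height = y1 - y0

print(raw_char['c'], width, height)
```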
class pdf2docx.text.Char.Char(raw: dict = None)

Bases: Element

Object representing a character.

contained_in_rect(rect: Shape, horizontal: bool = True)

Detect whether the char is located in a rect.

Args:

rect (Shape): Target rect to check.
horizontal (bool, optional): Text direction is horizontal if True. Defaults to True.

Returns:

bool: Whether the Char is located in the target rect.

Note

The char is considered contained in the target rect if its intersection with the rect is larger than half of the char bbox.
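The rule in the note can be sketched with plain tuples. This is not the pdf2docx implementation: `mostly_contained` and `overlap_length` are hypothetical helpers, and the sketch assumes that "half of the char bbox" is measured along the text direction (width for horizontal text, height for vertical text), which the documentation above does not state explicitly.

```python
def overlap_length(a0, a1, b0, b1):
    """Length of the overlap between intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def mostly_contained(char_bbox, rect_bbox, horizontal=True):
    """Hypothetical stand-in for the containment rule described in the note."""
    cx0, cy0, cx1, cy1 = char_bbox
    rx0, ry0, rx1, ry1 = rect_bbox

    overlap_w = overlap_length(cx0, cx1, rx0, rx1)
    overlap_h = overlap_length(cy0, cy1, ry0, ry1)

    # No two-dimensional intersection at all: not contained.
    if overlap_w <= 0 or overlap_h <= 0:
        return False

    # Assumption: the comparison is made along the text direction.
    if horizontal:
        return overlap_w > 0.5 * (cx1 - cx0)
    return overlap_h > 0.5 * (cy1 - cy0)


char = (10.0, 10.0, 20.0, 22.0)  # 10 units wide, 12 units tall
print(mostly_contained(char, (14.0, 8.0, 40.0, 24.0)))  # 6 of 10 units overlap
print(mostly_contained(char, (16.0, 8.0, 40.0, 24.0)))  # 4 of 10 units overlap
```

With these values the first call reports the char as contained and the second does not, since only the first rect covers more than half of the char's width.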

store()

Store properties in a raw dict.
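As a rough illustration of what storing to a raw dict could look like, the sketch below uses a hypothetical `SimpleChar` class (not the pdf2docx `Char`) whose `store()` returns the three keys documented at the top of this page. It assumes the stored dict mirrors the input structure, so an object can be rebuilt from its own output.

```python
class SimpleChar:
    """Hypothetical stand-in for illustration; not the pdf2docx class."""

    def __init__(self, raw=None):
        raw = raw or {}
        self.bbox = tuple(raw.get('bbox', (0.0, 0.0, 0.0, 0.0)))
        self.c = raw.get('c', '')
        self.origin = tuple(raw.get('origin', (0.0, 0.0)))

    def store(self):
        """Return the properties as a plain dict with the documented keys."""
        return {'bbox': self.bbox, 'c': self.c, 'origin': self.origin}


raw = {'bbox': (72.0, 700.0, 78.5, 712.0), 'c': 'A', 'origin': (72.0, 709.5)}

# Round trip: build from a raw dict, store, and rebuild from the stored dict.
stored = SimpleChar(raw).store()
restored = SimpleChar(stored)
print(stored)
```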