pdf2docx.text.Line module
Text Line objects based on the PDF raw dict extracted with PyMuPDF.
The data structure of a line in a text block is as follows (see the PyMuPDF documentation):
{
    'bbox': (x0, y0, x1, y1),
    'wmode': m,
    'dir': [x, y],
    'spans': [ spans ]
}
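As a rough illustration, a dict of this shape can be written by hand and inspected with plain Python. The values below are invented for the example, and the span list is left empty because the span structure is not described in this section.

```python
# A hand-written line dict following the structure above (values are made up).
line = {
    "bbox": (72.0, 100.0, 200.0, 112.0),  # (x0, y0, x1, y1)
    "wmode": 0,                            # writing mode: 0 horizontal, 1 vertical
    "dir": (1.0, 0.0),                     # writing direction vector
    "spans": [],                           # span dicts would go here
}

# Unpack the bounding box to get the size of the line.
x0, y0, x1, y1 = line["bbox"]
width, height = x1 - x0, y1 - y0

is_horizontal = line["wmode"] == 0
```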
- class pdf2docx.text.Line.Line(raw: dict = None)
Bases: Element
Object representing a line in a text block.
- add(span_or_list)
Add a span or a list of spans to the current Line.
- Args:
span_or_list (Span, Iterable): TextSpan or TextSpan list to add.
- property image_spans
Get the image spans in this Line.
- intersects(rect)
Create a new Line object with the spans contained in the given bbox.
- Args:
rect (fitz.Rect): Target bbox.
- Returns:
Line: The created Line instance.
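The idea behind intersects can be sketched with plain dicts and tuples. This is only an illustration under simplifying assumptions: the real method takes a fitz.Rect and may treat partially overlapping spans differently, and the helper names `contains` and `spans_within` are hypothetical.

```python
def contains(outer, inner):
    """True if rectangle `inner` lies fully inside rectangle `outer`."""
    ox0, oy0, ox1, oy1 = outer
    ix0, iy0, ix1, iy1 = inner
    return ox0 <= ix0 and oy0 <= iy0 and ix1 <= ox1 and iy1 <= oy1


def spans_within(line, rect):
    """Return a copy of `line` keeping only the spans fully inside `rect`.

    The line's own bbox is copied unchanged in this sketch.
    """
    kept = [span for span in line["spans"] if contains(rect, span["bbox"])]
    return {**line, "spans": kept}


line = {
    "bbox": (0, 0, 100, 10),
    "spans": [
        {"bbox": (0, 0, 40, 10), "text": "Hello "},
        {"bbox": (40, 0, 100, 10), "text": "world"},
    ],
}

# Only the first span fits inside the left half of the line.
left_half = spans_within(line, (0, 0, 50, 10))
```

The original line is left untouched, which mirrors the documented behaviour of returning a newly created Line.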
- make_docx(p)
Create a docx line, i.e. a run in python-docx.
- property raw_text
Join span text, ignoring images.
- store()
Store properties in the raw dict.
- strip()
Remove redundant blanks at the start of the first span and the end of the last span.
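A minimal sketch of this kind of stripping, assuming each span is reduced to its text string. The function name `strip_line` is hypothetical, and the library's own method may handle more cases (image spans, for instance).

```python
def strip_line(span_texts):
    """Trim leading blanks of the first span and trailing blanks of the last."""
    if not span_texts:
        return []
    stripped = list(span_texts)          # do not modify the caller's list
    stripped[0] = stripped[0].lstrip()   # blanks at the start of the line
    stripped[-1] = stripped[-1].rstrip() # blanks at the end of the line
    return stripped


result = strip_line(["  Hello ", " world  "])
single = strip_line(["  alone  "])
```

Blanks between spans are kept, since they separate words inside the line.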
- property text
Join span text. Note that an image is translated to the placeholder <image>.
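The difference between text and raw_text can be illustrated as follows. The span dicts here are simplified stand-ins (a text span has a "text" key, an image span an "image" key), and `join_spans` is a hypothetical helper, not part of pdf2docx.

```python
IMAGE_PLACEHOLDER = "<image>"


def join_spans(spans, keep_images):
    """Join span text; images become a placeholder or are skipped."""
    parts = []
    for span in spans:
        if span.get("image"):
            if keep_images:
                parts.append(IMAGE_PLACEHOLDER)
        else:
            parts.append(span["text"])
    return "".join(parts)


spans = [{"text": "Figure: "}, {"image": True}, {"text": " caption"}]

with_placeholder = join_spans(spans, keep_images=True)   # like `text`
without_images = join_spans(spans, keep_images=False)    # like `raw_text`
```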
- property text_direction
Get text direction. Consider LEFT_RIGHT and BOTTOM_TOP only.
- Returns:
TextDirection: Text direction of this line.
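One way such a property could derive a direction from the line's 'dir' vector is sketched below. This is an assumption-laden illustration: the `Direction` enum and `direction_of` function are hypothetical stand-ins, and the mapping (left-to-right for a vector pointing along +x, bottom-to-top for one pointing along -y, anything else ignored) is assumed rather than taken from the library.

```python
from enum import Enum


class Direction(Enum):
    """Hypothetical stand-in for the library's TextDirection."""
    IGNORE = 0
    LEFT_RIGHT = 1
    BOTTOM_TOP = 2


def direction_of(line):
    """Classify a line dict by its 'dir' vector (assumed mapping)."""
    dx, dy = line["dir"]
    if dx == 1.0:
        return Direction.LEFT_RIGHT
    if dy == -1.0:
        return Direction.BOTTOM_TOP
    return Direction.IGNORE


horizontal = direction_of({"dir": (1.0, 0.0)})
vertical = direction_of({"dir": (0.0, -1.0)})
other = direction_of({"dir": (0.7, 0.7)})
```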
- property white_space_only
Whether this line contains only white space. If True, the line can safely be removed.
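A short sketch of how such a check might be used to drop blank lines, assuming each line is reduced to a list of span text strings. The name `is_white_space_only` is hypothetical.

```python
def is_white_space_only(span_texts):
    """True if every span is empty or consists of white space only."""
    return all(not text.strip() for text in span_texts)


lines = [["Hello", " world"], [" ", "\t"], []]

# Keep only lines that carry visible text; a line with no spans
# also counts as blank in this sketch.
kept = [texts for texts in lines if not is_white_space_only(texts)]
```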