pdf2docx.table.TableBlock module
Table block object parsed from raw image and text blocks.
Data Structure:
{
    'type': int,
    'bbox': (x0, y0, x1, y1),
    'rows': [
        {
            "bbox": (x0, y0, x1, y1),
            "height": float,
            "cells": [
                {
                    'bbox': (x0, y0, x1, y1),
                    'border_color': (sRGB,,,), # top, right, bottom, left
                    'bg_color': sRGB,
                    'border_width': (,,,),
                    'merged_cells': (x,y), # this is the bottom-right cell of merged region: x rows, y cols
                    'blocks': [ {text blocks} ]
                }, # end of cell
                {},
                None, # merged cell
                ...
            ]
        }, # end of row
        {...} # more rows
    ] # end of rows
}
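The nesting above (table, then rows, then cells) can be walked with two loops. The following is a minimal sketch on a hand-written dictionary, not output from pdf2docx: the sample values, the placeholder `type` code, and the helper name `iter_cells` are all assumptions for illustration, and only a subset of the cell keys is filled in.

```python
# Hypothetical sample that follows the documented layout; all values are made up.
raw_table = {
    "type": 3,  # placeholder integer; the real type codes are not listed here
    "bbox": (0.0, 0.0, 200.0, 40.0),
    "rows": [
        {
            "bbox": (0.0, 0.0, 200.0, 20.0),
            "height": 20.0,
            "cells": [
                {"bbox": (0.0, 0.0, 200.0, 20.0), "blocks": []},
                None,  # placeholder for a cell absorbed by a merge
            ],
        },
        {
            "bbox": (0.0, 20.0, 200.0, 40.0),
            "height": 20.0,
            "cells": [
                {"bbox": (0.0, 20.0, 100.0, 40.0), "blocks": []},
                {"bbox": (100.0, 20.0, 200.0, 40.0), "blocks": []},
            ],
        },
    ],
}


def iter_cells(table):
    """Yield (row_index, col_index, cell) for every populated cell."""
    for i, row in enumerate(table["rows"]):
        for j, cell in enumerate(row["cells"]):
            if cell:  # skips None (merged) and empty {} entries
                yield i, j, cell


positions = [(i, j) for i, j, _ in iter_cells(raw_table)]
print(positions)
```

Checking `if cell` rather than `if cell is not None` also skips empty dictionaries, which is a choice made for this sketch only.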
- class pdf2docx.table.TableBlock.TableBlock(raw: dict = None)
Bases: Block
Table block.
- append(row: Row)
Append row to table and update bbox accordingly.
- Args:
row (Row): Target row to add.
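One way to read "update bbox accordingly" is that the table bbox becomes the union of its current bbox and the bbox of the new row. The sketch below illustrates that reading on plain dictionaries; `union_bbox` and `append_row` are hypothetical helpers, not pdf2docx functions, and the union rule itself is an assumption.

```python
def union_bbox(a, b):
    """Smallest rectangle (x0, y0, x1, y1) covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def append_row(table, row):
    """Append a row dict and grow the table bbox to include it (assumed rule)."""
    table.setdefault("rows", []).append(row)
    if table.get("bbox"):
        table["bbox"] = union_bbox(table["bbox"], row["bbox"])
    else:
        table["bbox"] = tuple(row["bbox"])  # first row defines the bbox
    return table


table = {"bbox": None, "rows": []}
append_row(table, {"bbox": (10.0, 10.0, 110.0, 30.0), "height": 20.0, "cells": []})
append_row(table, {"bbox": (10.0, 30.0, 120.0, 55.0), "height": 25.0, "cells": []})
print(table["bbox"])
```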
- assign_blocks(blocks: list)
Assign blocks to associated cell.
- Args:
blocks (list): A list of text/table blocks.
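The matching rule used to decide which cell a block belongs to is not described here. As an illustration only, the sketch below assigns each block to the first cell whose bbox contains the center point of the block. The center-point rule and the names `contains_center` and `assign_blocks_sketch` are assumptions, and the library may use a different criterion.

```python
def contains_center(cell_bbox, block_bbox):
    """True if the center of block_bbox lies inside cell_bbox."""
    cx = (block_bbox[0] + block_bbox[2]) / 2.0
    cy = (block_bbox[1] + block_bbox[3]) / 2.0
    x0, y0, x1, y1 = cell_bbox
    return x0 <= cx <= x1 and y0 <= cy <= y1


def assign_blocks_sketch(table, blocks):
    """Put each block into a matching cell; return the blocks that fit nowhere."""
    unassigned = []
    for block in blocks:
        for row in table["rows"]:
            cell = next(
                (c for c in row["cells"] if c and contains_center(c["bbox"], block["bbox"])),
                None,
            )
            if cell is not None:
                cell.setdefault("blocks", []).append(block)
                break
        else:  # no row had a matching cell
            unassigned.append(block)
    return unassigned


table = {
    "bbox": (0.0, 0.0, 200.0, 20.0),
    "rows": [{
        "bbox": (0.0, 0.0, 200.0, 20.0),
        "height": 20.0,
        "cells": [
            {"bbox": (0.0, 0.0, 100.0, 20.0), "blocks": []},
            {"bbox": (100.0, 0.0, 200.0, 20.0), "blocks": []},
        ],
    }],
}
blocks = [
    {"bbox": (5.0, 2.0, 60.0, 18.0)},     # center lies in the left cell
    {"bbox": (120.0, 2.0, 180.0, 18.0)},  # center lies in the right cell
    {"bbox": (300.0, 2.0, 340.0, 18.0)},  # outside the table
]
leftover = assign_blocks_sketch(table, blocks)
```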
- assign_shapes(shapes: list)
Assign shapes to associated cell.
- Args:
shapes (list): A list of Shape.
- make_docx(table)
Create docx table.
- Args:
table (Table): python-docx table instance.
- property num_cols
Count of columns.
- property num_rows
Count of rows.
- property outer_bbox
Outer bbox with border considered.
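How much the border contributes to the outer bbox is not spelled out here. A common convention is that a border line is centered on the cell edge, so half of its width lies outside. The sketch below applies that convention; the half-width rule and the helper name `outer_bbox_sketch` are assumptions, not the documented behavior.

```python
def outer_bbox_sketch(bbox, border_width):
    """Expand bbox by half of each border width.

    border_width follows the documented order: (top, right, bottom, left).
    """
    x0, y0, x1, y1 = bbox
    top, right, bottom, left = border_width
    return (x0 - left / 2.0, y0 - top / 2.0, x1 + right / 2.0, y1 + bottom / 2.0)


result = outer_bbox_sketch((10.0, 20.0, 110.0, 70.0), (2.0, 1.0, 2.0, 1.0))
print(result)
```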
- parse(**settings)
Parse the layout below the cell level, i.e. the blocks contained in each cell.
- Args:
settings (dict): Layout parsing parameters.
- plot(page)
Plot table block, i.e. cell/line/span, for debugging purposes.
- Args:
page (fitz.Page): PDF page.
content (bool): Plot text blocks contained in cells if True.
style (bool): Plot cell style if True, e.g. border width, shading.
color (bool): Plot border stroke color if style=False.
- store()
Store attributes in json format.
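Whether store() returns a dictionary or an encoded string is not stated here. Assuming the result follows the data structure documented for this module, the sketch below shows what a round trip through the standard json module does to such a dictionary: tuples come back as lists and None placeholders survive as null. The sample values are made up.

```python
import json

# Hypothetical stored table that follows the documented layout.
stored = {
    "type": 3,  # placeholder integer
    "bbox": (0.0, 0.0, 100.0, 20.0),
    "rows": [
        {
            "bbox": (0.0, 0.0, 100.0, 20.0),
            "height": 20.0,
            "cells": [
                {"bbox": (0.0, 0.0, 100.0, 20.0), "blocks": []},
                None,  # merged cell placeholder
            ],
        }
    ],
}

encoded = json.dumps(stored)   # tuples are written as JSON arrays
decoded = json.loads(encoded)  # arrays are read back as Python lists
print(decoded["bbox"])
```

Code that compares a reloaded bbox with a tuple should therefore convert one side first, for example with `tuple(decoded["bbox"])`.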
- property text
Get text contained in each cell.
- Returns:
list: 2D-list with each element representing text in cell.
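The shape of the returned 2D list can be illustrated on the dictionary form of a table. This is a sketch under stated assumptions: each text block is reduced to a hypothetical `text` key (the real text block layout is not shown here), merged placeholders are reported as None, and `table_text` is an illustrative helper rather than the property itself.

```python
def table_text(table, block_text=lambda block: block.get("text", "")):
    """Return one list per row, with one entry per cell.

    block_text extracts the text of a single block; the default reads a
    hypothetical 'text' key used only in this example.
    """
    grid = []
    for row in table["rows"]:
        grid.append([
            None if not cell
            else "\n".join(block_text(b) for b in cell.get("blocks", []))
            for cell in row["cells"]
        ])
    return grid


sample = {
    "rows": [
        {"cells": [{"blocks": [{"text": "Name"}]}, None]},
        {"cells": [{"blocks": [{"text": "a"}, {"text": "b"}]}, {"blocks": []}]},
    ]
}
grid = table_text(sample)
print(grid)
```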