How to find a table-like structure in an image (Python)
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same CC BY-SA license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/50829874/
How to find a table-like structure in an image
Asked by Mohamed Thasin ah
I have several types of invoice files, and I want to find the table in each of them. The table's position is not constant, so I turned to image processing. First I convert the invoice into an image, then I find contours based on the table borders, and finally I can capture the table's position. I used the code below for this task.
import cv2
import numpy as np
from wand.image import Image  # Wand (ImageMagick binding) rasterizes the PDF page

with Image(page) as page_image:
    page_image.alpha_channel = False  # eliminates transparency
    img_buffer = np.asarray(bytearray(page_image.make_blob()), dtype=np.uint8)
    img = cv2.imdecode(img_buffer, cv2.IMREAD_UNCHANGED)

ret, thresh = cv2.threshold(img, 127, 255, 0)
im2, contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

margin = []
for contour in contours:
    # Get the rectangle bounding the contour
    [x, y, w, h] = cv2.boundingRect(contour)
    # Don't keep small false positives that aren't tables
    if w > thresh1 and h > thresh2:
        margin.append([x, y, x + w, y + h])
# Data cleanup on margin to extract the required position values.
In this code, thresh1 and thresh2 are values I update based on the file.
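For illustration only, one way to pick these per file is to scale them from the page dimensions (a sketch; the 0.3 and 0.05 factors below are assumptions, not values from the original question):

# Illustrative heuristic: scale the size filters with the page dimensions
page_h, page_w = img.shape[:2]
thresh1 = int(page_w * 0.3)   # minimum contour width to count as a table (assumed factor)
thresh2 = int(page_h * 0.05)  # minimum contour height to count as a table (assumed factor)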
So using this code I can successfully read the positions of tables in images, and with those positions I then work on my invoice PDF file (see the short cropping sketch after the samples below). For example:
Sample 1: (image)
Sample 2: (image)

Output:

Sample 1: (image)
Sample 2: (image)
Sample 3: (image)
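As a minimal sketch of the "work on my invoice" step (assuming margin holds at least one detected box in [x1, y1, x2, y2] form, as built above), the detected region can be cropped from the rasterized page:

x1, y1, x2, y2 = margin[0]       # first detected table region
table_roi = img[y1:y2, x1:x2]    # crop the table area from the page image
cv2.imwrite("table_crop.png", table_roi)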
But now I have a new format that has no borders, yet it is still a table. How can I solve this? My entire approach depends on the borders of the tables, and this format has none. I have no idea how to get past this problem. My question is: is there any way to find the table's position based on the table structure alone?
For example, my problem input looks like the image below:
I would like to find its position as shown below:
How can I solve this? Any idea that helps me solve the problem would be greatly appreciated.
Thanks in advance.
Answered by Dmytro
Vaibhav is right. You can experiment with different morphological transforms to extract or group pixels into different shapes, lines, etc. For example, the approach can be the following:
- Start with dilation to convert the text into solid spots.
- Then apply the findContours function as the next step to find the text bounding boxes.
- Once you have the text bounding boxes, it is possible to apply some heuristic algorithm to cluster them into groups by their coordinates. This way you can find groups of text areas aligned into rows and columns.
- Then you can apply sorting by x and y coordinates and/or some analysis to the groups to try to find whether the grouped text boxes can form a table.
I wrote a small sample illustrating the idea. I hope the code is self-explanatory; I've put some comments in there too.
import os
import cv2

# This only works if there is only one table on a page.
# Important parameters:
#  - morph_size
#  - min_text_height_limit
#  - max_text_height_limit
#  - cell_threshold
#  - min_columns


def pre_process_image(img, save_in_file, morph_size=(8, 8)):
    # Get rid of the color
    pre = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Otsu threshold
    pre = cv2.threshold(pre, 250, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
    # Dilate the text to make it into solid spots
    cpy = pre.copy()
    struct = cv2.getStructuringElement(cv2.MORPH_RECT, morph_size)
    cpy = cv2.dilate(~cpy, struct, anchor=(-1, -1), iterations=1)
    pre = ~cpy

    if save_in_file is not None:
        cv2.imwrite(save_in_file, pre)
    return pre


def find_text_boxes(pre, min_text_height_limit=6, max_text_height_limit=40):
    # Looking for the text spot contours
    # OpenCV 3:
    # img, contours, hierarchy = cv2.findContours(pre, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    # OpenCV 4:
    contours, hierarchy = cv2.findContours(pre, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

    # Getting the text bounding boxes based on the text size assumptions
    boxes = []
    for contour in contours:
        box = cv2.boundingRect(contour)
        h = box[3]
        if min_text_height_limit < h < max_text_height_limit:
            boxes.append(box)
    return boxes


def find_table_in_boxes(boxes, cell_threshold=10, min_columns=2):
    rows = {}
    cols = {}

    # Clustering the bounding boxes by their positions
    for box in boxes:
        (x, y, w, h) = box
        col_key = x // cell_threshold
        row_key = y // cell_threshold
        cols[col_key] = [box] if col_key not in cols else cols[col_key] + [box]
        rows[row_key] = [box] if row_key not in rows else rows[row_key] + [box]

    # Filtering out the clusters having fewer than min_columns columns
    table_cells = list(filter(lambda r: len(r) >= min_columns, rows.values()))
    # Sorting the cells of each row by the x coordinate
    table_cells = [list(sorted(tb)) for tb in table_cells]
    # Sorting the rows by the y coordinate
    table_cells = list(sorted(table_cells, key=lambda r: r[0][1]))

    return table_cells


def build_lines(table_cells):
    if table_cells is None or len(table_cells) <= 0:
        return [], []

    # Rightmost edge of the widest last column
    max_last_col_width_row = max(table_cells, key=lambda b: b[-1][2])
    max_x = max_last_col_width_row[-1][0] + max_last_col_width_row[-1][2]

    # Bottom edge of the tallest box in the last row
    max_last_row_height_box = max(table_cells[-1], key=lambda b: b[3])
    max_y = max_last_row_height_box[1] + max_last_row_height_box[3]

    hor_lines = []
    ver_lines = []

    for box in table_cells:
        x = box[0][0]
        y = box[0][1]
        hor_lines.append((x, y, max_x, y))

    for box in table_cells[0]:
        x = box[0]
        y = box[1]
        ver_lines.append((x, y, x, max_y))

    (x, y, w, h) = table_cells[0][-1]
    ver_lines.append((max_x, y, max_x, max_y))
    (x, y, w, h) = table_cells[0][0]
    hor_lines.append((x, max_y, max_x, max_y))

    return hor_lines, ver_lines


if __name__ == "__main__":
    in_file = os.path.join("data", "page.jpg")
    pre_file = os.path.join("data", "pre.png")
    out_file = os.path.join("data", "out.png")

    img = cv2.imread(in_file)

    pre_processed = pre_process_image(img, pre_file)
    text_boxes = find_text_boxes(pre_processed)
    cells = find_table_in_boxes(text_boxes)
    hor_lines, ver_lines = build_lines(cells)

    # Visualize the result
    vis = img.copy()

    # for box in text_boxes:
    #     (x, y, w, h) = box
    #     cv2.rectangle(vis, (x, y), (x + w - 2, y + h - 2), (0, 255, 0), 1)

    for line in hor_lines:
        [x1, y1, x2, y2] = line
        cv2.line(vis, (x1, y1), (x2, y2), (0, 0, 255), 1)

    for line in ver_lines:
        [x1, y1, x2, y2] = line
        cv2.line(vis, (x1, y1), (x2, y2), (0, 0, 255), 1)

    cv2.imwrite(out_file, vis)
I've got the following output:
Of course, to make the algorithm more robust and applicable to a variety of input images, it has to be adjusted accordingly.
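For instance, a run tuned for a page with larger text and wider cell gaps might look like this (a sketch using the functions above; the values are illustrative, not recommendations):

# Illustrative parameter choices; tune per document family
pre = pre_process_image(img, save_in_file=None, morph_size=(12, 12))
boxes = find_text_boxes(pre, min_text_height_limit=10, max_text_height_limit=60)
cells = find_table_in_boxes(boxes, cell_threshold=15, min_columns=3)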
Update: updated the code with respect to the OpenCV API changes for findContours. If you have an older version of OpenCV installed, use the corresponding call. Related post.
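If the script needs to run under both versions, one option (a sketch, not from the original answer) is a small wrapper around the differing return signatures:

import cv2

def find_contours_compat(img, mode, method):
    # OpenCV 3.x returns (image, contours, hierarchy), while 4.x returns
    # (contours, hierarchy); taking the last two elements covers both.
    res = cv2.findContours(img, mode, method)
    return res[-2], res[-1]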
Answered by Vaibhav Mehrotra
You can try applying some morphological transforms (such as dilation and erosion) or a Gaussian blur as a pre-processing step before your findContours call. For example:
import cv2
import numpy as np

# g is the grayscale input image
blur = cv2.GaussianBlur(g, (3, 3), 0)
ret, thresh1 = cv2.threshold(blur, 150, 255, cv2.THRESH_BINARY)
bitwise = cv2.bitwise_not(thresh1)
erosion = cv2.erode(bitwise, np.ones((1, 1), np.uint8), iterations=5)
dilation = cv2.dilate(erosion, np.ones((3, 3), np.uint8), iterations=5)
The last argument, iterations, controls the degree of dilation/erosion that will take place (in your case, on the text). A small value produces small independent contours even within a single letter, while large values merge many nearby elements into one. You need to find the ideal value so that only the block of the image you want gets merged together.
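One simple way to hunt for that ideal value is to sweep iterations and compare the outputs visually (a sketch that assumes the bitwise image from the snippet above):

import cv2
import numpy as np

# Write one image per iteration count so the dilation strengths can be compared
for n in (1, 3, 5, 7):
    d = cv2.dilate(bitwise, np.ones((3, 3), np.uint8), iterations=n)
    cv2.imwrite("dilation_iter_%d.png" % n, d)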
Please note that I've taken 150 as the threshold parameter because I've been working on extracting text from images with varying backgrounds, and this worked out better. You can choose to continue with the value you've taken, since it's a black-and-white image.
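If a single fixed value does not hold across backgrounds, Otsu's method can pick the threshold automatically (a sketch, not part of the original answer):

# With THRESH_OTSU, OpenCV ignores the supplied threshold (0 here) and
# computes one from the image histogram; requires an 8-bit grayscale input.
ret, thresh1 = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)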
Answered by Devashish Prasad
There are many types of tables in document images, with too many variations and layouts. No matter how many rules you write, a table will always appear for which your rules fail. These types of problems are generally solved using ML (machine learning) based solutions. You can find many pre-implemented codebases on GitHub for solving the problem of detecting tables in images using ML or DL (deep learning).
Here is my code along with the deep learning models; the model can detect various types of tables as well as the structure cells within them: https://github.com/DevashishPrasad/CascadeTabNet
As far as accuracy is concerned, the approach currently (10th May 2020) achieves state-of-the-art results on various public datasets.
More details: https://arxiv.org/abs/2004.12629