5/02/2020

get image rect list from pdf

extract all image rect list from pdf using pymupdf
look at the sample code

..

#pip install PyMuPDF
#document : https://pymupdf.readthedocs.io/en/latest/

#pip install opencv-python
#github : https://github.com/skvark/opencv-python

import fitz

img_bbox = []
doc1 =fitz.open('test.pdf')
page1 = doc1[0] #first page

d = page1.getText("dict")
blocks = d["blocks"]
imgblocks = [b for b in blocks if b["type"] == 1]
for v in imgblocks:
[x1, y1, x2, y2] = v['bbox']
#print(x1, y1, x2, y2)
img_bbox.append({'left':int(x1), 'top':int(y1), 'right':int(x2), 'bottom':int(y2)})
..

No comments:

Post a Comment