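Installing and using PaddleOCR with Anaconda3

To be honest, the official documentation is rather murky.

On Windows, nvidia-docker is not an option, so I had to install through Anaconda instead.

These notes combine the following two sources: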

https://www.paddlepaddle.org.cn/install/quick

https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/installation_en.md

Configure the conda mirrors in China

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
conda config --set show_channel_urls yes

1 Create a Python 3.7 virtual environment

conda create --name paddle python=3.7
activate paddle

The Python version must be specified explicitly.
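
After activating the environment, a quick check can confirm that the interpreter really is 3.7. This is only a minimal sketch; the helper name is_expected_python is made up for illustration.

```python
import sys


def is_expected_python(version_info=None, expected=(3, 7)):
    """Return True if the (major, minor) version matches `expected`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == tuple(expected)


if __name__ == "__main__":
    print("Running Python", sys.version.split()[0])
    if not is_expected_python():
        print("Warning: this is not Python 3.7; the paddle env may not be active.")
```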

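2 Install paddlepaddle-gpu 2.0.0

It can only be installed with pip from Baidu's mirror; neither conda nor the public PyPI carries this version.

Baidu's mirror only has 2.0.0a0; 2.0.0b0 is only available in the Docker image.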

python -m pip install paddlepaddle-gpu==2.0.0a0 -i https://mirror.baidu.com/pypi/simple

For Python 3.7, only 2.0.0a0 is available.
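
To confirm that the package landed in the active environment, it is enough to ask Python whether the module can be located, without importing it. A sketch using only the standard library; find_missing_modules is a hypothetical helper name, and paddle is the import name of the paddlepaddle-gpu package.

```python
import importlib.util


def find_missing_modules(module_names):
    """Return the top-level module names that cannot be located."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing_modules(["paddle"])
    if missing:
        print("Not found in this environment:", ", ".join(missing))
    else:
        print("paddle was found in this environment")
```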

3 PaddleOCR

3.1 Download the source and install the dependencies

git clone https://gitee.com/paddlepaddle/PaddleOCR
cd PaddleOCR
pip install -r requirements.txt

3.2 Manually download and install shapely

https://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely

3.3 Manually rename, extract, and copy the DLL!

https://github.com/PaddlePaddle/PaddleOCR/issues/212

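Quoted from the issue:

I downloaded Shapely-1.7.0-cp39-cp39-win_amd64.whl from the URL you provided, but pip install of this whl file failed.

So I renamed it to Shapely-1.7.0-cp39-cp39-win_amd64.rar,

then extracted it, found geos_c.dll in its subdirectory shapely\DLLs\, and copied geos_c.dll into the conda environment directory (my environment is named ocr) C:\Users\myusername\Miniconda3\envs\ocr\Library\bin. Problem solved.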
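
A .whl file is an ordinary zip archive, so the rename-and-extract step can also be done with Python's zipfile module. The sketch below pulls geos_c.dll out of the wheel and writes it to a target directory. The function name is made up, and the paths in the usage comment are placeholders: the target should be the Library\bin directory of your own conda environment.

```python
import os
import shutil
import zipfile


def extract_dll_from_wheel(wheel_path, dest_dir, dll_name="geos_c.dll"):
    """Copy `dll_name` out of a wheel (a zip archive) into `dest_dir`.

    Returns the path of the copied file, or raises FileNotFoundError if the
    wheel contains no file with that name.
    """
    with zipfile.ZipFile(wheel_path) as wheel:
        for member in wheel.namelist():
            # Zip member names always use forward slashes.
            if member.split("/")[-1].lower() == dll_name.lower():
                os.makedirs(dest_dir, exist_ok=True)
                target = os.path.join(dest_dir, dll_name)
                with wheel.open(member) as src, open(target, "wb") as dst:
                    shutil.copyfileobj(src, dst)
                return target
    raise FileNotFoundError(f"{dll_name} not found in {wheel_path}")


# Usage (placeholder paths, adjust to your download and environment):
# extract_dll_from_wheel(
#     r"Shapely-1.7.0-cp39-cp39-win_amd64.whl",
#     r"C:\Users\myusername\Miniconda3\envs\ocr\Library\bin",
# )
```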

3.4 Modify PaddleOCR/paddleocr.py

Next, following

https://github.com/PaddlePaddle/PaddleOCR/issues/832

in PaddleOCR/paddleocr.py, find def parse_args():

and add the following line inside it:

parser.add_argument("--use_pdserving", type=bool, default=False)

Otherwise, running the demo later fails with this error:

Namespace(cls=False, cls_batch_num=30, cls_image_shape='3, 48, 192', cls_model_dir='C:\\Users\\xuqinghan/.paddleocr/cls', cls_thresh=0.9, det=True, det_algorithm='DB', det_db_box_thresh=0.5, det_db_thresh=0.3, det_db_unclip_ratio=2.0, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_east_score_thresh=0.8, det_max_side_len=960, det_model_dir='C:\\Users\\xuqinghan/.paddleocr/det', enable_mkldnn=False, gpu_mem=8000, image_dir=None, ir_optim=True, label_list=['0', '180'], lang='ch', max_text_length=25, rec=True, rec_algorithm='CRNN', rec_batch_num=30, rec_char_dict_path='./ppocr/utils/ppocr_keys_v1.txt', rec_char_type='ch', rec_image_shape='3, 32, 320', rec_model_dir='C:\\Users\\xuqinghan/.paddleocr/rec/ch', use_angle_cls=False, use_gpu=True, use_space_char=True, use_tensorrt=False, use_zero_copy_run=False)
Traceback (most recent call last):
  File "D:\dev\chem\chart-nmr-spectrum\test_paddle_ocr.py", line 10, in <module>
    ocr = PaddleOCR() # need to run only once to download and load model into memory
  File "D:\Users\xuqinghan\anaconda3\envs\paddle\lib\site-packages\paddleocr-1.0.0-py3.7.egg\paddleocr\paddleocr.py", line 222, in __init__
    super().__init__(postprocess_params)
  File "D:\Users\xuqinghan\anaconda3\envs\paddle\lib\site-packages\paddleocr-1.0.0-py3.7.egg\paddleocr\tools\infer\predict_system.py", line 41, in __init__
    self.text_detector = predict_det.TextDetector(args)
  File "D:\Users\xuqinghan\anaconda3\envs\paddle\lib\site-packages\paddleocr-1.0.0-py3.7.egg\paddleocr\tools\infer\predict_det.py", line 77, in __init__
    if args.use_pdserving is False:
AttributeError: 'Namespace' object has no attribute 'use_pdserving'
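
The failure is easy to reproduce without PaddleOCR: argparse only creates attributes for arguments that were registered, so code that reads args.use_pdserving breaks when the parser never declared it. The sketch below uses a stand-in parser (not PaddleOCR's real parse_args) to show the error condition and the effect of the added line. One caveat, which is standard argparse behaviour: with type=bool, any non-empty string, including "False", parses as True, so the flag is only reliable when left at its default.

```python
import argparse


def build_parser(with_fix):
    """Stand-in for parse_args(); only the arguments needed for the demo."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--use_gpu", type=bool, default=True)
    if with_fix:
        # The line added in step 3.4.
        parser.add_argument("--use_pdserving", type=bool, default=False)
    return parser


# Without the fix, the attribute simply does not exist on the Namespace.
args = build_parser(with_fix=False).parse_args([])
print(hasattr(args, "use_pdserving"))  # False

# With the fix, the default is present, so `args.use_pdserving is False` works.
args = build_parser(with_fix=True).parse_args([])
print(args.use_pdserving is False)  # True

# Caveat: bool("False") is True, so passing the flag as text always enables it.
args = build_parser(with_fix=True).parse_args(["--use_pdserving", "False"])
print(args.use_pdserving)  # True
```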

3.5 Install

cd PaddleOCR
python setup.py install

Finally, running a demo should no longer raise an error:

import os
import time

from paddleocr import PaddleOCR

if __name__ == '__main__':

    PATH_IMG_IN = './in'
    filename = os.path.join(PATH_IMG_IN, '1.png')

    ocr = PaddleOCR()  # need to run only once to download and load model into memory
    start = time.perf_counter()
    # rec=False runs text detection only and skips recognition
    result = ocr.ocr(filename, rec=False)
    end = time.perf_counter()
    print('Text region detection took {:.3f} s'.format(end - start))

    # Each rectangle's corners are listed clockwise from the top-left
    for rect1 in result:
        print(rect1)
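
With rec=False, the result is a list of detected text regions, each given as corner points in the order noted in the demo. If an axis-aligned box is more convenient (for cropping, say), it can be derived from the corners. A sketch assuming each region is a sequence of four [x, y] pairs; to_bounding_box is a hypothetical helper, not part of PaddleOCR, and the sample coordinates are made up.

```python
def to_bounding_box(quad):
    """Convert four [x, y] corner points into (x_min, y_min, x_max, y_max)."""
    xs = [point[0] for point in quad]
    ys = [point[1] for point in quad]
    return min(xs), min(ys), max(xs), max(ys)


# Example with a made-up region, corners clockwise from the top-left.
sample = [[10.0, 20.0], [110.0, 22.0], [109.0, 52.0], [9.0, 50.0]]
print(to_bounding_box(sample))  # (9.0, 20.0, 110.0, 52.0)
```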

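Summary:

The documentation is a mess and the installation is full of pitfalls. Still, the results are decent enough, so I'll live with it.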
