Research on character recognition of coal mine digital displays based on PP-OCRv3 transfer learning

    • Abstract: Character recognition on the digital displays of coal mine equipment is an active research topic in intelligent coal mine construction, but the field still faces two significant problems: (1) recognition performance is poor because of interference factors such as a small effective recognition area, complex underground lighting, and low image quality; (2) the underground working environment restricts sample collection, which leaves models with insufficient generalization ability. To address these problems, a character recognition algorithm for coal mine digital displays based on transfer learning with PP-OCRv3 (PP-OCR: a Practical Ultra Lightweight Optical Character Recognition system) is proposed. First, PP-OCRv3 is adopted as the pre-trained model to strengthen the representation of general text features and improve the accuracy of character detection and recognition in the complex coal mine environment. Second, the PP-OCRv3 model is transferred step by step over multiple stages, driven in turn by a public text recognition dataset, a self-made digital display character dataset, and real and simulated coal mine digital display character datasets, so that the model adapts from general scenes to the special scenes of coal mines and improves in both cross-scene generalization and recognition speed. Experiments show that, in the anti-interference test, the transfer-optimized model reaches an average accuracy of 78.83% (an increase of 17.29%), with the gain most pronounced in the interference-block scenario, where accuracy reaches 79.73% (an increase of 29.32%). In the real-time evaluation, the inference frame rate increases by 27.295 fps on average, and by as much as 57.67 fps in blurred scenes. After multiple transfers, the PP-OCRv3 model effectively reduces its dependence on labeled samples while outperforming the comparison models in both recognition accuracy and recognition speed.

       
