Collapsed column identification model based on K-means SMOTE and random forest algorithm

郝帅 (HAO Shuai); 王怀秀 (WANG Huaixiu); 刘最亮 (LIU Zuiliang)

Abstract

Identifying collapse columns from a single seismic attribute suffers from non-uniqueness (several geological interpretations fit the same response) and uncertainty, and imbalanced sample data skew the identification accuracy. To address both problems, a binary classification model for collapse column identification based on K-means SMOTE and random forest was constructed. The model identifies collapse columns by jointly analyzing multiple seismic attributes. The study area is the southern part of the east wing of the first mining district of Shanxi Xinyuan Coal Co., Ltd. Twelve seismic attributes, extracted by front-line interpreters from 3D seismic exploration data, were used as sample features, and the collapse columns actually exposed during mining were used as sample labels, forming a seismic multi-attribute dataset. Seismic attributes were then selected through correlation analysis, cluster analysis, and random forest feature-importance analysis, and six relatively independent attributes were retained as sample features. The K-means SMOTE algorithm was used to balance the dataset, yielding 8 992 samples, of which 6 294 were used as the training set and 2 698 as the test set. A random forest binary classification model built in Python predicted collapse columns with an accuracy of up to 87%. Compared with three common machine learning classification algorithms, the model identified collapse columns with higher accuracy.
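The abstract does not include code, so the following is only a minimal, dependency-free sketch of the balancing step under stated assumptions, not the authors' implementation. It assumes label 1 marks collapse-column samples (the minority class), Euclidean distance, k-means++ seeding, and illustrative parameters (cluster count `k=4`, minority-share threshold `0.5`, `k_neighbors=3`) that the paper does not report. The function names `kmeans`, `kmeans_smote` and `split_70_30` are hypothetical, and the demo uses toy 2-D points rather than the six selected seismic attributes. The last helper checks that the reported 6 294 / 2 698 split of 8 992 samples corresponds to a 70/30 cut.

```python
import math
import random


def _dist(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, rng, iters=25):
    """Lloyd's k-means with k-means++ seeding; returns one cluster label per point."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # k-means++: next center drawn with probability proportional to squared distance
        d2 = [min(_dist(p, c) ** 2 for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: _dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def kmeans_smote(X, y, minority=1, k=4, share_threshold=0.5, k_neighbors=3, seed=0):
    """Hypothetical K-means SMOTE sketch: oversample the minority class to about 1:1."""
    rng = random.Random(seed)
    n_min = sum(1 for lab in y if lab == minority)
    n_new = len(y) - 2 * n_min  # synthetic rows needed for a 1:1 class ratio
    X_out, y_out = [list(row) for row in X], list(y)
    if n_new <= 0:
        return X_out, y_out
    labels = kmeans(X_out, k, rng)

    # 1) Filter: keep clusters in which the minority class is well represented.
    chosen = {}
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        mins = [X_out[i] for i in idx if y_out[i] == minority]
        if len(mins) >= 2 and len(mins) / len(idx) >= share_threshold:
            chosen[c] = mins
    if not chosen:
        raise ValueError("no cluster passed the minority-share filter")

    # 2) Allocate synthetic samples by sparsity (1 / density): sparser clusters get more.
    n_feat = len(X_out[0])
    sparsity = {}
    for c, mins in chosen.items():
        pair_d = [_dist(a, b) for i, a in enumerate(mins) for b in mins[i + 1:]]
        mean_d = (sum(pair_d) / len(pair_d)) or 1e-12
        sparsity[c] = mean_d ** n_feat / len(mins)
    total = sum(sparsity.values())

    # 3) SMOTE inside each kept cluster: interpolate toward a nearby minority point.
    for c, mins in chosen.items():
        n_c = round(n_new * sparsity[c] / total)
        kk = min(k_neighbors, len(mins) - 1)
        for _ in range(n_c):
            a = rng.choice(mins)
            others = sorted((m for m in mins if m is not a), key=lambda m: _dist(a, m))
            b = rng.choice(others[:kk])
            gap = rng.random()
            X_out.append([ai + gap * (bi - ai) for ai, bi in zip(a, b)])
            y_out.append(minority)
    return X_out, y_out


def split_70_30(n, seed=0):
    """Shuffle row indices and cut them 70/30 into train and test index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(0.7 * n)
    return idx[:cut], idx[cut:]


if __name__ == "__main__":
    gen = random.Random(1)
    X = [[gen.gauss(0, 0.5), gen.gauss(0, 0.5)] for _ in range(90)]   # label 0 (majority)
    X += [[gen.gauss(5, 0.5), gen.gauss(5, 0.5)] for _ in range(10)]  # label 1 (minority)
    y = [0] * 90 + [1] * 10
    Xb, yb = kmeans_smote(X, y)
    print(yb.count(0), yb.count(1))
    train, test = split_70_30(8992)
    print(len(train), len(test))  # 6294 2698
```

Sending more synthetic samples to sparse minority clusters, and interpolating only between minority points of the same cluster, is what distinguishes K-means SMOTE from plain SMOTE: new samples stay inside minority-dominated regions instead of bridging into majority areas. For real use, the imbalanced-learn package provides `KMeansSMOTE`, and the classification stage could use scikit-learn's `RandomForestClassifier`. The abstract does not say which libraries the authors used.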
