A group straightening method for hydraulic supports based on multi-agent global optimization
-
Abstract
The linearity of the mining face is crucial for the stability of the surrounding rock and the efficiency of coal recovery. During mining, dynamic changes in the coal wall, roof and floor make it difficult to establish an accurate control model for optimal group straightening of hydraulic supports. To address this issue, a multi-agent decision model for group straightening of hydraulic supports is proposed based on the Markov decision process (MDP). Each hydraulic support is treated as an agent, which avoids reliance on a control model, and the agents collaborate through reinforcement learning (RL) to reach optimal decisions for group straightening. Because the actions in hydraulic support straightening are continuous-valued, the twin delayed deep deterministic policy gradient (TD3) algorithm is extended to the multi-agent straightening process. To overcome the slow convergence of straightening policies under multi-agent conditions, layer normalization is introduced into the Actor-Critic network structure. This brings the inputs of each layer under a unified probability distribution and accelerates the convergence of the neural network weights. To address the difficulty of determining the optimal policy for multiple agents, a global action optimization (GAO) module is proposed, which globally optimizes the decision outcomes of the multi-agent hydraulic supports and thereby achieves the optimal decision for group straightening. Furthermore, based on engineering practice at the coal mining face, a simulation platform for intelligent decision-making in group straightening of supports is developed using the OpenAI Gym framework.
The experimental results show that the proposed algorithm significantly outperforms the other algorithms in terms of both the average value and the standard deviation of linearity. When the number of “S-bend” grouped supports is seven, the proposed algorithm improves the linearity of the coal mining face by 86% compared to the MA-TD3 algorithm and reduces the standard deviation by 71%.
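As an illustration of the layer normalization step mentioned in the abstract, the following is a minimal sketch in plain Python, not the authors' implementation. The function names (`layer_norm`, `dense`, `actor_hidden`), the ReLU activation, and the layer ordering (dense, then layer normalization, then activation) are assumptions made for illustration only; the abstract does not specify them.

```python
import math
from typing import List, Optional


def layer_norm(x: List[float],
               gamma: Optional[List[float]] = None,
               beta: Optional[List[float]] = None,
               eps: float = 1e-5) -> List[float]:
    """Normalize one layer's activations to zero mean and (near) unit variance.

    gamma and beta are the learnable scale and shift; they default to
    1 and 0 here, which is an assumption made for this sketch.
    """
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    gamma = gamma if gamma is not None else [1.0] * n
    beta = beta if beta is not None else [0.0] * n
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]


def dense(x: List[float],
          weights: List[List[float]],
          bias: List[float]) -> List[float]:
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def actor_hidden(x: List[float],
                 weights: List[List[float]],
                 bias: List[float]) -> List[float]:
    """Hypothetical hidden block: dense -> layer norm -> ReLU."""
    return [max(0.0, v) for v in layer_norm(dense(x, weights, bias))]
```

Because the normalization is computed per sample over the features of a layer, it does not depend on batch statistics, which is one common reason it is used in Actor-Critic networks trained from replay data.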