Supervised by: Chinese Academy of Sciences
Sponsored by: Chinese Society of Optimization, Overall Planning and Economic Mathematics
   Institutes of Science and Development, Chinese Academy of Sciences

Chinese Journal of Management Science ›› 2025, Vol. 33 ›› Issue (12): 110-120. doi: 10.16381/j.cnki.issn1003-207x.2023.0142


A Study of Action Selection Policies for Multi-Agent Nash Q-Learning Model

Song Han, Can Li

  1. School of Economics, Renmin University of China, Beijing 100872, China
  • Received: 2023-01-30 Revised: 2023-03-30 Online: 2025-12-25 Published: 2025-12-25
  • Contact: Song Han E-mail: hansong@ruc.edu.cn

Abstract:

Optimizing the action selection strategy of multi-agent Q-Learning models is one of the pressing problems in simulating complex economic games. This paper introduces the forced ε-greedy action selection policy into the multi-agent Nash Q-Learning model, compares its effect with that of the classical ε-greedy policy through game experiments, and examines its impact on the computational speed and convergence of the algorithm. The correctness of the algorithm is then verified theoretically in light of the experimental results, with the verification grounded in the properties of the multi-agent model. The simulation results show that forced ε-greedy suits more complex games with more state-action pairs and more rounds, where it effectively improves the running performance of the multi-agent Q-Learning algorithm. However, because the policy increases exploration in the early stages, it consumes some rounds and thereby lowers the equilibrium convergence rate. The performance gain from forced ε-greedy versus the lost equilibrium convergence rate is therefore a trade-off that users must weigh when applying this policy.

The main contributions of this paper are: 1) comparing model performance under the classical and extended action selection policies, theoretically validating the algorithm's correctness against the simulation results, and summarizing the characteristics of applying the extended policy in the multi-agent Nash Q-Learning model; 2) analyzing the theoretical properties and convergence proofs of several multi-agent models to show that the policy generalizes, thus providing multi-agent reinforcement learning algorithms that contain an action selection step with a strategy for raising computational speed and improving convergence results. This improves the running efficiency of complex reinforcement learning models in economics and, to a certain extent, reduces the cost of experiments, allowing more experiments within a given computational budget and improving the reliability of experimental results.
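To make the comparison concrete, the sketch below contrasts the two action selection rules in Python. The abstract does not state the exact definition of forced ε-greedy, so the forcing rule used here (every action must be tried at least min_visits times before the policy reverts to classical ε-greedy) is an illustrative assumption consistent with the stated property of increased initial exploration; the function names and parameters are hypothetical.

    import numpy as np

    def epsilon_greedy(q_values, epsilon, rng):
        # Classical ε-greedy: with probability ε choose a uniformly
        # random action, otherwise choose the highest-valued action.
        if rng.random() < epsilon:
            return int(rng.integers(len(q_values)))
        return int(np.argmax(q_values))

    def forced_epsilon_greedy(q_values, epsilon, visit_counts, min_visits, rng):
        # Assumed forced ε-greedy: while any action has been tried fewer
        # than min_visits times, force exploration among those actions;
        # afterwards fall back to classical ε-greedy. This forcing rule
        # is an illustrative assumption, not the paper's exact definition.
        under_explored = np.flatnonzero(visit_counts < min_visits)
        if under_explored.size > 0:
            return int(rng.choice(under_explored))
        return epsilon_greedy(q_values, epsilon, rng)

    # Usage sketch for one agent in one state of a Nash Q-Learning loop:
    rng = np.random.default_rng(0)
    q_values = np.zeros(4)                 # Q-values of 4 actions in one state
    visit_counts = np.zeros(4, dtype=int)
    for t in range(100):
        a = forced_epsilon_greedy(q_values, 0.1, visit_counts, 5, rng)
        visit_counts[a] += 1
        # ... observe joint action and reward, then update q_values
        # with the Nash equilibrium value of the successor stage game.

Under this reading, the forced early rounds would account for both effects reported above: broader initial coverage of the state-action space (the source of the performance gain in larger games) and rounds spent exploring rather than converging (the source of the lower equilibrium convergence rate).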

Key words: Nash Q-Learning, forced ε-greedy, action selection
