ε-greedy,行动选择," /> ε-greedy,行动选择,"/> ε-greedy,action selection,"/> 多代理Nash Q-Learning模型行动选择策略研究

中国管理科学 (Chinese Journal of Management Science), 2025, Vol. 33, Issue (12): 110-120. DOI: 10.16381/j.cnki.issn1003-207x.2023.0142. CSTR: 32146.14.j.cnki.issn1003-207x.2023.0142


A Study of Action Selection Policies for Multi-Agent Nash Q-Learning Model

Song Han, Can Li

  1. School of Economics, Renmin University of China, Beijing 100872, China
  • Received: 2023-01-30  Revised: 2023-03-30  Online: 2025-12-25  Published: 2025-12-25
  • Corresponding author: Song Han  E-mail: hansong@ruc.edu.cn
  • Supported by:
    General Program of the National Natural Science Foundation of China (72471228); Renmin University of China "Qiushi Academic" Talent Cultivation Program (RUC24QSDL035); Renmin University of China 2023 Special Fund for Central Universities to Build World-Class Universities (Disciplines) and Guide Distinctive Development

Abstract:

Optimization of the action selection policy for multi-agent Q-Learning models is one of the pressing problems in simulating complex economic games. In this paper, the forced ε-greedy action selection policy is introduced into the multi-agent Nash Q-Learning model, its effect is compared with that of the classical ε-greedy policy through game experiments, and its influence on the computational speed and convergence of the algorithm is examined; at the same time, the validity of the algorithm is verified theoretically against the experimental results, and a corollary on the general applicability of forced ε-greedy is derived from the properties of multi-agent models. The simulation results show that forced ε-greedy is suited to more complex games with more states and actions and more rounds, where it can effectively improve the running performance of the multi-agent Q-Learning algorithm; however, because it essentially increases the exploration of actions in the early stage, it consumes some rounds and lowers the equilibrium convergence rate. The performance gain brought by forced ε-greedy and the loss in equilibrium convergence rate are therefore a trade-off that users must weigh when applying this policy. The main contributions of this paper are: 1) comparing model performance under the classical and the extended action selection policies, verifying the validity of the algorithm theoretically on the basis of the simulation results, and summarizing the characteristics of applying the extended policy in the multi-agent Nash Q-Learning model; 2) on the basis of the theoretical properties of several multi-agent models and their convergence proofs, giving a conclusion on the generalizability of this action selection policy, which offers multi-agent reinforcement learning algorithms with an action selection step a strategy for increasing computational speed and improving convergence results. This improves the operating efficiency of complex reinforcement learning models in economics, reduces experimental costs to a certain extent, and allows more experiments to be run within the available computational budget, thereby improving the reliability of experimental results.
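To make the comparison concrete, the following minimal Python sketch contrasts classical ε-greedy selection with one plausible reading of the forced ε-greedy policy, in which actions not yet tried in the current state are selected first during an initial exploration phase. The function names, the force_horizon parameter, and the exact forcing rule are illustrative assumptions made for this sketch and are not the paper's definition.

import random
from collections import defaultdict

def epsilon_greedy(Q, state, actions, epsilon):
    # Classical ε-greedy: explore uniformly with probability ε, otherwise exploit argmax_a Q(s, a).
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def forced_epsilon_greedy(Q, state, actions, epsilon, visits, episode, force_horizon=50):
    # Assumed forcing rule: during the first force_horizon episodes, any action that has
    # never been tried in this state is selected before falling back to classical ε-greedy.
    if episode < force_horizon:
        untried = [a for a in actions if visits[(state, a)] == 0]
        if untried:
            choice = random.choice(untried)
            visits[(state, choice)] += 1
            return choice
    choice = epsilon_greedy(Q, state, actions, epsilon)
    visits[(state, choice)] += 1
    return choice

# Example usage with hypothetical Q-values and visit counters for one agent.
Q = defaultdict(float)
visits = defaultdict(int)
action = forced_epsilon_greedy(Q, state=0, actions=[0, 1, 2], epsilon=0.1, visits=visits, episode=0)

Under this reading, the forcing phase spends early rounds on exploration that classical ε-greedy would partly spend exploiting, which is consistent with the trade-off described above between improved running performance and a reduced equilibrium convergence rate.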

Key words: Nash Q-Learning, forced ε-greedy, action selection

CLC Number: