Novel Reinforcement Learning Algorithm: Stable Constrained Soft Actor Critic

admin · posted 2024-12-14 02:32

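Document title: Novel Reinforcement Learning Algorithm: Stable Constrained Soft Actor Critic (稳定且受限的新强化学习SAC算法)
Summary: Q-function overestimation in the fixed-temperature SAC (Soft Actor Critic) algorithm may cause the algorithm to become trapped in a local optimum. To address this, the paper analyzes the issue and proposes a stable and constrained SAC algorithm (SCSAC: Stable Constrained Soft Actor Critic). The algorithm modifies the maximum entropy objective function to correct the Q-function overestimation of fixed-temperature SAC, and it also improves stability during testing. SCSAC was evaluated in 4 OpenAI Gym Mujoco environments. The experimental results show that, compared with fixed-temperature SAC, SCSAC effectively reduces the number of occurrences of Q-function overestimation and yields more stable test results.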

Abstract: Q-function overestimation may cause the SAC (Soft Actor Critic) algorithm to become trapped in a locally optimal solution. To solve this problem, the SCSAC (Stable Constrained Soft Actor Critic) algorithm is proposed. It resolves this weakness in the maximum entropy objective function and improves the stability of the algorithm during testing. Evaluation of SCSAC on a suite of OpenAI Gym Mujoco environments shows fewer occurrences of Q-value overestimation and more stable test results compared with the SAC algorithm.
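For readers unfamiliar with the baseline, the following is a minimal sketch of the soft TD target used by standard fixed-temperature SAC, the algorithm the abstract compares against. It is not the SCSAC objective, which this page does not spell out. The function name `soft_td_target`, the sample arrays, and the values `alpha=0.2` and `gamma=0.99` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def soft_td_target(reward, done, q1_next, q2_next, logp_next, alpha=0.2, gamma=0.99):
    """Soft TD target of standard fixed-temperature SAC (not the SCSAC objective).

    All array arguments have shape (batch,). alpha is the fixed temperature
    that weights the entropy term; its value here is an assumption.
    """
    # Clipped double-Q: use the smaller of two critic estimates to damp overestimation.
    q_next = np.minimum(q1_next, q2_next)
    # Entropy bonus: subtract alpha * log pi(a'|s') from the next-state value.
    soft_v_next = q_next - alpha * logp_next
    # Terminal transitions (done == 1) do not bootstrap from the next state.
    return reward + gamma * (1.0 - done) * soft_v_next

# Hypothetical batch of two transitions; the second one is terminal.
reward = np.array([1.0, 0.5])
done = np.array([0.0, 1.0])
q1_next = np.array([10.0, 3.0])
q2_next = np.array([8.0, 4.0])
logp_next = np.array([-1.0, -2.0])

target = soft_td_target(reward, done, q1_next, q2_next, logp_next)
print(target)
```

With a fixed `alpha`, the entropy bonus is added to every bootstrapped target regardless of how reliable the critic estimates are, which is the setting in which the abstract reports Q-function overestimation.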

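Authors: HAI Ri (海日), ZHANG Xingliang (张兴亮), JIANG Yuan (姜源), YANG Yongjian (杨永健)
Affiliations: College of Computer Science and Technology, Jilin University, Changchun 130012; China Mobile Communications Group Jilin Co., Ltd., China Mobile Communications Group Co., Ltd., Changchun 130022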
Journal: Journal of Jilin University (Information Science Edition) (吉林大学学报（信息科学版）), ISTIC
Year, volume (issue): 2024, 42(2)
Classification number: TP301
Keywords: reinforcement learning; maximum entropy reinforcement learning; Q value overestimation; soft actor critic (SAC) algorithm
Machine-assigned classification numbers: TP181; TP391; TP242
Online publication date: May 27, 2024
Funding: Innovation Capacity Building Fund of the Jilin Provincial Development and Reform Commission; Key Fund of the Jilin Provincial Science and Technology Development Plan

Attachment: 稳定且受限的新强化学习SAC算法.pdf (Novel Reinforcement Learning Algorithm: Stable Constrained Soft Actor Critic), 2.55 MB, uploaded 2024-12-14 02:32, 60 downloads
