Fuel Cell Temperature Control Based on Classified Replay Twin Delayed Bayesian Deep Deterministic Policy Gradient

黄金阳光 · Posted 2024-10-4 00:39

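Document summary: The proton exchange membrane fuel cell (PEMFC) is a nonlinear system that is difficult to model accurately, so controlling the PEMFC stack temperature calls for a controller with strong robustness and high adaptability. This paper proposes a data-driven controller based on deep reinforcement learning to control the stack temperature. Taking into account the characteristics of the PEMFC system, including its nonlinearity, its uncertainty, and the influence of environmental conditions, the paper proposes a new deep reinforcement learning algorithm: the classified replay twin delayed Bayesian deep deterministic policy gradient (CTDB-DDPG) algorithm. The algorithm's design introduces techniques such as Bayesian neural networks and classified experience replay, which improve the controller's performance. Simulation results and results from the RT-Lab experimental platform show that, owing to the high adaptability and strong robustness of CTDB-DDPG, the proposed algorithm controls the PEMFC stack temperature more effectively and has practical significance.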
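The classified experience replay named in the summary can be sketched as follows. This is only a minimal illustration of the rule given in the paper's English abstract (two pools, a mean TD error that starts at 0, mean updated before each new sample is classified). It is not the authors' implementation. The class name `ClassifiedReplayBuffer`, the use of the absolute TD error, the running mean over all samples seen so far, and the `high_fraction` sampling mix are all assumptions made for this sketch.

```python
import random
from collections import deque


class ClassifiedReplayBuffer:
    """Hypothetical two-pool replay buffer split by TD error vs. a running mean."""

    def __init__(self, capacity=10000, high_fraction=0.5, seed=None):
        self.pool_high = deque(maxlen=capacity)  # pool I: TD error above the mean
        self.pool_low = deque(maxlen=capacity)   # pool II: TD error at or below the mean
        self.mean_td = 0.0                       # average TD error, initialized to 0
        self.count = 0                           # number of experiences seen so far
        self.high_fraction = high_fraction       # assumed sampling mix, not from the source
        self.rng = random.Random(seed)

    def add(self, transition, td_error):
        # Assumption: classify on the magnitude of the TD error.
        magnitude = abs(td_error)
        # Update the running mean first, then classify the new sample against it.
        self.count += 1
        self.mean_td += (magnitude - self.mean_td) / self.count
        if magnitude > self.mean_td:
            self.pool_high.append(transition)
        else:
            self.pool_low.append(transition)

    def sample(self, batch_size):
        # Draw from both pools without replacement; a pool that is too small
        # simply contributes fewer samples.
        n_high = min(len(self.pool_high), round(batch_size * self.high_fraction))
        n_low = min(len(self.pool_low), batch_size - n_high)
        batch = self.rng.sample(list(self.pool_high), n_high)
        batch += self.rng.sample(list(self.pool_low), n_low)
        return batch
```

A training loop would call `add(transition, td_error)` after each environment step and `sample(batch_size)` before each network update. How the two pools are actually mixed when sampling is not stated in the abstract, so the fixed ratio here is a placeholder.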

Abstract: Proton exchange membrane fuel cells (PEMFCs) are strongly nonlinear and difficult to model accurately. In addition, the radiator and circulating water pump in the hydrothermal management system of the fuel cell system are strongly coupled, which makes it difficult for model-based control algorithms to achieve accurate control of the fuel cell temperature. This paper proposes a data-driven, model-free algorithm based on the classified replay twin delayed Bayesian deep deterministic policy gradient (CTDB-DDPG) to control the fuel cell temperature system. Firstly, the use of deep deterministic policy gradient is proposed to solve the problem of intricate modeling of fuel cells. Then, the classified experience replay strategy is added to the algorithm, and the CTDB-DDPG algorithm uses two experience buffer pools to store the experience data. When constructing the network model, the average TD error of all samples in these two experience buffer pools is initialized to 0. Whenever new experience data is generated, the average TD error of all experience data is first updated. If its TD error exceeds the mean value, it is stored in experience buffer pool I. Otherwise, it is stored in experience buffer pool II. Classifying each experience sample by its TD error helps make better use of the experience data to train the network model. CTDB-DDPG considers the neural network's uncertainty by incorporating a Bayesian neural network into the algorithm, and the proposed Bootstrap with random initialization leads to a reasonable uncertainty estimation. At the beginning of each round or at fixed intervals during the learning process, unbiased hypotheses are obtained from the posterior distributions of the MDP parameters and estimated using a multi-head shared network Bootstrap value function, which does not require additional computational resources. Moreover, using Q-learning preserves the uncertainty of the cumulative discount, which is more effective for environments requiring deep exploration. Randomly selecting the head network and simulating Thompson sampling can effectively avoid ineffective boosting of intelligence in the noise strategy, accelerating the convergence of the CTDB-DDPG algorithm. In addition, the fuel cell thermal management system has a large inertia; the algorithm in this paper adds OU noise to the action to improve the exploration efficiency. OU noise is temporally correlated noise drawn from the Ornstein-Uhlenbeck process, and this temporal correlation helps the algorithm explore different strategies. This exploration process can help the algorithm find possibly better strategies, thus improving the performance and efficiency of the algorithm. Although the addition of noise can cause the algorithm's performance to deteriorate in the short term, in the long term it can help the algorithm avoid falling into a local optimum and may help it find a better strategy. Finally, the algorithm's validity is verified on the simulation platform Simulink as well as the experimental platform RT-Lab, and similar conclusions are obtained, verifying the algorithm's effectiveness. However, although our CTDB-DDPG temperature control strategy has been validated on simulation and hardware-in-the-loop test platforms, more complex real-world working conditions, such as ambient temperature and humidity variations and equipment aging, will be considered in future studies to test and improve the adaptability and robustness of our algorithm in a broader range of more complex situations.
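The OU exploration noise described in the abstract can be illustrated with a short sketch. It uses the standard Euler-Maruyama discretization of the Ornstein-Uhlenbeck process, dx = θ(μ − x)dt + σ√dt·N(0, 1). The class name `OUNoise`, the helper `noisy_action`, the scalar action, the clipping range, and the default values of `theta`, `sigma`, and `dt` are all assumptions chosen for illustration. None of them are taken from the paper.

```python
import math
import random


class OUNoise:
    """Hypothetical discretized Ornstein-Uhlenbeck process (Euler-Maruyama step)."""

    def __init__(self, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2, x0=0.0, seed=None):
        self.mu = mu          # long-run mean the noise reverts to
        self.theta = theta    # mean-reversion rate
        self.sigma = sigma    # noise scale
        self.dt = dt          # time step
        self.x0 = x0
        self.x = x0
        self.rng = random.Random(seed)

    def reset(self):
        # Restart the process, e.g. at the beginning of each episode.
        self.x = self.x0

    def sample(self):
        # Each value depends on the previous one, so successive samples
        # are correlated in time rather than independent.
        drift = self.theta * (self.mu - self.x) * self.dt
        diffusion = self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
        self.x += drift + diffusion
        return self.x


def noisy_action(action, noise, low=-1.0, high=1.0):
    """Add OU noise to a scalar action and clip it to an assumed actuator range."""
    return max(low, min(high, action + noise.sample()))
```

Because consecutive samples drift together instead of jumping independently, the perturbed action changes smoothly, which is the usual motivation for OU noise on a plant with large inertia such as the thermal management system described here.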

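Authors: 赵洪山 (Zhao Hongshan), 潘思潮 (Pan Sichao), 马利波 (Ma Libo), 吴雨晨 (Wu Yuchen), 吕廷彦 (Lü Tingyan)
Affiliation: Hebei Key Laboratory of Distributed Energy Storage and Microgrid (North China Electric Power University), Baoding 071003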
Journal: 电工技术学报 (Transactions of China Electrotechnical Society); indexed in ISTIC, EI, PKU
Year, volume (issue): 2024, 39(13)
Classification codes: TM911.4; U264
Keywords: fuel cell; joint control; deep reinforcement learning (the Chinese keyword list gives 深度确定性, "deep deterministic"); Bayesian network
Machine-assigned classification codes: TM911.4; TP273; TP391
Online publication date: July 22, 2024


Uploaded 2024-10-4 00:39

基于分类回放双延迟贝叶斯深度确定性策略梯度的燃料电池温度控制.pdf

File size: 20.79 MB

