In this work, we proposed DACER (Diffusion Actor-Critic with Entropy Regulator), an online reinforcement learning algorithm based on diffusion models, aimed at overcoming the limitations of the Gaussian policy parameterization used in conventional reinforcement learning. By exploiting the reverse denoising process of a diffusion model, DACER can effectively learn multimodal distributions, enabling more expressive policies and better policy performance. A notable challenge is the lack of an analytical expression for the entropy of a diffusion policy, which makes it difficult to combine with maximum-entropy reinforcement learning and leads to suboptimal performance. To address this, we estimate the entropy with a Gaussian mixture model (GMM), which enables the learning of the key parameter α; this parameter balances exploration and exploitation by regulating the noise variance of the action outputs. Empirical evaluations on the MuJoCo benchmark and on multimodal tasks demonstrate the superior performance of DACER.
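To make the mechanism above concrete, the following Python sketch shows one way to estimate a diffusion policy's entropy by fitting a GMM to sampled actions and then nudging α so that the estimated entropy tracks a target value. It is only an illustration under stated assumptions, not DACER's published implementation: sample_actions, target_entropy, and lr_alpha are hypothetical names introduced for this example, and the Monte Carlo entropy estimate and the α update rule are plausible simplifications of the approach described above.

    # Minimal sketch (assumptions noted above): GMM-based entropy estimation of a
    # diffusion policy's action samples, plus a SAC-style adjustment of alpha that
    # regulates the noise variance of the action output.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def estimate_policy_entropy(actions: np.ndarray, n_components: int = 3) -> float:
        """Approximate the policy entropy from actions sampled for one state.

        A GMM is fitted to the sampled actions; the entropy is then estimated
        by the Monte Carlo average H ≈ -E[log p(a)] under the fitted GMM.
        """
        gmm = GaussianMixture(n_components=n_components).fit(actions)
        log_probs = gmm.score_samples(actions)   # per-sample log-density under the GMM
        return float(-np.mean(log_probs))

    def update_alpha(alpha: float, entropy: float, target_entropy: float,
                     lr_alpha: float = 3e-4) -> float:
        """Adjust alpha so the estimated entropy tracks the target entropy.

        If the estimated entropy falls below the target, alpha is increased
        (more noise in the action output, more exploration); otherwise it is
        decreased (less noise, more exploitation).
        """
        alpha += lr_alpha * (target_entropy - entropy)
        return max(alpha, 0.0)

    # Example usage with hypothetical sampled actions (200 samples, 2-D action space):
    actions = np.random.randn(200, 2)
    entropy_estimate = estimate_policy_entropy(actions)
    alpha = update_alpha(alpha=0.2, entropy=entropy_estimate, target_entropy=-2.0)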