[LG] "Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs" N L Roux, M G Bellemare, J Lebensold, A Bergeron... [Mila & Reliant AI] (2025)
#MachineLearning#
#ArtificialIntelligence#
#Paper#
#AICreationCamp#