Scalable Bayesian Reinforcement Learning in the Games Industry

Basic information

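Grant number:
2890029
Principal investigator:
Amount:
--
Host institution:
Queen Mary University of London
Host institution country:
United Kingdom
Project category:
Studentship
Financial year:
2023
Funding country:
United Kingdom
Start and end dates:
2023 to (no data)
Project status:
Ongoing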

Source:
https://gtr.ukri.org/projects?ref=studentship-2890029
Keywords:
Scalable Bayesian Reinforcement Learning Games

Project abstract

One of the main challenges in reinforcement learning is identifying good data sampling strategies that effectively balance exploring the space of all possible policies against exploiting the trajectories that have yielded better outcomes so far. In environments with complex state and action spaces, such as those in video games, this challenge becomes more apparent, with traditional reinforcement learning algorithms often suffering from sample inefficiency, model bias, and overfitting. By incorporating uncertainty estimation and prior knowledge into the learning process, Bayesian reinforcement learning handles this exploration-exploitation trade-off in a principled way, making it a natural candidate for application in these environments. However, Bayesian reinforcement learning algorithms are more computationally intensive, which has hindered their widespread adoption. The proposed study will investigate methods to scale Bayesian reinforcement learning to large-scale problems while maintaining computational efficiency and accuracy. On successful completion, this study will result in more efficient and stable training of reinforcement learning agents in the games industry. This will allow reinforcement learning algorithms to be integrated more easily into design pipelines, resulting in quicker and more stable development as well as an enhanced user experience.
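
The way posterior uncertainty can drive the exploration-exploitation trade-off is easiest to see in a toy setting. The sketch below is an illustration only, not the project's method: it uses Thompson sampling on a Bernoulli multi-armed bandit rather than a full reinforcement learning problem. The function name `thompson_sampling`, the arm probabilities, the uniform Beta(1, 1) prior, and the round count are all assumptions chosen for the example.

```python
import random


def thompson_sampling(true_probs, n_rounds=2000, seed=0):
    """Run Beta-Bernoulli Thompson sampling on a toy bandit.

    true_probs: hypothetical success probability of each arm
                (hidden from the agent, used only to simulate rewards).
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)

    # Assumed prior: Beta(1, 1), i.e. uniform, on each arm's success rate.
    alpha = [1.0] * n_arms
    beta = [1.0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one plausible success rate per arm from its current posterior.
        # Arms with few observations have wide posteriors, so they are
        # sometimes sampled high and get explored.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]

        # Act greedily with respect to the sampled rates.
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Simulate a Bernoulli reward from the hidden true probability.
        reward = 1 if rng.random() < true_probs[arm] else 0

        # Conjugate update: successes go to alpha, failures to beta.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


if __name__ == "__main__":
    # Three hypothetical arms; the third has the highest success rate.
    print(thompson_sampling([0.2, 0.5, 0.8]))
```

In this toy case the posterior update is a pair of counters per arm, so it costs almost nothing. With large state and action spaces, maintaining and sampling from a posterior is typically far more expensive, which is the computational burden the abstract refers to.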
