Fully Bayesian Reinforcement Learning for Control of Continuous Industrial Processes

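Basic information

Grant number:
2640133
Principal investigator:
Amount:
--
Host institution:
University of Liverpool
Host institution country:
United Kingdom
Project category:
Studentship
Fiscal year:
2021
Funding country:
United Kingdom
Start and end dates:
2021 to (no data)
Project status:
Not concluded

Source:
https://gtr.ukri.org/projects?ref=studentship-2640133
Keywords:
Fully Bayesian Reinforcement Learning Control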

Project abstract

This exciting and innovative PhD, in partnership with NSG, relates to settings where a continuous manufacturing process is monitored and controlled with a focus on both guaranteeing the quality of the product and minimising the cost of doing so, e.g. by minimising the amount of excess material used to guarantee that certain specifications of the product (e.g. thickness or defect rate) are met. The focus is on the manufacture and treatment of glass. In such settings there is often a significant latency (i.e. minutes) between the control input changing and the response being observable. It is challenging to apply feedback control in these contexts, so existing engineering solutions often make use of physical models of the process and employ predictive model-based control. While this does make it possible to produce desired variations in the product, the approach relies on the physical models of the process and the models of the sensors being known. These models are well understood in general, but there are aspects where it is not possible to build accurate models that can, for example, infer how the fine detail of the thickness profile is affected by variation in the power applied to heating elements at some historic time. Furthermore, the real world changes over time (e.g. because valves become worn or because scheduled maintenance has not occurred recently), and while it is possible to develop work-arounds to adapt to these changes, these work-arounds can fail. Such failures can result in sudden and significant degradation in the quality of the product. The fundamental challenge is then to develop a control strategy that fully capitalises on: offline historic data; parameterised models that capture the extensive but incomplete understanding of the processes' and sensors' performance; offline simulated experience derived from those models; and online data from sensors. Developing such a control strategy will require numerical Bayesian inference algorithms (e.g. Markov Chain Monte Carlo) to make inferences about the models in a way that exploits the historic data and domain experts' existing understanding. Borrowing from recent successful applications of Reinforcement Learning (RL) in other domains, RL will then be used to learn how best to apply the control given the inferred model. Such RL is computationally intensive and will therefore require the use of High-Performance Computing resources.
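
As a rough illustration of the inference step mentioned in the abstract, the sketch below uses random-walk Metropolis (one simple form of Markov Chain Monte Carlo) to infer a single process parameter from historic data. Everything in it is an assumption made for illustration and is not taken from the project: the toy model y[t] = gain * u[t - delay] + noise, the known delay and noise level, the Gaussian prior standing in for a domain expert's belief, and all function and variable names. It does not cover the reinforcement-learning step.

```python
import math
import random


def delayed_input(u, t, delay):
    """Control input that affects the output at time t (zero before the delay elapses)."""
    return u[t - delay] if t >= delay else 0.0


def simulate_history(gain, delay, noise_sd, n_steps, rng):
    """Generate hypothetical historic data from the toy delayed-response model."""
    u = [rng.uniform(-1.0, 1.0) for _ in range(n_steps)]
    y = [gain * delayed_input(u, t, delay) + rng.gauss(0.0, noise_sd)
         for t in range(n_steps)]
    return u, y


def log_posterior(gain, u, y, delay, noise_sd, prior_mean, prior_sd):
    """Unnormalised log posterior: Gaussian prior on the gain plus Gaussian likelihood."""
    log_prior = -0.5 * ((gain - prior_mean) / prior_sd) ** 2
    log_lik = 0.0
    for t in range(len(y)):
        residual = y[t] - gain * delayed_input(u, t, delay)
        log_lik += -0.5 * (residual / noise_sd) ** 2
    return log_prior + log_lik


def metropolis(u, y, delay, noise_sd, prior_mean, prior_sd, n_samples, step_sd, rng):
    """Random-walk Metropolis sampler for the scalar gain."""
    current = prior_mean  # start the chain at the expert's prior guess
    current_lp = log_posterior(current, u, y, delay, noise_sd, prior_mean, prior_sd)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step_sd)
        proposal_lp = log_posterior(proposal, u, y, delay, noise_sd, prior_mean, prior_sd)
        # Accept with probability min(1, posterior ratio).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


rng = random.Random(0)
TRUE_GAIN = 2.0  # hypothetical value, used only to generate the synthetic data
DELAY = 5        # assumed known latency, in time steps
NOISE_SD = 0.5   # assumed known sensor noise

u, y = simulate_history(TRUE_GAIN, DELAY, NOISE_SD, n_steps=200, rng=rng)
samples = metropolis(u, y, DELAY, NOISE_SD, prior_mean=1.5, prior_sd=1.0,
                     n_samples=5000, step_sd=0.1, rng=rng)

burn_in = 1000  # discard early samples taken before the chain settles
kept = samples[burn_in:]
posterior_mean = sum(kept) / len(kept)
print("posterior mean of gain:", posterior_mean)
```

In a fuller treatment the delay and noise level would themselves be uncertain and sampled alongside the gain, and the resulting posterior samples could then parameterise the simulator used to generate offline experience for RL.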
