Data-Driven Algorithms for Data Acquisition

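Basic Information

Grant number: EP/Y037200/1
Principal investigator: Tom Rainforth
Amount: $1.5663 million
Host institution: University of Oxford
Host institution country: United Kingdom
Project category: Research Grant
Fiscal year: 2024
Funding country: United Kingdom
Start and end dates: 2024 to (no data)
Project status: Ongoing (not concluded)

Source: https://gtr.ukri.org/projects?ref=EP%2FY037200%2F1
Keywords: Data Driven Algorithms Acquisition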

Project Abstract

Advances in machine learning have transformed our ability to utilize data, but far less progress has been made on intelligently acquiring such data in the first place. Consequently, although data-driven approaches are now ubiquitous across science and industry, hand-crafted and heuristic approaches are typically still the norm for data acquisition itself.

My goal is to address this shortfall by developing principled quantitative methods for data acquisition. In particular, I will construct adaptive algorithms that leverage information from previous data to guide future data acquisition. The basis for doing this will be the framework of Bayesian adaptive design (BAD), which formalizes the utility of data through the information it provides, then exploits this to optimize the controllable aspects of the acquisition process.

Despite its principled foundations, BAD has not yet seen substantial uptake due to some key challenges in its deployment. Most notably, it suffers from crippling computational bottlenecks that undermine its use. By overcoming these with a new policy-based approach, I hope to turn BAD's potential into a reality, providing a powerful basis for intelligent data acquisition in domains ranging from interactive surveys and virtual assistants to laboratory experiments and psychology trials.

One area of particular focus will be active learning, wherein one iteratively selects points to label from an unlabelled pool. Here BAD has already had some success, but I believe it is currently fundamentally misapplied. I hope to substantially improve the state of the art in this area through various innovations, such as targeting information gain in predictions rather than parameters, properly utilizing unlabelled data, and developing policy-based approaches. I further propose to revisit the foundations of the Bayesian neural network models often used in such settings, questioning their fundamental assumptions and developing radically new approaches.
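The abstract describes BAD as scoring data by the information it provides and then optimizing the controllable parts of acquisition. A minimal sketch of that idea follows, assuming a toy discrete model that is not from the project: an unknown threshold theta in the integers 1 to 10, a design d, and a binary response y = 1 iff d >= theta, flipped with a hypothetical probability EPS. At each step it greedily picks the design maximizing the expected information gain EIG(d) = sum_theta p(theta) sum_y p(y | theta, d) log[p(y | theta, d) / p(y | d)], observes a simulated response, and updates the posterior with Bayes' rule. This is the classical enumeration-based form of sequential design, not the policy-based approach the project proposes; all function names are hypothetical.

```python
import math

EPS = 0.1  # hypothetical response-flip probability of the toy model


def likelihood(y, theta, d, eps=EPS):
    """p(y | theta, d) for a toy noisy threshold model: y = 1 iff d >= theta, flipped w.p. eps."""
    p1 = 1 - eps if d >= theta else eps
    return p1 if y == 1 else 1 - p1


def expected_information_gain(belief, d):
    """EIG(d) = sum_theta p(theta) sum_y p(y|theta,d) log[p(y|theta,d) / p(y|d)], in nats."""
    eig = 0.0
    for y in (0, 1):
        # Marginal predictive p(y | d) = sum_theta p(theta) p(y | theta, d)
        marginal = sum(p * likelihood(y, th, d) for th, p in belief.items())
        for th, p in belief.items():
            lik = likelihood(y, th, d)
            if p > 0 and lik > 0:
                eig += p * lik * math.log(lik / marginal)
    return eig


def update(belief, d, y):
    """Bayes' rule: p(theta | d, y) is proportional to p(theta) p(y | theta, d)."""
    unnorm = {th: p * likelihood(y, th, d) for th, p in belief.items()}
    z = sum(unnorm.values())
    return {th: w / z for th, w in unnorm.items()}


def run_adaptive_design(true_theta, designs, belief, steps):
    """Greedy sequential design: choose the EIG-maximizing design, observe, update, repeat."""
    history = []
    for _ in range(steps):
        d = max(designs, key=lambda cand: expected_information_gain(belief, cand))
        y = 1 if d >= true_theta else 0  # simulated noise-free response (hypothetical)
        belief = update(belief, d, y)
        history.append((d, y))
    return belief, history


uniform = {th: 0.1 for th in range(1, 11)}
posterior, history = run_adaptive_design(true_theta=7, designs=range(1, 11), belief=uniform, steps=3)
print(history, max(posterior, key=posterior.get))
```

With a uniform prior, the first query is d = 5, the design whose outcome is most uncertain under the current belief, and later queries adapt to the responses already observed. The optimization is cheap here only because theta and y are tiny discrete sets; for realistic models the EIG itself is intractable and must be re-optimized at every step, which is one source of the computational cost the abstract describes.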

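The distinction between targeting information gain in parameters and in predictions can be made concrete with a toy comparison of two acquisition scores. This is an illustration under stated assumptions, not the project's method: the posterior is a hypothetical set of four equally weighted samples theta = (a, b), the pool point `x_irrelevant` depends only on a, and both `x_relevant` and the single target input depend only on b. `bald` is the standard BALD score, the mutual information between a candidate's label and the parameters; `epig` averages, over target inputs x*, the mutual information between the candidate's label y and the target label y* (an expected predictive information gain). All names and probabilities are hypothetical.

```python
import math
from itertools import product


def entropy(probs):
    """Shannon entropy (nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def bald(pool_probs):
    """Parameter-targeted score I(y; theta | x) = H[mean_k p_k] - mean_k H[p_k].

    pool_probs[k] is p(y = 1 | x, theta_k) for posterior sample k.
    """
    k = len(pool_probs)
    mean_p = sum(pool_probs) / k
    expected_cond = sum(entropy([p, 1 - p]) for p in pool_probs) / k
    return entropy([mean_p, 1 - mean_p]) - expected_cond


def epig(pool_probs, targets):
    """Prediction-targeted score: mean over target inputs x* of I(y; y* | x, x*).

    targets is a list of per-sample probability lists p(y* = 1 | x*, theta_k).
    """
    k = len(pool_probs)
    total = 0.0
    for target_probs in targets:
        # Joint predictive p(y, y*) = mean_k p(y | x, theta_k) p(y* | x*, theta_k)
        joint = {
            (y, ys): sum(
                (p if y else 1 - p) * (q if ys else 1 - q)
                for p, q in zip(pool_probs, target_probs)
            ) / k
            for y, ys in product((0, 1), repeat=2)
        }
        p_y = {y: joint[(y, 0)] + joint[(y, 1)] for y in (0, 1)}
        p_ys = {ys: joint[(0, ys)] + joint[(1, ys)] for ys in (0, 1)}
        total += sum(
            pj * math.log(pj / (p_y[y] * p_ys[ys]))
            for (y, ys), pj in joint.items()
            if pj > 0
        )
    return total / len(targets)


# Hypothetical posterior: four equally weighted samples theta = (a, b), a, b in {0, 1}.
samples = list(product((0, 1), repeat=2))
pool = {
    "x_irrelevant": [0.95 if a else 0.05 for a, b in samples],  # depends only on a
    "x_relevant": [0.7 if b else 0.3 for a, b in samples],      # depends only on b
}
targets = [[0.9 if b else 0.1 for a, b in samples]]  # target prediction depends only on b

bald_choice = max(pool, key=lambda name: bald(pool[name]))
epig_choice = max(pool, key=lambda name: epig(pool[name], targets))
print(bald_choice, epig_choice)
```

Here `bald` prefers `x_irrelevant`, whose label would resolve the most uncertainty about a, while `epig` prefers `x_relevant`: only b affects the target prediction, so the label at `x_irrelevant` carries zero information about it.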