Developing Efficient Numerical Algorithms Using Fast Bayesian Random Forests

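Basic information

  • Grant number:
    2748743
  • Principal investigator:
  • Amount:
    --
  • Host institution:
  • Host institution country:
    United Kingdom
  • Project type:
    Studentship
  • Fiscal year:
    2022
  • Funding country:
    United Kingdom
  • Duration:
    2022 to no data
  • Project status:
    Not concluded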

Project abstract

Is working with Random Forests a key interest of yours? Would you like to become adept at using Sequential Monte Carlo samplers and parallel computing techniques? GCHQ and the CDT are looking for a keen PhD candidate who is conversant in Random Forest algorithms to assist them in developing new divide-and-conquer approaches that can help generate more accurate predictions. Does this sound like the challenging project you are looking for?

Random Forests (e.g. in scikit-learn) are in pervasive use in data science and machine learning. In such algorithms, each tree describes succinct rules that relate the inputs to the outputs (which can be continuous values, in the context of regression, or discrete labels, in the context of classification). Random Forests then use the diversity of the set of trees to convey uncertainty about which rules apply to any datum. This combination of succinctness and diversity is perhaps why the training algorithms for Random Forests are both fast and often (remarkably) effective.

When seen through the lens of statistics, the training algorithms for Random Forests are ad hoc. The belief that a more principled approach could lead to improved performance has motivated historic attempts to use numerical Bayesian techniques to estimate the parameters of tree-based data science algorithms. The resulting approaches, e.g. Bayesian Additive Regression Trees (BART) and Bayesian Classification and Regression Trees (CART), despite using models that are arguably less sophisticated than those used in Random Forests, are typically slow. This lack of speed comes about because algorithms like BART and CART use a specific numerical Bayesian technique, Markov Chain Monte Carlo (MCMC). While efficient general-purpose variants of MCMC (e.g. the No-U-Turn Sampler (NUTS)) exist and can be applied to many problems, these MCMC variants are not applicable in contexts where the number of parameters is unknown. Since trees can have different numbers of nodes, and so different numbers of parameters, algorithms like NUTS cannot be used to improve the run-time of tree-based algorithms like BART and CART.

NUTS achieves its efficiency by using gradient information to identify the directions in which to move to optimise the parameters of the model. While one cannot calculate gradients in the context of trees, one can emulate the calculation of gradients in such settings using a hierarchy of what are known as mini-batches in the neural network literature. An algorithm that exploits this idea, Hierarchical Importance with Nested Training Samples (HINTS), was developed in 2004, but it is now obscure and its potential largely untapped. Furthermore, there is potential to use more recent advances in the context of Sequential Monte Carlo (SMC) samplers, a family of algorithms that can exploit parallel processing resources (e.g. GPUs) and that can remove the need for the sequential burn-in phase that is often responsible for MCMC algorithms' slow run-time.

This PhD will seek to develop efficient numerical Bayesian algorithms (likely to be based on a combination of HINTS and SMC samplers) that can be used to develop variants of Random Forest algorithms. The intent is that the new algorithms would be drop-in replacements for the Random Forest algorithms used in scikit-learn: they would offer the same interfaces, but could use parallel computational resources to turn a given training dataset into more accurate predictions for GCHQ in the same elapsed time.
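
To make the SMC sampler idea in the abstract concrete, here is a minimal sketch of a tempered SMC sampler, assuming a toy fixed-dimension problem (the mean of Gaussian data under a Gaussian prior). It is not the project's algorithm and does not handle trees or HINTS. The function names, the tempering schedule, the random-walk move and all parameter values are illustrative assumptions. It only shows why the reweight and move steps are per-particle (hence parallelisable) and why no burn-in chain is needed.

```python
import math
import random

PRIOR_SD = 3.0  # assumed prior standard deviation for the toy model


def log_prior(theta):
    # Gaussian prior N(0, PRIOR_SD^2), up to an additive constant.
    return -0.5 * (theta / PRIOR_SD) ** 2


def log_likelihood(theta, data, noise_sd=1.0):
    # Gaussian likelihood for i.i.d. observations with known noise.
    return sum(-0.5 * ((y - theta) / noise_sd) ** 2 for y in data)


def normalise(log_w):
    # Convert log-weights to normalised weights in a numerically stable way.
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [x / total for x in w]


def smc_sampler(data, n_particles=500, n_steps=20, step_sd=0.5, seed=0):
    rng = random.Random(seed)
    # Particles start as independent prior draws, so there is no burn-in.
    particles = [rng.gauss(0.0, PRIOR_SD) for _ in range(n_particles)]
    log_w = [0.0] * n_particles
    betas = [k / n_steps for k in range(n_steps + 1)]

    for beta_prev, beta in zip(betas, betas[1:]):
        # Reweight: each particle is treated independently (parallelisable).
        log_w = [lw + (beta - beta_prev) * log_likelihood(p, data)
                 for lw, p in zip(log_w, particles)]

        # Resample when the effective sample size falls below half.
        w = normalise(log_w)
        ess = 1.0 / sum(x * x for x in w)
        if ess < n_particles / 2:
            particles = rng.choices(particles, weights=w, k=n_particles)
            log_w = [0.0] * n_particles

        # Move: one random-walk Metropolis step per particle, targeting
        # prior * likelihood^beta. Again independent across particles.
        def log_target(t):
            return log_prior(t) + beta * log_likelihood(t, data)

        moved = []
        for p in particles:
            q = p + rng.gauss(0.0, step_sd)
            accept_prob = math.exp(min(0.0, log_target(q) - log_target(p)))
            moved.append(q if rng.random() < accept_prob else p)
        particles = moved

    return particles, normalise(log_w)


if __name__ == "__main__":
    observations = [2.0, 2.5, 1.5, 2.2, 1.8]
    ps, ws = smc_sampler(observations)
    print("posterior mean estimate:", sum(p * w for p, w in zip(ps, ws)))
```

For this toy model the exact posterior mean is available in closed form (about 1.96 for the data above), which gives a simple check on the estimate. In a tree-based setting the move step would have to propose changes to tree structure, which is where the abstract suggests HINTS might play a role.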

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants

An implantable biosensor microsystem for real-time measurement of circulating biomarkers
  • Grant number:
    2901954
  • Fiscal year:
    2028
  • Funding amount:
    --
  • Project type:
    Studentship
Exploiting the polysaccharide breakdown capacity of the human gut microbiome to develop environmentally sustainable dishwashing solutions
  • Grant number:
    2896097
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project type:
    Studentship
A Robot that Swims Through Granular Materials
  • Grant number:
    2780268
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project type:
    Studentship
Likelihood and impact of severe space weather events on the resilience of nuclear power and safeguards monitoring.
  • Grant number:
    2908918
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project type:
    Studentship
Proton, alpha and gamma irradiation assisted stress corrosion cracking: understanding the fuel-stainless steel interface
  • Grant number:
    2908693
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project type:
    Studentship
Field Assisted Sintering of Nuclear Fuel Simulants
  • Grant number:
    2908917
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project type:
    Studentship
Assessment of new fatigue capable titanium alloys for aerospace applications
  • Grant number:
    2879438
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project type:
    Studentship
Developing a 3D printed skin model using a Dextran - Collagen hydrogel to analyse the cellular and epigenetic effects of interleukin-17 inhibitors in
  • Grant number:
    2890513
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project type:
    Studentship
CDT year 1 so TBC in Oct 2024
  • Grant number:
    2879865
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project type:
    Studentship
Understanding the interplay between the gut microbiome, behavior and urbanisation in wild birds
  • Grant number:
    2876993
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project type:
    Studentship

Similar overseas grants

Collaborative Research: Accurate and Structure-Preserving Numerical Schemes for Variable Temperature Phase Field Models and Efficient Solvers
  • Grant number:
    2309547
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project type:
    Standard Grant
Exploration of efficient turbulence stimulation method with data assimilation of numerical simulation and measurement
  • Grant number:
    23H01622
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project type:
    Grant-in-Aid for Scientific Research (B)
Scalable Bayesian regression: Analytical and numerical tools for efficient Bayesian analysis in the large data regime
  • Grant number:
    2311354
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project type:
    Standard Grant
Collaborative Research: Accurate and Structure-Preserving Numerical Schemes for Variable Temperature Phase Field Models and Efficient Solvers
  • Grant number:
    2309548
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project type:
    Standard Grant
Robust and Efficient Numerical Methods for Wave Equations in the Time Domain: Nonlinear and Multiscale Problems
  • Grant number:
    2309687
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project type:
    Standard Grant
Efficient numerical methods for wave-action transport and scattering
  • Grant number:
    EP/W007436/1
  • Fiscal year:
    2022
  • Funding amount:
    --
  • Project type:
    Research Grant
Efficient and well-balanced numerical methods for nonhydrostatic three-dimensional shallow flows with moving beds and boundaries
  • Grant number:
    RGPAS-2020-00102
  • Fiscal year:
    2022
  • Funding amount:
    --
  • Project type:
    Discovery Grants Program - Accelerator Supplements
Efficient and well-balanced numerical methods for nonhydrostatic three-dimensional shallow flows with moving beds and boundaries
  • Grant number:
    RGPIN-2020-06278
  • Fiscal year:
    2022
  • Funding amount:
    --
  • Project type:
    Discovery Grants Program - Individual
Accurate and Efficient Computational Methods for the Numerical Solution of High-Dimensional Partial Differential Equations in Computational Finance
  • Grant number:
    569181-2022
  • Fiscal year:
    2022
  • Funding amount:
    --
  • Project type:
    Postgraduate Scholarships - Doctoral
Novel experimental and numerical techniques for efficient earthquake safety assessment of critical dam infrastructure
  • Grant number:
    RGPIN-2017-06891
  • Fiscal year:
    2022
  • Funding amount:
    --
  • Project type:
    Discovery Grants Program - Individual