Novel Methods for Large Scale Presence only Data in Biological Systems Engineering


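Basic Information

  • Award number:
    9975868
  • Principal investigator:
  • Amount:
    $145,800
  • Host institution:
  • Host institution country:
    United States
  • Project type:
  • Fiscal year:
    2018
  • Funding country:
    United States
  • Project period:
    2018-09-01 to 2024-01-31
  • Project status:
    Completed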

Project Abstract

This proposal will develop novel statistical tools for learning genotype-phenotype mappings from experimental data. Massive genotype-phenotype data sets can be generated by genetic diversification, followed by high-throughput screening/selection and next-generation DNA sequencing of functionally distinct populations. The resulting data present new and interesting statistical challenges, including large numbers of examples, presence-only responses, and noisy/missing data. Presence-only responses arise because most high-throughput screening/selection methods isolate only functional examples (positive responses), while non-functional examples (negatives) are difficult or impossible to obtain. The resulting data sets contain the initial unlabeled variant library and positive examples. The modeling tools developed in this proposal apply to all levels of biological organization, from molecules to ecosystems. The novel statistical methods developed in this proposal will model the relationships between protein sequence, structure, and function, with the goal of gaining insight into biochemical mechanisms and designing new and useful proteins. This proposal will (i) develop new theory and tools to analyze the large quantities of protein sequence-function data that are being generated by emerging high-throughput methods; (ii) address challenges associated with positive-unlabeled (PU) learning, extremely large data size, and low-quality/missing data; and (iii) encode side information from existing databases or physical models. Furthermore, applying the methods and algorithms developed in this work will generate novel scientific insights and engineered biological systems.

Project Outcomes

Journal articles (2)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
PUlasso: High-Dimensional Variable Selection With Presence-Only Data
The bias of isotonic regression
  • DOI:
    10.1214/20-ejs1677
  • Publication date:
    2020-01-01
  • Journal:
  • Impact factor:
    1.1
  • Authors:
    Dai, Ran; Song, Hyebin; Raskutti, Garvesh
  • Corresponding author:
    Raskutti, Garvesh


Similar Overseas Grants

DMS-EPSRC: Asymptotic Analysis of Online Training Algorithms in Machine Learning: Recurrent, Graphical, and Deep Neural Networks
  • Award number:
    EP/Y029089/1
  • Fiscal year:
    2024
  • Funding amount:
    $145,800
  • Project type:
    Research Grant
CAREER: Blessing of Nonconvexity in Machine Learning - Landscape Analysis and Efficient Algorithms
  • Award number:
    2337776
  • Fiscal year:
    2024
  • Funding amount:
    $145,800
  • Project type:
    Continuing Grant
CAREER: From Dynamic Algorithms to Fast Optimization and Back
  • Award number:
    2338816
  • Fiscal year:
    2024
  • Funding amount:
    $145,800
  • Project type:
    Continuing Grant
CAREER: Structured Minimax Optimization: Theory, Algorithms, and Applications in Robust Learning
  • Award number:
    2338846
  • Fiscal year:
    2024
  • Funding amount:
    $145,800
  • Project type:
    Continuing Grant
CRII: SaTC: Reliable Hardware Architectures Against Side-Channel Attacks for Post-Quantum Cryptographic Algorithms
  • Award number:
    2348261
  • Fiscal year:
    2024
  • Funding amount:
    $145,800
  • Project type:
    Standard Grant
CRII: AF: The Impact of Knowledge on the Performance of Distributed Algorithms
  • Award number:
    2348346
  • Fiscal year:
    2024
  • Funding amount:
    $145,800
  • Project type:
    Standard Grant
CRII: CSR: From Bloom Filters to Noise Reduction Streaming Algorithms
  • Award number:
    2348457
  • Fiscal year:
    2024
  • Funding amount:
    $145,800
  • Project type:
    Standard Grant
EAGER: Search-Accelerated Markov Chain Monte Carlo Algorithms for Bayesian Neural Networks and Trillion-Dimensional Problems
  • Award number:
    2404989
  • Fiscal year:
    2024
  • Funding amount:
    $145,800
  • Project type:
    Standard Grant
CAREER: Efficient Algorithms for Modern Computer Architecture
  • Award number:
    2339310
  • Fiscal year:
    2024
  • Funding amount:
    $145,800
  • Project type:
    Continuing Grant
CAREER: Improving Real-world Performance of AI Biosignal Algorithms
  • Award number:
    2339669
  • Fiscal year:
    2024
  • Funding amount:
    $145,800
  • Project type:
    Continuing Grant