Interpreting function of non-coding sequences with synthetic biology and machine learning
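Basic information
- Grant number: 10417177
- Principal investigator:
- Amount: $30,800
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2020
- Funding country: United States
- Project period: 2020-07-01 to 2023-06-04
- Project status: Completed
- Source: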
- Keywords: Address; Affect; Algorithm Design; Architecture; Base Sequence; Biological Assay; Biological Models; Biology; Cell Culture Techniques; Cell Line; Cell model; Cells; Cellular Assay; DNA Sequence; Data; Data Set; Disease; Engineering; Gene Expression; Genes; Genome; Human; Image; In Vitro; Individual; Libraries; Machine Learning; Methods; Modeling; Performance; Photoreceptors; Physiological; Production; Recommendation; Regulation; Reporter; Retina; Structure; System; Testing; Tissues; Training; Untranslated RNA; Validation; Variant; cell type; cellular engineering; computer framework; design; experience; experimental study; functional genomics; genetic variant; genomic data; high throughput screening; improved; in vivo; machine learning algorithm; machine learning model; practical application; precision medicine; synthetic biology; transcription factor
Project abstract
PROJECT SUMMARY/ABSTRACT
Most disease-associated variants lie in non-coding regions of the genome and exert their influence through
effects on gene expression. However, we lack a predictive framework to interpret such non-coding variants,
limiting how genomic data is used in precision medicine. We may be able to interpret non-coding variants with
new machine learning algorithms, but so far the practical applications of machine learning in functional
genomics have been limited because of two major challenges. First, the size and diversity of training data sets
in functional genomics are orders of magnitude smaller than in applications where machine learning has been
successful, such as image recognition and product recommendation. A second challenge is that if training data
are not collected in an appropriate in vitro cellular model, then the resulting machine learning models may not
generalize to relevant in vivo cell types. To improve the application of machine learning to non-coding variants,
I propose to address both the limited size of training data sets and the efficacy of cell culture models.
A core principle of machine learning is that model performance improves with more data. In Aim 1, I propose to
increase the size and diversity of training data by performing iterative cycles of machine learning and
experimental validation with Massively Parallel Reporter Assays (MPRAs). The key aspect of my approach is to
algorithmically design each successive MPRA library to contain sequences that are most likely to improve the
next round of modeling. I recently trained my first model on data that I collected from MPRA experiments of
cis-regulatory sequences that function in mammalian photoreceptors. To avoid any issues with cell lines, I
performed these experiments in ex vivo developing retinas, which retain the appropriate tissue architecture.
However, unlike photoreceptors, most cell types are not experimentally tractable in their native physiological
context. Thus, it will be important to determine how well in vitro cell lines recapitulate in vivo cis-regulation. In
Aim 2, I propose to determine whether a tractable cell culture model can recapitulate results from ex vivo
retinas. I will use existing MPRA data from ex vivo retinas as a standard to compare against data collected in
cell lines engineered to express combinations of photoreceptor transcription factors. I aim to address whether
engineering tractable cell lines to express tissue-specific transcription factors might be a general approach for
collecting data to train machine learning models that generalize to in vivo systems. Successful completion of
these aims will produce a general approach to increase the size and diversity of functional genomic training
data, and may result in a general method for producing experimentally tractable systems for machine learning
applications, ultimately helping us better apply genomic data to precision medicine.
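The two aims can be illustrated with a minimal Python sketch. The abstract does not say how each successive MPRA library is chosen or how cell-line data are compared with ex vivo retina data, so everything here is an assumption made for illustration: the function names, the use of disagreement across an ensemble of models as a proxy for "most likely to improve the next round of modeling", and the use of Pearson correlation as the concordance measure.

```python
from statistics import mean, pstdev


def select_next_library(candidates, ensemble_predictions, library_size):
    """Pick the candidates whose predicted activity varies most across models.

    candidates: list of sequence identifiers (hypothetical names).
    ensemble_predictions: dict mapping each identifier to a list of predicted
        activities, one prediction per model in the ensemble.
    library_size: number of sequences to carry into the next MPRA round.
    """
    # Assumption: model disagreement stands in for how informative a
    # sequence would be if it were measured in the next experiment.
    scored = [(pstdev(ensemble_predictions[c]), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:library_size]]


def pearson(xs, ys):
    """Pearson correlation between paired activity measurements,
    for example ex vivo retina versus an engineered cell line."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Toy values only; these are not data from the project.
    preds = {
        "seq_a": [1.0, 1.1, 0.9],
        "seq_b": [0.2, 2.0, -1.0],
        "seq_c": [0.5, 0.5, 0.5],
    }
    print(select_next_library(list(preds), preds, 2))
    print(pearson([0.1, 0.4, 0.9], [0.2, 0.8, 1.8]))
```

In this toy run the sequence with the widest spread of predictions is ranked first, and a correlation near 1 would indicate that the cell line reproduces the ex vivo measurements. Other acquisition rules or concordance statistics could be substituted without changing the overall loop.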
Project outcomes
Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Information content differentiates enhancers from silencers in mouse photoreceptors.
- DOI: 10.7554/elife.67403
- Publication date: 2021-09-06
- Journal: eLife
- Impact factor: 7.7
- Authors: Friedman RZ; Granas DM; Myers CA; Corbo JC; Cohen BA; White MA
- Corresponding author: White MA
Other grants by Ryan Zachary Friedman
Interpreting function of non-coding sequences with synthetic biology and machine learning
- Grant number: 10177882
- Fiscal year: 2020
- Funding amount: $30,800
- Project category:
Interpreting function of non-coding sequences with synthetic biology and machine learning
- Grant number: 10065897
- Fiscal year: 2020
- Funding amount: $30,800
- Project category:
Similar overseas grants
RII Track-4:NSF: From the Ground Up to the Air Above Coastal Dunes: How Groundwater and Evaporation Affect the Mechanism of Wind Erosion
- Grant number: 2327346
- Fiscal year: 2024
- Funding amount: $30,800
- Project category: Standard Grant
BRC-BIO: Establishing Astrangia poculata as a study system to understand how multi-partner symbiotic interactions affect pathogen response in cnidarians
- Grant number: 2312555
- Fiscal year: 2024
- Funding amount: $30,800
- Project category: Standard Grant
How Does Particle Material Properties Insoluble and Partially Soluble Affect Sensory Perception Of Fat based Products
- Grant number: BB/Z514391/1
- Fiscal year: 2024
- Funding amount: $30,800
- Project category: Training Grant
Graduating in Austerity: Do Welfare Cuts Affect the Career Path of University Students?
- Grant number: ES/Z502595/1
- Fiscal year: 2024
- Funding amount: $30,800
- Project category: Fellowship
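Construction of Affect-X, an index of individual differences in sensibility, and establishment of a foundation for bespoke AI services
- Grant number: 23K24936
- Fiscal year: 2024
- Funding amount: $30,800
- Project category: Grant-in-Aid for Scientific Research (B)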
Insecure lives and the policy disconnect: How multiple insecurities affect Levelling Up and what joined-up policy can do to help
- Grant number: ES/Z000149/1
- Fiscal year: 2024
- Funding amount: $30,800
- Project category: Research Grant
How does metal binding affect the function of proteins targeted by a devastating pathogen of cereal crops?
- Grant number: 2901648
- Fiscal year: 2024
- Funding amount: $30,800
- Project category: Studentship
ERI: Developing a Trust-supporting Design Framework with Affect for Human-AI Collaboration
- Grant number: 2301846
- Fiscal year: 2023
- Funding amount: $30,800
- Project category: Standard Grant
Investigating how double-negative T cells affect anti-leukemic and GvHD-inducing activities of conventional T cells
- Grant number: 488039
- Fiscal year: 2023
- Funding amount: $30,800
- Project category: Operating Grants
How motor impairments due to neurodegenerative diseases affect masticatory movements
- Grant number: 23K16076
- Fiscal year: 2023
- Funding amount: $30,800
- Project category: Grant-in-Aid for Early-Career Scientists