COMPUTATIONAL LEARNING & DISCOVERY FOR BIOLOGICAL SEQUENCE, STRUCTURE, FUNCTION
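Basic Information
- Grant number: 7369285
- Principal investigator:
- Amount: $1,200
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2006
- Funding country: United States
- Project period: 2006-07-01 to 2007-06-30
- Project status: Completed
- Source:
- Keywords:
Project Abstract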
This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. The subproject and its principal investigator (PI) may have received primary funding from another NIH source, and thus could be represented in other CRISP entries. The institution listed is for the Center, which is not necessarily the institution for the investigator.
Seven focus areas in the realm of protein structure have been identified for application of the language analogy approach. These focus areas are: protein folding, conformational changes, protein-protein interactions, protein/gene networks and pathways, secondary structure and repetitive folds prediction and segmentation, protein family classification, and genome comparison. The ultimate goal is to develop linguistic models for each that are capable of advancing the understanding of these areas.
The protocol followed in this process consists of several steps. The first step is to utilize existing "benchmark" datasets or to define datasets suitable for training and testing of these models. As controls, existing approaches in the focus areas, if available, are studied, and a scheme is designed for evaluating the language model approaches and comparing them to other existing approaches. The next step is to implement our language approach. This implementation initially needs to meet one or both of two requirements: (i) the system has to perform as well as or better than existing systems as defined in step 2 and/or (ii) it needs to provide interpretable biological hypotheses. For example, a neural network might be the algorithm with the best performance in a classification task, but the underlying features resulting in this performance can be unclear. A language-based approach that might have lower performance but allows the researcher to analyze the types of features that result in successful classification can be used to build hypotheses on the fundamental building blocks of protein sequence language. The final step in the protocol is to design and carry out experiments that specifically test these hypotheses.
The following systems have been chosen as experimental test cases for the language models: G protein-coupled receptors (GPCRs) such as rhodopsin and metabotropic glutamate receptors, the epidermal growth factor receptor, viral tailspike protein, the virus infection process, and peptide n-grams. For each of the seven focus areas, we are working to identify or develop benchmark datasets for training and testing of linguistic models. Students and postdoctoral fellows participate in all aspects of the projects.
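As a rough illustration of the interpretability argument in the abstract, the sketch below scores peptide n-grams by smoothed log-odds between two sequence classes, so the features driving a classification can be read off directly. It is not the project's method: the function names (`ngram_counts`, `log_odds`, `classify`), the choice of n = 3, the add-one smoothing, and the toy sequences are all assumptions made for this example.

```python
from collections import Counter
import math


def ngram_counts(sequences, n=3):
    """Count overlapping n-grams across a list of amino-acid strings."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            counts[seq[i:i + n]] += 1
    return counts


def log_odds(pos_seqs, neg_seqs, n=3, alpha=1.0):
    """Smoothed log-odds of each n-gram in the positive vs. negative class."""
    pos = ngram_counts(pos_seqs, n)
    neg = ngram_counts(neg_seqs, n)
    vocab = set(pos) | set(neg)
    # Additive (Laplace-style) smoothing keeps unseen n-grams finite.
    pos_total = sum(pos.values()) + alpha * len(vocab)
    neg_total = sum(neg.values()) + alpha * len(vocab)
    return {
        g: math.log((pos[g] + alpha) / pos_total)
        - math.log((neg[g] + alpha) / neg_total)
        for g in vocab
    }


def classify(seq, scores, n=3):
    """Sum per-n-gram log-odds; a positive total favors the positive class."""
    return sum(scores.get(seq[i:i + n], 0.0) for i in range(len(seq) - n + 1))


# Toy, made-up sequences (not real protein data), for illustration only.
family_a = ["MKTAYIAKQR", "MKTAYLAKQR"]
family_b = ["GSHMLEDPVA", "GSHMLEDPIA"]
scores = log_odds(family_a, family_b)

# The highest-scoring n-grams are the features a researcher could inspect.
top_features = sorted(scores, key=scores.get, reverse=True)[:3]
```

Unlike an opaque classifier, each entry in `scores` is tied to a concrete subsequence, which is the kind of feature the abstract suggests could seed testable biological hypotheses.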
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by RAJ REDDY
COMPUTATIONAL LEARNING & DISCOVERY FOR BIOLOGICAL SEQUENCE, STRUCTURE, FUNCTION
- Grant number: 7723019
- Fiscal year: 2008
- Funding amount: $1,200
- Project category:
COMPUTATIONAL LEARNING & DISCOVERY FOR BIOLOGICAL SEQUENCE, STRUCTURE, FUNCTION
- Grant number: 7602013
- Fiscal year: 2007
- Funding amount: $1,200
- Project category:
COMPUTATIONAL LEARNING & DISCOVERY FOR BIOLOGICAL SEQUENCE, STRUCTURE, FUNCTION
- Grant number: 7182240
- Fiscal year: 2005
- Funding amount: $1,200
- Project category:
COMPUTATIONAL LEARNING & DISCOVERY FOR BIOLOGICAL SEQUENCE, STRUCTURE, FUNCTION
- Grant number: 6978546
- Fiscal year: 2004
- Funding amount: $1,200
- Project category:
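Similar NSFC Grants
Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
- Grant number:
- Approval year: 2024
- Funding amount:
- Project category: Collaborative Innovation Research Team
Understanding structural evolution of galaxies with machine learning
- Grant number: n/a
- Approval year: 2022
- Funding amount: CNY 100,000
- Project category: Provincial/municipal project
Constrained dynamic multi-objective Q-learning evolutionary allocation of human-machine hybrid crowd-sensing tasks for coal mine safety
- Grant number:
- Approval year: 2022
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund project
Short-horizon online Q-learning cooperative control mechanism for intelligent munition formations accounting for leader failure
- Grant number: 62003314
- Approval year: 2020
- Funding amount: CNY 240,000
- Project category: Young Scientists Fund project
Research on e-learning resource recommendation methods integrating contextual tensor factorization
- Grant number: 61902016
- Approval year: 2019
- Funding amount: CNY 240,000
- Project category: Young Scientists Fund project
Research on Spiking-Transfer learning methods with temporal transfer capability
- Grant number: 61806040
- Approval year: 2018
- Funding amount: CNY 200,000
- Project category: Young Scientists Fund project
Research on deep-learning-based dynamic recognition techniques for glacier monitoring in the Sanjiangyuan region
- Grant number: 51769027
- Approval year: 2017
- Funding amount: CNY 380,000
- Project category: Regional Science Fund project
Research on Spiking-Deep Learning methods with temporal processing capability
- Grant number: 61573081
- Approval year: 2015
- Funding amount: CNY 640,000
- Project category: General Program project
Automatic generation and optimization of large-scale personalized e-learning process models based on directed hypergraphs
- Grant number: 61572533
- Approval year: 2015
- Funding amount: CNY 660,000
- Project category: General Program project
Research on learner emotion compensation methods in e-learning
- Grant number: 61402392
- Approval year: 2014
- Funding amount: CNY 260,000
- Project category: Young Scientists Fund project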
Similar Overseas Grants
Combined Machine Learning and Computational Chemistry Guided Discovery of Chevrel Phases for Electrocatalytic CO2 Reduction
- Grant number: 2016225
- Fiscal year: 2020
- Funding amount: $1,200
- Project category: Standard Grant
EAGER: Thermal Materials Discovery via Deep Learning based High-Throughput Computational Screening
- Grant number: 1905775
- Fiscal year: 2019
- Funding amount: $1,200
- Project category: Standard Grant
COMPUTATIONAL LEARNING & DISCOVERY FOR BIOLOGICAL SEQUENCE, STRUCTURE, FUNCTION
- Grant number: 7723019
- Fiscal year: 2008
- Funding amount: $1,200
- Project category:
COMPUTATIONAL LEARNING & DISCOVERY FOR BIOLOGICAL SEQUENCE, STRUCTURE, FUNCTION
- Grant number: 7602013
- Fiscal year: 2007
- Funding amount: $1,200
- Project category:
COMPUTATIONAL LEARNING & DISCOVERY FOR BIOLOGICAL SEQUENCE, STRUCTURE, FUNCTION
- Grant number: 7182240
- Fiscal year: 2005
- Funding amount: $1,200
- Project category:
COMPUTATIONAL LEARNING & DISCOVERY FOR BIOLOGICAL SEQUENCE, STRUCTURE, FUNCTION
- Grant number: 6978546
- Fiscal year: 2004
- Funding amount: $1,200
- Project category:
ITR: Collaborative Research - Computational Learning and Discovery in Biological Sequence, Structure and Function Mapping
- Grant number: 0225609
- Fiscal year: 2002
- Funding amount: $1,200
- Project category: Continuing Grant
ITR: Collaborative Research: Computational Learning and Discovery in Biological Sequence, Structure and Function Mapping
- Grant number: 0225636
- Fiscal year: 2002
- Funding amount: $1,200
- Project category: Continuing Grant
ITR: Computational Learning and Discovery in Biological Sequence, Structure and Function Mapping
- Grant number: 0225607
- Fiscal year: 2002
- Funding amount: $1,200
- Project category: Continuing Grant
Computational Learning and Discovery in Biological Sequence, Structure and Function Mapping
- Grant number: 0225656
- Fiscal year: 2002
- Funding amount: $1,200
- Project category: Continuing Grant