Statistics of Sequence Comparison
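Basic information
- Grant number: 8558094
- Principal investigator:
- Amount: $260,300
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year:
- Funding country: United States
- Start and end dates:
- Project status: Not concluded
- Source:
- Keywords: Agreement; Algorithms; Amino Acid Sequence; Amino Acids; Behavior; Consensus; DNA Sequence; Data; Evaluation; Goals; Gold; Knowledge; Length; Measures; Modeling; Pattern; Peptide Sequence Determination; Positioning Attribute; Probability; Process; Protein Family; Protein Sequence Analysis; Research; Running; Sampling; Sequence Alignment; Sum; System; Testing; Work; abstracting; cost; density; improved; programs; protein profiling; statistics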
Project abstract
The primary focus this year was on the assessment of substitution
scoring systems for aligning protein profiles to one another.
Pairwise protein sequence alignments are generally evaluated using
scores defined as the sum of "substitution scores" for aligning
amino acids to one another, and "gap scores" for aligning runs of
amino acids in one sequence to null characters inserted into the
other. Protein "profiles" may be abstracted from multiple alignments
of protein sequences, and substitution and gap scores have been
generalized to the alignment of such profiles either to single
sequences or to other profiles. Although there is widespread
agreement on the general form substitution scores should take for
profile-sequence alignment, little consensus has been reached on how
best to construct profile-profile substitution scores, and a large
number of these scoring systems have been proposed. We assessed
a variety of such substitution scores, using several sets of "gold
standard" multiple alignments. For our evaluation, we calculated
the probability that a profile column yields a higher substitution
score when aligned to a related than to an unrelated column. We
also considered the same measure applied to sets of two or three
adjacent columns. This simple approach had the advantages that it
did not depend primarily upon the gold standard alignment columns
with the weakest empirical support, and that it did not need to fit
gap and offset costs for use with each substitution cost studied.
No substitution scoring system emerged as superior in all our tests,
but two showed consistently strong behavior: a generalization of
profile-sequence scores similar to those used in the Compass
alignment program, and the recently proposed Bayesian Integral
Log-odds (BILD) scores.
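
The evaluation measure described above, the probability that a column scores higher against a related column than against an unrelated one, can be estimated from two lists of scores. The following is a minimal sketch and not the project's code: the function name, the pooling of scores across columns, the half-credit treatment of ties, and the example scores are all assumptions made for illustration.

```python
from itertools import product


def prob_related_scores_higher(related_scores, unrelated_scores):
    """Estimate P(related score > unrelated score) over all score pairs.

    Ties count as half a win; this convention is an assumption of the sketch.
    """
    if not related_scores or not unrelated_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for r, u in product(related_scores, unrelated_scores):
        if r > u:
            wins += 1.0
        elif r == u:
            wins += 0.5
    return wins / (len(related_scores) * len(unrelated_scores))


# Made-up substitution scores for one hypothetical scoring system.
related = [2.1, 0.7, 1.5]
unrelated = [-0.4, 0.7, -1.2, 0.1]
print(prob_related_scores_higher(related, unrelated))
```

A value near 1 would indicate that the scoring system separates related from unrelated columns well, and a value near 0.5 would indicate no separation. Because only substitution scores enter the calculation, no gap costs need to be fitted.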
A secondary focus was the Dirichlet mixture model, which is used to
analyze protein sequences. The Dirichlet mixture model was introduced
to protein sequence analysis by Haussler's group at UCSC. In brief,
this model imagines that a particular position in a
protein family is described by a multinomial distribution on the set
of amino acids. Although the multinomial for a particular position
may be unique, the study of many protein families reveals that certain
regions of multinomial space are much more heavily populated than
others. This general knowledge may be summarized by a "Dirichlet
mixture prior", which is a probability density over multinomial
space that lends itself to easy analysis. Our research on Dirichlet
mixture priors this year centered on the question of how best to
derive such priors from a set of multiple alignment data. Our
previous work had applied the Minimum Description Length principle
and a Gibbs sampling algorithm to this problem. Work begun this
year applied the Dirichlet Process to the same problem; preliminary
results suggest that this leads to much improved mixtures with many
more components.
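
One reason a Dirichlet mixture prior is easy to analyze is that the posterior mean of the multinomial at a position has a closed form once the observed letter counts are known. The sketch below uses the standard formula for this posterior mean. It is an illustration only: the three-letter alphabet, the component weights, and the Dirichlet parameters are invented, and a real prior would cover the 20 amino acids with parameters estimated from alignment data.

```python
from math import exp, lgamma, log


def log_beta(alpha):
    """Log of the multivariate beta function B(alpha)."""
    return sum(lgamma(a) for a in alpha) - lgamma(sum(alpha))


def posterior_mean(counts, weights, alphas):
    """Return (component posteriors, posterior mean letter probabilities).

    counts  : observed letter counts at one alignment position
    weights : mixture weights q_j of the prior (should sum to 1)
    alphas  : one Dirichlet parameter vector per mixture component
    """
    # log of q_j * B(alpha_j + n) / B(alpha_j) for each component j
    log_post = []
    for q, alpha in zip(weights, alphas):
        shifted = [a + n for a, n in zip(alpha, counts)]
        log_post.append(log(q) + log_beta(shifted) - log_beta(alpha))
    # Normalize in a numerically stable way.
    top = max(log_post)
    unnorm = [exp(lp - top) for lp in log_post]
    total = sum(unnorm)
    post = [u / total for u in unnorm]
    # Mix the per-component Dirichlet posterior means.
    n_total = sum(counts)
    estimate = [0.0] * len(counts)
    for w, alpha in zip(post, alphas):
        denom = n_total + sum(alpha)
        for i, (a, n) in enumerate(zip(alpha, counts)):
            estimate[i] += w * (a + n) / denom
    return post, estimate


# Hypothetical two-component prior over a three-letter alphabet.
weights = [0.6, 0.4]
alphas = [[5.0, 1.0, 1.0], [1.0, 1.0, 5.0]]
post, estimate = posterior_mean([8, 1, 0], weights, alphas)
print(post, estimate)
```

With no observations the component posteriors reduce to the prior weights, and as counts accumulate the estimate moves toward the observed frequencies.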
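Project outcomes
Number of journal articles (0)
Number of monographs (0)
Number of research awards (0)
Number of conference papers (0)
Number of patents (0)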
Other grants by STEPHEN F ALTSCHUL
Improvements And Extensions To The Blast Algorithms
- Grant number: 6546809
- Fiscal year:
- Funding amount: $260,300
- Project category:
Improvements And Extensions To The Blast Algorithms
- Grant number: 6843572
- Fiscal year:
- Funding amount: $260,300
- Project category:
IMPROVEMENTS AND EXTENSIONS TO THE BLAST ALGORITHMS
- Grant number: 6432754
- Fiscal year:
- Funding amount: $260,300
- Project category:
Improvements and Extensions to the BLAST Algorithms
- Grant number: 9555732
- Fiscal year:
- Funding amount: $260,300
- Project category:
Similar overseas grants
CAREER: Blessing of Nonconvexity in Machine Learning - Landscape Analysis and Efficient Algorithms
- Grant number: 2337776
- Fiscal year: 2024
- Funding amount: $260,300
- Project category: Continuing Grant
CAREER: From Dynamic Algorithms to Fast Optimization and Back
- Grant number: 2338816
- Fiscal year: 2024
- Funding amount: $260,300
- Project category: Continuing Grant
CAREER: Structured Minimax Optimization: Theory, Algorithms, and Applications in Robust Learning
- Grant number: 2338846
- Fiscal year: 2024
- Funding amount: $260,300
- Project category: Continuing Grant
CRII: SaTC: Reliable Hardware Architectures Against Side-Channel Attacks for Post-Quantum Cryptographic Algorithms
- Grant number: 2348261
- Fiscal year: 2024
- Funding amount: $260,300
- Project category: Standard Grant
CRII: AF: The Impact of Knowledge on the Performance of Distributed Algorithms
- Grant number: 2348346
- Fiscal year: 2024
- Funding amount: $260,300
- Project category: Standard Grant
CRII: CSR: From Bloom Filters to Noise Reduction Streaming Algorithms
- Grant number: 2348457
- Fiscal year: 2024
- Funding amount: $260,300
- Project category: Standard Grant
EAGER: Search-Accelerated Markov Chain Monte Carlo Algorithms for Bayesian Neural Networks and Trillion-Dimensional Problems
- Grant number: 2404989
- Fiscal year: 2024
- Funding amount: $260,300
- Project category: Standard Grant
CAREER: Efficient Algorithms for Modern Computer Architecture
- Grant number: 2339310
- Fiscal year: 2024
- Funding amount: $260,300
- Project category: Continuing Grant
CAREER: Improving Real-world Performance of AI Biosignal Algorithms
- Grant number: 2339669
- Fiscal year: 2024
- Funding amount: $260,300
- Project category: Continuing Grant
DMS-EPSRC: Asymptotic Analysis of Online Training Algorithms in Machine Learning: Recurrent, Graphical, and Deep Neural Networks
- Grant number: EP/Y029089/1
- Fiscal year: 2024
- Funding amount: $260,300
- Project category: Research Grant













