Computing Patterns in Strings
Basic information
- Grant number: RGPIN-2017-04691
- Amount: $16,800
- Host institution country: Canada
- Project type: Discovery Grants Program - Individual
- Fiscal year: 2018
- Funding country: Canada
- Period: 2018-01-01 to 2019-12-31
- Project status: Completed
Project abstract
A string is a sequence of symbols, usually called letters, drawn from some alphabet. The Bible can be thought of as a string, about two million positions long, on an alphabet of English letters, integers and punctuation symbols; the genome of every living thing can be thought of as a string, whose length is usually in the billions, on a four-letter DNA alphabet (a,c,g,t); a bit stream transmitted from space is a string, perhaps trillions of positions long, on the alphabet (0,1). For literary, biological or military research, the patterns in these strings are fundamental: Where does a certain phrase recur in the Bible? How do repeated DNA segments in the genome indicate susceptibility to Parkinson's disease? What clues do certain recurring bit patterns provide about coded segments of the electronic transmission?

In 1975 there were a few dozen researchers in string algorithms around the world; now there are surely many thousands, a result of the widespread use of computers for information storage and a huge upsurge in computational biology. For the last 15 years there have been two main themes of my research: indeterminate strings and the computation of regularities.

In a DNA sequence it may be unclear whether a given entry is a or c, and so the indeterminate symbol {a,c} is used. We could then say that {a,c} matches another symbol {c,g}, which in turn matches {g,t}, but {a,c} certainly does not match {g,t}! This seemingly innocuous difficulty, the nontransitivity of matching, makes the processing of indeterminate strings much more difficult. Thus a combinatorial understanding of indeterminate strings becomes essential to the development of efficient methods for their processing.

With indeterminate strings, as with ordinary ones, the main task is the recognition/computation of patterns called regularities. For example, the string acaacaa has period 3, since the letters at positions i and i+3 are always the same; at the same time, even though acaacaca is not periodic, it nevertheless has a cover aca, since an occurrence of aca covers every position. These regularities, and many others, are fundamental to the calculation of the patterns in strings that provide us with the understanding we seek about patterns in the real world. In a DNA sequence, a single letter that breaks the pattern can mark a crucial genomic vulnerability or advantage.

For 15 years, much of my research has embraced these two themes. Using insights based on mathematical analysis, I seek to identify regularities, especially in indeterminate strings, that have real-world significance, and I try to design methods ("algorithms") to compute them quickly, even for string lengths in the billions or trillions. For example: patterns in DNA that indicate susceptibility to specific diseases; coincidences of terminology in terabytes of Internet text that indicate meaning or topic; recurring bit patterns that suggest a coded message. Many patterns, many applications!
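The three notions used in the abstract (matching of indeterminate letters, periods, and covers) can be illustrated with a small Python sketch. This is only a naive illustration under stated assumptions, not the efficient algorithms the project aims at: the helper names `letter`, `matches`, `has_period` and `covers` are hypothetical, an indeterminate letter is modelled as a set of symbols, and two letters are assumed to match when their sets share at least one symbol.

```python
from typing import FrozenSet

# Assumption: an indeterminate letter is modelled as a non-empty set of symbols.
Letter = FrozenSet[str]


def letter(symbols: str) -> Letter:
    """Build an indeterminate letter; letter("ac") stands for {a,c}."""
    return frozenset(symbols)


def matches(x: Letter, y: Letter) -> bool:
    """Two letters match when they have at least one symbol in common."""
    return bool(x & y)


def has_period(s: str, p: int) -> bool:
    """True if the letters at positions i and i+p agree for every valid i."""
    return 0 < p <= len(s) and all(s[i] == s[i + p] for i in range(len(s) - p))


def covers(u: str, s: str) -> bool:
    """True if the occurrences of u in s together cover every position of s."""
    if not u:
        return False
    covered_up_to = 0  # positions [0, covered_up_to) are covered so far
    for i in range(len(s) - len(u) + 1):
        if i > covered_up_to:
            return False  # position covered_up_to can no longer be covered
        if s.startswith(u, i):
            covered_up_to = i + len(u)
    return covered_up_to == len(s)


if __name__ == "__main__":
    ac, cg, gt = letter("ac"), letter("cg"), letter("gt")
    # Matching is not transitive: the first two hold, the third does not.
    print(matches(ac, cg), matches(cg, gt), matches(ac, gt))
    # acaacaa has period 3; aca covers acaacaca but not acaacaa.
    print(has_period("acaacaa", 3), covers("aca", "acaacaca"), covers("aca", "acaacaa"))
```

Both `has_period` and `covers` scan the string directly, which is fine for short examples but far too slow for strings with billions of positions; that gap is what motivates the combinatorial analysis described above.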
Project outcomes
Journal papers (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants held by Smyth, William
Computing Patterns in Strings
- Grant number: RGPIN-2017-04691
- Fiscal year: 2021
- Funding amount: $16,800
- Project type: Discovery Grants Program - Individual
Computing Patterns in Strings
- Grant number: RGPIN-2017-04691
- Fiscal year: 2020
- Funding amount: $16,800
- Project type: Discovery Grants Program - Individual
Computing Patterns in Strings
- Grant number: RGPIN-2017-04691
- Fiscal year: 2019
- Funding amount: $16,800
- Project type: Discovery Grants Program - Individual
Computing Patterns in Strings
- Grant number: RGPIN-2017-04691
- Fiscal year: 2017
- Funding amount: $16,800
- Project type: Discovery Grants Program - Individual
"Regularities in Strings: New Combinatorial Properties, More Efficient Algorithms, Applications"
- Grant number: 8180-2012
- Fiscal year: 2016
- Funding amount: $16,800
- Project type: Discovery Grants Program - Individual
"Regularities in Strings: New Combinatorial Properties, More Efficient Algorithms, Applications"
- Grant number: 8180-2012
- Fiscal year: 2015
- Funding amount: $16,800
- Project type: Discovery Grants Program - Individual
"Regularities in Strings: New Combinatorial Properties, More Efficient Algorithms, Applications"
- Grant number: 8180-2012
- Fiscal year: 2014
- Funding amount: $16,800
- Project type: Discovery Grants Program - Individual
"Regularities in Strings: New Combinatorial Properties, More Efficient Algorithms, Applications"
- Grant number: 8180-2012
- Fiscal year: 2013
- Funding amount: $16,800
- Project type: Discovery Grants Program - Individual
"Regularities in Strings: New Combinatorial Properties, More Efficient Algorithms, Applications"
- Grant number: 8180-2012
- Fiscal year: 2012
- Funding amount: $16,800
- Project type: Discovery Grants Program - Individual
Improved algorithms on strings
- Grant number: 8180-2007
- Fiscal year: 2011
- Funding amount: $16,800
- Project type: Discovery Grants Program - Individual
Similar overseas grants
Spatiotemporal dynamics of acetylcholine activity in adaptive behaviors and response patterns
- Grant number: 24K10485
- Fiscal year: 2024
- Funding amount: $16,800
- Project type: Grant-in-Aid for Scientific Research (C)
Collaborative Research: Unraveling the phylogenetic and evolutionary patterns of fragmented mitochondrial genomes in parasitic lice
- Grant number: 2328117
- Fiscal year: 2024
- Funding amount: $16,800
- Project type: Standard Grant
Uncovering the evolutionary patterns of the Aculeata stinger
- Grant number: 24K18174
- Fiscal year: 2024
- Funding amount: $16,800
- Project type: Grant-in-Aid for Early-Career Scientists
Illuminating patterns and processes of water quality in U.S. rivers using physics-guided deep learning
- Grant number: 2346471
- Fiscal year: 2024
- Funding amount: $16,800
- Project type: Continuing Grant
Collaborative Research: Can Irregular Structural Patterns Beat Perfect Lattices? Biomimicry for Optimal Acoustic Absorption
- Grant number: 2341950
- Fiscal year: 2024
- Funding amount: $16,800
- Project type: Standard Grant
Collaborative Research: Unraveling the phylogenetic and evolutionary patterns of fragmented mitochondrial genomes in parasitic lice
- Grant number: 2328119
- Fiscal year: 2024
- Funding amount: $16,800
- Project type: Standard Grant
Effects of Environmental Change on Microbial Self-organized Patterns in Antarctic Lakes
- Grant number: 2333917
- Fiscal year: 2024
- Funding amount: $16,800
- Project type: Standard Grant
AGS-PRF: Understanding Historical Trends in Tropical Pacific Sea Surface Temperature Patterns
- Grant number: 2317224
- Fiscal year: 2024
- Funding amount: $16,800
- Project type: Fellowship Award
Asymptotic patterns and singular limits in nonlinear evolution problems
- Grant number: EP/Z000394/1
- Fiscal year: 2024
- Funding amount: $16,800
- Project type: Research Grant
Collaborative Research: Linking carbon preferences and competition to predict and test patterns of functional diversity in soil microbial communities
- Grant number: 2312302
- Fiscal year: 2024
- Funding amount: $16,800
- Project type: Standard Grant