Development and Maintenance of RepeatMasker and RepeatModeler
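Basic Information
- Award number: 10367846
- Principal investigator:
- Amount: $532,400
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2022
- Funding country: United States
- Project period: 2022-02-04 to 2027-01-31
- Project status: Not concluded
- Source: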
- Keywords: Age, Algorithms, Biology, Calibration, Classification, Code, Communities, Complex, Computer software, Conflict (Psychology), Consensus Sequence, DNA Insertion Elements, DNA Transposable Elements, DNA Transposons, Data Set, Databases, Detection, Development, Distant, Elements, Environment, Event, Evolution, Exposure to, Family, Floods, Foundations, Funding Agency, Generations, Genetic Recombination, Genome, Growth, Hand, Human, Human Resources, Knowledge, Libraries, Light, Maintenance, Mammals, Methods, Modeling, Mutation, Organism, Phylogenetic Analysis, Phylogeny, Process, Reporting, Research, Resolution, Selfish DNA, Sensitivity and Specificity, Sequence Alignment, Sequence Analysis, Source, Speed, Tandem Repeat Sequences, Trees, Update, Vertebrates, Work, adjudicate, adjudication, base, cluster computing, genome analysis, genome annotation, improved, mammalian genome, markov model, novel, programs, reconstruction, search engine, tool, trustworthiness, usability, vertebrate genome
Project Summary
Mammalian and most other eukaryotic genomes contain a large number of interspersed repeats (IRs), most of which are copies of transposable elements (TEs) at varying levels of decay. Their presence complicates many genome sequence analyses, but identifying them accurately at an early stage of analysis can reduce these complications. Beyond their pervasiveness, IRs have an enormous impact on genome activity and evolution, an impact the research community has come to recognize widely over the last decades.
Every species has been exposed to a unique, complex set of TEs, which have left recognizable copies ranging in age from as much as 300 million years to the present day. These TEs are uncovered and reconstructed by de novo discovery methods, often with our RepeatModeler tool, and their copies are then annotated by our RepeatMasker software. De novo methods can create TE libraries at a reasonable pace, but the resulting libraries fall far short of the quality achievable through manual curation. With the recent explosive growth in the number of sequenced species, these finishing steps, which may never be fully automatable, have become a severe bottleneck in genome analysis because of a shortage of manpower and expertise. Moreover, the results, especially when produced with different methods by different research groups, are inconsistent and redundant. Furthermore, the annotation of genomes for which high-quality libraries have been created is not keeping pace with library improvements because of the computational burden of re-analysis.
In this proposal, we describe a plan to refactor RepeatMasker by generalizing and improving TE alignment adjudication, switching to a family-centric search strategy that supports incremental re-analysis, improving annotation reporting, and supporting cluster environments. In response to the need for better methods of automated TE library generation, we propose to make significant changes to RepeatModeler’s core discovery algorithms, develop a novel model extension tool, and [...]. In addition, we will extend our novel methods for exploiting multi-species alignments and ancestral reconstructions and use them to build a comprehensive mammalian TE library.
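The summary links the lag in re-annotating genomes to the computational cost of re-analysis and proposes a family-centric search that supports incremental re-analysis. As a rough illustration only, the Python sketch below shows why a family-centric design makes incremental updates possible: if each family is searched independently, a library update only requires re-searching the families that are new or changed. It assumes a library is a plain mapping from family name to consensus sequence (a hypothetical, simplified model, not RepeatMasker's actual library format), and the function names and toy families are hypothetical.

```python
import hashlib
from typing import Dict, Set, Tuple

# Hypothetical, simplified library model: family name -> consensus sequence.
# Real TE libraries carry much more (profile HMMs, classification, metadata).
Library = Dict[str, str]


def family_digest(consensus: str) -> str:
    """Return a content hash of a family consensus, ignoring letter case."""
    return hashlib.sha256(consensus.upper().encode("utf-8")).hexdigest()


def plan_incremental_update(
    old: Library, new: Library
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Compare two library versions and return (to_search, to_drop, unchanged).

    to_search: families that are new or whose consensus changed; only these
               would need to be searched against the genome again.
    to_drop:   families that were removed or changed; their previous hits
               should be discarded.
    unchanged: families with an identical consensus; earlier hits can be reused.
    """
    old_digests = {name: family_digest(seq) for name, seq in old.items()}
    new_digests = {name: family_digest(seq) for name, seq in new.items()}

    unchanged = {n for n, d in new_digests.items() if old_digests.get(n) == d}
    to_search = set(new_digests) - unchanged
    to_drop = set(old_digests) - unchanged
    return to_search, to_drop, unchanged


if __name__ == "__main__":
    # Toy libraries with made-up family names and sequences.
    old_lib = {"famA": "GGGAGGAGCC", "famB": "GGCCGGGCGC", "famC": "CAGTATTGAA"}
    new_lib = {"famA": "gggaggagcc", "famB": "GGCCGGGCGCGG", "famD": "CAGGGGTCGG"}

    to_search, to_drop, unchanged = plan_incremental_update(old_lib, new_lib)
    print("search:", sorted(to_search))  # search: ['famB', 'famD']
    print("drop:", sorted(to_drop))      # drop: ['famB', 'famC']
    print("reuse:", sorted(unchanged))   # reuse: ['famA']
```

In this sketch only `famB` and `famD` would trigger a new genome search. It deliberately ignores one complication: hits from reused families can overlap hits from newly searched families, so overlapping regions would still need to be adjudicated again.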
Project Outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Robert MacDonald Hubley
Development and Maintenance of RepeatMasker and RepeatModeler
- Award number: 10563214
- Fiscal year: 2022
- Funding amount: $532,400
- Project category:
Dfam: sustainable growth, curation support, and improved quality for mobile element annotation
- Award number: 10165778
- Fiscal year: 2018
- Funding amount: $532,400
- Project category:
Dfam: sustainable growth, curation support, and improved quality for mobile element annotation
- Award number: 10714226
- Fiscal year: 2018
- Funding amount: $532,400
- Project category:
Dfam: sustainable growth, curation support, and improved quality for mobile element annotation
- Award number: 9764454
- Fiscal year: 2018
- Funding amount: $532,400
- Project category:
Dfam: sustainable growth, curation support, and improved quality for mobile element annotation
- Award number: 10407543
- Fiscal year: 2018
- Funding amount: $532,400
- Project category:
Similar Overseas Grants
REU Site: Algorithms and Optimization for Sustainability and Biology
- Award number: 2243010
- Fiscal year: 2023
- Funding amount: $532,400
- Project category: Standard Grant
Multi-resolution Molecular Dynamics Algorithms for Computational Biology
- Award number: EP/V047469/1
- Fiscal year: 2021
- Funding amount: $532,400
- Project category: Research Grant
Developing novel machine learning algorithms for network biology
- Award number: RGPIN-2015-06751
- Fiscal year: 2020
- Funding amount: $532,400
- Project category: Discovery Grants Program - Individual
Developing novel machine learning algorithms for network biology
- Award number: RGPIN-2015-06751
- Fiscal year: 2019
- Funding amount: $532,400
- Project category: Discovery Grants Program - Individual
Developing novel machine learning algorithms for network biology
- Award number: RGPIN-2015-06751
- Fiscal year: 2018
- Funding amount: $532,400
- Project category: Discovery Grants Program - Individual
Machine Learning Algorithms for Actionable Knowledge Discovery in Synthetic Biology
- Award number: 2132169
- Fiscal year: 2018
- Funding amount: $532,400
- Project category: Studentship
AF: Medium: Collaborative Research: Sequential and Parallel Algorithms for Approximate Sequence Matching with Applications to Computational Biology
- Award number: 1704552
- Fiscal year: 2017
- Funding amount: $532,400
- Project category: Standard Grant
Developing novel machine learning algorithms for network biology
- Award number: RGPIN-2015-06751
- Fiscal year: 2017
- Funding amount: $532,400
- Project category: Discovery Grants Program - Individual
Workshop on Future Directions for Algorithms in Biology
- Award number: 1748493
- Fiscal year: 2017
- Funding amount: $532,400
- Project category: Standard Grant
AF: Medium: Collaborative Research: Sequential and Parallel Algorithms for Approximate Sequence Matching with Applications to Computational Biology
- Award number: 1703489
- Fiscal year: 2017
- Funding amount: $532,400
- Project category: Standard Grant