Development and Maintenance of RepeatMasker and RepeatModeler
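Basic information
- Grant number: 10563214
- Principal investigator:
- Amount: $588,200
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2022
- Funding country: United States
- Project period: 2022-02-04 to 2027-01-31
- Project status: ongoing
- Source: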
- Keywords: Age; Algorithms; Biology; Calibration; Classification; Code; Communities; Complex; Computer software; Consensus Sequence; DNA Insertion Elements; DNA Transposable Elements; DNA Transposons; Data Set; Databases; Detection; Development; Distant; Elements; Environment; Event; Evolution; Exposure to; Family; Funding Agency; Generations; Genome; Growth; Hand; Human; Human Resources; Knowledge; Libraries; Light; Maintenance; Mammals; Methods; Modeling; Mutation; Organism; Phylogenetic Analysis; Phylogeny; Process; Reporting; Research; Resolution; Selfish DNA; Sensitivity and Specificity; Sequence Alignment; Sequence Analysis; Shapes; Source; Speed; Tandem Repeat Sequences; Trees; Update; Vertebrates; Work; adjudication; cluster computing; conflict resolution; genome analysis; genome annotation; improved; mammalian genome; markov model; novel; programs; reconstruction; search engine; tool; trustworthiness; usability; vertebrate genome
Project Summary
Mammalian and most other eukaryotic genomes contain a large number of interspersed repeats (IRs), most of
which are copies of transposable elements (TEs) at varying levels of decay. Their presence complicates many
genome sequence analyses, but their accurate identification in an early analysis stage can reduce these
complications. In addition to their pervasiveness, over the last decades the research community has become
widely familiar with their enormous impact on genome activity and evolution.
Every species has been exposed to a unique, complex set of TEs leaving recognizable copies from as long
ago as 300 million years to as recently as the present day. These TEs are uncovered and reconstructed by de
novo discovery methods, often by our RepeatModeler tool, while their copies are then annotated by our
RepeatMasker software. De novo methods can create TE libraries at a reasonable pace, but the product is far
from the desired quality that can be reached by hand curation. With the recent explosive growth in sequenced
species, these finishing steps, perhaps never fully automatable, now form a severe bottleneck in genome
analyses due to a lack of manpower and expertise, while the results, especially when produced by different
methods from different research groups, lack consistency and suffer from redundancy. Furthermore, the
annotation of genomes for which high-quality libraries have been created is not keeping up with library
improvements due to the computational burden of re-analysis.
In this proposal, we describe a plan to refactor RepeatMasker by generalizing and improving TE alignment
adjudication, switching to a family-centric search strategy with support for incremental re-analysis, improving
annotation reporting and supporting cluster environments. Responding to the need for improved methods for
automated TE library generation, we propose making significant changes to RepeatModeler’s core discovery
algorithms and developing a novel model extension tool. In addition, we will extend our novel methods for
exploiting multi-species alignments and ancestral reconstructions and utilize them to build a comprehensive
mammalian TE library.
Project outcomes
- Journal articles: 0
- Monographs: 0
- Research awards: 0
- Conference papers: 0
- Patents: 0
Other grants by Robert MacDonald Hubley
Development and Maintenance of RepeatMasker and RepeatModeler
- Grant number: 10367846
- Fiscal year: 2022
- Funding amount: $588,200
- Project category:
Dfam: sustainable growth, curation support, and improved quality for mobile element annotation
- Grant number: 10165778
- Fiscal year: 2018
- Funding amount: $588,200
- Project category:
Dfam: sustainable growth, curation support, and improved quality for mobile element annotation
- Grant number: 10714226
- Fiscal year: 2018
- Funding amount: $588,200
- Project category:
Dfam: sustainable growth, curation support, and improved quality for mobile element annotation
- Grant number: 9764454
- Fiscal year: 2018
- Funding amount: $588,200
- Project category:
Dfam: sustainable growth, curation support, and improved quality for mobile element annotation
- Grant number: 10407543
- Fiscal year: 2018
- Funding amount: $588,200
- Project category:
Similar international grants
REU Site: Algorithms and Optimization for Sustainability and Biology
- Grant number: 2243010
- Fiscal year: 2023
- Funding amount: $588,200
- Project category: Standard Grant
Multi-resolution Molecular Dynamics Algorithms for Computational Biology
- Grant number: EP/V047469/1
- Fiscal year: 2021
- Funding amount: $588,200
- Project category: Research Grant
Developing novel machine learning algorithms for network biology
- Grant number: RGPIN-2015-06751
- Fiscal year: 2020
- Funding amount: $588,200
- Project category: Discovery Grants Program - Individual
Developing novel machine learning algorithms for network biology
- Grant number: RGPIN-2015-06751
- Fiscal year: 2019
- Funding amount: $588,200
- Project category: Discovery Grants Program - Individual
Developing novel machine learning algorithms for network biology
- Grant number: RGPIN-2015-06751
- Fiscal year: 2018
- Funding amount: $588,200
- Project category: Discovery Grants Program - Individual
Machine Learning Algorithms for Actionable Knowledge Discovery in Synthetic Biology
- Grant number: 2132169
- Fiscal year: 2018
- Funding amount: $588,200
- Project category: Studentship
AF: Medium: Collaborative Research: Sequential and Parallel Algorithms for Approximate Sequence Matching with Applications to Computational Biology
- Grant number: 1704552
- Fiscal year: 2017
- Funding amount: $588,200
- Project category: Standard Grant
Developing novel machine learning algorithms for network biology
- Grant number: RGPIN-2015-06751
- Fiscal year: 2017
- Funding amount: $588,200
- Project category: Discovery Grants Program - Individual
Workshop on Future Directions for Algorithms in Biology
- Grant number: 1748493
- Fiscal year: 2017
- Funding amount: $588,200
- Project category: Standard Grant
AF: Medium: Collaborative Research: Sequential and Parallel Algorithms for Approximate Sequence Matching with Applications to Computational Biology
- Grant number: 1703489
- Fiscal year: 2017
- Funding amount: $588,200
- Project category: Standard Grant