Computational Methods for Sequence Alignment, Genotyping, and Diploid Genome Assembly
Basic Information
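- Award number: 10022496
- Principal investigator: Daehwan Kim
- Amount: $408,800
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2019
- Funding country: United States
- Project period: 2019-09-23 to 2023-08-31
- Project status: Completed
- Source: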
- Keywords: ATAC-seq, Address, Algorithms, Area, Bioinformatics, ChIP-seq, Clinical, Computational algorithm, Computer Analysis, Computer software, Computers, Computing Methodologies, Custom, Data, Data Set, Diploidy, Effectiveness, Engineering, Gene Expression Profiling, Gene Fusion, Genes, Genetic Databases, Genetic Diseases, Genome, Genomics, Genotype, Genotype-Tissue Expression Project, Graph, HLA Antigens, Healthcare, High-Throughput Nucleotide Sequencing, Human, Human Biology, Human Genome, Individual, Length, Link, Location, Maps, Measurable, Memory, Methods, Methylation, Performance, Persons, Play, Preparation, Process, Protocols documentation, Reporting, Research, Research Personnel, Role, Sampling, Scheme, Sequence Alignment, Sequence Analysis, Software Tools, Somatic Mutation, Speed, Statistical Models, Structure, Systems Analysis, Technology, The Cancer Genome Atlas, Thinness, Tissue-Specific Gene Expression, Transcript, Variant, Work, application programming interface, base, bioinformatics tool, biological research, bisulfite, bisulfite sequencing, cancer genetics, clinical practice, cost, design, experience, experimental study, genetic makeup, genetic variant, genome-wide, human disease, improved, indexing, insight, metagenomic sequencing, next generation sequencing, novel, personalized medicine, programs, software systems, tool, transcriptome, transcriptome sequencing, usability, virtual
Project Summary/Abstract
Massive sequencing is revolutionizing biological research and clinical practice. Over the past decades, projects such as the 1000 Genomes Project, TCGA, GTEx, and GEUVADIS have generated hundreds of trillions of reads. The recent completion of the UK's 100K whole-genome sequencing (WGS) project has inspired many other nations to develop their own 100K WGS projects. Higher sequencing throughput and lower costs have enabled deeper and more thorough studies of cancer, genetic disorders, and other areas of human biology. Advanced sequence alignment and other computational methods have played a major role in these analyses. In recent years, our lab has contributed to these unprecedented, global-scale efforts by developing several widely used bioinformatics tools for analyzing next-generation sequencing (NGS) reads: TopHat2 and HISAT for aligning RNA-seq reads, TopHat-Fusion for identifying gene fusions, Centrifuge for classifying metagenomic sequencing reads, HISAT2 for graph-based alignment at the human genome scale, and HISAT-genotype for HLA gene typing and assembly.
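Several of the tools listed above (HISAT, HISAT2, and Centrifuge) are built on the Burrows-Wheeler transform (BWT) and the FM index. The Python below is a minimal sketch of the core idea. It assumes a tiny in-memory reference and exact matching only. It builds a toy FM index from a naive suffix array, then uses backward search to count and locate a read's exact occurrences. The helper names `build_fm_index` and `count_occurrences` are hypothetical. Real aligners use compressed, sampled structures and handle mismatches, splicing, and graph references; they do not build indexes this way.

```python
from collections import Counter


def build_fm_index(text):
    """Build a toy FM index: BWT, C table, occurrence table, and suffix array."""
    text += "$"  # unique sentinel that sorts before A, C, G, T
    # Naive suffix array: fine for a toy reference, far too slow for a genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT[k] is the character preceding suffix sa[k] (text[-1] == "$" when sa[k] == 0).
    bwt = "".join(text[i - 1] for i in sa)
    # C[c] = number of characters in text that are lexicographically smaller than c.
    counts = Counter(text)
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return bwt, C, occ, sa


def count_occurrences(pattern, fm):
    """Backward search: return (count, sorted 0-based start positions) of exact matches."""
    bwt, C, occ, sa = fm
    lo, hi = 0, len(bwt)  # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):  # extend the match one character to the left
        if ch not in C:
            return 0, []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0, []
    return hi - lo, sorted(sa[lo:hi])


fm = build_fm_index("ACGTACGTGACG")
print(count_occurrences("ACG", fm))  # (3, [0, 4, 9])
```

Each backward-search step narrows the suffix-array interval using only the C and occurrence tables. The cost is therefore proportional to the read length, not the reference length. This is the property that makes FM-index-based aligners fast.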
This proposal addresses several key challenges in sequence alignment, genotyping, and diploid genome assembly. First, we plan to research and develop various indexing strategies. Virtually all alignment programs rely on a single type of index to align reads to a reference. Alignment accuracy and speed will be further enhanced by incorporating additional types of indexes. Second, we will develop genotyping and diploid genome assembly algorithms. As sequencing costs continue to decline, it will become routine for people to have their own genomes sequenced for clinical purposes. We will further develop our initial version of HISAT-genotype into a comprehensive suite of tools that can genotype and assemble a person's whole diploid genome in one day on a desktop computer. Third, we will continue to maintain and improve HISAT2, and develop a new, more versatile aligner. We propose to unify widely used alignment programs by implementing their common functions (input processing, indexing, aligning, and reporting) as modules. We will provide application programming interfaces (APIs) that expose those modules. Bioinformatics engineers can then use these APIs to develop their own indexes and alignment algorithms, customized to best analyze their own data sets. We plan to demonstrate the usability of the new sequence aligner, SARTOR (Sequence Alignment Repertoire To Optimize Reference-guided analysis), by showing that it effectively handles different types of reads (WGS, WES, RNA-seq, ChIP-seq, BS-seq, etc.) produced by different sequencing technologies (short, long, and linked reads). Upon successful completion, the proposed software systems will promote personalized medicine by drawing on customized personal genomes, with key functionalities including differential gene expression analysis and somatic mutation identification. The programs will also allow researchers to perform unbiased, accurate, and rapid analyses in large-scale NGS experiments.
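The modular design described above separates input processing, indexing, aligning, and reporting, each exposed through an API. It can be sketched as a few small interfaces that a pipeline wires together. This sketch is hypothetical throughout. The names `Index`, `Aligner`, `KmerHashIndex`, `SeedAndVerifyAligner`, `parse_fasta_reads`, `report_tsv`, and `run_pipeline` are illustrative only and are not SARTOR's or HISAT2's actual API. It uses a simple k-mer hash table as a stand-in "custom index" and a seed-and-verify aligner with a Hamming-distance check. Both are far simpler than the spliced and graph-based alignment the proposal targets.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Protocol


@dataclass
class Read:
    name: str
    seq: str


@dataclass
class Hit:
    read: str
    pos: int         # 0-based start on the reference
    mismatches: int


class Index(Protocol):
    """Hypothetical indexing-module API: any object with these members can back an aligner."""
    reference: str
    k: int

    def lookup(self, kmer: str) -> List[int]: ...


class Aligner(Protocol):
    """Hypothetical aligning-module API."""
    def align(self, read: Read) -> List[Hit]: ...


class KmerHashIndex:
    """Toy custom index: maps every k-mer of the reference to its start positions."""
    def __init__(self, reference: str, k: int):
        self.reference, self.k = reference, k
        self.table: Dict[str, List[int]] = {}
        for i in range(len(reference) - k + 1):
            self.table.setdefault(reference[i:i + k], []).append(i)

    def lookup(self, kmer: str) -> List[int]:
        return self.table.get(kmer, [])


class SeedAndVerifyAligner:
    """Seeds with the read's first k-mer, then verifies the full read by Hamming distance."""
    def __init__(self, index: Index, max_mismatches: int = 1):
        self.index, self.max_mm = index, max_mismatches

    def align(self, read: Read) -> List[Hit]:
        ref, k = self.index.reference, self.index.k
        hits = []
        for pos in self.index.lookup(read.seq[:k]):
            window = ref[pos:pos + len(read.seq)]
            if len(window) < len(read.seq):
                continue  # read would run past the end of the reference
            mm = sum(a != b for a, b in zip(window, read.seq))
            if mm <= self.max_mm:
                hits.append(Hit(read.name, pos, mm))
        return hits


def parse_fasta_reads(lines: Iterable[str]) -> Iterator[Read]:
    """Input-processing module: minimal FASTA reader (one sequence line per record)."""
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:]
        elif line and name is not None:
            yield Read(name, line)


def report_tsv(hits: Iterable[Hit]) -> List[str]:
    """Reporting module: one tab-separated line (read, position, mismatches) per hit."""
    return [f"{h.read}\t{h.pos}\t{h.mismatches}" for h in hits]


def run_pipeline(lines: Iterable[str], aligner: Aligner) -> List[str]:
    """Wires the modules together; any Aligner implementation can be swapped in."""
    hits: List[Hit] = []
    for read in parse_fasta_reads(lines):
        hits.extend(aligner.align(read))
    return report_tsv(hits)


ref = "ACGTACGTGACGTTAG"
aligner = SeedAndVerifyAligner(KmerHashIndex(ref, k=4), max_mismatches=0)
print(run_pipeline([">r1", "ACGTT", ">r2", "GGGG"], aligner))  # ['r1\t9\t0']
```

The pipeline depends only on the interfaces. A different index, such as an FM index, could therefore replace `KmerHashIndex` without changing the input or reporting modules. That separation is the point of the sketch, not the toy aligner itself.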
Project Outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other Grants to Daehwan Kim
Computational Methods for Sequence Alignment, Genotyping, and Diploid Genome Assembly
- Award number: 10483191
- Fiscal year: 2019
- Amount: $408,800
- Project category:
Computational Methods for Sequence Alignment, Genotyping, and Diploid Genome Assembly
- Award number: 10242898
- Fiscal year: 2019
- Amount: $408,800
- Project category:
Similar Overseas Grants
Rational design of rapidly translatable, highly antigenic and novel recombinant immunogens to address deficiencies of current snakebite treatments
- Award number: MR/S03398X/2
- Fiscal year: 2024
- Amount: $408,800
- Project category: Fellowship
Re-thinking drug nanocrystals as highly loaded vectors to address key unmet therapeutic challenges
- Award number: EP/Y001486/1
- Fiscal year: 2024
- Amount: $408,800
- Project category: Research Grant
CAREER: FEAST (Food Ecosystems And circularity for Sustainable Transformation) framework to address Hidden Hunger
- Award number: 2338423
- Fiscal year: 2024
- Amount: $408,800
- Project category: Continuing Grant
Metrology to address ion suppression in multimodal mass spectrometry imaging with application in oncology
- Award number: MR/X03657X/1
- Fiscal year: 2024
- Amount: $408,800
- Project category: Fellowship
CRII: SHF: A Novel Address Translation Architecture for Virtualized Clouds
- Award number: 2348066
- Fiscal year: 2024
- Amount: $408,800
- Project category: Standard Grant
The Abundance Project: Enhancing Cultural & Green Inclusion in Social Prescribing in Southwest London to Address Ethnic Inequalities in Mental Health
- Award number: AH/Z505481/1
- Fiscal year: 2024
- Amount: $408,800
- Project category: Research Grant
ERAMET - Ecosystem for rapid adoption of modelling and simulation METhods to address regulatory needs in the development of orphan and paediatric medicines
- Award number: 10107647
- Fiscal year: 2024
- Amount: $408,800
- Project category: EU-Funded
BIORETS: Convergence Research Experiences for Teachers in Synthetic and Systems Biology to Address Challenges in Food, Health, Energy, and Environment
- Award number: 2341402
- Fiscal year: 2024
- Amount: $408,800
- Project category: Standard Grant
Ecosystem for rapid adoption of modelling and simulation METhods to address regulatory needs in the development of orphan and paediatric medicines
- Award number: 10106221
- Fiscal year: 2024
- Amount: $408,800
- Project category: EU-Funded
Recite: Building Research by Communities to Address Inequities through Expression
- Award number: AH/Z505341/1
- Fiscal year: 2024
- Amount: $408,800
- Project category: Research Grant