Novel bioinformatics methods for integrative detection of structural variants from long-read sequencing
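Basic Information
- Grant number: 10752265
- Principal investigator:
- Amount: $47,700
- Host institution:
- Host institution country: United States
- Project type:
- Fiscal year: 2023
- Funding country: United States
- Project period: 2023-09-15 to 2026-09-14
- Project status: Ongoing
- Source: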
- Keywords: Address; Area; Awareness; Base Pairing; Bioinformatics; Biomedical Engineering; Biomedical Research; Collection; Communication; Communities; Complex; Data; Data Science; Detection; Development; Disease; Education; Ethnic Population; Future; Genetic Variation; Genome; Genomic DNA; Genomics; Goals; Graph; Haplotypes; Human; Human Genome; Lead; Length; Maps; Methods; Modeling; Optics; Oral; Performance; Population; Repetitive Sequence; Research Training; Resolution; Resources; Sequence Alignment; Site; Source; Structure; Technology; Time; Variant; Work; Writing; base; candidate identification; career; computerized tools; contig; data integration; disease phenotype; doctoral student; file format; genome sequencing; genome-wide; genomic platform; genomic variation; human pangenome; human reference genome; insertion/deletion mutation; machine learning model; novel; pan-genome; reference genome; restriction enzyme; scaffold; sequencing platform; skills; statistics; technology development; tool; variant detection; whole genome
Project Abstract
Project Summary/Abstract
Structural variants (SVs) are the largest source of variation in the human genome and are frequently
associated with disease phenotypes. Thus, the identification and characterization of SVs are essential for
understanding human genome structure and function. The goal of this proposal is to develop a generalized SV
calling pipeline that can leverage information from the latest developments in sequencing technology and
human reference genome representations to discover and resolve SVs at high accuracy. I will first integrate
information across sequencing platforms to increase SV calling accuracy. Multiple sequencing and mapping
platforms are now used to detect SVs from human genome data. My pipeline will increase the accuracy of SV
calling with a data integration model that handles a diverse set of genomic platforms. I will next develop a novel
SV scoring model based on genomic context and coverage. Several factors, such as the generally low
sequence coverage in typical long-read studies and alignment errors due to highly repetitive sequences,
can result in potentially high rates of false positives for SVs when using parameters for high-sensitivity
calling. I will incorporate two sets of important SV features, genomic context and coverage, into a
machine-learning model to compute confidence in SV calls for downstream analysis. Finally, I will add
support for sequence data aligned to graph genome assemblies in GFA file
format. Unlike single reference genomes, pangenomes are particularly useful for characterizing large-scale
structural differences in genomes between different ethnic groups. Pangenomes would bring us closer to
capturing the full extent of human genomic variation, and thus represent an important resource to leverage for
SV calling. In summary, in this project I will develop a generalized SV calling pipeline capable of integrating
multiple technical platforms for discovering SVs and providing support for future developments in pangenome
graph assemblies. With the research training plan, I will 1) gain expertise in genomics and bioinformatics, 2)
promote diversity in biomedical research through involvement in educational efforts in the community, 3)
develop oral and written communication skills, and 4) prepare for a scientific career focused on the study and
teaching of human genome variation.
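To make the GFA file format mentioned in the abstract concrete, here is a minimal Python sketch that reads the two most common GFA 1 record types: segments (`S` lines) and links (`L` lines). It is an illustration only and not the project's implementation. The function name `parse_gfa` and the toy graph `EXAMPLE_GFA` are assumptions introduced for this example. In the toy graph, segment `s2` sits on one of two routes between `s1` and `s3`, which is how an insertion or deletion can appear as a "bubble" in a graph genome.

```python
from collections import defaultdict


def parse_gfa(text):
    """Parse GFA 1 text into (segments, links).

    segments maps segment name -> sequence.
    links maps (from_name, from_orient) -> list of (to_name, to_orient).
    Only S and L records are read; all other record types are skipped.
    """
    segments = {}
    links = defaultdict(list)
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if fields[0] == "S":
            # S <name> <sequence> [optional tags]
            segments[fields[1]] = fields[2]
        elif fields[0] == "L":
            # L <from> <from_orient> <to> <to_orient> <overlap CIGAR>
            src, src_orient, dst, dst_orient = fields[1:5]
            links[(src, src_orient)].append((dst, dst_orient))
    return segments, dict(links)


# Hypothetical toy graph: s1 -> s2 -> s3 and s1 -> s3 (a bubble around s2).
EXAMPLE_GFA = "\n".join([
    "H\tVN:Z:1.0",
    "S\ts1\tACGT",
    "S\ts2\tGG",
    "S\ts3\tTTA",
    "L\ts1\t+\ts2\t+\t0M",
    "L\ts1\t+\ts3\t+\t0M",
    "L\ts2\t+\ts3\t+\t0M",
])

segments, links = parse_gfa(EXAMPLE_GFA)
for (name, orient), targets in links.items():
    print(name + orient, "->", targets)
```

A path through `s1`, `s2`, `s3` spells a haplotype that carries the extra sequence `GG`, while the path through `s1`, `s3` omits it. A real pipeline would also need to handle paths, optional tags, and reverse-orientation links, which this sketch leaves out.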
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Jonathan Perdomo
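Similar NSFC (National Natural Science Foundation of China) Grants
Mechanism by which the nitrogen metabolism regulator AreA of Fusarium proliferatum mediates fumonisin FB1 biosynthesis
- Grant number: 2021JJ40433
- Approval year: 2021
- Funding: CNY 0
- Project type: Provincial/municipal project
Mechanism of enhanced sugarcane disease resistance through host-induced silencing of the AreA and CYP51 genes of the pokkah boeng pathogen
- Grant number: 32001603
- Approval year: 2020
- Funding: CNY 240,000
- Project type: Young Scientists Fund
Porting, improvement, and application of the AREA international economic model
- Grant number: 18870435
- Approval year: 1988
- Funding: CNY 20,000
- Project type: General Program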
Similar Overseas Grants
Onboarding Rural Area Mathematics and Physical Science Scholars
- Grant number: 2322614
- Fiscal year: 2024
- Funding: $47,700
- Project type: Standard Grant
TRACK-UK: Synthesized Census and Small Area Statistics for Transport and Energy
- Grant number: ES/Z50290X/1
- Fiscal year: 2024
- Funding: $47,700
- Project type: Research Grant
Wide-area low-cost sustainable ocean temperature and velocity structure extraction using distributed fibre optic sensing within legacy seafloor cables
- Grant number: NE/Y003365/1
- Fiscal year: 2024
- Funding: $47,700
- Project type: Research Grant
Point-scanning confocal with area detector
- Grant number: 534092360
- Fiscal year: 2024
- Funding: $47,700
- Project type: Major Research Instrumentation
Collaborative Research: Scalable Manufacturing of Large-Area Thin Films of Metal-Organic Frameworks for Separations Applications
- Grant number: 2326714
- Fiscal year: 2024
- Funding: $47,700
- Project type: Standard Grant
Collaborative Research: Scalable Manufacturing of Large-Area Thin Films of Metal-Organic Frameworks for Separations Applications
- Grant number: 2326713
- Fiscal year: 2024
- Funding: $47,700
- Project type: Standard Grant
Unlicensed Low-Power Wide Area Networks for Location-based Services
- Grant number: 24K20765
- Fiscal year: 2024
- Funding: $47,700
- Project type: Grant-in-Aid for Early-Career Scientists
RAPID: Collaborative Research: Multifaceted Data Collection on the Aftermath of the March 26, 2024 Francis Scott Key Bridge Collapse in the DC-Maryland-Virginia Area
- Grant number: 2427233
- Fiscal year: 2024
- Funding: $47,700
- Project type: Standard Grant
RAPID: Collaborative Research: Multifaceted Data Collection on the Aftermath of the March 26, 2024 Francis Scott Key Bridge Collapse in the DC-Maryland-Virginia Area
- Grant number: 2427232
- Fiscal year: 2024
- Funding: $47,700
- Project type: Standard Grant
RAPID: Collaborative Research: Multifaceted Data Collection on the Aftermath of the March 26, 2024 Francis Scott Key Bridge Collapse in the DC-Maryland-Virginia Area
- Grant number: 2427231
- Fiscal year: 2024
- Funding: $47,700
- Project type: Standard Grant