Next Generation Machine Learning for the Accurate Detection of DNA Variations from High-Throughput Sequencing Data
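Basic information
- Grant number: RGPIN-2019-04896
- Principal investigator:
- Amount: $20,400
- Host institution:
- Host institution country: Canada
- Program: Discovery Grants Program - Individual
- Fiscal year: 2022
- Funding country: Canada
- Period: 2022-01-01 to 2023-12-31
- Status: Completed
- Source:
- Keywords: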
Project abstract
Deoxyribonucleic acid (DNA) is the hereditary material in humans and almost all other organisms. The emergence of "next generation sequencing" (NGS) technology has created unprecedented opportunities to study DNA at large scale. While extremely powerful, NGS produces massive quantities of data (~250 GB per cancer patient). These large datasets contain errors that distort and obscure the true DNA changes (referred to as variations). A significant leap in the development of algorithms for detecting DNA variations from different tissue types and technologies is needed, because: (a) the hand-crafted, parameterized algorithms developed so far still produce thousands of errors and miss true DNA variations; (b) while recent single-cell DNA sequencing (SCS) technologies can now characterize the DNA of each cell, algorithms for detecting DNA variations in SCS data lag far behind data-generation throughput; and (c) a common approach to archiving tissue material is formalin fixation, which introduces false DNA variations and makes it challenging to identify true DNA variations in the stored tissue. Our long-term objective is to develop computational methods to identify DNA variations that cause biological abnormalities. Within this cycle of the Discovery program, we will develop novel machine learning frameworks (based on deep learning and ensemble learning) that detect DNA variations (regardless of whether they cause abnormalities) from various sources of tissue sequenced by NGS technologies. Model datasets will be used to validate these novel algorithms and enrich their approaches.
Successful execution of this basic research program will lead to a novel class of algorithms and software that will (a) maximize the benefit of the significant resources committed to DNA sequencing, saving millions of dollars in follow-up experiments to validate the DNA variations identified from noisy NGS data; (b) enable SCS to characterize cells more effectively, which will have a broad impact on many diverse fields, including microbiology, neurobiology, development, immunology, and cancer; and (c) open the door to the effective assessment of DNA sequence in formalin-fixed tissue, providing an explosion of data with which to screen and comprehensively evaluate disease markers. To the best of our knowledge, the proposed program is unique and novel in Canada and will train 2 PhD students and 1 MSc student, as well as 7 undergraduate students, in skills that are in high demand in academia and industry. A detailed training plan that allows individuals to reach their full potential is integrated within the development of the research objectives to ensure a high-quality, interactive, flourishing, and equitable research environment for highly qualified personnel (HQP), as per the UBC Equity and Diversity Strategic Plan and Policy #2.
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Bashashati, Ali
The utility of color normalization for AI-based diagnosis of hematoxylin and eosin-stained pathology images
- DOI: 10.1002/path.5797
- Published: 2021-11-06
- Journal:
- Impact factor: 7.3
- Authors: Boschman, Jeffrey; Farahani, Hossein; Bashashati, Ali
- Corresponding author: Bashashati, Ali
Distinct evolutionary trajectories of primary high-grade serous ovarian cancers revealed through spatial mutational profiling.
- DOI: 10.1002/path.4230
- Published: 2013-09
- Journal:
- Impact factor: 7.3
- Authors: Bashashati, Ali; Ha, Gavin; Tone, Alicia; Ding, Jiarui; Prentice, Leah M.; Roth, Andrew; Rosner, Jamie; Shumansky, Karey; Kalloger, Steve; Senz, Janine; Yang, Winnie; McConechy, Melissa; Melnyk, Nataliya; Anglesio, Michael; Luk, Margaret T. Y.; Tse, Kane; Zeng, Thomas; Moore, Richard; Zhao, Yongjun; Marra, Marco A.; Gilks, Blake; Yip, Stephen; Huntsman, David G.; McAlpine, Jessica N.; Shah, Sohrab P.
- Corresponding author: Shah, Sohrab P.
Genomic consequences of aberrant DNA repair mechanisms stratify ovarian cancer histotypes
- DOI: 10.1038/ng.3849
- Published: 2017-06-01
- Journal:
- Impact factor: 30.8
- Authors: Wang, Yi Kan; Bashashati, Ali; Shah, Sohrab P.
- Corresponding author: Shah, Sohrab P.
An improved asynchronous brain interface: making use of the temporal history of the LF-ASD feature vectors
- DOI: 10.1088/1741-2560/3/2/002
- Published: 2006-06-01
- Journal:
- Impact factor: 4
- Authors: Bashashati, Ali; Mason, Steve; Birch, Gary E.
- Corresponding author: Birch, Gary E.
Per-channel basis normalization methods for flow cytometry data.
- DOI: 10.1002/cyto.a.20823
- Published: 2010-02
- Journal:
- Impact factor: 3.7
- Authors: Hahne, Florian; Khodabakhshi, Alireza Hadj; Bashashati, Ali; Wong, Chao-Jen; Gascoyne, Randy D.; Weng, Andrew P.; Seyfert-Margolis, Vicky; Bourcier, Katarzyna; Asare, Adam; Lumley, Thomas; Gentleman, Robert; Brinkman, Ryan R.
- Corresponding author: Brinkman, Ryan R.
Other grants held by Bashashati, Ali
Next Generation Machine Learning for the Accurate Detection of DNA Variations from High-Throughput Sequencing Data
- Grant number: RGPIN-2019-04896
- Fiscal year: 2021
- Funding amount: $20,400
- Program: Discovery Grants Program - Individual
Next Generation Machine Learning for the Accurate Detection of DNA Variations from High-Throughput Sequencing Data
- Grant number: RGPIN-2019-04896
- Fiscal year: 2020
- Funding amount: $20,400
- Program: Discovery Grants Program - Individual
Next Generation Machine Learning for the Accurate Detection of DNA Variations from High-Throughput Sequencing Data
- Grant number: RGPIN-2019-04896
- Fiscal year: 2019
- Funding amount: $20,400
- Program: Discovery Grants Program - Individual
Next Generation Machine Learning for the Accurate Detection of DNA Variations from High-Throughput Sequencing Data
- Grant number: DGECR-2019-00028
- Fiscal year: 2019
- Funding amount: $20,400
- Program: Discovery Launch Supplement
Computational methods for the analysis of flow cytometry data
- Grant number: 343277-2007
- Fiscal year: 2009
- Funding amount: $20,400
- Program: Postdoctoral Fellowships
Computational methods for the analysis of flow cytometry data
- Grant number: 343277-2007
- Fiscal year: 2008
- Funding amount: $20,400
- Program: Postdoctoral Fellowships
Computational methods for the analysis of flow cytometry data
- Grant number: 343277-2007
- Fiscal year: 2007
- Funding amount: $20,400
- Program: Postdoctoral Fellowships
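Similar grants from the National Natural Science Foundation of China
Next Generation Majorana Nanowire Hybrids
- Grant number:
- Year approved: 2020
- Funding amount: 200,000 CNY
- Program: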
Similar overseas grants
REDONDA: A Next-Generation State-Machine Replication Protocol for Blockchain
- Grant number: EP/Y036425/1
- Fiscal year: 2024
- Funding amount: $20,400
- Program: Research Grant
Next-Generation Algorithms in Statistical Genetics Based on Modern Machine Learning
- Grant number: 10714930
- Fiscal year: 2023
- Funding amount: $20,400
- Program:
Next generation closed-loop brain-machine interfaces
- Grant number: LP220100256
- Fiscal year: 2023
- Funding amount: $20,400
- Program: Linkage Projects
Redonda: A Next-Generation State-Machine Replication Protocol for Blockchain
- Grant number: EP/Y036417/1
- Fiscal year: 2023
- Funding amount: $20,400
- Program: Research Grant
(PHYMOL) Physics, Accuracy and Machine Learning: Towards the next-generation of Molecular Potentials
- Grant number: EP/X036863/1
- Fiscal year: 2023
- Funding amount: $20,400
- Program: Research Grant
Physics-informed and physics-constrained machine learning for next generation imaging
- Grant number: 2898384
- Fiscal year: 2023
- Funding amount: $20,400
- Program: Studentship
Machine-Learning-Driven Synthesis of the Next Generation Carbon Dots with Tunable Fluorescence/Band-Gap
- Grant number: 2754236
- Fiscal year: 2022
- Funding amount: $20,400
- Program: Studentship
Hyper-fast hyper-parameter tuning for the next generation of machine learning
- Grant number: RGPIN-2022-03669
- Fiscal year: 2022
- Funding amount: $20,400
- Program: Discovery Grants Program - Individual
RINGS: Provably Robust Machine Learning for Next Generation Cellular Networks
- Grant number: 2148583
- Fiscal year: 2022
- Funding amount: $20,400
- Program: Standard Grant
CC* Compute: COllaborative Next-generation Technology In the Northeast: the UMassUnity Machine (CONTINUUM)
- Grant number: 2201106
- Fiscal year: 2022
- Funding amount: $20,400
- Program: Standard Grant