Machine learning methods to increase genomic accessibility by next-gen sequencing

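Basic information

  • Grant number:
    8350385
  • Principal investigator:
  • Amount:
    $220,000
  • Host institution:
  • Host institution country:
    United States
  • Project category:
  • Fiscal year:
    2012
  • Funding country:
    United States
  • Project period:
    2012-08-01 to 2015-06-30
  • Project status:
    Completed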

Project abstract

DESCRIPTION (provided by applicant): DNA sequencing has become an indispensable tool in many areas of biology and medicine. Recent technological breakthroughs in next-generation sequencing (NGS) have made it possible to sequence billions of bases quickly and cheaply. A number of NGS-based tools have been created, including ChIP-seq, RNA-seq, Methyl-seq and exon/whole-genome sequencing, enabling a fundamentally new way of studying diseases, genomes and epigenomes. The widespread use of NGS-based methods calls for better and more efficient tools for the analysis and interpretation of high-throughput NGS data. Although a number of computational tools have been developed, they fall short in mapping and studying genome features located within repeat, duplicated and other so-called unmappable regions of genomes.
In this project, computational algorithms and software that expand the genomic accessibility of NGS to these previously understudied regions will be developed. The algorithms will begin with a new way of mapping raw NGS reads to the reference genome, followed by a machine learning method to resolve ambiguously mapped reads, and will be integrated into a comprehensive analysis pipeline for ChIP-seq. More specifically, the three aims of the research are to develop:
(1) Data structures and efficient algorithms for read mapping to rapidly identify all mapping locations. Unlike existing methods, the focus of this research is to rapidly identify all candidate locations of each read, instead of one or only a few locations.
(2) Machine learning algorithms for read analysis to resolve ambiguously mapped reads for both ChIP-seq analysis and genetic variation detection. This work will develop probabilistic models to resolve ambiguously mapped reads by pooling information from the entire collection of reads.
(3) A comprehensive ChIP-seq analysis pipeline to systematically study genomic features located within unmappable regions of genomes.
These algorithms will be tested and refined using both publicly available data and data from established wet-lab collaborators. In addition to discovering new genomic features located within repeat, duplicated or other previously inaccessible regions, this work will provide the NGS community with (a) a faster and more accurate tool for mapping short sequence reads, (b) a general methodology for expanding the genomic accessibility of NGS, (c) a versatile, modular, open-source toolbox of algorithms for NGS data analysis, and (d) a comprehensive analysis of protein-DNA interactions in repeat regions in all publicly available ChIP-seq datasets.
This work is a close collaboration between computer scientists and wet-lab biologists who are developing NGS assays to study biomedical problems. In particular, we will collaborate with Timothy Osborne of Sanford-Burnham Medical Research Institute to study regulators involved in cholesterol and fatty acid metabolism, with Kyoko Yokomori of UC Irvine to study Cohesin, Nipbl and their roles in Cornelia de Lange syndrome, and with Ken Cho of UC Irvine to study the roles of FoxH1 and Schnurri in development and growth control.
PUBLIC HEALTH RELEVANCE: DNA sequencing has become an indispensable tool for basic biomedical research as well as for discovering new treatments and helping biomedical researchers understand disease mechanisms. Next-generation sequencing, which enables rapid generation of billions of bases at relatively low cost, poses a significant computational challenge: how to analyze the large amount of sequence data efficiently and accurately. The goal of this research is to develop open-source software to improve both the efficiency and accuracy of next-generation sequencing analysis tools, thereby allowing biomedical researchers to take full advantage of next-generation sequencing to study biology and disease.
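
Aims (1) and (2) in the abstract can be illustrated with a toy sketch. The abstract does not specify the project's data structures or probabilistic model, so everything here is an assumption made for illustration: a plain hash-based k-mer index with exact-match verification stands in for the read mapper, and a simple expectation-maximization loop over fixed-size bins stands in for the probabilistic model. The function names `build_kmer_index`, `all_mapping_locations` and `resolve_multireads` are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to all positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def all_mapping_locations(read, reference, index, k):
    """Return every position where the read matches exactly.

    The first k bases of the read act as a seed; each seed hit is then
    verified against the full read. Mismatches are not handled here.
    """
    hits = []
    for pos in index.get(read[:k], []):
        if reference[pos:pos + len(read)] == read:
            hits.append(pos)
    return hits


def resolve_multireads(candidates, n_bins, bin_size, n_iter=50):
    """Fractionally assign each read to its candidate positions with EM.

    candidates: one non-empty list of positions per read.
    Returns per-read assignment probabilities and per-bin weights.
    """
    weights = [1.0 / n_bins] * n_bins  # start from uniform bin abundances
    assignments = []
    for _ in range(n_iter):
        counts = [0.0] * n_bins
        assignments = []
        # E-step: split each read across its candidates by current bin weight.
        for locs in candidates:
            scores = [weights[p // bin_size] for p in locs]
            total = sum(scores)
            probs = [s / total for s in scores]
            assignments.append(probs)
            for p, prob in zip(locs, probs):
                counts[p // bin_size] += prob
        # M-step: bin weight is the expected fraction of reads in that bin.
        weights = [c / len(candidates) for c in counts]
    return assignments, weights


# Toy reference with the repeat "ACGTG" in two bins of 10 bases each.
reference = "AAACGTGCCC" + "TTACGTGGGA"
k = 3
index = build_kmer_index(reference, k)

# One read inside the repeat and two reads unique to the first copy's flank.
reads = ["ACGTG", "GTGCC", "TGCCC"]
candidates = [all_mapping_locations(r, reference, index, k) for r in reads]
assignments, weights = resolve_multireads(candidates, n_bins=2, bin_size=10)

for read, locs, probs in zip(reads, candidates, assignments):
    print(read, locs, [round(p, 3) for p in probs])
```

In this sketch the uniquely mapped reads raise the weight of the bin holding the first repeat copy, so the ambiguous read is pulled toward that copy. This is one simple way of pooling information from the entire collection of reads; a realistic model would also need to account for mismatches, base qualities and the expected shape of ChIP-seq signal.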

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by Xiaohui Xie

Machine learning methods to increase genomic accessibility by next-gen sequencing
  • Grant number:
    8683213
  • Fiscal year:
    2012
  • Amount:
    $220,000
  • Project category:
Machine learning methods to increase genomic accessibility by next-gen sequencing
  • Grant number:
    8518436
  • Fiscal year:
    2012
  • Amount:
    $220,000
  • Project category:

Similar overseas grants

An epidemiological study on HPV prevalence of external genitalia, urinary tract, oropharynx and anus among Japanese men
  • Grant number:
    26861261
  • Fiscal year:
    2014
  • Amount:
    $220,000
  • Project category:
    Grant-in-Aid for Young Scientists (B)
Effects of training using an obstetric simulator for midwifery students to improve their techniques for protection of the perineum and anus, and related problems
  • Grant number:
    24531195
  • Fiscal year:
    2012
  • Amount:
    $220,000
  • Project category:
    Grant-in-Aid for Scientific Research (C)
Mucosal immune response of the anus in women to HPV, intercourse, smoking and OCs
  • Grant number:
    7813440
  • Fiscal year:
    2010
  • Amount:
    $220,000
  • Project category:
INFRARED COAGULATOR FOR SQUAMOUS INTRAEPITHELIAL NEOPLASIA OF ANUS IN HIV+
  • Grant number:
    7202667
  • Fiscal year:
    2005
  • Amount:
    $220,000
  • Project category:
INFRARED COAGULATOR FOR SQUAMOUS INTRAEPITHELIAL NEOPLASIA OF ANUS IN HIV+
  • Grant number:
    6972327
  • Fiscal year:
    2004
  • Amount:
    $220,000
  • Project category:
Priliminary study to develop a perineal artificial anus
  • Grant number:
    07671351
  • Fiscal year:
    1995
  • Amount:
    $220,000
  • Project category:
    Grant-in-Aid for Scientific Research (C)
Positional cloning of the genes related to malformations of eye, anus and heart
  • Grant number:
    06670824
  • Fiscal year:
    1994
  • Amount:
    $220,000
  • Project category:
    Grant-in-Aid for General Scientific Research (C)