Machine learning methods to increase genomic accessibility by next-gen sequencing
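Basic information
- Grant number: 8683213
- Principal investigator:
- Amount: $221,300
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2012
- Funding country: United States
- Period: 2012-08-01 to 2016-06-30
- Project status: Completed
- Source: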
- Keywords: Algorithms; Anus; Area; Binding; Biological; Biological Assay; Biology; Biomedical Research; Bruck-de Lange syndrome; ChIP-seq; Cholesterol; Chromatin; Collaborations; Collection; Communities; Computational algorithm; Computer software; Computers; DNA Sequence; DNA-Protein Interaction; Data; Data Analyses; Data Set; Detection; Disease; Exons; Facioscapulohumeral; Foundations; Generations; Genetic Variation; Genome; Genomics; Goals; Growth and Development function; Internet; Location; Machine Learning; Maps; Medical Research; Medicine; Methodology; Methods; Muscular Dystrophies; Procedures; Publishing; Reading; Research; Research Institute; Research Personnel; Role; Scientist; Sequence Analysis; Software Engineering; Speed; Statistical Models; Structure; Testing; Uncertainty; Work; base; cohesin; computerized tools; cost; epigenome; fatty acid metabolism; functional genomics; genome sequencing; genome-wide; improved; insertion/deletion mutation; next generation sequencing; novel; open source; public health relevance; tool; transcription factor; transcriptome sequencing; xenopus development
Project abstract
DESCRIPTION (provided by applicant): DNA sequencing has become an indispensable tool in many areas of biology and medicine. Recent technological breakthroughs in next-generation sequencing (NGS) have made it possible to sequence billions of bases quickly and cheaply. A number of NGS-based tools have been created, including ChIP-seq, RNA-seq, Methyl-seq and exon/whole-genome sequencing, enabling a fundamentally new way of studying diseases, genomes and epigenomes. The widespread use of NGS-based methods calls for better and more efficient tools for the analysis and interpretation of high-throughput NGS data. Although a number of computational tools have been developed, they are insufficient for mapping and studying genome features located within repeat, duplicated and other so-called unmappable regions of genomes. In this project, computational algorithms and software that expand the genomic accessibility of NGS to these previously understudied regions will be developed. The algorithms will begin with a new way of mapping raw NGS reads to the reference genome, followed by a machine learning method to resolve ambiguously mapped reads, and will be integrated into a comprehensive analysis pipeline for ChIP-seq. More specifically, the three aims of the research are to develop: (1) Data structures and efficient algorithms for read mapping to rapidly identify all mapping locations. Unlike existing methods, the focus of this research is to rapidly identify all candidate locations of each read, instead of one or only a few locations. (2) Machine learning algorithms for read analysis to resolve ambiguously mapped reads for both ChIP-seq analysis and genetic variation detection. This work will develop probabilistic models to resolve ambiguously mapped reads by pooling information from the entire collection of reads. (3) A comprehensive ChIP-seq analysis pipeline to systematically study genomic features located within unmappable regions of genomes.
These algorithms will be tested and refined using both publicly available data and data from established wet-lab collaborators. In addition to discovering new genomic features located within repeat, duplicated or other previously inaccessible regions, this work will provide the NGS community with (a) a faster and more accurate tool for mapping short sequence reads, (b) a general methodology for expanding the genomic accessibility of NGS, (c) a versatile, modular, open-source toolbox of algorithms for NGS data analysis, and (d) a comprehensive analysis of protein-DNA interactions in repeat regions in all publicly available ChIP-seq datasets. This work is a close collaboration between computer scientists and wet-lab biologists who are developing NGS assays to study biomedical problems. In particular, we will collaborate with Timothy Osborne of Sanford-Burnham Medical Research Institute to study regulators involved in cholesterol and fatty acid metabolism, with Kyoko Yokomori of UC Irvine to study Cohesin, Nipbl and their roles in Cornelia de Lange syndrome, and with Ken Cho of UC Irvine to study the roles of FoxH1 and Schnurri in development and growth control.
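As an illustration of aim (2), the idea of resolving ambiguously mapped reads by pooling information across all reads can be sketched with a small expectation-maximization (EM) loop. This is only a minimal sketch under stated assumptions, not the project's actual model: the function name `resolve_multireads`, the input layout (one list of distinct candidate locus ids per read, never empty), and the simple abundance-only likelihood are all hypothetical choices made for the example.

```python
from collections import defaultdict


def resolve_multireads(read_hits, n_iter=50):
    """Fractionally assign each read to its candidate loci with a simple EM.

    read_hits: list of lists; read_hits[i] holds the distinct candidate
    locus ids of read i (assumed non-empty for every read).
    Returns (weights, abundance): one dict per read mapping locus -> posterior
    probability, and a dict mapping locus -> estimated share of all reads.
    """
    loci = sorted({locus for hits in read_hits for locus in hits})
    # Start from a uniform guess: every locus equally likely to emit a read.
    abundance = {locus: 1.0 / len(loci) for locus in loci}
    weights = []
    for _ in range(n_iter):
        # E-step: split each read across its candidates in proportion
        # to the current abundance estimates.
        weights = []
        for hits in read_hits:
            total = sum(abundance[locus] for locus in hits)
            weights.append({locus: abundance[locus] / total for locus in hits})
        # M-step: re-estimate abundances from the fractional assignments,
        # so uniquely mapped reads inform the ambiguous ones.
        counts = defaultdict(float)
        for w in weights:
            for locus, p in w.items():
                counts[locus] += p
        abundance = {locus: counts[locus] / len(read_hits) for locus in loci}
    return weights, abundance


# Three reads map uniquely to locus A, one to locus B, and one to both.
reads = [["A"], ["A"], ["A"], ["B"], ["A", "B"]]
weights, abundance = resolve_multireads(reads)
print(weights[4])   # the ambiguous read leans toward A
print(abundance)
```

In this toy input the ambiguous read is pulled toward locus A because the uniquely mapped reads already favour it. A realistic model would also account for mismatches, base qualities, and local signal shape, none of which are represented here.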
Project outcomes
Journal articles (4)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Improving read mapping using additional prefix grams.
- DOI: 10.1186/1471-2105-15-42
- Published: 2014-02-05
- Journal:
- Impact factor: 3
- Authors: Kim J; Li C; Xie X
- Corresponding author: Xie X
MixClone: a mixture model for inferring tumor subclonal populations.
- DOI: 10.1186/1471-2164-16-s2-s1
- Published: 2015
- Journal:
- Impact factor: 4.4
- Authors: Li Y; Xie X
- Corresponding author: Xie X
A mixture model for expression deconvolution from RNA-seq in heterogeneous tissues.
- DOI: 10.1186/1471-2105-14-s5-s11
- Published: 2013
- Journal:
- Impact factor: 3
- Authors: Li Y; Xie X
- Corresponding author: Xie X
Other grants by Xiaohui Xie
Machine learning methods to increase genomic accessibility by next-gen sequencing
- Grant number: 8350385
- Fiscal year: 2012
- Amount: $221,300
- Project category:
Machine learning methods to increase genomic accessibility by next-gen sequencing
- Grant number: 8518436
- Fiscal year: 2012
- Amount: $221,300
- Project category:
Similar overseas grants
An epidemiological study on HPV prevalence of external genitalia, urinary tract, oropharynx and anus among Japanese men
- Grant number: 26861261
- Fiscal year: 2014
- Amount: $221,300
- Project category: Grant-in-Aid for Young Scientists (B)
Effects of training using an obstetric simulator for midwifery students to improve their techniques for protection of the perineum and anus, and related problems
- Grant number: 24531195
- Fiscal year: 2012
- Amount: $221,300
- Project category: Grant-in-Aid for Scientific Research (C)
Mucosal immune response of the anus in women to HPV, intercourse, smoking and OCs
- Grant number: 7813440
- Fiscal year: 2010
- Amount: $221,300
- Project category:
INFRARED COAGULATOR FOR SQUAMOUS INTRAEPITHELIAL NEOPLASIA OF ANUS IN HIV+
- Grant number: 7202667
- Fiscal year: 2005
- Amount: $221,300
- Project category:
INFRARED COAGULATOR FOR SQUAMOUS INTRAEPITHELIAL NEOPLASIA OF ANUS IN HIV+
- Grant number: 6972327
- Fiscal year: 2004
- Amount: $221,300
- Project category:
Preliminary study to develop a perineal artificial anus
- Grant number: 07671351
- Fiscal year: 1995
- Amount: $221,300
- Project category: Grant-in-Aid for Scientific Research (C)
Positional cloning of the genes related to malformations of eye, anus and heart
- Grant number: 06670824
- Fiscal year: 1994
- Amount: $221,300
- Project category: Grant-in-Aid for General Scientific Research (C)


















