Machine learning methods to increase genomic accessibility by next-gen sequencing
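Basic information
- Grant number: 8683213
- Principal investigator:
- Amount: $221,300
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2012
- Funding country: United States
- Project period: 2012-08-01 to 2016-06-30
- Project status: Completed
- Source: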
- Keywords: Algorithms; Anus; Area; Binding; Biological; Biological Assay; Biology; Biomedical Research; Bruck-de Lange syndrome; ChIP-seq; Cholesterol; Chromatin; Collaborations; Collection; Communities; Computational algorithm; Computer software; Computers; DNA Sequence; DNA-Protein Interaction; Data; Data Analyses; Data Set; Detection; Disease; Exons; Facioscapulohumeral; Foundations; Generations; Genetic Variation; Genome; Genomics; Goals; Growth and Development function; Internet; Location; Machine Learning; Maps; Medical Research; Medicine; Methodology; Methods; Muscular Dystrophies; Procedures; Publishing; Reading; Research; Research Institute; Research Personnel; Role; Scientist; Sequence Analysis; Software Engineering; Speed; Statistical Models; Structure; Testing; Uncertainty; Work; base; cohesin; computerized tools; cost; epigenome; fatty acid metabolism; functional genomics; genome sequencing; genome-wide; improved; insertion/deletion mutation; next generation sequencing; novel; open source; public health relevance; tool; transcription factor; transcriptome sequencing; xenopus development
Project abstract
DESCRIPTION (provided by applicant): DNA sequencing has become an indispensable tool in many areas of biology and medicine. Recent technological breakthroughs in next-generation sequencing (NGS) have made it possible to sequence billions of bases quickly and cheaply. A number of NGS-based tools have been created, including ChIP-seq, RNA-seq, Methyl-seq and exon/whole-genome sequencing, enabling a fundamentally new way of studying diseases, genomes and epigenomes. The widespread use of NGS-based methods calls for better and more efficient tools for the analysis and interpretation of high-throughput NGS data. Although a number of computational tools have been developed, they are insufficient for mapping and studying genome features located within repeat, duplicated and other so-called unmappable regions of genomes. In this project, computational algorithms and software that expand the genomic accessibility of NGS to these previously understudied regions will be developed.
The algorithms will begin with a new way of mapping raw NGS reads to the reference genome, followed by a machine learning method to resolve ambiguously mapped reads, and will be integrated into a comprehensive analysis pipeline for ChIP-seq. More specifically, the three aims of the research are to develop: (1) Data structures and efficient algorithms for read mapping to rapidly identify all mapping locations. Unlike existing methods, the focus of this research is to rapidly identify all candidate locations of each read, instead of one or only a few locations. (2) Machine learning algorithms for read analysis to resolve ambiguously mapped reads for both ChIP-seq analysis and genetic variation detection. This work will develop probabilistic models to resolve ambiguously mapped reads by pooling information from the entire collection of reads. (3) A comprehensive ChIP-seq analysis pipeline to systematically study genomic features located within unmappable regions of genomes.
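The idea behind aim (2), resolving ambiguously mapped reads by pooling information across all reads, can be illustrated with a toy expectation-maximization (EM) sketch. This is not the project's model, which the abstract does not specify. It is a minimal illustration that assumes each read arrives with a non-empty list of distinct candidate locations, and that a location's weight is simply its share of all reads. The names `e_step`, `allocate_multireads`, `repA` and `repB` are hypothetical.

```python
from collections import defaultdict


def e_step(candidates, weights):
    """For each read, split one unit of probability across its candidate
    locations in proportion to the current location weights."""
    resp = []
    for cands in candidates:
        total = sum(weights[loc] for loc in cands)
        resp.append({loc: weights[loc] / total for loc in cands})
    return resp


def allocate_multireads(candidates, n_iter=50):
    """Toy EM for multi-mapped reads (illustrative assumption, not the grant's method).

    candidates: one non-empty list of distinct location labels per read.
    Returns (weights, resp): the estimated share of reads per location and,
    for each read, the posterior probability of each candidate location.
    """
    locations = sorted({loc for cands in candidates for loc in cands})
    # Start from uniform weights; uniquely mapped reads pull them apart.
    weights = {loc: 1.0 / len(locations) for loc in locations}
    for _ in range(n_iter):
        resp = e_step(candidates, weights)
        # M-step: a location's weight is its expected fraction of all reads.
        counts = defaultdict(float)
        for r in resp:
            for loc, p in r.items():
                counts[loc] += p
        weights = {loc: counts[loc] / len(candidates) for loc in locations}
    return weights, e_step(candidates, weights)


# Six reads map uniquely to repeat copy A, two to copy B, four to both.
reads = [["repA"]] * 6 + [["repB"]] * 2 + [["repA", "repB"]] * 4
weights, resp = allocate_multireads(reads)
print(weights)   # weights converge to about 0.75 for repA and 0.25 for repB
print(resp[-1])  # an ambiguous read is assigned to repA with probability about 0.75
```

In this toy input the uniquely mapped reads favor copy A three to one, so each ambiguous read is shared between the two copies in that same ratio rather than being discarded or assigned at random.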
These algorithms will be tested and refined using both publicly available data and data from established wet-lab collaborators.
In addition to discovering new genomic features located within repeat, duplicated or other previously inaccessible regions, this work will provide the NGS community with (a) a faster and more accurate tool for mapping short sequence reads, (b) a general methodology for expanding genomic accessibility of NGS, (c) a versatile, modular, open-source toolbox of algorithms for NGS data analysis, and (d) a comprehensive analysis of protein-DNA interactions in repeat regions in all publicly available ChIP-seq datasets.
This work is a close collaboration between computer scientists and wet-lab biologists who are developing NGS assays to study biomedical problems. In particular, we will collaborate with Timothy Osborne of Sanford-Burnham Medical Research Institute to study regulators involved in cholesterol and fatty acid metabolism, with Kyoko Yokomori of UC Irvine to study Cohesin, Nipbl and their roles in Cornelia de Lange syndrome, and with Ken Cho of UC Irvine to study the roles of FoxH1 and Schnurri in development and growth control.
Project outputs
Journal articles (4)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Improving read mapping using additional prefix grams.
- DOI: 10.1186/1471-2105-15-42
- Published: 2014-02-05
- Journal:
- Impact factor: 3
- Authors: Kim J; Li C; Xie X
- Corresponding author: Xie X
MixClone: a mixture model for inferring tumor subclonal populations.
- DOI: 10.1186/1471-2164-16-s2-s1
- Published: 2015
- Journal:
- Impact factor: 4.4
- Authors: Li Y; Xie X
- Corresponding author: Xie X
A mixture model for expression deconvolution from RNA-seq in heterogeneous tissues.
- DOI: 10.1186/1471-2105-14-s5-s11
- Published: 2013
- Journal:
- Impact factor: 3
- Authors: Li Y; Xie X
- Corresponding author: Xie X
Other grants by Xiaohui Xie
Machine learning methods to increase genomic accessibility by next-gen sequencing
- Grant number: 8350385
- Fiscal year: 2012
- Funding amount: $221,300
- Project category:
Machine learning methods to increase genomic accessibility by next-gen sequencing
- Grant number: 8518436
- Fiscal year: 2012
- Funding amount: $221,300
- Project category:
Similar overseas grants
An epidemiological study on HPV prevalence of external genitalia, urinary tract, oropharynx and anus among Japanese men
- Grant number: 26861261
- Fiscal year: 2014
- Funding amount: $221,300
- Project category: Grant-in-Aid for Young Scientists (B)
Effects of training using an obstetric simulator for midwifery students to improve their techniques for protection of the perineum and anus, and related problems
- Grant number: 24531195
- Fiscal year: 2012
- Funding amount: $221,300
- Project category: Grant-in-Aid for Scientific Research (C)
Mucosal immune response of the anus in women to HPV, intercourse, smoking and OCs
- Grant number: 7813440
- Fiscal year: 2010
- Funding amount: $221,300
- Project category:
INFRARED COAGULATOR FOR SQUAMOUS INTRAEPITHELIAL NEOPLASIA OF ANUS IN HIV+
- Grant number: 7202667
- Fiscal year: 2005
- Funding amount: $221,300
- Project category:
INFRARED COAGULATOR FOR SQUAMOUS INTRAEPITHELIAL NEOPLASIA OF ANUS IN HIV+
- Grant number: 6972327
- Fiscal year: 2004
- Funding amount: $221,300
- Project category:
Preliminary study to develop a perineal artificial anus
- Grant number: 07671351
- Fiscal year: 1995
- Funding amount: $221,300
- Project category: Grant-in-Aid for Scientific Research (C)
Positional cloning of the genes related to malformations of eye, anus and heart
- Grant number: 06670824
- Fiscal year: 1994
- Funding amount: $221,300
- Project category: Grant-in-Aid for General Scientific Research (C)


















