Machine learning methods to increase genomic accessibility by next-gen sequencing
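Basic information
- Grant number: 8683213
- Principal investigator:
- Amount: $221,300
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2012
- Funding country: United States
- Project period: 2012-08-01 to 2016-06-30
- Project status: Completed
- Source: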
- Keywords: Algorithms, Anus, Area, Binding, Biological, Biological Assay, Biology, Biomedical Research, Bruck-de Lange syndrome, ChIP-seq, Cholesterol, Chromatin, Collaborations, Collection, Communities, Computational algorithm, Computer software, Computers, DNA Sequence, DNA-Protein Interaction, Data, Data Analyses, Data Set, Detection, Disease, Exons, Facioscapulohumeral, Foundations, Generations, Genetic Variation, Genome, Genomics, Goals, Growth and Development function, Internet, Location, Machine Learning, Maps, Medical Research, Medicine, Methodology, Methods, Muscular Dystrophies, Procedures, Publishing, Reading, Research, Research Institute, Research Personnel, Role, Scientist, Sequence Analysis, Software Engineering, Speed, Statistical Models, Structure, Testing, Uncertainty, Work, base, cohesin, computerized tools, cost, epigenome, fatty acid metabolism, functional genomics, genome sequencing, genome-wide, improved, insertion/deletion mutation, next generation sequencing, novel, open source, public health relevance, tool, transcription factor, transcriptome sequencing, xenopus development
Project abstract
DESCRIPTION (provided by applicant): DNA sequencing has become an indispensable tool in many areas of biology and medicine. Recent technological breakthroughs in next-generation sequencing (NGS) have made it possible to sequence billions of bases quickly and cheaply. A number of NGS-based tools have been created, including ChIP-seq, RNA-seq, Methyl-seq, and exon/whole-genome sequencing, enabling fundamentally new ways of studying diseases, genomes, and epigenomes. The widespread use of NGS-based methods calls for better and more efficient tools for analyzing and interpreting high-throughput NGS data. Although a number of computational tools have been developed, they are insufficient for mapping and studying genome features located within repeats, duplications, and other so-called unmappable regions of genomes. In this project, we will develop computational algorithms and software that expand the genomic accessibility of NGS to these previously understudied regions. The approach begins with a new way of mapping raw NGS reads to the reference genome, followed by a machine learning method to resolve ambiguously mapped reads; both will be integrated into a comprehensive analysis pipeline for ChIP-seq. More specifically, the three aims of the research are to develop: (1) Data structures and efficient algorithms for read mapping that rapidly identify all mapping locations. Unlike existing methods, which report one or only a few locations per read, this research focuses on rapidly identifying all candidate locations of each read. (2) Machine learning algorithms for read analysis that resolve ambiguously mapped reads for both ChIP-seq analysis and genetic variation detection. This work will develop probabilistic models that resolve ambiguously mapped reads by pooling information from the entire collection of reads. (3) A comprehensive ChIP-seq analysis pipeline to systematically study genomic features located within unmappable regions of genomes.
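Aim (1) centers on reporting every location a read can map to, not just the best one. One common way to do this, shown here purely as an illustrative sketch and not as the project's actual data structure, is a k-mer (q-gram) index combined with the pigeonhole principle: if a read aligns with at most e mismatches, splitting it into e + 1 non-overlapping seeds guarantees that at least one seed matches the reference exactly, so looking up every seed and then verifying each candidate finds all hits. The names `build_kmer_index`, `all_mapping_locations`, `k`, and `max_mismatches` are hypothetical, and the sketch assumes substitution-only alignment (no insertions or deletions).

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every length-k substring of the reference to all of its start positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def all_mapping_locations(read: str, reference: str, index: dict[str, list[int]],
                          k: int, max_mismatches: int) -> list[int]:
    """Return every start position where `read` aligns with <= max_mismatches
    substitutions (assumption: no indels)."""
    n_seeds = max_mismatches + 1
    if len(read) < n_seeds * k:
        raise ValueError("read too short for e + 1 disjoint seeds of length k")
    candidates = set()
    # Pigeonhole filter: with at most e mismatches, one of e + 1 disjoint seeds
    # must occur exactly in the reference at the true location.
    for s in range(n_seeds):
        offset = s * k
        seed = read[offset:offset + k]
        for pos in index.get(seed, ()):
            start = pos - offset
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)
    # Verification: keep only candidates that truly satisfy the mismatch bound.
    return sorted(start for start in candidates
                  if hamming(read, reference[start:start + len(read)]) <= max_mismatches)


if __name__ == "__main__":
    ref = "AAACGTACGTCCCACGTACGTGGG"  # toy reference with a duplicated segment
    idx = build_kmer_index(ref, k=4)
    print(all_mapping_locations("ACGTACGT", ref, idx, k=4, max_mismatches=1))  # [2, 13]
```

Because the read falls in a duplicated segment, both copies are reported; a best-hit mapper would return only one of them, which is the kind of loss in repeat regions the abstract describes.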
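Aim (2) resolves ambiguously mapped reads by pooling information from the whole read collection. A standard way to express that idea, sketched here as an assumption rather than the project's published model, is a mixture model fitted by expectation-maximization (EM): each candidate location l has an unknown abundance π_l, a read that maps to several locations is split among them in proportion to their current π values (E-step), and π is re-estimated from those fractional assignments (M-step). Uniquely mapped reads therefore pull ambiguous reads toward well-supported locations. The function name `resolve_multireads`, its input format, and the location labels are hypothetical.

```python
def resolve_multireads(candidates, n_iter=200, tol=1e-10):
    """Fit a simple mixture over mapping locations by EM.

    candidates[i] lists every location read i maps to (hypothetical input format).
    Returns (abundance, weights): abundance[loc] is the estimated fraction of reads
    originating at loc; weights[i][loc] is the posterior probability that read i
    came from loc (computed in the last E-step).
    """
    cand_sets = [set(c) for c in candidates]
    locations = sorted(set().union(*cand_sets))
    n_reads = len(cand_sets)
    # Start from a uniform abundance over all candidate locations.
    abundance = {loc: 1.0 / len(locations) for loc in locations}
    weights = []
    for _ in range(n_iter):
        # E-step: split each read among its candidate locations in proportion
        # to the current abundance estimates.
        weights = []
        for cand in cand_sets:
            total = sum(abundance[loc] for loc in cand)
            weights.append({loc: abundance[loc] / total for loc in cand})
        # M-step: new abundance = average posterior mass assigned to each location.
        new_abundance = dict.fromkeys(locations, 0.0)
        for w in weights:
            for loc, p in w.items():
                new_abundance[loc] += p / n_reads
        shift = max(abs(new_abundance[loc] - abundance[loc]) for loc in locations)
        abundance = new_abundance
        if shift < tol:
            break
    return abundance, weights


if __name__ == "__main__":
    # 8 reads unique to A, 2 unique to B, 10 reads that map equally well to both.
    reads = [["A"]] * 8 + [["B"]] * 2 + [["A", "B"]] * 10
    pi, w = resolve_multireads(reads)
    print(round(pi["A"], 3), round(w[-1]["A"], 3))  # 0.8 0.8
```

In this toy case the fixed point satisfies π_A = (8 + 10·π_A) / 20, so π_A = 0.8 and each ambiguous read is assigned to A with posterior probability 0.8, rather than being discarded or split 50/50.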
These algorithms will be tested and refined using both publicly available data and data from established wet-lab collaborators. In addition to discovering new genomic features located within repeats, duplications, or other previously inaccessible regions, this work will provide the NGS community with (a) a faster and more accurate tool for mapping short sequence reads, (b) a general methodology for expanding the genomic accessibility of NGS, (c) a versatile, modular, open-source toolbox of algorithms for NGS data analysis, and (d) a comprehensive analysis of protein-DNA interactions in repeat regions across all publicly available ChIP-seq datasets. This work is a close collaboration between computer scientists and wet-lab biologists who are developing NGS assays to study biomedical problems. In particular, we will collaborate with Timothy Osborne of the Sanford-Burnham Medical Research Institute to study regulators involved in cholesterol and fatty acid metabolism, with Kyoko Yokomori of UC Irvine to study cohesin, Nipbl, and their roles in Cornelia de Lange syndrome, and with Ken Cho of UC Irvine to study the roles of FoxH1 and Schnurri in development and growth control.
Project outputs
Journal articles (4)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Improving read mapping using additional prefix grams.
- DOI: 10.1186/1471-2105-15-42
- Published: 2014-02-05
- Journal:
- Impact factor: 3
- Authors: Kim J; Li C; Xie X
- Corresponding author: Xie X
MixClone: a mixture model for inferring tumor subclonal populations.
- DOI: 10.1186/1471-2164-16-s2-s1
- Published: 2015
- Journal:
- Impact factor: 4.4
- Authors: Li Y; Xie X
- Corresponding author: Xie X
A mixture model for expression deconvolution from RNA-seq in heterogeneous tissues.
- DOI: 10.1186/1471-2105-14-s5-s11
- Published: 2013
- Journal:
- Impact factor: 3
- Authors: Li Y; Xie X
- Corresponding author: Xie X
Other grants by Xiaohui Xie
Machine learning methods to increase genomic accessibility by next-gen sequencing
- Grant number: 8350385
- Fiscal year: 2012
- Amount: $221,300
- Project category:
Machine learning methods to increase genomic accessibility by next-gen sequencing
- Grant number: 8518436
- Fiscal year: 2012
- Amount: $221,300
- Project category:
Similar National Natural Science Foundation of China (NSFC) grants
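Mechanism by which electroacupuncture combined with bone marrow mesenchymal stem cell transplantation inhibits autophagy of anal sphincter satellite cells via the HGF/c-Met/mTOR signaling pathway
- Grant number: 82305230
- Year approved: 2023
- Amount: 300,000 CNY
- Project category: Young Scientists Fund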
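Functional reconstruction of a puborectalis-mimicking artificial anal sphincter system for anal incontinence
- Grant number: 62103263
- Year approved: 2021
- Amount: 240,000 CNY
- Project category: Young Scientists Fund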
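Functional reconstruction of a puborectalis-mimicking artificial anal sphincter system for anal incontinence
- Grant number:
- Year approved: 2021
- Amount: 300,000 CNY
- Project category: Young Scientists Fund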
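Mechanism by which the novel_circ_001042/miR-298-5p/Capn1 axis regulates mitochondrial energy metabolism in the development of congenital anorectal malformations
- Grant number:
- Year approved: 2021
- Amount: 550,000 CNY
- Project category: General Program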
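Precise measurement of anorectal defecation physiology based on electromagnetic tomography and evaluation of its clinical diagnostic value
- Grant number:
- Year approved: 2021
- Amount: 530,000 CNY
- Project category: General Program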
Similar overseas grants
Optimizing Age-based Anal Cancer Screening Among People Living with HIV using Decision Analytic Modeling
- Grant number: 9886218
- Fiscal year: 2019
- Amount: $221,300
- Project category:
Optimizing Age-based Anal Cancer Screening Among People Living with HIV using Decision Analytic Modeling
- Grant number: 10350629
- Fiscal year: 2019
- Amount: $221,300
- Project category: