
AF: Small: Redundancy exploiting algorithms for high throughput genomics

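Basic information

Award number: 1619081
Principal investigator: Qin Zhang
Amount: $400,000
Host institution: Indiana University
Host institution country: United States
Project type: Standard Grant
Fiscal year: 2016
Funding country: United States
Duration: 2016-08-01 to 2020-07-31
Project status: Completed

Source: https://www.nsf.gov/awardsearch/showAward?AWD_ID=1619081&HistoricalAwards=false
Keywords: AF Small Redundancy exploiting algorithms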

Project abstract

Determining the genomic makeup of individuals is crucial for understanding how certain genomic variants ultimately lead to disease (such as cancer). Determining the genomic makeup of agriculturally important plants, trees, farm animals, and wildlife helps improve agriculture, forestry, veterinary medicine, and environmental science. Since the introduction of "next generation sequencing technologies" in 2008, the cost of genome sequencing has dropped by a factor of 1000. As a result, genomic data is now generated at a rate that far outpaces the improvements in our computing and data storage capability. With the advent of these cheap, fast genome sequencing technologies, the scientific community has been able to launch mega-projects such as The Pan Cancer Analysis of Whole Genomes Project, which aims to determine the genome sequences of thousands of cancer patients. Our project aims to address the imminent data size challenges in these large-scale genomic studies through new genomic data compression methods that reduce the redundancy in how genomic sequences are represented. The source of this redundancy is the high similarity among the genome sequences of individual patients, as well as the high similarity between different regions within a single human genome. Since the main difficulty in extracting information from genome sequences is computational, reducing the computational resources needed to manage and analyze genomic data through these compression methods will help genomics improve human life and the environment. The project's impact on student and personnel training will come through two new graduate courses at Indiana University: a course on data management, access and processing for genomic data by PI Sahinalp, and a course on compressed algorithms with a focus on genomic data, emphasizing the effects of new big data paradigms on compression, by PI Ergun. Both courses will fit into the CS PhD program, as well as into the existing Bioinformatics and Data Science Master's programs; they are also intended to attract the more curious undergraduates.

The rapid advancement of nucleic acid sequencing technology has reshaped almost every field of life science, from agriculture to bioenergy, and from environmental science to biomedicine. Large-scale genome projects are producing petabyte-scale data, whether from thousands of patients or from mobile sensors collecting environmental samples. As the technology marches forward, most people who visit hospitals will eventually have their (possibly tissue-specific) genomes sequenced. Genomic data will be collected from thousands to millions of non-model organisms and their populations in order to assess the biodiversity within the corresponding ecosystems. Complex microbial communities will be sampled from thousands of geographic locations to study the influence of environmental conditions. Furthermore, these studies will involve continuous data collection efforts aimed at monitoring dynamic changes in biosystems through genome-wide or transcriptome-wide sequencing. As a result, genomic data will be generated at an unprecedented pace, necessitating novel algorithms that reduce the burden genomic sequence data places on computational, storage, and transmission systems. This project combines the unique strengths of the two investigators at Indiana University, bringing a principled, algorithmic approach to critical infrastructure problems in genomics. It will address the needs of the next stage of genomic data generation, driven by mega cancer projects, portable devices collecting environmental samples, and even smaller sensors to be embedded in the human body, through new compression tools and compressed data structures for communicating, storing, managing, and accessing large collections of (streaming) genome data. For this purpose, we will employ and expand the existing algorithmic repertoire of approximation algorithms, sublinear algorithms, lossless data compression, and I/O-efficient, memory-hierarchy-aware/oblivious, and compressed data structures.
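
The abstract attributes much of the redundancy to the high similarity among individuals' genomes. A minimal sketch of one standard way to exploit that similarity, reference-based compression, follows: a target sequence is stored as "copy" instructions pointing into a reference plus the literal bases that differ. This is an illustration under stated assumptions, not the project's method. The helper names `ref_compress` and `ref_decompress` are hypothetical. The sketch relies on the standard library's `difflib.SequenceMatcher`, which is far too slow for whole genomes; genome-scale tools would need specialized index structures.

```python
from difflib import SequenceMatcher
from typing import List, Tuple, Union

# An op is either ("copy", ref_start, length) or ("insert", literal_bases).
Op = Union[Tuple[str, int, int], Tuple[str, str]]


def ref_compress(reference: str, target: str) -> List[Op]:
    """Encode `target` as copies from `reference` plus literal insertions.

    Hypothetical illustrative helper. autojunk=False is important: with the
    default heuristic, every base of a long DNA string (4-letter alphabet)
    would count as "popular" junk and few matches would be found.
    """
    ops: List[Op] = []
    matcher = SequenceMatcher(None, reference, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Shared block: store only its position and length in the reference.
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            # Bases present only in the target are stored literally.
            ops.append(("insert", target[j1:j2]))
        # "delete": bases only in the reference are simply not copied.
    return ops


def ref_decompress(reference: str, ops: List[Op]) -> str:
    """Rebuild the target exactly (lossless) from the reference and the ops."""
    parts = []
    for op in ops:
        if op[0] == "copy":
            _, start, length = op
            parts.append(reference[start:start + length])
        else:
            parts.append(op[1])
    return "".join(parts)


reference = "ACGTACGTTTGACCA"
target = "ACGTACGATTGACCAG"  # one substitution and one trailing insertion
ops = ref_compress(reference, target)
print(ops)  # [('copy', 0, 7), ('insert', 'A'), ('copy', 8, 7), ('insert', 'G')]
assert ref_decompress(reference, ops) == target
```

For two nearly identical sequences the op list stays short (one copy per shared block), which is where the savings come from, and decoding recovers the target exactly.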

Project outcomes

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Qin Zhang

Receptor activity‐modifying protein 1 regulates the phenotypic expression of BMSCs via the Hippo/Yap pathway

DOI: 10.1002/jcp.28082
Published: 2019-08
Journal: J Cell Physiol
Impact factor: 0
Authors: Qin Zhang; Yanjun Guo; Hui Yu; Yufei Tang; Ying Yuan; Yixuan Jiang; Huilu Chen; Ping Gong; Lin Xiang
Corresponding author: Lin Xiang

The gut microbiota modulator berberine ameliorates collagen-induced arthritis in rats by facilitating the generation of butyrate and adjusting the intestinal hypoxia and nitrate supply

DOI: 10.1096/fj.201900425rr
Published: 2019
Journal: The FASEB Journal
Impact factor: 0
Authors: Mengfan Yue; Yu Tao; Yulai Fang; Xingpan Lian; Qin Zhang; Yufeng Xia; Zhifeng Wei; Yue Dai
Corresponding author: Yue Dai