AF: Small: Redundancy exploiting algorithms for high throughput genomics
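Basic information
- Award number: 1619081
- Principal investigator:
- Amount: $400,000
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2016
- Funding country: United States
- Period: 2016-08-01 to 2020-07-31
- Status: Completed
- Source:
- Keywords:
Project abstract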
Determining the genomic makeup of individuals is crucial for understanding how certain genomic variants ultimately lead to disease (such as cancer). Determining the genomic makeup of agriculturally important plants, trees, farm animals, and wildlife helps improve agriculture, forestry, veterinary medicine, and environmental science. Since the introduction of "next generation sequencing technologies" in 2008, the cost of genome sequencing has dropped by a factor of 1000. This has led to an increase in the speed at which genomic data is generated that far outpaces the improvements in our computing and data storage capability. With the advent of these cheap and fast genome sequencing technologies, the scientific community has been able to launch mega-projects such as The Pan Cancer Analysis of Whole Genomes Project, which aims to determine the genome sequences of thousands of cancer patients. Our project aims to address the imminent data size challenges in these large-scale genomic studies through new genomic data compression methods that reduce the redundancy in how genomic sequences are represented. The source of this redundancy is the high similarity among the genome sequences of individual patients, as well as the high similarity between regions within a single human genome. Since the main difficulty in extracting information from genome sequences is computational, reducing the computational resources needed to manage and analyze genomic data through these compression methods will help genomics improve human life and the environment. The impact of this project on student and personnel training will take the form of two new graduate courses at Indiana University: a course on data management, access, and processing for genomic data by PI Sahinalp, and a course on compressed algorithms with a focus on genomic data, emphasizing the effects of new big data paradigms on compression, by PI Ergun.
Both courses will fit into the CS PhD program, as well as into the existing Bioinformatics and Data Science Master's programs; they are also intended to attract the more curious undergraduates.

The rapid advancement of nucleic acid sequencing technology has reshaped almost every field of life science, from agriculture to bioenergy, and from environmental science to biomedicine. Large-scale genome projects are producing petabyte-scale data from thousands of patients or from mobile sensors collecting environmental samples. As the technology marches forward, most people who visit hospitals will eventually have their (possibly tissue-specific) genomes sequenced. Genomic data will be collected from thousands to millions of non-model organisms and their populations in order to assess the biodiversity within the corresponding ecosystem. Complex microbial communities will be sampled from thousands of geographic locations to study the influence of environmental conditions. Furthermore, these studies will involve continuous data collection efforts, for the purpose of monitoring the dynamic changes in biosystems through genome-wide or transcriptome-wide sequencing. As a result, genomic data generation will occur at an unprecedented pace, necessitating the development of novel algorithms to help reduce the burden of genomic sequence data on computational, storage, and transmission systems. This project combines the unique strengths of the two investigators at Indiana University, bringing a principled, algorithmic approach to critical infrastructure problems in genomics. The project will address the needs of the next stage of genomic data generation by mega cancer projects, portable devices collecting environmental samples, and even smaller sensors to be embedded in the human body, through the use of new compression tools and compressed data structures for communicating, storing, managing, and accessing large collections of (streaming) genome data.
For this purpose, we will employ and expand the existing algorithmic repertoire involving approximation algorithms, sublinear algorithms, lossless data compression, and I/O-efficient, memory-hierarchy-aware/oblivious, and compressed data structures.
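As a rough illustration of the redundancy the abstract describes (high similarity among individual genomes), the sketch below stores a sample sequence as a short list of differences from a reference. It is not the project's method: the function names, the toy sequences, and the restriction to equal-length sequences with substitutions only are all assumptions made for this example.

```python
def encode_against_reference(reference: str, sample: str) -> list[tuple[int, str]]:
    """Return (position, base) pairs where the sample differs from the reference.

    Hypothetical helper for illustration; handles substitutions only.
    """
    if len(reference) != len(sample):
        raise ValueError("this sketch only handles equal-length sequences")
    return [(i, b) for i, (a, b) in enumerate(zip(reference, sample)) if a != b]


def decode_against_reference(reference: str, diffs: list[tuple[int, str]]) -> str:
    """Rebuild the sample by applying the stored substitutions to the reference."""
    bases = list(reference)
    for position, base in diffs:
        bases[position] = base
    return "".join(bases)


# Toy data (made up): the sample differs from the reference at two positions.
reference = "ACGTACGTACGTACGT"
sample = "ACGTACCTACGTACGA"

diffs = encode_against_reference(reference, sample)
restored = decode_against_reference(reference, diffs)
print(diffs)               # the two stored substitutions
print(restored == sample)  # lossless round trip
```

When two sequences are highly similar, the difference list is much shorter than the sequence itself, which is the intuition behind reference-based compression. Practical tools must also handle insertions, deletions, and quality scores, which this sketch ignores.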
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Qin Zhang
Receptor activity‐modifying protein 1 regulates the phenotypic expression of BMSCs via the Hippo/Yap pathway
- DOI: 10.1002/jcp.28082
- Published: 2019-08
- Journal:
- Impact factor: 0
- Authors: Qin Zhang; Yanjun Guo; Hui Yu; Yufei Tang; Ying Yuan; Yixuan Jiang; Huilu Chen; Ping Gong; Lin Xiang
- Corresponding author: Lin Xiang
The gut microbiota modulator berberine ameliorates collagen-induced arthritis in rats by facilitating the generation of butyrate and adjusting the intestinal hypoxia and nitrate supply
- DOI: 10.1096/fj.201900425rr
- Published: 2019
- Journal:
- Impact factor: 0
- Authors: Mengfan Yue; Yu Tao; Yulai Fang; Xingpan Lian; Qin Zhang; Yufeng Xia; Zhifeng Wei; Yue Dai
- Corresponding author: Yue Dai
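Modeling and probabilistic reasoning methods of dynamic uncertain causality graphs for fault diagnosis of industrial systems
- DOI:
- Published: 2014
- Journal:
- Impact factor: 4.3
- Authors: Chunling Dong; Qin Zhang
- Corresponding author: Qin Zhang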
The lattice vibration and microwave dielectric properties of BaZnP2−xNbxO7 ceramics for microwave substrates
- DOI: 10.1111/jace.18695
- Published: 2022
- Journal:
- Impact factor: 3.9
- Authors: Fangyi Huang; Hua Su; Qin Zhang; Xiao-Hui Wu; Yulan Jing; Yuanxun Li; Xiaoli Tang
- Corresponding author: Xiaoli Tang
Surface Modification of Colloidal Silica Nanoparticles: Controlling the Size and Grafting Process
- DOI: 10.5012/bkcs.2013.34.9.2747
- Published: 2013-09
- Journal:
- Impact factor: 0
- Authors: Lijuan Long; Shuhao Qin; Jie Yu; Qin Zhang
- Corresponding author: Qin Zhang
Other grants held by Qin Zhang
Collaborative Research: AF: Small: Parallel Reinforcement Learning with Communication and Adaptivity Constraints
- Award number: 2006591
- Fiscal year: 2020
- Amount: $400,000
- Award type: Standard Grant
CAREER: Foundation of Communication-Efficient Distributed Computation and Monitoring
- Award number: 1844234
- Fiscal year: 2019
- Amount: $400,000
- Award type: Continuing Grant
BIGDATA: Collaborative Research: F: Efficient Distributed Computation of Large-Scale Graph Problems in Epidemiology and Contagion Dynamics
- Award number: 1633215
- Fiscal year: 2016
- Amount: $400,000
- Award type: Standard Grant
AF: Small: Efficient Algorithms for Querying Noisy Distributed/Streaming Datasets
- Award number: 1525024
- Fiscal year: 2015
- Amount: $400,000
- Award type: Standard Grant
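Similar grants from the National Natural Science Foundation of China (NSFC)
Forensic application of circadian-rhythmic small RNAs in estimating the time of bloodstain formation
- Approval number:
- Approval year: 2024
- Amount: CNY 0
- Project type: Provincial/municipal project
Mechanism by which tRNA-derived small RNA upregulates the YBX1/CCL5 pathway in bortezomib-induced chronic pain
- Approval number: n/a
- Approval year: 2022
- Amount: CNY 100,000
- Project type: Provincial/municipal project
Small RNA regulation of the type I-F CRISPR-Cas adaptive immune response and its molecular mechanism
- Approval number: 32000033
- Approval year: 2020
- Amount: CNY 240,000
- Project type: Young Scientists Fund
Mechanisms by which small RNAs regulate the biocontrol function of Bacillus amyloliquefaciens FZB42
- Approval number: 31972324
- Approval year: 2019
- Amount: CNY 580,000
- Project type: General Program
Mechanism by which Streptococcus mutans small RNAs link LuxS quorum sensing to biofilm formation
- Approval number: 81900988
- Approval year: 2019
- Amount: CNY 210,000
- Project type: Young Scientists Fund
Function and mechanism of key gut bacterial small RNAs in the onset and progression of Crohn's disease
- Approval number: 31870821
- Approval year: 2018
- Amount: CNY 560,000
- Project type: General Program
Elucidating the molecular mechanism of pigeon milk secretion using small RNA sequencing
- Approval number: 31802058
- Approval year: 2018
- Amount: CNY 260,000
- Project type: Young Scientists Fund
Pathogenic mechanism of rice grassy stunt virus regulated by small RNA-mediated DNA methylation
- Approval number: 31772128
- Approval year: 2017
- Amount: CNY 600,000
- Project type: General Program
Immunoregulatory mechanism of acupuncture treatment for Hashimoto's thyroiditis based on small RNA-seq
- Approval number: 81704176
- Approval year: 2017
- Amount: CNY 200,000
- Project type: Young Scientists Fund
Regulation of small RNA biosynthesis by rice OsSGS3 and OsHEN1 and its effect on disease resistance
- Approval number: 91640114
- Approval year: 2016
- Amount: CNY 850,000
- Project type: Major Research Plan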
Similar overseas grants
NSF-BSF: FET: Small: Redundancy for Storage in the Edge
- Award number: 2120262
- Fiscal year: 2021
- Amount: $400,000
- Award type: Standard Grant
CIF: Small: Collaborative Research: Error Correction with Natural Redundancy
- Award number: 1717884
- Fiscal year: 2017
- Amount: $400,000
- Award type: Standard Grant
CIF: Small: Collaborative Research: Error Correction with Natural Redundancy
- Award number: 1718886
- Fiscal year: 2017
- Amount: $400,000
- Award type: Standard Grant
CSR: EDS: Small: Energy-Aware Redundancy Management
- Award number: 1617551
- Fiscal year: 2016
- Amount: $400,000
- Award type: Standard Grant
NeTS: CSR: Small: Towards a Redundancy Aware Network Stack
- Award number: 1618321
- Fiscal year: 2016
- Amount: $400,000
- Award type: Standard Grant
CIF: Small: Approaching Capacity in High Throughput Communication Systems with Incremental Redundancy
- Award number: 1618272
- Fiscal year: 2016
- Amount: $400,000
- Award type: Standard Grant
SHF: Small: Collaborative Research: RUI: Fast and Precise Dynamic Race Detection: Eliminating State and Checking Redundancy
- Award number: 1421051
- Fiscal year: 2014
- Amount: $400,000
- Award type: Standard Grant
SHF: Small: Collaborative Research: Fast and Precise Dynamic Race Detection: Eliminating State and Checking Redundancy
- Award number: 1421016
- Fiscal year: 2014
- Amount: $400,000
- Award type: Standard Grant
SHF: Small: RESYST: Resilience via Synergistic Redundancy and Fault Tolerance for High-End Computing
- Award number: 1058779
- Fiscal year: 2010
- Amount: $400,000
- Award type: Standard Grant
SHF: Small: Minimal Multithreading - Exploiting Redundancy in Parallel Systems
- Award number: 1017578
- Fiscal year: 2010
- Amount: $400,000
- Award type: Standard Grant