High-performance Computing System for Bioinformatics

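Basic information

Grant number:
7595665
Principal investigator:
Huntington F Willard
Amount:
$461,400
Host institution:
DUKE UNIVERSITY
Host institution country:
United States
Project category:
Fiscal year:
2009
Funding country:
United States
Project period:
2009-06-01 to 2010-05-31
Project status:
Completed

Source:
https://reporter.nih.gov/project-details/7595665
Keywords: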
Accounting Algorithms Bioinformatics Biological Biological Phenomena Biomedical Research Cells Complication Computational Biology Computer Simulation Computer Systems Computer software Computers Custom Data Data Analyses Data Set Data Storage and Retrieval Development Disasters Equipment Evolution Face Gene Expression Genome Genomics Growth High Performance Computing Individual Investigation Methods Modeling Noise Nucleic Acid Regulatory Sequences Occupations Operative Surgical Procedures Organism Performance Population Recovery Regulator Genes Request for Proposals Research Research Personnel Resolution Resources Running Scientist Signal Transduction Simulate Software Tools Speed System Systems Biology The Sun Time Tissues Work cell type cluster computing computerized tools computing resources cost flexibility instrument mass spectrometer new technology next generation public health relevance response simulation tool translational medicine

Project abstract

DESCRIPTION (provided by applicant): The explosive growth of computational biology has made it difficult for research organizations to keep pace with users' demands for ever-increasing computational power. The complications that biologists face come from two developments. First, new technologies generate huge amounts of data. This makes it possible for biological investigations to broaden their scope to whole-genome, cellular, and even organism levels, but at the cost of overtaxing existing methods and resources for data analysis. Second, algorithms and methods of analysis have become more computationally intensive, in part as a response to the opportunities that data richness has brought about and in part to manage the unfortunate signal-to-noise ratio that seems implicit in genomic datasets. In addition, the emergence of "systems biology" has led to growing complication in computational work, since systems biology ultimately seeks to model biological phenomena in silico.

In effect, one of the major, and indeed the most flexible, instruments for genome scientists and systems biologists is the high performance computer, because it is an essential tool for making sense of the prodigious amounts of data already coming from high-throughput sequencers, gene expression microarray equipment, mass spectrometers, and the like. On high-performance cluster computers, many researchers are making use of basic "job-level parallelism", by which a single user may run multiple jobs (or independent sub-parts of jobs) on many hundreds of computers at once. Often, this takes the form of computational "parameter space studies" in which the same application is run on tens, hundreds, or thousands of different sets of inputs. Simulating the evolution of regulatory regions, for example, requires multiple runs in which the size and number of short regulatory motifs are tuned. The prediction of gene regulatory networks requires multiple simulations in which different cell types and different tissue regions are modified. Simulations of gene expression dynamics in populations of cells must also be run multiple times in order to account for "cellular noise" and get a comprehensive picture of the phenomena. This need for repeated computations makes cluster computing an attractive approach for these problems.

Our proposal requests 94 power-efficient compute servers and about 8 terabytes of usable high-speed data storage with matched disaster recovery storage. This equipment will be put into operation using Sun Grid Engine, a software application that coordinates computational resources so that individual machines function as one clustered computational instrument. Bioinformatic software tools, as well as custom-made applications, are available for researchers to use on the equipment.

PUBLIC HEALTH RELEVANCE: Next-generation instruments have made acquiring genomic data inexpensive and ever more efficient, and new technologies promise to add greatly to the resolution and richness of data used for biomedical research and for translational medicine. This torrent of data needs equally powerful and flexible tools for analysis and information creation, in effect matching high-throughput data producers with high performance computational tools for analysis. We propose the creation of a well-integrated computational system that matches in compute power the prodigious data flows from instruments producing genomic data.
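
The "parameter space study" pattern described in the abstract can be illustrated with a minimal sketch, assuming the work is laid out as a Sun Grid Engine array job in which each task reads its index from the `SGE_TASK_ID` environment variable. The parameter grid, the replicate count, and the `run_simulation` placeholder are hypothetical and are not taken from the proposal; they only show how one script can serve many independent jobs.

```python
"""Sketch of a parameter-space study laid out as a Sun Grid Engine array job."""
import itertools
import os
import random

# Hypothetical parameter grid; the values are illustrative, not from the proposal.
MOTIF_SIZES = [6, 8, 10]      # length of each short regulatory motif
MOTIF_COUNTS = [1, 2, 4, 8]   # number of motifs per regulatory region
REPLICATES = 5                # repeated runs to sample stochastic "noise"


def build_task_table():
    """Enumerate every (motif_size, motif_count, replicate) combination."""
    return list(itertools.product(MOTIF_SIZES, MOTIF_COUNTS, range(REPLICATES)))


def params_for_task(task_id):
    """Map a 1-based array task ID to exactly one parameter set."""
    table = build_task_table()
    if not 1 <= task_id <= len(table):
        raise ValueError(f"task_id must be in 1..{len(table)}, got {task_id}")
    return table[task_id - 1]


def run_simulation(motif_size, motif_count, replicate):
    """Placeholder for a real simulation: returns a seeded pseudo-random value."""
    # A distinct, reproducible seed per task keeps replicates independent.
    seed = motif_size * 10_000 + motif_count * 100 + replicate
    rng = random.Random(seed)
    return rng.gauss(motif_size * motif_count, 1.0)


def current_task_id():
    """Read the array task index; fall back to 1 outside an array job."""
    raw = os.environ.get("SGE_TASK_ID", "")
    return int(raw) if raw.isdigit() else 1


if __name__ == "__main__":
    size, count, rep = params_for_task(current_task_id())
    result = run_simulation(size, count, rep)
    print(f"motif_size={size} motif_count={count} replicate={rep} result={result:.3f}")
```

Under these assumptions the grid holds 3 x 4 x 5 = 60 tasks, so the script would be submitted once as an array job covering task IDs 1 to 60 (for example with `qsub -t 1-60` and a site-specific wrapper script), and the scheduler would spread the independent tasks across whatever nodes are free.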
