A High-memory Supercomputer for Proteomics, Text Mining and Microbiome Research
Basic Information
- Grant number: 8334437
- Principal investigator:
- Amount: $1.9 million
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2013
- Funding country: United States
- Project period: 2013-04-22 to 2015-04-21
- Project status: Completed
- Source:
- Keywords: Accounting; Area; Biological Sciences; Biomedical Research; Biotechnology; Client; Collaborations; Communities; DNA Sequencing Facility; Data; Databases; Faculty; Funding; Genome; Grant; Growth; High Performance Computing; Housing; Internet; Laboratories; Memory; Occupations; Performance; Proteomics; Research; Research Infrastructure; Resources; Rest; Running; Software Tools; Supercomputing; System; Technology; Time; Training; United States National Institutes of Health; base; end of life; innovation; instrument; knowledge base; meetings; microbiome; outreach; simulation; supercomputer; text searching
Project Abstract
DESCRIPTION (provided by applicant): We request funds to purchase an integrated supercomputer to unite 5 highly productive and collaborative laboratories with complementary expertise in the microbiome, proteomics, text mining, and supercomputing, and to extend these capabilities to the broader NIH-funded biomedical research community via cloud and web applications. The critical shared need not met by other systems on campus, unavailable in commercial clouds, and oversubscribed at national labs, is for a system that can run jobs that require high memory (8-32 GB/core) and long duration (>2 weeks wall-time), and is optimized for high-IO tasks that saturate network or storage on other systems. The system will consist of 128 servers, each using 2x8-core 2.93GHz Intel Sandybridge CPUs. 20 large-memory nodes will each have 512GB of RAM (32GB/core), and 100 compute nodes will each have 128GB of RAM (8GB/core). These 120 nodes will each use two 10Gbps Ethernet ports bonded together for a 20Gbps/node (2.5GB/s) connection to the rest of the system, and each node will have 2.4TB raw high-performance local storage. The total aggregate performance of these local disks is over 36GB/s sustained (>300MB/s per node). The remaining 8 nodes will be used for administration, support for advanced software tools and infrastructure, and user interaction. A central high-performance Lustre parallel file system will provide 1.15PB of usable scratch space and sustain 36GB/s to the 128 clients. An archival system of 4 drives/300 tapes will sustain >1GB/s aggregate (accounting for compression), provide 450TB of raw capacity, store ~4.5 PB of user data, and scale to 5x this size. The system, valued at $4.5 million but quoted at $2 million by HP due to the strategic importance of this partnership, will be housed in a state-of-the-art machine room in the new Jennie Smoly Caruthers Biotechnology Building on the Boulder campus (opening Feb 2012), and connect to the rest of the campus at 40Gbps. The system will be a key enabling technology for key scientific areas where data growth is exponential and current systems on campus are end-of-life, solely dedicated to other purposes, or optimized for other tasks. The major users will use the instrument largely for time-consuming one-time tasks such as parameter optimization for microbiome and genome assembly workflows, building knowledgebases, and performing simulations and database searches that will provide resources that are re-used by much broader user communities (hundreds of collaborators; thousands of end users) who lack supercomputing access. One key innovative aspect of this proposal is configuration of part of the system as an academic cloud, which will allow us to pilot workflows that can later be deployed by diverse users on commercial clouds (e.g. Amazon EC2) and academic clouds (e.g. Magellan and DIAG) once those clouds are upgraded. The system will also build a broad expertise base in high-performance computing in the life sciences through outreach to promising new faculty and trainees on NIH training grants, and collaborations with new users of the Sequencing Core. The proposed instrument will thus have a profound impact on NIH-funded research.
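The per-node figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function and variable names are hypothetical, decimal units are assumed (1 GB = 1000 MB, 8 bits per byte), and the inputs are the numbers stated in the abstract.

```python
# Cross-check of the hardware figures quoted in the abstract.
# All names are illustrative; decimal units are assumed throughout.

CORES_PER_NODE = 2 * 8  # two 8-core CPUs per server


def gb_per_core(ram_gb: float, cores: int = CORES_PER_NODE) -> float:
    """RAM per core in GB for a node with the given total RAM."""
    return ram_gb / cores


def gbps_to_gb_per_s(gbps: float) -> float:
    """Convert a link rate in gigabits/s to gigabytes/s (8 bits per byte)."""
    return gbps / 8


large_mem = gb_per_core(512)        # large-memory nodes: 512 GB RAM
compute = gb_per_core(128)          # compute nodes: 128 GB RAM
link = gbps_to_gb_per_s(2 * 10)     # two bonded 10 Gbps Ethernet ports
nodes_with_local_disk = 20 + 100    # large-memory plus compute nodes
# Aggregate local-disk throughput in GB/s at the stated 300 MB/s per-node floor.
aggregate_disk = nodes_with_local_disk * 300 / 1000

print(f"large-memory nodes: {large_mem} GB/core")
print(f"compute nodes:      {compute} GB/core")
print(f"bonded link:        {link} GB/s per node")
print(f"local disks:        {aggregate_disk} GB/s aggregate")
```

Under these assumptions the results agree with the abstract's 32GB/core, 8GB/core, 2.5GB/s per node, and 36GB/s aggregate.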
Project Outcomes
Journal articles (14)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Low-dimensional representation of genomic sequences
- DOI: 10.1007/s00285-019-01348-1
- Published: 2019-03
- Journal:
- Impact factor: 1.9
- Authors: Richard C. Tillquist; M. Lladser
- Corresponding authors: Richard C. Tillquist; M. Lladser
Physical determinants of bipolar mitotic spindle assembly and stability in fission yeast.
- DOI: 10.1126/sciadv.1601603
- Published: 2017-01
- Journal:
- Impact factor: 13.6
- Authors: Blackwell R; Edelmaier C; Sweezy-Schindler O; Lamson A; Gergely ZR; O'Toole E; Crapo A; Hough LE; McIntosh JR; Glaser MA; Betterton MD
- Corresponding author: Betterton MD
Alteration of the gut fecal microbiome in children living with HIV on antiretroviral therapy in Yaounde, Cameroon.
- DOI: 10.1038/s41598-021-87368-8
- Published: 2021-04-07
- Journal:
- Impact factor: 4.6
- Authors: Abange WB; Martin C; Nanfack AJ; Yatchou LG; Nusbacher N; Nguedia CA; Kamga HG; Fokam J; Kennedy SP; Ndjolo A; Lozupone C; Nkenfou CN
- Corresponding author: Nkenfou CN
A generative model for the behavior of RNA polymerase.
- DOI: 10.1093/bioinformatics/btw599
- Published: 2017-01-15
- Journal:
- Impact factor: 0
- Authors: Azofeifa JG; Dowell RD
- Corresponding author: Dowell RD
RNA Pol II transcription model and interpretation of GRO-seq data.
- DOI: 10.1007/s00285-016-1014-4
- Published: 2017
- Journal:
- Impact factor: 1.9
- Authors: Lladser, Manuel E; Azofeifa, Joseph G; Allen, Mary A; Dowell, Robin D
- Corresponding author: Dowell, Robin D
Other grants by NATALIE G. AHN
Predoctoral Training Program in Signaling and Cellular Regulation
- Grant number: 10442543
- Fiscal year: 2021
- Funding amount: $1.9 million
- Project category:
Predoctoral Training Program in Signaling and Cellular Regulation
- Grant number: 10270785
- Fiscal year: 2021
- Funding amount: $1.9 million
- Project category:
Predoctoral Training Program in Signaling and Cellular Regulation
- Grant number: 10612084
- Fiscal year: 2021
- Funding amount: $1.9 million
- Project category:
Predoctoral Training Program in Signaling and Cellular Regulation INCLUDE Down Syndrome Supplement
- Grant number: 10851494
- Fiscal year: 2021
- Funding amount: $1.9 million
- Project category:
Molecular and Cellular Dynamics in Mammalian Signal Transduction
- Grant number: 10357871
- Fiscal year: 2020
- Funding amount: $1.9 million
- Project category:
Molecular and Cellular Dynamics in Mammalian Signal Transduction
- Grant number: 10571691
- Fiscal year: 2020
- Funding amount: $1.9 million
- Project category:
Molecular and Cellular Dynamics in Mammalian Signal Transduction
- Grant number: 10799380
- Fiscal year: 2020
- Funding amount: $1.9 million
- Project category:
Technologies to Define and Map Novel Interorganelle Macromolecular Interactions
- Grant number: 8488980
- Fiscal year: 2013
- Funding amount: $1.9 million
- Project category:
Technologies to Define and Map Novel Interorganelle Macromolecular Interactions
- Grant number: 9059730
- Fiscal year: 2013
- Funding amount: $1.9 million
- Project category:
Technologies to Define and Map Novel Interorganelle Macromolecular Interactions
- Grant number: 8683197
- Fiscal year: 2013
- Funding amount: $1.9 million
- Project category:
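Similar NSFC (National Natural Science Foundation of China) grants
Impact of regional healthcare integration on rational drug use in primary care institutions and optimization strategies, based on diffusion of innovations theory
- Grant number: 72304011
- Approval year: 2023
- Funding amount: CNY 200,000
- Project category: Young Scientists Fund
Region-of-interest segmentation in brain imaging under limited supervision and its applications
- Grant number: 62376123
- Approval year: 2023
- Funding amount: CNY 500,000
- Project category: General Program
Molecular mechanism by which chitosan-gallic acid "covalent linkage" acts together with pyrophosphate "regional protection" to regulate the gel properties of myofibrillar protein
- Grant number: 32302110
- Approval year: 2023
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund
Efficient numerical methods for two-domain coupled natural convection models
- Grant number: 12361077
- Approval year: 2023
- Funding amount: CNY 280,000
- Project category: Regional Science Fund
Ensemble quantification and intelligent analysis of the dynamic process of rainstorm waterlogging in typical small and medium-sized urban areas
- Grant number: 52379008
- Approval year: 2023
- Funding amount: CNY 500,000
- Project category: General Program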
Similar overseas grants
Improving Prognostication for Traumatic Brain Injury
- Grant number: 10643695
- Fiscal year: 2023
- Funding amount: $1.9 million
- Project category:
Characterization of aneuploidy, cell fate and mosaicism in early development
- Grant number: 10877239
- Fiscal year: 2023
- Funding amount: $1.9 million
- Project category:
Developing user-centric training in rigorous research: post-selection inference, publication bias, and critical evaluation of statistical claims.
- Grant number: 10721491
- Fiscal year: 2023
- Funding amount: $1.9 million
- Project category:
Non-invasive biometric screening for cerebrovascular disorders in persons with Down syndrome.
- Grant number: 10816240
- Fiscal year: 2023
- Funding amount: $1.9 million
- Project category:
Mentoring the next generation of substance use, HIV, and epigenetic researchers in sexual and gender minority health
- Grant number: 10699933
- Fiscal year: 2023
- Funding amount: $1.9 million
- Project category: