III: Medium: Mining petabytes of data using cloud computing and a massively parallel cyberinstrument
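Basic information
- Award number: 1302231
- Principal investigator:
- Amount: $1 million
- Host institution:
- Host institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2013
- Funding country: United States
- Period: 2013-09-01 to 2019-08-31
- Project status: Completed
- Source:
- Keywords:
Project abstract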
There is a growing need for effective approaches to mining very large (i.e., petabyte-scale) data sets in many areas of science, engineering, and business. The project aims to design, analyze, and implement a number of fundamental matrix-mining and graph-mining operations that scale to petabyte-sized inputs. Such efforts are intended to sustain the rapid growth in analyzing, visualizing, and extracting information from massive matrices and graphs. The project leverages Rensselaer's unique computing platform: a massively parallel machine (a Blue Gene/Q) with access to approximately 1.2 petabytes of storage, together with a data-staging layer, named the RAM Storage Accelerator (RSA), with 512 computational nodes and a total of 8 TB of fast RAM. The platform can be configured so that the computational nodes at the RSA level pre-process data from secondary storage in a cloud-like fashion. The project aims to design and analyze approximation algorithms for matrix and graph mining tasks that follow an iterative, two-step approach. Given petabyte-scale data, the first step uses computationally inexpensive methods, with the RSA layer acting as a "cloud", to obtain compact data sketches that reduce the data from the petabyte scale to the terabyte scale. The second step processes the resulting sketches using computationally demanding methods on the Blue Gene/Q. The process is then iterated, using the approximate solutions to improve the quality of the sketches and the approximation guarantees. The research team expects to release software and libraries for matrix and graph mining algorithms that implement these two-phase approaches for PB-scale matrices and graphs. The resulting tools will be applied to the analysis of petabytes of data from computer simulations of the dynamics of biomolecular systems.
The investigators plan to involve students and researchers from other institutions in the design, analysis, and development of the proposed methods through an internship program. The project also offers increased opportunities for research-based training in Data Analytics and High Performance Computing to graduate and undergraduate students at RPI. The results of the research will be made available to the academic community through the project web site.
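The two-step pattern described in the abstract (a cheap pass that compresses the data into a sketch, then more expensive computation on the small sketch, repeated to refine the answer) can be illustrated on a toy least-squares problem. The following is only a minimal sketch under stated assumptions, not the project's software: it uses a Gaussian sketch and sketch-preconditioned iterative refinement, keeps the sketch fixed across iterations rather than improving it, and every function name and parameter (`gaussian_sketch`, `sketched_least_squares`, `m`, `iters`) is hypothetical.

```python
import random
from math import sqrt

def matvec(A, x):
    """Compute A @ x for a row-major list-of-lists matrix."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def rmatvec(A, r):
    """Compute A^T @ r in one pass over the rows of A."""
    out = [0.0] * len(A[0])
    for row, ri in zip(A, r):
        for j, a in enumerate(row):
            out[j] += a * ri
    return out

def gram(M):
    """Compute the small d x d matrix M^T @ M."""
    d = len(M[0])
    G = [[0.0] * d for _ in range(d)]
    for row in M:
        for i in range(d):
            for j in range(d):
                G[i][j] += row[i] * row[j]
    return G

def solve(G, rhs):
    """Solve G x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [G[i][:] + [rhs[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def gaussian_sketch(A, m, rng):
    """Return S @ A, where S is m x n with N(0, 1/m) entries, streaming over rows of A."""
    d = len(A[0])
    SA = [[0.0] * d for _ in range(m)]
    scale = 1.0 / sqrt(m)
    for row in A:
        for i in range(m):
            s = rng.gauss(0.0, 1.0) * scale
            for j in range(d):
                SA[i][j] += s * row[j]
    return SA

def sketched_least_squares(A, b, m, iters, seed=0):
    """Approximate argmin ||A x - b|| using a fixed sketch as a preconditioner."""
    rng = random.Random(seed)
    # Step 1: compress the n x d data into an m x d sketch (m much smaller than n).
    G = gram(gaussian_sketch(A, m, rng))
    x = [0.0] * len(A[0])
    for _ in range(iters):
        # Cheap pass over the full data: residual and gradient direction.
        residual = [bi - yi for bi, yi in zip(b, matvec(A, x))]
        # Step 2: the linear solve touches only the small sketched Gram matrix.
        dx = solve(G, rmatvec(A, residual))
        x = [xi + di for xi, di in zip(x, dx)]
    return x

# Toy data (made up for illustration): 1000 rows, 3 columns, small noise.
data_rng = random.Random(1)
A = [[data_rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(1000)]
b = [row[0] - 2.0 * row[1] + 0.5 * row[2] + 0.01 * data_rng.gauss(0.0, 1.0) for row in A]

x_hat = sketched_least_squares(A, b, m=200, iters=25)
x_ref = solve(gram(A), rmatvec(A, b))  # exact normal-equations solution for comparison
print("sketched:", x_hat)
print("exact:   ", x_ref)
```

In this toy setting the refined solution should agree closely with the exact least-squares solution because the sketched Gram matrix is a good approximation of the full one when `m` is well above the number of columns. At petabyte scale the same division of labor would presumably map the streaming passes to the data-staging layer and the solve to the parallel machine, but that mapping is an assumption made here for illustration.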
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Christopher Carothers
Huang Chin-Hao and David C. Kang, State Formation through Emulation: The East Asian Model
- DOI: 10.1007/s11366-022-09831-1
- Published: 2022-08-27
- Journal:
- Impact factor: 3.500
- Authors: Christopher Carothers
- Corresponding author: Christopher Carothers
The Rise and Fall of Anti-Corruption in North Korea
- DOI: 10.1017/jea.2021.38
- Published: 2022
- Journal:
- Impact factor: 1.3
- Authors: Christopher Carothers
- Corresponding author: Christopher Carothers
A randomized least squares solver for terabyte-sized dense overdetermined systems
- DOI: 10.1016/j.jocs.2016.09.007
- Published: 2019-09-01
- Journal:
- Impact factor:
- Authors: Chander Iyer; Haim Avron; Georgios Kollias; Yves Ineichen; Christopher Carothers; Petros Drineas
- Corresponding author: Petros Drineas
Other grants by Christopher Carothers
MRI: Acquisition of a Next Generation, Data-Centric Supercomputer
- Award number: 1828083
- Fiscal year: 2018
- Amount: $1 million
- Award type: Standard Grant
MRI: Acquisition of a Balanced Environment for Simulation
- Award number: 1126125
- Fiscal year: 2011
- Amount: $1 million
- Award type: Standard Grant
NeTS-NR ROSS.Net: A Platform for Integrated Large-Scale Network Design of Experiments and Simulation
- Award number: 0435259
- Fiscal year: 2005
- Amount: $1 million
- Award type: Continuing Grant
CAREER: Scalable, High Performance Network Simulations Using Reverse Computation
- Award number: 0133488
- Fiscal year: 2002
- Amount: $1 million
- Award type: Continuing Grant
Similar overseas grants
III: Medium: Collaborative Research: Mining and Leveraging Knowledge Hypercubes for Complex Applications
- Award number: 2141037
- Fiscal year: 2021
- Amount: $1 million
- Award type: Continuing Grant
III: Medium: Collaborative Research: Mining and Leveraging Knowledge Hypercubes for Complex Applications
- Award number: 1956017
- Fiscal year: 2020
- Amount: $1 million
- Award type: Continuing Grant
III: Medium: Collaborative Research: Mining and Leveraging Knowledge Hypercubes for Complex Applications
- Award number: 1955151
- Fiscal year: 2020
- Amount: $1 million
- Award type: Continuing Grant
III: Medium: Collaborative Research: Mining and Leveraging Knowledge Hypercubes for Complex Applications
- Award number: 1956151
- Fiscal year: 2020
- Amount: $1 million
- Award type: Continuing Grant
III: Medium: Collaborative Research: StructNet: Constructing and Mining Structure-Rich Information Networks for Scientific Research
- Award number: 2034562
- Fiscal year: 2019
- Amount: $1 million
- Award type: Continuing Grant
III: Medium: Collaborative Research: KMELIN: Knowledge Mining and Embedding Learning for Complex Dynamic Information Networks
- Award number: 1763620
- Fiscal year: 2018
- Amount: $1 million
- Award type: Continuing Grant
III: Medium: Collaborative Research: KMELIN: Knowledge Mining and Embedding Learning for Complex Dynamic Information Networks
- Award number: 1763452
- Fiscal year: 2018
- Amount: $1 million
- Award type: Continuing Grant
III: Medium: Collaborative Research: StructNet: Constructing and Mining Structure-Rich Information Networks for Scientific Research
- Award number: 1704001
- Fiscal year: 2017
- Amount: $1 million
- Award type: Continuing Grant
III: Medium: Collaborative Research: StructNet: Constructing and Mining Structure-Rich Information Networks for Scientific Research
- Award number: 1704532
- Fiscal year: 2017
- Amount: $1 million
- Award type: Continuing Grant
III: Medium: Collaborative Research: Robust Large-Scale Electronic Medical Record Data Mining Framework to Conduct Risk Stratification for Personalized Intervention
- Award number: 1836938
- Fiscal year: 2017
- Amount: $1 million
- Award type: Standard Grant