EAGER: Autonomous Data Partitioning Using Data Mining for High End Computing
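Basic information
- Award number: 0954310
- Principal investigator:
- Amount: $125,000
- Institution:
- Institution country: United States
- Award type: Standard Grant
- Fiscal year: 2009
- Funding country: United States
- Period: 2009-09-01 to 2016-08-31
- Status: Completed
- Source:
- Keywords:
Project abstract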
Query response time and system throughput are the key metrics of database and file access performance. As data volumes proliferate, efficient access methods and storage techniques have become increasingly critical to keeping query response time and system throughput at acceptable levels. A common way to reduce disk I/O, and thereby improve query response time, is database clustering: partitioning the database or file vertically (attribute clustering), horizontally (record clustering), or both. To exploit parallelism and improve system throughput, the resulting clusters can be placed on different nodes of a cluster machine.
This project develops a novel algorithm, AutoClust, for database/file clustering. AutoClust dynamically and automatically generates attribute and record clusters from closed itemsets mined from the attribute and record sets accessed by the queries running against the database/files. The algorithm can re-cluster the database/file so that the system continues to perform well as the data and/or query sets change. The project then develops new ways to implement AutoClust on the cluster computing paradigm, further reducing query response time and increasing system throughput through parallelism and data redundancy. The algorithms are prototyped on a Dell Linux cluster with 486 compute nodes at the University of Oklahoma. For broader impacts, performance studies use not only the TPC-H decision support benchmark but also real data, in database and file formats, collected from science and healthcare applications in collaboration with domain experts, including scientists at the Center for Analysis and Prediction of Storms (CAPS) at the University of Oklahoma.
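The abstract does not specify how AutoClust turns mined itemsets into clusters, so the following is only a minimal sketch under stated assumptions, not the project's algorithm. It assumes each query is reduced to the set of attributes it accesses, mines closed frequent itemsets by brute force (a real system would use a dedicated closed-itemset miner such as CHARM or CLOSET), and then applies a hypothetical greedy rule that keeps disjoint itemsets as vertical fragments. The function names `closed_itemsets` and `attribute_clusters`, the ranking rule, and the sample `schema` and `queries` are all invented for illustration.

```python
from itertools import combinations


def closed_itemsets(transactions, min_support):
    """Brute-force mining of closed frequent itemsets.

    transactions: list of frozensets, one per query, holding the attributes it accesses.
    Returns {itemset: support} for every itemset that occurs in at least
    min_support queries and has no proper superset with the same support.
    Exponential in the number of attributes; meant only for small examples.
    """
    items = sorted(set().union(*transactions))
    frequent = {}
    for k in range(1, len(items) + 1):
        found_any = False
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_support:
                frequent[candidate] = count
                found_any = True
        if not found_any:
            break  # anti-monotonicity: no frequent k-itemset means no larger ones
    # Closed: no frequent proper superset has the same support.
    return {
        s: c
        for s, c in frequent.items()
        if not any(s < other and c == oc for other, oc in frequent.items())
    }


def attribute_clusters(schema, transactions, min_support):
    """Greedy vertical partitioning driven by closed itemsets (hypothetical heuristic)."""
    closed = closed_itemsets(transactions, min_support)
    # Rank by how many attribute accesses an itemset covers (support * size),
    # then by size, then alphabetically so the output is deterministic.
    ranked = sorted(
        closed.items(),
        key=lambda kv: (-kv[1] * len(kv[0]), -len(kv[0]), sorted(kv[0])),
    )
    assigned = set()
    clusters = []
    for itemset, _support in ranked:
        if not itemset & assigned:  # keep fragments disjoint: each attribute stored once
            clusters.append(tuple(sorted(itemset)))
            assigned |= itemset
    # Attributes not covered by any chosen itemset each get their own fragment.
    clusters.extend((a,) for a in schema if a not in assigned)
    return clusters


# Hypothetical table and query workload, for illustration only.
schema = ["id", "name", "price", "qty", "date", "region"]
queries = [
    frozenset({"id", "name"}),
    frozenset({"id", "name", "price"}),
    frozenset({"price", "qty"}),
    frozenset({"price", "qty", "date"}),
    frozenset({"id", "name"}),
]
print(attribute_clusters(schema, queries, min_support=2))
```

With `min_support=2`, the closed itemsets are {id, name} (support 3), {price} (support 3) and {price, qty} (support 2). The greedy pass keeps {id, name} and {price, qty} and leaves `date` and `region` as single-attribute fragments. A practical vertical partitioner would also copy the primary key into every fragment so rows can be reassembled; record clustering could be sketched the same way by treating the record IDs a query touches as the items.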
The project also has an important educational impact: it trains the graduate and undergraduate students working on it in areas of critical national need, namely database and file management systems, and high-end computing and applications. The algorithm and prototype, the real datasets, and the performance evaluation results are publicly available at http://www.cs.ou.edu/~database/AutoClust.html.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Sudarshan Dhall
Other grants by Sudarshan Dhall
A Power-Aware Technique to Manage Real-Time Database Transactions in Mobile Ad-Hoc Networks
- Award number: 0312746
- Fiscal year: 2003
- Amount: $125,000
- Award type: Standard Grant
A Workshop on Parallel Processing Using the Heterogeneous Element Processor (HEP), March 20-21, 1985, at the University of Oklahoma, Norman, Oklahoma
- Award number: 8500481
- Fiscal year: 1985
- Amount: $125,000
- Award type: Standard Grant
Similar overseas grants
Collaborative Research: Data-Driven Microreaction Engineering by Autonomous Robotic Experimentation in Flow
- Award number: 2208489
- Fiscal year: 2023
- Amount: $125,000
- Award type: Standard Grant
International Collaboration on Mobility Digital Twin for Accelerating Data Driven Autonomous Driving Design Platform Synthesis
- Award number: 22KK0237
- Fiscal year: 2023
- Amount: $125,000
- Award type: Fund for the Promotion of Joint International Research (Fostering Joint International Research (A))
Autonomous Unmanned Aerial Vehicle data analysis: is it the key to the many:1 ratio, or are we missing a step?
- Award number: 2891512
- Fiscal year: 2023
- Amount: $125,000
- Award type: Studentship
Understanding drone sensor data for autonomous flight
- Award number: 10061081
- Fiscal year: 2023
- Amount: $125,000
- Award type: Collaborative R&D
Collaborative Research: Data-Driven Microreaction Engineering by Autonomous Robotic Experimentation in Flow
- Award number: 2208406
- Fiscal year: 2023
- Amount: $125,000
- Award type: Standard Grant
COM Sensor Data and Resource Management for Autonomous Airborne Platforms
- Award number: 2881339
- Fiscal year: 2023
- Amount: $125,000
- Award type: Studentship
Improved MRI guidance of pediatric catheterization via autonomous multi-beat data synthesis
- Award number: 10412491
- Fiscal year: 2022
- Amount: $125,000
- Award type:
Improved MRI guidance of pediatric catheterization via autonomous multi-beat data synthesis
- Award number: 10646226
- Fiscal year: 2022
- Amount: $125,000
- Award type:
Oshen - autonomous sailboats to collect ocean data
- Award number: 10047385
- Fiscal year: 2022
- Amount: $125,000
- Award type: Collaborative R&D
Haul Truck Production and Maintenance Data modelling of Traditional, Autonomous and Operator Assist Scenarios
- Award number: RGPIN-2018-05885
- Fiscal year: 2022
- Amount: $125,000
- Award type: Discovery Grants Program - Individual