SOFTWARE: Framework for Mining Large and Complex Scientific Datasets

Basic Information

Project Abstract

Numerical simulations are replacing traditional experiments as a way to gain insight into complex physical phenomena. Given recent advances in computer hardware and numerical methods, it is now possible to simulate physical phenomena at very fine temporal and spatial resolutions. As a result, the amount of data generated is overwhelming. Scientists are interested in analyzing and visualizing the data produced by such simulations to better understand the process that is being simulated. Analyzing such large-scale data is hard. Not only are the methods used computationally expensive, but current programming tools also make the analysis difficult to specify and modify. Thus, there is a dire need for a systematic approach, along with supporting algorithms and methodologies for flexible parallel implementations, to achieve scalable and interactive analysis of large scientific datasets. In this project, we propose the construction of such a scalable toolkit, namely the Computational Analysis Toolkit (CAT). This toolkit will exploit ongoing work in feature analysis, scalable data mining, and parallel programming environments. The crux of the approach is feature mining, a process whereby regions are delineated through various stages of detection, verification, de-noising, and tracking of points of interest. Additionally, we propose the use of some key data mining algorithms for achieving enhanced and robust implementations of feature-mining algorithms. It is our objective that CAT should not only allow for the detection of features but also provide a means to control the analysis in an interactive setting. For example, demographic and lifetime analysis of certain critical features, as determined by the user/scientist, may be an important way of understanding the underlying process being simulated. These critical features, once tagged via a suitable interface, can be profiled, and a concise representation of this profile can then be presented to the user as needed.
We believe that for long-term use of a tool for feature and data mining, it is important that a) the algorithms are parallelized on a variety of platforms, b) the parallel implementations are easy to maintain and modify, and c) APIs are available for users to rapidly create scalable implementations of new mining algorithms. We propose to achieve these goals by using and extending a parallelization framework developed locally. This framework, referred to as FRamework for Rapid Implementations of Datamining Engines (FREERIDE), offers high-level APIs and runtime techniques to enable parallelization of algorithms for data mining and related tasks. It allows parallelization on both distributed memory and shared memory configurations, and further supports efficient processing of disk-resident datasets. The proposal, besides providing a useful toolkit, is likely to engender the use of methodologies for large data exploration. Our efforts are likely to contribute to the literature on scalable data and feature mining algorithms and on feature profile summarization.
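
The feature-mining stages named in the abstract (detection, verification, de-noising, and tracking) can be illustrated with a minimal sketch. This is not the CAT or FREERIDE API; every function name, the threshold-based detector, the minimum-size filter standing in for verification and de-noising, and the overlap-based tracker are assumptions chosen only to make the stages concrete on small 2D grids.

```python
from collections import deque


def detect(grid, threshold):
    """Detection (assumed rule): cells whose value exceeds the threshold."""
    return {(r, c) for r, row in enumerate(grid)
            for c, value in enumerate(row) if value > threshold}


def delineate(points):
    """Group detected points into 4-connected regions with a flood fill."""
    remaining, regions = set(points), []
    while remaining:
        seed = remaining.pop()
        region, queue = {seed}, deque([seed])
        while queue:
            r, c = queue.popleft()
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    region.add(nb)
                    queue.append(nb)
        regions.append(frozenset(region))
    return regions


def denoise(regions, min_size):
    """Verification/de-noising stand-in: drop regions smaller than min_size."""
    return [region for region in regions if len(region) >= min_size]


def track(prev_regions, curr_regions):
    """Tracking: pair each current region with the previous region it overlaps
    most; the partner index is None when nothing overlaps."""
    links = []
    for i, cur in enumerate(curr_regions):
        best, best_overlap = None, 0
        for j, prev in enumerate(prev_regions):
            overlap = len(cur & prev)
            if overlap > best_overlap:
                best, best_overlap = j, overlap
        links.append((i, best))
    return links


def mine_features(timesteps, threshold, min_size):
    """Run all stages over a list of 2D grids, one grid per time step."""
    per_step = [denoise(delineate(detect(grid, threshold)), min_size)
                for grid in timesteps]
    links = [track(per_step[t - 1], per_step[t])
             for t in range(1, len(per_step))]
    return per_step, links
```

Calling `mine_features([grid_t0, grid_t1], threshold=5, min_size=2)` returns the surviving regions for each time step together with the links between consecutive steps, which is the kind of information a lifetime analysis of tagged features would build on.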

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other Grants Awarded to Raghu Machiraju

Collaborative Research: Autonomous Computing Materials
  • Award number:
    1940168
  • Fiscal year:
    2019
  • Funding amount:
    $373,000
  • Project type:
    Continuing Grant
Spokes: MEDIUM: MIDWEST: Collaborative: Community-Driven Data Engineering for Substance Abuse Prevention in the Rural Midwest
  • Award number:
    1761969
  • Fiscal year:
    2018
  • Funding amount:
    $373,000
  • Project type:
    Standard Grant
SCC-Planning: Using Innovations in Big Data and Technology to Address the High Rate of Infant Mortality in Greater Columbus Ohio
  • Award number:
    1737560
  • Fiscal year:
    2017
  • Funding amount:
    $373,000
  • Project type:
    Standard Grant
BCSP: ABI Innovation: Collaborative Research: Predicting changes in protein activity from changes in sequence by identifying the underlying Biophysical Conditional Random Field
  • Award number:
    1262469
  • Fiscal year:
    2014
  • Funding amount:
    $373,000
  • Project type:
    Standard Grant
G&V: Medium: Collaborative Research: Large Data Visualization Using An Interactive Machine Learning Framework
  • Award number:
    1065025
  • Fiscal year:
    2011
  • Funding amount:
    $373,000
  • Project type:
    Standard Grant
ITR/NGS: A Framework for Discovery, Exploration and Analysis of Evolutionary Simulation Data (DEAS)
  • Award number:
    0326386
  • Fiscal year:
    2003
  • Funding amount:
    $373,000
  • Project type:
    Continuing Grant
CAREER: On the Assessment of Volume Rendering Algorithms in Visual Computing
  • Award number:
    0196242
  • Fiscal year:
    2000
  • Funding amount:
    $373,000
  • Project type:
    Continuing Grant
CAREER: On the Assessment of Volume Rendering Algorithms in Visual Computing
  • Award number:
    9734483
  • Fiscal year:
    1998
  • Funding amount:
    $373,000
  • Project type:
    Continuing Grant

Similar Overseas Grants

BIGDATA: IA: Collaborative Research: Asynchronous Distributed Machine Learning Framework for Multi-Site Collaborative Brain Big Data Mining
  • Award number:
    2348159
  • Fiscal year:
    2023
  • Funding amount:
    $373,000
  • Project type:
    Standard Grant
CRII: SHF: A Parallel and Distributed Framework for Graph Mining on GPUs
  • Award number:
    2245792
  • Fiscal year:
    2023
  • Funding amount:
    $373,000
  • Project type:
    Standard Grant
SCH: New Advanced Machine Learning Framework for Mining Heterogeneous Ocular Data to Accelerate
  • Award number:
    10601180
  • Fiscal year:
    2022
  • Funding amount:
    $373,000
  • Project type:
Integrated Mining and Waste Management Optimization Framework for Sustainable Resource Development
  • Award number:
    RGPIN-2016-05707
  • Fiscal year:
    2022
  • Funding amount:
    $373,000
  • Project type:
    Discovery Grants Program - Individual
Towards an integrated, self-learning stochastic mining complex framework and new digital technologies for the sustainable development of mineral resources
  • Award number:
    RGPIN-2021-02777
  • Fiscal year:
    2022
  • Funding amount:
    $373,000
  • Project type:
    Discovery Grants Program - Individual
SCH: New Advanced Machine Learning Framework for Mining Heterogeneous Ocular Data to Accelerate
  • Award number:
    10665804
  • Fiscal year:
    2022
  • Funding amount:
    $373,000
  • Project type:
Integrated Mining and Waste Management Optimization Framework for Sustainable Resource Development
  • Award number:
    RGPIN-2016-05707
  • Fiscal year:
    2021
  • Funding amount:
    $373,000
  • Project type:
    Discovery Grants Program - Individual
Integrated Mining and Waste Management Optimization Framework for Sustainable Resource Development
  • Award number:
    RGPIN-2016-05707
  • Fiscal year:
    2020
  • Funding amount:
    $373,000
  • Project type:
    Discovery Grants Program - Individual
A new framework, 'paleoecotoxicology', to assess the legacy effects of mining on aquatic environments
  • Award number:
    503817-2017
  • Fiscal year:
    2019
  • Funding amount:
    $373,000
  • Project type:
    Alexander Graham Bell Canada Graduate Scholarships - Doctoral
An Integrative Computational Framework for DNA Hydroxymethylation Data Mining and Interpretation
  • Award number:
    10210409
  • Fiscal year:
    2019
  • Funding amount:
    $373,000
  • Project type: