Online Learning in Big-Data Stream Mining

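Basic Information

  • Grant number:
    1407712
  • Principal investigator:
  • Amount:
    $450,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2014
  • Funding country:
    United States
  • Period:
    2014-09-01 to 2018-08-31
  • Status:
    Completed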

Project Abstract

The world is increasingly information-driven. Vast amounts of data are being produced by diverse sources and in diverse formats, including sensor readings, physiological measurements, documents, emails, transactions, tweets, and audio or video files. Many businesses and government institutions are also embracing automation and relying on a variety of sensors and infrastructure to collect, store, and analyze data on a continuous basis. It is becoming critical to endow assessment systems with the ability to process streaming information from sensors in real time in order to better manage physical systems, derive informed decisions, tweak production processes, and optimize logistics choices. Data stream mining refers to the broad class of techniques that can be used in sense-and-respond systems that continuously receive data streams from multiple sources and employ analytics aimed at detecting and predicting actionable information. Such techniques are useful in many domains, including medical and health informatics; intelligent connected network systems for transportation, security, and energy; and social, multimedia, and business intelligence. The aim of this proposal is to develop methods and techniques for real-time stream mining that extract information from large data streams. The framework accomplishes this objective by networking multiple learners to pose and answer queries at different levels and in real time. A central aspect of the framework is that it accommodates distributed data sources and distributed processing of the data. Besides the aforementioned applications, the proposed research is expected to have an impact on user interfaces, human-computer interaction, and machine-to-machine communication and services.
This research focuses on developing a framework for distributed knowledge extraction from high-volume data streams using a network of adaptive learners/classifiers that is deployed over a distributed computing infrastructure. The proposed paradigm differs from existing mining and search solutions, which are mainly query-driven. Instead, the proposed framework is data- and concept-driven and can catalyze a shift in the design and implementation of networked stream mining applications by allowing continuous learning and dynamic adaptation of networked learners in response to latency, resource, and data characteristics, and by allowing various learners to proactively reason about and shape their interactions with other learners based on their capabilities and knowledge. In this regard, the approach to stream mining developed in this proposal addresses several unique technical challenges: (a) the need to develop decentralized approaches for stream mining where learners make decisions based on local interactions with their neighbors; this step involves formally defining local objectives and metrics and the associated inter-node message exchanges that enable the decomposition of the application into a set of autonomously operating nodes, while ensuring global performance; (b) the need to develop algorithms that are able to cope with asynchronous events, including different data rates at the nodes, link failures, and dynamic topology configurations; (c) the need to develop distributed solutions that can cope effectively with system overload due to large data volumes and limited system resources (including CPU, memory, and I/O bandwidth); each learner usually incurs a large computational cost, and solutions need to be sensitive to the rates at which individual learners can handle data; and (d) the need to develop adaptive stream-mining systems that track concept drift, especially since data characteristics evolve over time due to many factors, including congestion at shared processing nodes and communication delays between processing nodes.
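The abstract names its technical challenges but does not specify an algorithm. As a rough illustration of challenges (a) and (d) only, the following minimal Python sketch assumes a small network of online logistic-regression learners in which each node takes one gradient step on its own streaming sample and then averages its weights with its immediate neighbors. The class `Learner`, the functions `combine` and `run_demo`, the three-node line topology, the step size, and the synthetic drifting data are all hypothetical choices made for this example; they are not taken from the project.

```python
import math
import random


def sigmoid(z):
    # Clamp the argument so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


class Learner:
    """One node: an online logistic-regression classifier (hypothetical)."""

    def __init__(self, dim, step=0.1):
        self.w = [0.0] * dim
        self.step = step  # constant step size keeps the learner adaptive

    def predict(self, x):
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)))

    def adapt(self, x, y):
        # One stochastic-gradient step on the logistic loss for sample (x, y).
        err = y - self.predict(x)
        self.w = [wi + self.step * err * xi for wi, xi in zip(self.w, x)]


def combine(learners, neighbors):
    # Local interaction: each node averages weights over itself and its neighbors.
    new_weights = []
    for k, nbrs in enumerate(neighbors):
        group = [learners[k].w] + [learners[j].w for j in nbrs]
        new_weights.append([sum(col) / len(group) for col in zip(*group)])
    for learner, w in zip(learners, new_weights):
        learner.w = w


def run_demo(steps=2000, seed=0):
    rng = random.Random(seed)
    neighbors = [[1], [0, 2], [1]]  # assumed line topology: 0 - 1 - 2
    learners = [Learner(dim=2) for _ in neighbors]
    true_w = [2.0, -1.0]
    correct = total = 0
    for t in range(steps):
        if t == steps // 2:
            true_w = [-2.0, 1.0]  # abrupt concept drift (synthetic)
        for node in learners:
            # Each node sees its own sample from the current concept.
            x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
            y = 1.0 if sum(a * b for a, b in zip(true_w, x)) > 0 else 0.0
            if t >= steps - 200:
                # Test-then-train accuracy over the final 200 rounds.
                correct += int((node.predict(x) > 0.5) == (y == 1.0))
                total += 1
            node.adapt(x, y)
        combine(learners, neighbors)
    return correct / total, learners
```

Calling `accuracy, learners = run_demo()` returns the test-then-train accuracy over the last 200 rounds, measured after the synthetic drift, together with the final learners. No node ever sees another node's data; only weight vectors are exchanged with direct neighbors, which is one simple way to read "local interactions" in (a). The constant step size is what lets the sketch recover after the drift in (d). Asynchrony, link failures, and overload, as in (b) and (c), are not modeled.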

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Ali Sayed

Psychology of Craving
  • DOI:
  • Published:
    2014
  • Journal:
  • Impact factor:
    0
  • Authors:
    S. Sharma; B. Nepal; C. S. Moon; Anthony Chabenne; A. Khogali; Co Ojo; Esther Hong; Rochelle Gaudet; Ali Sayed; Amanda Jacob; Mujtaba Murtuza; Michelle L. Firlit
  • Corresponding author:
    Michelle L. Firlit

Other grants by Ali Sayed

CIF: Small: Inference over Asymmetric Network and Data Structures
  • Grant number:
    1524250
  • Fiscal year:
    2015
  • Amount:
    $450,000
  • Project type:
    Standard Grant
CIF: Large: Collaborative Research: Cooperation and Learning Over Cognitive Networks
  • Grant number:
    1011918
  • Fiscal year:
    2010
  • Amount:
    $450,000
  • Project type:
    Continuing Grant
NSF Workshop on Distributed Processing over Cognitive Networks
  • Grant number:
    0956382
  • Fiscal year:
    2009
  • Amount:
    $450,000
  • Project type:
    Standard Grant
CIF: SMALL: Explorations and Insights into Adaptive Networks, Animal Flocking Behavior, and Swarm Intelligence
  • Grant number:
    0942936
  • Fiscal year:
    2009
  • Amount:
    $450,000
  • Project type:
    Standard Grant
Adaptive Sampling Strategies with Application to Water Resource Management
  • Grant number:
    0725441
  • Fiscal year:
    2007
  • Amount:
    $450,000
  • Project type:
    Standard Grant
Cyber Systems: Adaptive Distributed Systems Based on Cooperative and Combination Strategies
  • Grant number:
    0601266
  • Fiscal year:
    2006
  • Amount:
    $450,000
  • Project type:
    Continuing Grant
Advanced Signal Processing for Ultra-Wide-Band (UWB) Communications in Wireless Networks
  • Grant number:
    0401188
  • Fiscal year:
    2004
  • Amount:
    $450,000
  • Project type:
    Standard Grant
High-Performance Adaptive Receivers for Broadband Multi-User Communications
  • Grant number:
    0208573
  • Fiscal year:
    2002
  • Amount:
    $450,000
  • Project type:
    Continuing Grant
Estimation and Control with Bounded Data Uncertainties
  • Grant number:
    9820765
  • Fiscal year:
    1999
  • Amount:
    $450,000
  • Project type:
    Continuing Grant
Fast Reliable Algorithms for Structured Computations
  • Grant number:
    9732376
  • Fiscal year:
    1998
  • Amount:
    $450,000
  • Project type:
    Standard Grant

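Similar NSFC grants

Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
  • Grant number:
  • Approval year:
    2024
  • Amount:
  • Project type:
    Collaborative Innovation Research Team
Understanding structural evolution of galaxies with machine learning
  • Grant number:
    n/a
  • Approval year:
    2022
  • Amount:
    CNY 100,000
  • Project type:
    Provincial/Municipal Project
Constrained Dynamic Multi-Objective Q-learning Evolutionary Allocation of Human-Machine Hybrid Crowdsensing Tasks for Coal Mine Safety
  • Grant number:
  • Approval year:
    2022
  • Amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund
Short-Time Online Q-learning Cooperative Control Mechanism for Intelligent Munition Formations Considering Leader Failure
  • Grant number:
    62003314
  • Approval year:
    2020
  • Amount:
    CNY 240,000
  • Project type:
    Young Scientists Fund
Research on E-learning Resource Recommendation Methods Integrating Contextual Tensor Factorization
  • Grant number:
    61902016
  • Approval year:
    2019
  • Amount:
    CNY 240,000
  • Project type:
    Young Scientists Fund
Research on Spiking-Transfer Learning Methods with Temporal Transfer Capability
  • Grant number:
    61806040
  • Approval year:
    2018
  • Amount:
    CNY 200,000
  • Project type:
    Young Scientists Fund
Research on Deep-learning-Based Dynamic Recognition Techniques for Glacier Monitoring in the Sanjiangyuan (Three-River Source) Region
  • Grant number:
    51769027
  • Approval year:
    2017
  • Amount:
    CNY 380,000
  • Project type:
    Regional Science Fund
Research on Spiking-Deep Learning Methods with Temporal Processing Capability
  • Grant number:
    61573081
  • Approval year:
    2015
  • Amount:
    CNY 640,000
  • Project type:
    General Program
Automatic Generation and Optimization of Large-Scale Personalized E-learning Process Models Based on Directed Hypergraphs
  • Grant number:
    61572533
  • Approval year:
    2015
  • Amount:
    CNY 660,000
  • Project type:
    General Program
Research on Learner Emotion Compensation Methods in E-Learning
  • Grant number:
    61402392
  • Approval year:
    2014
  • Amount:
    CNY 260,000
  • Project type:
    Young Scientists Fund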

Similar overseas grants

Improving NHS perimenopausal diagnosis and HRT prescription through AI, machine learning and big data
  • Grant number:
    10053966
  • Fiscal year:
    2023
  • Amount:
    $450,000
  • Project type:
    Collaborative R&D
BIGDATA: IA: Collaborative Research: Asynchronous Distributed Machine Learning Framework for Multi-Site Collaborative Brain Big Data Mining
  • Grant number:
    2348159
  • Fiscal year:
    2023
  • Amount:
    $450,000
  • Project type:
    Standard Grant
Big Data and Deep Learning for the Interictal-Ictal-Injury Continuum
  • Grant number:
    10761842
  • Fiscal year:
    2023
  • Amount:
    $450,000
  • Project type:
III: Small: A Big Data and Machine Learning Approach for Improving the Efficiency of Two-sided Online Labor Markets
  • Grant number:
    2311582
  • Fiscal year:
    2023
  • Amount:
    $450,000
  • Project type:
    Standard Grant
Excellence in Research: Harnessing Big Data and Domain Knowledge to Advance Deep Learning for Interpretable Cell Quantitation
  • Grant number:
    2302274
  • Fiscal year:
    2023
  • Amount:
    $450,000
  • Project type:
    Standard Grant
Construction of an Efficient and Robust Ophthalmic Big Data and AI System through Implementation of Federated Learning
  • Grant number:
    23K17434
  • Fiscal year:
    2023
  • Amount:
    $450,000
  • Project type:
    Grant-in-Aid for Challenging Research (Pioneering)
Construction of big data analysis platform for fish behavior in the sea by image processing, change detection, and machine learning techniques
  • Grant number:
    23K14005
  • Fiscal year:
    2023
  • Amount:
    $450,000
  • Project type:
    Grant-in-Aid for Early-Career Scientists
What's the Big Deal About Big Data?: How Machine Learning is Transforming Health and Healthcare in Nova Scotia
  • Grant number:
    485607
  • Fiscal year:
    2023
  • Amount:
    $450,000
  • Project type:
    Miscellaneous Programs
The Development of Mathematical Models using Machine Learning with Educational Big Data for Language Acquisition and Individually Optimized Learning
  • Grant number:
    23K00651
  • Fiscal year:
    2023
  • Amount:
    $450,000
  • Project type:
    Grant-in-Aid for Scientific Research (C)
Learning models of metabolism and gene expression from biological big data
  • Grant number:
    RGPIN-2020-06325
  • Fiscal year:
    2022
  • Amount:
    $450,000
  • Project type:
    Discovery Grants Program - Individual