
Online Learning in Big-Data Stream Mining

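Basic Information

Award number:
1407712
Principal investigator:
Ali Sayed
Amount:
approx. $450,000
Host institution:
University of California-Los Angeles
Host institution country:
United States
Project type:
Standard Grant
Fiscal year:
2014
Funding country:
United States
Start and end dates:
2014-09-01 to 2018-08-31
Project status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1407712&HistoricalAwards=false
Keywords:
Online Learning Big Data Stream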

Project Abstract

The world is increasingly information-driven. Vast amounts of data are being produced by diverse sources and in diverse formats, including sensor readings, physiological measurements, documents, emails, transactions, tweets, and audio or video files. Many businesses and government institutions are also embracing automation and relying on a variety of sensors and infrastructure to collect, store, and analyze data on a continuous basis. It is becoming critical to endow assessment systems with the ability to process streaming information from sensors in real time in order to better manage physical systems, derive informed decisions, tweak production processes, and optimize logistics choices. Data stream mining refers to the broad class of techniques that can be used in sense-and-respond systems that continuously receive data streams from multiple sources and employ analytics aimed at detecting and predicting actionable information. Such techniques are useful in many domains, including medical and health informatics; intelligent connected network systems for transportation, security, and energy; and social, multimedia, and business intelligence. The aim of this proposal is to develop methods and techniques for real-time stream mining that extract information from large data streams. The framework accomplishes this objective by networking multiple learners to pose and answer queries at different levels and in real time. A central aspect of the framework is that it accommodates distributed data sources and distributed processing of the data. Besides the aforementioned applications, the proposed research is expected to have an impact on user interfaces, human-computer interaction, and machine-to-machine communication and services.

This research focuses on developing a framework for distributed knowledge extraction from high-volume data streams using a network of adaptive learners/classifiers that is deployed over a distributed computing infrastructure. The proposed paradigm differs from existing mining and search solutions, which are mainly query driven. Instead, the proposed framework is data and concept driven and can catalyze a shift in the design and implementation of networked stream mining applications by allowing continuous learning and dynamic adaptation of networked learners in response to latency, resource, and data characteristics, and by allowing various learners to proactively reason about and shape their interactions with other learners based on their capabilities and knowledge. In this regard, the approach to stream mining developed in this proposal addresses several unique technical challenges:
(a) the need to develop decentralized approaches for stream mining where learners make decisions based on local interactions with their neighbors. This step involves formally defining local objectives and metrics and associated inter-node message exchanges that enable the decomposition of the application into a set of autonomously operating nodes, while ensuring global performance;
(b) the need to develop algorithms that are able to cope with asynchronous events, including different data rates at the nodes, link failures, and dynamic topology configurations;
(c) the need to develop distributed solutions that can cope effectively with system overload due to large data volumes and limited system resources (including CPU, memory, and I/O bandwidth). There is usually a large computational cost incurred by each learner, and solutions need to be sensitive to the rates at which individual learners can handle data; and
(d) the need to develop adaptive stream-mining systems that track concept drift, especially since data characteristics evolve over time due to many factors, including congestion at shared processing nodes and communication delays between processing nodes.
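To make challenges (a), (b), and (d) concrete, the following is a minimal illustrative sketch, not the project's actual method, which the abstract does not specify. It assumes a ring of learners that each run an online least-mean-squares (LMS) update on their own streaming samples and then average their estimates with their immediate neighbors. All names (`make_ring`, `lms_step`, `combine`, `run_network`), the ring topology, the linear data model, and every parameter value are hypothetical choices made only for illustration. Nodes receive data at random (a crude stand-in for differing data rates), and the target model changes abruptly halfway through (a crude stand-in for concept drift).

```python
import random

def make_ring(n):
    """Neighborhood of each learner: itself plus its two ring neighbors (assumed topology)."""
    return [[(k - 1) % n, k, (k + 1) % n] for k in range(n)]

def lms_step(w, x, y, mu):
    """One online LMS update of estimate w from a single streaming sample (x, y)."""
    err = y - sum(wi * xi for wi, xi in zip(w, x))
    return [wi + mu * err * xi for wi, xi in zip(w, x)]

def combine(psi, neighbors):
    """Each learner averages the intermediate estimates in its own neighborhood."""
    dim = len(psi[0])
    return [
        [sum(psi[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]
        for nbrs in neighbors
    ]

def run_network(n_nodes=8, steps=4000, mu=0.05, p_data=0.7, seed=0,
                targets=((1.0, -2.0), (-1.0, 0.5))):
    """Simulate the network; returns (per-node estimates, final true model)."""
    rng = random.Random(seed)
    neighbors = make_ring(n_nodes)
    dim = len(targets[0])
    w = [[0.0] * dim for _ in range(n_nodes)]
    w_true = list(targets[0])
    for t in range(steps):
        if t == steps // 2:
            w_true = list(targets[1])  # abrupt concept drift (assumed scenario)
        psi = []
        for k in range(n_nodes):
            if rng.random() < p_data:
                # Node k received a sample this round: adapt locally.
                x = [rng.gauss(0, 1) for _ in range(dim)]
                y = sum(a * b for a, b in zip(w_true, x)) + rng.gauss(0, 0.1)
                psi.append(lms_step(w[k], x, y, mu))
            else:
                # No data this round: keep the previous estimate.
                psi.append(w[k])
        # Local exchange only: no node ever sees the whole network.
        w = combine(psi, neighbors)
    return w, w_true

if __name__ == "__main__":
    estimates, truth = run_network()
    print("true model:", truth)
    print("node 0 estimate:", estimates[0])
```

The constant step size `mu` is what lets the learners keep adapting after the model changes; a decaying step size would converge more tightly on stationary data but would follow drift poorly. The sketch ignores link failures, changing topologies, and the resource limits raised in challenge (c).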

Project Outcomes

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Ali Sayed

Psychology of Craving

DOI:
Publication year:
2014
Journal:
Impact factor:
0
Authors:
S. Sharma;B. Nepal;C. S. Moon;Anthony Chabenne;A. Khogali;Co Ojo;Esther Hong;Rochelle Gaudet;Ali Sayed;Amanda Jacob;Mujtaba Murtuza;Michelle L. Firlit
Corresponding author:
Michelle L. Firlit