Finding Groups in Big Data
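Basic Information
- Grant number: RGPIN-2016-04850
- Principal investigator:
- Amount: $33,500
- Host institution:
- Host institution country: Canada
- Program: Discovery Grants Program - Individual
- Fiscal year: 2018
- Funding country: Canada
- Period: 2018-01-01 to 2019-12-31
- Project status: Completed
- Source:
- Keywords:
Project Abstract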
The main objective of the proposed research program is to advance the theory and practice of “Big Data” cluster analysis. “Big Data” refers to today's ever larger and more complex data sets, which are typically collected on a large scale by automatic equipment (e.g., microarray chips, sensors, logging devices). These data sets have become abundant, and they hold the potential for the discovery of new insights, which can lead to new opportunities for improved, data-driven decision making. Unsupervised, exploratory methods for knowledge discovery play an important role in realizing this potential.
One of the major exploratory data analysis tasks is finding “natural” groups in data. Understanding the groups in Big Data allows a better organization and classification of the data, more efficient browsing and searching, focusing an analysis on specific groups, and discovering unknown relationships between groups.
Clustering is the most common unsupervised approach to finding groups in data. Traditional clustering methods, however, face challenges when applied to Big Data because of the typically large volume and high dimensionality of the data. They are also not designed to take advantage of properties such as the time dependence of some Big Data sets and the relationships between different data sources.
My proposed research program is aimed at advancing the theory and practice of clustering methods, particularly density-based clustering (i.e., where clusters are considered dense regions in the data space, separated by regions of lower point density), applied to “Big Data”. Based on the theoretical insights we will gain, we will develop novel and improved algorithms for clustering Big Data that overcome limitations of current clustering methods and extend the applicability of clustering to a wider range of Big Data scenarios.
The research will focus on the following aspects: (a) fast methods to deal with large data volumes, (b) projected and subspace clustering that can address issues of high dimensionality such as data sparseness and “irrelevant” attributes, (c) semi-supervised clustering based on constraints, including constraints derived from related data sources, that can guide an algorithm to a solution which is consistent with these constraints, (d) combining projected/subspace clustering with semi-supervision, to find simultaneously the subspace(s) and clusters that are most consistent with given constraints, and (e) modelling the development of cluster structures over time to allow the discovery and tracking of relationships between different clusters.
Progress on these issues will benefit a wide and fast-growing range of application areas in which Big Data is being collected, including industrial and business applications as well as medical, biological, and other scientific domains.
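The abstract's definition of density-based clustering (dense regions separated by regions of lower point density) can be illustrated with a minimal sketch of DBSCAN, a well-known density-based algorithm. This is only an illustration, not the method proposed in the grant. The parameter names `eps` (neighbourhood radius) and `min_pts` (minimum number of neighbours for a dense point), the helper `region_query`, and the sample points are assumptions chosen for the example.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

NOISE = -1  # label for points that belong to no dense region


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        frontier = list(neighbors)
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not dense itself
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                frontier.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


# Two dense 2x2 squares far apart, plus one isolated point (hypothetical data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The quadratic-time neighbourhood search in `region_query` is exactly the kind of cost that becomes prohibitive for large data volumes, which is the motivation behind aspect (a) above.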
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Sander, Jörg
Finding Groups in Big Data
- Grant number: RGPIN-2016-04850
- Fiscal year: 2021
- Amount: $33,500
- Program: Discovery Grants Program - Individual
Finding Groups in Big Data
- Grant number: RGPIN-2016-04850
- Fiscal year: 2019
- Amount: $33,500
- Program: Discovery Grants Program - Individual
Finding Groups in Big Data
- Grant number: RGPIN-2016-04850
- Fiscal year: 2017
- Amount: $33,500
- Program: Discovery Grants Program - Individual
Finding Groups in Big Data
- Grant number: RGPIN-2016-04850
- Fiscal year: 2016
- Amount: $33,500
- Program: Discovery Grants Program - Individual
Projected and semi-supervised clustering for high-dimensional data
- Grant number: 250344-2011
- Fiscal year: 2015
- Amount: $33,500
- Program: Discovery Grants Program - Individual
Projected and semi-supervised clustering for high-dimensional data
- Grant number: 250344-2011
- Fiscal year: 2014
- Amount: $33,500
- Program: Discovery Grants Program - Individual
Projected and semi-supervised clustering for high-dimensional data
- Grant number: 412377-2011
- Fiscal year: 2013
- Amount: $33,500
- Program: Discovery Grants Program - Accelerator Supplements
Projected and semi-supervised clustering for high-dimensional data
- Grant number: 250344-2011
- Fiscal year: 2013
- Amount: $33,500
- Program: Discovery Grants Program - Individual
Finding significant temporal and spatial patterns in industrial process data
- Grant number: 412198-2011
- Fiscal year: 2012
- Amount: $33,500
- Program: Collaborative Research and Development Grants
Projected and semi-supervised clustering for high-dimensional data
- Grant number: 250344-2011
- Fiscal year: 2012
- Amount: $33,500
- Program: Discovery Grants Program - Individual
Similar Overseas Grants
Finding Groups in Big Data
- Grant number: RGPIN-2016-04850
- Fiscal year: 2021
- Amount: $33,500
- Program: Discovery Grants Program - Individual
Finding Groups in Big Data
- Grant number: RGPIN-2016-04850
- Fiscal year: 2019
- Amount: $33,500
- Program: Discovery Grants Program - Individual
Finding Groups in Big Data
- Grant number: RGPIN-2016-04850
- Fiscal year: 2017
- Amount: $33,500
- Program: Discovery Grants Program - Individual
Finding Groups in Big Data
- Grant number: RGPIN-2016-04850
- Fiscal year: 2016
- Amount: $33,500
- Program: Discovery Grants Program - Individual
Collaborative Research: Big Data from Small Groups: Learning Analytics and Adaptive Support in Game-based Collaborative Learning
- Grant number: 1561486
- Fiscal year: 2016
- Amount: $33,500
- Program: Continuing Grant
Collaborative Research: Big Data from Small Groups: Learning Analytics and Adaptive Support in Game-based Collaborative Learning
- Grant number: 1561655
- Fiscal year: 2016
- Amount: $33,500
- Program: Continuing Grant
Determinantal point processes and representations of big groups.
- Grant number: 420555-2012
- Fiscal year: 2014
- Amount: $33,500
- Program: Postgraduate Scholarships - Doctoral
Determinantal point processes and representations of big groups.
- Grant number: 420555-2012
- Fiscal year: 2013
- Amount: $33,500
- Program: Postgraduate Scholarships - Doctoral
Determinantal point processes and representations of big groups.
- Grant number: 420555-2012
- Fiscal year: 2012
- Amount: $33,500
- Program: Postgraduate Scholarships - Doctoral
Research on development process of Japanese "Big Six" company groups: An analysis based on company historical data
- Grant number: 17530284
- Fiscal year: 2005
- Amount: $33,500
- Program: Grant-in-Aid for Scientific Research (C)


















