BIGDATA: F: DKA: Usable Multiple Scale Big Data Analytics through Interactive Visualization

Basic Information

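Award number:
1447416
Principal investigator:
Christopher North
Amount:
approx. $998,900
Host institution:
Virginia Polytechnic Institute and State University
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2014
Funding country:
United States
Period:
2014-09-01 to 2018-08-31
Status:
Completed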

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1447416&HistoricalAwards=false
Keywords:
BIGDATA DKA Usable Multiple Scale

Project Abstract

Gaining big insight from big data requires big analytics, which poses big usability problems. Analyses of big data often rely on several computational and statistical models that operate on multiple levels of data scale to discover and characterize noteworthy patterns. The models work jointly or in sequence to filter, group, summarize, and visualize big data so that analysts may assess the data. As a simple example in big text analytics, massive text is first sampled for relevant or representative words, then further reduced by a more complex form of modeling (e.g., topic modeling), and finally visualized by applying a dimension reduction algorithm. As the size of data increases, so does the number of models and, likewise, the need for human interaction in the analytical process. By interacting, humans incorporate expert judgment into the analytical process, and efficiently explore and make sense of big data from varying perspectives. However, for a variety of reasons, interacting with any individual model is difficult, let alone a growing number of models. Thus, this project merges current human-computer interaction research with complex statistical methods and fast computation to develop a usable, multi-model analytic framework for big data. Wrapped in software, the framework will be accessible to professional and student users alike; i.e., it can be used to make new discoveries in current government and industrial big datasets, as well as to educate future analysts at the undergraduate and graduate levels through new teaching modules. The new analytic framework extends Visual-to-Parametric Interaction (V2PI) to Multi-scale V2PI (MV2PI). V2PI currently supports usable small-data analytics, and enables users to adjust model parameters by interacting directly with data in visualizations. That is, V2PI interprets visual interactions quantitatively to update underlying model parameters and produce new visualizations.
MV2PI links together several models that operate at multiple levels of data scale in a unified interactive space. In MV2PI, small-scale data interactions in visualizations propagate to larger-scale models (by inverting them and updating their parameters), and new visualizations are generated. In the text analytics example, if users drag several data points together to hypothesize a cluster, the inverted dimension reduction model computes updated dimension weights, queries relevant new hits at the large scale, identifies changed topics, and updates the layout to show big-data support for the new cluster. With MV2PI, users may interactively explore large-scale data and complex inter-relationships between models in real time, and in a usable fashion that directly supports their natural cognitive sensemaking process. Development of MV2PI involves: (1) formulation of an explicitly stated framework; (2) creation of new interactive models (e.g., Interactive K-means and Interactive Latent Dirichlet Allocation) that cover different levels of scale and support MV2PI model inversion; (3) implementation of computational methods to support high-performance, real-time model updates; and (4) evaluation of the MV2PI software framework for usability and effectiveness. The project web site (http://www.apps.stat.vt.edu/bava/mv2pi.html) will include information on MV2PI development, access to software, datasets, educational materials, and publications.
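
The drag-to-cluster interaction described in the abstract can be illustrated with a small sketch. This is not the project's published algorithm; it is a simplified stand-in that assumes a weighted linear projection as the dimension reduction model. The function names `infer_dimension_weights` and `weighted_layout`, the `smoothing` parameter, and the synthetic data are all hypothetical. The idea shown is only the loop itself: a visual interaction (points dragged together) is interpreted quantitatively as new dimension weights, and the layout is then recomputed from those weights.

```python
import numpy as np

def infer_dimension_weights(X, dragged_idx, smoothing=0.1):
    """Hypothetical 'inverse' step: turn a drag interaction into weights.

    Dimensions on which the dragged points agree (low variance within the
    selection relative to the whole dataset) receive more weight.
    """
    var_selected = X[dragged_idx].var(axis=0)
    var_overall = X.var(axis=0) + 1e-12          # avoid division by zero
    ratio = var_selected / var_overall           # small ratio = points agree
    raw = 1.0 / (ratio + smoothing)              # smoothing keeps weights finite
    return raw / raw.sum()                       # normalize to sum to 1

def weighted_layout(X, weights, n_components=2):
    """Forward step: weighted PCA-style projection to a 2-D layout."""
    Xw = (X - X.mean(axis=0)) * np.sqrt(weights)
    U, S, _ = np.linalg.svd(Xw, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

# Synthetic data (assumption): 20 items with 4 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
dragged = [0, 1, 2]
X[dragged, 0] = 2.0        # the dragged items share the same value on feature 0

weights = infer_dimension_weights(X, dragged)   # interaction -> parameters
layout = weighted_layout(X, weights)            # parameters -> new visualization
print("updated dimension weights:", np.round(weights, 3))
print("layout shape:", layout.shape)
```

In this sketch, feature 0 receives the largest weight because the dragged items agree on it exactly. A multi-scale version in the spirit of the abstract would pass the updated weights upstream, for example to re-query a larger corpus or refit a topic model, which is omitted here.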
