BIGDATA: Small: DA: Collaborative Research: From Data to Users: Providing Interpretable and Verifiable Explanations in Data Mining

Basic Information

Award Number:
1251049
Principal Investigator:
Suresh Venkatasubramanian
Amount:
$0.5 million
Host Institution:
University of Utah
Institution Country:
United States
Award Type:
Standard Grant
Fiscal Year:
2013
Funding Country:
United States
Period:
2013-09-15 to 2017-08-31
Status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1251049&HistoricalAwards=false
Keywords:
BIGDATA Small DA Collaborative Research

Abstract

The fruits of data mining pervade every aspect of our lives. Books and movies are recommended to us; we are given differential pricing for insurance, screened for potential terror threats, diagnosed with various diseases, and targeted for political advertising. The ability to sift through massive data sets with sophisticated algorithms has resulted in applications with impressive predictive power. And yet there is still a gap between what such tools can deliver and what the users of data mining really need. It is often hard to interpret the answers produced by a learning algorithm, due to its sophistication and the use of large data sets to build models. The results of mining are often "one-size-fits-all", and convincing a user that results are actually relevant to them is difficult. Finally, there is the important problem of validation. As the results of data mining affect more and more of our lives, it becomes more crucial that users be able to validate decisions that are made on their behalf and that affect them.

The common theme tying these issues together is a user-centric perspective on the problems of data mining. Rather than asking "What patterns can be found in this mountain of data?", this work asks "What structures in this data affect me?" These issues arise precisely because of the vast amounts of data we now have the ability to mine, and the sophisticated methods at our disposal to analyze this data.

In this research, the PIs develop a computational framework and key tools for user-centric data mining. A central theme in this research is the idea of interaction. In both machine learning and the foundations of complexity theory, interaction has been used to allow a (weaker) entity to probe a much more powerful system and determine answers that it lacks the resources to compute directly itself.
The PIs use formal interaction mechanisms from two perspectives, that of a user interacting with a powerful algorithm and that of a client interacting with a computing source that has access to large data, in order to enable the user to interpret and validate the results of data mining.

The goal of this project is to develop a computational framework for user-centric data mining that enables existing users to tailor data analysis to their needs and facilitates the use of data mining in new areas where existing [...]

1. [...] The team proposes interactive mechanisms that start with the results of a learning process and, via interaction with the user, produce an explanation expressed in terms of meaningful features, drawing on ideas from active learning, feature selection, and domain adaptation.
2. Locality: Answers that are relevant. Here, the focus is on providing information that depends more on a user's local neighborhood, achieved via a new local notion of stability.
3. Verifiability: Answers you can check. The team proposes a framework for the validation of computationally intensive data mining by a computationally weak user, with ideas from interactive proof theory and streaming algorithms.

Tools for analyzing patient medical data have become more sophisticated, and individual medical profiles play a far more significant role in diagnosis and treatment.

The research examines user-centric data mining via three core primitives (classification, regression, and clustering), and studies the three problems of interpreting results, providing local explanations, and validating the results of data mining. Firstly, the research draws on ideas from active learning, feature selection, and domain adaptation to build interpretable results via interaction with users. Secondly, it introduces local notions of stability as a way of validating predictions for a specific user. Finally, it develops a general framework for validation of an analysis by a computationally weak user, by drawing on ideas from the theory of interactive proofs and streaming algorithms.
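
The verification theme in the abstract, a computationally weak user checking work done by a far more powerful system, can be illustrated with a classic textbook technique: Freivalds' randomized check for matrix products. This is not taken from the project and is not claimed to be the PIs' framework; it is only a minimal sketch of the general idea, assuming integer matrices stored as lists of lists. The names `matmul`, `matvec`, and `freivalds_check` are hypothetical.

```python
import random


def matmul(a, b):
    """Naive cubic-time product; stands in for the expensive computation
    performed by the powerful party (an assumption for illustration)."""
    inner = len(b)
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
        for row in a
    ]


def matvec(m, x):
    """Multiply matrix m by vector x."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in m]


def freivalds_check(a, b, c, rounds=20, rng=None):
    """Probabilistically check the claim a * b == c.

    Each round costs only three matrix-vector products. A correct claim
    always passes; an incorrect claim survives one round with probability
    at most 1/2, so it survives all rounds with probability at most
    2 ** -rounds.
    """
    rng = rng or random.Random()
    for _ in range(rounds):
        # Random 0/1 vector with one entry per column of c.
        x = [rng.randint(0, 1) for _ in range(len(c[0]))]
        if matvec(a, matvec(b, x)) != matvec(c, x):
            return False  # caught a discrepancy: the claim is wrong
    return True  # no discrepancy found in any round


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    claimed = matmul(a, b)  # what the powerful party reports
    print("claimed product accepted:", freivalds_check(a, b, claimed))
```

The verifier never recomputes the product itself. It only spot-checks the claimed answer with cheap random probes, which is the same division of labor between a weak checker and a strong solver that the abstract describes.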
