CHS: Medium: Scaling Qualitative Inductive Analysis through Computational Methods

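Basic Information

Award number: 1764089
Principal investigator: Jed Brubaker
Amount: $1.0811 million
Host institution: University of Colorado at Boulder
Host institution country: United States
Award type: Continuing Grant
Fiscal year: 2018
Funding country: United States
Start and end dates: 2018-08-01 to 2023-07-31
Project status: Completed

Source: https://www.nsf.gov/awardsearch/showAward?AWD_ID=1764089&HistoricalAwards=false
Keywords: CHS Medium Scaling Qualitative Inductive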

Project Abstract

This project focuses on the integration of people and computation in the context of qualitative inductive methods (QIMs), in which experts engage deeply with text corpora such as open-ended surveys, transcribed interviews, or collections of social media content. This engagement can produce insights, but constraints on expertise and time make these methods hard to scale to large datasets. Technologies like machine learning and natural language processing (NLP), which can mine certain kinds of patterns from text data at scales not feasible for even large teams of humans, may offer a way forward. However, machines make mistakes in understanding the nuances of language, lack the context and expertise of human analysts, and may fail to detect interesting small-scale patterns needed to solve particular problems. The goal of this project is to scale up the use of QIMs by inverting traditional models in which humans verify computational results ("human-in-the-loop"), starting instead with human insights that can be amplified through computational models, support, and suggestions for analysis ("computer-in-the-loop"). Working with collaborators in domains including mental health, public health, disaster response, policy making, and philanthropy, the team will conduct qualitative studies of their QIM practices and needs, then develop and evaluate systems with the goal of improving both the quality and the scale of the insights experts can generate. The team will produce publicly available versions of the tools and disseminate them through an online community for interested researchers from all fields. The project activities will also inform courses on information visualization, human-computer interaction, and applied machine learning, along with workshops aimed at recruiting high school women to careers in computing. The work is organized around three main threads.
The first thread is to conduct a deep analysis of qualitative work processes, using both existing accounts of qualitative work in the literature and participant observation methods with at least 10 teams who approach QIMs from a range of disciplines, domains, and scales. Through analysis of interviews, logs, and artifacts, the team will generate rich descriptions of these work processes that will both further understanding of QIMs as a method and identify open problems amenable to computational support. The second thread is to develop computational models of QIMs that align with analysts' processes and judgments around qualitative data, using a variety of datasets from the research team and their partners. Because many QIMs label specific text passages as relevant to a specific concept, taking those passages as positive examples and nearby, unlabeled data as negative examples may allow QIMs to be modeled as a series of binary classification problems. This will allow the team to use NLP methods, guided by domain knowledge and insights from the first thread, to generate features for machine learning-based models. The third thread connects these models to analysts' processes through a series of passage-level, document-level, and theme-level visualizations that use the models' predictions for three purposes: suggesting other passages relevant to a concept in a given document; hierarchically aggregating patterns across documents to support the extraction of higher-level themes, along with ways to merge and divide concepts; and statistically analyzing the prevalence of themes across the corpus.
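The second and third threads frame each concept as a binary classification problem whose predictions are then used to suggest other relevant passages. The following is a minimal sketch of that framing under stated assumptions, not the project's implementation. It assumes a document is already split into passages, treats untagged passages within a hypothetical `window` of a tagged passage as the "nearby, unlabeled" negatives, and swaps in a bag-of-words multinomial Naive Bayes model for the NLP features and machine learning models the abstract leaves unspecified. All names (`tokenize`, `build_examples`, `NaiveBayes`, `suggest`) and the toy transcript are illustrative.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'()"


def tokenize(text: str) -> list[str]:
    """Lowercase, whitespace-split tokens with edge punctuation removed.

    A deliberately naive stand-in for the NLP feature generation described in the abstract.
    """
    tokens = (t.strip(PUNCT).lower() for t in text.split())
    return [t for t in tokens if t]


def build_examples(passages: list[str], labeled: set[int], window: int = 1) -> list[tuple[str, int]]:
    """Turn one coded document into binary examples for a single concept.

    Positives: passages the analyst tagged with the concept.
    Negatives: untagged passages within `window` positions of a positive
    (hypothetical parameter standing in for "nearby, unlabeled data").
    """
    examples = [(passages[i], 1) for i in sorted(labeled)]
    nearby = {
        j
        for i in labeled
        for j in range(i - window, i + window + 1)
        if 0 <= j < len(passages) and j not in labeled
    }
    examples += [(passages[j], 0) for j in sorted(nearby)]
    return examples


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (an illustrative, swapped-in model)."""

    def fit(self, examples: list[tuple[str, int]]) -> "NaiveBayes":
        self.counts = {0: Counter(), 1: Counter()}  # token counts per class
        self.docs = Counter()  # number of passages per class
        for text, label in examples:
            self.counts[label].update(tokenize(text))
            self.docs[label] += 1
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def score(self, text: str) -> float:
        """Log-odds that `text` expresses the concept; values above 0 lean positive."""
        total = self.docs[0] + self.docs[1]
        # Smoothed class priors, so a class with no examples does not trigger log(0).
        log_odds = math.log((self.docs[1] + 1) / (total + 2)) - math.log((self.docs[0] + 1) / (total + 2))
        v = len(self.vocab)
        n1, n0 = sum(self.counts[1].values()), sum(self.counts[0].values())
        for tok in tokenize(text):
            if tok in self.vocab:  # tokens never seen in training carry no evidence
                log_odds += math.log((self.counts[1][tok] + 1) / (n1 + v))
                log_odds -= math.log((self.counts[0][tok] + 1) / (n0 + v))
        return log_odds


def suggest(model: NaiveBayes, passages: list[str], labeled: set[int], k: int = 3) -> list[int]:
    """Rank passages the analyst has not tagged yet, highest score first."""
    candidates = [i for i in range(len(passages)) if i not in labeled]
    return sorted(candidates, key=lambda i: model.score(passages[i]), reverse=True)[:k]


if __name__ == "__main__":
    # Toy transcript split into passages; an analyst tagged 1 and 4 as "sleep problems".
    passages = [
        "I moved to a new city last spring.",
        "Lately I cannot sleep and lie awake at night.",
        "My job keeps me busy most days.",
        "Work is fine but my manager is strict.",
        "I wake up at 3am and cannot fall back asleep.",
        "On weekends I visit my family.",
        "Insomnia makes me tired and I cannot sleep well.",
        "The weather has been nice recently.",
    ]
    labeled = {1, 4}
    model = NaiveBayes().fit(build_examples(passages, labeled, window=1))
    print(suggest(model, passages, labeled, k=1))  # → [6]
```

Naive Bayes is used here only because it fits in a few lines of standard-library Python and yields a log-odds score that can be ranked directly. The abstract does not name a model, and a real system would likely rely on richer, domain-informed features.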
The algorithms and tools will be evaluated through a series of offline tests against existing analyzed datasets; short analysis challenge contests in workshop settings to evaluate usability and reactions to the tools; longitudinal three-month deployments with partners, involving weekly semi-structured questionnaires about the tools' usability in practice; and an online community that both provides support for and collects feedback about the system while growing a methodological community of practice around this style of big data analysis.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
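One way the offline tests against existing analyzed datasets could be scored is to hold out passages that analysts have already coded and check how many of them appear among a model's top-k suggestions. The abstract does not name its metrics; precision and recall at k, the function name `precision_recall_at_k`, and the toy ids below are assumptions for illustration.

```python
def precision_recall_at_k(ranked: list[int], relevant: set[int], k: int) -> tuple[float, float]:
    """Score the top-k suggestions against passages analysts already coded.

    ranked:   passage ids in the order a model suggests them (hypothetical input).
    relevant: held-out passage ids that analysts tagged with the concept.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top = ranked[:k]
    hits = sum(1 for pid in top if pid in relevant)
    precision = hits / len(top) if top else 0.0  # share of suggestions that were right
    recall = hits / len(relevant) if relevant else 0.0  # share of coded passages recovered
    return precision, recall


if __name__ == "__main__":
    # Toy check: a model ranked passages 6, 7, 5, 0; analysts had coded 6 and 5.
    print(precision_recall_at_k([6, 7, 5, 0], {6, 5}, k=2))  # → (0.5, 0.5)
```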

Project Outcomes

Journal articles (11)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Using Machine Learning and Visualization for Qualitative Inductive Analyses of Big Data

DOI:
Published: 2020
Venue: Proceedings of the 2019 Workshop on Machine Learning from User Interaction
Impact factor: 0
Authors: Muthukrishnan, Harshini Priya; Szafir, Danielle Albers
Corresponding author: Szafir, Danielle Albers

Qualitative Methods for CSCW: Challenges and Opportunities

DOI: 10.1145/3311957.3359428
Published: 2019
Venue: Conference Companion Publication of the 2019 on Computer Supported Cooperative Work and Social Computing - CSCW ’19
Impact factor: 0
Authors: Fiesler, Casey; Brubaker, Jed R.; Forte, Andrea; Guha, Shion; McDonald, Nora; Muller, Michael
Corresponding author: Muller, Michael

Putting Tools in Their Place: The Role of Time and Perspective in Human-AI Collaboration for Qualitative Analysis

DOI: 10.1145/3479856
Published: 2021-10
Venue: Proceedings of the ACM on Human-Computer Interaction
Impact factor: 0
Authors: Feuston, Jessica L.; Brubaker, Jed R.
Corresponding authors: Feuston, Jessica L.; Brubaker, Jed R.

A Design Space of Vision Science Methods for Visualization Research

DOI: 10.1109/tvcg.2020.3029413
Published: 2021-02-01
Venue: IEEE Transactions on Visualization and Computer Graphics
Impact factor: 5.2
Authors: Elliott, Madison A.; Nothelfer, Christine; Szafir, Danielle Albers
Corresponding author: Szafir, Danielle Albers

Cultivating Visualization Literacy for Children Through Curiosity and Play

DOI: 10.1109/tvcg.2022.3209442
Published: 2023
Venue: IEEE Transactions on Visualization and Computer Graphics
Impact factor: 5.2
Authors: Bae, S. Sandra; Vanukuru, Rishi; Yang, Ruhan; Gyory, Peter; Zhou, Ran; Do, Ellen Yi-Luen; Szafir, Danielle Albers
Corresponding author: Szafir, Danielle Albers