III: Large: Collaborative Research: Analysis Engineering for Robust End-to-End Data Science

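Basic information

  • Grant number:
    1900991
  • Principal investigator:
  • Amount:
    $712,500
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Continuing Grant
  • Fiscal year:
    2019
  • Funding country:
    United States
  • Project period:
    2019-10-01 to 2024-09-30
  • Project status:
    Completed

Project abstract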

From poor statistical practices leading to retractions of scientific "discoveries" to low-level spreadsheet errors subverting high-stakes analyses, failures of data analysis can have catastrophic consequences. The rapid growth of data science practice in the last decade has led to large collaborative efforts to develop new data processing, machine learning, and analytics tools that put more advanced data analysis into the hands of a wider audience of practitioners, from students to scientists to designers. The dominant tool for data science is code, through which cutting-edge algorithms can be applied from existing libraries. However, although this democratization of data science has lowered the barrier to using advanced methods, safely using these tools under sound statistical practice remains as difficult as ever. To facilitate more robust data science, this project investigates models and tools for analysis engineering by data scientists who write programs. The focus is on the complete end-to-end process of data analysis performed with code: the iterative, and often exploratory, steps that analysts go through to turn data into results. This project will contribute insights into and characterizations of analytic work, novel methods for capturing and analyzing data science activities, and new programming tools and visualization methods for authoring and validating analyses. If successful, this project will augment people's ability to conduct and assess data analyses, promoting more robust results and reducing the gap between novice and expert analysts. The findings and tools from the project will be incorporated into educational efforts, including classroom teaching and tutorials, and made available as open-source software integrated into popular analytical environments (e.g., Jupyter).
Data analysis is a central activity in scientific research, yet it is too often conducted in an undisciplined fashion. This project treats the entire analytic process as its central phenomenon of study. The project will employ mixed methods to study and characterize common analysis practices and pitfalls, including direct observations of data analysts, large-scale analysis of computational notebooks, and instrumentation of analytic programming environments like JupyterLab. The project will contribute new methods for specifying and safeguarding analyses, including domain-specific languages and program synthesis methods to guide users to preferred next steps. It will also explore "multiverse" workflows to manage and assess a diversity of analysis decisions. Analogues of debugging and testing tools will be developed to flag problems and perform error analysis, while the capture and visualization of analytic provenance will aid reproducibility, verification, and collaborative review. The work will be evaluated through controlled studies, classroom use, and open-source deployment for wide-scale field use.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outcomes

Journal articles (11)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Intuitively Assessing ML Model Reliability through Example-Based Explanations and Editing Model Inputs
B2: Bridging Code and Interactive Visualization in Computational Notebooks
Accessible Visualization via Natural Language Descriptions: A Four-Level Model of Semantic Content
Striking a Balance: Reader Takeaways and Preferences when Integrating Text and Charts
Beyond Expertise and Roles: A Framework to Characterize the Stakeholders of Interpretable Machine Learning and their Needs

Other publications by Arvind Satyanarayan

Visual Debugging Techniques for Reactive Data Visualization
  • DOI:
    10.1111/cgf.12903
  • Publication year:
    2016
  • Journal:
  • Impact factor:
    2.5
  • Authors:
    J. Hoffswell; Arvind Satyanarayan; Jeffrey Heer
  • Corresponding author:
    Jeffrey Heer
Varv: Reprogrammable Interactive Software as a Declarative Data Structure
Umwelt: Accessible Structured Editing of Multimodal Data Representations
"Customization is Key": Reconfigurable Textual Tokens for Accessible Data Visualizations
CAREER: Effective Interaction Design for Data Visualization
  • DOI:
  • Publication year:
    2019
  • Journal:
  • Impact factor:
    0
  • Authors:
    Arvind Satyanarayan
  • Corresponding author:
    Arvind Satyanarayan

Other grants by Arvind Satyanarayan

CAREER: Effective Interaction Design for Data Visualization
  • Grant number:
    1942659
  • Fiscal year:
    2020
  • Funding amount:
    $712,500
  • Project type:
    Continuing Grant

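Similar grants from the National Natural Science Foundation of China

Dissecting the molecular genetic network of LARGE6, a key regulator of grain number per panicle in rice
  • Grant number:
  • Approval year:
    2022
  • Funding amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund
Properties of topological quasiparticles in quantum spin liquids: quantum Monte Carlo and a new large-N theory
  • Grant number:
  • Approval year:
    2020
  • Funding amount:
    CNY 620,000
  • Project type:
    General Program
Molecular mechanism by which the Large Grain gene regulates seed weight in Brassica napus
  • Grant number:
    31972875
  • Approval year:
    2019
  • Funding amount:
    CNY 580,000
  • Project type:
    General Program
Study of a retinal neovascularization model in Large PB/PB mice
  • Grant number:
    30971650
  • Approval year:
    2009
  • Funding amount:
    CNY 80,000
  • Project type:
    General Program
Posterior localization of the discs large gene in Drosophila oocytes and its mechanism of action in body-axis polarity formation
  • Grant number:
    30800648
  • Approval year:
    2008
  • Funding amount:
    CNY 200,000
  • Project type:
    Young Scientists Fund
Molecular regulation of α-DG glycosylation and expression by the LARGE gene in oral cancer cells
  • Grant number:
    30772435
  • Approval year:
    2007
  • Funding amount:
    CNY 290,000
  • Project type:
    General Program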

Similar overseas grants

III: Medium: Collaborative Research: Integrating Large-Scale Machine Learning and Edge Computing for Collaborative Autonomous Vehicles
  • Grant number:
    2348169
  • Fiscal year:
    2023
  • Funding amount:
    $712,500
  • Project type:
    Continuing Grant
Collaborative Research: III: Small: Taming Large-Scale Streaming Graphs in an Open World
  • Grant number:
    2236578
  • Fiscal year:
    2023
  • Funding amount:
    $712,500
  • Project type:
    Standard Grant
Collaborative Research: III: Small: Taming Large-Scale Streaming Graphs in an Open World
  • Grant number:
    2236579
  • Fiscal year:
    2023
  • Funding amount:
    $712,500
  • Project type:
    Standard Grant
III: Small: Collaborative Research: Cost-Efficient Sampling and Estimation from Large-Scale Networks
  • Grant number:
    2209921
  • Fiscal year:
    2021
  • Funding amount:
    $712,500
  • Project type:
    Standard Grant
Collaborative Research: Chameleon Phase III: A Large-Scale, Reconfigurable Experimental Environment for Cloud Research
  • Grant number:
    2027170
  • Fiscal year:
    2020
  • Funding amount:
    $712,500
  • Project type:
    Cooperative Agreement
Collaborative Research: Chameleon Phase III: A Large-Scale, Reconfigurable Experimental Environment for Cloud Research
  • Grant number:
    2027174
  • Fiscal year:
    2020
  • Funding amount:
    $712,500
  • Project type:
    Cooperative Agreement
III: Medium: Collaborative Research: Integrating Large-Scale Machine Learning and Edge Computing for Collaborative Autonomous Vehicles
  • Grant number:
    1956002
  • Fiscal year:
    2020
  • Funding amount:
    $712,500
  • Project type:
    Continuing Grant
Collaborative Research: Chameleon Phase III: A Large-Scale, Reconfigurable Experimental Environment for Cloud Research
  • Grant number:
    2027173
  • Fiscal year:
    2020
  • Funding amount:
    $712,500
  • Project type:
    Cooperative Agreement
Collaborative Research: Chameleon Phase III: A Large-Scale, Reconfigurable Experimental Environment for Cloud Research
  • Grant number:
    2027176
  • Fiscal year:
    2020
  • Funding amount:
    $712,500
  • Project type:
    Cooperative Agreement
III: Medium: Collaborative Research: Integrating Large-Scale Machine Learning and Edge Computing for Collaborative Autonomous Vehicles
  • Grant number:
    1955890
  • Fiscal year:
    2020
  • Funding amount:
    $712,500
  • Project type:
    Continuing Grant