III: Large: Collaborative Research: Analysis Engineering for Robust End-to-End Data Science
Basic information
- Award number: 1856641
- Amount: $712,500
- Host institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2019
- Funding country: United States
- Period: 2019-10-01 to 2024-09-30
- Status: Completed
Project abstract
From poor statistical practices leading to retractions of scientific "discoveries" to low-level spreadsheet errors subverting high-stakes analyses, failures of data analysis can have catastrophic consequences. The rapid growth of data science practice in the last decade has led to large collaborative efforts to develop new data processing, machine learning, and analytics tools that put more advanced data analysis into the hands of a wider audience of practitioners, from students to scientists to designers. The dominant tool for data science is code, where cutting-edge algorithms can be applied from existing libraries. However, although this democratization of data science has lowered the barrier to using advanced methods, safely using these tools under sound statistical practice remains as difficult as ever. To facilitate more robust data science, this project investigates models and tools for analysis engineering by data scientists who write programs. The focus is on the complete end-to-end process of data analysis performed with code: the iterative, and often exploratory, steps that analysts go through to turn data into [...]. This project will contribute insights into and characterizations of analytic work, novel methods for capturing and analyzing data science activities, and new programming tools and visualization methods for authoring and validating analyses. If successful, this project will augment people's ability to conduct and assess data analyses, promoting more robust results and reducing the gap between novice and expert analysts. The findings and tools from the project will be incorporated into educational efforts, including classroom teaching and tutorials, and made available as open-source software integrated into popular analytical environments (e.g., Jupyter).
Data analysis is central to scientific research, yet is too often conducted in an undisciplined fashion. This project treats the entire analytic process as its central phenomenon of study. The project will employ mixed methods to study and characterize common analysis practices and pitfalls, including direct observations of data analysts, large-scale analysis of computational notebooks, and instrumentation of analytic programming environments like JupyterLab. The project will contribute new methods for specifying and safeguarding analyses, including domain-specific languages and program synthesis methods to guide users to preferred next steps. It will also explore "multiverse" workflows to manage and assess a diversity of analysis decisions. Analogues of debugging and testing tools will be developed to flag problems and perform error analysis, while the capture and visualization of analytic provenance will aid reproducibility, verification, and collaborative review. The work will be evaluated through controlled studies, classroom use, and open-source deployment for wide-scale field use.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Brad Myers
Using traits of web macro scripts to predict reuse
- DOI: 10.1016/j.jvlc.2010.08.003
- Published: 2010-12-01
- Authors: Chris Scaffidi; Chris Bogart; Margaret Burnett; Allen Cypher; Brad Myers; Mary Shaw
- Corresponding author: Mary Shaw
Other grants held by Brad Myers
SHF: Small: Personalizing API Documentation
- Award number: 2007482
- Fiscal year: 2020
- Award amount: $712,500
- Award type: Standard Grant
CHS: Small: Multimodal Conversational Assistant that Learns from Demonstrations
- Award number: 1814472
- Fiscal year: 2018
- Award amount: $712,500
- Award type: Standard Grant
TWC: Small: Empirical Evaluation of the Usability and Security Implications of Application Programming Interface Design
- Award number: 1423054
- Fiscal year: 2014
- Award amount: $712,500
- Award type: Standard Grant
HCC: Large: Collaborative Research: Variations to Support Exploratory Programming
- Award number: 1314356
- Fiscal year: 2013
- Award amount: $712,500
- Award type: Standard Grant
HCC: Small: Better Tools for Authoring Interactive Behaviors
- Award number: 1116724
- Fiscal year: 2011
- Award amount: $712,500
- Award type: Standard Grant
Pilot: Exploratory Programming for Interactive Behaviors: Unleashing Interaction Designers' Creativity
- Award number: 0757511
- Fiscal year: 2008
- Award amount: $712,500
- Award type: Standard Grant
CPA-SEL: Better Tools for Software Understanding
- Award number: 0811610
- Fiscal year: 2008
- Award amount: $712,500
- Award type: Standard Grant
Automatically Generating Consistent User Interfaces for Multiple Appliances
- Award number: 0534349
- Fiscal year: 2005
- Award amount: $712,500
- Award type: Continuing Grant
Lowering the Barriers to Successful Programming
- Award number: 0329090
- Fiscal year: 2003
- Award amount: $712,500
- Award type: Continuing Grant
ITR: Collaborative Research: Dependable End-User Software
- Award number: 0324770
- Fiscal year: 2003
- Award amount: $712,500
- Award type: Continuing Grant
Similar grants from the National Natural Science Foundation of China (NSFC)
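Dissecting the molecular genetic network of LARGE6, a key regulator of grain number per panicle in rice
- Approval year: 2022
- Award amount: CNY 300,000
- Award type: Young Scientists Fund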
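Properties of topological quasiparticles in quantum spin liquids: quantum Monte Carlo and new large-N theory
- Approval year: 2020
- Award amount: CNY 620,000
- Award type: General Program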
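Molecular mechanism by which the Large Grain gene regulates seed weight in Brassica napus
- Award number: 31972875
- Approval year: 2019
- Award amount: CNY 580,000
- Award type: General Program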
Study of the Large PB/PB mouse as a model of retinal neovascularization
- Award number: 30971650
- Approval year: 2009
- Award amount: CNY 80,000
- Award type: General Program
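Posterior localization of the discs large gene in Drosophila oocytes and its role in body-axis polarity formation
- Award number: 30800648
- Approval year: 2008
- Award amount: CNY 200,000
- Award type: Young Scientists Fund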
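Molecular regulation by the LARGE gene of α-DG glycosylation and expression in oral cancer cells
- Award number: 30772435
- Approval year: 2007
- Award amount: CNY 290,000
- Award type: General Program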
Similar overseas grants
III: Medium: Collaborative Research: Integrating Large-Scale Machine Learning and Edge Computing for Collaborative Autonomous Vehicles
- Award number: 2348169
- Fiscal year: 2023
- Award amount: $712,500
- Award type: Continuing Grant
Collaborative Research: III: Small: Taming Large-Scale Streaming Graphs in an Open World
- Award number: 2236578
- Fiscal year: 2023
- Award amount: $712,500
- Award type: Standard Grant
Collaborative Research: III: Small: Taming Large-Scale Streaming Graphs in an Open World
- Award number: 2236579
- Fiscal year: 2023
- Award amount: $712,500
- Award type: Standard Grant
III: Small: Collaborative Research: Cost-Efficient Sampling and Estimation from Large-Scale Networks
- Award number: 2209921
- Fiscal year: 2021
- Award amount: $712,500
- Award type: Standard Grant
Collaborative Research: Chameleon Phase III: A Large-Scale, Reconfigurable Experimental Environment for Cloud Research
- Award number: 2027170
- Fiscal year: 2020
- Award amount: $712,500
- Award type: Cooperative Agreement
Collaborative Research: Chameleon Phase III: A Large-Scale, Reconfigurable Experimental Environment for Cloud Research
- Award number: 2027174
- Fiscal year: 2020
- Award amount: $712,500
- Award type: Cooperative Agreement
III: Medium: Collaborative Research: Integrating Large-Scale Machine Learning and Edge Computing for Collaborative Autonomous Vehicles
- Award number: 1956002
- Fiscal year: 2020
- Award amount: $712,500
- Award type: Continuing Grant
Collaborative Research: Chameleon Phase III: A Large-Scale, Reconfigurable Experimental Environment for Cloud Research
- Award number: 2027173
- Fiscal year: 2020
- Award amount: $712,500
- Award type: Cooperative Agreement
Collaborative Research: Chameleon Phase III: A Large-Scale, Reconfigurable Experimental Environment for Cloud Research
- Award number: 2027176
- Fiscal year: 2020
- Award amount: $712,500
- Award type: Cooperative Agreement
III: Medium: Collaborative Research: Integrating Large-Scale Machine Learning and Edge Computing for Collaborative Autonomous Vehicles
- Award number: 1955890
- Fiscal year: 2020
- Award amount: $712,500
- Award type: Continuing Grant