SHF: Medium: Interactive Debugging for Big Data Analytics
Basic information
- Award number: 1764077
- Amount: $900,000
- Host institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2018
- Funding country: United States
- Period: 2018-07-01 to 2024-06-30
- Status: Completed
Project abstract
An abundance of data in science, engineering, national security, and health care has led to the emerging field of big data analytics. To process massive quantities of data, developers leverage data-intensive scalable computing (DISC) systems in the cloud, such as Google's MapReduce, Apache Hadoop, and Apache Spark. While DISC systems help to address the scalability challenges of big data analytics, they also introduce an enormous challenge for data scientists in understanding and resolving errors. This project addresses the severe lack of debugging support in DISC systems today, which makes it difficult for data scientists to understand their applications, determine the causes of identified errors, and ensure that such errors are properly repaired. The research provides two kinds of debugging support for big data processing programs in modern DISC systems like Apache Spark: new interactive, real-time debugging primitives for large-scale distributed processing and tool-assisted fault-localization services for big data. Technical approaches include a new data provenance technique for providing fine-grained visibility into large-scale distributed data processing and runtime optimizations for iterative development and debugging workloads. Tool-assisted fault localization services leverage these underlying provenance and optimization techniques to pinpoint and characterize the root causes of errors efficiently. Big data analytics is increasingly important in the 21st century, where daily lives leave behind a detailed digital record and decision-makers of all kinds, from companies to government agencies, would like to base their actions on data. 
The research contributes to improving productivity and correctness of big data applications, which is crucial for many disciplines that distill terabytes of low-value data into high-value insights. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project outputs
Journal articles (16)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
An Empirical Study of Common Challenges in Developing Deep Learning Applications
- DOI: 10.1109/issre.2019.00020
- Published: 2019-10
- Impact factor: 0
- Authors: Tianyi Zhang;Cuiyun Gao;Lei Ma;Michael R. Lyu;Miryung Kim
- Corresponding authors: Tianyi Zhang;Cuiyun Gao;Lei Ma;Michael R. Lyu;Miryung Kim
Software Engineering for Data Analytics
- DOI: 10.1109/ms.2020.2985775
- Published: 2020
- Impact factor: 3.3
- Authors: Kim, Miryung
- Corresponding author: Kim, Miryung
Canvas: Isolated and Adaptive Swapping for Multi-Applications on Remote Memory
- DOI: 10.48550/arxiv.2203.09615
- Published: 2022-03
- Impact factor: 0
- Authors: Chenxi Wang;Yifan Qiao;Haoran Ma;Shiafun Liu;Yiying Zhang;Wenguang Chen;R. Netravali;Miryung Kim;Guoqing Harry Xu
- Corresponding authors: Chenxi Wang;Yifan Qiao;Haoran Ma;Shiafun Liu;Yiying Zhang;Wenguang Chen;R. Netravali;Miryung Kim;Guoqing Harry Xu
Sibylvariant Transformations for Robust Text Classification
- DOI: 10.18653/v1/2022.findings-acl.140
- Published: 2022
- Impact factor: 0
- Authors: Harel-Canada, Fabrice;Gulzar, Muhammad Ali;Peng, Nanyun;Kim, Miryung
- Corresponding author: Kim, Miryung
Gerenuk: thin computation over big native data using speculative program transformation
- DOI: 10.1145/3341301.3359643
- Published: 2019-10
- Impact factor: 0
- Authors: Christian Navasca;Cheng Cai;Khanh Nguyen;Brian Demsky;Shan Lu;Miryung Kim;G. Xu
- Corresponding authors: Christian Navasca;Cheng Cai;Khanh Nguyen;Brian Demsky;Shan Lu;Miryung Kim;G. Xu
Other publications by Miryung Kim
Chapter 16 Recommending Program Transformations Automating Repetitive Software Changes
- Published: 2014
- Impact factor: 0
- Authors: Miryung Kim;Na Meng
- Corresponding author: Na Meng
Equity and Access in Algorithms, Mechanisms, and Optimization
- DOI: 10.1145/3551624
- Published: 2022
- Impact factor: 0
- Authors: Miryung Kim;Thomas Zimmermann;R. Deline;Andrew Begel
- Corresponding author: Andrew Begel
NaturalFuzz: Natural Input Generation for Big Data Analytics
- Published: 2023
- Impact factor: 0
- Authors: Ahmad Humayun;Yao Wu;Miryung Kim;Muhammad Ali Gulzar
- Corresponding author: Muhammad Ali Gulzar
C p – C d ≠ ? Eclipse Refactoring APIs P ’ Pure Refactoring Version P ’ ≠
- Published: 2017
- Impact factor: 0
- Authors: Everton L. G. Alves;Myoungkyu Song;T. Massoni;Patricia D. L. Machado;Miryung Kim
- Corresponding author: Miryung Kim
SE4ML - Software Engineering for AI-ML-based Systems (Dagstuhl Seminar 20091)
- DOI: 10.4230/dagrep.10.2.76
- Published: 2020
- Impact factor: 0
- Authors: K. Kersting;Miryung Kim;Guy Van den Broeck;Thomas Zimmermann
- Corresponding author: Thomas Zimmermann
Other grants awarded to Miryung Kim
Collaborative Research: SHF: Medium: Reinventing Fuzz Testing for Data and Compute Intensive Systems
- Award number: 2106404
- Fiscal year: 2021
- Amount: $900,000
- Award type: Standard Grant
CHS: Medium: Collaborative Research: Code demography: Addressing information needs at scale for programming interface users and designers
- Award number: 1956322
- Fiscal year: 2020
- Amount: $900,000
- Award type: Standard Grant
I-Corps: Interactive and Automated Debugging for Big Data Analytics
- Award number: 1842657
- Fiscal year: 2018
- Amount: $900,000
- Award type: Standard Grant
SHF: Small: Analytical Support for Investigating Software Modifications in Collaborative Development Environment
- Award number: 1533791
- Fiscal year: 2014
- Amount: $900,000
- Award type: Standard Grant
CAREER: Analysis and Automation of Systematic Software Modifications
- Award number: 1460325
- Fiscal year: 2014
- Amount: $900,000
- Award type: Continuing Grant
CAREER: Analysis and Automation of Systematic Software Modifications
- Award number: 1149391
- Fiscal year: 2012
- Amount: $900,000
- Award type: Continuing Grant
SHF: Small: Analytical Support for Investigating Software Modifications in Collaborative Development Environment
- Award number: 1117902
- Fiscal year: 2011
- Amount: $900,000
- Award type: Standard Grant
Information Needs about Software Modification during Collaborative Development Tasks
- Award number: 1043810
- Fiscal year: 2010
- Amount: $900,000
- Award type: Standard Grant
Similar overseas grants
HCC: Medium: Optimizing Interactive Machine Learning Tools to Support Plant Scientists using Human Centered Design
- Award number: 2312643
- Fiscal year: 2023
- Amount: $900,000
- Award type: Standard Grant
III: Medium: CARE: Interactive Systems for Scalable, Causal Data Science
- Award number: 2312561
- Fiscal year: 2023
- Amount: $900,000
- Award type: Continuing Grant
Interactive assistance via eHealth for small and medium-sized enterprises' employer and health care manager teams on tobacco control
- Award number: 22H03326
- Fiscal year: 2022
- Amount: $900,000
- Award type: Grant-in-Aid for Scientific Research (B)
III: Medium: VOCAL: Video Organization and Interactive Compositional AnaLytics
- Award number: 2211133
- Fiscal year: 2022
- Amount: $900,000
- Award type: Standard Grant
Collaborative Research: SHF: Medium: Responsive Parallelism for Interactive Applications: Theory and Practice
- Award number: 2107280
- Fiscal year: 2021
- Amount: $900,000
- Award type: Continuing Grant
Collaborative Research: CNS CORE: Medium: A Unified Prefetch Framework for Approximation Tolerant Interactive Applications
- Award number: 2106197
- Fiscal year: 2021
- Amount: $900,000
- Award type: Standard Grant
Collaborative Research: CNS Core: Medium: A Unified Prefetch Framework for Approximation-Tolerant Interactive Applications
- Award number: 2140552
- Fiscal year: 2021
- Amount: $900,000
- Award type: Standard Grant
Collaborative Research: SHF: Medium: Responsive Parallelism for Interactive Applications: Theory and Practice
- Award number: 2107289
- Fiscal year: 2021
- Amount: $900,000
- Award type: Continuing Grant
Collaborative Research: CNS Core: Medium: A Unified Prefetch Framework for Approximation-Tolerant Interactive Applications
- Award number: 2105773
- Fiscal year: 2021
- Amount: $900,000
- Award type: Standard Grant
Collaborative Research: SHF: Medium: Responsive Parallelism for Interactive Applications: Theory and Practice
- Award number: 2107241
- Fiscal year: 2021
- Amount: $900,000
- Award type: Continuing Grant