Mining, Intelligence and Automation in Tackling Machine-Learning Bugs

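Basic information

Grant number:
RGPIN-2021-03236
Principal investigator:
Rahman, MohammadMasudur
Amount:
$21,100
Host institution:
Dalhousie University
Host institution country:
Canada
Program category:
Discovery Grants Program - Individual
Fiscal year:
2022
Funding country:
Canada
Start and end dates:
2022-01-01 to 2023-12-31
Project status:
Completed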

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=750281
Keywords:
Mining; Intelligence; Automation; Tackling; Machine

Project abstract

Motivation and Objectives: According to Fortune Business Insights, Machine Learning (ML) software is projected to have a global market share of $117 billion by 2027. It has found applications in major areas including healthcare, transportation, and business analytics. However, the technology is yet to mature and can be unreliable. Bugs in ML software are highly complex, are hard to solve because of their data-driven, non-deterministic nature, and can have deadly consequences (e.g., the fatal crash of Uber's self-driving car). The standard procedure for correcting bugs is labour-intensive and inefficient, taking up ~50% of a developer's time. The majority of this time is spent finding and understanding the faulty code before making actual code changes. To date, many approaches have been designed to find and solve bugs in traditional software, but most are not accurate enough and are inadequate for ML software. My research program aims to design intelligent frameworks for understanding, finding, and reproducing bugs in ML software.

Research Plan and Methodology: This proposal encompasses three complementary activities: (1) advancing the current understanding of ML bugs, (2) finding bugs in ML software, and (3) reproducing the identified bugs for a reliable diagnosis. First, we will construct a large dataset from ML applications found on GitHub to systematically study the characteristics and central challenges of ML bugs and to analyze how effective traditional debugging solutions are against these bugs. Second, we will design an intelligent framework that can (a) detect faulty components using intelligent Information Retrieval methods, (b) detect the faulty code within these components using their static properties and dynamic behaviours, and (c) complement these results with meaningful explanations (e.g., type of bug). Third, we will design an intelligent framework that can (a) help a developer understand how a bug might be triggered, and (b) deliver appropriate test cases to reproduce the identified bugs in ML software using reinforcement learning and a technology sandbox.

Novelty and Expected Significance: This research program has three novel aspects: (a) intelligent debugging support for ML software, (b) extension of developers' cognitive abilities with machine intelligence, and (c) enrichment of tools' results with complementary information. It will advance the current state of research on cost-effective debugging and will also benefit parallel practices such as change management. My research will also produce tools that will be adopted by industry, such as through my collaborations with Mozilla Corporation and Canadian software companies. By supporting developers in solving ML bugs efficiently and by providing high-quality training to students in an area of acute need, this program will thus assist in the development of safe, reliable machine-learning software and significantly contribute to the Canadian economy.
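The abstract does not say which Information Retrieval method the proposed framework would use to detect faulty components. As a rough illustration of the general idea only, the sketch below ranks components by TF-IDF cosine similarity between a bug report and each component's text. This is a plain textbook baseline, not the project's method; the function names, file names, and sample text are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def rank_components(bug_report, components):
    """Rank components (name -> text) by TF-IDF cosine similarity to the report."""
    docs = {name: Counter(tokenize(text)) for name, text in components.items()}
    n = len(docs)
    # Document frequency: in how many components each term appears.
    df = Counter(term for counts in docs.values() for term in counts)

    def weigh(counts):
        # Smoothed IDF, so unseen or ubiquitous terms still get a finite weight.
        return {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
                for t, c in counts.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = (math.sqrt(sum(w * w for w in a.values()))
                * math.sqrt(sum(w * w for w in b.values())))
        return dot / norm if norm else 0.0

    query = weigh(Counter(tokenize(bug_report)))
    scores = {name: cosine(query, weigh(counts)) for name, counts in docs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical components of an ML project, described by a few keywords each.
components = {
    "data_loader.py": "load dataset batch shuffle normalize image tensor",
    "trainer.py": "optimizer learning rate loss gradient nan backward step",
    "metrics.py": "accuracy precision recall confusion matrix",
}
report = "Loss becomes NaN after a few steps when the learning rate is high"

ranked = rank_components(report, components)
print(ranked[0][0])
```

In this toy input only `trainer.py` shares terms with the report, so it ranks first. A purely lexical baseline like this cannot see data-dependent or non-deterministic behaviour, which fits the abstract's point that step (b) would draw on static properties and dynamic behaviours as well.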