Mining Software Test Histories to Identify Flaky Tests and Model Commit Failure Risk
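Basic information
- Grant number: RGPIN-2020-06807
- Principal investigator:
- Amount: $21,100
- Host institution:
- Host institution country: Canada
- Project category: Discovery Grants Program - Individual
- Fiscal year: 2021
- Funding country: Canada
- Duration: 2021-01-01 to 2022-12-31
- Project status: Completed
- Source:
- Keywords:
Project abstract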
My research begins by listening to the daily struggles of software developers and ultimately provides theoretically grounded, general solutions to common software testing problems.

Objective 1, Flaky tests: Flaky software tests fail without identifying a fault; they waste developer time and reduce confidence in test outcomes. Flaky tests affect all projects and are particularly damaging on large projects or in complex test environments. The conventional practice is to re-run flaky tests, or to quarantine them and require developers to fix them. However, I have observed that tests vary in their degree of flakiness. For example, a test that has 1 flaky failure in 100 runs is more reliable than a test that has 10 flaky failures in 100 runs. By quantifying the historical FlakeRate, e.g. 10/100, we can use statistical tests to determine whether a test is failing at its normal FlakeRate, 10%, or whether its FlakeRate has changed, indicating instability and a potential product fault. My objective is to quantify the degree of test flakiness from prior runs and to statistically identify changes in the FlakeRate as indications that a test has become unstable. Tool support will allow developers to rank tests by FlakeRate instability. This should reduce investigation effort, because flaky failures without a statistically significant change in FlakeRate are stable and less likely to need investigation.

Objective 2, Commit failure risk: A common development practice is to test each commit individually so that the commit that caused a test failure is isolated immediately. Since most tests do not fail and most commits pass all tests, we propose to batch commits based on how likely a commit is to cause a test failure. A batch that contains 10 commits will require 1 run of the test suite when all tests pass, saving 9 executions compared with testing each commit individually. However, on failure, a bisection process will require additional runs. We will adapt commit risk models that identify bug-introducing changes to select low-risk commits for inclusion in large batches and high-risk commits that should be tested individually. The adaptation and development of these statistical approaches is complex, and evaluating them in production CI environments requires systematic rigor.

Objective 3, Generalize and Tools: In the long term, I will increase the use of software analytics by software testers and developers by simulating and demonstrating improved efficiency in a wide range of software environments. In my experience, developers have a single focus: reducing faults and releasing software on time. This focus has limited the use of mining and analytics in practice. By focusing on developer problems, I have been able to, and will continue to, provide general solutions to time-consuming, unpleasant, and expensive development problems.
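As an illustration of the FlakeRate idea in Objective 1, the sketch below checks whether a recent window of runs is consistent with a test's historical FlakeRate. The abstract does not name a specific statistical test; the one-sided exact binomial test, the function names, and the 0.05 significance threshold are all assumptions chosen for illustration. The sketch also treats the historical rate as a known constant and ignores the uncertainty in estimating it.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def flake_rate_changed(hist_failures: int, hist_runs: int,
                       recent_failures: int, recent_runs: int,
                       alpha: float = 0.05):
    """Flag a test whose recent failures exceed its historical FlakeRate.

    Hypothetical helper: returns (is_unstable, p_value), where p_value is
    the probability of seeing at least `recent_failures` failures in
    `recent_runs` runs if the test still fails at its historical rate.
    """
    flake_rate = hist_failures / hist_runs  # e.g. 10/100 = 0.10
    p_value = binomial_tail(recent_failures, recent_runs, flake_rate)
    return p_value < alpha, p_value


if __name__ == "__main__":
    # Historical FlakeRate of 10/100, followed by two different recent windows.
    print(flake_rate_changed(10, 100, 2, 20))  # 2 failures in 20 runs
    print(flake_rate_changed(10, 100, 8, 20))  # 8 failures in 20 runs
```

Under these assumptions, 2 failures in 20 recent runs is in line with a 10% FlakeRate and would not be flagged, whereas 8 failures in 20 runs would be flagged as a change worth investigating.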
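The batching arithmetic in Objective 2 can be sketched as a small run-counting model. This is a simplification under stated assumptions, not the project's method: a batch is assumed to fail exactly when it contains at least one failure-introducing commit, any sub-batch is assumed to be testable on its own, and both halves are re-tested on failure without further optimization. The function name and the boolean encoding of commits are hypothetical.

```python
from typing import Sequence


def runs_with_bisection(commits: Sequence[bool]) -> int:
    """Count test-suite runs needed to test a batch and isolate culprits.

    commits[i] is True if commit i introduces a failure (hypothetical
    encoding). The whole batch is run once; if it fails and holds more
    than one commit, it is split in half and each half is tested
    recursively.
    """
    if not commits:
        return 0
    runs = 1  # one run of the suite on the whole batch
    if not any(commits) or len(commits) == 1:
        # Either the batch passed, or a single failing commit is isolated.
        return runs
    mid = len(commits) // 2
    return runs + runs_with_bisection(commits[:mid]) + runs_with_bisection(commits[mid:])


if __name__ == "__main__":
    clean = [False] * 10
    one_bad = [False] * 10
    one_bad[3] = True
    print(runs_with_bisection(clean))    # all commits pass
    print(runs_with_bisection(one_bad))  # one failing commit in the batch
    print(len(clean))                    # cost of testing each commit individually
```

In this model a clean batch of 10 commits costs 1 run instead of 10, matching the saving of 9 executions described in the abstract, while a batch with a failing commit costs extra runs for bisection. That trade-off is why the batch composition would depend on each commit's estimated risk.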
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Rigby, Peter
Mining Software Test Histories to Identify Flaky Tests and Model Commit Failure Risk
- Grant number: RGPIN-2020-06807
- Fiscal year: 2022
- Funding amount: $21,100
- Project category: Discovery Grants Program - Individual
Mining Software Test Histories to Identify Flaky Tests and Model Commit Failure Risk
- Grant number: RGPIN-2020-06807
- Fiscal year: 2020
- Funding amount: $21,100
- Project category: Discovery Grants Program - Individual
Contemporary Software Peer Review: Modern practices, fault prediction, and extraction of design decisions
- Grant number: 435674-2013
- Fiscal year: 2019
- Funding amount: $21,100
- Project category: Discovery Grants Program - Individual
Contemporary Software Peer Review: Modern practices, fault prediction, and extraction of design decisions
- Grant number: 435674-2013
- Fiscal year: 2018
- Funding amount: $21,100
- Project category: Discovery Grants Program - Individual
Test Effectiveness, Localization, Prioritization, and Risk in Ericsson's Complex Test Environment
- Grant number: 502012-2016
- Fiscal year: 2018
- Funding amount: $21,100
- Project category: Collaborative Research and Development Grants
Contemporary Software Peer Review: Modern practices, fault prediction, and extraction of design decisions
- Grant number: 435674-2013
- Fiscal year: 2017
- Funding amount: $21,100
- Project category: Discovery Grants Program - Individual
Test Effectiveness, Localization, Prioritization, and Risk in Ericsson's Complex Test Environment
- Grant number: 502012-2016
- Fiscal year: 2017
- Funding amount: $21,100
- Project category: Collaborative Research and Development Grants
Test Effectiveness, Localization, Prioritization, and Risk in Ericsson's Complex Test Environment
- Grant number: 502012-2016
- Fiscal year: 2016
- Funding amount: $21,100
- Project category: Collaborative Research and Development Grants
Test Prioritization and Localization at Ericsson
- Grant number: 485041-2015
- Fiscal year: 2015
- Funding amount: $21,100
- Project category: Engage Grants Program
The Impact of Disruptive Events on Software Systems
- Grant number: 445741-2012
- Fiscal year: 2014
- Funding amount: $21,100
- Project category: Department of National Defence / NSERC Research Partnership
Similar overseas grants
Mining Software Test Histories to Identify Flaky Tests and Model Commit Failure Risk
- Grant number: RGPIN-2020-06807
- Fiscal year: 2022
- Funding amount: $21,100
- Project category: Discovery Grants Program - Individual
Automatically Analysing and Improving Software Test Suite Health
- Grant number: 2784413
- Fiscal year: 2022
- Funding amount: $21,100
- Project category: Studentship
Development of an online platform with new software incorporated that enables small businesses in B2B to test marketing ideas via peer feedback, providing objective experienced opinions, reducing marketing budget wastage
- Grant number: 10013825
- Fiscal year: 2021
- Funding amount: $21,100
- Project category: Responsive Strategy and Planning
CISE Core: CCF: SHF: Small: Future-Proof Test Corpus Synthesis for Evolving Software
- Grant number: 2120955
- Fiscal year: 2021
- Funding amount: $21,100
- Project category: Standard Grant
pleioR: A powerful and fast test and software for the study of pleiotropy in systems involving many traits with biobank-sized data
- Grant number: 10187158
- Fiscal year: 2021
- Funding amount: $21,100
- Project category:
pleioR: A powerful and fast test and software for the study of pleiotropy in systems involving many traits with biobank-sized data
- Grant number: 10424541
- Fiscal year: 2021
- Funding amount: $21,100
- Project category:
Mining Software Test Histories to Identify Flaky Tests and Model Commit Failure Risk
- Grant number: RGPIN-2020-06807
- Fiscal year: 2020
- Funding amount: $21,100
- Project category: Discovery Grants Program - Individual
microCT test system with nanofocus X-ray source including software package for 3D-reconstruction
- Grant number: 451111116
- Fiscal year: 2020
- Funding amount: $21,100
- Project category: Major Research Instrumentation
Develop test scripts to verify various web-based components of the TUNet Control Center software.
- Grant number: 537514-2018
- Fiscal year: 2019
- Funding amount: $21,100
- Project category: Experience Awards (previously Industrial Undergraduate Student Research Awards)
C/C++ Test Software Developer
- Grant number: 537114-2018
- Fiscal year: 2019
- Funding amount: $21,100
- Project category: Experience Awards (previously Industrial Undergraduate Student Research Awards)