CAREER: Statistical methods and algorithms for the analysis of combinatorial mass spectrometry data

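Basic information

Award number: 1845465
Principal investigator: Oliver Serang
Amount: $1.0525 million
Awardee institution: University of Montana
Institution country: United States
Award type: Continuing Grant
Fiscal year: 2019
Funding country: United States
Period: 2019-06-01 to 2022-02-28
Status: Completed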

Source: https://www.nsf.gov/awardsearch/showAward?AWD_ID=1845465&HistoricalAwards=false
Keywords: CAREER Statistical methods algorithms analysis

Project abstract

Mass spectrometry is a crucial modern research tool that allows analysis of the components of samples at several scales: nuclear, small chemicals and biological molecules. In biological research, mass spectrometry is used in the analysis of protein ("proteomics") and metabolic ("metabolomics") data, while in non-research areas it is deployed, for example, to detect bomb-associated chemicals in routine airport security screenings. This research addresses three unmet needs in the processing of the data from mass spectrometry machines: The first is statistical identification of proteins in a biological sample; this is important for understanding what makes cells different, e.g., what makes a skin cell different from a blood cell. The second is identification of which biological species are in a sample; this is crucial in applications such as, for example, enabling accurate and automated disease diagnostics. The third is finding the "alphabet" of basic molecular ingredients in a sample. This research addresses these aims by developing new algorithmic and statistical methods that can correctly separate the basic elements of a complex mixture. The researchers working on this project create mathematical tools that are implemented as researcher-friendly software tools for solving the listed problems. To help make the ideas more accessible to both scientific and non-scientific audiences, the researchers will create teaching modules and podcast episodes to explain how the algorithms work, and what math tricks were developed to break down the complexity of the problem so it is amenable to a useful solution.

Problems with combinatorial dependencies are ubiquitous in mass spectrometry. Symmetries in combinatorial dependencies can be exploited to construct special dynamic programming algorithms: convolution trees, fast numeric max-convolution, and other approaches, all of which were invented and developed by the researchers. The researchers will use and improve these symmetry-exploiting algorithms to implement superior mass spectrometry-based protein identification, species classification, and small molecule analysis. Convolution trees can be used to solve these problems in quasilinear time, and so they can be applied to a very large number of proteins, species, or small molecules (or to a large number of spectra from any of these problems). The researchers will construct a library of software implementations of these algorithms with permissive open source licensing for unrestricted academic and industrial use. As they further develop these combinatorial methods, the researchers will create a combinatorics curriculum for intuitively teaching these concepts to K-12 students and create podcast episodes explaining these ideas in an accessible manner.

The fruits of this research will be freely available at https://alg.cs.umt.edu/nsf-career.html.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
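
The abstract mentions convolution trees without showing what one computes. As a rough illustration only, and not the researchers' implementation, the Python sketch below covers just the forward (bottom-up) pass of the idea: the distribution of a sum of independent discrete variables is built by merging probability mass functions pairwise in a balanced tree. The function names `convolve` and `sum_distribution` and the example inputs are hypothetical. For brevity the sketch uses a naive quadratic convolution; the quasilinear behavior described in the abstract would depend on swapping in an FFT-based convolution, which is assumed here rather than shown.

```python
from typing import List


def convolve(p: List[float], q: List[float]) -> List[float]:
    """Distribution of X + Y for independent X ~ p and Y ~ q.

    Index k of each list holds the probability of the value k.
    Naive O(len(p) * len(q)) version; an FFT-based convolution
    would replace this in a performance-oriented implementation.
    """
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def sum_distribution(pmfs: List[List[float]]) -> List[float]:
    """Merge PMFs pairwise, layer by layer, like a balanced binary tree."""
    if not pmfs:
        return [1.0]  # an empty sum equals 0 with probability 1
    layer = [list(p) for p in pmfs]
    while len(layer) > 1:
        # Convolve neighbours (0,1), (2,3), ... to form the next layer up.
        merged = [convolve(layer[i], layer[i + 1])
                  for i in range(0, len(layer) - 1, 2)]
        if len(layer) % 2 == 1:
            merged.append(layer[-1])  # odd one out is carried up unchanged
        layer = merged
    return layer[0]


# Hypothetical input: three independent present/absent indicators,
# each present with probability 0.5.
indicators = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
total = sum_distribution(indicators)
print(total)  # probability that 0, 1, 2 or 3 indicators are present
```

The tree layout matters because every intermediate node stores the distribution of a partial sum, so a tree over n inputs has only about log2(n) layers. This sketch stops at the root and does not propagate any observed evidence back down to the individual inputs.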
