
Administrative Supplement: Using machine learning to predict odor characteristics from molecular structure


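Basic Information

Grant number:
10405294
Principal investigator:
Emily Jo Mayhew
Amount:
$2,500 (listed as 0.25 in units of 10,000 USD)
Host institution:
MONELL CHEMICAL SENSES CENTER
Host institution country:
United States
Project category:
Fiscal year:
2020
Funding country:
United States
Project period:
2020-09-04 to 2022-09-03
Project status:
Completed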

Source:
https://reporter.nih.gov/project-details/10405294
Keywords:
Address Administrative Supplement Analytical Chemistry Characteristics Chemical Structure Chemicals Chemistry Classification Collection Complex Consumption Data Data Set Descriptor Development Evaluation Food Fruit Gas Chromatography Goals Gold Health Food Hour Human Human Resources Knowledge Learning Link Machine Learning Mass Fragmentography Measurement Measures Methods Modeling Molecular Structure National Institute on Deafness and Other Communication Disorders Odors Olfactory Pathways Palate Perception Positioning Attribute Procedures Programmed Learning Property Protocols documentation Psychophysics Quality Control Recipe Research Research Technics Resolution Resources Sampling Science Scientist Sensory Smell Perception Speed Stimulus Structure Testing Time Training Work base data quality design experience food science human subject improved machine learning algorithm model building predictive modeling prevent rapid technique skills sound

Project Summary

We cannot yet look at a chemical structure and predict if the molecule will have an odor, much less what character it will have. The goal of the proposed research is to apply machine learning to predict perceptual characteristics from chemical features of molecules. The specific aims of the proposal will determine (1) which molecules are odorous, and (2) what data are needed to model odor character. Building a highly predictive model requires two key ingredients: high-quality data and a sound modeling approach. High-quality data must be accurate (ratings are consistent and describe true odor properties) and detailed (ratings describe even small differences in odor properties). We have collected human psychophysical data on a diverse set of molecules and have trained a model to predict if a molecule has an odor, but pilot data identified odorous contaminants that limit model training and measurement of model accuracy. In Aim 1, I will apply my background in analytical chemistry to evaluate the accuracy of the data, using gas chromatography to identify and correct errors caused by chemical contaminants. In Aim 2, I will apply my experience in human sensory evaluation to measure and compare the consistency and the degree of detail in ratings that can be achieved with different sensory methods and subject training procedures. By executing my training plan, I will develop the skills in statistical programming and machine learning needed to employ a sound modeling approach to these problems. The model constructed in Aim 1 will enable prediction of odor classification (odor/odorless) for any molecule and thus define which molecules are perceptually relevant. Predicting odor character is a far more complex challenge – while a molecule can have only one of two odor classifications (odor or odorless) it may elicit any number of diverse odor character attributes (fruity, floral, musky, sweet, etc.).
Descriptive Analysis (DA) is the gold standard method for generating accurate and detailed sensory profiles, but this method is time-consuming. We estimate that an odor character dataset will be large enough (“model-ready”) to predict odor character with approximately 10,000 molecules and that it would require more than 30,000 hours of human subject evaluation, or approximately 6 years for the typical trained panel, to produce this dataset using DA. Before we invest the time and resources, it is responsible to evaluate the relative data quality of more rapid sensory methods. The results of Aim 2 are expected to determine the best approach for generating a model-ready dataset by quantifying trade-offs in degree of detail (data resolution), rating consistency, and method speed of five candidate sensory methods. Together, these aims represent a significant step forward in linking chemical recipe to human odor perception, an advancement that supports the NIDCD goal of understanding normal olfactory function (how stimulus relates to percept) and has many potential applications in foods (what composition of molecules should be present to produce a target aroma percept).
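
The Descriptive Analysis estimate in the summary (about 10,000 molecules, more than 30,000 hours, roughly 6 years) implies some simple ratios. The sketch below works them out in Python. It only rearranges the figures quoted in the summary; the function names and the `speedup` parameter are illustrative assumptions, not part of the proposal, and the summary gives no actual speed figures for the faster sensory methods.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the summary.
# All three constants are approximations or lower bounds stated in the text.
N_MOLECULES = 10_000   # approximate size of a "model-ready" dataset
TOTAL_HOURS = 30_000   # "more than 30,000 hours" of human subject evaluation
PANEL_YEARS = 6        # "approximately 6 years for the typical trained panel"


def hours_per_molecule(total_hours: float, n_molecules: int) -> float:
    """Average evaluation hours spent per molecule."""
    return total_hours / n_molecules


def hours_per_year(total_hours: float, years: float) -> float:
    """Evaluation hours per year implied by the stated duration."""
    return total_hours / years


def years_at_speedup(years: float, speedup: float) -> float:
    """Hypothetical duration if a method were `speedup` times faster than DA.

    `speedup` is an assumed input for illustration only.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return years / speedup


if __name__ == "__main__":
    print("hours per molecule:", hours_per_molecule(TOTAL_HOURS, N_MOLECULES))
    print("hours per year:", hours_per_year(TOTAL_HOURS, PANEL_YEARS))
    print("years at an assumed 3x speedup:", years_at_speedup(PANEL_YEARS, 3))
```

Under these figures, DA costs at least about 3 evaluation hours per molecule, which is the baseline against which the speed of the five candidate methods in Aim 2 would be compared.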
