EAGER: III: Learning with less data: Capitalizing on formal pedagogies and human performance to incorporate domain knowledge into deep learning models

Basic information

Award number:
2228910
Principal investigator:
Johanna Devaney
Amount:
$200,000
Institution:
CUNY Brooklyn College
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2022
Funding country:
United States
Start and end dates:
2022-09-01 to 2024-08-31
Project status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2228910&HistoricalAwards=false
Keywords:
EAGER III Learning less data

Project abstract

Humans are able to learn with greater efficiency than machine learning models, in large part because they learn not just from exposure, but also from domain knowledge, which includes codified knowledge and guided practice. This project will develop new approaches for integrating domain knowledge into deep learning models. It will create models that can be trained with less data as well as mitigate data biases (e.g., data collection that is skewed towards inducing a particular pattern that is not necessarily reflective of the range of ways humans perform a given task). This research will be explored within the musical domain, as it has rich pedagogical and performance traditions for skill generation that can be leveraged in model development. In addition, working with music is an excellent testbed for developing models that can be applied to other domains. For example, there are direct parallels between music and language in terms of pedagogy and practice. Broadly, the models developed in this project will have utility for scientists interested in modeling domains that are data-poor, but expertise-rich as well as counteracting known biases in training datasets. This work also has the potential to foster the participation of a wider range of scholars in computer science research, as expressions of their domain expertise would be more relevant to model development.
This project will demonstrate the value of incorporating domain knowledge into structured prediction for temporal deep learning models in complex domains using distillations of established pedagogies and expressions of skilled practice. Its goal is to help machines learn more efficiently by mimicking the ways in which humans learn, as well as to develop models with increased accuracy and interpretability. A central hypothesis underlying this project is that the types of pedagogies that are useful for efficiently teaching humans are also useful for teaching machines.
The project examines the research hypothesis through the task of reducing complex musical signals, i.e., digital representations of musical sound, into their essential structural components. Musical signals are particularly challenging to perform this type of reduction on because they are complex temporal signals with a metrical structure. Thus, they are a useful testbed for developing machine learning models for broader applications, most directly in natural language processing but also in other domains with complex temporal signals such as earth science and economics. The task of reducing musical signals will be addressed through three main sub-tasks. The first is model development, which will involve systematic experimentation while integrating domain knowledge as constraints in adversarial networks. The second is domain knowledge encoding, which will establish best practices for encoding pedagogical expertise and performance practice into a machine-readable format. And the third is an exploration of how this music-specific work can be applied to natural language understanding specifically and ultimately formulated as a generalized framework for integrating domain knowledge into deep learning models.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
