CAREER: Personalized Speech Enhancement: Test-Time Adaptation Using No or Few Private Data

Basic Information

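Award number: 2046963
Principal investigator: Minje Kim
Amount: approximately $478,000
Institution: Indiana University
Institution country: United States
Award type: Continuing Grant
Fiscal year: 2021
Funding country: United States
Period: 2021-04-01 to 2026-03-31
Status: Ongoing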

Source: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2046963&HistoricalAwards=false
Keywords: CAREER Personalized Speech Enhancement Test

Project Abstract

Current general-purpose speech enhancement systems employ large models trained on big datasets of audio signals; these models are too bulky to run on small personal devices. A personalized model can be a resource-efficient solution because it focuses on a particular user and a specific test environment, for which a smaller model architecture can be good enough. However, training a personalized model requires clean voice data from the test-time user in advance, which are not always available because of the user's privacy concerns or problems with recording. This CAREER project develops machine-learning methods to achieve the personalization goal while requiring no or few data samples from the test-time users. Because the project achieves the personalization goal in a privacy-preserving and resource-efficient way, it is a step towards a more available and affordable use of artificial intelligence for all members of society.

The project circumvents the lack of personal data in the context of personalized speech enhancement using no- and few-shot learning frameworks with help from adversarial and self-supervised learning. First, it verifies that a personalized system with reduced computational complexity can still compete with a generic model in speech enhancement performance. To this end, the training algorithm divides the potentially large model into multiple sub-modules, each of which handles a particular sub-problem (e.g., a particular user's utterances). If the sub-problems are defined to be mutually exclusive, test-time inference can be made efficient by using only the most suitable sub-module. Since the sub-module selection is done on noisy speech, it achieves personalization with no additional training on the test user's data.

Second, the project explores a no-shot learning approach, in which the fundamental challenge lies in optimizing a machine learning model with no available target. To this end, an already-trained general-purpose model is fine-tuned for an unseen test environment using adversarial optimization.

The third research topic handles the case when a small amount of the user's clean speech is available, which falls in the category of few-shot learning. The project overcomes data shortage via a self-supervised learning method that learns effective features from noisy speech data, which are more readily available than clean data. That way, the model can be prepared for a subsequent fine-tuning step, which can be done with only a few clean user-specific speech utterances.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
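
The first research topic, routing each input to a single sub-module chosen from the noisy speech alone, can be illustrated with a toy sketch. It assumes, hypothetically, that each sub-module is a per-frequency-bin gain mask summarized by a prototype noisy spectrum, and that the selector simply picks the nearest prototype. The `Specialist` class, the cluster names, the numbers, and the distance-based selector are all illustrative assumptions, not the project's published method.

```python
"""Toy sketch of test-time sub-module selection for personalized enhancement.

Hypothetical assumptions (illustration only):
- Each specialist is a per-bin gain mask trained offline for one cluster of
  speakers/conditions, summarized by a prototype noisy magnitude spectrum.
- Selection compares the observed noisy spectrum with each prototype, so no
  training on the test user's data is needed.
"""
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Specialist:
    name: str
    prototype: List[float]  # average noisy magnitude spectrum of its cluster
    gains: List[float]      # per-bin gain mask in [0, 1]

    def enhance(self, noisy_frame: Sequence[float]) -> List[float]:
        # Apply this specialist's mask bin by bin.
        return [g * x for g, x in zip(self.gains, noisy_frame)]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_spectrum(frames: Sequence[Sequence[float]]) -> List[float]:
    # Average each frequency bin over all frames.
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def select_specialist(specialists: Sequence[Specialist],
                      noisy_frames: Sequence[Sequence[float]]) -> Specialist:
    # Only the noisy input is used: pick the nearest prototype.
    observed = mean_spectrum(noisy_frames)
    return min(specialists, key=lambda s: squared_distance(s.prototype, observed))


def enhance_utterance(specialists: Sequence[Specialist],
                      noisy_frames: Sequence[Sequence[float]]
                      ) -> Tuple[str, List[List[float]]]:
    chosen = select_specialist(specialists, noisy_frames)
    # Only the chosen specialist runs, so cost is that of one small model.
    return chosen.name, [chosen.enhance(f) for f in noisy_frames]


if __name__ == "__main__":
    experts = [
        Specialist("cluster-A", [4.0, 2.0, 1.0, 0.5], [1.0, 0.8, 0.4, 0.1]),
        Specialist("cluster-B", [1.0, 2.0, 4.0, 2.0], [0.2, 0.6, 1.0, 0.7]),
    ]
    noisy = [[3.8, 2.1, 1.2, 0.6], [4.2, 1.9, 0.9, 0.4]]
    name, enhanced = enhance_utterance(experts, noisy)
    print(name)  # cluster-A
```

Because only the selected specialist's mask is applied, the per-frame cost stays that of one small sub-module no matter how many specialists exist. Nearest-prototype matching is just one cheap selection rule, chosen here for brevity.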

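For the few-shot case, the step after self-supervised pretraining is ordinary supervised fine-tuning on a handful of the user's clean utterances. The sketch below shows only that fine-tuning step, under hypothetical assumptions: the model is a per-bin gain vector rather than a neural network, `pretrained` stands in for whatever the self-supervised stage produces (that stage is not shown), the loss is mean squared error on magnitude frames, and the `lr`/`steps` values and data pairs are made up.

```python
"""Toy sketch of few-shot fine-tuning from a pretrained starting point.

Hypothetical assumptions: a per-bin gain vector replaces a neural network,
the pretrained gains stand in for the self-supervised stage's output, and
the loss is mean squared error between masked noisy and clean magnitudes.
"""
from typing import List, Sequence, Tuple

Pair = Tuple[Sequence[float], Sequence[float]]  # (noisy_frame, clean_frame)


def mse(gains: Sequence[float], pairs: Sequence[Pair]) -> float:
    # Mean squared error over all bins of all user pairs.
    total, count = 0.0, 0
    for noisy, clean in pairs:
        for g, x, s in zip(gains, noisy, clean):
            total += (g * x - s) ** 2
            count += 1
    return total / count


def fine_tune(pretrained: Sequence[float], pairs: Sequence[Pair],
              lr: float = 0.01, steps: int = 200) -> List[float]:
    gains = list(pretrained)  # start from the pretrained model, not scratch
    n = len(pairs)
    for _ in range(steps):
        grads = [0.0] * len(gains)
        for noisy, clean in pairs:
            for k, (x, s) in enumerate(zip(noisy, clean)):
                # d/dg of (g*x - s)^2 is 2*x*(g*x - s); averaged over pairs.
                grads[k] += 2.0 * x * (gains[k] * x - s) / n
        # Gradient step, then clip so the mask stays in [0, 1].
        gains = [min(1.0, max(0.0, g - lr * d)) for g, d in zip(gains, grads)]
    return gains


if __name__ == "__main__":
    user_pairs = [([2.0, 1.0, 4.0], [1.8, 0.2, 1.0]),
                  ([1.0, 2.0, 2.0], [0.9, 0.4, 0.5])]
    start = [0.5, 0.5, 0.5]
    adapted = fine_tune(start, user_pairs)
    print(mse(start, user_pairs), mse(adapted, user_pairs))
```

With these made-up pairs, the clean-to-noisy ratio in every bin is the same across both pairs (0.9, 0.2, 0.25), so the adapted gains converge toward those values and the loss drops toward zero after only two user examples.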

Project Outcomes

Journal articles (9)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Efficient Personalized Speech Enhancement Through Self-Supervised Learning

DOI: 10.1109/jstsp.2022.3181782
Published: 2021-04
Venue: IEEE Journal of Selected Topics in Signal Processing
Impact factor: 7.5
Authors: Aswin Sivaraman; Minje Kim
Corresponding authors: Aswin Sivaraman; Minje Kim

The Potential of Neural Speech Synthesis-Based Data Augmentation for Personalized Speech Enhancement

DOI: 10.1109/icassp49357.2023.10096601
Published: 2022-11
Venue: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Impact factor: 0
Authors: Anastasia Kuznetsova; Aswin Sivaraman; Minje Kim
Corresponding authors: Anastasia Kuznetsova; Aswin Sivaraman; Minje Kim

Bloom-Net: Blockwise Optimization for Masking Networks Toward Scalable and Efficient Speech Enhancement

DOI: 10.1109/icassp43922.2022.9746767
Published: 2021-11
Venue: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Impact factor: 0
Authors: Sunwoo Kim; Minje Kim
Corresponding authors: Sunwoo Kim; Minje Kim

Personalized Speech Enhancement through Self-Supervised Data Augmentation and Purification

DOI: 10.21437/interspeech.2021-1868
Published: 2021-04
Venue: Interspeech 2021
Impact factor: 0
Authors: Aswin Sivaraman; Sunwoo Kim; Minje Kim
Corresponding authors: Aswin Sivaraman; Sunwoo Kim; Minje Kim

Test-Time Adaptation Toward Personalized Speech Enhancement: Zero-Shot Learning with Knowledge Distillation

DOI: 10.1109/waspaa52581.2021.9632771
Published: 2021-05
Venue: 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
Impact factor: 0
Authors: Sunwoo Kim; Minje Kim
Corresponding authors: Sunwoo Kim; Minje Kim

Other publications by Minje Kim

Generative De-Quantization for Neural Speech Codec via Latent Diffusion

DOI: 10.48550/arxiv.2311.08330
Published: 2023
Venue: ArXiv
Impact factor: 0
Authors: Haici Yang; Inseon Jang; Minje Kim
Corresponding author: Minje Kim

Does Restricting the Entry of Formula Businesses Help Mom-and-Pop Stores? The Case of Small American Towns With Unique Community Character

DOI:
Published: 2021
Venue:
Impact factor: 0
Authors: Minje Kim; Tingyu Zhou
Corresponding author: Tingyu Zhou

Collaborative Deep Learning for speech enhancement: A run-time model selection method using autoencoders

DOI: 10.1109/icassp.2017.7952121
Published: 2017-03
Venue: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Impact factor: 0
Authors: Minje Kim
Corresponding author: Minje Kim