Self-supervised graph-based representation for language and speaker detection
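Basic information
- Grant number: 21K17776
- Principal investigator:
- Amount: $29,100
- Host institution:
- Host institution country: Japan
- Project category: Grant-in-Aid for Early-Career Scientists
- Fiscal year: 2021
- Funding country: Japan
- Period: 2021-04-01 to 2024-03-31
- Project status: Completed
- Source:
- Keywords:
Project abstract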
I focused on investigating how to better represent speech signals for both language recognition and speech recognition tasks. In detail, the following work was done to advance this project:
1. Improving the representation of speech signals for language identification (LID): We proposed a novel transducer-based language embedding approach for LID tasks by integrating an RNN transducer model into a language embedding framework. Benefiting from the RNN transducer's linguistic representation capability, the proposed method can exploit both phonetically-aware acoustic features and explicit linguistic features for LID tasks. The research paper was accepted by Interspeech 2022. Additionally, we further investigated these techniques on the NICT LID system, which also demonstrated robustness on cross-channel data.
2. Improving RNN-T for Mandarin ASR: I proposed a novel pronunciation-aware unique character encoding for building end-to-end RNN-T-based Mandarin ASR systems. The proposed encoding combines a pronunciation-based syllable with a character index (CI). By introducing the CI, the RNN-T model can overcome the homophone problem while still using pronunciation information to extract modeling units. With the proposed encoding, the model outputs can be converted into the final recognition result through a one-to-one mapping. This paper was accepted by IEEE SLT 2022.
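As a rough illustration of the syllable-plus-character-index encoding described in the abstract, the Python sketch below gives each character a unique token built from its syllable and an index that separates homophones. The toy lexicon, the `syllable_index` token format, the rule of assigning the CI by order of appearance, and the assumption of one pronunciation per character are all made up for this example and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical toy lexicon: character -> toned pinyin syllable.
# Assumption for this sketch: each character has exactly one pronunciation.
TOY_LEXICON = {
    "是": "shi4",
    "事": "shi4",
    "市": "shi4",
    "十": "shi2",
    "马": "ma3",
    "码": "ma3",
}


def build_encoding(lexicon):
    """Give each character a unique token of the form '<syllable>_<CI>'.

    The character index (CI) counts characters that share a syllable,
    in lexicon order, so homophones receive distinct tokens.
    """
    next_index = defaultdict(int)  # next free CI for each syllable
    char_to_token = {}
    for char, syllable in lexicon.items():
        ci = next_index[syllable]
        next_index[syllable] += 1
        char_to_token[char] = f"{syllable}_{ci}"
    # Tokens are unique, so the inverse mapping is one-to-one.
    token_to_char = {token: char for char, token in char_to_token.items()}
    return char_to_token, token_to_char


def encode(text, char_to_token):
    """Map a character string to a list of syllable+CI tokens."""
    return [char_to_token[char] for char in text]


def decode(tokens, token_to_char):
    """Map tokens back to characters through the one-to-one table."""
    return "".join(token_to_char[token] for token in tokens)


char_to_token, token_to_char = build_encoding(TOY_LEXICON)
tokens = encode("十是事", char_to_token)
print(tokens)
print(decode(tokens, token_to_char))
```

In this sketch the homophones 是 and 事 share the syllable `shi4` but receive different indices, so decoding needs only a table lookup and no separate disambiguation step.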
Project outcomes
Journal articles (4)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Transducer-based language embedding for spoken language identification
- DOI: 10.48550/arxiv.2204.03888
- Published: 2022-04
- Journal:
- Impact factor: 0
- Authors: Peng Shen; Xugang Lu; H. Kawai
- Corresponding authors: Peng Shen; Xugang Lu; H. Kawai
Partial Coupling of Optimal Transport for Spoken Language Identification
- DOI: 10.48550/arxiv.2203.17036
- Published: 2022-03
- Journal:
- Impact factor: 0
- Authors: Xugang Lu; Peng Shen; Yu Tsao; H. Kawai
- Corresponding authors: Xugang Lu; Peng Shen; Yu Tsao; H. Kawai
Siamese Neural Network with Joint Bayesian Model Structure for Speaker Verification
- DOI:
- Published: 2021
- Journal:
- Impact factor: 0
- Authors: Lu Xugang; Shen Peng; Tsao Yu; Kawai Hisashi
- Corresponding author: Kawai Hisashi
Coupling a Generative Model With a Discriminative Learning Framework for Speaker Verification
- DOI: 10.1109/taslp.2021.3129360
- Published: 2021-01
- Journal:
- Impact factor: 0
- Authors: Xugang Lu; Peng Shen; Yu Tsao; H. Kawai
- Corresponding authors: Xugang Lu; Peng Shen; Yu Tsao; H. Kawai
Other grants by 沈 鵬
An investigation of generative acoustic latent representations for meeting speech recognition and summarization
- Grant number: 24K15004
- Fiscal year: 2024
- Funding amount: $29,100
- Project category: Grant-in-Aid for Scientific Research (C)
Similar overseas grants
An investigation of generative acoustic latent representations for meeting speech recognition and summarization
- Grant number: 24K15004
- Fiscal year: 2024
- Funding amount: $29,100
- Project category: Grant-in-Aid for Scientific Research (C)
Disrupter or enabler? Assessing the impact of using automatic speech recognition technology in interpreter-mediated legal proceedings
- Grant number: 2889440
- Fiscal year: 2023
- Funding amount: $29,100
- Project category: Studentship
Analysis of speech recognition as a tool in medical English education
- Grant number: 23K00767
- Fiscal year: 2023
- Funding amount: $29,100
- Project category: Grant-in-Aid for Scientific Research (C)
Automatic Speech Recognition (ASR) engine to improve autistic children speech
- Grant number: 10056712
- Fiscal year: 2023
- Funding amount: $29,100
- Project category: Grant for R&D
Industrial research into the reduction of biases in foundational Automatic Speech Recognition models.
- Grant number: 10068091
- Fiscal year: 2023
- Funding amount: $29,100
- Project category: Collaborative R&D
M3OLR: Towards Effective Multilingual, Multimodal and Multitask Oriental Low-resourced Language Speech Recognition
- Grant number: 23K11227
- Fiscal year: 2023
- Funding amount: $29,100
- Project category: Grant-in-Aid for Scientific Research (C)
Establishment of intraoperative education model using speech recognition and language information processing technology
- Grant number: 23K16281
- Fiscal year: 2023
- Funding amount: $29,100
- Project category: Grant-in-Aid for Early-Career Scientists
SaTC: CORE: Small: Robust Speaker and Speech Recognition Under AI-Driven Physical and Digital Attacks
- Grant number: 2310207
- Fiscal year: 2023
- Funding amount: $29,100
- Project category: Continuing Grant
A State-of-the-Art Automatic Speech Recognition and Conversational Platform to Enable Socially Assistive Robots for Persons with Alzheimer's Disease and Related Dementias
- Grant number: 10699887
- Fiscal year: 2023
- Funding amount: $29,100
- Project category:
CRCNS US-Spain Research Proposal: Collaborative Research: Tracking and modeling the neurobiology of multilingual speech recognition
- Grant number: 2207770
- Fiscal year: 2022
- Funding amount: $29,100
- Project category: Continuing Grant