A Study on Mental State Estimation in Spoken Dialogue Based on a Hierarchical End-to-End Model

Basic Information

Grant number:
18J22864
Principal investigator:
Hirofumi Inaguma (稲熊寛文)
Amount:
$14,100
Host institution:
Kyoto University
Host institution country:
Japan
Project category:
Grant-in-Aid for JSPS Fellows
Fiscal year:
2018
Funding country:
Japan
Project period:
2018-04-25 to 2021-03-31
Project status:
Completed

Project Abstract

Continuing from the previous fiscal year, work focused on online streaming speech recognition, which operates in real time without waiting for the speaker to finish the utterance. The study addressed a problem in monotonic chunkwise attention (MoChA), a streaming end-to-end speech recognition model: at inference time, the model emits each word later than the moment the corresponding speech was actually uttered. To reduce this latency, a method called "CTC-synchronous training" was proposed, which uses alignment information obtained from a connectionist temporal classification (CTC) model. This work was accepted at Interspeech 2020, and was further compiled into a journal paper and submitted.

Non-autoregressive models were also studied as a way to speed up inference in end-to-end speech translation. Autoregressive models are accurate but slow at inference, while non-autoregressive models are fast but less accurate. To offset these respective weaknesses, a method was proposed in which the outputs obtained quickly from the non-autoregressive model are rescored by the autoregressive model; this work was accepted at ICASSP 2021. In addition, a method was proposed that uses two text-based machine translation models to distill knowledge obtained from both the source and target languages into a single end-to-end speech translation model. It was accepted at NAACL-HLT 2021, a top conference in natural language processing.
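The rescoring idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how candidates are generated or how scores are combined, so everything here is an assumption for illustration: the function names `rescore` and `toy_ar_score`, the toy scores, and the choice to pick the candidate with the highest autoregressive score are all hypothetical and are not the method from the paper.

```python
from typing import Callable, Sequence


def rescore(candidates: Sequence[str], ar_score: Callable[[str], float]) -> str:
    """Return the candidate that the (slower, more accurate) scorer rates highest.

    `candidates` stands in for hypotheses produced quickly by a
    non-autoregressive model; `ar_score` stands in for the log-probability
    assigned by an autoregressive model. Both are assumptions of this sketch.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=ar_score)


def toy_ar_score(hypothesis: str) -> float:
    """Hypothetical stand-in for an autoregressive model's log-probability."""
    table = {"the cat sat": -1.0, "the cat sad": -4.0}
    return table.get(hypothesis, -10.0)


# Hypotheses as a fast non-autoregressive model might emit them (made up).
nar_candidates = ["the cat sad", "the cat sat"]
best = rescore(nar_candidates, toy_ar_score)
print(best)
```

In this sketch the expensive scorer is called once per candidate rather than once per output token, which is where a speed advantage over full autoregressive decoding would come from under these assumptions.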

Project Outputs

Journal papers (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Improved Mask-CTC for Non-Autoregressive End-to-End ASR

DOI:
10.1109/icassp39728.2021.9414198
Published:
2020-10
Venue:
ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Impact factor:
0
Authors:
Yosuke Higuchi; H. Inaguma; Shinji Watanabe; Tetsuji Ogawa; Tetsunori Kobayashi
Corresponding authors:
Yosuke Higuchi; H. Inaguma; Shinji Watanabe; Tetsuji Ogawa; Tetsunori Kobayashi

Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation

DOI:
10.18653/v1/2021.naacl-main.150
Published:
2021-04
Venue:
Impact factor:
0
Authors:
H. Inaguma; T. Kawahara; Shinji Watanabe
Corresponding authors:
H. Inaguma; T. Kawahara; Shinji Watanabe

Minimum Latency Training Strategies for Streaming Sequence-to-Sequence ASR

DOI:
10.1109/icassp40776.2020.9054098
Published:
2020-04
Venue:
ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Impact factor:
0
Authors:
H. Inaguma; Yashesh Gaur; Liang Lu; Jinyu Li; Y. Gong
Corresponding authors:
H. Inaguma; Yashesh Gaur; Liang Lu; Jinyu Li; Y. Gong

Language Model Integration Based on Memory Control for Sequence to Sequence Speech Recognition

DOI:
10.1109/icassp.2019.8683380
Published:
2018-11
Venue:
ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Impact factor:
0
Authors:
Jaejin Cho; Shinji Watanabe; Takaaki Hori; M. Baskar; H. Inaguma; J. Villalba; N. Dehak
Corresponding authors:
Jaejin Cho; Shinji Watanabe; Takaaki Hori; M. Baskar; H. Inaguma; J. Villalba; N. Dehak

Johns Hopkins University (USA)