実用性の高いEnd-to-End音声認識に向けた研究

Research Toward Highly Practical End-to-End Speech Recognition

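Basic Information

Grant number:
22KJ2898
Principal investigator:
Yosuke Higuchi (樋口陽祐)
Amount:
$14,100
Host institution:
Waseda University
Host institution country:
Japan
Project category:
Grant-in-Aid for JSPS Fellows
Fiscal year:
2023
Funding country:
Japan
Project period:
2023-03-08 to 2024-03-31
Project status:
Completed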

Source:
https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-22KJ2898/
Keywords:
Speech recognition (音声認識)

Project Abstract

To improve the practicality of voice-based interfaces, this project develops fast, high-accuracy speech recognition technology. Earlier work built a non-autoregressive End-to-End speech recognition model based on a masked language model and showed that it matches the recognition accuracy of conventional models while greatly speeding up inference. This fiscal year, the project showed that incorporating a large-scale general-purpose language model into the proposed model further improves recognition accuracy. It also confirmed that the proposed model is effective for streaming speech recognition.

Generating accurate sentences in speech recognition requires capturing the dependencies between words, but these are hard to extract from speech information alone. To address this, the project devised a speech recognition method that builds the general-purpose linguistic knowledge obtained from BERT, a large-scale language model, into the speech processing pipeline, so that contextual information in the output is captured effectively. In speech recognition experiments covering a variety of languages, speaking styles, and training data sizes, the proposed method achieved higher recognition accuracy than conventional models. Combining it with the previously developed inference algorithm was also shown to greatly speed up recognition. These results were accepted at venues including the Findings of Empirical Methods in Natural Language Processing (EMNLP 2022), a major conference in natural language processing, and the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023), a major conference in speech processing.
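The abstract mentions non-autoregressive recognition based on a masked language model. The Python sketch below illustrates only the general mask-and-refill idea under stated assumptions; it is not the project's model or inference algorithm. All names (`MASK`, `mask_low_confidence`, `iterative_unmask`, `toy_predict`) are hypothetical, and `toy_predict` is a hard-coded stand-in for a trained masked-language-model decoder.

```python
import math
from typing import Callable, List, Tuple

MASK = "<mask>"  # hypothetical placeholder symbol for an undecided token

# A predictor returns one (token, confidence) proposal per position,
# given the current, partially masked sequence.
Predictor = Callable[[List[str]], List[Tuple[str, float]]]


def mask_low_confidence(tokens: List[str], confidences: List[float],
                        threshold: float) -> List[str]:
    """Replace tokens whose confidence is below the threshold with MASK."""
    return [t if c >= threshold else MASK for t, c in zip(tokens, confidences)]


def iterative_unmask(tokens: List[str], predict: Predictor,
                     num_iterations: int) -> List[str]:
    """Fill masked positions over at most num_iterations passes.

    Every pass predicts all positions at once (no left-to-right loop over
    tokens) and commits only the most confident masked positions.
    """
    if num_iterations < 1:
        raise ValueError("num_iterations must be at least 1")
    tokens = list(tokens)
    masked = [i for i, t in enumerate(tokens) if t == MASK]
    if not masked:
        return tokens
    per_pass = math.ceil(len(masked) / num_iterations)
    while masked:
        proposals = predict(tokens)
        # Most confident masked positions first.
        masked.sort(key=lambda i: proposals[i][1], reverse=True)
        for i in masked[:per_pass]:
            tokens[i] = proposals[i][0]
        masked = masked[per_pass:]
    return tokens


def toy_predict(tokens: List[str]) -> List[Tuple[str, float]]:
    """Hypothetical stand-in for a trained model: proposes a fixed sentence
    and is more confident when neighbouring tokens are already visible."""
    target = ["the", "cat", "sat", "on", "the", "mat"]
    proposals = []
    for i, tok in enumerate(target):
        neighbours = [tokens[j] for j in (i - 1, i + 1) if 0 <= j < len(tokens)]
        visible = sum(n != MASK for n in neighbours)
        proposals.append((tok, 0.5 + 0.25 * visible))
    return proposals


if __name__ == "__main__":
    draft = ["the", "cap", "sit", "on", "the", "mat"]   # made-up first-pass output
    confidences = [0.9, 0.4, 0.3, 0.95, 0.9, 0.8]       # made-up confidences
    partially_masked = mask_low_confidence(draft, confidences, threshold=0.5)
    print(iterative_unmask(partially_masked, toy_predict, num_iterations=2))
```

Because each pass predicts every position in parallel, the number of model calls is bounded by `num_iterations` rather than by the sentence length, which is the usual source of the speedup over token-by-token decoding.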

Project Outputs

Journal papers (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

End-to-End音声認識のための粒度の異なるサブワード単位に基づく階層的な条件づけ

Hierarchical Conditioning Based on Subword Units of Different Granularities for End-to-End Speech Recognition

DOI:
Published:
2021
Journal:
Impact factor:
0
Authors:
Hiroki Kato;Itsuki Musha;Masaaki Komatsuda;Kei Muto;Junichiro Yamaguchi;下田千華;下田千華;Yifan Peng;Brian Yan;Yosuke Higuchi;Yosuke Higuchi;Yosuke Higuchi;Yosuke Higuchi;Keqi Deng;Masao Someki;Yosuke Higuchi;趙懐博;Keqi Deng;樋口陽祐;樋口陽祐;Yosuke Higuchi;Hirofumi Inaguma;Pengcheng Guo;Shinji Watanabe;Yosuke Higuchi;Huaibo Zhao;Yosuke Higuchi;チョウカイハク;樋口陽祐
Corresponding author:
Yosuke Higuchi (樋口陽祐)

A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation

DOI:
10.1109/asru51503.2021.9688157
Published:
2021-10
Journal:
2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
Impact factor:
0
Authors:
Yosuke Higuchi;Nanxin Chen;Yuya Fujita;H. Inaguma;Tatsuya Komatsu;Jaesong Lee;Jumon Nozaki;Tianzi W
Corresponding author:
Yosuke Higuchi;Nanxin Chen;Yuya Fujita;H. Inaguma;Tatsuya Komatsu;Jaesong Lee;Jumon Nozaki;Tianzi W

Improving Non-Autoregressive End-to-End Speech Recognition with Pre-Trained Acoustic and Language Models

DOI:
10.1109/icassp43922.2022.9746316
Published:
2022-01
Journal:
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Impact factor:
0
Authors:
Keqi Deng;Zehui Yang;Shinji Watanabe;Yosuke Higuchi;Gaofeng Cheng;Pengyuan Zhang
Corresponding author:
Keqi Deng;Zehui Yang;Shinji Watanabe;Yosuke Higuchi;Gaofeng Cheng;Pengyuan Zhang

A Study on the Integration of Pre-Trained SSL, ASR, LM and SLU Models for Spoken Language Understanding

DOI:
Published:
2023
Journal:
Impact factor:
0
Authors:
Hiroki Kato;Itsuki Musha;Masaaki Komatsuda;Kei Muto;Junichiro Yamaguchi;下田千華;下田千華;Yifan Peng
Corresponding author:
Yifan Peng

The 2020 ESPnet Update: New Features, Broadened Applications, Performance Improvements, and Future Plans

DOI:
10.1109/dslw51110.2021.9523402
Published:
2020-12
Journal:
2021 IEEE Data Science and Learning Workshop (DSLW)
Impact factor:
0
Authors:
Shinji Watanabe;Florian Boyer;Xuankai Chang;Pengcheng Guo;Tomoki Hayashi;Yosuke Higuchi;Takaaki Hori;Wen-Chin Huang;H. Inaguma;Naoyuki Kamo;Shigeki Karita;Chenda Li;Jing Shi;A. Subramanian;Wangyou Zhang
Corresponding author:
Shinji Watanabe;Florian Boyer;Xuankai Chang;Pengcheng Guo;Tomoki Hayashi;Yosuke Higuchi;Takaaki Hori;Wen-Chin Huang;H. Inaguma;Naoyuki Kamo;Shigeki Karita;Chenda Li;Jing Shi;A. Subramanian;Wangyou Zhang