
Inductive re-construction of Japanese grammar and its application to Japanese language education based on the large scale extraction of Japanese formulaic sequences and its structural analyses


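Basic information

Grant number: 20H00096
Principal investigator: 芝野耕司 (Kohji Shibano)
Funding amount: USD 272,100
Host institution: Tokyo University of Foreign Studies
Host institution country: Japan
Project category: Grant-in-Aid for Scientific Research (A)
Fiscal year: 2020
Funding country: Japan
Project period: 2020-04-01 to 2025-03-31
Project status: Not yet concluded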

Project abstract

The project built a large-scale spoken-language corpus of more than 1.5 billion words and implemented, with MapReduce, the automatic extraction of formulaic sequences in spoken Japanese. The extraction uses the project's own consolidated contextualized word N-gram analysis, which generates every word-level N-gram from each sentence and treats the context of each N-gram as a list of line IDs. By extending this method with list abstraction and to the character level, and by applying consolidated contextualized character N-gram analysis together with adjacency-matrix-based structural analysis, the project aimed to transform Japanese language research into an inductive discipline grounded in actual language use, and to develop Japanese teaching materials from the large-scale corpus on the basis of this inductive understanding of Japanese.

The system and evaluation group incorporated list abstraction into the MapReduce-based consolidated contextualized N-gram analysis, thereby building a system for big-data processing and demonstrating list abstraction, which had previously been only at the idea stage. Because the extraction step keeps only the longest expression for any given context, each formulaic sequence was further segmented into shorter formulaic sequences by longest matching. This separates compound formulaic sequences from atomic formulaic sequences, which contain no other formulaic sequence (formulaic sequence analysis). The group also generated a formulaic sequence adjacency frequency matrix from the frequencies with which other expressions connect before and after each formulaic sequence.

The Japanese language education group examined the basic data produced by the system and evaluation group, namely the formulaic sequence analysis and the formulaic sequence adjacency frequency matrix, as key phrases. Using the teaching-material extraction system, the group also developed Japanese teaching materials for verification.

[Question 1] Last fiscal year the implementation was migrated from Ruby to Python. This fiscal year the consolidated contextualized N-gram analysis was extended from the word level to the character level. [Question 2] Each formulaic sequence obtained from last fiscal year's segmentation was split into the shorter units it contains, which led into the structural analysis. [Question 3] A needs survey of Japanese language education overseas will be conducted.
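As a rough illustration of the analysis described in the abstract, the following Python sketch generates all word N-grams per sentence, records each N-gram's context as a list of line IDs, keeps only the longest N-gram for each identical context, and then uses longest matching to tell compound sequences from atomic ones. It is a minimal in-memory stand-in for a MapReduce job, not the project's implementation: the function names, the `min_lines` threshold, the `max_n` limit, and the toy pre-tokenized sentences are all assumptions made for illustration.

```python
from collections import defaultdict


def word_ngrams(tokens, max_n):
    """Yield every contiguous word N-gram of length 1..max_n as a tuple."""
    for n in range(1, min(max_n, len(tokens)) + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def map_phase(lines, max_n):
    """Map step: emit one (ngram, line_id) pair per distinct N-gram per line."""
    for line_id, tokens in enumerate(lines):
        for gram in set(word_ngrams(tokens, max_n)):
            yield gram, line_id


def reduce_phase(pairs):
    """Reduce step: gather each N-gram's context as a sorted tuple of line IDs."""
    contexts = defaultdict(list)
    for gram, line_id in pairs:
        contexts[gram].append(line_id)
    return {gram: tuple(sorted(ids)) for gram, ids in contexts.items()}


def consolidate(contexts, min_lines=2):
    """Among N-grams sharing the same line-ID list, keep only the longest."""
    by_context = defaultdict(list)
    for gram, ids in contexts.items():
        if len(ids) >= min_lines:  # assumed threshold, not taken from the source
            by_context[ids].append(gram)
    kept = {}
    for ids, grams in by_context.items():
        longest = max(len(g) for g in grams)
        for g in grams:
            if len(g) == longest:
                kept[g] = ids
    return kept


def segment(seq, inventory):
    """Split seq left to right, preferring the longest shorter sequence in inventory."""
    shorter = {s for s in inventory if len(s) < len(seq)}
    parts, i = [], 0
    while i < len(seq):
        for n in range(len(seq) - i, 0, -1):
            if seq[i:i + n] in shorter:
                parts.append(seq[i:i + n])
                i += n
                break
        else:
            parts.append(seq[i:i + 1])  # no known sequence starts here
            i += 1
    return parts


def is_compound(seq, inventory):
    """A sequence is compound if its segmentation contains another known sequence."""
    return any(part in inventory and part != seq for part in segment(seq, inventory))


# Hypothetical, already tokenized toy corpus (one sentence per line).
lines = [
    ["よろしく", "お願い", "し", "ます"],
    ["どうぞ", "よろしく", "お願い", "し", "ます"],
    ["お願い", "し", "ます"],
]
formulaic = consolidate(reduce_phase(map_phase(lines, max_n=4)))
for seq, ids in sorted(formulaic.items(), key=lambda kv: -len(kv[0])):
    kind = "compound" if is_compound(seq, formulaic) else "atomic"
    print(" ".join(seq), list(ids), kind)
```

In a real MapReduce setting, `map_phase` and `reduce_phase` would run as distributed jobs keyed by N-gram. The character-level variant mentioned in the abstract would, under the same assumptions, pass each sentence as a list of characters instead of a list of words.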

Project outcomes

Journal articles (27)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Extracting Japanese Sentence-Ending Expressions using Formulaic Sequences with Consolidated Contextualized N-gram Analysis

DOI: not listed
Year of publication: 2023
Journal: not listed
Impact factor: 0
Authors: Hajime Mochizuki; Kohji Shibano
Corresponding author: Kohji Shibano

Mining Formulaic Sequences from a Spoken Japanese Based on Consolidated Contextualized N-gram Analyses and Its Verification with Key Phrases in Japanese Language Textbooks

DOI: not listed
Year of publication: 2022
Journal: not listed
Impact factor: 0
Authors: Hajime Mochizuki; Kohji Shibano
Corresponding author: Kohji Shibano

Recommendations for Capturing Learning in the Cognitive, Affective, and Psychomotor Domains : Clarifying Can-do Descriptors in Japanese Language Education

DOI: 10.15084/00003514
Year of publication: 2022
Journal: 国立国語研究所論集 = NINJAL Research Papers
Impact factor: 0
Authors: 能勢正仁; 池内有為; 浅川達人; 清水千弘; Yuji Utsumi; 鈴木美加
Corresponding author: 鈴木美加

『漫才ワークショップ』による学生の学び ―言語を相対的に捉えるネタ作りと即興創作体験―

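Student learning through a "Manzai Workshop": writing comedy material that views language in relative terms, and the experience of improvisational creation

DOI: not listed
Year of publication: 2023
Journal: Proceedings on The International Symposium on Japanese Language Education: Rediscovering Japanese - Japanese Language Education in the Spotlight 2022
Impact factor: 0
Authors: Mika Suzuki; Manabu Shimaoka
Corresponding author: Manabu Shimaoka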

日中対訳小説からみる文末名詞文の使用と説明のメカニズム―ノダ文と中心に―

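The use of sentence-final noun sentences and the mechanism of explanation as seen in Japanese-Chinese parallel-translation novels: focusing on noda sentences

DOI: not listed
Year of publication: 2022
Journal: not listed
Impact factor: 0
Authors: Okubo Ryo; Yoshioka Takashi; Nakaya Tomoki; Hanibuchi Tomoya; Okano Hiroki; Ikezawa Satoru; Tsuno Kanami; Murayama Hiroshi; Tabuchi Takahiro; 高甜; 佐野洋
Corresponding author: 高甜; 佐野洋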