The Project for the Corpus of Spontaneous Japanese Spoken by Non-Native Speakers
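Basic information
- Grant number: 17202011
- Principal investigator:
- Amount: $306,200
- Host institution:
- Host institution country: Japan
- Project category: Grant-in-Aid for Scientific Research (A)
- Fiscal year: 2005
- Funding country: Japan
- Project period: 2005 to 2006
- Project status: Completed
- Source:
- Keywords:
Project abstract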
The Corpus of Spontaneous Japanese Spoken by Non-Native Speakers is a large-scale annotated corpus for spoken-language research. Large, high-quality corpora have attracted attention as a new kind of data for language research. Most of them, however, target the speech of native speakers, and few cover non-native speakers. Our research focuses on non-native speech and has built a large-scale annotated corpus of it. The corpus can reveal phonetic features of non-native speech and provides useful data for research on interlanguage and sociolinguistics.

The corpus contains about 2,000 minutes of spontaneous speech, corresponding to about 360k words. All speech material was recorded with head-worn close-talking microphones and DAT, then down-sampled to 16 kHz with 16-bit precision. The material is transcribed using a two-way transcription scheme designed specifically for CSJ (the Corpus of Spontaneous Japanese).

Recorded speech is transcribed in two ways: orthographically and phonetically. In the orthographic transcription, speech is written in Kanji (Chinese logographs) and Kana (Japanese syllabary), as in ordinary Japanese text. Unlike ordinary Japanese writing, however, the orthographic transcription follows rigorous rules on the use of Kanji and Kana.

Part of the corpus is segment-labeled. The labels are basically phonemic, but some phonetic labels are also used; these were introduced for the study of phonetic variation and of phenomena specific to spontaneous speech. Five utterances are also intonation-labeled with X-JToBI. In the X-JToBI scheme, both the tone and BI (break index) labels were considerably extended to capture the paralinguistic features of spontaneous-speech intonation.
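The figures quoted in the abstract (about 2,000 minutes, about 360k words, 16 kHz, 16-bit) allow two rough derived estimates: the average word rate and the size of the audio as uncompressed PCM. The sketch below is only illustrative arithmetic. The function names are hypothetical, and the mono channel count is an assumption, since the abstract does not state it.

```python
# Back-of-the-envelope estimates from the numbers quoted in the abstract.
# Function names are hypothetical; CHANNELS = 1 (mono) is an assumption,
# because the abstract does not state the channel count.

MINUTES = 2000            # approximate duration of speech (from the abstract)
WORDS = 360_000           # approximate word count (from the abstract)
SAMPLE_RATE_HZ = 16_000   # sampling rate after down-sampling (from the abstract)
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 1              # assumption: mono


def words_per_minute(words: int, minutes: int) -> float:
    """Average word rate over the whole recording time, pauses included."""
    return words / minutes


def pcm_size_bytes(minutes: int, sample_rate_hz: int,
                   bytes_per_sample: int, channels: int) -> int:
    """Size of uncompressed PCM audio, ignoring any file headers."""
    seconds = minutes * 60
    return seconds * sample_rate_hz * bytes_per_sample * channels


rate = words_per_minute(WORDS, MINUTES)
size = pcm_size_bytes(MINUTES, SAMPLE_RATE_HZ, BYTES_PER_SAMPLE, CHANNELS)

print(f"{rate:.0f} words per minute on average")
print(f"{size / 1e9:.2f} GB of uncompressed PCM")
```

Under these assumptions the corpus averages roughly 180 words per minute and occupies a little under 4 GB as raw audio. Both numbers inherit the rounding of the abstract's "about" figures.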
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by TOKI Satoshi
Research Analysis and Further Advancement of "The Corpus of Spontaneous Japanese Spoken by Non-native Speakers" Project
- Grant number: 19320077
- Fiscal year: 2007
- Funding amount: $306,200
- Project category: Grant-in-Aid for Scientific Research (B)
JAPANESE LANGUAGE AND CULTURE IN MICRONESIA
- Grant number: 06041070
- Fiscal year: 1994
- Funding amount: $306,200
- Project category: Grant-in-Aid for International Scientific Research
A multi-dimensional study in the process and determining factors of acquisition of Japanese as a second language by immigrant workers
- Grant number: 06301099
- Fiscal year: 1994
- Funding amount: $306,200
- Project category: Grant-in-Aid for Scientific Research (A)
Elucidation of a Novel Metabolic Pathway of Morphine and Its Physiological Role
- Grant number: 61571078
- Fiscal year: 1986
- Funding amount: $306,200
- Project category: Grant-in-Aid for General Scientific Research (C)
Similar overseas grants
The neural underpinnings of speech and nonspeech auditory processing in autism: Implications for language
- Grant number: 10827051
- Fiscal year: 2024
- Funding amount: $306,200
- Project category:
An empirical study of the borrowing process of foreign words in Japanese from the perspective of English and French phonetics
- Grant number: 23K00549
- Fiscal year: 2023
- Funding amount: $306,200
- Project category: Grant-in-Aid for Scientific Research (C)
Understanding Individual Differences in Acoustic-Phonetic and Contextual Cue Use In Aging
- Grant number: 10750603
- Fiscal year: 2023
- Funding amount: $306,200
- Project category:
Identifying the presence of a code-switch: Evaluating the role of acoustic cues.
- Grant number: 10751509
- Fiscal year: 2023
- Funding amount: $306,200
- Project category:
High-resolution functional imaging of speech-induced sensory modulation
- Grant number: 10802563
- Fiscal year: 2023
- Funding amount: $306,200
- Project category:
Phonetics and phonology in third language acquisition: Discrimination of consonants in Chongqing Dialect, Standard Mandarin, and English
- Grant number: 2866221
- Fiscal year: 2023
- Funding amount: $306,200
- Project category: Studentship
Genomewide association studies in peri-implant bone loss
- Grant number: 10351903
- Fiscal year: 2022
- Funding amount: $306,200
- Project category:
Genomewide association studies in peri-implant bone loss
- Grant number: 10557131
- Fiscal year: 2022
- Funding amount: $306,200
- Project category:
ROAMM-EHR: Pilot Trial of a Real-Time Symptom Surveillance System for Post-Discharge Surgical Patients
- Grant number: 10641873
- Fiscal year: 2022
- Funding amount: $306,200
- Project category:
Doctoral Dissertation Research: Phonetics of period doubling
- Grant number: 2141433
- Fiscal year: 2022
- Funding amount: $306,200
- Project category: Standard Grant


















