Construction of large scale annotated corpus and its management system
Basic information
- Grant number: 12480082
- Amount: $89,600
- Host institution country: Japan
- Project category: Grant-in-Aid for Scientific Research (B)
- Fiscal year: 2000
- Funding country: Japan
- Project period: 2000 to 2002
- Project status: Completed
Project abstract
Since the mid-1980s, natural language processing based on large-scale linguistic data has become mainstream in this research area. Linguistic resources play an important role in this kind of research, and there have been many attempts to create various kinds of resources. This research project aims to construct an environment for creating syntactically annotated Japanese corpora on a large scale. To achieve this goal, we conducted research on the following topics.

In 2000, we built an annotation tool that helps a user annotate the syntactic structure of sentences interactively. The tool works with an existing parser, and the user can efficiently select the correct syntactic structure from among the parser's outputs. In addition, the tool guides the user by suggesting the order in which to make choices; following this order, the user can annotate sentences efficiently.

In 2001, we extracted grammar rules from the EDR corpus, one of the largest existing Japanese corpora. The drawback of the EDR corpus is that the grammar on which its annotation is based is missing. We therefore first extracted the grammar from the EDR corpus automatically and then improved it so that its ambiguity became as small as possible. In addition, we proposed a new framework for building semantic knowledge, which plays an important role not only in semantic analysis but also in syntactic analysis. Because it is difficult to build semantic knowledge from scratch, we took the approach of combining existing semantic knowledge.

In 2002, we continued to work on the two topics started in 2001. In addition, we constructed a management system for annotated corpora. This system allows users to retrieve various kinds of syntactic structures efficiently. The structures in a sentence are stored in a relational database system, giving users versatile retrieval capability.

To verify the results of the above research, we built a Japanese corpus of about 20,000 sentences. The sentence set is an excerpt from the EDR corpus. The corpus is based on the grammar extracted from the EDR corpus and improved in this project. It was annotated with the annotation tool developed in this project, and the resulting corpus was managed by the system mentioned above.
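The abstract does not describe the database schema or the query interface of the management system. As a rough illustration only, the idea of storing syntactic structures in a relational database and retrieving them could be sketched as follows. The table layout (`node` with a parent pointer), the function names, and the toy English trees are all assumptions made for this sketch, not details of the project's system.

```python
import itertools
import sqlite3


def build_db():
    """Create an in-memory database with a hypothetical one-row-per-node schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node ("
        " sent_id INTEGER, node_id INTEGER, parent_id INTEGER,"
        " label TEXT, word TEXT,"
        " PRIMARY KEY (sent_id, node_id))"
    )
    return conn


def store_tree(conn, sent_id, tree):
    """Store a tree given as nested tuples.

    Internal nodes are (label, child, child, ...); leaves are (label, word).
    """
    ids = itertools.count()

    def walk(node, parent_id):
        node_id = next(ids)
        label, rest = node[0], node[1:]
        if len(rest) == 1 and isinstance(rest[0], str):
            # Leaf node: keep the surface word.
            conn.execute(
                "INSERT INTO node VALUES (?, ?, ?, ?, ?)",
                (sent_id, node_id, parent_id, label, rest[0]),
            )
        else:
            # Internal node: no word, then recurse into the children.
            conn.execute(
                "INSERT INTO node VALUES (?, ?, ?, ?, NULL)",
                (sent_id, node_id, parent_id, label),
            )
            for child in rest:
                walk(child, node_id)

    walk(tree, None)


def find_parent_child(conn, parent_label, child_label):
    """Return ids of sentences containing a parent_label node directly above a child_label node."""
    rows = conn.execute(
        "SELECT DISTINCT p.sent_id FROM node p JOIN node c"
        " ON c.sent_id = p.sent_id AND c.parent_id = p.node_id"
        " WHERE p.label = ? AND c.label = ? ORDER BY p.sent_id",
        (parent_label, child_label),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_db()
    store_tree(conn, 1, ("S", ("NP", ("N", "dog")), ("VP", ("V", "barks"))))
    store_tree(conn, 2, ("S", ("NP", ("ADJ", "big"), ("N", "dog")), ("VP", ("V", "sleeps"))))
    print(find_parent_child(conn, "NP", "ADJ"))
```

With one row per node, a structural query such as "sentences where an NP directly dominates an ADJ" becomes a self-join on the parent pointer. Deeper patterns would need additional joins or a recursive query.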
Project outcomes
Journal papers: 40
Monographs: 0
Research awards: 0
Conference papers: 0
Patents: 0
Takenobu Tokunaga, Takeshi Abekawa: "Analysis of adnominal clauses using statistical information" Nihongogaku (日本語学). 20, No. 12. 20-27 (2001)
Mino, H., Hasimoto, T., Tokunaga, T. and Tanaka, H.: "Disambiguation of adverbial phrase attachment by using decision tree" Annual meeting of Association of Natural Language Processing. 411-414 (2002)
Sirai, K., Ueki, M., Hasimoto, T., Tokunaga, T. and Tanaka, H.: "The MSLR parser: A toolkit of natural language processing" Natural Language Processing. 7, No. 5. 93-112 (2000)
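Hideya Mino, Taiichi Hashimoto, Takenobu Tokunaga, Hozumi Tanaka: "Determining the modification targets of na-adjectives (keiyōdōshi) using decision lists" Proceedings of the 8th Annual Meeting of the Association for Natural Language Processing. 411-414 (2002)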
Noro, T., Okazaki, A., Tokunaga, T. and Tanaka, H.: "A study on large Japanese grammar development" Annual meeting of Association of Natural Language Processing. 387-390 (2002)
Other grants of TOKUNAGA Takenobu
Understanding and generation of referring expressions using gaze
- Grant number: 21300049
- Fiscal year: 2009
- Funding amount: $89,600
- Project category: Grant-in-Aid for Scientific Research (B)
Understanding referring expression in dialogue with embodied agents
- Grant number: 19500116
- Fiscal year: 2007
- Funding amount: $89,600
- Project category: Grant-in-Aid for Scientific Research (C)
Similar overseas grants
Compiling a Japanese-English Collocation Dictionary for English Production Using Large-scale Corpora
- Grant number: 18H00693
- Fiscal year: 2018
- Funding amount: $89,600
- Project category: Grant-in-Aid for Scientific Research (B)
The Usage Patterns of Nouns and Adjectives and their Constructionalizations: A Contrastive Study of Japanese and French based on Large-scale Corpora
- Grant number: 26370483
- Fiscal year: 2014
- Funding amount: $89,600
- Project category: Grant-in-Aid for Scientific Research (C)
Construction of a Vocabulary List for Japanese Learners of English and Development of a System for Analyzing Educational Materials Based on Large-scale Corpora
- Grant number: 16320076
- Fiscal year: 2004
- Funding amount: $89,600
- Project category: Grant-in-Aid for Scientific Research (B)


















