Studies on Corpus Creation and Use for Linguistic Research

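Basic Information

  • Grant number:
    15300046
  • Principal investigator:
  • Amount:
    $92,800
  • Host institution:
  • Host institution country:
    Japan
  • Project category:
    Grant-in-Aid for Scientific Research (B)
  • Fiscal year:
    2003
  • Funding country:
    Japan
  • Project period:
    2003 to 2005
  • Project status:
    Completed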

Project Abstract

In language processing research, we extended the language analysis tools we have been developing, such as our Japanese morphological analyzer and Japanese dependency analyzer, to handle Chinese.

In dictionary development, we implemented an unknown-word analysis system for Chinese and extracted candidate new word entries by running the system on a large-scale Chinese corpus. Through this experiment, we constructed a large-scale Chinese dictionary with about one hundred thousand word entries. For Japanese, we described the constituent-word information of Japanese compound words and registered this information in the dictionary. For English, we developed a method for distinguishing literal from idiomatic uses of English multi-word expressions and showed that it distinguishes them with high accuracy.

In corpus tool development, we designed in detail the database schemas for annotated corpora and dictionary entries, and re-implemented the corpus management tool based on these schemas. We also implemented error correction functions for part-of-speech and dependency analysis errors, and designed and implemented the interface for these functions. One of the error correction functions is realized on top of the visualization function that shows phrasal chunks and their dependency relations.

The corpus management tools we developed have been released to the public, and we held two seminars to introduce them and explain their usage to prospective users, with the aim of collecting user feedback. We also opened a Web page for introducing and downloading the tools.

Project Outcomes

Journal articles (28)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Masayuki Asahara, Yuji Matsumoto: "Japanese named entity extraction with redundant morphological analysis" Proc. Human Language Technology and North American Chapter of Association for Computational Linguistics. 4. 8-15 (2003)
  • DOI:
  • Publication date:
  • Journal:
  • Impact factor:
    0
  • Authors:
  • Corresponding author:
Automatic Extraction of Fixed Multiword Expressions
  • DOI:
    10.1007/11562214_50
  • Publication date:
    2005-05
  • Journal:
  • Impact factor:
    0
  • Authors:
    Campbell Hore; Masayuki Asahara; Yuji Matsumoto
  • Corresponding author:
    Campbell Hore; Masayuki Asahara; Yuji Matsumoto
茶筌と南瓜による日本語解析-構文情報を用いた文の役割分類
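Japanese Analysis with ChaSen and CaboCha: Sentence Role Classification Using Syntactic Information
  • DOI:
  • Publication date:
    2004
  • Journal:
  • Impact factor:
    0
  • Authors:
    松本裕治; 高岡一馬; 浅原正幸; 工藤拓
  • Corresponding author:
    工藤拓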
Chinese and Japanese Word Segmentation with Word Level and Character Level Information (in Japanese)
Masayuki Asahara, Yuji Matsumoto: "Filler and disfluency identification based on morphological analysis and chunking" Proceedings of ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition. 163-166 (2003)
  • DOI:
  • Publication date:
  • Journal:
  • Impact factor:
    0
  • Authors:
  • Corresponding author:

Other Grants by MATSUMOTO Yuji

Nanoscale Engineering of Compositional Modulations in Alloys and Composite Thin Film Oxides for Exploration of Their New Properties and Functionalities
  • Grant number:
    20H02610
  • Fiscal year:
    2020
  • Funding amount:
    $92,800
  • Project category:
    Grant-in-Aid for Scientific Research (B)
Development of a comprehensive educational method of algorithmic design for the expansion of creativity using IT
  • Grant number:
    19K12680
  • Fiscal year:
    2019
  • Funding amount:
    $92,800
  • Project category:
    Grant-in-Aid for Scientific Research (C)
Nano-strained interfaces in nanocomposite ferroelectric films and the origin of their free polarization rotation
  • Grant number:
    15H02021
  • Fiscal year:
    2015
  • Funding amount:
    $92,800
  • Project category:
    Grant-in-Aid for Scientific Research (A)
Development of site-specific nuclease to control mutated mtDNA in MELAS iPS cell-derived neuronal cells
  • Grant number:
    26860831
  • Fiscal year:
    2014
  • Funding amount:
    $92,800
  • Project category:
    Grant-in-Aid for Young Scientists (B)
Development of the hybrid process of sublimation and solution re-crystallization with ionic liquid as a new purification of organic semiconductor materials
  • Grant number:
    25600074
  • Fiscal year:
    2013
  • Funding amount:
    $92,800
  • Project category:
    Grant-in-Aid for Challenging Exploratory Research
Nano-level observation of shape, size, and, distribution of lignin in cell wall
  • Grant number:
    23658140
  • Fiscal year:
    2011
  • Funding amount:
    $92,800
  • Project category:
    Grant-in-Aid for Challenging Exploratory Research
The value of space affection: Realization of "inspired workplace" for creative work
  • Grant number:
    23760574
  • Fiscal year:
    2011
  • Funding amount:
    $92,800
  • Project category:
    Grant-in-Aid for Young Scientists (B)
Joint Natural Language Processing with Global Information
  • Grant number:
    23240020
  • Fiscal year:
    2011
  • Funding amount:
    $92,800
  • Project category:
    Grant-in-Aid for Scientific Research (A)
Metallic glass flux-vapor growth of SiC single crystal films
  • Grant number:
    23656028
  • Fiscal year:
    2011
  • Funding amount:
    $92,800
  • Project category:
    Grant-in-Aid for Challenging Exploratory Research
Study of development of the tumor imaging method using fumarate metabolism
  • Grant number:
    23659602
  • Fiscal year:
    2011
  • Funding amount:
    $92,800
  • Project category:
    Grant-in-Aid for Challenging Exploratory Research

Similar Overseas Grants

REU Site: Recent Advances in Natural Language Processing
  • Grant number:
    2349452
  • Fiscal year:
    2024
  • Funding amount:
    $92,800
  • Project category:
    Standard Grant
Navigating Chemical Space with Natural Language Processing and Deep Learning
  • Grant number:
    EP/Y004167/1
  • Fiscal year:
    2024
  • Funding amount:
    $92,800
  • Project category:
    Research Grant
Collaborative Research: EAGER: Developing and Optimizing Reflection-Informed STEM Learning and Instruction by Integrating Learning Technologies with Natural Language Processing
  • Grant number:
    2329273
  • Fiscal year:
    2023
  • Funding amount:
    $92,800
  • Project category:
    Standard Grant
SBIR Phase I: Sown To Grow - Measuring Growth in Trusting Relationships between Students and Educators with Natural Language Processing and Machine Learning Technologies
  • Grant number:
    2322340
  • Fiscal year:
    2023
  • Funding amount:
    $92,800
  • Project category:
    Standard Grant
Studies of speech, image and natural language processing for multimodal spoken document retrieval
  • Grant number:
    23K11216
  • Fiscal year:
    2023
  • Funding amount:
    $92,800
  • Project category:
    Grant-in-Aid for Scientific Research (C)
Efficient and Fair Language Modelling for Natural Language Processing, investigating lightweight language modelling approaches and aiming at fairness
  • Grant number:
    2894795
  • Fiscal year:
    2023
  • Funding amount:
    $92,800
  • Project category:
    Studentship
Harmony AI: Natural Language Processing Enabling Advanced Biomanufacturing
  • Grant number:
    10761082
  • Fiscal year:
    2023
  • Funding amount:
    $92,800
  • Project category:
Applying Natural Language Processing to real-world patient data to optimise cancer care
  • Grant number:
    2897525
  • Fiscal year:
    2023
  • Funding amount:
    $92,800
  • Project category:
    Studentship
CAREER: Data-driven design of graphene oxide for environmental applications enabled by natural language processing and machine learning techniques
  • Grant number:
    2238415
  • Fiscal year:
    2023
  • Funding amount:
    $92,800
  • Project category:
    Continuing Grant
Collaborative Research: EAGER: Developing and Optimizing Reflection-Informed STEM Learning and Instruction by Integrating Learning Technologies with Natural Language Processing
  • Grant number:
    2329274
  • Fiscal year:
    2023
  • Funding amount:
    $92,800
  • Project category:
    Standard Grant