Interactive distributed corpus exploration and annotation infrastructure for large corpora and knowledge-bases

Basic information

Project abstract

The goal of this project is a research infrastructure for corpus annotation that scales to large text document collections by flexibly building subcorpora. The infrastructure addresses the needs of computational linguists and corpus linguists for a generic tool to perform selective semantic annotation tasks within and across documents. Such an infrastructure is important because it enables the targeted exploitation of the huge amounts of digitally available text for linguistic analysis. The infrastructure should support expert users in exploring large document collections, in setting up an annotation scheme, and in extracting task-specific subcorpora from a large background corpus. The annotation of the corpora should be flexibly distributable to remotely working annotation teams of different qualification levels and backgrounds. Their work should be supported through prioritisation and annotation suggestions based on machine learning technology, so that a large corpus with high-quality annotations for training and evaluating the respective algorithms can be created efficiently. Thus, the infrastructure should enable the annotation of the same corpus from multiple perspectives by multiple researchers and annotation teams working in parallel. Users should be able to import custom corpora as needed. Further functionality is needed to maintain and expand the knowledge bases used during the semantic annotation tasks as well as to connect to external standard knowledge bases.

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Dr.-Ing. Richard Eckart de Castilho

Similar grants from the National Natural Science Foundation of China (NSFC)

Graphon mean field games with partial observation and application to failure detection in distributed systems
  • Grant number:
  • Approval year:
    2025
  • Funding amount:
    0 CNY
  • Project category:
    Provincial/municipal project
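Deep mining techniques for heterogeneous medical imaging data and precise prediction of major central nervous system diseases
  • Grant number:
    61672236
  • Approval year:
    2016
  • Funding amount:
    640,000 CNY
  • Project category:
    General Program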

Similar overseas grants

Global studies into the Genetic Architecture of the Brain's White Matter Network through Harmonized and Coordinated Analyses in the ENIGMA-Consortium
  • Grant number:
    10720443
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
Perturbations and Behavior
  • Grant number:
    9983190
  • Fiscal year:
    2017
  • Funding amount:
    --
  • Project category:
Perturbations and Behavior
  • Grant number:
    10247575
  • Fiscal year:
    2017
  • Funding amount:
    --
  • Project category:
A Distributed Wireless Neural Interface System
  • Grant number:
    7664169
  • Fiscal year:
    2009
  • Funding amount:
    --
  • Project category:
A Distributed Wireless Neural Interface System
  • Grant number:
    8064409
  • Fiscal year:
    2009
  • Funding amount:
    --
  • Project category:
Individual and joint contributions of the hemispheres to language comprehension
  • Grant number:
    7546788
  • Fiscal year:
    2008
  • Funding amount:
    --
  • Project category:
Optical Tomography of Multiple Parallel Memory Systems in Freely Moving Rats
  • Grant number:
    7184882
  • Fiscal year:
    2007
  • Funding amount:
    --
  • Project category:
Optical Tomography of Multiple Parallel Memory Systems in Freely Moving Rats
  • Grant number:
    7348297
  • Fiscal year:
    2007
  • Funding amount:
    --
  • Project category:
A Study on Constructing Various Acoustic Models using Distributed Speech Corpora
  • Grant number:
    15200014
  • Fiscal year:
    2003
  • Funding amount:
    --
  • Project category:
    Grant-in-Aid for Scientific Research (A)
Discrete Optimization Problems with Graph and Network Structures and Their Efficient Solution Methods
  • Grant number:
    10205210
  • Fiscal year:
    1998
  • Funding amount:
    --
  • Project category:
    Grant-in-Aid for Scientific Research on Priority Areas (B)